In this paper I showcase the development of a fast-running and highly predictive fatigue failure surrogate model based on data obtained from simulation runs of a complex mechanical system. The goals for the surrogate model were to create a Simulation Democratization tool for design exploration and a Digital Twin tool for predictive maintenance and adaptive controls in the field.

Catastrophic or sudden failures caused by damage accumulation are very difficult to predict. Because of the nature of the failure, sensors placed in a machine in the field report normal operation until the sudden failure occurs. Apart from operating time, therefore, no measured variable changes enough to be correlated with the sudden failure. In the design process, however, catastrophic failure analysis using simulation is routine, because parts must meet operational life requirements. These simulations are often complex and incorporate multiple physical phenomena that interact in highly coupled ways. As a result, they are often time consuming, labor intensive, and dependent on expensive software. These characteristics limit the amount of data that simulation can supply for training machine learning algorithms, which typically require large amounts of data.

In the example showcased here, my team and I faced the challenge of converting data from a small number of simulation runs, which predict fatigue failure in an engine exhaust manifold, into a highly predictive surrogate model that could determine real-time damage accumulation and the expected failure point in field-deployed products. The same surrogate model could also be used for design exploration. Thus, both operational and geometric variables were considered in the prediction of fatigue life and damage accumulation in the part.

To accomplish these goals, we used our in-house Physics Informed Machine Learning (PIML™) technology and methods. The PIML™ approach has better accuracy than traditional statistical and machine learning methods and requires only a fraction of the training data.

The outcome of this project was a PIMLᵀᴹ model trained on only 17 simulation runs (experiments). The overall R² value for the PIMLᵀᴹ model was 0.97, and the average log error of the predicted life was 2% when compared to finite element simulation. Note that each finite element simulation took 3 weeks to run. Given this long run time and its expense, conventional machine learning methods, which would have required thousands of simulation runs to create a low-error surrogate for this problem, would have been impractical, if not impossible.

## Approach

## Data Generation Through Simulation

For this study, a parametric model of an exhaust manifold with three ports was built in CAD software. The model had 6 geometry variables to address different exhaust models and 2 cycle parameters to address various working conditions. Then, Computational Fluid Dynamics (CFD) and Conjugate Heat Transfer (CHT) analyses mapped the temperature cycle loads to a structural simulation based on Finite Element Analysis (FEA). The FEA exported the converged strain and stress cycles for the durability analysis. In this case, the principal strain criterion was used to find the critical life under a multi-axial load distribution, and the Morrow mean stress correction was used to account for non-zero mean stress in the fatigue life calculations. Figure 1 shows the resulting log life prediction for an exhaust manifold under the illustrated cyclic thermal load.
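To make the Morrow correction concrete, the sketch below uses the textbook strain-life relation with the Morrow mean stress term and solves it for life by bisection. It is an illustration only, not the durability solver used in the study: the material constants in `MAT`, the function names, and the load values are all hypothetical, and the strain amplitude is assumed to be the principal strain amplitude taken from the FEA cycle.

```python
import math

# Hypothetical material constants for illustration only (not from the study).
MAT = {
    "E": 200e3,        # Young's modulus, MPa
    "sigma_f": 900.0,  # fatigue strength coefficient, MPa
    "b": -0.09,        # fatigue strength exponent
    "eps_f": 0.30,     # fatigue ductility coefficient
    "c": -0.60,        # fatigue ductility exponent
}


def morrow_strain_amplitude(n_f, sigma_m, E, sigma_f, b, eps_f, c):
    """Strain amplitude that gives a life of n_f cycles at mean stress sigma_m.

    eps_a = (sigma_f - sigma_m) / E * (2 n_f)^b + eps_f * (2 n_f)^c
    """
    reversals = 2.0 * n_f
    elastic = (sigma_f - sigma_m) / E * reversals ** b
    plastic = eps_f * reversals ** c
    return elastic + plastic


def morrow_life(eps_a, sigma_m, lo=1.0, hi=1e12, **mat):
    """Cycles to failure for strain amplitude eps_a, found by bisection.

    The strain amplitude falls monotonically with life when b < 0, c < 0
    and sigma_m < sigma_f, so bisection on log10(life) is sufficient.
    """
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    for _ in range(200):
        mid = 0.5 * (log_lo + log_hi)
        if morrow_strain_amplitude(10.0 ** mid, sigma_m, **mat) > eps_a:
            log_lo = mid  # the true life is longer than 10**mid
        else:
            log_hi = mid
    return 10.0 ** (0.5 * (log_lo + log_hi))


life_zero_mean = morrow_life(0.004, 0.0, **MAT)
life_tensile_mean = morrow_life(0.004, 100.0, **MAT)
print(f"log10 life, zero mean stress:    {math.log10(life_zero_mean):.2f}")
print(f"log10 life, 100 MPa mean stress: {math.log10(life_tensile_mean):.2f}")
```

A tensile mean stress lowers the elastic term, so the second life comes out shorter than the first, which is the effect the correction is meant to capture.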

Next, to define the design space that feeds our **PIMLᵀᴹ** technique, we had to set up a Design of Experiments (DoE) study and build the training set for the surrogate model. An Optimal Latin Hypercube DoE strategy was used to build the training set for this use case. Given the expense and long run time of every simulation, our design of experiments considered only 17 runs.
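As a rough illustration of what such a design looks like, the sketch below builds a 17-run Latin hypercube over 8 variables (6 geometry, 2 cycle) using only the Python standard library. The optimization step is a crude stand-in for an optimal Latin hypercube: it draws many random candidates and keeps the one with the largest minimum point-to-point distance. The variable bounds, function names, and candidate count are assumptions for illustration; the DoE tool actually used is not specified here.

```python
import itertools
import math
import random


def unit_lhs(n_runs, n_dims, rng):
    """Random Latin hypercube in the unit cube: one point per stratum per axis."""
    columns = []
    for _ in range(n_dims):
        strata = [(i + rng.random()) / n_runs for i in range(n_runs)]
        rng.shuffle(strata)
        columns.append(strata)
    return [list(row) for row in zip(*columns)]


def min_distance(design):
    """Smallest Euclidean distance between any two design points."""
    return min(math.dist(p, q) for p, q in itertools.combinations(design, 2))


def maximin_lhs(n_runs, bounds, n_candidates=200, seed=0):
    """Keep the most spread-out of n_candidates random Latin hypercubes."""
    rng = random.Random(seed)
    best = max(
        (unit_lhs(n_runs, len(bounds), rng) for _ in range(n_candidates)),
        key=min_distance,
    )
    # Scale each unit-cube coordinate to its variable's (low, high) range.
    return [
        [low + u * (high - low) for u, (low, high) in zip(row, bounds)]
        for row in best
    ]


# Hypothetical normalized ranges for 6 geometry + 2 cycle variables.
bounds = [(0.0, 1.0)] * 8
design = maximin_lhs(17, bounds)
print(len(design), "runs x", len(design[0]), "variables")
```

Each of the 17 rows would then define one CAD variant and one thermal cycle to simulate.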

## Surrogate Model Development Using PIML™

The PIMLᵀᴹ implementation for this problem used a multistage empirical physics model built to represent the fatigue life calculations. The predictive capability of this empirical model was then improved by creating fit weight vectors whose values were optimized to reduce the error between known values and the values predicted by the surrogate model. The predictive error of the PIMLᵀᴹ model was then evaluated using 5-fold cross-validation. (For more information on the goodness of fit, please visit **this article**.)
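The general pattern of a physics baseline corrected by fitted weights and scored with 5-fold cross-validation can be sketched as follows. This is not the PIMLᵀᴹ algorithm, which is not described in detail here: the baseline formula, the two-weight linear correction, and the synthetic data are all assumptions chosen to keep the example small.

```python
import math
import random


def physics_log_life(delta_t):
    """Hypothetical empirical baseline: log10 life versus temperature swing (K)."""
    return 8.0 - 2.0 * math.log10(delta_t)


def fit_weights(baseline, target):
    """Least-squares weights (w0, w1) such that target ~ w0 + w1 * baseline."""
    n = len(baseline)
    mean_x, mean_y = sum(baseline) / n, sum(target) / n
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, target))
    w1 = sxy / sxx
    return mean_y - w1 * mean_x, w1


def cross_validated_predictions(baseline, target, k=5, seed=0):
    """Predict every sample from weights fitted without that sample's fold."""
    order = list(range(len(baseline)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    blind = [0.0] * len(baseline)
    for held_out in folds:
        train = [i for i in order if i not in held_out]
        w0, w1 = fit_weights([baseline[i] for i in train],
                             [target[i] for i in train])
        for i in held_out:
            blind[i] = w0 + w1 * baseline[i]
    return blind


# Synthetic stand-in for 17 simulation results (not data from the study).
delta_t = [300.0 + 25.0 * i for i in range(17)]
baseline = [physics_log_life(t) for t in delta_t]
simulated = [0.5 + 0.9 * x + 0.02 * math.sin(i) for i, x in enumerate(baseline)]

blind = cross_validated_predictions(baseline, simulated)
worst = max(abs(p - s) for p, s in zip(blind, simulated))
print(f"worst blind error in log10 life: {worst:.3f}")
```

Because the baseline already carries the shape of the response, only two weights have to be learned from each training fold, which is why so few samples can be enough in a setup like this.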

## Results

Figure 2 compares the **PIMLᵀᴹ** fast-running surrogate model with the life prediction from finite element analysis after a 5-fold cross-validation study with 17 experiments. The overall R² value for the PIMLᵀᴹ model was **0.97**, and the average log error of the predicted life was **2%** **for blind data (data not used for training)**. Note that the dotted red line represents 20% log error with respect to the actual life calculations. This criterion was chosen to narrow the error bound in low-life predictions.
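For readers who want to reproduce this kind of score on their own data, here is a minimal sketch of the two metrics. The exact definitions used in the study are not spelled out, so two assumptions are made: R² is computed on log10 life, and "log error" means the absolute difference in log10 life divided by the actual log10 life. The life values are illustrative, not results from the study.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination of predicted against actual values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def relative_log_errors(actual_life, predicted_life):
    """Per-sample |log10(predicted) - log10(actual)| / |log10(actual)|."""
    return [
        abs(math.log10(p) - math.log10(a)) / abs(math.log10(a))
        for a, p in zip(actual_life, predicted_life)
    ]


# Illustrative cycles-to-failure values (not results from the study).
actual_life = [2.0e3, 1.5e4, 8.0e4, 4.0e5]
predicted_life = [2.4e3, 1.3e4, 9.0e4, 3.6e5]

errors = relative_log_errors(actual_life, predicted_life)
log_actual = [math.log10(a) for a in actual_life]
log_predicted = [math.log10(p) for p in predicted_life]

print(f"R^2 on log10 life: {r_squared(log_actual, log_predicted):.3f}")
print(f"average log error: {100.0 * sum(errors) / len(errors):.1f}%")
print("all within 20% band:", all(e <= 0.20 for e in errors))
```

Normalizing by the actual log life makes a fixed percentage band tighter, in absolute log terms, for short-lived parts than for long-lived ones.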

*The PIMLᵀᴹ model can also be modified by adding the appropriate physics for important regions of the solution. That’s one of the reasons we can reach higher accuracy with less data.*

## Discussion

We have shown that the **PIMLᵀᴹ** algorithm can predict the outcome of a highly complex system accurately and quickly. This opens many doors toward a real Digital Twin in industrial applications. Dr. Daniel Betts, in his **article**, defined the Digital Twin as a series of models that predict the state or the performance of a device by evaluating data generated by that device. One example of an industrial application is an onboard processor in a vehicle or truck ECU that analyzes the sensor data and uses **PIMLᵀᴹ** technology to predict the machine’s performance in real time. This is feasible because the data processing is exceptionally light and the outcome generation is very fast.

The following video shows the surrogate model being applied to predict the remaining useful life of an exhaust manifold system, updated live as data from the sensors is received. This kind of application allows us to monitor the health of every system in a machine and accurately predict the maintenance required to prevent potential failures. With this system we also avoid unscheduled maintenance, because we can precisely monitor the remaining useful life of each individual component of a given machine. In fact, the **PIMLᵀᴹ** model provides enough insight to reduce wear accumulation in a given machine by changing control parameters or modifying machine usage, thereby reducing maintenance costs.
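One simple way to turn per-cycle life predictions into a running remaining-useful-life estimate is linear damage accumulation. The sketch below assumes the Palmgren-Miner rule, which is a common choice but is not confirmed as the method used in the deployed system. `DamageTracker`, `toy_surrogate`, and the operating-point labels are hypothetical names standing in for the real surrogate and sensor inputs.

```python
class DamageTracker:
    """Accumulates fatigue damage with the Palmgren-Miner linear rule."""

    def __init__(self, predict_life):
        # predict_life: callable mapping an operating point to cycles to failure.
        self.predict_life = predict_life
        self.damage = 0.0  # 0 = new part, 1 = predicted failure

    def update(self, operating_point, cycles=1.0):
        """Add the damage fraction consumed by `cycles` at this operating point."""
        self.damage += cycles / self.predict_life(operating_point)
        return self.damage

    def remaining_cycles(self, operating_point):
        """Cycles left if the machine keeps running at this operating point."""
        return max(0.0, 1.0 - self.damage) * self.predict_life(operating_point)


def toy_surrogate(operating_point):
    """Hypothetical stand-in for the surrogate model's life prediction."""
    return 1000.0 if operating_point == "normal" else 500.0


tracker = DamageTracker(toy_surrogate)
tracker.update("normal", cycles=250)  # consumes 250 / 1000 of the life
tracker.update("severe", cycles=125)  # consumes 125 / 500 of the life
print(tracker.damage)                      # 0.5
print(tracker.remaining_cycles("normal"))  # 500.0
```

Because each update needs only one surrogate evaluation and one division, this kind of bookkeeping is light enough to run on an embedded controller.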

## Final Words…

Let’s revisit the initial claim of this article: we believe that **PIMLᵀᴹ** is the only methodology that can predict, with high accuracy, the performance of complex systems for which accurate data is limited.

Other approaches each have their own drawbacks. For instance, in the above example, if we wanted to predict the next failure time under a new loading cycle, each new finite element analysis job would require at least three weeks of computational time, which is infeasible for Digital Twins. On the other hand, a purely data-driven surrogate model requires significant amounts of training data. For example, a common rule of thumb for training neural networks is to have at least 3 data points per degree of freedom in the model. For the example problem, even if a simple multilayer perceptron were used, we would have required thousands of data points to start obtaining reasonable predictions.
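The arithmetic behind that estimate can be checked with a few lines of Python. The sketch assumes that "degree of freedom" means a trainable weight or bias, and it uses a hypothetical small network with 8 inputs (the 6 geometry and 2 cycle variables), two hidden layers of 32 units, and one output. The layer widths are an illustration, not a network that was actually built.

```python
def mlp_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network with these layer widths."""
    return sum(
        (n_in + 1) * n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


layers = [8, 32, 32, 1]            # hypothetical small multilayer perceptron
parameters = mlp_parameter_count(layers)
required_runs = 3 * parameters     # rule of thumb: 3 data points per parameter

print(parameters)     # 1377
print(required_runs)  # 4131
```

Even this modest network would call for roughly four thousand simulation runs under the rule of thumb, compared with the 17 runs used here.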

The **PIMLᵀᴹ** model is based on the underlying physics, and the solution has been tuned for the desired outcome and region instead of trying to fit one function to the entire design space. In this example, the models therefore have higher accuracy when predicting the critical life in low-cycle, mid-cycle, and high-cycle fatigue.

In short, the **PIMLᵀᴹ** methodology reduces the amount of data needed to develop a fast-running surrogate model. This opens the possibility of thinking differently about the design process and about Digital Twin applications. **PIMLᵀᴹ** models can easily be deployed on board, on the ECU of a machine that’s operating in the field. This allows us to monitor the health of the structure in real time or to optimize the operation of the machine to meet performance requirements.

Since PIMLᵀᴹ predicts simulation results, designers can also benefit from real-time design guidelines for robust product design. This reduces the product design cycle and improves overall product quality.

For more information, I recommend reading my colleagues’ articles as they explain more about the roadmap of the **Digital Twin** and **PIMLᵀᴹ.**