Since joining PredictiveIQ, I have been pioneering Physics Informed Machine Learning (PIMLᵀᴹ) techniques for building highly predictive Surrogate Models. These Surrogate Models are much more accurate than those created using purely Data-Driven Machine Learning techniques, while requiring significantly less training data.
In this article, I provide an illustrative example that compares the development of a Surrogate Model using state-of-the-art machine learning methods against PIMLᵀᴹ. The training data for the Surrogate Model was obtained from Finite Element modeling and simulation.
Historically, in the field of engineering, computational simulations such as Structural Finite Element (FE) analysis and Computational Fluid Dynamics (CFD) analysis have been conducted to predict the behavior of a design subjected to a specified set of boundary conditions. However, simulations of complex real-world problems using Computer Aided Engineering (CAE) software tools may take hours or even days to compute. Because of this cost, these computationally expensive CAE simulations often cannot be used directly for design space exploration and optimization.
Global product competition has placed greater importance on shorter product development cycles and on digital twin applications across several engineering fields.
Could CAE simulations be used to understand the complex interactions between product design features, operating regimes, and customer use cases? The answer is yes. Most commercial CAE tools allow simulation conditions to be varied parametrically to create Surrogate Models, which are, in essence, statistically optimized curve-fit models that predict simulation results. These enable the user to explore the design space quickly.
Building efficient Surrogate Models or Reduced Order Models to predict the outcomes of complex engineering problems has become the need of the hour.
Currently, several data-driven approaches such as Neural Networks, Radial Basis Functions, and Kriging are used to build these models. Building a good-quality surrogate model typically requires a sufficiently large training data set, created using a Design of Experiments (DoE) strategy that spreads the designs uniformly across the design space. However, for expensive CAE problems that may take days to compute and that have non-linear responses, such DoEs require a significant amount of data (i.e., simulations), which can take several months of computational effort, if not more, to obtain. Therefore, conventional Data-Driven Surrogate Modeling methods are often not feasible.
However, using the physics informed machine learning (PIMLᵀᴹ) techniques developed at PredictiveIQ, surrogate models can be built from significantly smaller training data sets with high predictive accuracy, as illustrated in the example powertrain application below.
A key factor that differentiates the PIMLᵀᴹ methodology from other data-driven approaches is that the empirical models fitted to the available training data are based on the underlying physics of the engineering problem. They are therefore highly efficient at predicting outcomes both outside the range of the training data (extrapolation) and between the points in the data set (interpolation). These capabilities make the PIMLᵀᴹ methodology the most favorable surrogate modeling technique for predicting complex physical phenomena. This opens the door to several opportunities in the field of digital twins and in extensive design space exploration.
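To make the extrapolation argument concrete, here is a minimal sketch on a toy problem. It is not the PredictiveIQ model: the cantilever bending formula, the correction factor `k_true`, and all numeric values are illustrative assumptions. A single weight scaling a known physics formula is compared against a straight-line fit that knows nothing about the physics, and both are evaluated outside the training range.

```python
# Hypothetical toy problem: bending stress at the root of a cantilever beam,
# sigma = k * 6*F*L / (b*t**2). The factor k_true stands in for effects the
# closed-form formula misses. All values are illustrative assumptions.
F, L, b, k_true = 1000.0, 50.0, 10.0, 1.1

def beam_formula(t):
    """Textbook bending stress for thickness t, without the correction factor."""
    return 6.0 * F * L / (b * t ** 2)

def simulate(t):
    """Stand-in for an expensive simulation."""
    return k_true * beam_formula(t)

# Training data covers thicknesses from 2 to 4 only.
t_train = [2.0, 2.5, 3.0, 3.5, 4.0]
y_train = [simulate(t) for t in t_train]

# Physics-informed fit: one weight w scaling the known formula (least squares).
g = [beam_formula(t) for t in t_train]
w = sum(gi * yi for gi, yi in zip(g, y_train)) / sum(gi * gi for gi in g)

# Purely data-driven fit: straight line y = a + s*t (ordinary least squares).
n = len(t_train)
t_mean = sum(t_train) / n
y_mean = sum(y_train) / n
s_xy = sum((t - t_mean) * (y - y_mean) for t, y in zip(t_train, y_train))
s_xx = sum((t - t_mean) ** 2 for t in t_train)
s = s_xy / s_xx
a = y_mean - s * t_mean

# Evaluate both models at a thickness outside the training range.
t_new = 6.0
truth = simulate(t_new)
physics_error = abs(w * beam_formula(t_new) - truth)
linear_error = abs(a + s * t_new - truth)
print(f"physics-informed error: {physics_error:.3g}")
print(f"straight-line error:    {linear_error:.3g}")
```

Because the physics-based form already carries the inverse-square dependence on thickness, a single fitted weight is enough to extrapolate, whereas the straight line has no way to learn that shape from five points.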
Here is an illustration of the PIMLᵀᴹ technique used to build a surrogate model for a powertrain application. The objective of this use case was to determine the tooth bending stress of a spur gear and the resulting life of the gear when subjected to the induced stresses. The setup for the finite element stress calculation is shown in Figure 1.

For this study, a parametric model of a spur gear with 8 geometry variables based on AGMA standards was built using an automated Python script. A static structural analysis with constant power applied to the pinion was performed in Abaqus to calculate the bending stress (maximum von Mises stress) on the spur gear tooth. The durability of the spur gear was then evaluated in Fe-Safe using the theory of critical distances for different load cycles. These finite element simulations, with 8 geometry design variables and 2 process design variables, were then used to set up a Design of Experiments study to build the training sets for several surrogate models. An optimal Latin hypercube DoE strategy was used to build the training sets for the current use case.
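As a rough illustration of the sampling idea, the sketch below draws a basic Latin hypercube design for 10 variables and 30 experiments using only the Python standard library. It is a plain randomized Latin hypercube, not the optimal variant used in the study, and the function name `latin_hypercube` and the unit variable ranges are placeholders rather than the actual gear parameters.

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Return n_samples design points.

    Each variable's range is split into n_samples equal strata, and every
    stratum is used exactly once per variable.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across variables
        columns.append([
            low + (high - low) * (s + rng.random()) / n_samples
            for s in strata
        ])
    # Transpose: one row per experiment, one column per variable.
    return [list(point) for point in zip(*columns)]

# 8 geometry variables + 2 process variables; unit ranges are placeholders.
bounds = [(0.0, 1.0)] * 10
design = latin_hypercube(30, bounds)
print(len(design), "experiments with", len(design[0]), "variables each")
```

An optimal Latin hypercube would additionally rearrange the strata to improve a space-filling criterion; that step is omitted here for brevity.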
In implementing the PIMLᵀᴹ methodology for the current use case, a simple proprietary 2-stage empirical physics model was built to represent the bending stress and life calculations, respectively. The predictive capability of this empirical model was improved by introducing fit weight vectors whose values were optimized to reduce the error between the known values and the values predicted by the Surrogate Model. The predictive error of the PIMLᵀᴹ model was then evaluated using 5-fold cross-validation.
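The actual 2-stage model is proprietary, so the following is only a sketch of the general pattern under stated assumptions: a physics-based estimate `g` is corrected by two fitted weights through an assumed power-law form, and the fit is scored by 5-fold cross-validation. The model form, the synthetic data, and the names `fit_weights`, `predict`, and `cross_validate` are all hypothetical.

```python
import math
import random

def fit_weights(g, y):
    """Least-squares fit of log(y) = w0 + w1*log(g); returns (w0, w1)."""
    lg = [math.log(v) for v in g]
    ly = [math.log(v) for v in y]
    n = len(lg)
    mg, my = sum(lg) / n, sum(ly) / n
    w1 = (sum((a - mg) * (b - my) for a, b in zip(lg, ly))
          / sum((a - mg) ** 2 for a in lg))
    return my - w1 * mg, w1

def predict(weights, g):
    """Apply the fitted weights to physics-based estimates g."""
    w0, w1 = weights
    return [math.exp(w0) * v ** w1 for v in g]

def cross_validate(g, y, k=5, seed=0):
    """Return blind predictions: each point is predicted by a model
    fitted without that point."""
    order = list(range(len(g)))
    random.Random(seed).shuffle(order)
    blind = [None] * len(g)
    for fold in range(k):
        test = order[fold::k]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        weights = fit_weights([g[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, predict(weights, [g[i] for i in test])):
            blind[i] = p
    return blind

# Synthetic stand-in for 30 simulations: the physics estimate g is close to,
# but not equal to, the "simulated" response y.
rng = random.Random(1)
g = [rng.uniform(100.0, 400.0) for _ in range(30)]
y = [1.2 * v ** 0.95 * math.exp(rng.gauss(0.0, 0.03)) for v in g]

blind = cross_validate(g, y)
worst = max(abs(p - t) / t for p, t in zip(blind, y))
print(f"worst blind relative error: {worst:.3f}")
```

With only two weights to fit, each fold's model is well determined by the 24 remaining points, which is the practical reason a physics-based form can get by with small training sets.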
Figure 2 compares the blind data fits from the 5 folds of the cross-validation study with 30 experiments (simulations). The overall R² value for the PIMLᵀᴹ model was 0.85 and the average log error of the predicted life was 7%, compared to an R² of 0.57 and an average log error of 15.03% for a data-driven surrogate model built using radial basis functions (RBF).
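For readers who want to compute comparable metrics on their own blind predictions, here is a minimal sketch. R² follows the standard coefficient-of-determination definition. The article does not define "average log error", so the second function assumes one plausible reading (the mean absolute difference in log10 life, relative to the log10 of the known life, as a percentage); the sample life values are invented for illustration.

```python
import math

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def mean_log_error_percent(actual, predicted):
    """Assumed definition: mean of |log10(pred) - log10(actual)| / |log10(actual)|,
    expressed as a percentage."""
    total = sum(
        abs(math.log10(p) - math.log10(a)) / abs(math.log10(a))
        for a, p in zip(actual, predicted)
    )
    return 100.0 * total / len(actual)

# Invented fatigue lives (cycles), for illustration only.
actual_life = [1e4, 1e5, 1e6, 1e7]
predicted_life = [2e4, 8e4, 1.5e6, 6e6]

log_actual = [math.log10(v) for v in actual_life]
log_predicted = [math.log10(v) for v in predicted_life]
print("R² on log10 life:", round(r_squared(log_actual, log_predicted), 3))
print("mean log error %:", round(mean_log_error_percent(actual_life, predicted_life), 2))
```

Fatigue life spans several orders of magnitude, so scoring on a log scale keeps a few very long-lived designs from dominating the error.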
Also, since the PIMLᵀᴹ model is based on the underlying physics, it is more accurate in the critical low-life regions than in the high-life regions.
This is important because the PIMLᵀᴹ model can be tuned to have higher predictive capability for the conditions that matter. In this case, the design of the PIMLᵀᴹ model did not treat very-high-life conditions as important, since they correspond to low accumulated damage to the part, or to a design that would have a high life. However, the PIMLᵀᴹ model could be modified by adding physics corresponding to high-life conditions, should these become important in the application.

The PIMLᵀᴹ model was also compared to other data-driven machine learning methodologies using training data sets of different sizes. Figure 3 compares the R² values of each methodology for different DoE sample sizes.

The results show that the PIMLᵀᴹ surrogate model built on 30 training experiments has a 5-fold predictive R² value of 0.85, while the data-driven RBF, Kriging, and linear regression models have 5-fold predictive R² values of 0.57, 0.76, and 0.54, respectively. This clearly indicates the superior accuracy of the PIMLᵀᴹ model over the other data-driven models at small training data set sizes. PIMLᵀᴹ therefore presents an efficient approach to building highly accurate mathematical representations (Surrogate Models) of complex real-life engineering problems while using significantly smaller training data sets.
The power of PIMLᵀᴹ to reduce the amount of data required to develop insight is game changing. It opens up the possibility of developing real-time design guides for engineers, as well as Digital Twins. Of these, the Digital Twin application is perhaps the most impactful. The PIMLᵀᴹ surrogate model is simple enough to be embedded in a machine’s data acquisition and control system, where it could provide deep engineering insight into the operation of the machine, show how this operation could be improved, and predict future states of the machine.