CFD+ML Tutorial

Applying Machine Learning to Airfoil Design

In this tutorial, we present how to run a design optimization workflow using machine learning techniques. Specifically, we use Principal Component Analysis (PCA) to accelerate a set of CFD simulations for a 2D airfoil. We then use this dataset to build a surrogate model with Gaussian process regression. Finally, the surrogate model is used for rapid optimization of the airfoil under varying constraints and goals. We split the whole process into two stages:

The first stage, shown in the green box below, includes preprocessing and dimensionality reduction of the airfoil shape data. The original airfoil data is obtained from the UIUC Airfoil Coordinates Database, while the preprocessed airfoil database is provided here so that users can directly use it to reproduce this project.

The second stage, shown in the red box, generates high-fidelity training samples to build the surrogate model. Sample generation is accelerated by a PCA-based reduced order model (ROM). Once enough training samples are available, the surrogate model is built and used for optimization.

Figure 1: Overall flow diagram of the data generation, training and optimization process.
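The dimensionality-reduction step in the first stage can be sketched with scikit-learn's PCA. The array sizes and the 99%-variance threshold below are illustrative assumptions, not the tutorial's actual settings:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical airfoil database: each row is one airfoil, with the surface
# (x, y) coordinates resampled to a common layout and flattened.
rng = np.random.default_rng(0)
airfoils = rng.normal(size=(500, 400))  # stand-in for the preprocessed UIUC data

# Keep enough principal components to explain 99% of the shape variance.
pca = PCA(n_components=0.99)
modes = pca.fit_transform(airfoils)           # low-dimensional design space
reconstructed = pca.inverse_transform(modes)  # back to physical coordinates
```

The retained mode coefficients then serve as the design variables for sampling and optimization, while `inverse_transform` recovers a physical airfoil shape from any point in the reduced space.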

Transonic airfoil optimization video tutorial

Transonic airfoil optimization workflow tutorial

This section shows how to run a machine learning (principal component analysis, PCA) accelerated transonic airfoil optimization on the Rescale platform. The computational fluid dynamics (CFD) analysis software used is the OpenFOAM rhoSimpleFoam solver, a steady-state solver for compressible turbulent flow. For the transonic airfoil optimization, the design point is Ma = 0.7, and the turbulence model is k-omega Shear Stress Transport (SST). The PCA library is imported from scikit-learn for dimensionality reduction, and Gaussian process regression (also known as Kriging) is used as the surrogate model. The optimization tools come from the MDO Lab (Gaetan Kenway et al., 2010; Peter Lyu et al., 2014; Ping He et al., 2018; Neil Wu et al., 2020), including the automatic mesh generation package pyHyp and the optimizer pyOptSparse.

Figure 2:  Job workflow
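As a sketch of the Kriging step, here is how a Gaussian process regressor can be fit with scikit-learn. The kernel choice and the toy data are assumptions for illustration; in the tutorial the inputs would be PCA mode coefficients and the outputs CL or CD:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Toy training set: design variables X -> scalar response y.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(50, 4))
y = np.sin(X).sum(axis=1)

# RBF kernel is a common default for smooth aerodynamic responses.
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Kriging also returns a predictive standard deviation alongside the mean.
mean, std = gpr.predict(X[:5], return_std=True)
```

The predictive standard deviation is what distinguishes Kriging from plain regression: it quantifies where the surrogate is uncertain, which is useful when deciding whether to trust a prediction inside the optimization loop.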

This job first generates 1300 training samples for building surrogate models in the low-dimensional space, then trains the surrogate model, and finally applies it in the optimization loop to find the best design.

In the data generation step, a two-stage approach is used to accelerate the process. In the first stage, 100 high-fidelity CFD simulations are run to full convergence starting from freestream conditions. The results of these 100 runs are then used to build a reduced order model that predicts flow fields to initialize the remaining 1200 samples. In the second stage, those 1200 high-fidelity simulations are carried out starting from these predicted flow fields instead of freestream conditions, which allows the CFD simulations to converge in far fewer iterations. As discussed later, this approach dramatically reduces the total wall time and increases overall computational efficiency without sacrificing the accuracy of the training dataset. After generating these 1300 training samples, the surrogate model is built and verified in the second step. Finally, in the third step of the job, the surrogate model is used to rapidly (i.e., in near real time) predict the results needed in the optimization loop.
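One way to sketch the ROM-based initialization described above: project the converged flow fields from the first-stage runs onto a PCA basis, regress the PCA coefficients against the design variables, and reconstruct an approximate field for each new sample. All names, sizes, and the choice of regressor below are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(2)

# Stage 1: 100 converged high-fidelity snapshots (flattened flow fields).
designs = rng.uniform(size=(100, 6))           # design variables per sample
fields = designs @ rng.normal(size=(6, 5000))  # stand-in flow-field snapshots

# Compress the snapshots and learn the design -> PCA-coefficient mapping.
pca = PCA(n_components=10).fit(fields)
coeffs = pca.transform(fields)
rom = GaussianProcessRegressor().fit(designs, coeffs)

# Stage 2: predict an initial field for a new design, then hand it to the
# CFD solver as the starting condition instead of freestream.
new_design = rng.uniform(size=(1, 6))
init_field = pca.inverse_transform(rom.predict(new_design))
```

Because the predicted field only needs to be close enough to cut iteration counts, a cheap low-rank approximation like this can pay off even when its pointwise accuracy is modest.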

You can access and directly launch the job by clicking the Import Job Setup button or view the results by clicking the Get Job Results button below.

Input Files

The input file of this job is a compressed archive.



Select MDAO Framework by entering “MDAO Framework” in the text box field, highlighted in red, and clicking on the icon.

Figure 4:  Software selection

Edit the Command field to read as follows:

pip install -U scikit-learn

pip install scikit-optimize


Here we install the scikit-learn machine learning library. The Rescale platform automatically extracts the Allrun script, along with the other files contained in the compressed archive, before beginning the simulation.


Figure 5: Hardware Selection

There are two hardware settings to edit: Core Type and Number of Cores. In this case, we select Emerald as the Core Type and set the Number of Cores to 8. Rescale offers On-Demand and On-Demand Pro core type options. For this purpose, pick On-Demand to reduce cost if it is available; otherwise, pick On-Demand Pro. You can find more information on these options here.

The maximum job duration can be set by changing the Walltime. Here we set it to 3 hours. You can find more information on job duration limits here.

Click Save, then Next (this job omits the optional post-processing step), and then Next again. You should now see a summary like the one in this figure:

Figure 6:  Settings review

Review the settings and click the Submit button to begin the job.


The status page after the job starts running is shown below. It will take about 2 hours to finish generating the 1300 training samples.

Figure 7:  Status page after the job is running

You can check the Status page to view the contents of output files in real time in Liveview. This can be very useful for monitoring the progress of a job. In this case, the output file of particular interest is process_output.log.

You can also access the generated dataset through the Jupyter notebook feature on the web page and perform data post-processing and analysis by creating a notebook.

Note: To import all the libraries from the loaded software into a Jupyter notebook, run the following command in a terminal to create a new kernel named Python-uenv that can be used in the notebook.

python -m ipykernel install --name ipython --display-name 'Python-uenv' --user

Figure 8:  Jupyter notebook environment


After completion of the job, you will be presented with the Results page. The job generates 1300 sample folders, each containing OpenFOAM output files. The design variables and the lift and drag coefficients are saved to the file train_sample.dat for surrogate model training. Here we choose to remove the flow field files to avoid a longer cluster shutdown time after the job. If you want to keep the flow field results for all the samples, you can uncomment the second-to-last line of the script to zip and save these files instead of removing them.
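In a notebook, the training data can be inspected with a few lines of NumPy. The column layout below (design variables followed by CL and CD) is an assumption, so check it against the actual file before relying on it; a synthetic stand-in file is written first so the sketch is self-contained:

```python
import numpy as np

# Synthetic stand-in for train_sample.dat; the real file is assumed to hold
# whitespace-delimited rows of [design variables ..., CL, CD].
rng = np.random.default_rng(3)
rows = np.hstack([rng.uniform(size=(5, 4)),
                  np.full((5, 1), 0.47),     # CL column
                  np.full((5, 1), 0.0099)])  # CD column
np.savetxt("train_sample.dat", rows)

data = np.loadtxt("train_sample.dat")
X, cl, cd = data[:, :-2], data[:, -2], data[:, -1]
print(len(data), "samples,", X.shape[1], "design variables")
```

With the real file, only the `np.loadtxt` call and the column slicing are needed.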

Now for the results of the transonic airfoil optimization:
In the figure below (DesignSpace.png), the gray area shows the design space spanned by the 1300 airfoils sampled by Latin hypercube sampling (LHS) from the low-dimensional design space. The red line is the baseline airfoil, RAE2822.

Figure 9:  Baseline airfoil and design space
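Latin hypercube sampling of the low-dimensional design space can be sketched with SciPy's quasi-Monte Carlo module. The dimension and bounds below are illustrative assumptions:

```python
import numpy as np
from scipy.stats import qmc

# Sample 1300 points in a hypothetical 4-dimensional PCA-mode space.
sampler = qmc.LatinHypercube(d=4, seed=42)
unit_samples = sampler.random(n=1300)             # points in [0, 1)^4
lower, upper = np.full(4, -1.0), np.full(4, 1.0)  # assumed mode bounds
samples = qmc.scale(unit_samples, lower, upper)
```

LHS guarantees that each design variable is stratified: projected onto any single axis, the 1300 points fall one per equal-width bin, which gives better coverage than plain random sampling at the same budget.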

The verification of the surrogate model is shown in SurrogateModel.png. The plots below compare the predicted (red points) and test (black points) samples for the lift coefficient (CL) and drag coefficient (CD). The mean absolute error is 2e-4 for CL and 3e-5 for CD, which verifies the accuracy of the surrogate model. The fully trained surrogate models are also saved as gpr_cd.sav for CD prediction and gpr_cl.sav for CL prediction.

Figure 10: Verification of surrogate model
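Saved scikit-learn models in .sav files are commonly round-tripped with joblib; if the tutorial's gpr_cd.sav and gpr_cl.sav were written with pickle instead, swap accordingly. A self-contained sketch with a tiny stand-in model:

```python
import joblib
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Fit a tiny stand-in model, then save and reload it the way gpr_cd.sav /
# gpr_cl.sav would be reused in a later session (joblib is an assumption).
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel()
joblib.dump(GaussianProcessRegressor().fit(X, y), "gpr_demo.sav")

model = joblib.load("gpr_demo.sav")
pred = model.predict(np.array([[0.25]]))  # should be close to sin(pi/2) = 1
```

Once reloaded, the model predicts in a fraction of a second, which is what makes the optimization loop in the next step so cheap.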

The optimization history is saved to the file history_airfoil.hst. The convergence history of the objective (CD) and the constraints (maximum thickness and CL) is shown in Opt_history.png. The blue lines show all the samples evaluated during the optimization process, while the red dotted line represents the optimum sample satisfying the constraints at each iteration.

Figure 11:  Optimization history
                  CD (counts)    CL
Optimal by PSO    93.7           0.466
Verified by CFD   94.3           0.465
(1 drag count = 0.0001)
Table 1: Optimization results comparison

Table 1 shows the optimization results found by particle swarm optimization (PSO), provided by the MDO Lab tile. The RAE2822 has a maximum thickness of t = 12%. At M = 0.7, it has CD = 0.0099 and CL = 0.47. Under the constraints of keeping approximately CL = 0.47 and maximum thickness = 0.12, the optimal shape found has CD = 93.7 counts.

Figure 12: The baseline and the optimized airfoil shapes compared.
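The tutorial runs PSO through pyOptSparse; as a generic sketch of the same idea (minimize the drag surrogate subject to a lift constraint), here is a constrained optimization with SciPy's SLSQP. The analytic stand-in surrogates and bounds are assumptions playing the role of the trained gpr_cd / gpr_cl models:

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in surrogates over 4 assumed design variables.
def cd_surrogate(x):
    return 0.0094 + 0.001 * np.sum(x**2)  # drag: objective to minimize

def cl_surrogate(x):
    return 0.47 + 0.1 * x[0]              # lift: must stay >= 0.47

result = minimize(
    cd_surrogate,
    x0=np.full(4, 0.5),
    method="SLSQP",
    bounds=[(-1.0, 1.0)] * 4,
    constraints=[{"type": "ineq", "fun": lambda x: cl_surrogate(x) - 0.47}],
)
```

Because each surrogate evaluation is nearly free, the whole constrained search finishes in milliseconds, and the constraints or objective can be swapped and the optimization rerun without touching the CFD solver.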


In this project, we applied machine learning techniques on the Rescale platform to accelerate the CFD simulation and optimization process by a factor of about 8. First, we applied PCA to reduce the dimensionality of a large airfoil database. Then we built a PCA-based reduced order model to produce low-fidelity flow field predictions that accelerate the convergence of the high-fidelity simulations needed for training data generation. Finally, based on the fully generated training data, we built a Gaussian process surrogate model that is used in a rapid optimization loop to minimize the drag of a baseline airfoil under given constraints.

Running each high-fidelity case takes around 50 seconds (on 8 cores per sample) when started from freestream initial conditions, and around 4 seconds when started from a predicted flow field. When the training data is generated naively, i.e., with no acceleration from the intermediate reduced order model, generating the whole dataset would take around 17 hours. However, when the large majority of the high-fidelity simulations are initialized with ROM predictions, data generation takes only about 2 hours. The time it takes to train the surrogate model is negligible compared to generating the training data. It takes a couple of hours to generate the data needed to train the final surrogate model, but once trained, the model produces sufficiently accurate predictions, each within a fraction of a second. As a result, the front-loaded cost of training data generation is significantly amortized over subsequent optimization studies, which may involve multiple optimizations with different constraints and objectives.

In conclusion, we demonstrated that using a surrogate model to aid or replace high-fidelity simulations greatly accelerates design and optimization studies and makes it possible to dynamically change constraints or objectives without having to repeat the high-fidelity simulations for each study.