AI-Enabled R&D with NVIDIA Modulus on Rescale
Overview
In this tutorial, we show how to get started with Modulus by solving a lid-driven cavity flow problem, a common benchmark for validating computational methods. The problem can be solved with traditional CFD methods, but Modulus is unique in that it uses AI-assisted methods that can accelerate initial design discovery in R&D and engineering scenarios.
NVIDIA Modulus is a neural network framework that blends the power of physics, in the form of governing partial differential equations (PDEs), with data to build parameterized surrogate models with near-real-time response. NVIDIA Modulus can support your work on AI-driven physics problems, designing digital twin models for complex non-linear, multi-physics systems, and solving parameterized geometries and inverse problems. Digital twins have emerged as powerful tools for tackling problems ranging from the molecular level, such as drug discovery, up to global challenges like climate change. NVIDIA Modulus gives scientists a framework to build highly accurate digital reproductions of complex and dynamic systems that will enable the next generation of breakthroughs across a vast range of industries. For these problems, which usually require large computational resources, Rescale provides the best hardware for your work and all the assets you need to build your AI workflow in one place.
Modulus can be run either as a batch job using the Rescale command line or interactively using Rescale Workstations. See the tutorial steps below.
Batch job (Multi-node)
You can access and directly launch the sample job (Lid Driven Cavity Flow) by clicking the Import Job Setup button or view the results by clicking the Get Job Results button below.
You can access other use case examples on Rescale (using Modulus v21.06) with this link, including blood flow in an intracranial aneurysm, multi-physics simulations of conjugate heat transfer, parameterized simulations and design optimization, a 3D heat sink, and many other possibilities.

Steps to run a batch job on Rescale
Select input file
Upload your job files, which in this case are Modulus Python scripts. These will be loaded automatically when you select Import Job Setup above.
Select software
For this tutorial, we will use a Singularity (Apptainer) container to load the Modulus container and run the batch job on one node or multiple nodes.
For a typical batch job, you can directly modify the pre-populated command line. This command automatically uses all the GPUs selected in the Hardware settings.
mpirun -np $[$RESCALE_GPUS_PER_NODE*$RESCALE_NODES_PER_SLOT] -N $RESCALE_GPUS_PER_NODE \
    singularity exec --nv /usr/bin/Modulus_v21.06.sif python <your-code.py>
Notes
- We can append --xla=True at the end of the above command to enable Accelerated Linear Algebra (XLA), which accelerates training.
- For a multi-node batch job, the "network_checkpoint" folder sometimes is not detected on the first run, so first create a folder named "network_checkpoint_XXX".
For example (multi-node):
mkdir network_checkpoint_ldc_2d
mpirun -np $[$RESCALE_GPUS_PER_NODE*$RESCALE_NODES_PER_SLOT] -N $RESCALE_GPUS_PER_NODE \
    singularity exec --nv /usr/bin/Modulus_v21.06.sif python ldc_2d.py
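The -np value is computed with shell arithmetic from two environment variables that Rescale sets for the job. A quick way to sanity-check the total rank count, using hypothetical values for those variables:

```shell
# Hypothetical values; on Rescale these are set automatically for each job
RESCALE_GPUS_PER_NODE=4
RESCALE_NODES_PER_SLOT=2

# One MPI rank per GPU across the whole slot
TOTAL_RANKS=$(( RESCALE_GPUS_PER_NODE * RESCALE_NODES_PER_SLOT ))
echo "mpirun -np $TOTAL_RANKS -N $RESCALE_GPUS_PER_NODE ..."
```

Note that `$(( ))` is the POSIX form of the older `$[ ]` arithmetic expansion used in the command above; both evaluate to the same number.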
Hardware
Recommended Rescale Coretypes:
- Ankerite, Celestine (NVIDIA A100)
- Dolomite, Aquamarine V3 (NVIDIA V100)
- Aquamarine V2 (NVIDIA P100)
Check process_output.log for the output.

You can also open a terminal by clicking the Open in New Window button and run nvidia-smi to monitor GPU usage.


To access all the results files, including the trained model, go to the Shared directory by running this command:
$ cd ~/work/shared
The surrogate model can now be queried for lid-driven cavity flow fields under different boundary conditions.
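Conceptually, querying the surrogate just means evaluating the trained network at new input points, which is why the response is near-real-time. The sketch below is a toy pure-Python stand-in with made-up weights, not the Modulus API; a real query would evaluate the network trained by the job above.

```python
import math

# Toy stand-in for a trained surrogate: a tiny fixed-weight network mapping
# (x, y, lid_velocity) -> horizontal velocity u at that point. The weights
# are invented for illustration only.
W1 = [[0.5, -0.3, 0.8], [0.2, 0.7, -0.4]]
B1 = [0.1, -0.2]
W2 = [0.9, -0.6]
B2 = 0.05

def query_surrogate(x, y, lid_velocity):
    """Evaluate the toy surrogate at one (x, y) point for a given lid speed."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, (x, y, lid_velocity))) + b)
              for row, b in zip(W1, B1)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2

# Near-real-time "what if": sweep the boundary condition with no re-solve
for lid_velocity in (0.5, 1.0, 2.0):
    u = query_surrogate(0.5, 0.9, lid_velocity)
    print(f"lid_velocity={lid_velocity}: u={u:.4f}")
```

Each query is a single forward pass, so sweeping boundary conditions costs milliseconds instead of a full CFD re-solve.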
As you can see, with just a few mouse clicks, users can run Modulus jobs with multi-GPU and multi-node scaling through batch jobs on Rescale. Although Jupyter Notebook and an SSH console are available in the batch job workflow, visualization while the job is running is limited. In the next section, we show the steps to run a Modulus job interactively.
Interactive workflow (Multi-node)
You can access and directly launch the sample job (Lid Driven Cavity Flow) by clicking the Import Job Setup button or view the results by clicking the Get Job Results button below.
Select the End-to-End Desktop job type. The input file, software selection, and hardware settings are the same as in the batch job workflow.

Once the job is running, click the Connect button at the top to enter the desktop.

Similar to the batch job, here is the MPI command to run a Modulus simulation.
$ cd ~/work/shared
$ mpirun -np $[$RESCALE_GPUS_PER_NODE*$RESCALE_NODES_PER_SLOT] -N $RESCALE_GPUS_PER_NODE \
      singularity exec --nv /usr/bin/Modulus_v21.06.sif python <your-code.py>
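To avoid retyping the long command during interactive development, you can wrap it in a small launcher script. The script name below is illustrative; the environment variables and container path come from the command above:

```shell
# Write a reusable launcher; the quoted heredoc delays variable expansion
# until the script actually runs on the cluster
cat > run_modulus.sh <<'EOF'
#!/bin/bash
# Usage: ./run_modulus.sh <your-code.py>
mpirun -np $(( RESCALE_GPUS_PER_NODE * RESCALE_NODES_PER_SLOT )) \
       -N "$RESCALE_GPUS_PER_NODE" \
       singularity exec --nv /usr/bin/Modulus_v21.06.sif python "$1"
EOF
chmod +x run_modulus.sh
```

Then launch any script with, for example, ./run_modulus.sh ldc_2d.py.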
Jupyter notebook
Open a new SSH terminal, cd to the working directory, and start an interactive shell inside the Modulus container:
$ cd ~/work/shared
$ singularity shell --nv /usr/bin/Modulus_v21.06.sif
Inside the container shell, start the Jupyter notebook server:
$ jupyter notebook
Then list the running servers to get the URL and access token:
$ jupyter notebook list
Copy the URL, paste it into the DCV web browser, and accept the warnings.


Tensorboard
We can also start TensorBoard within the Singularity container:
$ tensorboard --logdir=./
Open http://localhost:6006/ in the DCV web browser to check the convergence.

Paraview
https://catalog.ngc.nvidia.com/orgs/nvidia-hpcvis/containers/paraview-index
Open a new terminal and pull the image:
$ singularity pull docker://nvcr.io/nvidia-hpcvis/paraview-index:5.7.0-egl-pvw
Start the image, binding the results data folder. For example, to start ParaView in the directory ~/work/shared:
$ singularity run --nv -B ${PWD}/network_checkpoint_ldc_2d:/data \
      paraview-index_5.7.0-egl-pvw.sif
Here, we visualize the *.vtu files under val_domain/results/. Open ParaView in the DCV web browser at http://localhost:8080/ and check your results.

Summary of running Modulus as an interactive job on Rescale
Through interactive jobs, users can develop code, monitor the neural network training process, and do post-processing within one job. With an interactive job, you can use visualization and other tools while developing and prototyping your model on a virtual workstation with the latest NVIDIA GPU hardware on Rescale. This allows you to develop and test your model on a workstation that you may not have on-premises access to. Furthermore, if needed, you can scale out your development workflow to multiple nodes using the Singularity (Apptainer) container runtime, while still having an interactive GUI environment.
Summary and additional resources
We hope that this tutorial helped you get started with running Modulus on Rescale. As you can see, you can run your Modulus workflow across multiple GPU nodes through a Rescale batch job. Or you can run Modulus and other tools interactively on a Rescale End-to-End Desktop job, for developing code, monitoring training, and post-processing results on a virtual workstation with multiple GPUs. Our platform provides various GPU architectures for you to choose from and all the assets you need to build your AI workflow.
You can use Modulus to produce surrogate models for your rapid design exploration studies. It can be applied to almost any engineering design problem that can be expressed in the form of PDEs, ranging from fluid dynamics to structural dynamics and multidisciplinary optimization. For more information, please refer to the NVIDIA Modulus docs and Rescale's Modulus presentation at NVIDIA GTC 2022.