Run Hundreds of Free AI & HPC Applications in the NVIDIA NGC Catalog Accelerated on Rescale

Overview

The NVIDIA GPU Cloud (NGC) Catalog is a curated set of free and open-source GPU-optimized software for AI, HPC, and visualization. The NGC Catalog consists of containerized applications, pre-trained models, Helm charts for Kubernetes deployments, and industry-specific AI toolkits with software development kits (SDKs). Deploying NGC on Rescale makes it easier for engineers and scientists to get started with a variety of new use cases for research and development on a single platform. Rescale automates the necessary hardware infrastructure, software, and workflow steps to make the latest computational tools more accessible and accelerated.

In this tutorial, we show how to run the NGC Catalog on Rescale. NGC containers can be run either as a batch job using the Rescale command line or interactively using Rescale Workstations.

Batch Job

Batch Job Video Tutorial

Sample Job and Results

You can access and directly launch the sample job by clicking the Import Job Setup button or view the results by clicking the Get Job Results button below. 

In the above sample job, we run a Python script that builds a TensorFlow machine learning model with the MNIST dataset (Reference link).

To submit the batch job, the first step is to choose Basic Job Type.

Inputs

Upload the scripts you want to run as input files. In the sample job, the Python scripts are inside the compressed archive test.zip.
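
As a minimal sketch of preparing such an archive, the Python snippet below zips a local folder so that it unpacks as a folder of the same name. The helper name make_input_archive is hypothetical, and we assume that the archive is extracted into the job's working directory, which is what the ${PWD}/test mount in the sample command suggests.

```python
import zipfile
from pathlib import Path


def make_input_archive(source_dir, archive_path):
    """Zip source_dir so that it extracts as a folder with the same name.

    Hypothetical helper for illustration; it is not part of Rescale or NGC.
    Returns the list of paths stored in the archive.
    """
    source = Path(source_dir).resolve()
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent so the top-level folder is kept.
                arcname = path.relative_to(source.parent).as_posix()
                archive.write(path, arcname)
                names.append(arcname)
    return names
```

For example, calling make_input_archive("test", "test.zip") in a directory that holds test/test.py would store the script as test/test.py inside test.zip.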

Software

Next choose the NVIDIA NGC Catalog tile.

The Command box comes with a pre-populated command that logs in to your NGC account, after which you pull and run Docker containers.

Users can pull containers from NGC Catalog. Here, we use the TensorFlow container as an example.

To pull the TensorFlow container, go to the NGC Catalog and click Pull Tag to copy the docker pull command.

In the Command box, paste the pull command and add the following commands to run the container, as in the sample job.

docker pull nvcr.io/nvidia/tensorflow:22.06-tf1-py3
docker run --gpus all -v ${PWD}/test:/test nvcr.io/nvidia/tensorflow:22.06-tf1-py3 \
  bash -c 'cd /test; python test.py'
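
To make the parts of the docker run line explicit, here is a small Python sketch that assembles the same command from its pieces: the GPU selection, the volume mount, the image tag, and the command executed inside the container. The function docker_run_command is a hypothetical helper for illustration only and is not something Rescale or NGC provides.

```python
import shlex


def docker_run_command(image, mount, inner_command, gpus="all"):
    """Build a one-line `docker run` command string.

    Hypothetical helper. Only the inner command is shell-quoted, so a mount
    such as ${PWD}/test:/test is still expanded by the shell at run time.
    """
    return (
        f"docker run --gpus {gpus} -v {mount} {image} "
        f"bash -c {shlex.quote(inner_command)}"
    )


command = docker_run_command(
    image="nvcr.io/nvidia/tensorflow:22.06-tf1-py3",
    mount="${PWD}/test:/test",  # host folder : folder inside the container
    inner_command="cd /test; python test.py",
)
print(command)
```

The printed string is a single-line equivalent of the docker run command used in the sample job.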

Users can log in to their NGC Catalog account through the Rescale platform. Check the Use Existing License box in the License Options on Rescale, then enter your API Key and Org Code, as shown in the figure below.

To get your API Key and Org Code, log in to your NGC Catalog account and generate your own API Key and Org Code.

We provide a pre-populated command to log in to the NGC account. Users only need to specify the container to pull and the command to run in the Command box.

Hardware

Rescale provides a variety of architectures for your job, including different CPU and GPU core types. Here we choose 4 GPUs for TensorFlow training.

Status

Click the Submit button to submit the job. Once the job is running, you can check process_output.log for the output.

Results

You can also open a terminal by clicking the Open in new window button to debug your script or run nvidia-smi to check the GPU usage and results while the job is running.
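
If you want to record GPU usage rather than read it by eye, nvidia-smi can emit CSV. The sketch below parses the output of nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits. The function name parse_gpu_stats and the sample values are made up for illustration, and the parser assumes every field is numeric.

```python
import csv
import io


def parse_gpu_stats(csv_text):
    """Parse CSV text with one GPU per line: index, utilization %, memory used (MiB).

    Hypothetical helper. Expects the output of:
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits
    """
    stats = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        index, utilization, memory_used = (field.strip() for field in row)
        stats.append(
            {
                "index": int(index),
                "util_percent": int(utilization),
                "memory_used_mib": int(memory_used),
            }
        )
    return stats


# Made-up sample text for two GPUs, not real job output.
sample = "0, 87, 10240\n1, 90, 10112\n"
gpu_stats = parse_gpu_stats(sample)
```

Each entry in gpu_stats is a dictionary, so gpu_stats[0]["util_percent"] gives the utilization of the first GPU in the sample.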

After the job is finished, all result files are saved to the Results tab, where you can open each file to check the results.

In the process_output.log file, we can see that the login to the NGC Catalog account succeeded.
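
This check can also be scripted once the log has been downloaded. The sketch below assumes the pre-populated login step uses docker login, as the Workstation commands in this tutorial do, in which case Docker prints the message Login Succeeded. The function name login_lines is hypothetical, and the exact log contents on your job may differ.

```python
def login_lines(log_text, marker="Login Succeeded"):
    """Return (line_number, line) pairs that contain Docker's success message.

    Hypothetical helper; assumes the login step in the job uses `docker login`.
    Line numbers start at 1.
    """
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if marker in line
    ]
```

With a downloaded log, login_lines(open("process_output.log").read()) returns an empty list if the message never appeared.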

Here are the final output and results for the sample job.

Other use cases

NVIDIA Modulus

The input file is an example.zip downloaded from the official Modulus website. For this use case, the sample job can be loaded through this link and the results viewed here.

Command: pull the latest version of NVIDIA Modulus from NGC, then run the lid-driven cavity (LDC) flow case in the Modulus container with docker run. For this container, an NGC API key is not required.

docker pull nvcr.io/nvidia/modulus/modulus:22.03.1
docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
  --runtime nvidia -v ${PWD}/examples:/examples nvcr.io/nvidia/modulus/modulus:22.03.1 \
  bash -c 'cd ldc; python ldc_2d.py'

Monitor the process_output.log and check the outputs.

Workstations

Sample Workstation

On the Rescale platform, users can also work interactively with Workstations. To start one, click the New workstation button. Here is a sample job for reference.

Configuration

For the Input, upload the scripts you need to run. Next, choose the NGC Catalog Interactive Workflow tile as the Software. Unlike batch jobs, a Workstation does not need a command script; instead, you interact with it through a GUI.

Check the Use Existing License box in the License Options to log in to your NGC account, and enter your NGC API Key and Org Code. The steps for generating the NGC API Key and the Hardware settings are the same as for batch jobs and are not repeated here.

Once the workstation is ready, click the Connect button at the top, or choose Connect using local client (download and install the NICE DCV client first), to log in to the Workstation.

In the Workstation, open a terminal window by clicking Activities in the top-left corner, then finding and launching Terminal.

The terminal supports interactive work. Run the following two commands in it to log in to your NGC account.

$ echo -e "\n \n $(echo $NGC_ORG_ID) \n \n \n" | ngc config set
$ docker login -u '$oauthtoken' --password-stdin nvcr.io <<< $(echo $NGC_CLI_API_KEY)
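
Both commands read their credentials from the environment variables NGC_ORG_ID and NGC_CLI_API_KEY, so an empty variable is an easy cause of a failed login. As a minimal sketch, the hypothetical Python helper below reports which of the two variables are missing or empty. It only checks that they are set and does not validate them against NGC.

```python
import os

# The two variables referenced by the login commands above.
REQUIRED_VARS = ("NGC_ORG_ID", "NGC_CLI_API_KEY")


def missing_ngc_variables(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper; pass a dict to check something other than os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Running missing_ngc_variables() inside the Workstation terminal's Python interpreter returns an empty list when both variables are present.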

Then we are ready to pull containers from NGC Catalog.

TensorFlow Container

In this example, we pull the TensorFlow container:

$ docker pull nvcr.io/nvidia/tensorflow:22.06-tf1-py3

Start the TensorFlow container with:

$ docker run --gpus all -it --rm -v ${PWD}/test:/test nvcr.io/nvidia/tensorflow:22.06-tf1-py3 bash

Run the following simple case in the container (Reference link):

$ python3
>>> import tensorflow as tf
>>> tf.enable_eager_execution()
>>> mnist = tf.keras.datasets.mnist
>>> (x_train, y_train), (x_test, y_test) = mnist.load_data()
>>> x_train, x_test = x_train / 255.0, x_test / 255.0
>>> model = tf.keras.models.Sequential([
...   tf.keras.layers.Flatten(input_shape=(28, 28)),
...   tf.keras.layers.Dense(128, activation='relu'),
...   tf.keras.layers.Dropout(0.2),
...   tf.keras.layers.Dense(10)
... ])
>>> predictions = model(x_train[:1]).numpy()
>>> predictions

If everything works, the predicted results are printed in the terminal.
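
Because the final Dense(10) layer has no activation, the printed values are raw scores (logits), one per digit class, rather than probabilities. The sketch below shows in plain Python how such scores can be converted with a softmax. The function names and the example scores are made up for illustration, and inside the container you would more likely use TensorFlow's own tf.nn.softmax.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def predicted_class(logits):
    """Return the index of the highest-probability class."""
    probabilities = softmax(logits)
    return max(range(len(probabilities)), key=probabilities.__getitem__)


# Made-up scores for three classes, not output from the model above.
example_scores = [2.0, 1.0, 0.1]
probabilities = softmax(example_scores)
```

For the ten-class MNIST model, predicted_class(predictions[0].tolist()) would give the digit the untrained model currently favors.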

Modulus Container

Pull the latest version of NVIDIA Modulus from NGC.

$ docker pull nvcr.io/nvidia/modulus/modulus:22.03.1

Start the Modulus container, mounting the examples directory:

$ docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
    --runtime nvidia -it -v ${PWD}/examples:/examples \
    nvcr.io/nvidia/modulus/modulus:22.03.1 bash

The latest Modulus container loads successfully.

Run the lid-driven cavity case in the container with:

$ cd ldc
$ python ldc_2d.py

While the above simulation is running, we can pull the ParaView GUI container for visualization.

ParaView GUI container

Open a new terminal window, then pull the ParaView Docker container from NGC and run it.

$ docker pull nvcr.io/nvidia-hpcvis/paraview:egl-py3-5.9.0
$ docker run --gpus all -p 8080:8080 -v ${PWD}:/ldc \
    nvcr.io/nvidia-hpcvis/paraview:egl-py3-5.9.0 ./bin/pvpython \
    share/paraview-5.9/web/visualizer/server/pvw-visualizer.py \
    --content share/paraview-5.9/web/visualizer/www/ \
    --port 8080 --data /ldc

Open ParaView in a web browser inside the DCV session (for example, Firefox) at https://localhost:8080/
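
If the page does not load, it can help to confirm that something is listening on the published port before troubleshooting the browser. The following Python sketch uses only the standard library. The function name port_is_open is hypothetical, and the check only tests that a TCP connection can be opened, not that ParaView itself is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

In the Workstation, port_is_open("localhost", 8080) should return True once the container started with -p 8080:8080 is serving.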

Then you can open the GUI and visualize the result files.

Conclusion

We hope that this tutorial helped you get started with running the NGC Catalog on Rescale. As you can see, you can pull and run all the containers available in the NGC Catalog with multiple GPUs through a Rescale batch job, or launch and run containers interactively on a Rescale Workstation for developing code, monitoring training, and post-processing results with multiple GPUs. Our platform provides various GPU architectures for you to choose from and all the assets you need to build your AI workflow. For additional information, you can visit the NVIDIA documentation page for NGC here.