Building a Biphasic System Using GROMACS and VMD Tutorial

Overview

The life sciences industry comprises businesses and research organizations that work to improve the lives of organisms. It includes biomedical technology and engineering, pharmaceutical development and manufacturing, and cell biology, all of which may require high performance computing (HPC). The industry remains at the forefront of addressing the effects of an aging population and of pandemics such as COVID-19; pharmaceutical companies were able to develop a coronavirus vaccine with the help of HPC.

This tutorial shows you how to build a heterogeneous biphasic system in GROMACS, a molecular dynamics package that simulates proteins, nucleic acids, and lipids, and then visualize the system using Visual Molecular Dynamics (VMD). A heterogeneous biphasic system consists of two different types of molecules in two separate phases. In this tutorial, the two types of molecules are hydrophobic (cyclohexane) and hydrophilic (water). Because cyclohexane is hydrophobic, it does not mix with water, so the two liquids form separate layers. Aqueous biphasic systems can be used in the recovery, extraction, and purification of proteins and antibodies, which can help the life sciences industry develop pharmaceuticals and vaccines.

Rescale simplifies the setup of this tutorial to three stages: inputs, software, and hardware. Once the job completes, it generates a set of output files that you can use to visualize the results (directly, through a post-processing script, or interactively with Rescale Workstations). Compared with manually extracting and purifying proteins, enzymes, and antibodies for biphasic systems, or with running the simulation in a single software package, Rescale lets you reduce the run time through your choice of hardware, makes the job scalable and efficient, and lets you create and visualize biphasic systems on one centralized platform.

The expected time to complete the job from start to finish is about fifteen minutes. Alternatively, you can click the Get Job Results button below and review the full setup and results for the job immediately. For the purpose of this tutorial, we ran a short and simple job. Later, you can clone the job and run it on different hardware with a different number of cores to see how those choices affect the time to complete the job. Performance profiles, described in the Further Steps (Optional) section, make this comparison straightforward.

This tutorial is based on “GROMACS Tutorial: Building Biphasic Systems” by Dr. Justin A. Lemkul of the Virginia Tech Department of Biochemistry.

Job Files

Constructing the Biphasic System in GROMACS with Rescale Jobs

In this part of the tutorial, you will be constructing the biphasic system within a box of a given size by filling the box with the cyclohexane and water molecules. Rescale Jobs does the hard computational part of the tutorial and simplifies the setup by prompting you to input the necessary files, software, and hardware.

Configuring Your Job

Go through the following sections to properly configure your job.

  1. To start using Rescale, go to platform.rescale.com and sign up or log in using your account information. Using Rescale requires no download or installation of additional software. Rescale is browser-based, which allows you to securely access your analyses from your workstation or from your home computer.
  2. From the main screen of the platform, click on the + Create New Job button at the top left corner of your screen. This will take you to a job Setup page.
  3. There are now five (5) Setup stages to complete.

First, you need to give the job a name. Since Rescale saves all of your jobs, we recommend that you choose a unique name that will help you identify it later. To change the name of your job, click the pencil icon next to the current job name in the top left corner of the window.

Next, download the cyclohexane coordinate file and topology. You can choose any hydrophobic molecule you would like. Topology files for molecules can be obtained from the PRODRG server, but all CH2 groups must be assigned a charge of 0. Before the biphasic system can be built, the cyclohexane box must be equilibrated so that its density stabilizes. For simplicity’s sake, the already equilibrated box is linked here.

Then upload the cyclohexane coordinate file and topology files along with the equilibrated box by clicking on the Upload from this computer button.

On completion of this step, the Inputs setup page should look like that shown below:

Click Next to move onto the Software Settings section of the Setup. Now, you need to select the software module you want to use for your analysis. You can scroll down or use the search bar to search for a software. For this demo, scroll down and click on GROMACS.

Next, the Analysis Options must be set: 

  • The drop-down selector allows users to choose their preferred version of GROMACS. 

  • The input files used in this tutorial were tested with GROMACS version 2021.3, so select that option.

Once the above step is completed, you need to add the analysis execution command for your project. This command is specific to each software package and each set of input files. For these input files and GROMACS, the execution commands are shown below.

Because the equilibrated box file was already uploaded, the first command is not needed for this job. It is shown to illustrate how the box was created before it was equilibrated, and how to change the dimensions of the box if you were to do this from scratch. This command randomly inserts the cyclohexane molecules into a box. The box can be any size: just change the three ‘5’ values (in nm) to the dimensions you would like. The -nmol value is arbitrary; you can change it to any number, and the tool will try to fill the box with that many cyclohexane molecules:

 gmx_mpi insert-molecules -ci chx.gro -nmol 1200 -box 5 5 5 -o chx_box.gro
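Although any -nmol value can be requested, a rough target can be estimated from the box volume and the liquid density. The sketch below is only an illustration: estimate_nmol is a hypothetical helper, and the density (about 0.78 g/cm^3) and molar mass (84.16 g/mol) of cyclohexane are assumed values supplied here, not taken from the tutorial files.

```shell
#!/bin/sh
# Rough estimate of how many molecules fill a box at a given liquid density.
# Assumed inputs (not from the tutorial files): density in g/cm^3 and
# molar mass in g/mol. Box edges are in nm, as in the -box option.
estimate_nmol() {
    # usage: estimate_nmol X Y Z DENSITY MOLAR_MASS
    awk -v x="$1" -v y="$2" -v z="$3" -v rho="$4" -v mw="$5" 'BEGIN {
        avogadro = 6.02214076e23           # molecules per mole
        volume_cm3 = x * y * z * 1.0e-21   # 1 nm^3 = 1e-21 cm^3
        n = rho * volume_cm3 * avogadro / mw
        printf "%d\n", int(n + 0.5)        # round to the nearest integer
    }'
}

# Cyclohexane (assumed 0.78 g/cm^3 and 84.16 g/mol) in a 5 x 5 x 5 nm box
estimate_nmol 5 5 5 0.78 84.16
```

With these assumed values the estimate comes out near 700 molecules for the 5 nm cube, so the number of molecules actually inserted may be lower than the 1200 requested.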

The next command enlarges the box to be twice as tall, with the cyclohexane molecules placed in the bottom half of the box.

So that the water molecules fill the same volume as the cyclohexane molecules, keep the x and y dimensions of the box fixed and double the z dimension. These are the three numbers after -box. The three numbers after -center are the coordinates at which the cyclohexane layer is centered: (x/2, y/2, z/4) in terms of the new box, which equals the original box dimensions divided by 2, because the z dimension was doubled to make room for the water molecules.

With these values, editconf centers the cyclohexane layer in the bottom half of the box:

gmx_mpi editconf -f chx_10ns.gro -o chx_newbox.gro -box 4.30795 4.30795 8.6159 -center 2.153975 2.153975 2.153975
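The arithmetic behind the -box and -center values can be sketched as follows. The helper name editconf_layer_args is hypothetical; its inputs are the x, y, and z edge lengths (in nm) of the equilibrated cyclohexane box, and the script only prints the command rather than running GROMACS.

```shell
#!/bin/sh
# Derive the editconf -box and -center arguments for the doubled box.
# Inputs: x, y, z edge lengths (nm) of the equilibrated cyclohexane box.
editconf_layer_args() {
    awk -v x="$1" -v y="$2" -v z="$3" 'BEGIN {
        new_z = 2 * z    # double the height to leave room for water
        # Center of the lower half: (x/2, y/2, new_z/4), which equals z/2
        printf "-box %.10g %.10g %.10g -center %.10g %.10g %.10g\n",
               x, y, new_z, x / 2, y / 2, new_z / 4
    }'
}

# Equilibrated box used in this tutorial: 4.30795 nm on each edge
echo "gmx_mpi editconf -f chx_10ns.gro -o chx_newbox.gro $(editconf_layer_args 4.30795 4.30795 4.30795)"
```

For a different equilibrated box, pass its edge lengths instead; the printed command can then be pasted into the command window.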

The last command fills the other half of the box with the water molecules:

gmx_mpi solvate -cp chx_newbox.gro -cs spc216.gro -p chx.top -o chx_solv.gro

These should be the same commands that you would use on a local terminal. 
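Because solvate is called with -p, it also updates the topology with the number of solvent molecules it added. One way to check the result is to list the [ molecules ] section of the topology, as in the sketch below. The helper list_molecules is hypothetical, and the topology fragment is a placeholder whose names and counts are not values from the tutorial job.

```shell
#!/bin/sh
# Print the entries of the [ molecules ] section of a GROMACS topology.
list_molecules() {
    # usage: list_molecules TOPOLOGY_FILE
    awk '
        /^[ \t]*\[/ {    # any section header switches the state
            in_molecules = ($0 ~ /\[[ \t]*molecules[ \t]*\]/)
            next
        }
        # Inside [ molecules ], print name and count, skipping comments
        in_molecules && NF >= 2 && $1 !~ /^;/ { print $1, $2 }
    ' "$1"
}

# Hypothetical topology fragment; names and counts are placeholders
demo_top=$(mktemp)
cat > "$demo_top" <<'EOF'
[ system ]
Cyclohexane in water

[ molecules ]
; Compound   #mols
CHX          100
SOL          250
EOF

list_molecules "$demo_top"
rm -f "$demo_top"
```

On the job's real output, the call would be list_molecules chx.top, and the SOL line should report the number of water molecules that solvate placed in the box.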

On completion, the Software Settings page should look like that below:

Now that you have chosen the analysis code you want to use, the next step is to select the desired computing hardware for the job. Click on the Hardware Settings icon.

  • On this page, you must select your desired Core Type and how many cores you want to use for this job. A “core” is a virtualized computing unit, with each core representing a single core from a physical computer. For this demo, select Iolite-1 since GROMACS utilizes GPUs. For a more thorough explanation of why Iolite-1 was selected, please refer to the performance profiles part of the Further Steps (Optional) section below.
  • The Number of Cores should be set to 4.
  • The Walltime is how long you want the job to run until it automatically stops. Keep in mind that once a job is stopped (either by the walltime running out or by clicking the red Stop button in the upper right hand corner of the screen), it cannot be restarted. You want to choose a reasonable amount of time that allows you to complete the job and for the job to produce all of the desired output files while balancing the monetary cost of running the simulation for too long. For this job, set the walltime to 2 hours.

Your Hardware Settings screen should look like this:

Move on to the Post Processing screen by clicking the Post Processing icon. For this tutorial, we will not need post processing because we will be using Workstations in an interactive environment to visualize the biphasic system using VMD, so hit the Next button at the bottom right hand corner of the screen to proceed to Review.

Finally, move to the Review stage of Setup and check that the setup is correct by reviewing the table. It should look like that below:

Now, hit the Submit button in the upper right hand corner of the screen.

Now you can monitor the progress of your job from the Status tab. To run your job, Rescale boots the cluster as you defined it, runs the simulation, and shuts down the cluster immediately upon completion. The entire process is completely automated and secure, and requires no further input from the user. The whole analysis should take less than fifteen minutes.

The Status window will look like that shown below:

The Results tab shows all the resulting files that are associated with your job. Given that some analyses result in many output files, Rescale gives you the option to download all files simultaneously or individually as needed.

As the job is completed, the results show up on this page. Click Refresh Results to show all of the completed results, and then click Download to download your files. Note that if you click Download before the Status page shows Cluster stopped in the Job Log section, then you may download a zip file containing data for only some part of the job. 

Here is a screenshot of the Results page after the job is done running; information for each individual run was added to the table as they were completed:

Visualization of the Box with Cyclohexane and Water Molecules with Rescale Workstations

Next, we will visualize the biphasic system with Visual Molecular Dynamics (VMD) on Rescale Workstations. Rescale Workstations is an interactive environment for post-processing a job: it lets you visualize the output files that a job generated, and it lets you upload coding notebooks and work with them in real time. In this case, we will be doing the former.

Configuring Your Workstation

Go through the following sections to properly configure your workstation.

  1. First, click on the Workstations icon at the upper left hand corner of the screen. This will take you to the Workstations home screen. Click + New Workstation and then Create New Workstation when prompted by the side screen. This will take you to a Workstations Setup page.
  2. Or, to access Workstations, click the Visualize icon at the top right hand corner of the Results page of the Biphasic System job. 
  3. There are now 4 Setup stages to complete.

As with a Rescale Job, you want to give the workstation a unique name because Rescale saves all of the jobs and workstations that you create. To change the name of your workstation, click on the pencil next to the current workstation name in the top left corner of the window.

Next, attach the Biphasic System job that we just ran by clicking the Attach Jobs button. On completion of this step, the Attachments setup page should look like that shown below:

Click Next to move onto the Software Settings section of the Setup. Now, you need to select the software module you want to use for your analysis. You can scroll down or use the search bar to search for a software. For this demo, scroll down and click on VMD Workstation (Windows).

Next, the Analysis Options must be set: 

  • The drop-down selector allows users to choose their preferred version of VMD. The input files used in this tutorial were tested with VMD Workstation (Windows) version 1.9.3, so select that option.

On completion, the Software Settings page should look like that below:

Now that you have chosen the analysis code you want to use, the next step is to select the desired computing hardware for the workstation. Click the Hardware Settings icon.

  • On this page, you must select your desired Core Type and how many cores you want to use for this workstation. A “core” is a virtualized computing unit, with each core representing a single core from a physical computer. For this demo, select Emerald (On Demand Priority). There is no need for a GPU to run VMD on Rescale Workstations, so Emerald (On Demand Priority) was chosen because of its low cost and high efficiency. 
  • The Number of Cores should be set to 1. 
  • The Walltime is how long you want the workstation to run until it automatically stops. Keep in mind that once a workstation is stopped (either by the walltime running out or by clicking the red Stop button in the upper right hand corner of the screen), it cannot be restarted. Choose an amount of time that lets you finish your visualization work while limiting the monetary cost of leaving the workstation running for too long. For this workstation, set the walltime to 8 hours, so that you have enough time to account for mistakes and interact with the visual model.

Your Hardware Settings screen should look like this:

Finally, move to the Review stage of Setup and check that the setup is correct by reviewing the table. It should look like that below:

Now, hit the Submit button in the upper right hand corner of the screen. 

When the default access settings popup shows up, click None. However, if you want to visualize using the local client, then you could select Use this IP.

Monitoring Your Workstation

You can monitor the progress of your workstation from the Workstations home screen.

On the Workstations home screen, you can look at My Workstations to look at all of the workstations you have created as a Rescale user:

Or you can look at Active Workstations to look at the workstations you are currently running:

In both My Workstations and Active Workstations, you can track the status of your workstation by looking under Status. As you can see, the status of the Biphasic System Workstation is currently at ‘Starting’ which means the workstation is loading the attachments, software, and hardware you chose in the Setup stage to prepare for the interactive part of Rescale Workstations.

Once submitted, the workstation will show the status ‘Starting’ while it boots. After about 15 minutes, the status will change to ‘Active’ and a blue Connect button will appear next to the name of the workstation you are looking to use:

Once that happens, press it and it will take you to another browser tab – the in-browser workstation.

In-Browser Workstation

This is where you will be able to interact with your visual model on Rescale Workstations.

First, click on the VMD Icon on the left hand side of the screen.

Three windows will appear: one is the display, which will show the model; one is the terminal; and one is the main window, which you will use to create and manipulate the model. In the VMD Main window, hover over File and click on New Molecule.

A new window, the Molecule File Browser, will pop up on the same screen. Click the Browse button next to Filename. It should look like the picture shown below:

Then, once the files window pops up, click Desktop on the left hand side. Then, click attached_jobs. The attached_jobs folder should contain the job you attached in the first step of the Workstation setup. It was downloaded into the interactive Workstations system when you attached it in Setup: Attachments.

Then click on the Biphasic System folder that is located within the attached_jobs folder. In this case, the folder is called “dQBdX-Biphasic_System,” but yours may be called something different.

Next, now that you are within the dQBdX-Biphasic_System folder, double click on the “run1” folder.

Finally, select the chx_solv.gro file that was produced by the Biphasic System job. This is the output file, and it contains everything needed to create a visual of the biphasic system.

Once selected, the chx_solv.gro file will appear in the Molecule File Browser. Click Load.

The biphasic system model should pop up on the VMD Display. You can interactively play around with the model and twist it around to see it at different angles.

The red molecules are the water molecules while the blue molecules are the cyclohexane molecules. As you can tell, the two different types of molecules which are in two different phases (hence, called a biphasic system) are separated molecularly into two separate parts of the box because cyclohexane is hydrophobic and water is hydrophilic. 

Note: Some of the water molecules may end up in the half of the box that holds the cyclohexane molecules, because solvate places water in any gap it finds between them. To prevent this, increase the van der Waals radius assigned to the cyclohexane carbon atoms from 0.17 to at least 0.35 (gmx solvate reads these radii from vdwradii.dat) and run the job again, so that the gaps in the cyclohexane layer are too small to hold water molecules.
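To check how many water molecules ended up in the cyclohexane half, the coordinate file can be scanned directly. The sketch below assumes the standard fixed-column .gro layout and the SOL/OW names that solvate writes for water; count_water_in_lower_half is a hypothetical helper, and the toy coordinates are placeholders, not output from the tutorial job.

```shell
#!/bin/sh
# Count water oxygens (residue SOL, atom OW) that sit in the lower half
# of the box in a .gro file, where the cyclohexane layer was placed.
count_water_in_lower_half() {
    # usage: count_water_in_lower_half GRO_FILE
    awk '
        NR == 2 { natoms = $1 + 0 }                     # second line: atom count
        NR > 2 && NR <= natoms + 2 { atom_line[NR - 2] = $0 }
        NR > 2 && NR == natoms + 3 { half_z = $3 / 2 }  # last line: box x y z
        END {
            count = 0
            for (i = 1; i <= natoms; i++) {
                # Fixed columns: residue name 6-10, atom name 11-15, z 37-44
                resname = substr(atom_line[i], 6, 5)
                atom = substr(atom_line[i], 11, 5)
                z = substr(atom_line[i], 37, 8) + 0
                gsub(/ /, "", resname)
                gsub(/ /, "", atom)
                if (resname == "SOL" && atom == "OW" && z < half_z) count++
            }
            print count
        }
    ' "$1"
}

# Toy .gro file with placeholder coordinates (oxygens only, box 2 x 2 x 4 nm)
demo_gro=$(mktemp)
{
    echo "Toy biphasic system"
    echo "4"
    printf '%5d%-5s%5s%5d%8.3f%8.3f%8.3f\n' 1 CHX C1 1 1.0 1.0 0.5
    printf '%5d%-5s%5s%5d%8.3f%8.3f%8.3f\n' 2 SOL OW 2 1.0 1.0 0.7
    printf '%5d%-5s%5s%5d%8.3f%8.3f%8.3f\n' 3 SOL OW 3 1.0 1.0 3.0
    printf '%5d%-5s%5s%5d%8.3f%8.3f%8.3f\n' 4 SOL OW 4 1.0 1.0 3.5
    echo "   2.00000   2.00000   4.00000"
} > "$demo_gro"

count_water_in_lower_half "$demo_gro"
rm -f "$demo_gro"
```

On the job's real output, the call would be count_water_in_lower_half chx_solv.gro. A count well above zero suggests that water was placed inside the cyclohexane layer.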

Further Steps (Optional)

Here are some optional steps that you can do to take the tutorial one step further.

To add a protein to the aqueous half of the box, you would place the protein in a unit cell of the desired dimensions and manually set its center to (x/2, y/2, 3z/4), where x, y, and z are the dimensions of the enlarged box. Because we doubled the z dimension to make room for the water molecules, the aqueous layer occupies the upper half of the box, and 3z/4 is the middle of that half.

For example, to assign a generic peptide to the center of the aqueous half of the box, the command line would be:

gmx_mpi editconf -f peptide.gro -o peptide_newbox.gro -box 4.30795 4.30795 8.6159 -center 2.153975 2.153975 6.461925

All of the other input files, hardware, software, and command lines for the job would stay the same. You would just add this new command to the Analysis Options in Software Settings.
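The difference between the two layer centers can be sketched as below. The helper layer_center is hypothetical; it takes the dimensions of the enlarged box in nm and returns the point to pass to -center for either half of the box.

```shell
#!/bin/sh
# Center of the lower or upper half of a box that is split along z.
layer_center() {
    # usage: layer_center X Y Z lower|upper   (edge lengths in nm)
    case "$4" in
        lower) fraction=0.25 ;;   # z/4: middle of the cyclohexane half
        upper) fraction=0.75 ;;   # 3z/4: middle of the aqueous half
        *) echo "layer must be lower or upper" >&2; return 1 ;;
    esac
    awk -v x="$1" -v y="$2" -v z="$3" -v f="$fraction" 'BEGIN {
        printf "%.10g %.10g %.10g\n", x / 2, y / 2, z * f
    }'
}

# Enlarged box from this tutorial; the peptide goes in the upper half
layer_center 4.30795 4.30795 8.6159 upper
```

Passing lower instead of upper gives the point used earlier for the cyclohexane layer.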

To see a cost vs. time-to-solve graph for different core types, first run the job to completion and then click on the Performance icon at the top of the main screen of the Rescale platform. The performance profile helps you decide which hardware is best for your job by showing how long the job takes to complete and what it costs on each hardware type you select.

As with a Rescale Job and a Rescale Workstation, you want to give the performance profile a unique name because Rescale saves all of the jobs, workstations, and performance profiles that you create. To change the name, click on the pencil next to the current name in the top left corner of the window.

Next, click on Select Configuration near the middle right. When the side icon pops up, click Jobs and then select the Biphasic System job by clicking Create from Job.

To select and compare hardware, click the +Add button under the Hardware Benchmark Runs table. You can even change the number of cores that the hardware runs on.

The selected hardware will pop up in the Hardware Benchmark Runs table. 

Once done selecting all of the desired hardware to compare, click the blue Run button at the top right of the Hardware Benchmark Runs.

The Biphasic System job will be run concurrently on each selected hardware type. Once the performance profile has finished running, the Cost vs. Time to Solve chart will appear, and you can use it to decide which hardware is best for your job given your priorities.

As you can see, the Biphasic System job takes under a minute to run on each hardware. This showcases Rescale’s powerful and efficient computational platform. If you value time to solve over cost, then the Dolomite hardware at 4 cores would be best. If you value cost over time to solve, then the Iolite-1 hardware at 4 cores would be the best. Iolite-1 was chosen for this tutorial based on its low costs and moderate time-to-complete efficiency. Performance profiles can help you make a decision on which hardware and number of cores to use when completing a job according to your values and motivations. 

Conclusion

This tutorial shows you how to fill a box of given dimensions with cyclohexane molecules on one half and water molecules on the other half. In the future, you can change certain numbers on each command line to change the dimensions of the box and how much the box is filled with each molecule. In addition, you can control which molecules to put inside the box by changing the input coordinate file and topology file.

Biphasic systems are widely used in the life sciences industry for the development of vaccines and drugs and for the analysis of biocompatible environments. Completing the tutorial on Rescale lets you leverage high computing power, try different types of hardware and software to test how the same simple job scales, and keep the creation and visualization of molecules in one workspace that is easy for team members to access.