Lightweight Azure InfiniBand Cluster Setup

One of the key criticisms leveled against HPC in the cloud is the relatively slow interconnect speed between nodes when compared to on-premise clusters. While a number of niche providers offer InfiniBand connectivity to address this gap, Microsoft is the first major provider to offer this type of high-bandwidth, low-latency interconnect with its new Big Compute solution. This is exciting news because there are relatively few companies that have the resources necessary to manage data centers on a large scale while also dealing with the security compliance issues and certifications needed for enterprises to move their workloads over to the cloud. Fair or not, having the backing of a Microsoft, Amazon, or Google can make a big difference in obtaining corporate IT buy-in.

According to specs, the new A8 and A9 instance sizes provide InfiniBand connectivity with RDMA. This last bit is especially important because, as this blog post correctly points out, having InfiniBand alone is not enough. The transport being used makes a critical difference and TCP performs very poorly. The Big Compute instances support virtualized RDMA that provide near bare metal performance according to Microsoft. This announcement should be a boon for users looking to run tightly coupled simulations in the cloud. This type of “chatty” MPI application is highly sensitive to the latency of the underlying network. However, after taking the platform out for a spin, I think there are a few barriers to entry in its current incarnation.
First, the RDMA capabilities are exposed through an interface called Network Direct which is currently only supported by MS-MPI–Microsoft’s MPI implementation. Applications will need to be recompiled against these libraries. This is not too big of a hurdle since MPI is a well-defined standard and MS-MPI is based on MPICH, which is widely supported. A bigger issue, however, is that applications will need to be written to run on Windows. Thankfully, many of the popular engineering applications being used today already have Windows versions that support MS-MPI. Anecdotally at least, it seems like applications can be recompiled with little effort.
Second, configuring an MPI cluster is very different in the Windows world compared to Linux. While Windows is certainly capable of putting up impressive MPI benchmark numbers, the vast majority of HPC practitioners are currently running on Linux. Configuring an MPI cluster in the cloud for Linux generally boils down to: “Launch your instances, use the package manager to install the MPI flavor of your choosing, set up passwordless SSH amongst all the nodes in your cluster, and create a machinefile”. On Windows, the recommended approach is to install and configure HPC Pack on a Windows Server box (either on premise or in the cloud). This can be difficult for someone familiar with Linux and not versed in the nuances of Windows server administration. While the HPC Pack solution is robust and full-featured, it does feel a bit heavyweight if you just want to run a few benchmarks or a simple one-off simulation. What would be nice is a tool like Starcluster to get people up and running as quickly as possible without having to configure Active Directory, install SQL Server, or figure out Powershell and REST APIs.
It turns out that you can install MS-MPI on Azure without HPC Pack but there doesn’t seem to be a lot of guidance out there on how to do this. Further, there are a number of SSH servers and UNIX utilities that have been ported to Windows. We wanted an easier way to launch an MPI cluster in Windows without having to install, configure, and manage a separate HPC Pack instance. What we ended up experimenting with was using the PaaS offering to deploy a Cloud Service containing a set of startup tasks to perform the following operations on each node:

Install MS-MPI (a standalone installer is available here)
Launch SMPD
Install and configure an OpenSSH server and a standard set of UNIX command line utilities

Each Cloud Service has a single virtual IP (VIP) assigned to it. To work around this, we used Instance Internal Endpoints to allow users to SSH into individual nodes using different ports. Internal Endpoints are opened up so that each Role Instance can connect to the SMPD daemon running on the others. The end result of all this is an easy to deploy .cspkg file and accompanying configuration xml. Users can SSH into Role Instances and use the UNIX commands that they know and are familiar with.

We wanted to run a couple of latency and bandwidth benchmarks against 2 A9 instances. First, we recompiled the osu_latency and osu_bibw benchmarks from the OSU Microbenchmark library against MS-MPI. Then, we deployed the Cloud Service above, copied the benchmark executable to each machine with SCP (note that SCP is not a viable solution if you have large files that need to be moved around but it works fine for smaller files like these benchmark executables). Finally, we SSHed into one of the nodes and launched the executables:

The results of the benchmarks are below. As you can see, the 0-byte latency numbers are ~3us and we are seeing ~7.5GB/s being transferred in the bidirectional bandwidth test for larger message sizes, which is pretty close to full saturation.

# OSU MPI Latency Test # Size Latency (us) 0 3.28 1 3.69 2 3.70 4 3.67 8 3.69 16 4.11 32 4.53 64 5.35 128 6.60 256 2.85 512 3.06 1024 3.44 2048 4.19 4096 5.96 8192 7.60 16384 10.64 32768 15.31 65536 23.32 131072 53.65 262144 85.02 524288 156.81 1048576 299.23 2097152 567.89 4194304 1098.55 # OSU MPI Bi-Directional Bandwidth Test # Size Bi-Bandwidth (MB/s) 1 0.43 2 0.87 4 1.69 8 3.35 16 6.82 32 13.69 64 18.64 128 29.12 256 486.75 512 1174.69 1024 2170.21 2048 3844.66 4096 5982.22 8192 2873.87 16384 7078.87 32768 6669.85 65536 4926.26 131072 4878.30 262144 5853.30 524288 6674.26 1048576 7066.08 2097152 7344.74 4194304 7479.30
These are very impressive performance numbers. I suspect that the real tipping point for Big Compute usage however, will be once Microsoft adds support for Linux VMs with their IaaS solution. It is not clear from the documentation available online what the timeline for this is right now (IaaS support for Windows Server was just recently added). It will be interesting to see how the new low-latency interconnect wars play out in 2014. As always, Rescale intends to remain agnostic on providers and offer our customers the best available hardware.

Ryan Kaneshiro

View all posts

Cookie	Duration	Description
AWSALBCORS	7 days	This cookie is managed by Amazon Web Services and is used for load balancing.
cookielawinfo-checkbox-advertisement	1 year	Set by the GDPR Cookie Consent plugin, this cookie is used to record the user consent for the cookies in the "Advertisement" category .
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Cookie	Duration	Description
__cf_bm	30 minutes	This cookie, set by Cloudflare, is used to support Cloudflare Bot Management.
bcookie	2 years	LinkedIn sets this cookie from LinkedIn share buttons and ad tags to recognize browser ID.
lang	session	LinkedIn sets this cookie to remember a user's language setting.
lidc	1 day	LinkedIn sets the lidc cookie to facilitate data center selection.
player	1 year	Vimeo uses this cookie to save the user's preferences when playing embedded videos from Vimeo.

Cookie	Duration	Description
AWSALB	7 days	AWSALB is an application load balancer cookie set by Amazon Web Services to map the session to the target.
sync_active	never	This cookie is set by Vimeo and contains data on the visitor's video-content preferences, so that the website remembers parameters such as preferred volume or video quality.

Cookie	Duration	Description
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_gat_UA-32985745-1	1 minute	A variation of the _gat cookie set by Google Analytics and Google Tag Manager to allow website owners to track visitor behaviour and measure site performance. The pattern element in the name contains the unique identity number of the account or website it relates to.
_gcl_au	3 months	Provided by Google Tag Manager to experiment advertisement efficiency of websites using their services.
_gid	1 day	Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.
CONSENT	2 years	YouTube sets this cookie via embedded youtube-videos and registers anonymous statistical data.
utm_campaign	past	Google Ad Services sets this cookie to store session campaign value if present.
utm_content	past	This cookie is used for storing the session content value if present.
utm_source	past	This cookie is used to record from where the visitor came to the website orginally. This information is used by the website operator to know the efficiency of their marketing.
utm_term	past	This cookie is used to record from where the visitor came to the website orginally. This information is used by the website operator to know the efficiency of their marketing.
vuid	2 years	Vimeo installs this cookie to collect tracking information by setting a unique ID to embed videos to the website.

Cookie	Duration	Description
_fbp	3 months	This cookie is set by Facebook to display advertisements when either on Facebook or on a digital platform powered by Facebook advertising, after visiting the website.
_mkto_trk	2 years	This cookie, provided by Marketo, has information (such as a unique user ID) that is used to track the user's site usage. The cookies set by Marketo are readable only by Marketo.
fr	3 months	Facebook sets this cookie to show relevant advertisements to users by tracking user behaviour across the web, on sites that have Facebook pixel or Facebook social plugin.
IDE	1 year 24 days	Google DoubleClick IDE cookies are used to store information about how the user uses the website to present them with relevant ads and according to the user profile.
personalization_id	2 years	Twitter sets this cookie to integrate and share features for social media and also store information about how the user uses the website, for tracking and targeting.
test_cookie	15 minutes	The test_cookie is set by doubleclick.net and is used to determine if the user's browser supports cookies.
utm_medium	past	This cookie is used to record from where the visitor came to the website orginally. This information is used by the website operator to know the efficiency of their marketing.
VISITOR_INFO1_LIVE	5 months 27 days	A cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface.
YSC	session	YSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages.
yt-remote-connected-devices	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt-remote-device-id	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt.innertube::nextId	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.
yt.innertube::requests	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.

Cookie	Duration	Description
_chtbl	session	No description available.
_dtses	30 minutes	No description available.
_dtuid	10 years	No description available.
BIGipServersj30web-nginx-app_https	session	No description
email	past	No description available.
gclid	past	No description
handl_ip	1 month	No description available.
handl_landing_page	1 month	No description available.
handl_original_ref	past	No description available.
handl_ref	past	No description available.
handl_url	1 month	No description available.
li_gc	2 years	No description
muc_ads	2 years	No description
username	past	No description available.

Rescale Platform

Overview

HPC & AI Software

HPC & AI Architectures

Security & Compliance

Ecosystem Integrations

Pricing

HPC as a Service

Intelligent Batch

Elastic Cloud Workstation

Storage Fabric

Enterprise Management

Multi-Team Management

Performance Management

Software Publisher

Digital Engineering

AI Physics

Knowledge Management

Computational Pipelines

Author

Similar Posts

Newsletter Sign Up