Solving Queuing Quarrels: Slurm-Rescale Connector Makes Hybrid-Cloud Seamless for On-prem HPC Users

Rescale just announced support for the Slurm workload manager via a seamless cloud connector, opening up a new path to hybrid cloud for organizations that use Slurm to manage their on-premises high performance computing (HPC) clusters. Slurm is a popular, open-source workload scheduler used by two-thirds of the most powerful supercomputers in the world, and it is especially common in government, national lab, and higher education systems.

HPC Schedulers, Queuing, and the Limits of On-Prem Hardware

In traditional, on-premises computing, schedulers are a critical part of the technology stack, allowing many users to share their organization’s servers for computational tasks. Schedulers give users and administrators commands to monitor and control when jobs execute and how much of the available resources they use. Using a scheduler ensures HPC jobs are completed in an orderly and efficient manner, maximizing utilization of a given set of hardware, or ‘cluster’. This “scheduling” becomes “queuing” when multiple jobs stack up and cause delays, an inconvenient side effect of fixed on-premises capacity that forces scientists and engineers to wait until resources free up for their job. Frequent delays caused by insufficient resources are detrimental to R&D and commercialization timelines, which is why there is such a strong emphasis on expanding access to additional hardware, e.g., bursting to cloud. It’s all too common for IT/HPC managers and end users to disagree on how best to balance the trade-off between capacity availability and utilization for on-prem deployments.
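To make the step from scheduling to queuing concrete, here is a minimal sketch of a strict first-in, first-out scheduler on a fixed-size cluster. It is an illustration only, not how Slurm itself schedules: the function name `simulate_fifo`, the job tuples, and the node counts are all hypothetical, and real schedulers add priorities, backfill, and fair-share policies on top.

```python
import heapq


def simulate_fifo(jobs, total_nodes):
    """Return the queue wait of each job on a fixed-size cluster.

    jobs: list of (submit_time, nodes_requested, runtime), sorted by
          submit_time. All values are hypothetical illustration data.
    Jobs start strictly in submission order (no backfill).
    """
    running = []       # min-heap of (end_time, nodes) for running jobs
    free = total_nodes
    clock = 0          # time at which the most recent job started
    waits = []
    for submit, nodes, runtime in jobs:
        if nodes > total_nodes:
            raise ValueError("job requests more nodes than the cluster has")
        clock = max(clock, submit)
        # Release nodes from jobs that have already finished.
        while running and running[0][0] <= clock:
            _, released = heapq.heappop(running)
            free += released
        # Not enough capacity: advance time until running jobs finish.
        while free < nodes:
            end, released = heapq.heappop(running)
            clock = max(clock, end)
            free += released
        waits.append(clock - submit)
        free -= nodes
        heapq.heappush(running, (clock + runtime, nodes))
    return waits


# Three jobs arriving one time unit apart: (submit, nodes, runtime).
demo_jobs = [(0, 4, 10), (1, 2, 5), (2, 2, 5)]
print(simulate_fifo(demo_jobs, total_nodes=4))  # fixed 4-node cluster
print(simulate_fifo(demo_jobs, total_nodes=8))  # same jobs, more capacity
```

On the 4-node cluster the second and third jobs wait for the first to finish, while on 8 nodes every job starts immediately. That gap is the capacity-versus-utilization trade-off described above.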

Bridging Traditional and Cloud-based Operations

Cloud computing largely solves the problem of queuing with the availability of virtually unlimited capacity. While market data shows that cloud HPC growth outpaces on-prem by 2-3x, many aging on-prem systems are still in operation and managed by schedulers today. Most organizations (78%) say they have already begun using cloud for HPC, but many of them will run these infrastructures separately. For many organizations in the middle of their digital transformation, being full-cloud is still a destination further down the pike, and they need solutions that allow them to fully utilize all available resources. One thing is for sure: computing requirements rarely decrease, so bursting to the cloud is often a first step on the way to a cloud-first HPC operating model. With Slurm expanding support for cloud and specialized CPU and GPU architectures, it’s an ideal scheduler to partner with to bring Rescale’s advanced cloud HPC automation to a bigger audience.

True Hybrid, Multi-Cloud HPC with Rescale and Slurm

Building an HPC practice in the cloud from scratch is nuanced, especially one that takes full advantage of cloud’s benefits for HPC. Having the control and flexibility to scale up or down across multiple clouds and multiple architectures has a big impact on workload performance and cost-efficiency. IT administrators are accustomed to having full control and understanding of their HPC operations on-premises, and incorporating cloud can introduce new complexity. Where many point solutions or homegrown tools struggle, Rescale seamlessly connects many common digital tools across the computing stack. Rescale’s Slurm connector, co-developed with RedLine Performance Solutions, allows HPC users and admins to submit jobs using familiar Slurm commands to any cloud of their choice, including AWS, Azure, Google, and other hyperscale and specialty cloud service providers, all without any prior experience with Rescale required!

Slurm-Rescale Connector Workflow – While Slurm can control jobs on the local HPC, jobs submitted on Rescale are controlled by interaction with the Rescale API using additional flags on the Slurm commands.

The Slurm-Rescale Connector demonstrates the ability to submit jobs from Slurm to the Rescale platform using the Rescale API. The connector code is a modified version of the Slurm source code that allows users to access the Rescale platform using familiar Slurm commands. To accomplish this, the Slurm repository was forked into a separate branch and customized with Rescale-specific updates. It will be maintained by Rescale and updated after every new Slurm release. For hybrid orchestration of workloads, typical Slurm scripts are modified and extended to fork the workflow to either on-prem or Rescale resources based on policies set by the user’s organization.
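The policy-based fork described above can be sketched as a small routing function. This is a hedged illustration and not the connector's actual logic: it does not call Slurm or the Rescale API, and `BurstPolicy`, `choose_target`, the wait-time threshold, and the node counts are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurstPolicy:
    """Hypothetical organization-level policy for cloud bursting."""
    max_queue_wait_minutes: int   # burst if the local wait exceeds this
    allow_cloud: bool = True      # a site may disable bursting entirely


def choose_target(estimated_wait_minutes, requested_nodes,
                  onprem_total_nodes, policy):
    """Decide where a job should run: "on-prem" or "rescale".

    The inputs are assumed to come from the site's own tooling; how the
    real connector obtains them is not described in this article.
    """
    if not policy.allow_cloud:
        # Policy forbids cloud, so the job stays local regardless of load.
        return "on-prem"
    if requested_nodes > onprem_total_nodes:
        # The job can never fit on the local cluster.
        return "rescale"
    if estimated_wait_minutes > policy.max_queue_wait_minutes:
        # The local queue is too long, so burst to the cloud.
        return "rescale"
    return "on-prem"


policy = BurstPolicy(max_queue_wait_minutes=30)
print(choose_target(5, 4, 64, policy))     # short local queue
print(choose_target(45, 4, 64, policy))    # long local queue
print(choose_target(0, 128, 64, policy))   # larger than the local cluster
```

Keeping the decision in one pure function makes the policy easy to review and test separately from whatever mechanism actually submits the job.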

Empower Unconstrained and Accelerated Digital R&D

This additional functionality for Rescale users opens up new possibilities for many organizations that were operating disparate systems or were cloud holdouts due to concerns about changing their user experience. Using Slurm’s familiar commands for job submission and monitoring, engineers and scientists can leverage their existing resources as usual, while administrators can solve computing resource constraints by automatically shifting to the cloud as needed. Any IT or HPC manager who deploys a successful hybrid-cloud solution is an instant hero, scoring big points for 1) reducing user wait times to zero and 2) maximizing the useful life of existing computing investments. Having a consistent workload management experience can ensure continuous operations across multiple compute environments, while the added performance optimization of Rescale will ensure organizations select the latest and best architecture for each workload.

Organizations that require high levels of security and compliance can now enjoy the benefits of cloud with the assurances of Rescale’s leading standards like ITAR, FedRAMP, and ISO-27001. Rescale is the first and only platform for full-stack HPC with FedRAMP Moderate Authorization and continues to invest in additional measures to ensure that the public cloud is accessible to both public and private sector organizations.

Getting Started with Slurm on Rescale

We are excited to showcase this new capability at Supercomputing 22 and invite anyone interested to stop by our booth (#2741) for a demonstration. You can read the official announcement of the news here.

Garrett VanLee

Garrett VanLee leads Product Marketing at Rescale, where he works closely with customers on the cutting edge of innovation across industries. He enjoys sharing customer success stories, research breakthroughs, and best practices from Rescale engineers, scientists, and IT professionals to help other organizations. Garrett is currently focused on the convergence of supercomputing, HPC, and AI simulation models and how these trends are driving discoveries in science and industry.
