
Rescale Maturity Index: Intelligence to Maximize the Reliability of Cloud and Multi-Cloud Operations

The maturity of cloud hardware influences cost, performance, and reliability. The Rescale Maturity Index is your competitive edge for compute-driven innovation.

Today, technology analysts and vendors promote the importance of a multi-cloud strategy by raising concerns about vendor lock-in. Commonly cited reasons include cost efficiency, business continuity, and overall operational resiliency, but many companies may be missing out on one of multi-cloud’s biggest benefits: the flexibility to choose among many architectures.

Modern scientific research, engineering, and product R&D depend on high performance computing, specifically the high-powered, specialized hardware that powers their data- and compute-intensive applications. In this R&D context, having the flexibility to switch to the latest chip architectures can be the key to getting to market first with a superior product. While resilience and continuity are indeed great benefits of cloud and multi-cloud strategies, more organizations are realizing the significant value of multi-architecture flexibility.

Cloud service providers (CSPs) are now introducing new computing chip architectures at an astonishing pace. Typically in partnership with hardware vendors such as Intel, AMD, Arm, and NVIDIA, CSPs are announcing new and better chip architectures every quarter. Additionally, cloud providers have begun making their own computer chips, providing an even greater array of chips from which to choose.

The Promise and Challenge of Being Multi-cloud and Multi-Architecture 

For companies that proactively and continuously adopt the latest technologies, application performance gains and cost savings can be a major competitive advantage.  This approach is increasingly relevant for both multi-cloud and single-cloud operations, as all cloud providers now offer a range of architectural options. 

However, most organizations struggle to achieve continuous architectural adoption due to the complexity of doing it effectively. Behind the scenes, there are many steps in the onboarding, testing, and deploying of new architectures, but the primary obstacle is simply not knowing how to make selections across an increasingly diverse set of hardware options available to them. 

In most cases, organizations conduct benchmarking when onboarding new software to determine the best hardware configuration. Benchmarking, while valuable for measuring the performance of a small set of architectures at that point in time, does not account for the continuous release of new architectures or for their scale and regional availability.

Maturity Intelligence: How Rescale Helps Customers Get More from Their Cloud Investments

Amidst the explosion of chip choices, it’s more important than ever to pick the best hardware and software configuration for a given workload. Maturity, the measure of reliability and scale of specific architectures or hardware configurations in the cloud, is an essential criterion in multi-cloud operations, specifically for harnessing the latest architectures and getting the best cost performance possible. Failing to factor in maturity can result in overspending on cloud infrastructure and software licensing. Some organizations are beginning to see the benefits of maturity tracking, while others aren’t sure where to begin.

Rescale gives organizations access to the world’s largest selection of architectures from our network of hyperscale and specialty CSPs. With so many choices, picking the right architecture for your application needs is critical, which is where a thorough understanding of maturity helps our customers capture more value from their cloud operations. Under the hood, Rescale tracks maturity with our Maturity Index, a composite score of each architecture’s production-readiness, service-level assurance, and capacity across each CSP and region. Rescale continuously maintains this index as new chips reach the market and older ones fade from use.
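To make the idea of a composite score concrete, here is a minimal Python sketch. It is an illustration only, not Rescale’s actual formula: the `MaturitySignals` class, the `maturity_score` function, the equal default weights, and the sample values are all assumptions. It only shows how three normalized signals for one architecture, CSP, and region could be folded into a single comparable number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaturitySignals:
    # Hypothetical inputs for one (architecture, CSP, region) combination,
    # each already normalized to the range 0.0..1.0.
    production_readiness: float
    service_level_assurance: float
    capacity: float


def maturity_score(signals, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three signals; the weights are an assumption."""
    values = (
        signals.production_readiness,
        signals.service_level_assurance,
        signals.capacity,
    )
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("signals must be normalized to 0.0..1.0")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * v for w, v in zip(weights, values)) / total_weight


# Illustrative values only: the same chip in its launch region and in a
# region where it arrived later and capacity is still limited.
launch_region = MaturitySignals(0.9, 0.8, 0.7)
later_region = MaturitySignals(0.9, 0.5, 0.2)
print(maturity_score(launch_region) > maturity_score(later_region))
```

Scoring per region rather than per chip reflects the point above: the same architecture can be production-ready in one geography and scarce in another.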

How Maturity Fits into a Broader Framework of Computing Intelligence

While Rescale’s Maturity Index provides organizations a consistent framework to compare cloud provider and architectural options, it’s one piece of a broader intelligence framework. Rescale’s Intelligent Computing Framework is made up of four combined indices: maturity, performance, cost, and sustainability. Depending on a given customer’s objectives, they can use this intelligence to make decisions and set policies to achieve them. 

Rescale Intelligent Computing Framework consists of 4 indices: maturity, performance, cost, and sustainability

This intelligence framework governs how each architectural configuration is scored and recommended on Rescale. These configurations are known as Rescale Coretypes, which we pre-configure, benchmark, and tune for optimal application performance before scoring. After scoring, Rescale’s proprietary Compute Recommendation Engine (CRE) factors in this intelligence to recommend the best-fit architectural configurations when users set up new software or as new architectures become available. 
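As a rough illustration of how four indices and a customer policy could combine into a recommendation, consider the sketch below. It is not the Compute Recommendation Engine: the `rank_coretypes` function, the weighted-sum approach, the coretype names, and every score are hypothetical, and all index values are assumed to be normalized so that higher is better.

```python
def rank_coretypes(candidates, policy):
    """Return coretype names sorted best-first by a weighted sum of indices.

    candidates: {name: {index_name: value}} with values normalized to 0.0..1.0
    policy:     {index_name: weight}; indices absent from the policy get weight 0
    """
    def score(indices):
        return sum(policy.get(name, 0.0) * value for name, value in indices.items())

    return sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)


# Made-up coretypes and scores, for illustration only.
candidates = {
    "coretype-a": {"maturity": 0.9, "performance": 0.6, "cost": 0.7, "sustainability": 0.5},
    "coretype-b": {"maturity": 0.4, "performance": 0.9, "cost": 0.5, "sustainability": 0.6},
}

# Two hypothetical customer policies that weight the same indices differently.
reliability_first = {"maturity": 0.7, "performance": 0.1, "cost": 0.1, "sustainability": 0.1}
speed_first = {"maturity": 0.1, "performance": 0.7, "cost": 0.1, "sustainability": 0.1}

print(rank_coretypes(candidates, reliability_first))
print(rank_coretypes(candidates, speed_first))
```

The two policies rank the same candidates in opposite order, which is the practical meaning of "depending on a given customer’s objectives."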

Why Finding The Right Data on Hardware Is So Difficult

When chip manufacturers launch a new architecture such as AMD’s EPYC Genoa, Intel Ice Lake, or NVIDIA H100, they typically start with a single cloud provider in a specific region. A typical scenario might be that cloud provider A worked closely with the chip manufacturer early in the process and so is ahead of other cloud providers in adopting the new chip type. Cloud provider A might then introduce the offering in a single US region, expand to Europe after six months, and reach Japan after 18 months. As a result, the maturity of cloud service offerings built on new chip types is usually not consistent across geographies.

Rescale customers, however, have real-time access to this information on the Rescale platform. In addition, Rescale performs a thorough evaluation of qualitative performance, maturity, and capacity, which allows users to compare the offerings from different cloud service providers. Rescale invests significant effort to understand new architectures as soon as they enter the market. This evaluation is automated and enhanced by intelligence about cloud infrastructure and application requirements that we gather from usage patterns across the platform.

The Rescale Advantage: Getting Customers Cutting-Edge Technologies Faster

Rescale’s HPC hardware and software experts use powerful automations and industry experience to conduct a thorough evaluation of all relevant architectures. For most organizations, the same process would be costly, requiring months of initial and ongoing work that divert resources from strategic projects and innovation efforts. To see behind the curtain, here are the three phases of Rescale’s maturity assessment process:

Before infrastructure offerings are broadly available to the public, Rescale is already working to onboard, evaluate, and prepare new architectures for customer use. This effort includes:

  • Determining which new architectures are best suited for HPC, AI, and other R&D applications.
  • Requesting early/preview access to the infrastructure from the CSP.
  • Running internal testing and micro-benchmarks to ensure the architecture performance meets the manufacturers’ advertised specifications.
  • Running application-specific benchmarks and tuning, and thoroughly assessing several variables such as the operating system, software versions, and MPI versions to ensure best possible performance. For this step, Rescale often works directly with CSPs, software, and hardware vendors. Rescale customers might also have early beta access to these new services.
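The micro-benchmark step in the list above amounts to comparing measured numbers against advertised ones. The following sketch shows one simple way to express that check. It is an assumption-laden illustration rather than Rescale’s test harness: the `find_shortfalls` function, the 5 percent tolerance, the metric names, and all figures are hypothetical, and every metric is assumed to be "higher is better."

```python
def find_shortfalls(measured, advertised, tolerance=0.05):
    """Return {metric: (measured_value, advertised_value)} for every metric
    that is missing or falls more than `tolerance` below the advertised value.

    An empty result means the hardware met its advertised specifications.
    """
    shortfalls = {}
    for metric, spec in advertised.items():
        value = measured.get(metric)
        if value is None or value < spec * (1.0 - tolerance):
            shortfalls[metric] = (value, spec)
    return shortfalls


# Hypothetical figures, not taken from any real processor datasheet.
advertised = {"memory_bandwidth_gbs": 400.0, "peak_gflops": 3000.0}
measured = {"memory_bandwidth_gbs": 390.0, "peak_gflops": 2500.0}

print(find_shortfalls(measured, advertised))
```

Returning the failing metrics, rather than a single pass/fail flag, makes it easier to see what to raise with the CSP or hardware vendor.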

Once the internal assessment is completed, new architectures are released as generally available to customers. Rescale’s ongoing activities that measure maturity include: 

  • Ensuring that GA architectures meet Rescale’s unique service level assurance (SLA). Our SLA guarantees that submitted jobs will complete successfully (a high bar), which requires architectures that are not just available but stable enough to meet software requirements.
  • Ensuring that adequate scale (capacity) is available for widespread usage across various regions. As the demand grows for computing and launching larger workloads, Rescale enables organizations to perform parallel computing across hundreds of thousands of cores and millions of jobs. 

Even after the release of Coretypes to customers, Rescale continues to monitor the availability and reliability of architectures by:

  • Monitoring the global cloud infrastructure network for non-Rescale service issues.
  • Phasing out (deprecating) aging architectures that do not provide optimum value, have reliability issues, or are no longer supported. 

“We are passionate about getting our customers best-in-breed performance and efficiency for their specific R&D workloads. We are constantly evaluating, testing, and tuning new architectures and doing the upfront groundwork for our customers. Our in-house HPC expertise coupled with the Rescale platform’s automation and intelligence ensures that customers are able to get the most value from their HPC investments.”

– Radhika Gundavelli, HPC Engineering Manager, Rescale

Customer Results Driven by Rescale Maturity Index

To illustrate how dynamic the computing landscape can be, here are a few examples of how Rescale’s Maturity Index helped our customers make the best decision about their HPC infrastructure to support their R&D efforts.

  • A life sciences customer with a global footprint of laboratories conducting genomics analyses and diagnostics needed to be extremely cost-conscious while still getting results fast. Based on the customer’s custom application needs, Rescale was able to identify a new Arm architecture that delivered improved reliability and cost-performance.
  • An aerospace manufacturer needed more power for running its computational fluid dynamics (CFD) analysis. Based on Rescale’s assessment, the company transitioned to AMD EPYC Milan processors at a large scale across multiple clouds, which has provided greater reliability, lower costs (about 20 percent savings), and greater performance. The scaling and parallel efficiency on these new architectures allow the manufacturer to run each CFD job on 1000+ cores, significantly accelerating its ability to develop and test new designs. Rescale’s intelligent automation also allows the customer to switch seamlessly to AMD EPYC Rome processors when high-demand Milan processors are not available. (Rescale customers can enable this capability with the Coretype Sets feature.)
  • An automotive manufacturer investigated switching from CPUs to GPUs to run its computational fluid dynamics (CFD) solver. GPU acceleration has the potential to speed up many traditional computer-aided engineering (CAE) applications by as much as 10 times. For this customer, the switch improved cost-performance by about 30 percent while greatly shortening the R&D cycle. Despite the intense global demand for GPUs and the organization’s strict geographic compliance requirements, Rescale was able to find the cloud service provider and chip architecture that met their needs.
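The Milan-to-Rome fallback in the aerospace example can be pictured as an ordered preference list checked against current capacity. The sketch below is a simplified illustration and does not describe how the Coretype Sets feature is implemented: the `pick_coretype` function, the availability map, and the core counts are all hypothetical.

```python
def pick_coretype(preferred_order, available_cores, cores_needed):
    """Return the first coretype, in preference order, that currently has
    enough free cores for the job, or None if none of them do."""
    for name in preferred_order:
        if available_cores.get(name, 0) >= cores_needed:
            return name
    return None


# Hypothetical snapshot: Milan is in high demand and has no room for a
# 1,000-core CFD job, so the job falls back to Rome.
coretype_set = ["amd-epyc-milan", "amd-epyc-rome"]
available_cores = {"amd-epyc-milan": 256, "amd-epyc-rome": 4096}

print(pick_coretype(coretype_set, available_cores, cores_needed=1000))
```

Keeping the fallback order explicit lets the customer decide in advance which trade-off is acceptable when the first-choice architecture is constrained.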

Now That You Know, What Will You Achieve Next?

If you’ve made it this far, you probably share our passion for hardware performance and, more importantly, what it means for industry innovation.

While all this information about chip performance and maturity might be new or complicated, this message is for you: We believe that cloud should make innovation simpler and faster, which is why we automate the critical aspects of high performance computing for you, including our hardware infrastructure maturity assessment. There is no need to start a spreadsheet and enlist the help of several business analysts. We have you covered.

If you are curious about new cutting-edge architecture alternatives or want to save money on your cloud bill, we would be happy to make some recommendations. Please contact our HPC experts. We look forward to hearing about your computing goals and discussing how we can help.

Author

  • Garrett VanLee

    Garrett VanLee leads Product Marketing at Rescale, where he works closely with customers on the cutting edge of innovation across industries. He enjoys sharing customer success stories, research breakthroughs, and best practices from Rescale engineers, scientists, and IT professionals to help other organizations. Garrett is currently focused on the convergence of supercomputing, HPC, and AI simulation models and how these trends are driving discoveries in science and industry.
