The economics of cloud HPC are highly competitive. So why are some organizations still held back by cost misconceptions?
Editor’s note: This is the second blog post in the ebook Dispelling the Myths of Cloud HPC.
One of the biggest myths about shifting high performance computing (HPC) to the cloud is that it’s not cost-efficient compared to on-premises options. Because HPC applications have long run times, the assumption is that on-demand cloud resources become more expensive over time than the fixed costs of on-prem infrastructure. These assumptions may have held true in the early days of cloud HPC, when cloud providers were still scaling up the availability of their specialized infrastructure and cloud management platforms were still nascent. Since then, however, an ecosystem of cloud-native technologies has emerged that has completely transformed the economic competitiveness of the cloud operating model, making it a compelling alternative to on-prem.
What Factors Changed to Make Cloud Cost-Competitive?
Architecture Improvements, Deployed Faster
In addition to scaling up infrastructure capacity, cloud service providers have rapidly onboarded newer, more performant chip architectures that directly lead to cost-performance gains. While the same architectures are available on-premises, organizations with a cloud HPC model can adopt and capture value from these technologies much faster, instead of waiting for the next hardware refresh, which could be years away. On average, the cost-performance ratio offered by cloud service providers improves roughly 30% per quarter. That ratio will continue to improve as new classes of processors designed for specific high-performance workload types become available at an increasing rate, especially with competition heating up among chip makers like Intel, AMD, NVIDIA, and ARM.
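To see why waiting for a refresh matters, here is a minimal sketch of how a steady per-period gain compounds. It assumes, for illustration only, that "improves X%" means cloud work-per-dollar rises by X% each period, while on-prem work-per-dollar stays flat until the next refresh. The function names are hypothetical, and the rate is a parameter: plug in whatever figure you trust rather than the placeholder used here.

```python
# Minimal sketch: compounding of cloud price-performance vs. fixed on-prem hardware.
# ASSUMPTION: the per-period improvement rate is an illustrative input, not measured data,
# and on-prem work-per-dollar stays flat (1.0) until the next hardware refresh.


def relative_price_performance(rate_per_period: float, periods: int) -> float:
    """Cloud work-per-dollar after `periods`, relative to today's baseline of 1.0.

    Each period multiplies work-per-dollar by (1 + rate_per_period).
    """
    if rate_per_period < 0 or periods < 0:
        raise ValueError("rate_per_period and periods must be non-negative")
    return (1.0 + rate_per_period) ** periods


def average_advantage(rate_per_period: float, periods: int) -> float:
    """Mean cloud work-per-dollar across `periods` periods (starting at period 0),
    relative to on-prem hardware that stays at 1.0 for the whole window."""
    if periods <= 0:
        raise ValueError("periods must be positive")
    total = sum(relative_price_performance(rate_per_period, p) for p in range(periods))
    return total / periods


if __name__ == "__main__":
    rate = 0.05  # hypothetical gain per period; substitute your own figure
    for periods in (4, 8, 20):
        print(periods,
              round(relative_price_performance(rate, periods), 3),
              round(average_advantage(rate, periods), 3))
```

The average matters more than the end point: a team that stays on one hardware generation for the whole window forgoes the gain in every period, not just the last one.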
New Cost Models with Flexibility to Choose Based on Goals
Cloud HPC also gives customers more control over how they buy. From spot instances/VMs to reserved instances/VMs, new cloud cost models are significantly driving down the cost of infrastructure. With more choice than ever, organizations can select the right infrastructure options based on business needs or workload priorities, e.g., accelerating time-to-solve or reducing cost-per-job. For example, reserved cloud infrastructure can cost as much as 72% less than typical on-demand pricing, which offers enormous savings for teams that commit upfront to set contract terms and is ideal for predictable, steady-state compute needs. Teams with fluctuating needs, meanwhile, can still choose on-demand or spot options, which are ideal for bursting or for scaling up as needs grow.
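Whether a reservation pays off comes down to how busy the VM actually is. The sketch below assumes a simplified billing model (hypothetical, not any provider's actual rules): a reservation is billed for every hour of its term whether used or not, and on-demand is billed only for hours used. Spot pricing is left out.

```python
# Minimal sketch: reserved vs. on-demand breakeven.
# ASSUMPTIONS (hypothetical, not any provider's actual billing rules):
#   - a reservation is billed for every hour of its term, used or not;
#   - on-demand is billed only for hours actually used;
#   - `discount` is the reserved rate's discount off the on-demand hourly price.


def breakeven_utilization(discount: float) -> float:
    """Fraction of the term a VM must be busy for the reservation to cost the same
    as paying on demand. Above this fraction, the reservation is cheaper."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def cheaper_option(on_demand_hourly: float, discount: float,
                   term_hours: float, used_hours: float) -> str:
    """Return 'reserved' or 'on-demand' for one VM over one term."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    reserved_cost = on_demand_hourly * (1.0 - discount) * term_hours
    on_demand_cost = on_demand_hourly * used_hours
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


if __name__ == "__main__":
    print(breakeven_utilization(0.72))            # about 0.28 of the term
    print(cheaper_option(1.0, 0.72, 8760, 8000))  # steady-state workload
    print(cheaper_option(1.0, 0.72, 8760, 500))   # bursty workload
```

Under these assumptions, a 72% discount means the reservation wins once the VM is busy for more than roughly 28% of the term, which is why steady-state workloads favor reservations and bursty ones favor on-demand or spot.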
Full-Stack Optimization of Hardware and Software
As application complexity drives performance requirements, HPC users look for specialized architectural configurations to meet their needs. At the same time, IT and HPC admins aim to balance user demands against business objectives. Optimizing the full stack of computing resources becomes harder as more applications come under management, but the cloud’s architectural flexibility can make it easier. Organizations with insight into their workloads’ cost-performance can codify policies that give each workload the optimal cost profile, the optimal performance profile, or a middle ground between the two. Capturing the gains from continual cost-performance improvements in compute hardware can then become an ongoing source of savings.
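A codified placement policy can be as simple as a function that scores each hardware option for a workload. This is a minimal sketch under stated assumptions: the instance names, prices, and runtimes are hypothetical, runtimes are assumed to come from benchmarking the workload on each option, and the weighted "balanced" score is one illustrative way to express a middle ground, not a standard formula.

```python
from dataclasses import dataclass

# Minimal sketch of a per-workload placement policy.
# ASSUMPTIONS: instance names, prices, and runtimes are hypothetical; runtimes would
# come from benchmarking the workload on each option.


@dataclass(frozen=True)
class InstanceOption:
    name: str
    hourly_price: float   # $/hour for the VM(s) the job runs on
    runtime_hours: float  # benchmarked wall-clock time for this workload

    @property
    def cost_per_job(self) -> float:
        return self.hourly_price * self.runtime_hours


def choose_instance(options: list[InstanceOption], policy: str,
                    weight: float = 0.5) -> InstanceOption:
    """Pick an option by policy: 'min_cost', 'min_time', or 'balanced'.

    'balanced' minimizes weight * (cost / best cost) + (1 - weight) * (time / best time),
    so a higher weight leans toward cost and a lower weight toward time-to-solve.
    """
    if not options:
        raise ValueError("no instance options given")
    if policy == "min_cost":
        return min(options, key=lambda o: o.cost_per_job)
    if policy == "min_time":
        return min(options, key=lambda o: o.runtime_hours)
    if policy == "balanced":
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must be in [0, 1]")
        best_cost = min(o.cost_per_job for o in options)
        best_time = min(o.runtime_hours for o in options)
        return min(options, key=lambda o: weight * o.cost_per_job / best_cost
                   + (1.0 - weight) * o.runtime_hours / best_time)
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    options = [
        InstanceOption("cheap-cpu", hourly_price=1.0, runtime_hours=8.0),
        InstanceOption("mid-cpu", hourly_price=3.0, runtime_hours=3.0),
        InstanceOption("fast-gpu", hourly_price=12.0, runtime_hours=1.0),
    ]
    for policy, weight in (("min_cost", 0.5), ("min_time", 0.5), ("balanced", 0.9)):
        print(policy, choose_instance(options, policy, weight).name)
```

With these made-up numbers, the three policies pick cheap-cpu, fast-gpu, and mid-cpu respectively, showing how the same workload lands on different hardware depending on the stated goal.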
For teams that rely on commercial software, licensing costs can often be 2x higher than infrastructure costs, so HPC teams are using the cloud to maximize the use of each available license token or seat. Rightsizing infrastructure in the cloud, or taking advantage of on-demand infrastructure and licensing, allows organizations to reduce their overall computing costs. Rescale helps operationalize this strategy with features like license queuing, which sequences jobs to increase utilization of available licenses and avoid unnecessary licensing costs. The example below illustrates a given starting point and the various scenarios that could be chosen with different hardware combinations.
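The general idea behind license queuing can be sketched in a few lines: a job waits in a queue until enough license tokens are free, rather than starting compute that would sit idle (and billable) while it waits for a license. This is a hypothetical illustration, not Rescale's implementation or API; it assumes a fixed token pool, strict first-in-first-out order with no backfilling, and compute provisioned only when a job actually starts.

```python
import heapq
from dataclasses import dataclass

# Minimal sketch of license queuing: hold a job until enough license tokens are free,
# instead of starting compute that would sit idle waiting for a license.
# ASSUMPTIONS: a fixed token pool, strict first-in-first-out order (no backfilling),
# and compute provisioned only when a job actually starts. This is an illustration,
# not Rescale's implementation.


@dataclass(frozen=True)
class Job:
    name: str        # hypothetical job label
    tokens: int      # license tokens the job checks out while running
    duration: float  # runtime in hours


def schedule_fifo(jobs: list[Job], pool_size: int) -> dict[str, float]:
    """Return each job's start time (hours from t=0) under FIFO license queuing."""
    if any(job.tokens > pool_size for job in jobs):
        raise ValueError("a job needs more tokens than the pool holds")
    free = pool_size
    now = 0.0
    running: list[tuple[float, int]] = []  # min-heap of (finish_time, tokens)
    starts: dict[str, float] = {}
    for job in jobs:
        # Advance the clock to successive job completions until enough tokens return.
        while free < job.tokens:
            finish, returned = heapq.heappop(running)
            now = max(now, finish)
            free += returned
        starts[job.name] = now
        free -= job.tokens
        heapq.heappush(running, (now + job.duration, job.tokens))
    return starts


if __name__ == "__main__":
    queue = [Job("mesh-a", 2, 3.0), Job("solve-b", 2, 1.0), Job("post-c", 1, 1.0)]
    print(schedule_fifo(queue, pool_size=3))
    # {'mesh-a': 0.0, 'solve-b': 3.0, 'post-c': 3.0}
```

Strict FIFO keeps the sketch short; a production scheduler might backfill smaller jobs into leftover tokens to push license utilization higher still.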
Realistic Total Cost of Ownership Comparisons of On-Prem vs. Cloud
Some TCO comparisons of on-prem vs. cloud HPC fail to account for many of the advancements above and the cost improvements that result. They also often leave out the hidden on-prem overhead of powering and maintaining these systems, costs that are already baked into cloud core-hour prices. These flawed cost models also don’t factor in any incremental value gained in the cloud, such as an enhanced user experience, accelerated implementation, or improved sustainability, but we’ll leave those out here to focus on cost.
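One way to make those hidden costs visible is to compute a fully loaded on-prem cost per core-hour that is actually used. The sketch below is an illustration under stated assumptions: all inputs are hypothetical placeholders, hardware is amortized straight-line, and overhead is grouped into power, maintenance, and staff. Note that idle cores still cost money, so lower utilization raises the effective rate.

```python
# Minimal sketch: fully loaded on-prem cost per *used* core-hour.
# ASSUMPTION: every input below is a hypothetical placeholder; substitute your own
# figures. Straight-line amortization of hardware is assumed.

HOURS_PER_YEAR = 8760


def fully_loaded_core_hour_cost(hardware_capex: float, amortization_years: float,
                                annual_power: float, annual_maintenance: float,
                                annual_staff: float, cores: int,
                                utilization: float) -> float:
    """Annual cost of owning and running the cluster divided by core-hours actually used."""
    if amortization_years <= 0 or cores <= 0 or not 0.0 < utilization <= 1.0:
        raise ValueError("invalid amortization_years, cores, or utilization")
    annual_cost = (hardware_capex / amortization_years
                   + annual_power + annual_maintenance + annual_staff)
    used_core_hours = cores * HOURS_PER_YEAR * utilization
    return annual_cost / used_core_hours


if __name__ == "__main__":
    # Hypothetical cluster: compare hardware-only vs. fully loaded cost per core-hour.
    hardware_only = fully_loaded_core_hour_cost(2_000_000, 5, 0, 0, 0, 2000, 0.8)
    loaded = fully_loaded_core_hour_cost(2_000_000, 5, 120_000, 80_000, 150_000, 2000, 0.8)
    print(f"hardware only: ${hardware_only:.4f}/core-hour")
    print(f"fully loaded:  ${loaded:.4f}/core-hour")
```

Setting the overhead terms to zero reproduces the kind of hardware-only estimate that flawed TCO models rely on; adding them back shows how much of the true rate those models miss.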
To consider an example, let’s estimate that running an HPC application in an on-premises environment costs an average of $0.03/core-hour. This estimate doesn’t include the overhead of managing on-premises environments, so in reality the fully loaded cost is much more likely to be closer to $0.05/core-hour. A recent 2021 State of Cloud Computing Report finds that, on a pure price-performance basis, home-grown on-premises HPC environments are becoming less and less cost-effective in the wake of recent advances by cloud service providers, some of which offer VMs running at $0.02-0.03/core-hour. Even though we’re talking about a difference of pennies per core-hour, anyone who manages these systems knows that it can make a huge difference in annual spend.
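To see how pennies per core-hour add up, here is a quick calculation using the rates above. The annual volume of 10 million core-hours is a hypothetical figure chosen for illustration, and `annual_spend` is a made-up helper name. Decimal is used so currency arithmetic is not distorted by floating-point rounding.

```python
from decimal import Decimal

# Minimal sketch: annual spend at the per-core-hour rates discussed above.
# ASSUMPTION: the annual core-hour volume is hypothetical; Decimal avoids
# floating-point rounding in currency arithmetic.


def annual_spend(core_hours_per_year: int, rate_per_core_hour: str) -> Decimal:
    """Dollars per year for a given volume of core-hours at a given rate."""
    return Decimal(core_hours_per_year) * Decimal(rate_per_core_hour)


if __name__ == "__main__":
    demand = 10_000_000  # hypothetical core-hours consumed per year
    on_prem = annual_spend(demand, "0.05")  # fully loaded on-prem estimate
    for cloud_rate in ("0.02", "0.03"):
        savings = on_prem - annual_spend(demand, cloud_rate)
        print(f"cloud at ${cloud_rate}/core-hour saves ${savings:,.2f} per year")
```

At that hypothetical volume, the gap between $0.05 and $0.02-0.03 per core-hour works out to $200,000 to $300,000 per year, and it scales linearly with usage.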
Financial Accounting Practices are Shifting
Organizations are no longer required to treat cloud platforms as an operational expense. They can use capital budgets to consume reserved instances of cloud infrastructure, or employ a mix of both financing models as they see fit. And because cloud hardware options improve over time, unlike five years of the same on-prem hardware, many IT teams are shifting to the cloud to maximize the performance and efficiency of other major investments, such as talent and costly software licenses.
Now What Will You Do With This Information?
Now that you’re armed with this information, the myth that cloud HPC is prohibitively expensive doesn’t have to hold your organization back any longer. You can use these strategies and data to champion positive change with your R&D, IT, and finance stakeholders. Rescale is here to help with cloud HPC solutions and expertise to meet you wherever you are in your journey, from running your first job to orchestrating your global HPC practice in the cloud. We are happy to help our customers take advantage of these cost efficiencies while also driving innovation with improved computing capabilities. We’ll even let you take all the credit when your CFO commends you on a job well done!