Up to 80% savings with AWS Spot Instances
In a previous post, we outlined the case for choosing cloud infrastructure over an on-premises deployment for big data workloads. One benefit of the cloud is the substantial cost savings available through Spot instances. Spot instances are spare Amazon EC2 capacity that AWS offers at a discount. The Spot price changes in real time based on supply and demand. AWS users place a bid indicating the most they are willing to pay for an instance. If the Spot price is below the bid, the request is fulfilled and the user receives the instance.
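The fulfillment rule described above can be sketched in a few lines. This is a simplified model, not AWS's actual implementation. It assumes that a request is fulfilled only while the Spot price is strictly below the bid and that the user is charged the prevailing Spot price rather than the bid. The instance type and prices are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpotRequest:
    """A simplified Spot request under the bid-based model (illustrative only)."""
    instance_type: str
    max_bid: float  # most the user will pay, in USD per instance-hour


def hourly_charge(request: SpotRequest, spot_price: float) -> Optional[float]:
    """Return the hourly charge if the request is fulfilled, otherwise None.

    Assumption: fulfilled only when the Spot price is below the bid, and the
    user pays the current Spot price, not the bid itself.
    """
    if spot_price < request.max_bid:
        return spot_price
    return None


if __name__ == "__main__":
    req = SpotRequest("r3.xlarge", max_bid=0.10)  # hypothetical bid
    for price in (0.04, 0.09, 0.12):  # hypothetical Spot prices
        charge = hourly_charge(req, price)
        status = f"fulfilled at ${charge:.2f}/h" if charge is not None else "not fulfilled"
        print(f"Spot price ${price:.2f}: {status}")
```

The model deliberately ignores what happens when the price later rises above the bid and the instance is reclaimed. That interruption risk is what makes Spot capacity harder to manage than On-Demand capacity.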
Big Data and Cost Benefits of Using Spot Instances
Big data workloads can be bursty, with data teams needing to scale jobs at a moment’s notice. By incorporating Spot instances, data teams can better manage the cost of rapidly growing workloads.
Because Spot instances are spare capacity priced through this bidding process rather than at a fixed rate, they cost significantly less than On-Demand instances. Users can save up to 80% compared with On-Demand pricing for the same instance type.
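The savings figure is the Spot discount relative to the On-Demand rate for the same instance type. Here p_OD and p_spot stand for hypothetical On-Demand and Spot prices per instance-hour. A Spot price of one fifth of the On-Demand price corresponds to the 80% ceiling quoted above:

```latex
\text{savings} = 1 - \frac{p_{\text{spot}}}{p_{\text{OD}}},
\qquad
p_{\text{spot}} = 0.2\,p_{\text{OD}}
\;\Rightarrow\;
\text{savings} = 1 - 0.2 = 0.8 = 80\%
```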
Challenges with Managing Spot Instances
While Spot instances reduce cost, managing them takes considerable time and resources because of the following factors:
Automated Spot Instance Management with Qubole
Qubole Data Service (QDS) provides a policy-based way to automate the Spot instance bidding process, allowing data teams to take full advantage of Spot instances without devoting resources to managing them. Qubole can use AWS Spot nodes when dynamically adding cluster nodes, or as part of a cluster's core minimum nodes (not recommended, for stability reasons).
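To make the two placement options concrete, here is a sketch of how a cluster of a given size might be split between On-Demand and Spot nodes. This is not Qubole's implementation. The function `plan_cluster` and its `spot_percent` and `spot_for_core` parameters are hypothetical names chosen for illustration. Following the recommendation above, core nodes stay On-Demand by default.

```python
from dataclasses import dataclass


@dataclass
class NodePlan:
    on_demand: int
    spot: int


def plan_cluster(min_nodes: int, target_nodes: int, spot_percent: int,
                 spot_for_core: bool = False) -> NodePlan:
    """Split target_nodes into On-Demand and Spot nodes (hypothetical policy).

    - Nodes added above min_nodes by autoscaling: spot_percent of them are Spot.
    - Core (minimum) nodes: On-Demand unless spot_for_core is set, in which case
      spot_percent of them are Spot as well.
    Integer arithmetic rounds down, so fractions of a node go to On-Demand.
    """
    if not 0 <= spot_percent <= 100:
        raise ValueError("spot_percent must be between 0 and 100")
    if target_nodes < min_nodes:
        raise ValueError("target_nodes cannot be below min_nodes")
    autoscaled = target_nodes - min_nodes
    spot = autoscaled * spot_percent // 100
    if spot_for_core:
        spot += min_nodes * spot_percent // 100
    return NodePlan(on_demand=target_nodes - spot, spot=spot)


if __name__ == "__main__":
    print(plan_cluster(2, 10, 50))                      # core nodes kept On-Demand
    print(plan_cluster(2, 10, 50, spot_for_core=True))  # core nodes partly on Spot
```

Keeping the minimum nodes On-Demand means a wave of Spot reclamations can shrink the cluster back to its core but never below it.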
QDS users select the maximum bid they are willing to pay for a Spot instance, and the system places bids on their behalf automatically. Qubole Hadoop clusters start with On-Demand nodes and can rebalance automatically, replacing On-Demand instances with Spot instances when Spot capacity becomes available. Rebalancing works in three steps: identify On-Demand instances that are not running tasks, provision Spot instances from AWS, and then terminate the previously identified On-Demand instances.
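The three rebalancing steps can be sketched as follows. This is an illustration of the described sequence, not Qubole's code. `Node`, `rebalance`, and the `request_spot` callback are hypothetical. The callback stands in for a real Spot request and returns `None` when no capacity is obtained. An On-Demand node is terminated only after its Spot replacement exists.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    node_id: str
    purchase: str           # "on-demand" or "spot"
    running_tasks: int = 0


def rebalance(nodes: List[Node],
              request_spot: Callable[[], Optional[Node]]) -> List[Node]:
    """Swap idle On-Demand nodes for Spot nodes, one for one."""
    # Step 1: identify On-Demand nodes that are not running any tasks.
    idle_on_demand = [n for n in nodes
                      if n.purchase == "on-demand" and n.running_tasks == 0]
    to_terminate = set()
    replacements: List[Node] = []
    for node in idle_on_demand:
        # Step 2: provision a Spot replacement before touching the old node.
        spot_node = request_spot()
        if spot_node is None:
            break  # no Spot capacity now; keep the remaining On-Demand nodes
        replacements.append(spot_node)
        to_terminate.add(node.node_id)
    # Step 3: terminate only the On-Demand nodes that were actually replaced.
    kept = [n for n in nodes if n.node_id not in to_terminate]
    return kept + replacements


if __name__ == "__main__":
    ids = itertools.count()

    def fake_spot_request() -> Optional[Node]:
        # Stand-in for a real Spot request that always succeeds.
        return Node(f"spot-{next(ids)}", "spot")

    cluster = [Node("od-1", "on-demand", running_tasks=0),
               Node("od-2", "on-demand", running_tasks=3)]
    for n in rebalance(cluster, fake_spot_request):
        print(n.node_id, n.purchase)
```

Provisioning before terminating means a failed Spot request leaves the cluster at its original size rather than shrinking it.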
This ease of use lets Qubole clusters support advanced provisioning strategies, which fall into three categories:
Additional built-in intelligence for using Spot nodes with QDS includes:
You can read more about integration with AWS Spot nodes in Qubole’s documentation. This feature is available for Hadoop, Spark, and Presto clusters.
Customer Data on Usage and Savings
Most Qubole customer clusters (about 82%) use automated Spot instances. Qubole customers run nearly half of all their workloads on Spot instances, which cost up to 80% less than On-Demand pricing.
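Because only part of the workload runs on Spot, the saving on the total bill is smaller than the per-instance discount. The sketch below computes that blended saving. It assumes instance-hours are a fair proxy for "workloads" and that Spot and On-Demand hours are on the same instance mix. The inputs 0.5 and 0.8 are illustrative, taken from "nearly half" and "up to 80%", and are not reported customer results.

```python
def blended_savings(spot_share: float, spot_discount: float) -> float:
    """Fraction saved on the whole compute bill vs. running everything On-Demand.

    spot_share: fraction of instance-hours run on Spot (0..1).
    spot_discount: Spot discount relative to On-Demand for those hours (0..1).
    """
    for name, value in (("spot_share", spot_share), ("spot_discount", spot_discount)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Normalize the On-Demand cost of one instance-hour to 1.0.
    blended_cost = (1 - spot_share) * 1.0 + spot_share * (1 - spot_discount)
    return 1 - blended_cost


if __name__ == "__main__":
    print(f"{blended_savings(0.5, 0.8):.0%}")
```

Under these assumptions, running half the hours on Spot at an 80% discount cuts the total bill by about 40%.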
One customer, BloomReach, which offers a data-driven marketing solution, was able to run 85% of its workloads on Spot instances.
Jorge Rodriguez, Tech Lead on BloomReach’s data platform team, explained: “The nice thing about Qubole is that even when Spot instances are reclaimed, your job doesn’t necessarily have to fail … because it will just spawn a new Spot instance, and your job will continue running.”
To learn more about how BloomReach incorporated Spot instances into its big data ETL environment, click here.
This is part of a series exploring the benefits of cloud architecture. See the first post of the series here, and come back for more on the separation of compute and storage and the economics of provisioning to peak.