Managed Spot Block Instances for Cost-Savings

Start Free Trial
June 23, 2020. Updated March 21st, 2024

Amazon EC2 Cost Optimization

Qubole is excited to announce the general availability of Managed Spot Block instances that provide up to 40% cost savings over On-Demand EC2 Instances. Managed Spot Block instances are an excellent choice for certain workloads that require the same reliability provided by On-Demand instances but at a lower cost.

Cluster Lifecycle Management

With Managed Spot Block instances, Qubole clusters automatically acquire the required number of Spot Block instances during cluster upscaling, schedule jobs on these instances, and terminate them when they are no longer needed. Qubole clusters also proactively and automatically replace any Spot Block instance that is about to expire (based on its pre-defined duration) with a new Spot Block instance. This seamless replacement ensures cluster continuity with zero downtime and near-zero interruption to jobs.

Spot Block Instances

Replacing your existing Auto-Scaling* On-Demand instances with Spot Block instances in Qubole takes a single click, as illustrated below.

Amazon EC2

Amazon EC2 instances come in different flavors, where the availability and reliability of the machines are determined by various factors. The two well-known types of VMs in AWS are On-Demand and Spot. However, there is a third type of VM that can be leveraged: Spot Block. Spot Blocks can offer up to 40% cost savings over On-Demand VMs with higher reliability than Spot instances. Spot Block instances are guaranteed to be available for a finite duration (1-6 hours) and are provisioned based on the available capacity in the Spot instance market.

Benefits Of Spot Block Instances

When comparing all four types of instances (including Reserved Instances), Spot Block instances are a great economic choice, especially for workloads that require five nines of reliability and must complete within a predictable amount of time. Essentially, for these types of workloads, Spot Blocks provide the same Cloudonomic benefits as Reserved Instances, but without the burden of a long-term commitment or a penalty for under-utilization.

Spot Block over On-Demand without compromising on reliability

Spot Block Management

While Spot Block instances can provide five nines of reliability (much like an On-Demand instance), they are guaranteed only for a maximum of six (6) hours (unlike an On-Demand instance). This means that any cluster configured with Spot Block instances but without Spot Block Management can be used continuously for at most six (6) hours. After this period, the cluster automatically terminates. This could result in the following:

  • Failures: Workloads (jobs and tasks) that are running at that moment fail
  • Downtime: Unplanned interruption to cluster availability
  • Low Availability: The maximum cluster duration is limited to six (6) hours

Spot Block Cluster Lifecycles

As illustrated below, with an unmanaged Spot Block cluster, jobs that are running at the six-hour mark are terminated, resulting in failures. With a managed Spot Block cluster, these jobs are allowed to complete successfully without any interruption. In many cases, multi-tenant clusters are expected to run for more than six hours, and workloads can be submitted to the cluster at any point in time (even just minutes before the six-hour expiry). Guaranteeing cluster and workload execution beyond the six-hour expiry requires careful and intelligent maneuvering of Spot Block instances.

Qubole Spot Block Management

To address the above challenges and make Spot Blocks seamless to use, Qubole’s Open Data Lake Platform provides Intelligent Spot Block Management for Spark, Presto, and Hive clusters. By automating Spot Block management, Qubole delivers the following benefits.

EC2 Cluster Termination

Spot Block instances are interrupted by AWS after a pre-defined duration. When the instances are interrupted, the Application Master and tasks running on Spot Block nodes are abruptly terminated, resulting in job failures. To avoid these failures, Qubole monitors each Spot Block instance’s elapsed duration in real time and prevents tasks from being scheduled on instances that AWS will interrupt within the next few minutes.
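The scheduling guard described above can be sketched as a simple filter on remaining block time. This is only an illustration, not Qubole's implementation: the `SpotBlockNode` class, the `schedulable_nodes` function, and the 300-second safety margin are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class SpotBlockNode:
    """Hypothetical record of a Spot Block node (not a Qubole or AWS type)."""
    node_id: str
    launched_at: float       # launch time, in seconds
    block_duration_s: float  # purchased block duration, in seconds

    def remaining_s(self, now: float) -> float:
        # Time left before AWS reclaims the instance.
        return self.launched_at + self.block_duration_s - now


def schedulable_nodes(nodes, now, safety_margin_s=300.0):
    """Keep only nodes with more than `safety_margin_s` left before expiry.

    The 300-second default is an assumed stand-in for "the next few minutes".
    """
    return [n for n in nodes if n.remaining_s(now) > safety_margin_s]


# Example: node "a" has 200 s left and is skipped; node "b" has hours left.
nodes = [SpotBlockNode("a", 0.0, 3600.0), SpotBlockNode("b", 0.0, 21600.0)]
eligible = [n.node_id for n in schedulable_nodes(nodes, now=3400.0)]
```

In this example `eligible` contains only `"b"`, so new tasks would not land on the node that is about to be interrupted.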

EC2 Cluster Task Failure

Certain tasks run for a long time (more than 30 minutes) and cannot complete before the Spot Block instance expires, leading to task failures. Qubole identifies these failed tasks and automatically retries them on a different instance to ensure successful job completion.
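As a rough sketch of the retry behavior described above, the function below re-runs a failed task on the next candidate node. The names `run_with_retry` and `flaky_task`, the use of `RuntimeError` to signal a task failure, and the `max_attempts` limit are assumptions made for illustration; the source does not describe how Qubole detects or retries failed tasks internally.

```python
def run_with_retry(task, nodes, max_attempts=3):
    """Run `task(node)` on successive nodes until one attempt succeeds.

    `task` is a hypothetical callable that raises RuntimeError when the
    node it runs on expires mid-task. Returns (result, nodes_tried).
    """
    tried = []
    for node in nodes[:max_attempts]:
        tried.append(node)
        try:
            return task(node), tried
        except RuntimeError:
            # The task failed on this node; move on to a different one.
            continue
    raise RuntimeError(f"task failed on all attempted nodes: {tried}")


def flaky_task(node):
    # Simulated task: fails only on the node that is about to expire.
    if node == "expiring-node":
        raise RuntimeError("node expired before the task finished")
    return f"done on {node}"


result, tried = run_with_retry(flaky_task, ["expiring-node", "fresh-node"])
```

Here the first attempt fails on `"expiring-node"` and the second succeeds on `"fresh-node"`, so the job as a whole still completes.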

Spot Instance Availability

A Spot Block instance is a type of Spot instance. This means that the availability of a Spot Block instance is governed by the same factors that determine the availability of a Spot instance. When spare capacity is not available for a particular instance type or family, AWS declines the request to provision a Spot Block instance. To address this problem and increase the success rate of provisioning Spot Block instances, Qubole supports requesting multiple instance types through its Heterogeneous capabilities. When AWS cannot provide the required number of instances of a particular instance type/family, Qubole automatically retries with other instance types/families until the request is fulfilled.
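The fallback across instance types can be sketched as a loop over a preference list. This is a minimal illustration under stated assumptions: `provision_with_fallback` and `request_fn` are hypothetical, `request_fn` stands in for whatever call actually requests capacity, and the instance type names and capacities below are placeholder data rather than real market conditions.

```python
def provision_with_fallback(request_fn, instance_types, count):
    """Request `count` nodes, trying each instance type in preference order.

    `request_fn(instance_type, n)` is a hypothetical callable returning how
    many of the `n` requested instances were granted (0 if declined).
    Returns (granted_by_type, still_missing).
    """
    granted = {}
    remaining = count
    for itype in instance_types:
        if remaining == 0:
            break
        got = request_fn(itype, remaining)
        if got > 0:
            granted[itype] = got
            remaining -= got
    return granted, remaining


# Placeholder capacity: the first choice is unavailable, the others partly so.
capacity = {"r5.xlarge": 0, "r4.xlarge": 2, "m5.xlarge": 10}
granted, missing = provision_with_fallback(
    lambda itype, n: min(n, capacity[itype]),
    ["r5.xlarge", "r4.xlarge", "m5.xlarge"],
    5,
)
```

With this placeholder data the request for five nodes is filled by two `r4.xlarge` and three `m5.xlarge` instances, leaving `missing` at zero.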

EC2 Cluster Autoscaling

Despite Qubole’s best attempts at diversification, in some cases Spot Block instances might not be available in the AWS Spot market. When this happens, upscaling is paused, severely impacting the workloads submitted to the cluster. To recover from this, Qubole automatically provisions temporary On-Demand nodes to ensure the cluster can upscale.

Spot Block Rebalancer

Falling back to On-Demand instances is a good contingency plan. However, it can result in unexpected or unplanned cost increases. Qubole’s Automated Spot Block management is opportunistic and monitors the Spot market in real time. Once Spot Block nodes are available in the market again, the Spot Block Rebalancer swaps the temporary On-Demand nodes for Spot Block instances.
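The On-Demand fallback and the later rebalancing can be pictured with a toy model in which a cluster is just a list of node labels. Everything here is hypothetical (the `upscale` and `rebalance` functions, the `"spot_block"` and `"on_demand_temp"` labels, and the availability counts); it only illustrates the order of decisions, not how Qubole tracks nodes or Spot market capacity.

```python
def upscale(cluster, needed, spot_block_available):
    """Add `needed` nodes, preferring Spot Block and falling back to
    temporary On-Demand nodes for whatever Spot Block cannot cover."""
    take = min(needed, spot_block_available)
    return (
        cluster
        + ["spot_block"] * take
        + ["on_demand_temp"] * (needed - take)
    )


def rebalance(cluster, spot_block_available):
    """Swap temporary On-Demand nodes for Spot Block nodes as capacity returns."""
    result = []
    for node in cluster:
        if node == "on_demand_temp" and spot_block_available > 0:
            result.append("spot_block")
            spot_block_available -= 1
        else:
            result.append(node)
    return result


# Only one Spot Block node is available during upscaling, two more appear later.
cluster = upscale([], needed=4, spot_block_available=1)
cluster_after = rebalance(cluster, spot_block_available=2)
```

In this run the cluster first grows with one Spot Block node and three temporary On-Demand nodes; after rebalancing, two of the temporary nodes have been swapped and one remains until more Spot Block capacity appears.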

Business Continuity

Qubole proactively replaces a Spot Block node that is about to expire with a new Spot Block node. This ensures that the cluster has the required number of nodes at any given point in time and is not negatively impacted by Spot Block node expiration. Once the new node joins the cluster, it is ready to accept all the tasks that the expiring Spot Block node, which is going through graceful decommissioning, would otherwise have accepted.
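One way to picture proactive replacement is as a schedule that requests each replacement some lead time before the old node expires. The function name `replacement_schedule` and the 600-second lead time are assumptions for illustration only; the source does not state how far in advance Qubole starts a replacement.

```python
def replacement_schedule(expiry_times_s, lead_time_s=600.0):
    """Pair each node's expiry time with the time its replacement should
    be requested, so the new node can join before the old one is reclaimed.

    Times are seconds since cluster start. The 600 s lead time is assumed.
    Returns a list of (request_replacement_at, node_expires_at), sorted.
    """
    return [
        (max(0.0, expiry - lead_time_s), expiry)
        for expiry in sorted(expiry_times_s)
    ]


# Two nodes: one on a 1-hour block, one on a 6-hour block.
schedule = replacement_schedule([21600.0, 3600.0])
```

For these two nodes the replacements would be requested at 3000 s and 21000 s, ten minutes before each expiry, so the node count stays constant across the handover.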

Summary

AWS Spot Block Instances combined with Qubole’s Open Data Lake Platform help customers run their data analytics and machine learning use cases without interruption and optimize their TCO.

*Note: While Qubole supports Spot Block instances as part of Master, Minimum, and Auto-Scaling nodes, the Managed Spot Block instance capability applies only when Spot Block instances are configured as part of Auto-Scaling worker nodes.

To learn more about Qubole’s Intelligent Spot Management, read our AWS Blog on Spot Optimization. You can also experience the benefits firsthand by signing up for a free trial.
