An Intro to AWS Spot Instances: Learn How to Maximize Your Cost Savings


Host: Hey, everyone. Welcome to today’s tech talk, An Intro to AWS Spot Instances: How to Maximize Your Cost Savings. In today’s session we’re going to be talking about how you can significantly reduce the cost of running your applications in AWS using Spot Instances. At this time, I’d like to introduce our speaker for the day, Alex Aidun. Alex joins us from right here at Qubole, where he leads our education initiative. With that, I’ll go ahead and hand it over to you, Alex.

Alex Aidun: Hey, folks. Thanks for being with us today and taking the time to speak with us about AWS Spot Instances. First and foremost, we want to establish why we are having this conversation here today. When we think about moving to the Cloud or working with Cloud services, we are thinking about scenarios where we are going to be working with elastic infrastructure.

We are no longer going to be responsible for buying large amounts of hardware and maintaining them. We’re going to lean on other services to do that. One of those services is AWS, Amazon Web Services. AWS has done the job of buying all of this hardware and then, making it available to us as a service in the cloud.

When we think about interacting with AWS to take advantage of the Cloud services, in particular, we’re thinking about EC2 instances, Elastic Compute Cloud, which are going to be those computers that are being launched for us inside of the Cloud, on which we’re going to run our processes.

There are two different methods we want to think about when we purchase these instances. On-demand instances are purchased when you need them. They are the same price every time. They’re paid at the start of the hour of usage. The other method of purchasing these instances is known as Spot Instances. These types of instances are, as well, requested when needed, but they can be purchased at a major discount in price. They are paid for at the end of the hour of usage.

The reason why we have these two different methods is that Amazon is taking on the burden of purchasing all of the hardware and then making it available to us as a service. Not all of the capacity is used all the time. Amazon wants to solve the problem of utilization just like us. They’re making these instances available, as Spot Instances, in the hope that people will pick these up at a discount. That gives us an opportunity to save some money while still achieving our processing goals.

There are two other options that are available inside of Qubole. I will come back to these. In case you are on the phone and have been exposed to these concepts before, I do want to acknowledge that there is a sub-type of on-demand, which is a Reserved On-Demand Instance. That is a type of instance which you request in advance. You can get a minor discount in price. You pay for this in advance.

There’s also this idea of a Spot Block, which you request in advance. You get a minor discount in price and it’s paid when granted. Again, these two concepts are going to be sub-concepts to the concepts that we are currently discussing. They are relevant to our conversation. We’ll come back to these a little bit later.

When we start thinking about our AWS Spot Instances and how we actually acquire these instances, first we need to acknowledge that we’re going to do this through a bid system. It is a bidding process where you say, “I’m willing to pay a certain dollar amount for an instance.” This is where you can achieve significant savings, because you can purchase these instances at a 50, 60, 70, or 80% discount off the standard on-demand price. This gives us a significant opportunity to improve our budgets while still pushing through the workloads that we want to get done. We can reduce our infrastructure cost by taking advantage of this feature that AWS is offering us.

There are two great resources for helping to determine the near-real-time and historic bid prices for EC2 Instances. These are provided by AWS. The first is the Spot Bid Advisor. This is available via the link currently on the slide. The second would be the Spot Price History, which is provided inside of the AWS EC2 console. Both of these resources are provided by AWS to help you in the decision-making process around the bidding that you’re going to do for these discounted instances.
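The Spot Price History mentioned here is also available programmatically. The following is a minimal sketch, assuming the boto3 library and configured AWS credentials for the fetch step; the helper names, the region, and the sample records are illustrative assumptions and not something shown in the talk. The summary step runs offline on records shaped like the API response.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean


def fetch_spot_price_history(instance_type: str, region: str, days: int = 7) -> list:
    """Fetch recent Spot prices with boto3 (requires AWS credentials).

    Only the first page of results is returned in this sketch.
    """
    import boto3  # imported here so the rest of the sketch runs without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_spot_price_history(
        InstanceTypes=[instance_type],
        ProductDescriptions=["Linux/UNIX"],
        StartTime=datetime.now(timezone.utc) - timedelta(days=days),
    )
    return response["SpotPriceHistory"]


def summarize_by_zone(records: list) -> dict:
    """Group price records by availability zone and report min, mean, max."""
    prices_by_zone = defaultdict(list)
    for record in records:
        # The API returns SpotPrice as a string, so convert it to a float.
        prices_by_zone[record["AvailabilityZone"]].append(float(record["SpotPrice"]))
    return {
        zone: {"min": min(p), "mean": mean(p), "max": max(p)}
        for zone, p in prices_by_zone.items()
    }


# Hypothetical records shaped like the API response, so the sketch runs offline.
sample = [
    {"AvailabilityZone": "us-east-1a", "SpotPrice": "0.1500"},
    {"AvailabilityZone": "us-east-1a", "SpotPrice": "0.2500"},
    {"AvailabilityZone": "us-east-1b", "SpotPrice": "0.3000"},
]
for zone, stats in summarize_by_zone(sample).items():
    print(zone, stats)
```

With credentials in place, `summarize_by_zone(fetch_spot_price_history("r3.2xlarge", "us-east-1"))` would give the same kind of per-zone summary for real prices.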

Let’s take an example of what that behavior might look like. The AWS Spot Instance will be won based on your bid. However, you will only pay the market price for the instance at the end of the hour of usage. Putting some numbers to this. Suppose you bid 75 cents for an instance that costs one dollar. Suppose the market price is currently 50 cents. Then, you can win that instance because you’re bidding 75 cents and the market price is 50.

When you pay for the Spot Instance at the end of the hour of usage, you will pay the market price, assuming it is still below 75 cents. Therefore, even though you were bidding 75% of the standard on-demand price for the Spot Instance, not only did you win it, you paid 50%. That’s a pretty significant saving, one beyond even what we were bidding for at the start of this discussion.

When we think about AWS bid resolution process, it’s important to keep in mind that even if your bid wins, you might not pay your bid value. You’ll end up paying the market price, which could be below your bid value. In some cases, substantially lower.
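The bid resolution just described can be sketched in a few lines of Python. This is only an illustration of the numbers from the example; the function name is hypothetical, and it assumes a bid wins whenever it is at or above the market price.

```python
from typing import Optional


def resolve_spot_bid(bid: float, market_price: float) -> Optional[float]:
    """Return the hourly price paid if the bid wins, or None if it loses.

    Assumption: a bid wins whenever it is at or above the current market
    price, and the winner pays the market price rather than the bid.
    """
    if bid >= market_price:
        return market_price
    return None


on_demand_price = 1.00  # dollars per hour, from the talk's example
bid = 0.75
market_price = 0.50

paid = resolve_spot_bid(bid, market_price)
if paid is not None:
    discount = 1 - paid / on_demand_price
    print(f"Won at ${paid:.2f}/hour, {discount:.0%} below on-demand")
else:
    print("Bid lost: market price is above the bid")
```

The bid only sets the ceiling; the amount actually billed is the market price.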

It is important to keep in mind that you can lose AWS Spot Instances. Think about this example where we are bidding 75 cents for a one dollar instance. Suppose the market price is currently 50 cents. Then, we can win this instance. However, if the market price rises above 75 cents while you have the instance, or if another user or enterprise comes into the system and would like to purchase the same instance via on-demand and there is no more capacity left, you could potentially lose your Spot Instance.

It’s important to keep in mind that you will not pay for the instance if it is lost, but losing it can have a major impact on the workloads that are being processed. We have to keep in mind that when working with AWS Spot Instances, there is a potential for a Spot Instance to be lost. This could affect workloads that are running in environments that we are trying to optimize from a cost or budget perspective.
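To make the loss scenario concrete, here is a small simulation sketch. It models only price-driven loss (not loss caused by on-demand capacity running out), the hourly prices are made up, and the billing rule in the docstring is an assumption based on the description above.

```python
from typing import List, Tuple


def simulate_spot_usage(bid: float, hourly_market_prices: List[float]) -> Tuple[int, float]:
    """Walk through hourly market prices until the instance is lost.

    Returns (hours_completed, total_paid). Assumptions for this sketch:
    each completed hour is billed at that hour's market price, and the
    hour in which the price rises above the bid is not billed.
    """
    hours_completed = 0
    total_paid = 0.0
    for price in hourly_market_prices:
        if price > bid:
            break  # instance is reclaimed; the interrupted hour is not billed
        hours_completed += 1
        total_paid += price
    return hours_completed, total_paid


# Hypothetical prices (dollars per hour) for an instance that costs $1.00 on-demand.
prices = [0.50, 0.55, 0.60, 0.80, 0.50]
hours, paid = simulate_spot_usage(bid=0.75, hourly_market_prices=prices)
print(f"Kept the instance for {hours} hours and paid ${paid:.2f}")
```

Even though the price drops back to 50 cents in the last hour, the instance was already lost in the fourth hour, which is why workloads on Spot Instances need a plan for interruption.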

On the back of that thinking, let’s talk about how Qubole fits into this picture that we’re painting here today. Everything that we’ve discussed in regards to the bidding can be done manually. We would expect to leverage automation to help improve the bidding logic, so that we can reduce the maintenance demands on the administration and on people who are doing development in the system.

We are going to be working with many different machines. They are going to be coming online in Clusters. Maybe two, maybe five. Maybe 20, 100, 400 even.

Determining when to bring additional machines online and what type to bring online can potentially be a very complicated task. We think about the collection of machines as an orchestra. Just like an orchestra, they all have to move together in a harmonious way to produce something beautiful. With an orchestra it’s music.

In this case, data. If we think about the collection of computers as an orchestra, then we can think about Qubole as the conductor of that orchestra. Qubole, as the conductor, will be responsible for making sure that all of the machines are communicating together. Doing so effectively and making sure that there is the right amount of processing power present in the system to accommodate what is being submitted by the users.

Of course, all of this is there to help improve the experience for the administrators, and the developers, and the users, so that they focus less on the infrastructure and more on the work they are doing. We can take advantage of the AWS Spot Instances from within Qubole, and by letting Qubole serve as our conductor, we can hedge against the potential loss of Spot Instances from within the Qubole product.

We can take advantage of Spot Blocks inside of Qubole. These are not subject to the same loss that the standard Spot Instances are subject to. We can still get discounts with stability. We also have the option to take advantage of Qubole’s Placement Policy for blocks stored on Spot Instances to increase redundancy of data necessary for workloads currently being processed in the Cluster.

By working with the Qubole product, not only can we employ a policy for bidding and maintaining our infrastructure dynamically, we can also hedge against the potential loss of Spot Instances while we are trying to take advantage of the cost-saving feature that AWS is providing us with.

At this point, I’m going to share my desktop and walk us through some of the settings inside of the Qubole Cluster that are going to affect this behavior when it comes to bidding and orchestration of the Cluster. We’re looking at a Spark Cluster inside of Qubole. I’m currently inside of the product on the Clusters page. I can select the edit button here, and I’ll be taken to my Cluster configuration.

You’ll notice here that my Slave Node type is r3.2xlarge. You’ll notice here that I have a minimum Slave Node count and a maximum Slave Node count. This is where we start to see an opportunity to take advantage of these Spot Instances. I can deploy my Cluster with a base number of instances. Then, as we said, because Qubole is the conductor of our cloud orchestra, we can lean on Qubole to scale this Cluster from two to eight instances as necessary, based on the collective computing needs of the users in the environment.

If we look at the composition here, we’ll see that within this composition pane we have the details about how this Cluster is to be structured. You’ll see that the master and minimum Slave Nodes are currently set to on-demand nodes. That would be the two minimum Slave Nodes that we saw in the previous pane and my master node, which serves as the brain for my Cluster.

Then, you’ll see the auto-scaling Slave Nodes as Spot Nodes. This is where we are taking advantage of the Qubole product to serve as the conductor of our orchestra. When we are scaling from two to eight Nodes, we can do that with Spot Nodes. That is where we are going to see an opportunity to take advantage of the cost savings feature that is made available by AWS.

During this Cluster growth we will add more computing capacity and we will do so at a discount. We are getting, quite literally, more bang for our buck. Notice the maximum bid price. It is set to 100% of the on-demand instance price by default. We know that, because of the way the bid resolves, even if I bid 100%, I am likely going to be spending less than 100% of the on-demand price.

You could bid more than 100%, and there are some cases when the market can be volatile and you can see bid prices for Spot Instances rising above the standard on-demand price. This happens because someone is willing to pay more money to keep the instance than to lose it when the market rises higher than their bid.

There’s a request timeout value, in minutes, for how long we will wait during the bid before we cancel the bid.

There is a percentage of Spot Nodes: how many of the Nodes in this Cluster, during a period of growth, can be spot. There’s an option here, too, to use on-demand Nodes in the event no Spot Nodes are available at the price you requested. There is also the option here to use the Qubole Placement Policy, where you can enforce replication of the data that is currently on the Spot Nodes to the on-demand nodes, so that if a Spot Node does get lost we can retry the task as needed.
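The settings walked through in this part of the demo can be modeled as a small data structure. This is a hypothetical sketch for illustration only: it is not Qubole’s API, the field names are invented, the timeout and spot percentage values are assumed, and it assumes the spot percentage applies to the auto-scaled nodes rather than the whole Cluster.

```python
from dataclasses import dataclass
import math


@dataclass
class ClusterComposition:
    """Hypothetical stand-in for the settings shown in the demo; not Qubole's API."""
    min_nodes: int = 2                 # always-on slave nodes, on-demand
    max_nodes: int = 8                 # upper bound for auto-scaling
    max_bid_percent: float = 100.0     # bid as a percentage of the on-demand price
    request_timeout_minutes: int = 10  # assumed value; how long to wait on a bid
    spot_percent: float = 50.0         # assumed share of auto-scaled nodes that may be spot
    fallback_to_on_demand: bool = True

    def max_bid(self, on_demand_price: float) -> float:
        """Translate the percentage setting into a dollar bid."""
        return on_demand_price * self.max_bid_percent / 100.0

    def node_mix(self, target_nodes: int) -> dict:
        """Split a target cluster size into on-demand and spot slave nodes."""
        # Clamp the requested size to the configured minimum and maximum.
        size = max(self.min_nodes, min(target_nodes, self.max_nodes))
        autoscaled = size - self.min_nodes
        spot = math.floor(autoscaled * self.spot_percent / 100.0)
        return {"on_demand": size - spot, "spot": spot}


config = ClusterComposition()
print(config.max_bid(on_demand_price=1.00))
print(config.node_mix(target_nodes=8))
```

Setting `spot_percent` to 100 in this model would make every auto-scaled node a Spot Node while keeping the two minimum nodes on-demand.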

Notice that under the auto-scaling Slave Nodes I could change from Spot Nodes to Spot Block if I want to take advantage of that feature inside of AWS. Notice I could also change my master and minimum Slave Nodes from on-demand to Spot Nodes or Spot Block. Keeping in mind that Spot Nodes are subject to loss, we would recommend against making your master and core Slave Nodes Spot Nodes, because if your master Node goes down then your Cluster will no longer be usable.

If you are running highly critical workloads, you’ll probably want to tend towards on-demand Nodes. If you are offering users an environment supporting ad hoc querying or long-running engineering workloads, then this is definitely a feature you want to think about taking advantage of.

Host: All right. Thanks, Alex. At this time we’re going to be doing Q&A. If you have a question, feel free to submit it now. Alex, our first question is, when it comes to Spot Instances, how do I know what I should bid?

Alex: That’s an interesting question. When you think about the Spot Instance market, keep in mind that it is a second-price sealed-bid auction, which, for those of you on the phone who have studied your game theory, basically means that even though you’re bidding a certain value, you’re paying a lower value. I would recommend bidding higher than 100% of the on-demand instance price because probability says the majority of the time you’re going to pay lower than your bid.

Those moments when it does rise above the on-demand price are relatively infrequent. If you’re sitting at 120% of your on-demand price, you’re probably in the 90th to 95th percentile of bidders. The likelihood you’ll keep your instance during potential spot loss is pretty high. I think when you keep in mind how the market works and how the instances behave, I would tend toward bidding a little bit over the 100% mark for the on-demand instances, just for the sake of stability.

Again, acknowledging that the likelihood you’re actually going to pay 100% or more is very low. Does that make sense?
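One way to reason about the answer above is to check how much of the price history a given bid would have covered. The sketch below uses made-up hourly prices and a hypothetical helper name; it only illustrates the idea that a bid above 100% of on-demand covers the occasional spike, and it says nothing about real AWS price distributions.

```python
from typing import List


def fraction_of_hours_covered(bid: float, market_prices: List[float]) -> float:
    """Fraction of observed market prices that are at or below the bid."""
    if not market_prices:
        raise ValueError("need at least one price observation")
    covered = sum(1 for price in market_prices if price <= bid)
    return covered / len(market_prices)


on_demand_price = 1.00
# Hypothetical hourly market prices, including two spikes above on-demand.
history = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.90, 1.10, 1.50]

for percent in (75, 100, 120):
    bid = on_demand_price * percent / 100
    share = fraction_of_hours_covered(bid, history)
    print(f"Bid at {percent}% of on-demand covers {share:.0%} of observed hours")
```

Because the price paid is the market price rather than the bid, raising the bid in this model improves coverage without raising the cost of the hours that were already covered.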

Host: Yes. Our next question is, can I run a fully spot instance Cluster?

Alex: Absolutely. That is your prerogative, you may totally do that. I would say for environments that are running super critical workloads that is not recommended because, again, if you lose your Master Node you’ve lost the head of the snake. There’s no more capacity to control the Cluster, and it will become unusable. That being said, you could use Spot Blocks because they are not subject to the same loss.

That would still give you significant discounts, although Spot Blocks are used for a certain amount of time. If the duration that you need to be online exceeds what you’ve actually paid for, then you could run into some complications. For ad hoc environments where there are a lot of people running SQL on, say, Presto, for example, you might very well run a sizable Spot Instance Cluster because you don’t have the foresight as to what users are going to be doing.

Depending on the demands of the day or the week or the month or the individual teams, there might be pretty erratic demand from the user base. Running a high-percentage Spot Cluster will give you the capacity to satisfy those demands and do so at a discount. You don’t have to have a very large, expensive Cluster running for them in the event they need it. You can have a small Cluster running and then scale via Spot for those moments of growth.

Host: Great. Alex, I’ve got one more question for you. Beyond AWS, are there comparable concepts for the other Clouds?

Alex: That’s a good question. In terms of the other Cloud services, AWS has the most sophisticated of these options. The Google Cloud product will also have a similar concept available to it. This is not something that’s universal across all of the different Cloud providers, and the AWS Spot market is arguably the most recognizable of these cost-saving features. That’s not to say the other Clouds don’t also have competitive pricing, but this has been around for quite a while, and it’s probably the most developed feature.

I would say that the other Clouds will potentially develop these types of features or will come in at a low cost point, so that they can be competitive.

Host: All right. That wraps up our questions for today. Thank you so much to our speaker Alex, as well as everyone who joined us live today. If you enjoyed the webinar today we have another webinar coming up, which you can see on our screen here, and that will be on August 3rd, and we’ll be doing a Cluster comparison. We’d love to have you join us live for that one.

We also recently posted a blog on Spot Blocks, and we just published a new book on building a modern data platform. With all that being said, thank you again for joining us today, and I hope you all have a great week.