Qubole was the first big data platform to offer a true auto-scaling Hadoop-as-a-Service solution. Now, Qubole is pleased to announce the industry’s first auto-scaling Presto-as-a-Service solution.
Why Auto-Scaling Presto-as-a-Service
Exploratory analytics is one workload that can be quite bursty. A single business question can easily require multiple short queries. For example, say a data consumer wants a list of male customers who spent over $100 in Q2 of this year. He writes a query and runs it. Finding the result too broad, he refines the query by adding criteria such as geographic location or a price range (e.g., $100 – $150).
The result is a burst of queries, and each burst translates into a surge in demand for computational resources. However, each time the data consumer stops to examine the results for insights, that demand quickly subsides and the cluster falls into a temporary lull.
To meet the peaks of bursty workloads, you’d normally have to over-provision your infrastructure, which would leave it grossly underutilized most of the time.
The elastic nature of the cloud provides an easy solution to this problem: computational resources can be provisioned and de-provisioned as demand rises and falls. By letting the cluster expand and contract quickly and automatically in proportion to the workload, auto-scaling keeps utilization high at all times. And because cloud providers bill by the hour or by the minute, auto-scaling lets customers pay only for the capacity they actually use, which can translate into significant cost savings.
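To make the cost argument concrete, here is a back-of-the-envelope sketch in Python. The hourly demand figures and the per-node-hour price are hypothetical numbers chosen for illustration, not Qubole measurements, and the auto-scaled case is idealized: it assumes the cluster matches demand exactly every hour and ignores node startup time.

```python
# Hypothetical hourly demand (worker nodes actually needed) for a bursty
# exploratory workload. Illustrative numbers, not measurements.
DEMAND = [2, 2, 10, 3, 12, 2, 2, 9, 2, 2]
PRICE_PER_NODE_HOUR = 0.50  # hypothetical price in dollars


def fixed_cost(demand, price):
    """Cost of a static cluster sized for the peak hour and kept up the whole time."""
    return max(demand) * len(demand) * price


def autoscaled_cost(demand, price):
    """Idealized cost when the cluster matches demand every hour."""
    return sum(demand) * price


def utilization(demand, provisioned):
    """Fraction of provisioned node-hours that did useful work."""
    return sum(demand) / sum(provisioned)


if __name__ == "__main__":
    peak_cluster = [max(DEMAND)] * len(DEMAND)
    print(f"fixed:       ${fixed_cost(DEMAND, PRICE_PER_NODE_HOUR):.2f}")  # $60.00
    print(f"auto-scaled: ${autoscaled_cost(DEMAND, PRICE_PER_NODE_HOUR):.2f}")  # $23.00
    print(f"fixed-cluster utilization: {utilization(DEMAND, peak_cluster):.0%}")  # 38%
```

With these made-up numbers, the peak-sized cluster does useful work only about 38% of the time, and tracking demand cuts the bill from $60 to $23. Actual savings depend on how bursty the workload is and how quickly nodes can be added and removed.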
Qubole’s auto-scaling service polls Presto to determine which queries are still running and obtains a progress report based on Presto’s own internal query-execution statistics. The service collects a sequence of these reports and derives the optimal number of nodes from them.
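The post does not spell out how the progress reports become a node count, so the following Python sketch shows only one plausible heuristic, not Qubole’s actual algorithm. The `ProgressReport` fields (a monotonic count of completed splits, the splits still remaining, and the active node count) and the `target_seconds` goal are hypothetical names chosen for illustration. The idea: measure per-node throughput across the window of reports, then size the cluster so the remaining work would finish within the target time, clamped to configured bounds.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ProgressReport:
    """One poll of Presto's running queries (hypothetical fields)."""
    timestamp_s: float     # when the poll was taken, in seconds
    completed_splits: int  # monotonic count of splits finished cluster-wide
    remaining_splits: int  # queued + running splits still to be processed
    active_nodes: int      # worker nodes in the cluster at poll time


def clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


def optimal_node_count(reports: Sequence[ProgressReport], target_seconds: float,
                       min_nodes: int, max_nodes: int) -> int:
    """Estimate how many nodes would finish the remaining work in target_seconds."""
    if not reports:
        return min_nodes
    last = reports[-1]
    if last.remaining_splits == 0:
        # Nothing left to run: shrink to the floor.
        return min_nodes
    first = reports[0]
    elapsed = last.timestamp_s - first.timestamp_s
    done = last.completed_splits - first.completed_splits
    avg_nodes = sum(r.active_nodes for r in reports) / len(reports)
    if elapsed <= 0 or done <= 0 or avg_nodes <= 0:
        # Not enough history to measure throughput; keep the current size.
        return clamp(last.active_nodes, min_nodes, max_nodes)
    per_node_rate = done / (elapsed * avg_nodes)  # splits per node-second
    needed = math.ceil(last.remaining_splits / (per_node_rate * target_seconds))
    return clamp(needed, min_nodes, max_nodes)


if __name__ == "__main__":
    window = [
        ProgressReport(timestamp_s=0, completed_splits=0, remaining_splits=800, active_nodes=4),
        ProgressReport(timestamp_s=60, completed_splits=480, remaining_splits=600, active_nodes=4),
    ]
    print(optimal_node_count(window, target_seconds=30, min_nodes=2, max_nodes=20))  # 10
```

Basing the estimate on a window of reports rather than a single poll smooths out momentary spikes, which matters for the bursty workloads described above.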
If that number is higher than the current cluster size, the additional nodes are automatically requested from the cloud provider and added to the cluster. If it is lower, surplus nodes are automatically quiesced (they stop accepting new work and finish what they are already running), and the cloud provider is then asked to release them.
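The scale-up and scale-down step can be sketched as a simple reconciliation function. This is a hypothetical Python sketch: `FakeCloudProvider`, `request_nodes`, and `release_nodes` are stand-ins invented for illustration rather than any real cloud SDK, and quiescing is modeled as a flag instead of actually draining running tasks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    node_id: str
    quiesced: bool = False  # True once the node stops accepting new work


class FakeCloudProvider:
    """In-memory stand-in for a cloud API (hypothetical; not a real SDK)."""

    def __init__(self) -> None:
        self._next_id = 0
        self.released: List[str] = []

    def request_nodes(self, count: int) -> List[Node]:
        nodes = []
        for _ in range(count):
            self._next_id += 1
            nodes.append(Node(f"node-{self._next_id}"))
        return nodes

    def release_nodes(self, nodes: List[Node]) -> None:
        self.released.extend(n.node_id for n in nodes)


@dataclass
class Cluster:
    nodes: List[Node] = field(default_factory=list)


def reconcile(cluster: Cluster, target: int, cloud: FakeCloudProvider) -> None:
    """Grow or shrink the cluster toward `target` nodes."""
    current = len(cluster.nodes)
    if target > current:
        # Scale up: request the shortfall from the provider and add it to the cluster.
        cluster.nodes.extend(cloud.request_nodes(target - current))
    elif target < current:
        # Scale down: quiesce the surplus nodes first, then hand them back.
        surplus = cluster.nodes[target:]
        for node in surplus:
            node.quiesced = True  # a real system would wait here for in-flight tasks to drain
        cluster.nodes = cluster.nodes[:target]
        cloud.release_nodes(surplus)


if __name__ == "__main__":
    cloud, cluster = FakeCloudProvider(), Cluster()
    reconcile(cluster, 3, cloud)
    reconcile(cluster, 1, cloud)
    print([n.node_id for n in cluster.nodes], cloud.released)
```

The ordering is the point of the sketch: a node is handed back to the provider only after it has been taken out of service, so running queries are not cut off mid-flight.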