Presto as a Service

QDS supports Presto as a Service running on the AWS Cloud. Presto is an ANSI SQL-based real-time query engine developed by Facebook. On QDS, analysts using Presto can query data stored in HDFS or in S3. Presto is best suited to workloads that need a faster query engine to deliver interactive speeds for data exploration, or that need a wide variety of connectors to query multiple data sources. This level of performance can be achieved at any scale without the extensive costs of a data warehouse implementation.

Process High Velocity Data

Interactive, near real-time performance for SQL queries over petabyte-scale data.

Cloud Optimized with Autoscaling

Read and write optimizations for cloud storage improve query performance and the user experience while reducing processing costs. Plus, with advanced Autoscaling, you’ll pay only for the resources you actually use.

Complete Cluster Life Cycle Management

As with all big data engines supported by QDS, Presto clusters scale up and down automatically during query execution. You don’t have to worry about starting or stopping Presto clusters; QDS does all the heavy lifting for you.

Instance Selection Options

QDS supports a wide variety of Amazon EC2 instance types for your Presto cluster, giving you the freedom to optimize instance selection for your workload requirements and AWS pricing options.

Spot Instance Pricing

QDS lets you automatically incorporate Amazon EC2 Spot Instances, which can cost up to 90% less than On-Demand Instances.
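To make the arithmetic behind that discount concrete, here is a minimal sketch of a blended hourly cost for a cluster that mixes on-demand and spot nodes. The function name, the node count, the $1.00 hourly price, and the 50% spot share are illustrative assumptions, not QDS settings or AWS prices; only the "up to 90%" discount comes from the text above.

```python
def hourly_cluster_cost(nodes: int,
                        on_demand_price: float,
                        spot_fraction: float,
                        spot_discount: float) -> float:
    """Blended hourly cost of a cluster mixing on-demand and spot nodes.

    All inputs are hypothetical illustration values:
      nodes            total nodes in the cluster
      on_demand_price  hourly on-demand price per node
      spot_fraction    share of nodes run as spot instances (0.0 to 1.0)
      spot_discount    spot discount relative to on-demand (0.0 to 1.0)
    """
    if not 0.0 <= spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be between 0 and 1")
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")

    spot_nodes = nodes * spot_fraction
    on_demand_nodes = nodes - spot_nodes
    spot_price = on_demand_price * (1.0 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


if __name__ == "__main__":
    # Assumed example: 10 nodes at $1.00/hour, half of them on spot
    # at the best-case 90% discount quoted above.
    all_on_demand = hourly_cluster_cost(10, 1.00, 0.0, 0.9)
    half_spot = hourly_cluster_cost(10, 1.00, 0.5, 0.9)
    print(f"all on-demand: ${all_on_demand:.2f}/hour")
    print(f"half spot:     ${half_spot:.2f}/hour")
```

Because 90% is an upper bound, real savings depend on the actual spot price at the time the cluster runs and on how many nodes can safely run on spot capacity.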

Elastic Pricing Model

With QDS’ pay-per-use pricing model, you pay only for the compute hours you actually use.

Amazon Virtual Private Cloud (VPC) Support

QDS supports Amazon VPC, a service that extends your private network into the cloud. Amazon VPC provides fine-grained access control both to and from Amazon EC2 instances in your virtual network. Plus, you can launch dedicated instances within a VPC on single-tenant hardware.

Quick and Easy Debugging

QDS makes it easy to debug both active and historical Presto queries. Results and logs remain available even when no cluster is running.

User-Defined Functions (UDFs)

Analysts can create custom user-defined functions alongside the standard open source functions, which makes it easier to migrate existing Presto work to QDS.

In-Cluster Caching

QDS supports caching within the cluster to improve performance for queries that frequently make use of the same data set.

Visualize in the Moment

Presto-backed visualization tools on QDS enable up-to-the-minute pivot summaries and dashboards for petabytes of data. BI and visualization tools connect to QDS-managed Presto clusters through the ODBC driver.

When Should I Use QDS for Presto?

QDS for Presto works best when users are SQL-proficient and need to query data quickly, but do not want to invest in and migrate to a data warehouse solution.

Ad Hoc Queries

Use Presto to query your petabyte-scale data and get results quickly. With Qubole, data is persistent but compute clusters are elastic, so you pay for compute only when you actually run queries. QDS for Presto delivers the speed and scale of an always-on solution such as Redshift without the cost or overhead of managing always-on clusters.

Multiple Data Sources

Presto users can write ANSI SQL queries that unify data across sources, including object stores such as S3, relational databases such as MySQL, and real-time streams such as Amazon Kinesis. With QDS providing a central metastore for defining the structure of your data, you can join multiple data sources together to get a complete analytical view of your organization.
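As a rough illustration of such a cross-source join, the sketch below composes a single ANSI SQL statement using Presto's catalog.schema.table naming. The catalog names (hive, mysql), the schemas, tables, and columns are all hypothetical placeholders; the real names depend on how the connectors and metastore are configured in your account.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a Presto-style fully qualified name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query() -> str:
    """Compose one SQL query that joins an S3-backed table to a MySQL table.

    Every catalog, schema, table, and column name here is an assumption
    made up for illustration.
    """
    page_views = qualified("hive", "web", "page_views")   # assumed: files in S3
    customers = qualified("mysql", "crm", "customers")    # assumed: MySQL table
    return (
        "SELECT c.region, COUNT(*) AS views\n"
        f"FROM {page_views} AS v\n"
        f"JOIN {customers} AS c ON v.customer_id = c.id\n"
        "GROUP BY c.region\n"
        "ORDER BY views DESC"
    )


if __name__ == "__main__":
    # Print the statement; submitting it to a cluster is out of scope here.
    print(build_federated_query())
```

The point of the sketch is that both sources appear in one statement, distinguished only by their catalog prefix, so no data has to be copied into a common store before the join.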

We continue to see the fast pace that Presto achieves, and as our data quickly scales over time, I would not be surprised to see Presto’s query time inversely match that rate of expansion.

Elian Smith

Senior Analyst for MediaMath

How Does Presto Fit into the QDS Landscape?

QDS gives you the freedom to work with Presto alongside Spark, Hadoop MapReduce, and Hive through one unified interface with unified metadata. Choose the right solution for the right workload rather than being locked into any single technology. Presto is designed for interactive, ad hoc querying over large data sets. For batch or ETL workloads where reliability is paramount, Qubole offers Hive and MapReduce as options that maximize both performance and scale. For machine learning and iterative algorithm design, Qubole offers Spark with a Notebook interface.
