Presto on Qubole

Presto is a high-performance, distributed SQL query engine for big data. It was originally designed and developed at Facebook so that its data analysts could run interactive queries on the company’s large data warehouse in Apache Hadoop.

Presto’s architecture allows users to query a variety of data sources such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Kafka, and MongoDB. One can even query data from multiple data sources within a single query.
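
To make the multi-source idea concrete, here is a minimal sketch in Python that assembles one SQL statement joining tables from two different sources. It relies on Presto's `catalog.schema.table` naming. The catalog names (`hive`, `mysql`), schemas, tables, and columns are hypothetical placeholders, and the helper functions are illustrative rather than part of any Presto or Qubole API.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a Presto-style fully qualified table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql() -> str:
    """Build one query that joins tables living in two different catalogs."""
    # All catalog, schema, table, and column names below are hypothetical.
    orders = qualified("hive", "sales", "orders")       # e.g. files in Hadoop or S3
    customers = qualified("mysql", "crm", "customers")  # e.g. a MySQL database
    return (
        "SELECT c.region, SUM(o.amount) AS total_amount\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.region"
    )


if __name__ == "__main__":
    print(federated_join_sql())
```

The resulting text could be submitted through whichever Presto client is in use. The point is only that each table is addressed through its own catalog, so a single statement can span several data sources.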

Presto on Qubole: Built for the Cloud

Qubole has offered a managed Presto service since 2014. We offer our customers multiple Presto versions and maintain a regular upgrade process. Our managed Presto offering is tailored to our customers’ needs: it blends the latest features from the open source community with Qubole’s proprietary solutions that boost performance, lower cost, improve the user experience, and smooth the administration of Presto clusters.

Key Benefits of Presto on Qubole

Performance Boost

  • Dynamic Filtering
  • Fast Caching with RubiX
  • Smart Query Retry

Lower Cloud Operation Cost

  • Intelligent Flexible Node Management
  • Workload-aware Autoscaling
  • Heterogeneous Cluster Support
  • Safeguards against runaway queries

Ease of Use

  • Simplified Cluster Configuration
  • Zero-downtime upgrades
  • Comprehensive Administration Experience

Enterprise-Ready

  • Enterprise-grade security
  • Apache Ranger Support
  • JDBC/ODBC connectors
  • Integration with third-party tools

Presto on Qubole vs. Open Source Presto

Cost Efficiency and Scalability

Qubole | Open Source
Graceful Low-cost Compute Shutdown *
Spot (AWS) Rebalancing
Spot Block (AWS) Support
Workload-Aware Autoscaling
User-Based Autoscaling
Aggressive Downscaling with graceful decommissioning
Heterogeneous Clusters
Per-second billing
Smart Query Retry
Cost Explorer & Analysis
Strict Mode (prevents runaway queries)

* AWS Spot, Azure Low-cost VMs, Google Preemptible VMs

Performance

Qubole | Open Source
Compute Optimization for joins and filters
Required Worker Node
S3 Direct writes optimization
S3 listing optimization
RubiX (distributed caching)

Workspaces

Qubole | Open Source
Versioning
Scheduling
Dashboarding (Presto Notebook)
Collaboration and sharing

Debugging and Profiling

Qubole | Open Source
Monitoring (Ganglia, DataDog, etc.)
Intelligent Log Access

Security

Qubole | Open Source
Access control for notebooks, clusters, jobs, structured data
Audit end-user activity logs
Apache Ranger Integration
SSO with SAML 2.0 support
Data encryption
HIPAA, SOC 2 Type 2, and ISO 27001 compliant environments

Integrations

Qubole | Open Source
Custom Connector with BI tools (Tableau, Looker, etc.)
REST API
AWS Glue Support
Data Source Connectors (Redshift, Postgres, Kinesis*, etc.)

* The Kinesis connector is being contributed back to open source

Service & Support

Qubole | Open Source
24/7 support from our Presto experts
Support for multiple versions of Presto

Resources

  • Webinar: The Power of Presto for Analytics and Business Intelligence (BI)
  • Webinar: Delivering Self-Service Analytics and Discovery from your Data Lake