Avoid Unexpected Rogue Queries with Presto Strict Mode on Qubole

Start Free Trial
May 23, 2019 by Updated March 1st, 2021

Qubole Data Platform orchestrates thousands of clusters in the cloud for our customers on a daily basis. From years of experience, we’ve learned from cluster administrators that even with Qubole’s accurate workload-aware aut6oscaling, they always have a ceiling for the cluster operation budget (so they have to set the maximum number of nodes for the cluster). When that ceiling is reached, a simple miswritten SQL statement can potentially hog the entire cluster’s resources and starve other workloads.

Typical issues include:

  • Scanning a large amount of data from the entire table
  • Having a massive CROSS join between two large tables without conditions
  • Sorting millions of rows without limits or reduced scope

All of these issues not only result in a poor user experience but also significantly inflate a company’s cloud costs, which they only find out in hindsight.

Taking Inspiration from an Open Source Solution

The same problem is handled by the Hive Community by setting the following parameter:

When it comes to Presto, Qubole refers to Hive’s design along with the typical query patterns we’ve seen among our Presto customer base. In R55, Qubole added a feature called Presto Strict Mode, which, once enabled, restricts users from executing certain queries.

Currently, this implementation supports three types of restrictions:

  1. MANDATORY_PARTITION_CONSTRAINT : Queries on a partitioned table should have a predicate on at least one of the partition columns.
    Example:select * from <TABLE_NAME> will fail throwing an error of type Strict Mode Error with the message:
    Table scan on partitioned table: <TABLE_NAME > without filter or constraint
    whereas select * from <TABLE_NAME> where <predicate> executes.
  2. DISALLOW_CROSS_JOIN : Queries with cross join will fail.
    Query: select * from <TABLE_1> cross join <TABLE_2> will fail throwing an error of type Strict Mode Error with the message:
    Cross joins are not allowed when strict mode is enabled
  3. LIMITED_SORT : Queries are allowed to sort only a limited number of output rows. Example:
    Query: select * from <TABLE_1> order by <COL_1>
    will fail throwing an error of type Strict Mode Error with the message:
    Sorting without limit clause is not allowed when strict mode is enabled

These constraints are applied on the final plan after query optimizations are done.

This feature is implemented as a visitor on top of Presto’s Logical Plan. Until very recently in Presto, connectors were not allowed to participate in query planning, so we had to find a workaround to access the Hive table–related information in the planning module.

Over time, we plan to extend this list of restrictions by adding more such constraints based on customer feedback.

Configuring Presto Strict Mode

To enable Presto Strict Mode at the cluster level, we can set qubole-strict-mode-restrictions in etc/config.properties to semicolon separated list of restrictions.

This restriction will fail queries that are without partition constraints or those doing unlimited sort.

Values supported:

  • NONE

We will gradually roll this feature out to all Qubole customers. If you are a Qubole customer and need this earlier, you can enable it at the cluster level or reach out to Qubole support to enable this feature for your account.

Developing a Long-Term & Forward-Looking Enterprise Solution

Improving query performance and user experience while keeping customers’ TCO low has always been the core value proposition of Qubole. Adding Presto Strict Mode on Qubole is the first step toward that goal and gives cluster administrators protection against unexpected costs or simple human mistakes.

Moving forward, Qubole is also adding Financial Governance across all services and products on the platform to provide a holistic tool for administrators to manage their costs and budget allocation at the account and user level.

Start Free Trial
  • Blog Subscription

    Get the latest updates on all things big data.
  • Recent Posts

  • Categories

  • Events

    QUBOLE LIVE DEMO: Google Cloud Platform (GCP) Enables You To Simplify Today and Future Proof for Tomorrow

    Jan. 27, 2022 | Global

    Data Lake and Data Warehouse – A modern data strategy discussion

    Feb. 2, 2022 | Online

    QUBOLE LIVE DEMO: Stop The Cloud Cost Madness With Graviton and AWS. Switch And Save to Reduce Your Data Lake Costs Today

    Feb. 3, 2022 | Global

    CONTINUOUS INTELLIGENCE DAY – Continuous Intelligence in Finance 2022 and beyond

    Feb. 24, 2022 | Global

    Data Innovation Summit MEA 2022

    Mar. 7, 2022 | Global

    Data2030 Summit 2022 – APAC Edition – Data Strategies For Data And AI-Driven Organisations

    May. 24, 2022 | Global
  • Read Airflow on Anaconda: A Match Made in Heaven, Perfected by Qubole