Qubole Data Platform orchestrates thousands of clusters in the cloud for our customers on a daily basis. From years of experience, we’ve learned from cluster administrators that even with Qubole’s accurate workload-aware autoscaling, they always have a ceiling for the cluster operation budget (so they have to set the maximum number of nodes for the cluster). When that ceiling is reached, a simple miswritten SQL statement can potentially hog the entire cluster’s resources and starve other workloads.
Typical issues include:
- Scanning a large amount of data from the entire table
- Having a massive CROSS join between two large tables without conditions
- Sorting millions of rows without limits or reduced scope
All of these issues not only result in a poor user experience but also significantly inflate a company’s cloud costs, which they only find out in hindsight.
Taking Inspiration from an Open Source Solution
The same problem is handled by the Hive Community by setting the following parameter:
When it comes to Presto, Qubole refers to Hive’s design along with the typical query patterns we’ve seen among our Presto customer base. In R55, Qubole added a feature called Presto Strict Mode, which, once enabled, restricts users from executing certain queries.
Currently, this implementation supports three types of restrictions:
- MANDATORY_PARTITION_CONSTRAINT : Queries on a partitioned table should have a predicate on at least one of the partition columns.
select * from <TABLE_NAME> will fail throwing an error of type Strict Mode Error with the message:
Table scan on partitioned table: <TABLE_NAME > without filter or constraint
whereas select * from <TABLE_NAME> where <predicate> executes.
- DISALLOW_CROSS_JOIN : Queries with cross join will fail.
Query: select * from <TABLE_1> cross join <TABLE_2> will fail throwing an error of type Strict Mode Error with the message:
Cross joins are not allowed when strict mode is enabled
- LIMITED_SORT : Queries are allowed to sort only a limited number of output rows. Example:
Query: select * from <TABLE_1> order by <COL_1>
will fail throwing an error of type Strict Mode Error with the message:
Sorting without limit clause is not allowed when strict mode is enabled
These constraints are applied on the final plan after query optimizations are done.
This feature is implemented as a visitor on top of Presto’s Logical Plan. Until very recently in Presto, connectors were not allowed to participate in query planning, so we had to find a workaround to access the Hive table–related information in the planning module.
Over time, we plan to extend this list of restrictions by adding more such constraints based on customer feedback.
Configuring Presto Strict Mode
To enable Presto Strict Mode at the cluster level, we can set qubole-strict-mode-restrictions in etc/config.properties to semicolon separated list of restrictions.
This restriction will fail queries that are without partition constraints or those doing unlimited
We will gradually roll this feature out to all Qubole customers. If you are a Qubole customer and need this earlier, you can enable it at the cluster level or reach out to Qubole support to enable this feature for your account.
Developing a Long-Term & Forward-Looking Enterprise Solution
Improving query performance and user experience while keeping customers’ TCO low has always been the core value proposition of Qubole. Adding Presto Strict Mode on Qubole is the first step toward that goal and gives cluster administrators protection against unexpected costs or simple human mistakes.
Moving forward, Qubole is also adding Financial Governance across all services and products on the platform to provide a holistic tool for administrators to manage their costs and budget allocation at the account and user level.