Understanding Data Security in QDS

Qubole secures data by caching command results within a Qubole account, through access control, and through Amazon S3 access. It also supports Hive authorization, which provides table-level security in Hive and in Presto (for Hive tables only). In the near future, Qubole plans to support Hive authorization for Hive table data in Spark.

See also Understanding Data Encryption in QDS.

Securing Data through Amazon S3 Access

Qubole accounts on Qubole-on-AWS support two types of authentication to access AWS resources.

Qubole supports adding S3 locations for the files used in commands, whether they are submitted from the command editor UI or through API calls, before the commands are run. Hive command results can be exported to an Amazon S3 location. From the QDS UI, you can use the Explore page to access Amazon S3 buckets and to export data to an S3 location, and you can analyze data in the command editor on the Analyze page. For more information, see Data Exploration.

Protecting IAM Credentials

Qubole encrypts IAM access credentials and stores them on QDS servers.

Securing AWS S3 Data

Qubole secures the AWS S3 data by providing each user with a unique S3 access token/role only if the user has raw S3 access.

Securing Hive-table Data

Hive authorization is one way to control which data users can access and which privileges they hold. Qubole provides SQL Standard-based authorization with some additional controls and some differences from open-source Hive. See SQL Standard Based Hive Authorization for more information.

Qubole’s Hive authorization gives Qubole Hive users granular control over access to Hive tables and columns. It also gives granular control over the types of privileges a Hive user can have on a Hive table.
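As an illustration of table-level privileges, the following minimal Python sketch formats a GRANT statement in the style of open-source Hive's SQL Standard-based authorization. The helper name build_grant, the table default.sales, and the user analyst1 are hypothetical. Qubole's variant differs from open source in some respects, so treat the exact syntax as an assumption to be checked against the Qubole documentation.

```python
# Table privileges defined by open-source Hive SQL Standard-based authorization.
ALLOWED_PRIVILEGES = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def build_grant(privileges, table, user):
    """Return a GRANT statement giving `user` the listed privileges on `table`.

    Hypothetical helper: it only formats the statement and does not run it,
    and it does not escape or validate the table and user names.
    """
    requested = [p.upper() for p in privileges]
    unknown = [p for p in requested if p not in ALLOWED_PRIVILEGES]
    if not requested or unknown:
        raise ValueError(f"unsupported or empty privilege list: {privileges!r}")
    return f"GRANT {', '.join(requested)} ON TABLE {table} TO USER {user}"


# Grant read-only access on one table to one user (placeholder names).
print(build_grant(["select"], "default.sales", "analyst1"))
```

Granting only SELECT keeps the user read-only on that table; broader access would have to be granted explicitly, one privilege at a time.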


Hive authorization is also supported for Hive table data in Presto.

Securing Data Stores

Qubole provides these two ways to secure data stores:

  • Allow Qubole’s tunnel servers to reach the data stores, which requires you to open only a specific port. After that, the data store port is accessible through Qubole’s tunnel servers. The IP addresses of the Qubole tunnel servers are listed in Tunneling with Bastion Nodes for Private Subnets in an AWS VPC.
  • Bring up data stores in a private subnet in a VPC. The VPC must be accessible through a Bastion node in the public subnet of the same VPC. In this case, data stores can be accessed only through the Bastion node. To give Qubole access to the data stores, allow inbound connections from Qubole’s tunnel servers to the Bastion node on port 22. Qubole uses an SSH tunnel to access data stores through the Bastion node.
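
The second option can be pictured as ordinary SSH local port forwarding through the Bastion node. The Python sketch below only assembles two artifacts: an inbound rule for port 22 in the shape the AWS EC2 API uses for IpPermissions, and an OpenSSH command line that forwards a local port to the data store. The helper names, host names, user name, ports other than 22, and the CIDR value are all hypothetical placeholders, and the sketch does not describe how Qubole's tunnel servers are actually implemented.

```python
def bastion_ingress_rule(tunnel_cidr):
    """Inbound rule allowing SSH (TCP port 22) from one tunnel-server CIDR.

    The dict follows the IpPermissions shape of the AWS EC2 API. The CIDR
    should come from the documented list of tunnel-server addresses.
    """
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": tunnel_cidr, "Description": "tunnel server SSH"}],
    }


def ssh_tunnel_command(bastion_host, bastion_user, store_host, store_port, local_port):
    """OpenSSH argv forwarding local_port to store_host:store_port via the Bastion."""
    return [
        "ssh",
        "-N",  # run no remote command, forward ports only
        "-L", f"{local_port}:{store_host}:{store_port}",
        "-p", "22",
        f"{bastion_user}@{bastion_host}",
    ]


# Placeholder values: a documentation-range IP and example host names.
rule = bastion_ingress_rule("203.0.113.10/32")
cmd = ssh_tunnel_command("bastion.example.com", "ec2-user", "10.0.1.25", 3306, 13306)
print(rule)
print(" ".join(cmd))
```

With such a tunnel in place, a client that connects to the local port reaches the data store's port through the Bastion node, while the data store itself stays in the private subnet.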