Data Sheets

The Open Data Lake Platform Brief

Qubole Data Sheets


The exponential growth of data, combined with analytics and machine learning (ML) applications, calls for an open data lake architecture: one that ensures openness in data storage, data management, data processing, operations, data access, governance, and security while supporting a diverse range of analytics. An open data lake provides a robust and future-proof data management paradigm for a wide range of data processing needs, including data exploration, ad-hoc analytics, streaming analytics, and machine learning. Qubole is an open data lake platform that is simple and secure for enterprises working with their trusted and raw datasets in data lakes.

THE OPEN DATA LAKE PLATFORM

Qubole stores data in multiple formats that are accessible through open standards-based connectors and APIs. It is agnostic to cloud platforms (AWS, GCP, Azure), open-source frameworks (Presto, Apache Spark, Apache Hive, Apache Airflow), and data file formats such as ORC or Parquet.

Qubole provides robust data access controls and security features through non-proprietary technologies and APIs. The platform facilitates granular security at the table, row, and column level, which supports regulatory compliance (GDPR and CCPA). Security and policy administrators can grant permissions against user roles already defined in enterprise directories such as Active Directory or Google Cloud Identity Management.

Qubole enables near-zero administration, a unified environment for data pipeline creation, and robust orchestration for continuous data engineering. It allows data scientists, data engineers, and data analysts to collaborate through a common workbench, shareable notebooks, and dashboards.

[Figure: Open Data Lake Platform. Labels: Secure, Open, Simple. Data sources: Business Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured. Workloads: AI and ML, Data Discovery and Analytics, Continuous Data Engineering, Streaming Analytics, on Cloud Object Storage. Goals: Learn from the Past, Understand the Present, Predict the Future.]
