Big Data Analytics: Microsoft Azure Data Lake Store and Qubole

September 27, 2017

Co-authored by Ajay Bhave and Rohan Garg, Members of Technical Team, Qubole.

We are excited to announce the integration of Microsoft’s Azure Data Lake Store (ADLS) with Qubole Data Service (QDS). This is a major milestone in the journey we started when we launched QDS for Azure Blob Storage in 2017. With this integration, you can now run rich queries against your data in ADLS and derive deeper insights from it.

ADLS is an enterprise-grade, hyper-scale repository for big data workloads. It lets you capture and process data of any size, type, and ingestion speed in a single place. ADLS supports any application that uses the open source Apache Hadoop Distributed File System (HDFS) standard. Because of this HDFS support, you can migrate your existing Hadoop and Spark datasets to the cloud without recreating your HDFS directory structure.
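
To make the HDFS compatibility concrete: Hadoop-compatible tools address ADLS paths with the `adl://` URI scheme, much as they address `hdfs://` paths. The minimal Python sketch below maps an existing HDFS path onto an ADLS URI while keeping the directory structure unchanged. The account name `mydatalake`, the namenode address, and the path are hypothetical placeholders, and the helper is an illustration, not part of QDS or any Azure SDK.

```python
from urllib.parse import urlparse

# Host suffix used by Azure Data Lake Store accounts.
ADLS_HOST_SUFFIX = "azuredatalakestore.net"


def hdfs_to_adls_uri(hdfs_uri: str, adls_account: str) -> str:
    """Map an hdfs:// URI to an adl:// URI, preserving the directory path.

    `adls_account` is a hypothetical ADLS account name supplied by the caller.
    """
    parsed = urlparse(hdfs_uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got: {hdfs_uri}")
    # Only the scheme and host change; the path is carried over as-is.
    return f"adl://{adls_account}.{ADLS_HOST_SUFFIX}{parsed.path}"


print(hdfs_to_adls_uri("hdfs://namenode:8020/warehouse/sales/2017", "mydatalake"))
```

Only the scheme and host portion change, so jobs that refer to paths relative to the filesystem root should need little more than a new base URI.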

ADLS vs. Azure Blob Storage

ADLS is storage optimized for big data workloads of all kinds: batch, interactive, and streaming. Azure Blob Storage, on the other hand, is a general-purpose object store that works well for a variety of use cases but is not specifically tuned for the read/write access patterns of big data workloads. ADLS places no limit on the amount of data you can store, and it is optimized for high throughput and high input/output operations per second (IOPS). ADLS also requires HTTPS for all data transfer to and from the store, so data is always encrypted in transit.

For more details, visit https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-comparison-with-blob-storage.

Highlights of ADLS and QDS Integration

  • Configure QDS accounts with ADLS credentials for seamless and transparent access to ADLS on all clusters (Hadoop, Spark, etc.) in your account.
  • Run Apache Hive, Hadoop, and Spark queries through the QDS platform, which can now access data in your ADLS account.
  • Migrate data from on-premises storage to ADLS using native tools built into QDS, which support a diverse set of storage solutions such as Azure SQL Service, Azure SQL Data Warehouse, Microsoft SQL Server, MySQL, and more.
  • Migrate data from cloud object storage (Azure Blob Storage) to ADLS using a distributed Hadoop (MapReduce) job.
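
As a rough illustration of the last point, a distributed copy between the two stores can be expressed with Hadoop's DistCp tool, which runs as a MapReduce job. This post does not say which tool QDS uses internally, so treat the sketch below as an example of the general technique rather than the QDS mechanism. The container, account names, and paths are hypothetical, and the cluster is assumed to already have credentials for both stores configured.

```python
import shlex


def build_distcp_command(container, blob_account, adls_account, src_path, dst_path):
    """Build the argument list for a `hadoop distcp` copy from Blob Storage to ADLS.

    All names passed in are hypothetical placeholders chosen by the caller.
    """
    # wasb:// is the Hadoop URI scheme for Azure Blob Storage.
    src = (
        f"wasb://{container}@{blob_account}.blob.core.windows.net/"
        f"{src_path.lstrip('/')}"
    )
    # adl:// is the Hadoop URI scheme for Azure Data Lake Store.
    dst = f"adl://{adls_account}.azuredatalakestore.net/{dst_path.lstrip('/')}"
    return ["hadoop", "distcp", src, dst]


cmd = build_distcp_command("logs", "myblobaccount", "mydatalake", "/raw/2017", "/raw/2017")
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning an argument list rather than a single string keeps the command safe to pass to a process launcher without shell quoting issues.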

Getting Started

ADLS

Sign up for the Azure portal and create an ADLS account.
For detailed steps, visit https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-get-started-portal#create-an-azure-data-lake-store-account

Free QDS Business Edition on Azure

Sign up for the free* QDS Business Edition on Azure by visiting https://azure.qubole.com.
For detailed steps, visit http://docs.qubole.com/en/latest/quick-start-guide/Azure-quick-start-guide/azure.html

Using ADLS with QDS

For detailed steps, visit http://docs.qubole.com/en/latest/quick-start-guide/Azure-quick-start-guide/azure.html#azure-getting-started-account.

 
*Qubole offers Qubole Data Service (QDS) Business Edition at no cost, but usage is limited to a monthly allowance of Qubole compute hours (AVMU for Azure) worth approximately $1,000 per month. You must provide your own Azure cloud account, and you are responsible for the cost of the infrastructure that Qubole manages on your behalf.
