Qubole, the big data-as-a-service company, today announced it has open-sourced StreamX, an ingestion service to help data teams efficiently and reliably capture large-scale, real-time data. Qubole will be adding support for StreamX as a managed service on the Qubole Data Service (QDS) platform to simplify and automate the ingestion of data for big data analysis in the cloud.
Enterprises are grappling with increasing volumes of data and the need for real-time analysis from multiple data sources to drive business growth. To address this issue, Qubole has created StreamX, an open-source service that ingests data logs from Kafka and persists them to cloud object stores such as Amazon S3. Without an ingestion service such as StreamX, maintaining reliability and data integrity on Kafka is challenging, particularly when it comes to guaranteeing delivery without duplicates, which could be harmful to critical systems. StreamX is built on the Kafka Connect framework and is designed for reliable, exactly-once delivery.
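To illustrate why duplicate-free delivery to an object store is achievable, here is a minimal Python sketch of one common approach: naming each stored object after the Kafka offsets it covers, so that a replay after a failure overwrites the same object instead of adding a second copy. This is an assumption-laden illustration and is not taken from the StreamX code base. The names `object_key`, `write_batches`, and `BATCH_SIZE` are hypothetical, and a plain dictionary stands in for a cloud object store such as Amazon S3.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, str]  # (Kafka offset, payload)

BATCH_SIZE = 3  # hypothetical: number of records stored per object


def object_key(topic: str, partition: int, start_offset: int) -> str:
    """Build a deterministic key from the batch's position in the log."""
    return f"{topic}/partition={partition}/{start_offset:020d}.log"


def write_batches(store: Dict[str, str], topic: str, partition: int,
                  records: List[Record]) -> Optional[int]:
    """Write every complete, offset-aligned batch to the store.

    Returns the next offset to consume (the offset that is safe to
    commit), or None if no complete batch was written.
    """
    # Group payloads by the aligned offset at which their batch starts.
    batches: Dict[int, List[str]] = {}
    for offset, payload in records:
        start = offset - offset % BATCH_SIZE
        batches.setdefault(start, []).append(payload)

    committed = None
    for start in sorted(batches):
        payloads = batches[start]
        if len(payloads) < BATCH_SIZE:
            continue  # incomplete batch: wait for more records
        # Same offsets always map to the same key, so a retry overwrites
        # the earlier object rather than creating a duplicate.
        store[object_key(topic, partition, start)] = "\n".join(payloads)
        committed = start + BATCH_SIZE
    return committed


if __name__ == "__main__":
    store: Dict[str, str] = {}  # stand-in for an object store
    records = [(i, f"event-{i}") for i in range(7)]
    print(write_batches(store, "clicks", 0, records))
    # Simulate a crash before the offset commit: records 3..6 are redelivered.
    print(write_batches(store, "clicks", 0, records[3:]))
    print(sorted(store))
```

In this sketch, redelivering records after a simulated failure leaves the store unchanged, because the replayed batch maps to a key that already exists. A real connector would also need to handle partial uploads, rebalances, and offset commits, which are omitted here.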
QDS is a self-service platform for big data analytics that runs on the three major public clouds: Google Compute Engine, Microsoft Azure, and Amazon Web Services (AWS). QDS supports the latest open source technologies, such as Apache Hadoop, Hive, Presto, Pig, Oozie, Sqoop, and Spark, to provide the only comprehensive cloud-based data analytics platform, complete with enterprise security features, an easy-to-use UI, and built-in data governance. Now with support for StreamX, Qubole customers will be able to use Kafka to capture high-velocity data generated across thousands of data sources.
“Real-time analytics is being used for everything from mobile applications, financial trading, and gaming to social networks. As the number of data producers increases and they become more disparate, it is increasingly valuable to have a central platform to manage the ingestion of this data,” said Joydeep Sen Sarma, co-founder and CTO of Qubole. “Adding StreamX was a natural extension for the Qubole platform, which was purpose-built to process ever-growing data and data sources, and we look forward to providing a fully managed service that can do this reliably with just a few clicks.”
This announcement comes just weeks after Qubole open-sourced Quark, its SQL optimization project that helps simplify and optimize access to data for data analysts. Qubole is committed to contributing projects that address the most critical demands of today’s data teams to the open-source community. Streaming analytics is becoming a necessity for enterprises across industries, and Qubole will continue to create tools that enable fast, scalable, and reliable real-time data analytics.
If you would like to learn more, visit Qubole at the Kafka Summit at Booth #1 or contact [email protected]. Look for updates to the StreamX project at https://github.com/qubole/streamx.