While many large data sources are available as files, these frequently need
to be integrated with other data sources. Qubole Data Service (QDS) provides
a suite of connectors to easily pull data from different data sources
and integrate and analyze them in the Cloud. We provide connectivity to relational databases like MySQL, Vertica, Oracle and AWS Redshift,
as well as NoSQL data sources like MongoDB. Compared with traditional data integration tools, QDS offers the following advantages:
Scale and Performance
QDS uses an open-source technology called Sqoop, which can import and export data in parallel using Hadoop. Combined with Qubole’s auto-scaling Hadoop technology, this means users can move large amounts of data quickly with zero overhead. We have also optimized Sqoop for data movement over wide area networks, a common situation in Cloud computing environments.
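To make the idea of a parallel import concrete, here is a minimal Python sketch of how a table can be divided into independent slices, one per worker. It only illustrates the general approach behind Sqoop's `--split-by` and `--num-mappers` options, which partition a table on the range of a key column. It is not Sqoop's actual splitter, and the function names, table name and column name are hypothetical.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive integer key range [lo, hi] into contiguous,
    non-overlapping chunks, at most one per mapper.

    Illustrative only: a real import tool would first query the database
    for MIN(key) and MAX(key) to obtain lo and hi.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")

    total = hi - lo + 1
    workers = min(num_mappers, total)   # never create empty slices
    base, extra = divmod(total, workers)

    ranges = []
    start = lo
    for i in range(workers):
        size = base + (1 if i < extra else 0)  # spread the remainder
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def split_queries(table, column, lo, hi, num_mappers):
    """Render one bounded SELECT per slice (hypothetical table/column names)."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {a} AND {column} <= {b}"
        for a, b in split_ranges(lo, hi, num_mappers)
    ]


if __name__ == "__main__":
    for query in split_queries("orders", "id", 1, 10, 4):
        print(query)
```

Because each slice covers a disjoint key range, the slices can be fetched concurrently by separate Hadoop tasks, and adding workers shortens the transfer as long as the source database can keep up.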
Users can use the Mongo Connector in Hive commands, which makes it possible to create and query Hive tables backed directly by MongoDB. We are actively working on extending such capabilities to other databases.
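As a rough illustration of what a MongoDB-backed Hive table looks like, the Python sketch below renders a Hive DDL statement. It assumes the storage handler and property names of the open-source mongo-hadoop connector (`com.mongodb.hadoop.hive.MongoStorageHandler`, `mongo.uri`, `mongo.columns.mapping`); the connector shipped in QDS may use different names. The helper function, table, columns and URI are hypothetical.

```python
import json


def mongo_backed_table_ddl(table, columns, mongo_uri, mapping=None):
    """Build a Hive CREATE TABLE statement for a MongoDB-backed table.

    Assumption: the handler class and property keys follow the open-source
    mongo-hadoop connector; check the connector you actually use.

    columns: list of (hive_column_name, hive_type) pairs
    mapping: optional dict of Hive column name -> MongoDB field name
    """
    cols = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    ddl = (
        f"CREATE EXTERNAL TABLE {table} (\n{cols}\n)\n"
        "STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'\n"
    )
    if mapping:
        # The mapping is passed to the SerDe as a JSON document.
        ddl += (
            "WITH SERDEPROPERTIES("
            f"'mongo.columns.mapping'='{json.dumps(mapping)}')\n"
        )
    ddl += f"TBLPROPERTIES('mongo.uri'='{mongo_uri}');"
    return ddl


if __name__ == "__main__":
    print(
        mongo_backed_table_ddl(
            table="users",
            columns=[("id", "STRING"), ("name", "STRING"), ("age", "INT")],
            mongo_uri="mongodb://localhost:27017/test.users",  # placeholder URI
            mapping={"id": "_id"},
        )
    )
```

Once such a table exists, ordinary Hive queries (for example a `SELECT` with a `WHERE` clause) read from the MongoDB collection without a separate import step.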
As with other features in QDS, data import and export are available through a rich graphical interface that allows users to easily explore and select data sources. QDS has extended Sqoop to detect errors and report them to users early, and to allow any running command to be killed.
Connectors can be embedded within workflows and scheduled to run periodically. Our data integration interfaces are embedded throughout the product line and are available both via the browser and via REST APIs.
We are also working on making it easy to import data from well-known data sources like Omniture, KISSmetrics, Mixpanel, AppNexus and Snowplow, and to incrementally load data warehouses from such sources. If you are interested in knowing more about these upcoming features, please