What is Hadoop as a Service?
QDS optimizes MapReduce to run on Amazon Web Services (AWS), Google Compute Engine (GCE), and Microsoft Azure, so you can choose the cloud that’s right for you, knowing that QDS will make it simple, fast, cost-effective, and secure to process your big data. For data scientists and data engineers, running MapReduce jobs on QDS provides fine-grained control over ETL and data transformation.
Complete Cluster Life Cycle Management
Like all big data solutions supported by QDS, clusters scale up and down automatically during query execution. And you don’t have to worry about starting or stopping Hadoop clusters. QDS does all the heavy lifting for you.
Read and write optimization for cloud storage dramatically improves query performance and the user experience while reducing processing costs. Plus, with advanced auto-scaling, you’ll pay only for the resources you actually use.
Spot Instance Pricing
If you’re running on AWS, QDS lets you automatically incorporate Amazon spot instances, which can cost up to 90% less than on-demand instances.
Elastic Pricing Model
With QDS’ pay-per-use pricing model, you pay only for the compute hours you actually use.
Extensive User Interfaces
QDS gives you user interface options to match your use case. The web-based Workbench UI is suited for interactive analysis, while the SDKs and the REST API are ideal for programmatic access.
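For programmatic access, a job submission might look like the sketch below. The endpoint path, header name, and payload fields here are illustrative assumptions modeled on common REST conventions, not confirmed details of the QDS API; consult the official API documentation for the exact contract.

```python
# Hypothetical sketch of submitting a MapReduce job over REST.
# Endpoint, header, and field names are assumptions, not the documented API.
import json

API_URL = "https://api.qubole.com/api/v1.2/commands"   # assumed endpoint
headers = {
    "X-AUTH-TOKEN": "<your-api-token>",                # assumed auth header
    "Content-Type": "application/json",
}
payload = {
    "command_type": "HadoopCommand",                   # assumed field names
    "sub_command": "jar",
    "sub_command_args": "s3://your-bucket/jobs/etl.jar com.example.EtlJob",
}
body = json.dumps(payload)
print(body)

# An HTTP client (e.g. urllib.request) would POST `body` with `headers`
# to API_URL, then poll the returned command id for status and results.
```

The same submission could be made through the SDKs, which wrap these REST calls behind language-native objects.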
You can take care of your batch processing needs by scheduling your MapReduce jobs to run at periodic intervals.
Persistent logs and outputs
You can inspect logs and analyze the results of MapReduce jobs even when the cluster is no longer running.
When Should I Use QDS for MapReduce?
MapReduce is a core part of the Hadoop ecosystem and works well with large datasets for ETL and batch processing jobs.
Fine-grain control for ETL data transformation
With Qubole’s support for Hadoop MapReduce, you get the most granular control over your ETL processing needs. Take your unstructured data and transform it into structured data using custom-defined logic.
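As a minimal sketch of what that custom-defined logic can look like, the functions below follow the Hadoop Streaming map/reduce pattern to turn unstructured access-log lines into structured (key, count) records. The log format and field names are illustrative assumptions, not a Qubole-specific API.

```python
# Map and reduce steps for a log-to-table ETL job, in the Hadoop
# Streaming style. The log format below is an illustrative assumption.
import re
from collections import defaultdict

LOG_PATTERN = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d{3})')

def map_line(line):
    """Turn one unstructured access-log line into a structured (key, value) pair."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None  # drop unparseable records
    ip, timestamp, method, path, status = m.groups()
    return (path, 1)  # key on request path, count one hit

def reduce_pairs(pairs):
    """Sum the counts for each key, as a reducer would per key group."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

raw_logs = [
    '10.0.0.1 - - [01/Jan/2016:00:00:01 +0000] "GET /index.html HTTP/1.1" 200',
    '10.0.0.2 - - [01/Jan/2016:00:00:02 +0000] "GET /index.html HTTP/1.1" 200',
    '10.0.0.3 - - [01/Jan/2016:00:00:03 +0000] "POST /login HTTP/1.1" 302',
    'not a log line',
]
pairs = [p for p in (map_line(line) for line in raw_logs) if p]
hits = reduce_pairs(pairs)
print(hits)  # → {'/index.html': 2, '/login': 1}
```

On a cluster, Hadoop runs the map step in parallel across input splits and groups the emitted keys before the reduce step, so the same logic scales from a handful of lines to terabytes of logs.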
Scheduled batch processing
Use Qubole’s built-in scheduler and workflow capabilities to define a set of jobs that run on a recurring schedule. Qubole’s cluster lifecycle management automatically brings up clusters when the jobs start and shuts them down when all jobs are done. All logs and results are persisted, so you can still debug even without a running cluster.
“With Pinterest’s current setup, Hadoop is a flexible service that’s adopted across the organization with minimal operational overhead. Pinterest has over 100 regular MapReduce users running over 2,000 jobs each day through QDS’ web interface, ad-hoc jobs and scheduled workflows.”
Pinterest Data Engineer
“DataXu needs high performance for its Big Data queries, and Qubole optimizes performance several ways including MapReduce split computations, and S3 I/O optimization.”
Vice President, Technology, DataXu
How Does MapReduce Fit into the QDS Landscape?
QDS gives you the freedom to work with Spark, Hadoop MapReduce, Presto, and Hive as part of one unified interface with unified metadata. Choose the right solution for the right workload rather than being locked into any single technology. MapReduce gives you the most control for transforming your data from unstructured to structured form, and it is ideal for data engineers and developers who are comfortable with lower-level Hadoop functionality. In addition, Qubole offers services that allow for higher-level analysis, such as SQL querying with Hive and scripting analysis with Pig.
Get Started Now
Qubole offers two weeks of QDS usage for free so you can explore MapReduce and other data engines. Users simply need to authenticate with SSO or enter their choice of cloud credentials to begin interacting with their data in their own cloud environment. Try MapReduce today!