Apache Pig as a Service

 

QDS supports Pig Latin for scripts that focus on ETL workflow logic. Pig on QDS enables programmers to quickly test scripts and then move them into production on large-scale clusters. With Pig, QDS users can query data in the Hadoop Distributed File System (HDFS) or Amazon Simple Storage Service (S3) and store results in variables for reference in subsequent processing steps. Without having to manage the underlying infrastructure, developers can focus on building complex data pipelines that combine many data transformation steps in one script instead of a series of separate SQL queries.

 

Complete Cluster Life Cycle Management

 

Like all big data solutions supported by QDS, clusters scale up and down automatically during query execution, and you don't have to worry about starting or stopping clusters. QDS does all the heavy lifting for you.

 

Quick and Easy Spark Debugging

 

QDS makes it easy to debug both active and historical jobs with a Spark Application UI. Results and logs are always available even without active running clusters.

 

Instance Selection Options

 

QDS supports a wide variety of Amazon EC2 instance types for your Spark workload, giving you the freedom to optimize instance selection for your workload requirements and AWS pricing options.

 

Spot Instance Pricing

 

QDS lets you automatically incorporate Amazon EC2 Spot Instances, which can cost up to 90% less than On-Demand Instances.

 

Elastic Pricing Model

 

With QDS' pay-per-use pricing model, you pay only for the compute hours you actually use.

 

Extensive User Interfaces

 

QDS gives you user interface options to match your use case. The Spark Notebook and a web-based UI are suited for interactive analysis, and the SDKs and the REST API are ideal for programmatic access.

 

Amazon Virtual Private Cloud (VPC) Support

 

QDS supports Amazon VPC, a service that extends your private network into the cloud. Amazon VPC provides fine-grained access control both to and from Amazon EC2 instances in your virtual network. Plus, you can launch dedicated instances within a VPC on single-tenant hardware.

 

When Should I Use QDS for Pig?

Hive is used mostly for batch processing of large ETL jobs and batch SQL queries on very large data sets.

Split Pipelines

Unlike a SQL query, which produces only a single output, a Pig script can split its data flow to support more complex pipelines. Because Pig can read an input once and feed it to separate processing flows and outputs, split pipelines can lead to substantial performance improvements.
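
The idea behind a split pipeline can be illustrated with a small sketch. The example below is plain Python, not Pig Latin and not QDS code, and every name in it (split_once, events, the branch names and thresholds) is hypothetical. It only mimics the behavior described above: the input is traversed a single time, and each record is routed to every output whose condition it satisfies.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def split_once(
    records: Iterable[Record],
    branches: Dict[str, Callable[[Record], bool]],
) -> Dict[str, List[Record]]:
    """Route each record to every branch whose predicate accepts it.

    The input is consumed in a single pass, so it can be a one-shot
    iterator (for example, a stream read from storage).
    """
    outputs: Dict[str, List[Record]] = {name: [] for name in branches}
    for record in records:  # single pass over the input
        for name, predicate in branches.items():
            if predicate(record):
                outputs[name].append(record)
    return outputs


# Hypothetical sample data standing in for a log file in HDFS or S3.
events = [
    {"user": "a", "bytes": 120, "status": 200},
    {"user": "b", "bytes": 5000, "status": 500},
    {"user": "c", "bytes": 9000, "status": 200},
]

# Two independent processing flows fed by the same read.
branches = {
    "errors": lambda r: r["status"] >= 500,
    "large": lambda r: r["bytes"] > 1000,
}

# iter(...) makes the input one-shot, so a second scan would be impossible.
result = split_once(iter(events), branches)
```

A single SQL query would typically need one scan per output here, whereas the sketch produces both outputs from one read. A record that matches several conditions lands in several outputs, and a record that matches none is dropped.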

 

Interactive Prototyping

Running scripts in parts enables programmers to quickly prototype, test for regressions, and employ Test-Driven Development to ensure their data processing runs correctly.

“Our Analytics Team use, “Pig As a Service” from Qubole very extensively. The Qubole UI QDS is very intuitive and make us very productive to write pig queries faster. Ability to test your pig script on smaller data set without changing the input path is really innovative and very helpful.”

Shailesh Garg, Sr. Engineering Manager, Komli Media

How Does Pig Fit into the QDS Landscape?

QDS gives you the freedom to work with any data engine through one unified interface with unified metadata. Choose the right solution for each workload rather than being locked into a single technology. Pig is great for developers and non-SQL engineers who want to build complex data pipelines, store multiple outputs, and iterate frequently on their scripts. SQL-proficient users may be better served by Hive and MapReduce, and users with streaming data may prefer Spark or Presto as their engine of choice.

 

Get Started Now

Qubole offers 2 weeks of QDS usage for free to explore Pig and other data engines. Users simply need to authenticate with SSO or enter their AWS credentials to begin interacting with their data in their own cloud environment.

Try Pig Today!
 