Apache Pig as a Service

QDS supports Pig Latin for scripts that focus on ETL workflow logic. Pig on QDS lets programmers quickly test scripts and then move them into production on large-scale clusters. With Pig, QDS users can query data in the Hadoop Distributed File System (HDFS) or Amazon Simple Storage Service (S3) and store results in variables for reference in subsequent processing steps. Freed from managing the underlying infrastructure, developers can focus on building complex data pipelines that combine many data transformation steps in one script instead of chaining separate SQL queries.


Complete Cluster Life Cycle Management

As with all big data engines supported by QDS, Pig clusters scale up and down automatically during query execution. You don’t have to worry about starting or stopping Pig clusters; QDS does the heavy lifting for you.

Quick and Easy Debugging

QDS makes it easy to debug both active and historical Pig jobs. Results and logs remain available even when no cluster is running.

Instance Selection Options

QDS supports a wide variety of Amazon EC2 instance types for your Pig scripts, giving you the freedom to optimize instance selection for your workload requirements and AWS pricing options.

Spot Instance Pricing

QDS lets you automatically incorporate Amazon EC2 Spot Instances, which can cost up to 90% less than On-Demand Instances.

Elastic Pricing Model

With QDS’ pay-per-use pricing model, you pay only for the compute hours you actually use.

Extensive User Interfaces

QDS gives you user interface options to match your use case. The QDS Workbench is suited for interactive analysis, and the SDKs and the REST API are ideal for programmatic access.

Amazon Virtual Private Cloud (VPC) Support

QDS supports Amazon VPC, a service that extends your private network into the cloud. Amazon VPC provides fine-grained access control both to and from Amazon EC2 instances in your virtual network. Plus, you can launch dedicated instances within a VPC on single-tenant hardware.


When Should I Use QDS for Pig?

QDS for Pig is ideal for programmers who want to focus on data logic when building complex data pipelines, iterate quickly on complex scripts, and store many outputs without having to run multiple queries.

Split Pipelines

Unlike a SQL query, which produces only a single output, Pig supports splits for more complex data pipelines. Because Pig can read an input once and feed it to separate processing flows and outputs, split pipelines can yield substantial performance improvements.
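
The idea of reading an input once and routing it to several outputs can be sketched outside Pig. The Python below is only a conceptual illustration, not Pig Latin and not a QDS API: the LogRecord layout, the split_pipeline function, and the sample data are all hypothetical. In an actual Pig script the same shape would typically be written with the SPLIT operator or with several STORE statements fed from one LOAD.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class LogRecord(NamedTuple):
    """Hypothetical record layout, used only for this illustration."""
    user_id: str
    status: int
    bytes_sent: int


def split_pipeline(records: Iterable[LogRecord]):
    """Scan the input once and feed two independent outputs."""
    bytes_by_user = defaultdict(int)  # output 1: bytes per user, successful requests only
    errors = []                       # output 2: raw records for failed requests

    for rec in records:               # single pass over the input
        if rec.status < 400:
            bytes_by_user[rec.user_id] += rec.bytes_sent
        else:
            errors.append(rec)

    return dict(bytes_by_user), errors


# Made-up sample data for the sketch.
sample = [
    LogRecord("alice", 200, 512),
    LogRecord("bob", 500, 0),
    LogRecord("alice", 200, 256),
    LogRecord("carol", 404, 0),
]
totals, failed = split_pipeline(sample)
```

Producing the same two outputs with independent queries would mean scanning the input twice, which is the cost a split pipeline avoids.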

Interactive Prototyping

Running scripts in parts enables programmers to quickly prototype, test for regressions, and employ test-driven development to ensure their data processing runs correctly.


How Does Pig Fit into the QDS Landscape?

QDS gives you the freedom to work with any data engine as part of one unified interface with unified metadata. Choose the right solution for the right workload rather than being locked into any single technology. Pig is great for developers and non-SQL engineers who want to build complex data pipelines with the ability to store multiple outputs and frequently iterate on their scripts for refinement. SQL-proficient users may be better served working with Hive and MapReduce, and users with streaming data may prefer Spark or Presto as their engine of choice.


Get Started Now

Qubole offers two weeks of free QDS usage to explore Pig and other data engines. Users simply need to authenticate with SSO or enter their AWS credentials to begin interacting with their data in their own cloud environment. Try Pig today!

