QDS supports Pig Latin for scripts that focus on ETL workflow logic. Pig on QDS enables programmers to quickly test scripts and then move them into production on large-scale clusters. With Pig, QDS users can query data in the Hadoop Distributed File System (HDFS) or Amazon Simple Storage Service (S3) and store results in variables for reference in subsequent processing steps. Without worrying about managing the underlying infrastructure, developers can focus on building complex data pipelines by combining many data transformation steps in one script instead of chaining together separate SQL queries.
Like all big data solutions supported by QDS, clusters scale up and down automatically during query execution. And you don't have to worry about starting or stopping Spark clusters. QDS does all the heavy lifting for you.
QDS makes it easy to debug both active and historical jobs with a Spark Application UI. Results and logs are always available even without active running clusters.
QDS supports a wide variety of Amazon EC2 instance types for your Spark workload, giving you the freedom to optimize instance selection for your workload requirements and AWS pricing options.
QDS lets you automatically incorporate Amazon Spot Instances, which can cost up to 90% less than On-Demand Instances.
With QDS' pay-per-use pricing model, you'll only pay for what you actually use by compute hour.
QDS gives you user interface options to match your use case. The Spark Notebook and a web-based UI are suited for interactive analysis, and the SDKs and the REST API are ideal for programmatic access.
QDS supports Amazon VPC, a service that extends your private network into the cloud. Amazon VPC provides fine-grained access control both to and from Amazon EC2 instances in your virtual network. Plus, you can launch dedicated instances within a VPC on single-tenant hardware.
Unlike a SQL query, which produces only a single output, a Pig script can split its data flow to build more complex pipelines. Because Pig can read an input once and feed it to separate processing flows and outputs, it can deliver substantial performance improvements.
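As a rough illustration of a split, the Pig Latin sketch below loads one input, routes its records into two flows with the SPLIT operator, and stores a separate output for each. The paths, relation names, and field names are hypothetical and chosen only for this example, and the OTHERWISE branch assumes Pig 0.10 or later.

```pig
-- Hypothetical input path and schema, for illustration only.
logs = LOAD 'input/logs' USING PigStorage('\t')
       AS (user_id:chararray, status:int, bytes:long);

-- Read the input once, then route each record to one of two flows.
SPLIT logs INTO errors IF status >= 500, ok OTHERWISE;

-- Flow 1: count server errors per user.
errors_by_user = GROUP errors BY user_id;
error_counts   = FOREACH errors_by_user
                 GENERATE group AS user_id, COUNT(errors) AS error_count;

-- Each flow writes its own output from the same single read.
STORE error_counts INTO 'output/error_counts';
STORE ok           INTO 'output/ok_requests';
```

Both STORE statements draw on the same `logs` relation, so the script describes one read of the input feeding two outputs rather than two independent queries.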
Running scripts in parts enables programmers to quickly prototype, test for regressions, and employ Test-Driven Development to ensure their data processing runs correctly.
QDS gives you the freedom to work with any data engine as part of one unified interface with unified metadata. Choose the right solution for the right workload rather than being locked into any single technology. Pig is great for developers and non-SQL engineers who want to build complex data pipelines with the ability to store multiple outputs and frequently iterate on their scripts for refinement. SQL-proficient users may be better served by working with Hive and MapReduce, and users with streaming data may prefer Spark or Presto as their engine of choice.
Qubole offers 2 weeks of QDS usage for free to explore Pig and other data engines. Users simply need to authenticate with SSO or enter their AWS credentials to begin interacting with their data in their own cloud environment.
Try Pig Today!