SparkSQL in the Cloud: Optimized Split Computation
When it comes to Big Data processing in the cloud compared to on-premise, one of the fundamental differences between the two is how the data…
In a recent blog post, we benchmarked auto-scaling and demonstrated that an auto-scaling cluster was a lot less expensive and only a little bit…
When Hadoop is deployed on-premises, compute and storage are coupled. As a result, compute and storage must be scaled together and the…
We are excited to announce the general availability of GitHub integration for QDS Notebooks. GitHub is an effective way to collaborate on development projects. GitHub…
Have you ever had trouble deciding how large to make a cluster? Do you sometimes feel like you’re wasting money when a cluster isn’t…
In a previous post, we outlined the case for selecting cloud infrastructure over an on-premises deployment for managing big data workloads. Taking advantage of Spot…
This blog post explores how queries can be sped up by keeping optimized copies of the data. First, we will explore the techniques and benchmark…
Managing big data creates several challenges for data infrastructure teams: managing “bursty” and unpredictable workloads; coordinating ad hoc and batch workloads; storing rapidly growing data…
One of the important functions of a database administrator is to manage storage structures to optimize performance in a relational database. Admins use tables, views,…
The company appoints David Hsieh as Senior Vice President of Marketing and Ken Tamura as Vice President of Finance MOUNTAIN VIEW, CA–(Marketwired – Jun 21,…
Qubole introduced first-generation Caching for S3 files in Presto in 2014 and documented the observed performance gains. In a nutshell: for CPU-efficient engines like Spark…
Apache Spark remains a growing force in the realm of big data. Perhaps that shouldn’t come as a surprise considering the overall momentum behind big…