
Case Study


Case Study: Oracle Uses Heterogeneous Clusters to Achieve Cost Effectiveness


“As our EC2 costs kept climbing and the spot market became more volatile, heterogeneous was the only option that made sense.

Even as our usage has grown over the past 6 months, since switching to heterogeneous, our costs have either gone down or, at least, stayed the same.”

Justin Wainwright
System Analyst, Oracle Data Cloud

About Oracle Data Cloud

Oracle Data Cloud runs 82 clusters on Qubole. Their distribution by cluster type and their heterogeneous usage are as follows:


Cluster Type      Total # of Clusters   Configured with Heterogeneous
Hadoop1           12                    0 (not supported)
Hadoop2 (Hive)    28                    25
Spark             41                    14
Presto            1                     0
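
As a quick check on the table above, the following Python sketch totals the clusters and works out what share is configured as heterogeneous. The counts are transcribed from the table; the variable names are illustrative only.

```python
# Cluster counts transcribed from the table above:
# (cluster type, total clusters, clusters configured as heterogeneous)
clusters = [
    ("Hadoop1", 12, 0),         # heterogeneous not supported on Hadoop1
    ("Hadoop2 (Hive)", 28, 25),
    ("Spark", 41, 14),
    ("Presto", 1, 0),
]

total_clusters = sum(total for _, total, _ in clusters)
heterogeneous_clusters = sum(hetero for _, _, hetero in clusters)
heterogeneous_share = heterogeneous_clusters / total_clusters

print(f"{heterogeneous_clusters} of {total_clusters} clusters "
      f"({heterogeneous_share:.1%}) are configured as heterogeneous")
```

The totals come to 39 of 82 clusters, or about 47.6%.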

Challenges

At Oracle Data Cloud (ODC), big data is our business. We tend to use larger clusters and run processing for sustained periods of time. Over the past couple of years, we’ve seen demand for certain instance families increase and put the health of our clusters at risk. Because our EC2 bills were already climbing at an alarming rate, we couldn’t justify moving to more on-demand nodes, so we needed to keep using spot nodes in the majority of cases while keeping clusters stable.

“Because we run almost 100% spot nodes, we were suffering catastrophic spot losses given the size of our clusters and the scale at Oracle,” explains Justin Wainwright, System Analyst at Oracle Data Cloud. “This resulted in missed SLAs, job failures and, most importantly, wasted time in the business pipeline. Heterogeneous was the feature we were looking for.”

The Oracle Data Cloud team became the first beta customer for the heterogeneous cluster feature when Qubole launched it in August 2016. They were excited about the potential this feature could bring to their daily operations.

Justin and his team started with smaller, non-critical clusters to test the waters. After positive results, they expanded the configuration to the entire cluster fleet that Oracle Data Cloud owns. Qubole has been running cost comparisons and analysis together with Oracle throughout this transformation. Based on those statistics, we’ve seen cost savings of up to 90% compared to on-demand nodes and typically 20-50% compared to homogeneous spot configurations.

The Oracle Data Cloud team also helped Qubole build a better product along the way and shared their experience in configuring heterogeneous clusters with other Qubole users. Their tips include:

  • For long-running jobs: split jobs into smaller chunks so they adapt better to smaller heterogeneous instance types (e.g. size executors and overhead settings appropriately). Base the sizing on the average workload.
  • For burst jobs: choose larger heterogeneous instance types (e.g. 10xlarge instead of the default 4xlarge).
  • For outliers: custom configurations are needed to achieve the best heterogeneous cluster performance. This requires some trial and tweaking over time.
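
To illustrate the executor-sizing tip above, here is a minimal Python sketch of how one might estimate the number of executors that fit on nodes of different sizes. It is not Oracle's or Qubole's method. The function name, the node shapes, and the reserved-memory figure are hypothetical. The overhead rule (10% of executor memory, with a 384 MB floor) mirrors the documented Spark default for executor memory overhead and should be treated as an assumption for any other setup.

```python
import math


def executors_per_node(node_mem_gb, node_cores, executor_mem_gb, executor_cores,
                       reserved_mem_gb=1.0, overhead_fraction=0.10,
                       min_overhead_gb=0.375):
    """Estimate how many executors fit on one node (hypothetical helper).

    reserved_mem_gb is an assumed allowance for the OS and node daemons.
    """
    # Per-executor overhead: a fraction of executor memory, with a floor.
    overhead_gb = max(min_overhead_gb, executor_mem_gb * overhead_fraction)
    container_gb = executor_mem_gb + overhead_gb

    # The node is limited by whichever resource runs out first.
    fit_by_memory = math.floor((node_mem_gb - reserved_mem_gb) / container_gb)
    fit_by_cores = node_cores // executor_cores
    return max(0, min(fit_by_memory, fit_by_cores))


# Hypothetical node shapes as (memory in GB, cores); not real instance specs.
node_shapes = {"smaller node": (30, 8), "larger node": (160, 40)}

for name, (mem_gb, cores) in node_shapes.items():
    count = executors_per_node(mem_gb, cores, executor_mem_gb=8, executor_cores=2)
    print(f"{name}: {count} executors of 8 GB / 2 cores")
```

An executor size that divides evenly into the smallest instance type in the mix tends to waste less capacity across the rest of a heterogeneous cluster, which is one reason to size for the average workload rather than the peak.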

Results

With heterogeneous configurations, Oracle Data Cloud was able to reduce spot loss and significantly lower its total operating cost.

At pre-heterogeneous peak usage (Fall 2016), almost 40% of their EC2 costs came from Qubole jobs running on on-demand nodes. As of May 2017, on-demand costs account for no more than 20%, and nearly half of the clusters have been configured as heterogeneous.

Cost Comparison:

  • September 2016: 34K USD spent on on-demand nodes
  • January 2017: 17K USD spent on on-demand nodes
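
The change between the two figures above can be worked out with a short Python sketch. The dollar amounts are the rounded values quoted in the comparison; the variable names are illustrative.

```python
# On-demand spend quoted in the cost comparison (rounded, in USD).
sept_2016_usd = 34_000
jan_2017_usd = 17_000

savings_usd = sept_2016_usd - jan_2017_usd
reduction = savings_usd / sept_2016_usd

print(f"On-demand spend fell by {savings_usd:,} USD ({reduction:.0%})")
```

That is, on-demand spend in January 2017 was half of what it was in September 2016.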

QDS gives the Oracle Data Cloud operations team the ability to scale out fast while keeping costs low and customer satisfaction high.

What’s Next?