Cloud Data Lakes – Four Must-have TCO Optimization Capabilities

April 13, 2020 by Updated August 19th, 2021

Enterprises use cloud providers’ compute and storage services for ad hoc data analytics, streaming analytics, and ML use cases because cloud data lakes offer significant cost advantages, agility, and scale from the start. Proofs of concept (POCs) for data-driven initiatives start easily and without a large upfront bill. Over time, however, as projects mature, ad hoc queries take longer, or model iteration cycles increase, the seemingly endless supply of underlying resources leads to wasteful spending on compute and other resources.

This usage brings cost unpredictability and, without financial governance, negatively impacts TCO. In the cloud, rising costs are not necessarily bad: they mean the data team is using more services, which in theory means the team is doing more “good stuff” and, hopefully, delivering business value. TCO optimization ensures that wasteful spending is identified and eventually eliminated. Cloud data lake platforms should help enterprises keep this wasteful spending in check to lower TCO. To optimize TCO within their data lake platforms, admins should be able to do the following:

  1. Control and design infrastructure spend at will, whether by manual override, policy, preference, or autonomous self-learning
  2. Leverage built-in capabilities to optimize clusters for lower infrastructure spend based on custom-defined parameters
  3. Monitor total costs at the application, user, account, cluster, and cluster-instance levels to drive accountability and meaningful discussions across teams
  4. Identify areas of cost optimization to drive maximum performance for the lowest TCO
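
As a rough illustration of item 3, the sketch below rolls hypothetical usage records up to per-user and per-cluster totals. The record fields, names, and hourly rates are all assumptions made for this example; they are not taken from any platform's billing schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One hypothetical billing line; every field name here is illustrative."""
    account: str
    cluster: str
    user: str
    application: str
    instance_hours: float
    hourly_rate_usd: float

    @property
    def cost_usd(self) -> float:
        # Cost of this line: hours consumed times the hourly price.
        return self.instance_hours * self.hourly_rate_usd


def cost_by(records, dimension):
    """Sum cost over one dimension, e.g. 'user', 'cluster', or 'account'."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, dimension)] += record.cost_usd
    return dict(totals)


# Made-up sample data for the example.
RECORDS = [
    UsageRecord("analytics", "etl-cluster", "alice", "nightly-etl", 10.0, 0.50),
    UsageRecord("analytics", "etl-cluster", "bob", "adhoc-sql", 4.0, 0.50),
    UsageRecord("ml", "training-cluster", "alice", "model-train", 8.0, 1.25),
]

if __name__ == "__main__":
    for dimension in ("user", "cluster", "account"):
        print(dimension, cost_by(RECORDS, dimension))
```

Grouping the same records along different dimensions is what makes a show-back conversation possible: the same spend can be attributed to the user who ran the work or to the cluster that hosted it.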



Watch this 6-minute overview to see how Qubole’s open data lake platform saved $230m+ in cloud costs for customers in 2019 alone.


As platforms provide these core TCO-focused capabilities, the optimization itself should be autonomous and policy-based, and it should not sacrifice Service Level Agreements (SLAs).

With Qubole, the open data lake platform, enterprises address all four of the above requirements for optimizing TCO by:

  1. Reducing costs continuously in an automated manner based on a set or default policy, preference, and autonomous self-learning.
  2. Optimizing the consumption of resources consistently, for example through performance improvements to the underlying engine, so that jobs are completed efficiently.
  3. Finding and consuming lower-priced resources on a continual basis with workload-aware autoscaling and admin-defined heterogeneous cluster configurations, provisioning resources, whether On-demand or Spot, only when needed.
  4. Eliminating unnecessary resource consumption with aggressive downscaling, optimized upscaling, and at-will shut down.
  5. Throttling queries against monetary limits derived from the budget set by the administrator.
  6. Providing user-, job-, and cluster-level cost metrics in a multi-tenant environment to support data-driven show-back discussions.
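
To make item 5 concrete, here is a minimal sketch of budget-based admission control. It is not Qubole's implementation: the `BudgetGuard` class, its method names, and the idea of an up-front per-query cost estimate are assumptions introduced purely for illustration.

```python
class BudgetGuard:
    """Hypothetical guard that admits a query only if its estimated cost fits the budget."""

    def __init__(self, budget_usd: float) -> None:
        if budget_usd <= 0:
            raise ValueError("budget_usd must be positive")
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def remaining(self) -> float:
        # Never report a negative balance, even if actual costs overshoot estimates.
        return max(self.budget_usd - self.spent_usd, 0.0)

    def admit(self, estimated_cost_usd: float) -> bool:
        # Throttle (reject) any query whose estimate exceeds what is left.
        return estimated_cost_usd <= self.remaining()

    def record(self, actual_cost_usd: float) -> None:
        # Charge the actual cost once the query has finished.
        self.spent_usd += actual_cost_usd


if __name__ == "__main__":
    guard = BudgetGuard(budget_usd=100.0)
    if guard.admit(60.0):
        guard.record(60.0)
    print(guard.admit(50.0))  # would exceed the remaining 40.0
    print(guard.remaining())
```

A real platform would also need to decide what happens to a rejected query (queue it, notify the user, or ask the administrator for more budget); this sketch only shows the admission decision itself.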

In summary, a cloud data lake platform should be able to understand what is currently happening, build a financial profile of your cloud spending, help put measures in place to control that spending, and take advantage of cloud data platform facilities to reduce costs and improve overall TCO.
