Real-Time Data Query: The Next Competitive Advantage

Published April 3, 2014. Updated November 10, 2017.

Back when Big Data was just getting off the ground, early adopters of open source Hadoop achieved competitive advantage through analysis of vast stores of multi-structured data to gain actionable insights. Over the last few years, with the extensive adoption of Big Data analytics platforms by business, the playing field has begun to level off. As a result, more and more organizations are looking for new and better ways to leverage large data sets and once again gain a competitive edge.

Today, thanks to fast SQL-on-Hadoop solutions such as Presto-as-a-Service in the cloud, that next competitive advantage has arrived, in the form of real-time data query. What follows is a look at some of the many benefits real-time query of Big Data brings to business.

Accelerated Speed to Insight – The explosion of multi-structured data flowing from mobile devices and other source applications has caused data-gridlock in traditional warehouses, disrupting the ability to capture source data and make it usable for analytics in a timely manner. In addition, interactive queries of Hadoop data in the past necessitated the movement of data to another system—a move that delayed the delivery of insights needed for decision making even further. Today, the addition of real-time query on Hadoop has changed all that. By allowing users to circumvent sluggish data-refinement pipelines by streaming detailed data sets directly into Hadoop, solutions such as Presto-as-a-Service accelerate speed to insight through an interactive and incremental process.

Affordability – Traditional data-warehousing models are expensive, primarily because of the up-front cost of commercial scale-up servers and proprietary enterprise software. These and other associated costs can result in up-front expenditures amounting to tens of thousands of dollars per terabyte of data. By contrast, a cloud-based Hadoop solution, which runs open-source software on a cluster of commodity servers, can bring initial costs down to hundreds of dollars per terabyte. In addition, since real-time Hadoop supports interactive analysis, the cost of duplicate storage in a data warehouse or analytical database can be significantly reduced. And with real-time query on Hadoop, the need to move data from one system to another is eliminated, saving organizations money and, more importantly, valuable time.
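To make the per-terabyte comparison concrete, here is a minimal Python sketch of the arithmetic. The dollar figures and the data volume are hypothetical placeholders chosen only to match the orders of magnitude mentioned above ("tens of thousands" versus "hundreds" of dollars per terabyte); they are not measured prices.

```python
def upfront_cost(terabytes: float, cost_per_tb: float) -> float:
    """Return the up-front cost in dollars for a given data volume."""
    return terabytes * cost_per_tb


# Hypothetical figures, for illustration only.
WAREHOUSE_COST_PER_TB = 20_000.0  # assumed: "tens of thousands of dollars"
HADOOP_COST_PER_TB = 500.0        # assumed: "hundreds of dollars"
data_tb = 100.0                   # assumed data volume in terabytes

warehouse = upfront_cost(data_tb, WAREHOUSE_COST_PER_TB)
hadoop = upfront_cost(data_tb, HADOOP_COST_PER_TB)
savings_ratio = warehouse / hadoop

print(f"Traditional warehouse: ${warehouse:,.0f}")
print(f"Cloud-based Hadoop:    ${hadoop:,.0f}")
print(f"Ratio:                 {savings_ratio:.0f}x")
```

Swapping in real quotes for the two per-terabyte constants turns this into a quick first-pass estimate; it deliberately ignores ongoing operating costs.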

Accelerated Discovery – A major drawback of performing analytics with traditional systems is that the process of gradually discovering and enriching the data is impeded rather than accelerated. The reason for this impediment is that during analysis, data becomes closely tied to the specific analytic processes of that system—meaning that data that is specific to one process doesn’t get shared with the other system processes. However, metadata analyzed in Hadoop is shared by all processes. This means that if users are able to extract additional meaning from the data during real-time query sessions, any additions users make to the metadata become visible to the other processes in the system. As a result, discovery is accelerated.

Flexible, Full-Fidelity Analysis – Real-time query on Hadoop allows organizations to carry out full-fidelity analysis of data, picking up where insight and discoverability leave off. Along with providing full access to both summary and detailed information, real-time query software such as Presto-as-a-Service gives analysts the flexibility to ask unanticipated, ad hoc questions without difficulty. And with the ability to interact iteratively with vast stores of structured, semi-structured and unstructured data, end users can see not only the trends, relationships and patterns hidden in the raw data, but all of the supporting details as well.

Increased Revenue – The ultimate outcome of real-time query on Hadoop for business is increased revenue. Real-time query software, in particular Presto-as-a-Service, benefits the bottom line by allowing key decision makers to gain the actionable insights they need to make better decisions faster than the competition. In addition, real-time query, in conjunction with geo-location tools, gives companies the ability to track, interact with and influence customers in a way that drives sales while enhancing the customer experience, all in real time.

The benefits of real-time query to business are many and varied, and Presto-as-a-Service provides additional benefits over other fast SQL-on-Hadoop software solutions. Developed and battle-tested at Facebook, Presto supports a wide range of data sources, including data stored on the Hadoop Distributed File System (HDFS), Amazon Web Services’ S3, HBase, relational databases (RDBMS), Scribe, and other sources such as legacy or custom data stores. And when it comes to tackling Big Data problems such as volume and variety, Presto is well suited to combining many kinds of data at any volume.
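As a rough illustration of combining data sources, the Python sketch below builds a single SQL statement that joins a table in Hadoop with a table in a relational database, using Presto's catalog.schema.table naming. The catalog, schema, table and column names are hypothetical, and the helper only composes the query text; it does not connect to a cluster.

```python
def federated_query(hive_table: str, rdbms_table: str, join_key: str) -> str:
    """Compose a SQL join across two fully qualified tables.

    Each table name is expected in catalog.schema.table form.
    """
    return (
        f"SELECT e.{join_key}, COUNT(*) AS events\n"
        f"FROM {hive_table} AS e\n"
        f"JOIN {rdbms_table} AS c ON e.{join_key} = c.{join_key}\n"
        f"GROUP BY e.{join_key}"
    )


# Hypothetical names: a clickstream table in a Hive catalog and a
# customer table in a MySQL catalog.
sql = federated_query("hive.web.clickstream", "mysql.crm.customers", "customer_id")
print(sql)
```

The point of the sketch is that both sources appear in one statement, so no data has to be copied into a separate system before the join.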

With all these benefits and more, Presto is the next competitive advantage for business.

Qubole is providing the first Presto-as-a-Service in the cloud – inexpensive and scalable – to everyone. You can sign up for our development preview and try it out yourself. Qubole also provides Hadoop as a Service, Hive as a Service and Pig as a Service, which complement the Presto offering.

Read more about real-time SQL on Hadoop