Big Data Dilemma: Quantity Vs. Quality
Advances in big data technologies and capabilities have dramatically increased adoption in the enterprise. But as organizations implement big data platforms to collect, manage, and analyze large data sets for competitive advantage, they are faced with a dilemma. Should they direct most of their resources toward collecting massive volumes of data? Or should they focus primarily on collecting quality data?
While both schools of thought have merit, the most effective way to reap the full benefits of big data is to strike a balance between quantity and quality.
The Quantity Argument
The term “big data” refers to data of such volume, variety, and velocity that traditional relational database management systems (RDBMS) are incapable of handling it. This limitation has given rise to the argument that massive quantities of disparate data from sources such as emails, photos, monitoring devices, social media sites, and mobile devices are chaotic, low in quality, and difficult to derive value from. While that may have been somewhat true early on, storing, managing, and analyzing massive data sets is exactly what today’s big data analytics platforms, such as Hadoop, are designed to do.
Yes, today’s unstructured data is raw and complex, but that kind of data holds real value. Properly analyzed, this disorganized data mix can provide context to customer and client behaviors—not just what people are doing, but when and where they are doing it—that can lead to hidden insights and informed decisions that give businesses a competitive edge. Case in point: using big data analytics, companies can analyze smartphone and geo-location data in real time, allowing them to interact quickly with customers in meaningful and relevant ways.
Clearly, the benefits of collecting and analyzing vast stores of information can no longer be disputed. On the other hand, it also cannot be disputed that better data leads to better insights, which brings us to…
The Quality Argument
“Garbage in equals garbage out.” That would be one way to sum up the argument for quality data over quantity. After all, just because new technologies now allow companies to store all of that messy raw data, that doesn’t necessarily mean that they should.
When making important decisions based on analytical data, organizations must operate on the assumption that the data they are using is valid and relevant. Proponents of quality data are quick to point out that massively large sets of disparate data are bound to contain data that is biased, noisy, abnormal, and irrelevant.
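To make the quality concern concrete, here is a minimal sketch of the kind of screening quality advocates have in mind: rejecting records with missing or out-of-range values before they reach analysis. The field names and thresholds are hypothetical, for illustration only.

```python
# Hypothetical purchase records; two contain quality problems.
records = [
    {"customer_id": 1, "purchase_amount": 42.50},
    {"customer_id": 2, "purchase_amount": None},       # missing value
    {"customer_id": 3, "purchase_amount": -10.00},     # invalid (negative)
    {"customer_id": 4, "purchase_amount": 39.99},
    {"customer_id": 5, "purchase_amount": 1_000_000},  # extreme outlier
]

def is_valid(record):
    """Reject records with missing or out-of-range amounts."""
    amount = record["purchase_amount"]
    return amount is not None and 0 <= amount <= 10_000

clean = [r for r in records if is_valid(r)]
print(len(clean))  # 2 records survive the screen
```

Even a screen this simple illustrates the trade-off: the more aggressively noisy records are dropped, the less raw volume remains to analyze.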
When it comes to true data quality, advocates of this approach argue, there is no substitute for structured data. Neat and elegant by nature, the flow of structured data is constant and predictable, making it ideal for analysis via the traditional data warehouse. And since quality data is going in, analysts have greater confidence that the data they are using is meaningful to the problem being analyzed.
Still, the fact remains that vast stores of valuable unstructured data continue to accumulate. In addition, data visualization, enrichment and validation technologies capable of transforming raw data into reliable, readily analyzed data are challenging the validity of the quality data argument—at the same time giving rise to…
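The enrichment and validation step mentioned above can be sketched as follows: raw, semi-structured input is parsed, malformed entries are discarded, and surviving records are enriched into rows ready for analysis. The input format, field names, and city lookup table are invented for illustration.

```python
import json

# Raw, semi-structured event stream; one line is malformed.
raw_events = [
    '{"user": "a1", "lat": 40.71, "lon": -74.01}',
    'not json at all',                              # malformed -> dropped
    '{"user": "b2", "lat": 48.86, "lon": 2.35}',
]

# Hypothetical enrichment table mapping coordinates to a city label.
CITIES = {(40.71, -74.01): "New York", (48.86, 2.35): "Paris"}

def enrich(line):
    """Parse one raw event; return an enriched row, or None if invalid."""
    try:
        event = json.loads(line)
        key = (event["lat"], event["lon"])
    except (json.JSONDecodeError, KeyError):
        return None
    return {"user": event["user"], "city": CITIES.get(key, "unknown")}

rows = [r for line in raw_events if (r := enrich(line)) is not None]
print(rows)  # two enriched rows; the malformed line is discarded
```

The point of the sketch is that validation and enrichment turn messy raw input into structured output, which is exactly what weakens the case for keeping only structured data in the first place.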
The Quantity And Quality Argument
While organizations may be content to stay with the status quo—basing critical business decisions primarily on the analysis of structured data—the reality is that big data analytics has many benefits. As such, late adopters of Hadoop platforms and strategies could have a difficult time remaining competitive. This doesn’t mean that organizations should replace their traditional databases with on-premises or cloud-based Hadoop platforms. It’s more about striking a balance and leveraging both technologies synergistically to gain and retain greater profitability and competitive advantage.