Top 10 Industry Examples of HDFS
- By Nate Philip
- October 8, 2013
Not everyone comes to us with a clear strategy for harnessing the potential of Hadoop. There are even those who, for instance, are still unsure whether the benefits of using an HDFS cluster apply to their organization at all.
In fact, practically any organization that wants to draw insightful or actionable information from large data sets can benefit from HDFS. HDFS clusters are inexpensive, highly scalable, and highly available. Combined with the applications that run on them, they can provide substantial cost, operational, and analytics benefits.
If it’s not clear to you how, allow us to share 10 industries that should find (or are already finding) HDFS clusters extremely valuable. You might belong to one of them.
1. Electric Power
To monitor the health of smart grids, the power industry deploys phasor measurement units (PMUs) throughout its transmission networks. PMUs can record various physical quantities like voltage, current, and frequency, along with the location of each reading. The data they collect can be analyzed to detect system faults at specific network segments and enable the grid to respond accordingly, for example by adjusting load or switching to a backup power source.
Because PMU networks typically clock thousands of records per second, power companies can benefit from inexpensive, highly available file systems like HDFS.
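To make the fault-detection idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular utility or PMU vendor does it: the record layout (segment name, frequency reading), the 60 Hz nominal frequency, the 0.5 Hz tolerance, and the function name `flag_segments` are all assumptions made for the example.

```python
from collections import defaultdict

NOMINAL_HZ = 60.0    # assumed nominal grid frequency (hypothetical value)
TOLERANCE_HZ = 0.5   # assumed alert threshold, for illustration only


def flag_segments(records, nominal=NOMINAL_HZ, tolerance=TOLERANCE_HZ):
    """Return {segment: worst deviation} for segments outside the tolerance.

    `records` is an iterable of (segment, frequency_hz) pairs, a
    hypothetical simplification of what a PMU feed might contain.
    """
    worst = defaultdict(float)
    for segment, frequency_hz in records:
        deviation = abs(frequency_hz - nominal)
        worst[segment] = max(worst[segment], deviation)
    return {seg: dev for seg, dev in worst.items() if dev > tolerance}


# Made-up sample readings from two network segments.
sample = [
    ("segment-A", 60.01),
    ("segment-B", 59.2),
    ("segment-A", 59.98),
    ("segment-B", 60.0),
]
flagged = flag_segments(sample)
print(sorted(flagged))  # segments whose worst reading exceeded the tolerance
```

In practice the readings would be read from files stored on the cluster rather than from an in-memory list, and the per-segment aggregation would be spread across many machines, but the logic per record stays this simple.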
PMUs aren’t the only sources of data. On the billing side of the power industry, massive amounts of data are collected from homes and businesses via smart meters. The data gathered from these endpoints can be used by utility firms to forecast energy usage and achieve better alignment between supply and demand.
2. Healthcare
This is one industry where legislation is playing a significant role in the surge of information and where data comes in a wide range of formats.
Spurred on by the HIPAA and HITECH Acts, which promote the use of EDI and interoperable EHR systems, health organizations have been gathering unprecedented volumes of structured data. In addition, image and video files from X-rays, ultrasound, CT scans, MRI scans, endoscopies, and other medical imaging methods have likewise been piling up by the gigabyte.
On the Internet front, there are heaps of unofficial but nevertheless relevant unstructured data (such as discussions regarding symptoms, side effects, and medications) accumulating in blogs, forums, and social media.
All this data, when processed on Hadoop, can provide useful insights for improving patient care. For example, it can be integrated with real-time data from health monitors and used to alert physicians or nurses whenever complications are anticipated. It can also be used to spot symptoms or patterns of highly contagious diseases before they cause epidemic outbreaks.
3. Logistics
The logistics arena, being crowded with numerous data-producing players, including shippers, 3PL and 4PL logistics providers, freight forwarders, ocean freight carriers, trucking companies, rail transports, air cargo, airports, sea ports, train stations, and warehouses, is fast becoming a fertile ground for big data.
Many of these players have already established business process automation systems and are either collecting or spewing out data through online systems (e.g. for booking), EOBRs, RF tags, NFC tags and consumer mobile devices like smartphones and tablets.
By loading all that data into Hadoop and performing big data analytics on it, logistics providers can gain a deeper understanding regarding booking patterns as well as transit, dwelling, loading, unloading, and driving times. The information gained can then be used to establish just-in-time practices, minimize losses, reduce costs, streamline delivery, and improve supply chain processes.
4. Marketing
Targeted marketing campaigns are highly dependent on how much a marketer knows about his target audience. The good news is that there are so many sources out there where the marketer can get the information he needs. First, there are off-line sources such as POS systems, CRMs, direct mail responses, and coupon redemptions. Then there are online sources like Facebook, Twitter, online ad CTRs, browsing behavior, and geolocation systems.
That’s where the bad news lies. He’d probably have to sift through a mountain of data to find any relevant information. Since a large part of that data is unstructured, an HDFS cluster would be the most cost-effective staging area prior to analytics.
5. Media and Entertainment
With the inherently large file sizes of today’s HD movies and games, you’d think big data analytics in the Entertainment industry would come from them. Not exactly. Valuable business insights from big data in this particular industry are best gleaned online.
Think Facebook and Twitter. We can confidently say no industry comes close to generating the same volume of data Entertainment effortlessly whips up on social media platforms. Whether it’s a record-breaking opening weekend, a simple miscasting of Batman, or a twerky performance at the VMAs, these incidents can spark a blazing trail on social media in a matter of minutes. In just one day, you can easily gather a ton of data from a single hashtag.
The correct interpretation or misinterpretation of people’s reactions on social media can spell the difference between a potential blockbuster and a flop; between a big break and a catastrophic downward spiral. Of course, before any interpretation can be made, all relevant data must first be stored and processed in a suitable location. That’s where an HDFS cluster can come in handy.
6. Oil and Gas
When a regular person is asked to picture the oil and gas industry, what immediately comes to mind are massive mechanical behemoths like oil rigs, pipelines, and tankers. The oil and gas industry is characterized by behemoths all right, but not all of them are mechanical. In fact, this industry is largely sensor-driven. In other words, another aspect of its massiveness is data; specifically, large volumes of structured and unstructured data.
Like healthcare, the oil and gas industry deals with various data formats. 3D earth models, videos, well log data, and a host of machine sensor data are just some of the kinds of data this industry consumes on a daily basis. And like the other industries on this list, its data sets are extremely large.
A raw seismic data set generated during oil exploration can reach hundreds of gigabytes, which when processed can then amount to terabytes. It doesn’t end there. Drilling operations produce numerical sensor, log, and microseismic data. An entire oil field, with sensors sprawling everywhere, can generate petabytes of data.
But why collect (and subsequently, analyze) all this data? Finding, drilling, and processing oil costs millions of dollars. Hence, oil firms need to make sure each project is economically viable. An HDFS cluster can certainly help firms in both bringing costs down and providing a suitable platform for big data analytics.
7. Research
Data analysis has always been an essential part of research. But while research labs have long dealt with large amounts of data, those amounts were never anywhere near the order of magnitude that today’s laboratory equipment can churn out in a single run. A single experiment carried out on CERN’s Large Hadron Collider, for example, can churn out a million petabytes of raw data per year.
Since most research institutions aren’t as financially endowed as business establishments, it is necessary for them to invest in inexpensive but highly effective infrastructure. HDFS clusters, with their ability to store and process large amounts of data, can help researchers perform data analytics in a very cost-effective manner.
8. Retail
Like marketers, retailers need to have a good understanding of their customers in order to succeed. To streamline business processes, they have to get a firm grasp of their suppliers’ delivery practices as well. Fortunately, a good part of the information they need is already at their fingertips. It’s found in their voluminous collection of transaction data from orders, invoices, and payments. Just like in the marketing industry, this information can be augmented with data from social media streams.
9. Telecommunications
Telecommunications carriers and their trading partners are facing an onslaught of big data from two fronts. Leading the charge on the more visible front are the end users, about 5 billion-strong worldwide. Equipped with laptops, smartphones, tablets, and wearable devices, consumers are creating, storing and transmitting data at unimaginable rates.
Last year alone (2012), mobile data volume reached 0.9 exabytes per month. With an estimated CAGR of 66%, that volume is set to reach roughly 11 exabytes per month by 2017. If this is the first time you’ve encountered the term, one exabyte is one billion gigabytes, a quantity that was formerly unheard of.
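As a rough check on the projection, compounding the 2012 figure at the stated 66% annual growth rate for the five years to 2017 lands near 11 exabytes per month. The sketch below shows the arithmetic; the function name `project` and the decimal definition of an exabyte (one billion gigabytes) are choices made for this example.

```python
GB_PER_EB = 10 ** 9  # one exabyte expressed in gigabytes (decimal units)


def project(volume, annual_growth, years):
    """Compound `volume` at `annual_growth` (e.g. 0.66 for 66%) for `years` years."""
    return volume * (1 + annual_growth) ** years


start_eb_per_month = 0.9   # 2012 mobile data volume, from the text
cagr = 0.66                # compound annual growth rate, from the text
years = 2017 - 2012

projected = project(start_eb_per_month, cagr, years)
print(f"{projected:.1f} EB per month by 2017")
print(f"{projected * GB_PER_EB:.2e} GB per month")
```

The same function works for any starting volume and growth rate, so it can be reused to sanity-check other capacity forecasts before sizing a cluster.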
In the past, consumer mobile data only came from text and calls. Today’s data, on the other hand, comes from a diverse collection of SMS, calls, social media updates, video and music streaming, app downloads, web browsing, and online purchases. As telcos roll out ever larger bandwidths to meet the growing demand, data consumption in the mobile space is only going to get bigger.
With mobile usage increasing at the consumer end, data volumes are also growing on the other front, the provider side. Carriers are reaching milestone after milestone in the volume of call detail records (CDRs) and geolocation data they collect.
The wealth of information from all this data can be analyzed and used to streamline bandwidth consumption, improve customer satisfaction, and boost success rates of new products and services.
10. Transportation
In case you haven’t noticed, these industries are simply sorted alphabetically. So, being the last item on this list doesn’t mean the transportation industry generates the least amount of data.
Like the electric power and oil and gas industries, the transportation industry relies heavily on sensor data. Certain aircraft can already generate hundreds of gigabytes of data on a single flight. Practically every part of a large passenger plane, from the engine, to the flaps, down to the landing gear, constantly transmits vital information to monitoring systems to help ensure passenger safety.
Even land transportation such as trains and buses contributes to the data deluge through timetable systems, GPS, inductive-loop traffic detectors, and CCTVs. And like the other industries on this list, there’s a large volume of data from social media and booking sites as well. Assimilating all this data can reveal insights for improving safety, timeliness, and cost-effectiveness.