It’s easy to get caught up in the hype and opportunity of big data. However, one reason big data is so underutilized is that big data and its technologies also present many challenges. One survey found that 55% of big data projects are never completed. A second survey echoed this finding, reporting that the majority of on-premises big data projects aren’t successful. So what’s the problem with big data?
While Hadoop and its surrounding ecosystem of tools are lauded for their ability to handle massive volumes of structured and unstructured data, the software isn’t easy to manage or use. Because the technology is relatively new, many data professionals aren’t familiar with how to manage Hadoop. Add to that the fact that Hadoop frequently requires extensive internal resources to maintain, and many companies end up devoting most of their resources to the technology rather than to the actual big data problem they are trying to solve. In the survey mentioned above, 73% of respondents said that understanding the big data platform was the most significant challenge of a big data project.
With big data, it’s crucial to be able to scale up and down on demand. Many organizations fail to take into account how quickly a big data project can grow and evolve. Constantly pausing a project to add resources cuts into the time available for data analysis. Big data workloads also tend to be bursty, making it difficult to predict where resources should be allocated. The extent of this challenge varies by solution: a cloud solution will scale much more easily and quickly than an on-premises one.
Businesses are feeling the data talent shortage. Not only is there a shortage of data scientists, but successfully implementing a big data project also requires a sophisticated team of developers, data scientists and analysts who have enough domain knowledge to identify valuable insights. Many big data vendors seek to overcome this challenge by providing their own educational resources or by taking on the bulk of the management themselves.
Having more data doesn’t necessarily lead to actionable insights. A key challenge for data science teams is to identify a clear business objective and the appropriate data sources to collect and analyze to meet that objective. The challenge doesn’t stop there, however. Once key patterns have been identified, businesses must be prepared to act and make necessary changes in order to derive business value from them.
Data quality is not a new concern, but the ability to store every piece of data a business produces in its original form compounds the problem. Dirty data costs companies in the United States $600 billion every year. Common causes of dirty data that must be addressed include user input errors, duplicate data and incorrect data linking. Businesses need to be meticulous about maintaining and cleaning their data, and big data algorithms can also be used to help clean it.
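As a rough illustration of two of the causes named above, user input errors and duplicate data, here is a minimal Python sketch. The record fields (`name`, `email`), the sample rows and the validity rule are all hypothetical, chosen only to show the idea; a real pipeline would need rules suited to its own data.

```python
def normalize(record):
    """Tidy common input errors: stray whitespace and inconsistent casing."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }


def clean(records):
    """Return normalized records, dropping invalid rows and duplicates.

    Assumption: the email address identifies a person, so two rows with
    the same normalized email are treated as duplicates.
    """
    seen = set()
    cleaned = []
    for record in records:
        row = normalize(record)
        if "@" not in row["email"]:
            continue  # crude validity check, for illustration only
        if row["email"] in seen:
            continue  # duplicate of a row already kept
        seen.add(row["email"])
        cleaned.append(row)
    return cleaned


# Hypothetical sample data: one duplicate and one malformed email.
raw = [
    {"name": "Ada  Lovelace", "email": "ADA@example.com "},
    {"name": "ada lovelace", "email": "ada@example.com"},
    {"name": "Bob", "email": "bob-at-example.com"},
]

print(clean(raw))
```

Keeping the first occurrence of each email is a design choice made for simplicity; a production process might instead merge duplicate rows or flag them for review.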
Keeping that vast lake of data secure is another big data challenge. Specific challenges include:
It’s difficult to project the cost of a big data project, and given how quickly these projects scale, they can rapidly eat up resources. The challenge lies in taking into account all the costs of the project, from acquiring new hardware, to paying a cloud provider, to hiring additional personnel. Businesses pursuing on-premises projects must remember the cost of training, maintenance and expansion. Businesses pursuing cloud-based projects must carefully evaluate the service-level agreement with the provider to determine how usage will be billed and whether there will be any additional fees.
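To make the effect of growth on cost concrete, here is a minimal Python sketch of a projection that adds one-time costs (such as hardware or training) to a monthly cost that compounds as the project scales. The function name, parameters and all figures are hypothetical assumptions for illustration, not real pricing or a recommended budgeting method.

```python
def project_cost(monthly_base, monthly_growth, months, one_time=0.0):
    """Estimate total cost over a number of months.

    monthly_base   -- cost in the first month (hypothetical figure)
    monthly_growth -- fractional growth per month, e.g. 0.10 for 10%
    months         -- number of months to project
    one_time       -- up-front costs such as hardware or training
    """
    total = one_time
    cost = monthly_base
    for _ in range(months):
        total += cost
        cost *= 1 + monthly_growth  # next month's cost after growth
    return total


# Flat usage versus usage growing 10% a month, over one year.
flat = project_cost(1000.0, 0.0, 12, one_time=5000.0)
growing = project_cost(1000.0, 0.10, 12, one_time=5000.0)
print(f"flat: {flat:.2f}, growing: {growing:.2f}")
```

Even a modest monthly growth rate roughly doubles the recurring portion of the bill over a year in this sketch, which is why the growth assumption deserves as much scrutiny as the starting price.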
While the number of big data challenges can be overwhelming, these challenges also present an opportunity. Businesses that are able to identify the right infrastructure for their big data project and follow best practices for implementation will see a significant competitive advantage. Entrepreneurs have also capitalized on big data technology to create new products and services.