How to Choose a Big Data-as-a-Service Company
- By Jonathan Buckley
- October 29, 2015
The world of big data is all around us. Transactions, sensors, social media, mobile devices, wearables, and a host of other sources are generating datasets of unprecedented volume, velocity and variety.
This big data explosion presents enormous opportunities for organizations that are able to capture, manage, and analyze massive volumes of disparate data for insights that can drive decisions and create competitive advantage. Still, the big challenge with big data is finding innovative technological solutions that can pick up where traditional databases and existing scalable architectures leave off.
That’s where Big Data-as-a-Service (BDaaS) comes in.
Technology delivered as a service is not new. Software as a Service (SaaS), Platform as a Service (PaaS), and Data as a Service (DaaS) are a few of the many solutions offered by third-party vendors. BDaaS takes things a step further, combining these tools and applying them to massive data sets to help organizations large and small meet today’s big data demands in a cost-effective manner.
What’s so great about Big Data-as-a-Service?
While Hadoop has made it possible for organizations to analyze data using commodity hardware and open source software, the costs of launching a big data initiative can still be substantial, not to mention the ongoing investment of time and resources needed to store and manage large data sets. In contrast, BDaaS allows organizations to outsource a variety of big data functions to the cloud and pay only for the compute power they require. This out-of-the-box, on-demand solution eliminates many of the costs associated with a Hadoop deployment and allows organizations to focus on gaining actionable big data insights to drive business growth. Security is less uniform: BDaaS providers vary greatly in their security strategies and in how closely they work with security experts.
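To make the pay-only-for-what-you-use argument concrete, here is a minimal Python sketch of the arithmetic. Every figure and function name in it is a hypothetical placeholder chosen for illustration, not a real vendor price or API; substitute your own quotes before drawing any conclusions.

```python
# Rough cost comparison: an always-on cluster vs. pay-per-use BDaaS.
# All names and numbers are hypothetical placeholders, not vendor pricing.

def always_on_cost(monthly_fixed_cost: float, months: int) -> float:
    """Cost of a cluster that is paid for whether or not it is busy."""
    return monthly_fixed_cost * months


def pay_per_use_cost(node_hour_rate: float, nodes: int,
                     hours_per_month: float, months: int) -> float:
    """Cost when you are billed only for the node-hours you actually run."""
    return node_hour_rate * nodes * hours_per_month * months


def break_even_hours(monthly_fixed_cost: float, node_hour_rate: float,
                     nodes: int) -> float:
    """Monthly usage (in hours) at which both models cost the same."""
    return monthly_fixed_cost / (node_hour_rate * nodes)


if __name__ == "__main__":
    # Hypothetical inputs: 1,000 per month fixed, 0.50 per node-hour,
    # a 10-node cluster used 40 hours a month, over one year.
    print(always_on_cost(1000.0, 12))
    print(pay_per_use_cost(0.5, 10, 40, 12))
    print(break_even_hours(1000.0, 0.5, 10))
```

The break-even figure is the useful one: workloads that run fewer hours per month than that threshold favor the on-demand model, while a cluster that is busy around the clock may not.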
Which BDaaS Company is right for your organization?
For those looking to bring the power of big data analytics to their organizations, here are some things to consider when choosing a Big Data-as-a-Service provider.
Where is your data?
Starting afresh or moving your big data initiatives to the cloud can be a time-consuming and risky process. Some BDaaS vendors require you to move data to their own storage systems, while others can leverage public cloud infrastructure providers and their ecosystems of compatible products. When deciding where and how to store data for use with a BDaaS solution, keep in mind your needs around data ownership, compatible tools, and vendor lock-in risks.
What are your organization’s big data needs?
The first step in choosing a BDaaS provider is to identify your organization’s actual big data requirements with respect to capturing, managing and analyzing data. For example, organizations with large volumes of primarily structured data would do well to stick with a traditional database solution. On the other hand, organizations with massive volumes of rich unstructured data streaming in from multiple sources should consider BDaaS in order to gain the valuable insights that only this type of data holds. If your organization falls into this category, you will need to determine exactly which big data functions should be outsourced before beginning the BDaaS provider selection process.
What types of BDaaS are there?
Cloud-based BDaaS offerings currently fall into one of three competing types:
1. Core BDaaS – Around for a number of years now, the Core BDaaS offering employs a minimal platform of Hadoop with YARN and HDFS and a few other services such as Hive. This service has found favor with many companies that use it as part of a larger architecture or for irregular workloads.
A prominent example of Core BDaaS is Amazon Web Services’ Elastic MapReduce (EMR), which integrates readily with the NoSQL store DynamoDB, S3 storage, and other services. The generic nature of Amazon’s EMR service allows companies to combine other services around it to build anything from data pipelines to full company infrastructures.
2. Performance BDaaS – As the name implies, this service is focused on helping companies already working with Hadoop to streamline their infrastructure and optimize Hadoop performance. Companies that are growing rapidly and find themselves limited by scale and complexity, yet are hesitant to take on the formidable task of building their own data architecture and related SaaS layer, are a good fit for Performance BDaaS. By outsourcing their infrastructure and platform needs to a provider, companies can focus on the domain-specific processes that add value while eliminating many of the headaches associated with complex big data deployments.
3. Feature BDaaS – For organizations that require features beyond those offered in the common Hadoop ecosystem, Feature BDaaS is worth consideration. Focusing on productivity and abstraction, the feature-driven BDaaS approach used by Qubole and other cloud-based Hadoop providers is designed to get users up and running with big data quickly, easily and economically. Qubole’s BDaaS offering accomplishes this through web and programming interfaces and database adapters that effectively put Hadoop technologies behind the curtain by starting, scaling and stopping Hadoop clusters transparently as the workload requires.
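The idea of starting, scaling and stopping clusters as the workload requires can be illustrated with a small Python sketch. This is not Qubole’s algorithm or any vendor’s API; the function name, the queue-depth heuristic, and the default limit are all assumptions made purely to show the shape of such a policy.

```python
import math

# Hypothetical autoscaling policy: size the cluster from the pending workload.
# The heuristic and the defaults are illustrative assumptions, not a vendor's logic.

def target_node_count(pending_tasks: int, tasks_per_node: int,
                      max_nodes: int = 20) -> int:
    """Return how many nodes the cluster should have right now."""
    if pending_tasks <= 0:
        # No work queued: stop the cluster entirely.
        return 0
    # Enough nodes to cover the queue, rounded up.
    needed = math.ceil(pending_tasks / tasks_per_node)
    # At least one node once there is work, never more than the cap.
    return max(1, min(needed, max_nodes))


if __name__ == "__main__":
    for queue_depth in (0, 5, 17, 1000):
        print(queue_depth, target_node_count(queue_depth, tasks_per_node=8))
```

A real service would also account for start-up latency, running tasks, and billing granularity; the point here is only that the user submits work and the platform decides the cluster size.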
Choosing a BDaaS company is no small undertaking. Before the process begins, it’s important to make sure that all decision-makers within your organization are fully on board with your big data initiative. In selecting BDaaS candidates, try to find those that have a proven track record in your industry and therefore have the expertise to interpret results correctly and in context.
Once the selection is made, it’s crucial to fully understand the vendor’s security policies and procedures for keeping your corporate data safe before entering into a legally binding contract. You’ll also need to weigh the risks and benefits of private systems against those of a solution built on public cloud. Finally, it’s best to start small, making sure that the data is clean, the metrics are right, and the results are accurate before tackling a larger and more complex big data project.