Blog

 
 

5 Tips for Boosting Public Cloud Security

  • By Ari Amster
  • February 11, 2016
 

It’s a long-held belief that data stored on-premises is a lot more secure than data stored in the public cloud. However, that may not be the case. While cloud security concerns have been around as long as cloud computing has existed, cloud providers have gone to great lengths to address them, improving their […]

 
Read More..

Hadoop Happenings: Warning Signs

  • By Ari Amster
  • February 9, 2016
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week articles discussed warning signs that a Hadoop cluster is underperforming, predictions for Hadoop’s second decade and how big data is used for humanitarian aid. Read the full stories below. 1. Why Most Business Intelligence Tools Fail the ‘Hadoop Test’ Information-Management.com- […]

 
Read More..

Our own Swati Singhi at the Grace Hopper Celebration

  • By Xing Quan
  • February 8, 2016
 

Swati Singhi, a Member of the Technical Staff at Qubole, was recently featured as a speaker at the Grace Hopper Celebration of Women in Computing, held in Bangalore, India. The Grace Hopper Celebration is the world’s largest technical conference for women in computing, and it is designed to bring the research and career interests of […]

 
Read More..

Optimizing S3 Bulk Listings for Performant Hive Queries

  • By Amogh Margoor
 

Introduction: We previously wrote about the optimizations we made to Hadoop and Hive on S3. Since then, we’ve applied those same changes across the rest of our Big Data analytics offerings, including Spark and Presto. Today, we’ll discuss some recent optimizations we’ve made to make querying data even more performant and efficient for […]

 
Read More..

Infographic: Big Data Belongs in the Cloud

  • By Xing Quan
  • February 4, 2016
 

Big Data infrastructure is complex, difficult to build and operate, and often requires highly specialized talent to maintain. To alleviate these challenges, businesses are turning to the cloud to provide simplicity, flexibility and agility. The graphic below highlights Qubole customers’ leadership due to the ease of administration, scaling, lifecycles, flexibility, and costs. Qubole […]

 
Read More..

CIO Focus 2016: Technology and Team Management

  • By Ari Amster
  • February 3, 2016
 

In today’s world of big data, information technology is advancing at unprecedented rates. This presents some major challenges for organizations in general, and CIOs in particular, as they search for ways to boost growth and profits in the face of mounting competition. Not long ago the terms “big data” and “competitive advantage” were dismissed as […]

 
Read More..

Hadoop Happenings: Happy Birthday Hadoop!

  • By Ari Amster
  • February 2, 2016
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week, industry thought leaders weighed in on Hadoop’s 10th birthday. Multiple posts addressed potential big data use cases, including applications of geospatial data. See the full stories below. 1. Hadoop turns 10, Big Data industry rolls along ZDNet.com- Hadoop’s founder Doug Cutting […]

 
Read More..

Big Data’s Moment in the Cloud Has Been Acknowledged

  • By Xing Quan
  • January 29, 2016
 

We were delighted to see the announcement of the latest version of Cloudera Director, and a corresponding write-up on Curt Monash’s DBMS2 blog. The industry’s movement toward cloud-optimized features, such as support for Spot Instances and dynamic creation and termination of clusters, validates the direction that we’ve set for our company and product. Qubole’s […]

 
Read More..

Cassandra vs. Hadoop: A Comparative Look

  • By Ari Amster
  • January 28, 2016
 

Technology is reshaping our world. The proliferation of mobile devices, the explosion of social media, and the rapid growth of cloud computing have given rise to a perfect storm that is flooding the world with data. The challenge for enterprises is that, according to Gartner estimates, 80 percent of this “big data” is unstructured, and […]

 
Read More..

Hadoop Happenings: Hadoop Just Getting Started

  • By Jonathan Buckley
  • January 26, 2016
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week a new Hadoop survey was released, Qubole secured additional funding, and PCWorld covered why open source is the new normal. Read the full stories below. 1. Why open source is the ‘new normal’ for big data PCWorld.com- Talend’s CEO believes […]

 
Read More..

Building a Collaborative Team With Data Scientists, Business Analysts, and Developers

  • By Jonathan Buckley
  • January 21, 2016
 

This blog post originally appeared on the Import.io blog. Start the new year off right by making sure your Big Data team is aligned. It is the goal of many business leaders to effectively utilize big data analytics to improve their companies. That means having the best people on the job as part of a […]

 
Read More..

Qubole Closes $30 Million Investment to Extend Leadership in Big Data in the Cloud

  • By Jonathan Buckley
  • January 20, 2016
 

IVP leads Series C financing along with existing investors CRV, Lightspeed Venture Partners and Norwest Venture Partners. Qubole, the big data-as-a-service company, today announced that it has closed a $30 million Series C financing, bringing its total funding to $50 million. IVP led the financing and General Partner Somesh Dash will join the Qubole board […]

 
Read More..

Hadoop Happenings: Spark on the Rise

  • By Jonathan Buckley
  • January 19, 2016
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week surveys indicated a growing interest in Spark deployment, and Datanami discussed the growing SQL on Hadoop market. Read the full stories below. 1. How Barclays is cashing in on big data & Hadoop to stay ahead in fintech CBROnline.com- Head […]

 
Read More..

Meetup: Machine Learning at Scale Using Spark and Hive

  • By Jonathan Buckley
  • January 14, 2016
 

A large crowd recently attended the Boulder/Denver Big Data Meetup group hosted by Oracle where experts from Qubole discussed their latest findings from a real world case study. The evening’s presentations were titled “Case Study: Machine Learning at Scale using Spark and Hive” and detailed practical ways businesses can implement machine learning techniques using the […]

 
Read More..

Hadoop Happenings: Cognitive Analytics in 2016

  • By Jonathan Buckley
  • January 12, 2016
 

Grab the latest news and commentary on Hadoop in this week’s Hadoop Happenings. This week articles offered predictions on the future of big data and cognitive computing and discussed why Hortonworks’ stock value continues to falter. Read the full stories below. 1. 16 for ’16: What you must know about Hadoop Spark right now InfoWorld.com- […]

 
Read More..

Building Qubole: Metrics and Alerts

  • By Rajat Venkatesh
  • January 11, 2016
 

In this blog post, we’ll show you how we collect metrics and set up alerts to ensure the availability of Qubole Data Service (QDS). QDS Architecture: Before getting into the details about monitoring, we’ll give a quick introduction to the QDS architecture. QDS runs and manages Hadoop/Spark/Presto clusters in our customers’ AWS, GCP, […]

 
Read More..

5 Signs You’re Failing at Data Science

  • By Jonathan Buckley
  • January 7, 2016
 

Most businesses understand that big data analytics is where it’s at. They view data science as the one new thing they need to truly improve their operations and become even more successful as an organization. The problem, though, is that too many companies are failing at data science. One report from PricewaterhouseCoopers (PwC) and […]

 
Read More..

Hadoop Happenings: Data Governance

  • By Jonathan Buckley
  • January 5, 2016
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week articles focused on data governance and getting the most out of a Hadoop deployment. InformationWeek offered its predictions for the coming year. See the full stories below. 1. Data governance process taxed by self-service BI, big data Techtarget.com- Data governance […]

 
Read More..

The Public Cloud Market Continues to Expand

  • By Jonathan Buckley
  • December 31, 2015
 

Businesses are truly coming around to all that cloud computing has to offer. While the cloud has been around for years, only recently has it reached levels of popularity where it isn’t hyperbole to refer to it as a global phenomenon. This has led to many companies taking advantage of the public cloud’s many benefits, […]

 
Read More..

Hadoop Happenings: Looking to 2016

  • By Jonathan Buckley
  • December 29, 2015
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. It was a short news week due to the holiday. This week stories covered the expected growth of big data analytics and the growing data science talent shortage. Read the full stories below. 1. The Top 3 Big Data Trends of 2016 […]

 
Read More..

Qubole Appoints Jonathan Trail as Vice President of Customer Success

  • By Jonathan Buckley
  • December 22, 2015
 

Qubole, the big data-as-a-service company, today announced that it has appointed Jonathan Trail as Qubole’s first Vice President of customer success. As VP of customer success, Trail will work closely with Jonathan Buckley, SVP of marketing, and Marcy Campbell, SVP of worldwide sales and business development. Together, they will work to continue the company’s […]

 
Read More..

Hadoop Happenings: New Performance Benchmark

  • By Jonathan Buckley
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week, the Transaction Processing Performance Council released a new performance benchmark, a podcast discussed big data in the election season, and an article explored how eBay uses big data. 1. Hadoop in Banking: Changing the Game PredictiveAnalyticsWorld.com- Big data has multiple […]

 
Read More..

Apache Spark vs. Hadoop: Which Big Data Framework Is the Best Fit?

  • By Jonathan Buckley
  • December 17, 2015
 

In the early days of big data, Apache Hadoop wasn’t just the “elephant in the room”, as some have called it. Hadoop was the room. But that is all changing as Hadoop moves over to make way for Apache Spark, a newer and more advanced big data tool from the Apache Software Foundation. There’s no […]

 
Read More..

Qubole Ignites Apache Spark on Google Cloud Platform

  • By Jonathan Buckley
 

Qubole, the big data-as-a-service company, today announced the availability of Apache Spark on Qubole Data Service (QDS) for Google Cloud Platform. The integration will enable Google Cloud Platform customers to use QDS’s 1-click persistent Spark Notebooks for fast data analysis, and auto-scale Spark clusters that deliver the right compute power for specific workloads. Qubole Data […]

 
Read More..

Getting started with Spark on QDS for Google Cloud Platform

  • By Ashish Sachdeva
 

Starting today, Qubole Data Service (QDS) users can launch Auto-scaling Spark Clusters and 1-click Persistent Notebooks to analyze data persisting in Google Cloud Storage. To set up a trial account, follow the instructions in our Google Cloud Platform Quick Start Guide. With auto-scaling, you no longer need to manually set the cluster size to achieve […]

 
Read More..

Hadoop Happenings: Apache Kylin

  • By Jonathan Buckley
  • December 15, 2015
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week, Apache Kylin was moved to top-level status, discussion continued on Apache Spark vs. Hadoop, and Forbes offered big data predictions for the coming year. See the full stories below. 1. CIO Explainer: What is Hadoop? WSJ.com- This post provides a brief […]

 
Read More..

Keeping Big Data Safe: Common Hadoop Security Issues and Best Practices

  • By Jonathan Buckley
  • December 10, 2015
 

The big data explosion has given rise to a host of information technology tools and capabilities that enable organizations to capture, manage and analyze large sets of structured and unstructured data for actionable insights and competitive advantage. But with this new technology comes the challenge of keeping sensitive information private and secure. Big data that […]

 
Read More..

Hadoop Happenings: Adoption Barriers

  • By Jonathan Buckley
  • December 8, 2015
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week articles addressed Hadoop’s complexity and barriers to Hadoop adoption. See the full stories below. 1. Why Spark and Hadoop are Both Here to Stay ReadWrite.com- This post debunks common myths about Hadoop including that Spark will replace Hadoop. Read More […]

 
Read More..

Where’s the Value in Big Data—Storage or Apps?

  • By Jonathan Buckley
  • December 3, 2015
 

Big data has become a big industry. The lofty promise of big data analytics to deliver actionable insights and create competitive advantage is being realized. And organizations that once dismissed the idea of implementing a big data strategy are giving it a second look as they consider the benefits of capturing, managing and analyzing mountains […]

 
Read More..

Hadoop Happenings: Optimal Big Data Platform

  • By Jonathan Buckley
  • December 1, 2015
 

Grab the latest news and commentary on Hadoop in this week’s Hadoop Happenings. This week articles discussed the managerial challenges of using Hadoop, overall job satisfaction among data scientists and survey data indicating continued growth in Hadoop adoption. See the full stories below. 1. Three Reasons Why I Love Hadoop, and You Should Too! SupplyChainshaman.com- […]

 
Read More..

Hadoop Happenings: Personalized Medicine

  • By Jonathan Buckley
  • November 24, 2015
 

Grab the latest news and commentary about Hadoop in the latest Hadoop Happenings. This week articles explored various big data use cases from personalized medicine to mapping the waters around Antarctica. See the full stories below. 1. Spark or Hadoop: Which is the Best Big Data Framework? DataScienceCentral.com- This post discusses the key differences between […]

 
Read More..

The Main Types of Big Data Vendors: A Comparative Look

  • By Jonathan Buckley
  • November 19, 2015
 

The big data boom has given rise to a host of vendors, each promoting their own unique ways of meeting the growing data demands of today’s businesses. As a result, businesses seeking a big data solution have a fairly long list of big data vendors to choose from. Selecting the right vendor is both a […]

 
Read More..

Hadoop Happenings: Thick Data

  • By Jonathan Buckley
  • November 17, 2015
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week a new study was released on the state of big data jobs, articles focused on boosting big data security, and one post promoted combining big data with thick data. See the full stories below. 1. Top 10 Priorities for a […]

 
Read More..

Share RDDs Across Jobs with Qubole’s Spark Job Server

  • By Rohit Agarwal
  • November 16, 2015
 

When we launched our Spark as a Service offering in February, we designed it to run production workloads. Users would write standalone Spark applications and run them via our UI or API. We then enhanced the offering by adding support for running these standalone Spark applications on a schedule using our scheduler or as part […]

 
Read More..

4 Tips For Breaking Down Data Silos

  • By Jonathan Buckley
  • November 12, 2015
 

Companies are eager to use big data analytics to improve their business operations, but many have found that fully implementing the strategy is extremely difficult. Granted, big data can be complex, but many of the challenges businesses have encountered have nothing to do with big data itself. The real problem lies in the organizational structure […]

 
Read More..

Hadoop Happenings: Most Failing at Big Data

  • By Jonathan Buckley
  • November 10, 2015
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week’s articles addressed the surging opportunities presented by big data technology, coupled with the challenges of hiring big data talent and keeping big data secure. Read the full stories below. 1. Big Data Knowledge Base: Hadoop, Spark, Flink SparkBigData.com- This post provides […]

 
Read More..

Building Blocks of a Data-Driven Organization

  • By Jonathan Buckley
  • November 5, 2015
 

Organizations have seen the value that big data can add. It’s no mistake that so many businesses have chosen to adopt big data solutions in recent years, since the potential those solutions bring can be monumental. Success always seems right around the corner when using big data, but too often, success can be hard to […]

 
Read More..

Share Data Across Accounts with Data Exchange

  • By Xing Quan
  • November 4, 2015
 

This post was written by Vikram Agrawal and Aswin Anand, who are both lead engineers at Qubole. Qubole has the concept of users and accounts. While customers sign in as a single user, they can also belong to one or more accounts. This account segregation provides some nice logical separation for compute clusters and metadata. […]

 
Read More..

Hadoop Happenings: Applications Platform

  • By Jonathan Buckley
  • November 3, 2015
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week’s articles focused on applications of Hadoop in energy and agriculture and the growth of Hadoop in the enterprise. Read the full stories below. 1. The Real Scoop on Hadoop SAS.com- Cloudera’s Mike Olson discusses the latest trends and changes to Hadoop. […]

 
Read More..

Introducing Hadoop, Spark, and Presto Clusters With Zero Local Disk Storage

  • By Sourabh Goyal
  • November 1, 2015
 

We’re excited to announce that Qubole can now run Hadoop, Spark, and Presto clusters with zero local disk storage. We now support AWS M4 and C4 instance types, which do not include local disk storage and instead utilize either S3 (for long-lived data) or EBS (network-attached disk storage for holding intermediate and temporary data) for […]

 
Read More..

How to Choose a Big Data-as-a-Service Company

  • By Jonathan Buckley
  • October 29, 2015
 

The world of big data is all around us. Transactions, sensors, social media, mobile devices, wearables, and a host of other sources are generating datasets of unprecedented volume, velocity and variety. This big data explosion presents enormous opportunities for organizations that are able to capture, manage, and analyze massive volumes of disparate data for insights […]

 
Read More..

Hadoop Happenings: APM Market

  • By Jonathan Buckley
  • October 27, 2015
 

Grab the latest news and commentary about Hadoop in one place with this week’s Hadoop Happenings. This week, Gartner argued businesses will find value in algorithms, not data. InformationWeek addressed the rise of the APM market, and several posts addressed big data use cases. See the full stories below. 1. Big data is useless […]

 
Read More..

A 5-Minute Guide on How NOT to do Big Data

  • By Jonathan Buckley
  • October 22, 2015
 

The verdict is in. Big data is delivering big benefits to businesses large and small. It’s little wonder that more and more organizations are anxious to dive into vast stores of data to extract hidden insights and gain competitive advantage. But big data adoption comes with a caveat—do it right or don’t do it at […]

 
Read More..

Hadoop Happenings: Hadoop in action

  • By Jonathan Buckley
  • October 20, 2015
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week commentary focused on Hadoop in action. Walmart discussed its plans for Hadoop, and an article on CIO.com discussed Hadoop’s impact on the insurance industry. See the full stories below. 1. Turning data scientists into action heroes: The rise of self-service […]

 
Read More..

A 5-Minute Guide to Apache Spark

  • By Jonathan Buckley
  • October 14, 2015
 

When it comes to big data tools, more than a few have peculiar names. You’ve got Hadoop, Hive, MongoDB, Pig, Presto—the list of quirky words goes on. And then there’s Apache Spark, which sounds a lot like the name of a ’60s rock band. In reality, “Spark” is a formidable big data processing engine that’s […]

 
Read More..

Hadoop Happenings: Expanding Use Cases

  • By Jonathan Buckley
  • October 13, 2015
 

Grab the latest news and commentary about Hadoop in one place with this week’s Hadoop Happenings. This week, commentary focused on big data use cases from marketing and healthcare to agriculture and pharmaceuticals. Read the full stories below. 1. Big Data & Brews from Strata NY 2015: Tony Baer on Spark in the Hadoop Ecosystem Datameer.com- […]

 
Read More..

Hadoop Happenings: Strata + Hadoop World

  • By Jonathan Buckley
  • October 6, 2015
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week thousands flocked to Strata + Hadoop World, Uber revealed its big data architecture, and Pivotal open sourced some of its big data technology. Read the full stories below. 1. Pivotal open sources tech for SQL and machine learning on Hadoop […]

 
Read More..

Interning at Qubole: What I Learned From Working on Hive, Spark, and Sqoop

  • By Xing Quan
  • October 5, 2015
 

This is a guest post from Akhilesh Anandh, who was an engineering intern with us. My journey with Qubole began in January 2015, when I joined as an intern for 6 months (my final semester of college) under the PS-2 programme of my alma mater BITS Pilani. I spent another 2 months at Qubole from […]

 
Read More..

Webinar Recap: Democratizing Big Data

  • By Jonathan Buckley
  • October 1, 2015
 

Big data projects can offer a lot to businesses, and the impact that comes from them may affect every employee. Managing and amplifying that impact becomes a vital step to big data success, and the best way to achieve it is to make the data widely accessible within the organization, while also making sure it […]

 
Read More..

Hadoop Happenings: Product Announcements

  • By Jonathan Buckley
  • September 29, 2015
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week Google and Cloudera announced new product offerings, and BlueCross discussed its big data initiative seeking to bring more transparency to the healthcare industry. See the full stories below. 1. Google Launches Cloud Dataproc, A Managed Spark and Hadoop Big Data […]

 
Read More..

A 5-Minute Guide to Big Data

  • By Jonathan Buckley
  • September 24, 2015
 

Big data: Ask 5 average people what it is and you’re bound to get several different answers—and at least one glazed-over expression. That’s not surprising. When the term was first being tossed around in the analytics field several years ago, a lengthy debate began about what “big data” was all about. Today that debate […]

 
Read More..

Hadoop Happenings: Back to Basics

  • By Jonathan Buckley
  • September 22, 2015
 

Grab the latest news and commentary about Hadoop in one place with this week’s Hadoop Happenings. This week articles returned to the basics with a focus on which tools to use when, turning data insight into action and what the definition of big data really is. See the full stories below. 1. Why Hadoop Matters […]

 
Read More..

A 5-Minute Guide to Big Data Tools

  • By Jonathan Buckley
  • September 17, 2015
 

Hadoop, Hive, Spark, Presto, Pig, NoSQL—these are words you’d expect to find in a whimsical Dr. Seuss tale. In fact, they are the names of powerful tools found in a world once thought to be just as nonsensical as any story Dr. Seuss could dream up—the world of Big Data. For those who would like […]

 
Read More..

Hadoop Happenings: Will Spark Replace MapReduce?

  • By Jonathan Buckley
  • September 15, 2015
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week Cloudera announced it plans to replace MapReduce with Apache Spark, Pinterest open-sourced a replacement for HBase, and Airbnb opened up about its internal infrastructure. See the full stories below. 1. Cloudera Plans to Replace Hadoop MapReduce with Apache Spark Fortune.com- […]

 
Read More..

Will Hadoop Consume Your Company?

  • By Jonathan Buckley
  • September 10, 2015
 

The difficulties and challenges of managing a big data project are many, and unfortunately, failure is the result more often than not. Though businesses may give their projects all their attention, certain shortcomings will usually lead to them struggling to achieve their goals. This can be seen in a recent study from Capgemini, which discovered […]

 
Read More..

Hadoop Happenings: Building Successful Deployments

  • By Jonathan Buckley
  • September 8, 2015
 

Grab the latest news and commentary about Hadoop in one place with this week’s Hadoop Happenings. This week articles discussed attributes of successful Hadoop deployments, determining ROI for big data technology, and how big data is impacting every industry. See the full stories below. 1. SAP brings Hadoop into the Hana fold with Vora, a […]

 
Read More..

Announcing Support for AWS IAM Roles

  • By Xing Quan
  • September 3, 2015
 

We’re excited to announce support for Identity and Access Management (IAM) Roles for delegating permissions and access to Qubole. IAM Roles are a security best practice on AWS. Customers no longer need to provide access and secret keys to Qubole, making access control more secure. Here’s some background on why Qubole requires access to our […]

 
Read More..

5 Ways to Leverage Social Media Data For Your Business

  • By Jonathan Buckley
 

Big Data is transforming the business world. The ability to capture, manage and analyze massive volumes of unstructured data for insights that lead to competitive advantage is a game-changer for businesses large and small. With the explosion of social media, never-ending streams of data flowing in from Facebook, Twitter, Pinterest, and other social sites […]

 
Read More..

Hadoop Happenings: Rethinking Enterprise Search

  • By Jonathan Buckley
  • September 1, 2015
 

Grab the latest news and commentary about Hadoop in one place with this week’s Hadoop Happenings. This week articles focused on big data’s growth across industries, data security and governance and improving enterprise search. See the full stories below. 1. Are you a data hoarder? Hadoop offers little choice InfoWorld.com- Data governance tools are being […]

 
Read More..

Causes of Dirty Data and How to Combat Them

  • By Jonathan Buckley
  • August 27, 2015
 

By now, most businesses understand the appeal of using big data analytics. With big data, companies can improve their efficiency, increase productivity, and gain valuable insights that drive their work forward. Few will deny the important role big data now plays in organizations all over the world, but gaining those unique benefits requires having high […]

 
Read More..

Hadoop Happenings: Spark won’t Die

  • By Jonathan Buckley
  • August 25, 2015
 

Grab the latest news and commentary about Hadoop in one place with this week’s Hadoop Happenings. This week posts focused on why Spark will continue to grow, case studies within HR and retail, and Hortonworks’ recent acquisition of Onyara. See the full stories below. 1. Hortonworks buys better Hadoop data flow management InfoWorld.com- Hortonworks has […]

 
Read More..

Multi-tenant Job History Server for Ephemeral Hadoop and Spark Clusters

  • By Rohit Agarwal
 

Introduction Qubole Data Service (QDS) allows users to configure logical Hadoop and Spark clusters that are instantiated when required. These clusters auto-scale according to the workload and shut down automatically when there is a period of inactivity, resulting in substantial cost savings. This feature, however, presents an additional challenge for supporting and debugging logs. For […]

 
Read More..

The Benefits of Decoupling Storage and Compute

  • By Jonathan Buckley
  • August 20, 2015
 

Big data has come to drive competitive advantage in nearly every type of business, and the need to gather and analyze enormous amounts of data has become extremely important. To make the most of big data, many companies are utilizing big data platforms capable of sorting all that information into actionable data. Such platforms […]

 
Read More..

Infographic: 5 Crucial Considerations for Big Data Adoption

  • By Jonathan Buckley
 

Big data has the potential to enhance, evolve and drive business, but big data adoption must be carefully planned and executed in order to be effective. The graphic below highlights 5 crucial factors that all organizations should take into account before selecting a big data vendor. Are you interested in big data adoption? Check out […]

 
Read More..

Hadoop Happenings: SQL-on-Hadoop Evaluation

  • By Jonathan Buckley
  • August 18, 2015
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week LinkedIn open-sourced its Hadoop plugin, Pearson offered a SQL-on-Hadoop evaluation, and MapR’s Ted Dunning weighed in on open source projects that aren’t really open. Read the full stories below. 1. Hadoop: What is it and Why Does it Matter? SAS.com- […]

 
Read More..

5 Best Practices for Big Data Project Management

  • By Jonathan Buckley
  • August 13, 2015
 

Big data has gone mainstream. The constant, exponential growth of volumes of structured and unstructured data has significantly increased the number of big data projects, especially over the last few years. Thanks to the increased availability of the open-source Hadoop analytics platform, and the growth of big data-in-the-cloud services, big data’s barriers […]

 
Read More..

SQL-On-Hadoop Evaluation by Pearson

  • By Nate Philip
 

This is a guest post written by Sumit Arora, Lead Big Data Architect at Pearson, and Asgar Ali, Senior Architect at Happiest Minds Technologies Pvt. Ltd. About Pearson: Pearson is the world’s leading learning company, with 40,000 employees in more than 80 countries working to help people of all ages to make measurable progress in […]

 
Read More..

Hadoop Happenings: Navigating Hadoop’s Ecosystem

  • By Jonathan Buckley
  • August 11, 2015
 

Grab the latest news and commentary about Hadoop in one place with this week’s Hadoop Happenings. This week articles focused on navigating the complex Hadoop ecosystem and understanding the differences between Hadoop and Spark. See the full stories below. 1. Big data startup Platfora appoints Jason Zintak as CEO; founder Ben Werther steps down VentureBeat.com- […]

 
Read More..

5 Factors That Impact the Performance of Your Big Data Project

  • By Jonathan Buckley
  • August 6, 2015
 

The drive to make the most out of big data is in full swing, with companies eagerly looking into big data analytics tools designed to get the most out of the valuable information they are collecting. The insights gained from proper analysis of big data can lead to big dividends later on, but getting to […]

 
Read More..

Hadoop Happenings: Compliance and Dirty Data

  • By Jonathan Buckley
  • August 4, 2015
 

Grab the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week articles focused on meeting regulatory compliance, avoiding dirty data, and the role of bias in machine learning. See the full stories below. 1. Lack of Legacy Lets Capital One Build Nimble Infrastructure ThePlatform.net-Capital One has always relied on data analytics […]

 
Read More..

Top Apache Spark Use Cases

  • By Jonathan Buckley
  • July 30, 2015
 

Apache Spark is quickly gaining steam both in the headlines and real-world adoption. UC Berkeley’s AMPLab developed Spark in 2009 and open sourced it in 2010. Since then, it has grown to become one of the largest open source communities in big data with over 200 contributors from more than 50 organizations. This open source […]

 
Read More..

Qubole’s Big Data as a Service Platform Gains Rapid Traction in Mobile Data Applications

  • By Nate Philip
 

MOUNTAIN VIEW, Calif.—July 30, 2015—Qubole, the big data-as-a-service company founded by the team that developed Facebook’s data infrastructure, today reported rapid adoption of its self-service big data analytics platform for mobile applications in the first half of 2015. The Qubole big data as a service platform processes data stored on the three major public clouds: […]

 
Read More..

Hadoop Happenings: Big Data Is Doing The Thinking For Us

  • By Jonathan Buckley
  • July 27, 2015
 

Grab the latest news and commentary about Hadoop in one place with this week’s Hadoop Happenings. This week’s articles focus on big data’s growing ability to think for us and the safety precautions every company should be taking. 1. What Big Data Strategists Can Learn From a Con Artist Forbes.com – Avoid getting lost in […]

 
Read More..

6 Tips for Big Data Marketing

  • By Jonathan Buckley
  • July 23, 2015
 

Big data has ushered in the era of data-driven marketing. Massive volumes of data, streaming in at lightning speeds from a variety of channels, are rich with raw customer information containing valuable insights marketers can use to create more personalized, relevant and effective campaigns. McKinsey studies show that companies that factor data insights heavily […]

 
Read More..

Hadoop Happenings: Best Practices

  • By Jonathan Buckley
  • July 21, 2015
 

Grab the latest news and commentary about Hadoop in one place with this week’s Hadoop Happenings. This week articles focused on best practices for securing and managing Hadoop. VMware also published a benchmark test of virtualized Hadoop. See the full stories below. 1. Your Checklist for getting started with Hadoop SAS.com- 8 items you need […]

 
Read More..

The Future of Big Data

  • By Jonathan Buckley
  • July 16, 2015
 

Big Data is both revolutionary and evolutionary. The early promise that organizations could derive valuable insights through the analysis of massively large sets of unstructured data was seen as a potential game changer for how businesses operate and compete. However, early big data adoption was complex and costly, requiring large investments in hardware and teams […]

 
Read More..

Presto-Amazon Kinesis Connector for Interactively Querying Streaming Data

  • By Sivaramakrishnan Narayanan
 

This content was authored by Qubole and originally published on the AWS Big Data Blog. Amazon Kinesis is a scalable and fully managed service for streaming large, distributed data sets. As applications (particularly on mobile and wearable devices) start to collect more and more data, Amazon Kinesis is becoming the starting point for data ingestion […]

 
Read More..

Drag-n-Drop upgrades of Hadoop, Spark and Presto Clusters

  • By Mayank Ahuja
  • July 15, 2015
 

Introduction As the Big Data stack has matured, many companies have started using large clusters for running business critical applications. Workloads in such clusters are often long running (for hours or even days) and restarting a cluster poses a big problem: What happens to jobs that are already running? Restarting all these jobs wastes a […]

 
Read More..

Hadoop Happenings: The March Continues

  • By Jonathan Buckley
  • July 14, 2015
 

Grab the latest news and commentary about Hadoop all in one place with this week’s Hadoop Happenings. This week, discussion continued on Apache Spark and the growing Hadoop ecosystem. An article from Forbes discussed the complexity of hiring a data scientist, and articles covered additional big data use cases. Read the full stories below. 1. […]

 
Read More..

Hive JDBC Storage Handler

  • By Divyanshu Goyal
 

As a part of my summer internship project at Qubole, I worked on an open-source Hive JDBC storage handler (github). This project helped me improve my knowledge of distributed systems and gave me exposure to working on a team on large projects. In many big data projects, integrating data from multiple sources is […]

 
Read More..

NoSQL and Big Data: Is a NoSQL Database for You?

  • By Jonathan Buckley
  • July 9, 2015
 

Big data is getting bigger and more chaotic every day. Thanks to the Internet, social media, mobile devices and other technologies, massive volumes of varied and unstructured data—streaming in at unprecedented speeds—are bombarding today’s businesses both large and small. This explosion of data is proving to be too large and too complex for relational databases […]

 
Read More..

Hadoop Happenings: A Better Hadoop Cluster

  • By Jonathan Buckley
  • July 7, 2015
 

Grab the latest news and commentary about Hadoop in one place with this week’s Hadoop Happenings. This week Gartner once again addressed the question, “What is Hadoop?” Several posts addressed big data challenges, and LANDR gained funding for its intelligent mastering engine. See the full stories below. 1. Now, What is Hadoop? Gartner.com- In […]

 
Read More..

Announcing Saved Queries for Qubole Data Service

  • By Raghunandan Balachandran
  • July 2, 2015
 

We are always striving to add features that simplify the experience of our customers using Qubole Data Service (QDS). One of the major feature requests that has come up time and again is the ability to design queries and save them in a design-time repository. This concept would allow separation of design-time artifacts […]

 
Read More..

Qubole Recognized as Advanced Technology Partner by Amazon Web Services

  • By Nate Philip
  • July 1, 2015
 

With Qubole on AWS, any size organization can become data-driven with self-service access to the latest big data technologies MOUNTAIN VIEW, Calif., July 1, 2015—Qubole, the big data-as-a-service company founded by the team that developed Facebook’s data infrastructure, today announced it is now an Amazon Web Services (AWS) Advanced Technology Partner. Qubole’s self-service platform for […]

 
Read More..

Hadoop Happenings: Business Applications

  • By Jonathan Buckley
  • June 30, 2015
 

Grab the latest news and commentary about Hadoop all in one place with this week’s Hadoop Happenings. This week posts focused on industry use cases from industry management to HR. Read the full stories below. 1. comScore CTO shares big data lessons CIO.com- Mike Brown, CTO at comScore, shares some of the lessons he’s learned in […]

 
Read More..

Hadoop is Hard! But Big Data Doesn’t Have To Be

  • By Jonathan Buckley
  • June 25, 2015
 

When it comes to big data analytics, Hadoop has been heralded as the all-in-one solution for the enterprise. And while the many benefits of Hadoop adoption tend to support all the praise, the reality is that organizations that attempt to manage Hadoop themselves quickly discover that doing so is flat out hard, if not impossible. […]

 
Read More..

CUBE Keyword in Apache Hive

  • By Rajat Venkatesh
  • June 19, 2015
 

Introduction As part of a recent project, I had to experiment with CUBE functionality in Hive. This functionality was added somewhat recently to Hive (version 0.10) and is an advanced use case in Hive. Perhaps for these reasons, it is difficult to find examples other than the one in the Hive Wiki. In […]

 
Read More..

Big Data Challenges: Why the Majority of Big Data Projects Fail

  • By Jonathan Buckley
  • June 18, 2015
 

  To truly experience growth in the future, most businesses are turning to big data. In many cases, big data is seen as the new trend guaranteed to make companies more successful. Businesses frequently turn to big data solutions for special projects designed to integrate data into normal operations and open up new business opportunities. […]

 
Read More..

Hadoop Happenings: Spark Rises

  • By Jonathan Buckley
  • June 16, 2015
 

Grab the latest news and commentary about Hadoop in one place with this week’s Hadoop Happenings. This week Apache Spark and other new open source projects came to the forefront. Meanwhile, use cases for Hadoop abounded at the recent Hadoop Summit. See the full stories below. 1. Companies Move on From Big Data Technology Hadoop […]

 
Read More..

5 Tips for Creating a Data-Driven Culture

  • By Jonathan Buckley
  • June 11, 2015
 

  The way businesses operate is rapidly changing every single day. Perhaps no example better illustrates this than the rapid growth and adoption of big data solutions. To say many companies are seeking to become more data-driven would be an understatement. Right now, organizations are working hard to utilize new business tools intended for the […]

 
Read More..

Hadoop Happenings: Success Stories

  • By Jonathan Buckley
  • June 9, 2015
 

Grab all the latest news and commentary about Hadoop in one place with this week’s Hadoop Happenings. This week a new real-time engine for Hadoop was released, discussion continued on Hadoop’s growth, and commentators discussed several big data success stories. Get the full stories below. 1. Can Hortonworks Dominate the Hadoop Market? Forbes.com- CEO of Hortonworks […]

 
Read More..

Rebalancing Hadoop Clusters for Higher Spot Utilization

  • By Hariharan Iyer
 

Running Hadoop clusters efficiently is an important customer use case at Qubole. When running in AWS, this often means using Spot instances efficiently. In this post we introduce the notion of Rebalancing Hadoop clusters to achieve a higher mix of Spot instances – while still maintaining reliability and meeting SLAs. Spot Instances At Qubole, many […]

 
Read More..

Apache Hadoop 2.6.0 Now Generally Available on Qubole

  • By Xing Quan
  • June 4, 2015
 

We’re excited to announce that Apache Hadoop 2.6.0, the latest stable release* of Apache Hadoop, is now generally available on Qubole. Hadoop 2.6.0 is compatible with all of the usual services that Qubole offers, including Spark, Hive, Pig, and MapReduce. In addition, the optimizations that we’ve made for operating in the cloud, such as auto-scaling […]

 
Read More..

Hadoop Happenings: Semantic Data Lake

  • By Jonathan Buckley
  • June 2, 2015
 

Grab all of the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week commentary continued on how and why Hadoop will move into the mainstream. The semantic data lake was introduced, and a new list of the most influential people in big data was released. See the full stories below. 1. […]

 
Read More..

Choosing the Right Infrastructure: The Key to Success With Big Data

  • By Jonathan Buckley
  • May 29, 2015
 

The benefits of big data analytics are no longer debatable. Businesses large and small are enjoying greater profitability and competitive advantage through the capture, management, and analysis of vast volumes of unstructured data. The main debate with big data now is whether an on-premises big data analytics infrastructure offers the flexibility needed to be successful […]

 
Read More..

Hadoop Happenings: It’s Still Complicated

  • By Jonathan Buckley
  • May 26, 2015
 

Grab all the latest news and commentary about Hadoop in one place with this week’s Hadoop Happenings. This week vendors and analysts pushed back against a Gartner study indicating that Hadoop adoption is slowing. Apache Drill and Apache Hive were updated, and big data is taking off in the oil industry. Get the full stories […]

 
Read More..

5 Reasons Savvy New Gen Companies Turn to the Cloud for Big Data

  • By Jonathan Buckley
  • May 21, 2015
 

Of all the current trends in technology, few have created as much buzz as cloud computing and big data. As both grew in popularity, it only stands to reason that they would eventually cross paths. This is exactly what has happened in recent years as the number of cloud services based around big data analytics […]

 
Read More..

Hadoop Happenings: Hadoop Growth Slowing?

  • By Jonathan Buckley
  • May 19, 2015
 

Grab all the latest news and commentary about Hadoop in this week’s Hadoop Happenings. Gartner’s latest survey indicating slow growth in demand for Hadoop garnered extensive media attention this week. Commentators pointed out that the high opportunity cost for deploying Hadoop can be overcome by Hadoop-as-a-Service solutions, and others dismissed the concerns altogether. See the […]

 
Read More..

Hadoop Happenings: ORC, Spark and Flink

  • By Jonathan Buckley
  • May 12, 2015
 

Grab all the latest news and commentary about Hadoop in this week’s Hadoop Happenings. This week Apache ORC became a top-level project. Commentary continued on Apache Spark and Apache Flink, and Forbes discussed whether big data will have an impact on next year’s presidential election. See all the stories below. 1. Apache ORC Launches as […]

 
Read More..

7 Big Data Security Concerns

  • By Jonathan Buckley
  • May 7, 2015
 

Big data is more than just some trending business phrase that’s big on style and low on substance; it brings with it tangible benefits for any company willing to use it. The advantages of leveraging big data are real and oftentimes far-reaching, which is why so many organizations have adopted big data for their own […]

 
Read More..
 
 
 


 
 
 
 

Featured Blogs