Blog

 
 

Qubole announces Heterogeneous Clusters on AWS – Reduce costs up to 90% with Spot Fleet

 

Co-authored by Hariharan Iyer, Member of the Technical Staff at Qubole. Introduction: Big data engines like Hadoop and Spark are known to work well when running on homogeneous clusters. This allows the underlying resource manager to optimally place tasks on the nodes and also lets users tune their jobs as per the configuration of a […]

 
Read More..

The Importance of being Data Driven

 

Understanding the business side of big data is just as important as the technical side, particularly when it comes to ensuring the real-world success of big data projects. Ashish Thusoo, co-founder and CEO of Qubole, recently shared with attendees of Data Driven NYC some of the key aspects of Qubole’s approach to big data. Including making data insights […]

 
Read More..

Qubole Announces Support for SaaS Subscriptions on AWS Marketplace

 

Customers purchasing Qubole through AWS Marketplace will get the first two weeks of Qubole free. SAN FRANCISCO – Nov. 16, 2016 – Qubole, the big data-as-a-service company, today announced that it is now available on AWS Marketplace with support for the new SaaS Subscriptions functionality, enabling customers to subscribe directly on AWS Marketplace and benefit […]

 
Read More..

QDS on Oracle Bare Metal Cloud Service is Generally Available

 

We previously announced our partnership with the Oracle Cloud Platform and also shared the results of our preliminary benchmark on the Oracle Bare Metal Cloud Service. Today, the Oracle Bare Metal Cloud Service is generally available for access and usage. We at Qubole are continuing to work closely with the Oracle team to bring Qubole […]

 
Read More..

Advanced security using AWS Identity Access Management (IAM) on QDS

 

For Big Data analyses and processing, Qubole Data Service (QDS) orchestrates storage and compute resources owned in the customer’s account. To enable this, customers delegate the necessary permissions to QDS. With IAM Roles promoted as a security best practice on AWS, customers no longer need to provide access and secret keys to QDS. Thereby, making […]

 
Read More..

Airflow as a service on QDS is Generally Available

 

Co-authored by Yogesh Garg and Sumit Maheshwari, Members of the Technical Staff at Qubole. Sumit Maheshwari is also part of Apache Airflow PPMC. We are excited to announce that Airflow as a service on Qubole Data Service (QDS) is GA and joins the family of Hadoop 1, Hadoop 2, Spark, Presto, and HBase offered as […]

 
Read More..

IBM and Qubole take data science and Apache Spark to the public cloud

 

This morning IBM and Qubole made an exciting announcement that will provide the growing number of data scientists a comprehensive environment based on public cloud infrastructure. IBM is well known for its long standing leadership in data science and its Watson Data Platform. It’s also a major committer to Apache Spark. The recently announced IBM […]

 
Read More..

Qubole selected to enter into Joint Venture Partnership with the National Technical Information Service (NTIS)

 

The U.S. Commerce Department’s National Technical Information Service (NTIS) announced Oct 19 that, following a rigorous merit review process, it has selected Qubole as an eligible joint venture partner (JVP) of the NTIS. Once the JV agreement is finalized, Qubole will have the opportunity to compete to work with NTIS on groundbreaking data projects conducted […]

 
Read More..

Cost Analysis of Building Hadoop Clusters Using Cloud Technologies

 

This is a guest post written by Shailesh Garg, Director of Engineering at RevX. A programmatic ad-tech platform like RevX generates terabytes of data on a daily basis. To effectively process and leverage this data, we use big data tools like Hadoop for reporting and analytics. Our infrastructure is hosted in Amazon AWS across multiple […]

 
Read More..

My Qubole Internship Experience

  • By Danny Leybzon
 

When I walked into Qubole’s office on June 10th—first day of my Product Analyst Internship there—I had nothing with me except for a notebook and a pen. Within an hour I had been given a laptop, a desk, and a roadmap. It was the fastest ramp-up I have ever experienced and I loved it. Since […]

 
Read More..

Intelligence in QDS

  • By Xing Quan
  • September 27, 2016
 

The concept of intelligent automation has always played a key role in Qubole Data Service (QDS). It’s one of the main reasons why we can help our customers bring self-service Big Data access across the enterprise. Intelligent automation also plays a big role in the small ops footprint that QDS requires, helping our customers achieve […]

 
Read More..

Creating Customized Plots in Qubole Notebooks

  • By Mohan Krishnamurthy
  • September 22, 2016
 

Important stories live in our data, and data visualization is a powerful means to discover and understand these stories, and then to present them to others. Within Qubole notebooks users can leverage the built-in charting tools to create visualizations. In addition to the built-in charting capabilities, users sometimes find the need to create custom charts. […]

 
Read More..

Qubole and Oracle: Combining the flexibility and scale of the cloud with better performance than on-prem

 

We are excited to be working with Oracle as the premier Big Data partner for the Oracle Cloud Platform. Qubole is the fastest growing Big Data as a service platform, with our flagship product Qubole Data Service (QDS) providing self-service access to Spark, Hadoop, Hive, and many other open source analytics tools to enterprises across […]

 
Read More..

Qubole Offers Big Data Service For The Oracle Cloud Platform

 

Native integration with Oracle’s IaaS ecosystem offers comprehensive big data solution for the cloud SAN FRANCISCO – ORACLE OPENWORLD – Sept. 19, 2016, Qubole, the big data-as-a-service company, today announced that it will be offering a big data service solution for the Oracle Cloud Platform. The Qubole Data Service (QDS) will now natively integrate with […]

 
Read More..

Apache Spark On Qubole: Sky Is the Limit

 

At Qubole, we’ve made significant progress on our adoption of Spark on QDS with new features and scalability. Here are some recent stats pertaining to Apache Spark on Qubole Data Service (QDS). Apache Spark on QDS – New Features and Highlights: To accommodate growing demands and leverage technological advancements made by the Apache Spark community, we […]

 
Read More..

Presto Ruby Client in QDS

 

Co-authored by Somya Kumar, Member of the Technical Staff at Qubole. Presto is an open source distributed SQL query engine developed at Facebook. It’s built for interactive analytics queries, and like other Big Data processing engines such as Apache Spark, Hadoop, and Hive offered as a service on Qubole Data Service (QDS), Qubole also offers […]

 
Read More..

SparkSQL in the Cloud: Optimized Split Computation

 

When it comes to Big Data processing in the cloud compared to on-premise, one of the fundamental differences between the two is how the data is stored and accessed. Not having a clear understanding of this underlying difference between, for example, AWS S3 in the cloud and HDFS on-prem leads to a suboptimal service to […]

 
Read More..

The Value of Auto-scaling

 

Intro: In a recent blog post, we benchmarked auto-scaling and demonstrated that an auto-scaling cluster was a lot less expensive and only a little bit slower than a static, max-sized cluster. In this post, we decided to quantify this benefit in terms of dollars and cents. Based on our results, we estimate that auto-scaling is […]

 
Read More..

The Cloud Advantage: Decoupling Storage and Compute

 

When Hadoop is deployed with on-premises architecture, compute and storage are combined. As a result, compute and storage must be scaled together, and the clusters must be persistently on; otherwise the data becomes inaccessible. On the cloud, compute and storage can be separated with a service such as EC2 and S3 used as the […]

 
Read More..

Qubole’s Notebook Integration with GitHub is Generally Available

  • By Mohan Krishnamurthy
  • August 10, 2016
 

We are excited to announce the general availability of GitHub integration for QDS Notebooks. GitHub is an effective way to collaborate on development projects. GitHub is a Git-based version control platform that lets users track the changes they make to their code, easily revert these changes, share development efforts and […]

 
Read More..

Benchmarking Auto-scaling Spark Clusters

Cmds per Hour vs. Nodes per Hour
 

Intro: Have you ever had trouble deciding how large to make a cluster? Do you sometimes feel like you’re wasting money when a cluster isn’t being fully utilized? Or do you feel like your analysts’ time is being wasted, waiting for a query to return? At Qubole, we developed auto-scaling in order to help combat […]

 
Read More..

Qubole Continues Strong Momentum, Reports a Strong First Half of 2016

  • By Jo McDougald
  • August 4, 2016
 

Lyft, Box, Amgen and Scripps Join Growing Roster of Top-Tier Customers MOUNTAIN VIEW, CA–(Marketwired – Aug 4, 2016) – Qubole, the big data-as-a-service company, reported exponential growth over the past six months and strong momentum heading into the second half of 2016. Following its $30 million Series C funding round in January, Qubole has continued […]

 
Read More..

5 Big Data Infrastructure Implementations

 

One of the great things about the big data industry is how willing practitioners are to share their knowledge, thinking process and experience. We love it when our customers talk about their implementations and it’s amazing to see what they’ve accomplished. Here’s a collection of some of our favorite blog posts: 1. Powering Big Data at Pinterest […]

 
Read More..

Up to 80% savings with AWS Spot Instances

 

In a previous post, we outlined the case for selecting cloud infrastructure over an on-premises deployment for managing big data workloads. Taking advantage of Spot instances to realize substantial cost savings is one of the benefits of selecting the cloud. Spot instances are a feature of AWS consisting of spare EC2 instances offered at a […]

 
Read More..

Optimize Queries with Materialized Views and Quark

  • By Rajat Venkatesh
  • July 14, 2016
 

This blog post explores how queries can be sped up by keeping optimized copies of the data. First we will explore the techniques and benchmark some sample results. Later, we talk about how one can use Quark (which we detailed in a previous post) to easily implement these performance optimizations in a Big Data analytics […]

 
Read More..

Qubole and WANdisco Move Enterprises to the Cloud with Cloudera Migration Program

  • By Ari Amster
  • July 11, 2016
 

New partnership with WANdisco reduces cost and complexity, and eliminates downtime during cloud migration MOUNTAIN VIEW, CALIF. — July 11, 2016—Qubole, the big data-as-a-service company, today announced the launch of its Cloudera Migration Program to assist enterprises in expanding their use of big data by leveraging the advantages of the cloud. As part of the […]

 
Read More..

Build or Buy: The Case for Cloud Infrastructure

  • By Ari Amster
  • July 7, 2016
 

Managing big data creates several challenges for data infrastructure teams: 1. Managing “bursty” and unpredictable workloads 2. Coordinating ad hoc and batch workloads 3. Storing rapidly growing data stores that require the capability to scale quickly 4. Integrating data generated at the edge 5. Managing storage and compute costs While an on-premises deployment has been […]

 
Read More..

Quark: Control and Optimize SQL Across Hadoop and RDBMS

  • By Rajat Venkatesh
  • June 27, 2016
 

One of the important functions of a database administrator is to manage storage structures to optimize performance in a relational database. Admins use tables, views, index, and cubes to tune the database as well as control the behavior of users (e.g., discourage full table scans and cross joins). There are similar well-known techniques in the […]

 
Read More..

Qubole Makes Key Hires to Leadership Team to Support Accelerating Market Demand

  • By Ari Amster
  • June 21, 2016
 

Company Appoints David Hsieh as Senior Vice President of Marketing and Ken Tamura as Vice President of Finance MOUNTAIN VIEW, CA–(Marketwired – Jun 21, 2016) – Qubole, the big data-as-a-service company, today announced that it has made two new additions to its executive team in key leadership roles. David Hsieh has been appointed senior vice […]

 
Read More..

RubiX: Fast Cache Access for Big Data Analytics on Cloud Storage

  • By Shubam Tagra
 

Qubole introduced first-generation Caching for S3 files in Presto in 2014 and documented the observed performance gains. In a nutshell: for CPU-efficient engines like Presto and Spark, caching remote files on local disk storage improves performance by removing bottlenecks in network IO. Our users also benefited from these performance gains, as this blog post from […]

 
Read More..

The Numbers Don’t Lie: Apache Spark is on the Rise

  • By Ari Amster
  • June 10, 2016
 

Apache Spark remains a growing force in the realm of big data. Perhaps that shouldn’t come as a surprise considering the overall momentum behind big data analytics, but the growth in just the past few months has been nothing short of impressive. No doubt part of the reason behind that growth — besides a greater […]

 
Read More..

Qubole’s HBase-as-a-Service is Generally Available on AWS

  • By Rajat Venkatesh
  • June 9, 2016
 

The HBase team at Qubole is happy to announce the general availability of QDS HBase-as-a-Service on AWS. Through the Beta program, QDS has helped administrators run HBase at scale in production with higher uptime and reliability while exploiting cloud elasticity for more agile deployments. In building our HBase offering, we worked closely with early customers […]

 
Read More..

The Future of Deep Learning

  • By Ari Amster
  • June 2, 2016
 

Once an obscure academic topic, much as “big data” used to be, deep learning has evolved into one of tech’s most exciting and promising disciplines in the field of AI, all in just a few short years. And in light of recent breakthroughs, deep learning technology is literally positioning itself (no pun intended) to transform AI altogether. […]

 
Read More..

Big Data and the Rise of Self-Service Analytics

  • By Ari Amster
  • May 27, 2016
 

In the beginning, analyzing massive datasets on open source Hadoop was a complex process best left to PhDs. But over the past few years that has dramatically changed. Today, cloud platforms, paired with powerful business intelligence tools, have ushered in the rise of self-service analytics, enabling data analysis power users—along with users lacking a technical […]

 
Read More..

Hadoop Happenings: Hadoop 3.0

  • By Ari Amster
  • May 24, 2016
 

Grab the latest news and commentary on Hadoop in this week’s Hadoop Happenings. This week’s articles discussed Hadoop 3.0, how big data is being applied in the healthcare industry, and the importance of machine learning. See the full stories below. 1. How Spark and Hadoop are Advancing Cancer Research Datanami.com- Researchers are using big data […]

 
Read More..

Qubole Meets BI Tools: 5 Machine Learning Libraries and their Big Data Use Cases

  • By Ari Amster
  • May 19, 2016
 

In an ongoing effort to extract more useful information and insights from massive volumes of structured and unstructured data, many organizations have turned to cloud based Hadoop big data analytics solutions such as Qubole. And as effective as these solutions are at capturing and analyzing large data volumes, their ability to interact with powerful Business […]

 
Read More..

Which Programming Language Should You Use For Your Big Data Project?

  • By Ari Amster
  • May 6, 2016
 

Big data projects are becoming much more common as organizations seek to take advantage of all that big data has to offer. While many companies are on board with the idea of implementing a big data project, properly executing one is another matter entirely. Many factors have to be considered, from what types of legacy […]

 
Read More..

The Big Data Lifecycle At TubeMogul

  • By Ari Amster
  • April 29, 2016
 

This post was written by Chris Chanyi, Senior Data Architect at TubeMogul. It originally appeared here. TubeMogul handles over a trillion HTTP requests a month. To understand how we handle this amount of data, it’s important to understand how we started. Read on for an in-depth look at our big data history. One of our […]

 
Read More..

3 Major Challenges to Implementing Big Data

  • By Ari Amster
 

With all the hype, it’s little wonder that organizations are getting caught up by the idea of having their own big data initiatives. But as promising as that idea sounds, the reality is that over half of all big data projects never reach fruition. And when it comes to on-premise big data initiatives, the majority […]

 
Read More..

Qubole and Looker Join Forces to Empower Business Users to Make Data-Driven Decisions

  • By Ari Amster
  • April 27, 2016
 

Qubole, the big data-as-a-service company, and Looker, the company that is powering data-driven businesses, today announced that they are integrating Looker’s business analytics with Qubole’s cloud-based big data platform, giving line of business users across organizations access to powerful, yet easy-to-use big data analytics. Business units face an uphill battle when it comes to gleaning […]

 
Read More..

Qubole Extends Big Data-as-a-Service Platform with StreamX

  • By Ari Amster
  • April 26, 2016
 

Qubole, the big data-as-a-service company, today announced it has open sourced StreamX, an ingestion service to help data teams efficiently and reliably capture large scale, real-time data. Qubole will be adding support for StreamX as a managed service on the Qubole Data Service (QDS) platform to simplify and automate the ingestion of data for big […]

 
Read More..

Will Poor Data Management Cause Your Big Data Project to Fail?

  • By Ari Amster
  • April 22, 2016
 

Most organizations have grand visions when it comes to using big data. Needless to say, there’s been a lot of hype surrounding big data analytics, with a lot of emphasis placed on businesses starting their own big data projects. Perhaps your company is interested in a big data project or has already started one. While […]

 
Read More..

Is Your Big Data Initiative Scalable?

  • By Ari Amster
  • April 14, 2016
 

The benefits of big data in the enterprise are no longer in question. Thanks to Hadoop, organizations both large and small are finding real value in capturing, storing, and analyzing large volumes of unstructured data. However, as data volumes continue to rise at exponential rates, organizations looking to stay profitable and competitive must be able […]

 
Read More..

Hadoop Happenings: Announcements and New Releases

  • By Ari Amster
  • April 12, 2016
 

Grab the latest news and commentary on Hadoop in this week’s Hadoop Happenings. This week LinkedIn released another Hadoop tool, Hortonworks made several announcements, and MarketShare shared what it has learned since its Hadoop deployment. See the full stories below. 1. MarketShare’s big data do-over: Hadoop deployment overhaul ZDNet.com- MarketShare initially tackled the problem of […]

 
Read More..

Hadoop and the Data Warehouse: A Winning Combination for Your Business

  • By Ari Amster
  • April 7, 2016
 

This post was originally published August 2014 and has since been updated. Once the subject of speculation, big data analytics has emerged as a powerful tool that businesses can use to manage, mine, and monetize vast stores of unstructured data for competitive advantage. As a result, the rate of adoption of Hadoop big data analytics […]

 
Read More..

Qubole Open Sources Quark for SQL Virtualization

  • By Ari Amster
  • April 5, 2016
 

Qubole, the big data-as-a-service company, today announced that it has open sourced Quark, a cost-based SQL optimizer that helps to simplify and optimize access to data for data analysts. Traditionally, the data sets generated by data teams are aggregated and copied to multiple analytics systems to balance performance and cost, making it near impossible to […]

 
Read More..

The Growth of the Industrial Internet of Things

  • By Ari Amster
  • March 31, 2016
 

The Internet of Things (IoT) has become a popular topic of discussion as it represents the direction the world is likely headed. Imagining a world filled with connected devices, each communicating with each other, opens up so many intriguing possibilities that can change our lives in new and exciting ways. While the prospect of the […]

 
Read More..

Moving past infrastructure limitations

  • By Ari Amster
  • March 24, 2016
 

This is a guest post written by Rory Sawyer, Software Engineer at MediaMath. Here at MediaMath, we’re quite fond of data. It would be surprising to hear someone say they’re not fond of data, of course, but we’ve spent the last 18 months proving to ourselves and our clients that we really mean it. Our […]

 
Read More..

Big Data Applications: Use Cases for Big Data

  • By Ari Amster
 

The lure of big data analytics is unmistakable and strong, and with good reason. Businesses have quickly caught on to the numerous advantages big data can give them. The benefits and potential are tremendous, and companies are responding by freeing up more budget for big data endeavors. A survey from Gartner in 2014 indicated that […]

 
Read More..

Qubole Appoints its Head of Web Services Division

  • By athusoo
  • March 18, 2016
 

The appointment of Suresh Ramaswamy will help Qubole scale its multi-tenant SaaS platform and develop highly responsive big data platforms to cater to industry demands. Qubole, the big data-as-a-service company, today announced that it has appointed Suresh Ramaswamy as Qubole’s Head of Web Services. In this role, Suresh will help Qubole scale the web services […]

 
Read More..

Big Data and Customer Service: A Guide to Call Center Analytics

  • By Ari Amster
  • March 17, 2016
 

In today’s ultra-competitive business world, mobile technology and social media have made the customer king. No longer is it enough for a company to have quality products and services. In order to truly stand out from the competition and build a solid reputation, companies need to provide quality customer service on a consistent basis. Fortunately, […]

 
Read More..

Top Apache Spark Use Cases

  • By Ari Amster
  • March 10, 2016
 

This post was originally published in July 2015 and has since been expanded and updated. Apache Spark is quickly gaining steam both in the headlines and real-world adoption. UC Berkeley’s AMPLab developed Spark in 2009 and open sourced it in 2010. Since then, it has grown to become one of the largest open source communities […]

 
Read More..

Qubole Appoints its First Chief Information Security Officer

  • By athusoo
 

Andrew Daniels brings more than 20 years of experience in enterprise security to address industry-specific needs. Qubole, the big data-as-a-service company, today announced that it has appointed Andrew Daniels as Qubole’s first chief information security officer (CISO) and vice president of security, compliance and privacy. As CISO, Daniels will focus on developing industry-leading security […]

 
Read More..

Qubole Extends Customer Support with New Education Program

  • By Ari Amster
  • March 7, 2016
 

Qubole, the big data-as-a-service company, announced today it will be extending its customer support services with the launch of Qubole Education, an extensive resource to empower data users throughout an organization with the skills needed to successfully implement a cloud-based data project. Qubole’s cloud-agnostic big data platform allows users to implement the right data […]

 
Read More..

Survey: State of Big Data Adoption

  • By Ari Amster
 

In a recent survey of 766 respondents, Qubole uncovered several insights about the state of big data adoption. Among those currently using a big data implementation, Big Data-as-a-Service users were 33% more likely to be satisfied with their big data projects. The survey also demonstrated significant growth of BDaaS adoption in the enterprise, and echoed […]

 
Read More..

Applications of Business Intelligence in Banking and Finance

  • By Ari Amster
  • March 3, 2016
 

Technology is transforming the banking and finance industry. Thanks to the Internet and the proliferation of mobile devices and apps, today’s financial institutions face mounting competition, changing client demands, and the need for strict control and risk management in a highly dynamic market. At the same time, technology has given rise to powerful business intelligence […]

 
Read More..

Cashing in on the New Currency: 5 Ways to Monetize Data

  • By Ari Amster
  • February 24, 2016
 

“Show me the money.” That’s not just a line made famous in a Tom Cruise movie. It’s what the CEOs and CFOs of organizations that have bought into Big Data initiatives are now demanding of their IT departments—“Show us how you are deriving monetary value from our data.” It’s a valid request. After all, today’s […]

 
Read More..

Qubole’s Marcy Campbell Honored with the Silicon Valley Business Journal’s Women of Influence Award

  • By Ari Amster
  • February 19, 2016
 

MOUNTAIN VIEW, CALIF. – Feb. 19, 2016 – Qubole, the big data-as-a-service company, is pleased to announce that Silicon Valley Business Journal is honoring Marcy Campbell, Qubole’s senior vice president of worldwide sales and business development, with its Women of Influence Award. The Women of Influence honorees exercise power and influence within their industry and […]

 
Read More..

Qubole Donates Access to Big Data Cloud Platform for University Research

  • By Ari Amster
  • February 18, 2016
 

Students Will Be Able to Conduct Data Analysis on Any Size Data Sets Using the Latest Technologies Such as Apache Spark, Presto, Hive and Hadoop on Qubole’s Self-Service, Infinitely Scalable Cloud Platform. Qubole, the big data-as-a-service company, announced today it will be donating time on the Qubole Data Service (QDS) to university classes, giving […]

 
Read More..

The Role of Machine Learning in Big Data

  • By Ari Amster
 

With businesses eagerly pursuing big data analytics, it only stands to reason that they’d look for the methods and strategies that will best help them get the most out of it. There are many ways to perform analytics, and each will change depending on the type of business and what insights organizations want to gain. […]

 
Read More..

Open Source Integration of Airflow and Qubole

  • By Xing Quan
  • February 17, 2016
 

This post was written by Yogesh Garg and Sumit Maheshwari, who are Members of the Technical Staff at Qubole. We are pleased to announce that Qubole has open sourced an Airflow extension to connect with Qubole Data Service (QDS). Using this extension, our customers will be able to use Airflow for creation and management of […]

 
Read More..

5 Tips for Boosting Public Cloud Security

  • By Ari Amster
  • February 11, 2016
 

It’s a long held belief that data stored on-premises is a lot more secure than storing that data in the public cloud. However, that may not be the case. While cloud security concerns have been around as long as cloud computing has existed, cloud providers have gone to great lengths to address them, improving their […]

 
Read More..

Our own Swati Singhi at the Grace Hopper Celebration

  • By Xing Quan
  • February 8, 2016
 

Swati Singhi, a Member of the Technical Staff at Qubole, was recently featured as a speaker at the Grace Hopper Celebration of Women in Computing, held in Bangalore, India. The Grace Hopper Celebration is the world’s largest technical conference for women in computing, and it is designed to bring the research and career interests of […]

 
Read More..

Optimizing S3 Bulk Listings for Performant Hive Queries

  • By Amogh Margoor
 

Introduction: We previously wrote about the changes we made to optimize Hadoop and Hive on S3. Since then, we’ve applied those same changes across the rest of our Big Data analytics offerings, including Spark and Presto. Today, we’ll discuss some recent optimizations we’ve made to make querying data more performant and efficient for […]

 
Read More..

Infographic: Big Data Belongs in the Cloud

  • By Xing Quan
  • February 4, 2016
 

Big Data infrastructure is complex, difficult to build and operate, and often requires highly specialized talent to maintain. To alleviate these challenges, businesses are turning to the cloud to provide simplicity, flexibility and agility. The graphic below highlights Qubole customers’ leadership due to the ease of administration, scaling, lifecycles, flexibility, and costs. Qubole […]

 
Read More..

CIO Focus 2016: Technology and Team Management

  • By Ari Amster
  • February 3, 2016
 

In today’s world of big data, information technology is advancing at unprecedented rates. This presents some major challenges for organizations in general, and CIOs in particular, as they search for ways to boost growth and profits in the face of mounting competition. Not long ago the terms “big data” and “competitive advantage” were dismissed as […]

 
Read More..

Big Data’s Moment in the Cloud Has Been Acknowledged

  • By Xing Quan
  • January 29, 2016
 

We were delighted to see the announcement of the latest version of Cloudera Director, and a corresponding write up on Curt Monash’s DBMS2 blog. The industry’s movement toward cloud-optimized features, such as support for Spot Instances and dynamic creation and termination of clusters, validates the direction that we’ve set for our company and product. Qubole’s […]

 
Read More..

Cassandra vs. Hadoop: A Comparative Look

  • By Ari Amster
  • January 28, 2016
 

Technology is reshaping our world. The proliferation of mobile devices, the explosion of social media, and the rapid growth of cloud computing have given rise to a perfect storm that is flooding the world with data. The challenge for enterprises is that, according to Gartner estimates, 80 percent of this “big data” is unstructured, and […]

 
Read More..

Building a Collaborative Team With Data Scientists, Business Analysts, and Developers

  • By Jonathan Buckley
  • January 21, 2016
 

This blog post originally appeared on the Import.io blog. Start the new year off right by making sure your Big Data team is aligned. It is the goal of many business leaders to effectively utilize big data analytics to improve their companies. That means having the best people on the job as part of a […]

 
Read More..

Qubole Closes $30 Million Investment to Extend Leadership in Big Data in the Cloud

  • By Jonathan Buckley
  • January 20, 2016
 

IVP leads Series C financing along with existing investors CRV, Lightspeed Venture Partners and Norwest Venture Partners Qubole, the big data-as-a-service company, today announced that it has closed a $30 million Series C financing, bringing its total funding to $50 million. IVP led the financing and General Partner Somesh Dash will join the Qubole board […]

 
Read More..

Meetup: Machine Learning at Scale Using Spark and Hive

  • By Jonathan Buckley
  • January 14, 2016
 

A large crowd recently attended the Boulder/Denver Big Data Meetup group hosted by Oracle where experts from Qubole discussed their latest findings from a real world case study. The evening’s presentations were titled “Case Study: Machine Learning at Scale using Spark and Hive” and detailed practical ways businesses can implement machine learning techniques using the […]

 
Read More..

Building Qubole: Metrics and Alerts

  • By Rajat Venkatesh
  • January 11, 2016
 

In this blog post, we’ll show you how we collect metrics and set up alerts to ensure the availability of Qubole Data Service (QDS).   QDS Architecture Before getting into the details about monitoring, we’ll give a quick introduction to the QDS architecture.   QDS runs and manages Hadoop/Spark/Presto clusters in our customers’ AWS, GCP, […]

 
Read More..

5 Signs You’re Failing at Data Science

  • By Jonathan Buckley
  • January 7, 2016
 

Most businesses understand that big data analytics is where it’s at. They view data science as the one new thing they need to truly improve their operations and become even more successful as an organization. The problem, though, is that too many companies are failing at data science. One report from PricewaterhouseCoopers (PwC) and […]

 
Read More..

The Public Cloud Market Continues to Expand

  • By Jonathan Buckley
  • December 31, 2015
 

Businesses are truly coming around to all that cloud computing has to offer. While the cloud has been around for years, only recently has it reached levels of popularity where it isn’t hyperbole to refer to it as a global phenomenon. This has led to many companies taking advantage of the public cloud’s many benefits, […]

 
Read More..

Qubole Appoints Jonathan Trail as Vice President of Customer Success

  • By Jonathan Buckley
  • December 22, 2015
 

Qubole, the big data-as-a-service company, today announced that it has appointed Jonathan Trail as Qubole’s first Vice President of customer success. As VP of customer success, Trail will work closely with Jonathan Buckley, SVP of marketing, and Marcy Campbell, SVP of worldwide sales and business development. Together, they will work to continue the company’s […]

 
Read More..

Apache Spark vs. Hadoop: Which Big Data Framework is the Best Fit?

  • By Jonathan Buckley
  • December 17, 2015
 

In the early days of big data, Apache Hadoop wasn’t just the “elephant in the room”, as some have called it. Hadoop was the room. But that is all changing as Hadoop moves over to make way for Apache Spark, a newer and more advanced big data tool from the Apache Software Foundation. There’s no […]

 
Read More..

Qubole Ignites Apache Spark on Google Cloud Platform

  • By Jonathan Buckley
 

Qubole, the big data-as-a-service company, today announced the availability of Apache Spark on Qubole Data Service (QDS) for Google Cloud Platform. The integration will enable Google Cloud Platform customers to use QDS’s 1-click persistent Spark Notebooks for fast data analysis, and auto-scale Spark clusters that deliver the right compute power for specific workloads. Qubole Data […]

 
Read More..

Getting started with Spark on QDS for Google Cloud Platform

 

Starting today, Qubole Data Service (QDS) users can launch Auto-scaling Spark Clusters and 1-click Persistent Notebooks to analyze data persisting in Google Cloud Storage. To set up a trial account, follow the instructions in our Google Cloud Platform Quick Start Guide. With auto-scaling, you no longer need to manually set the cluster size to achieve […]

 
Read More..

Keeping Big Data Safe: Common Hadoop Security Issues and Best Practices

  • By Jonathan Buckley
  • December 10, 2015
 

The big data explosion has given rise to a host of information technology tools and capabilities that enable organizations to capture, manage and analyze large sets of structured and unstructured data for actionable insights and competitive advantage. But with this new technology comes the challenge of keeping sensitive information private and secure. Big data that […]

 
Read More..

Where’s the Value in Big Data—Storage or Apps?

  • By Jonathan Buckley
  • December 3, 2015
 

Big data has become a big industry. The lofty promise of big data analytics to deliver actionable insights and create competitive advantage is being realized. And organizations that once dismissed the idea of implementing a big data strategy are giving it a second look as they consider the benefits of capturing, managing and analyzing mountains […]

 
Read More..

The Main Types of Big Data Vendors: A Comparative Look

  • By Jonathan Buckley
  • November 19, 2015
 

The big data boom has given rise to a host of vendors, each promoting their own unique ways of meeting the growing data demands of today’s businesses. As a result, businesses seeking a big data solution have a fairly long list of big data vendors to choose from. Selecting the right vendor is both a […]

 
Read More..

Share RDDs Across Jobs with Qubole’s Spark Job Server

 

When we launched our Spark as a Service offering in February, we designed it to run production workloads. Users would write standalone Spark applications and run them via our UI or API. We then enhanced the offering by adding support for running these standalone Spark applications on a schedule using our scheduler or as part […]

 
Read More..

4 Tips For Breaking Down Data Silos

  • By Jonathan Buckley
  • November 12, 2015
 

Companies are eager to use big data analytics to improve their business operations, but many have found that fully implementing the strategy is extremely difficult. Granted, big data can be complex, but many of the challenges businesses have encountered have nothing to do with big data itself. The real problem lies in the organizational structure […]

 
Read More..

Riding the Spotted Elephant

 

Introduction: One of the benefits of moving Hadoop workloads to the cloud is reducing cost and risk. No upfront capital expense on hardware is required, and ongoing expenditure scales only in response to actual usage. This greatly lowers risk. Services like Qubole eliminate administration overhead as well. Amazon EC2 offers multiple instance purchasing options. […]

 
Read More..

Building Blocks of a Data-Driven Organization

  • By Jonathan Buckley
  • November 5, 2015
 

Organizations have seen the value that big data can add. It’s no mistake that so many businesses have chosen to adopt big data solutions in recent years, since the potential those solutions bring can be monumental. Success always seems right around the corner when using big data, but too often, success can be hard to […]

 
Read More..

Share Data Across Accounts with Data Exchange

  • By Xing Quan
  • November 4, 2015
 

This post was written by Vikram Agrawal and Aswin Anand, who are both lead engineers at Qubole. Qubole has the concept of users and accounts. While customers sign in as a single user, they can also belong to one or more accounts. This account segregation provides some nice logical separation for compute clusters and metadata. […]

 
Read More..

Introducing Hadoop, Spark, and Presto Clusters With Zero Local Disk Storage

 

We’re excited to announce that Qubole can now run Hadoop, Spark, and Presto clusters with zero local disk storage. We now support AWS M4 and C4 instance types, which do not include local disk storage and instead utilize either S3 (for long-lived data) or EBS (network attached disk-storage for holding intermediate and temporary data) for […]

 
Read More..

How to Choose a Big Data-as-a-Service Company

  • By Jonathan Buckley
  • October 29, 2015
 

The world of big data is all around us. Transactions, sensors, social media, mobile devices, wearables, and a host of other sources are generating datasets of unprecedented volume, velocity and variety. This big data explosion presents enormous opportunities for organizations that are able to capture, manage, and analyze massive volumes of disparate data for insights […]

 
Read More..

A 5-Minute Guide on How NOT to do Big Data

  • By Jonathan Buckley
  • October 22, 2015
 

The verdict is in. Big data is delivering big benefits to businesses large and small. It’s little wonder that more and more organizations are anxious to dive into vast stores of data to extract hidden insights and gain competitive advantage. But big data adoption comes with a caveat—do it right or don’t do it at […]

 
Read More..

A 5-Minute Guide to Apache Spark

  • By Jonathan Buckley
  • October 14, 2015
 

When it comes to big data tools, more than a few have peculiar names. You’ve got Hadoop, Hive, MongoDB, Pig, Presto: the list of quirky words goes on. And then there’s Apache Spark, which sounds a lot like the name of a ’60s rock band. In reality, “Spark” is a formidable big data processing engine that’s […]

 
Read More..

Interning at Qubole: What I Learned From Working on Hive, Spark, and Sqoop

  • By Xing Quan
  • October 5, 2015
 

This is a guest post from Akhilesh Anandh, who was an engineering intern with us. My journey with Qubole began in January 2015, when I joined as an intern for 6 months (my final semester of college) under the PS-2 programme of my alma mater BITS Pilani. I spent another 2 months at Qubole from […]

 
Read More..

Webinar Recap: Democratizing Big Data

  • By Jonathan Buckley
  • October 1, 2015
 

Big data projects can offer a lot to businesses, and the impact that comes from them may affect every employee. Managing and amplifying that impact becomes a vital step to big data success, and the best way to achieve it is to make the data widely accessible within the organization, while also making sure it […]

 
Read More..

A 5-Minute Guide to Big Data

  • By Jonathan Buckley
  • September 24, 2015
 

Big data: Ask 5 average people what it is and you’re bound to get several different answers, and at least one glazed-over expression. That’s not surprising. When the term was first being tossed around in the analytics field several years ago, a lengthy debate began about what “big data” was all about. Today that debate […]

 
Read More..

A 5-Minute Guide to Big Data Tools

  • By Jonathan Buckley
  • September 17, 2015
 

Hadoop, Hive, Spark, Presto, Pig, NoSQL—these are words you’d expect to find in a whimsical Dr. Seuss tale. In fact, they are the names of powerful tools found in a world once thought to be just as nonsensical as any story Dr. Seuss could dream up—the world of Big Data. For those who would like […]

 
Read More..

Will Hadoop Consume Your Company?

  • By Jonathan Buckley
  • September 10, 2015
 

The difficulties and challenges of managing a big data project are many, and unfortunately, failure is the result more often than not. Though businesses may give their projects all their attention, certain shortcomings will usually lead to them struggling to achieve their goals. This can be seen in a recent study from Capgemini, which discovered […]

 
Read More..

Announcing Support for AWS IAM Roles

  • By Xing Quan
  • September 3, 2015
 

We’re excited to announce support for Identity and Access Management (IAM) Roles for delegating permissions and access to Qubole. IAM Roles are a security best practice on AWS. Customers no longer need to provide access and secret keys to Qubole, making access control more secure. Here’s some background on why Qubole requires access to our […]

 
Read More..

5 Ways to Leverage Social Media Data For Your Business

  • By Jonathan Buckley
 

Big Data is transforming the business world. The ability to capture, manage and analyze massive volumes of unstructured data for insights that lead to competitive advantage is a game-changer for businesses large and small. With the explosion of social media, never-ending streams of data flowing in from Facebook, Twitter, Pinterest, and other social sites […]

 
Read More..

Causes of Dirty Data and How to Combat Them

  • By Jonathan Buckley
  • August 27, 2015
 

By now, most businesses understand the appeal of using big data analytics. With big data, companies can improve their efficiency, increase productivity, and gain valuable insights that drive their work forward. Few will deny the important role big data now plays in organizations all over the world, but gaining those unique benefits requires having high […]

 
Read More..

Multi-tenant Job History Server for Ephemeral Hadoop and Spark Clusters

 

Introduction Qubole Data Service (QDS) allows users to configure logical Hadoop and Spark clusters that are instantiated when required. These clusters auto-scale according to the workload and shut down automatically when there is a period of inactivity, resulting in substantial cost savings. This feature, however, presents an additional challenge for supporting and debugging logs. For […]

 
Read More..