Imagine reliably asking Amazon Alexa, Amazon Echo Dot, Google Home, or a chatbot to run analytics queries against a big data platform. For example, “What were the top three revenue-generating products last week?” or better yet “Start my Spark cluster” — all without firing up your computer, scrolling through a report, looking through spreadsheet columns, or asking an analyst or a data admin. Big Data at the tip of your tongue–pun intended.
The idea of conversing with a computer has been around for a while: think Star Trek’s “LCARS” and HAL 9000 from “2001: A Space Odyssey”. While we might be a long way off from those realities, recent advances in natural language and AI technologies from Amazon, Google, Microsoft, IBM, and others have brought us closer. We can expect a lot of new, creative services to be built in the near future.
Meanwhile, in the big data space, given the massive amounts of data being generated, the advances in machine learning algorithms, and the speed of computing at scale, it’s only a matter of time before Artificial Intelligence (AI) and Machine Learning (ML) also power big data analytics, in ways and at speeds never experienced before. These systems will meld with technological innovations in peripheral areas like the Internet of Things (IoT), cloud computing, and natural language processing.
“The movement towards conversational interfaces will accelerate,” says Stuart Frankel, CEO, Narrative Science. “The recent, combined efforts of a number of innovative tech giants point to a coming year when interacting with technology through conversation becomes the norm. Are conversational interfaces really a big deal? They’re game-changing. Since the advent of computers, we have been forced to speak the language of computers in order to communicate with them and now we’re teaching them to communicate in our language.”
“The early adopters of AI and machine learning in analytics will gain a huge first-mover advantage in the digitalization of business,” says Quentin Gallivan, CEO, Pentaho.
All of this was racing through my mind as I was absorbing the barrage of information while attending sessions, bootcamps, and workshops at AWS re:Invent 2016 in Las Vegas.
Meet Qulexa (Qubole + Alexa)
So… inspired by the industry predictions, technological advancements, and my own thoughts, I’ve created a voice-enabled conversational interface that runs against Big Data platforms in the cloud. Meet Qulexa!
Although it’s just a sample application, Qulexa is, in my mind, a preview of where we’re headed when it comes to accessing insights with AI, ML, and Big Data technologies in ways that were not possible before.
Below I’ve outlined the technical details of Qulexa. I know, I know, if you want to skip reading and get your hands dirty, you can find the codebase on GitHub.
Technical Know-how & Prerequisites
Even though this is a pretty lightweight application, I’ve used various technologies that hopefully get my point across. Walking through the entire codebase is beyond the scope of this post, so a good working knowledge of the following technologies is expected.
- Alexa Skills Kit
- For a quick start guide, click here
- AWS Lambda
- For detailed documentation, click here
- Qubole Data Service (QDS) REST API
You will also need:
- AWS developer account
- QDS account — for a free trial, click here
- QDS is the leading enterprise & cloud-scale big data platform that enables you to process and analyze large amounts of structured, unstructured, and raw data stored on any public cloud infrastructure using any engine–Hadoop, Spark, Presto.
NOTE: Google developers can use the recently announced Actions on Google to create similar conversational interfaces and applications for Google Home. (In fact, I might port this app myself, so stay tuned!)
Application At A Glance
At a high level, here’s what you can do out-of-the-box by talking to Amazon Alexa or Echo Dot:
- List all clusters in your QDS account
- Start Spark, Hadoop, Presto, HBase, or Airflow clusters in your QDS account
- Terminate any of your active clusters
- Retrieve results of a saved query (based on known command Id)
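To make the list above more concrete, here is a minimal sketch of how spoken intents might be translated into QDS REST calls and then wrapped in an Alexa response. It is not taken from the Qulexa codebase: the intent names, slot names, and helper functions are hypothetical, and the endpoint paths are my assumption of the QDS v1.2 API layout, so check them against the QDS REST API documentation before relying on them.

```javascript
// Sketch only. Intent names, slot names, and endpoint paths are assumptions
// for illustration; they are not copied from the Qulexa codebase.
const QDS_BASE_PATH = '/api/v1.2'; // assumed QDS API version prefix

// Translate a (hypothetical) Alexa intent into a description of a QDS REST call.
// Returns null when the intent is not recognized.
function buildQdsRequest(intentName, slots = {}) {
  switch (intentName) {
    case 'ListClustersIntent':
      return { method: 'GET', path: `${QDS_BASE_PATH}/clusters` };
    case 'StartClusterIntent':
      return {
        method: 'PUT',
        path: `${QDS_BASE_PATH}/clusters/${encodeURIComponent(slots.clusterLabel)}/state`,
        body: { state: 'start' },
      };
    case 'TerminateClusterIntent':
      return {
        method: 'PUT',
        path: `${QDS_BASE_PATH}/clusters/${encodeURIComponent(slots.clusterLabel)}/state`,
        body: { state: 'terminate' },
      };
    case 'GetResultsIntent':
      return {
        method: 'GET',
        path: `${QDS_BASE_PATH}/commands/${encodeURIComponent(slots.commandId)}/results`,
      };
    default:
      return null;
  }
}

// Wrap plain text in the Alexa Skills Kit response envelope.
function buildAlexaResponse(text) {
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text },
      shouldEndSession: true,
    },
  };
}
```

Keeping the intent-to-request mapping as a pure function, with no network calls, makes it easy to unit test before wiring it to an HTTP client and your QDS API token inside the Lambda function.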
Important: These are just a handful of commands I’ve implemented so far, and QDS provides a very extensive set of REST APIs. For example, you can:
- Add nodes to a cluster
- Execute Spark, Presto, Hive, or Pig commands
- Execute workflows
- Schedule jobs
- Run Spark Notebooks
- And more…
Before you run or test this app in your environment, be sure to update the following attributes in config.js
If you don’t have access to Alexa-enabled devices such as Amazon Alexa or Echo Dot, you can use Amazon’s browser-based Alexa Skill Testing Tool to test.
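Another option, before reaching for a device or the testing tool, is to call your handler locally with a hand-built request. The sketch below assumes the standard Alexa request envelope (a top-level `request` object with a `type` and, for intents, an `intent.name`). The handler, the intent name, and all IDs are hypothetical placeholders, not Qulexa’s actual code.

```javascript
// Hypothetical stand-in for the skill's Lambda entry point; the real handler
// in your project will do more (for example, call the QDS REST API).
function handleAlexaEvent(event) {
  const request = event.request || {};
  let text;
  if (request.type === 'LaunchRequest') {
    text = 'Welcome to Qulexa.';
  } else if (request.type === 'IntentRequest') {
    text = `Received intent ${request.intent.name}.`;
  } else {
    text = 'Sorry, I did not understand that.';
  }
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text },
      shouldEndSession: true,
    },
  };
}

// Minimal IntentRequest envelope. All IDs and the intent name are placeholders.
const sampleEvent = {
  version: '1.0',
  session: {
    new: true,
    sessionId: 'session-local-test',
    application: { applicationId: 'amzn1.ask.skill.example' },
    user: { userId: 'local-user' },
  },
  request: {
    type: 'IntentRequest',
    requestId: 'request-local-test',
    intent: { name: 'ListClustersIntent', slots: {} },
  },
};

// Print the response the skill would hand back to Alexa.
console.log(JSON.stringify(handleAlexaEvent(sampleEvent), null, 2));
```

Saving a few such sample events alongside the code gives you a quick regression check each time you add a new intent.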
Ok, where’s the code?
It’s available here on GitHub.
Forward-thinking companies and technologies will enable us to change how we generate and gain insights in ways and at speeds never possible before. If you’d like to join in on the conversation or if you have any feedback/comments, I’d love to hear from you. Feel free to reach out to me on Twitter or on LinkedIn.