Ad-Hoc Analytics

Data Analyst

As you can see here, I’ve got a Tableau instance open. Tableau is one of many BI or analytics tools that the analyst persona uses. A lot of our customers have groups of analysts who want to run queries against the data lake, or build dashboards and explore the data that’s available in it.

Analytics Tool

Tableau is one of the many tools that Qubole supports. In fact, for Tableau specifically, we have a native connector built in. As you can see here, it’s Qubole Presto. There are a couple of big differences between Qubole Presto and connecting to a vanilla open source Presto cluster. One is that Qubole really drives a lot of the automation here. You’ll see the endpoint is us.qubole.com. That’s one of Qubole’s several production endpoints. This is the same endpoint that you would use in the workbench, which I’ll show you in a moment.

You’re pointing to a cluster label. It really simplifies this for the analyst persona, who doesn’t have to worry about what the host name of the Presto cluster is, what the port is, and so on. In fact, in this case, if there’s nothing running behind the scenes, Qubole will auto-start that cluster for the user. That said, I actually have a connection set up here. As you can see, this one is also on us.qubole.com, and we’ve got a schema here. This schema is the ecommerce database. I’m going to be using this throughout the demo as an example data set. It contains sample data around products, orders… It really represents an ecommerce data set. What I’ve done here to save us some time is I’ve already dragged and dropped a couple of tables, order items and products, and I’ve got a join on the product ID.

Data Visualization

In both of the tables, you can see it starts to render some of the information here. I can look at some column information, but then with a few clicks, I can actually get some visualization and dashboarding around this stuff. Let’s say I wanted to look at the product name. All right, got some names here. Let’s say I wanted to grab a view of how many were ordered. Let’s take the item quantity, make that a count. All right, we start to get some scatter plots here. It’s not really helping me visualize this very well. Let’s see if we can grab another view of this. That’s much better, right? Either that one or maybe this one gets the user, in a quick couple of clicks, to figuring out which products are most often ordered. You can see it looks like there’s a Perfect Fitness Rip Deck and then some Nike polo and things like that.
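The view built above (order items joined to products on the product ID, with a count of item quantity per product name) corresponds roughly to a join plus a grouped count. The sketch below is only an illustration of that shape: the table names, column names, and sample rows are assumptions made up for the example, not the demo's actual schema or data, and it runs against an in-memory SQLite database rather than a Presto cluster.

```python
import sqlite3

# In-memory stand-in for the ecommerce schema; all names and rows here
# are hypothetical and exist only to make the example runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT
);
CREATE TABLE order_items (
    order_item_id INTEGER PRIMARY KEY,
    product_id    INTEGER,
    item_quantity INTEGER
);
INSERT INTO products VALUES (1, 'Product A'), (2, 'Product B');
INSERT INTO order_items VALUES (1, 1, 2), (2, 1, 1), (3, 2, 5);
""")

# Join order items to products on the product ID, then count the
# item-quantity entries per product name, most frequently ordered first.
QUERY = """
SELECT p.product_name,
       COUNT(oi.item_quantity) AS times_ordered
FROM order_items AS oi
JOIN products AS p
  ON p.product_id = oi.product_id
GROUP BY p.product_name
ORDER BY times_ordered DESC, p.product_name
"""

rows = conn.execute(QUERY).fetchall()
for product_name, times_ordered in rows:
    print(product_name, times_ordered)
```

Note that a count measures how many order lines mention each product, not how many units were sold; swapping `COUNT` for `SUM` would total the quantities instead.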

Again, this is just example data, but I really just wanted to show you how easy it is for an analyst who’s using a tool like Tableau (it could be Qlik, it could be Power BI…) to start to analyze the data that’s available to them in their data lake. In this particular example, all of the data that’s backing this sits in an AWS S3 bucket. As we heard earlier, Qubole supports multiple clouds. That could very well have been data in GCP or on Azure.

Qubole UI

Now what I’d like to do is actually walk you through the UI for Qubole. Again, I’m in us.qubole.com, carrying on the same story with the same user, the persona of the analyst.

SQL

I might want to run some freeform SQL queries. Qubole provides a workbench for that out of the box. We’re already looking at it, as a matter of fact. I’m in a feature called Collections, which allows me to save off my queries, as you can see here.

Metadata

Let’s go to the table explorer. The very same data set and metadata we were looking at in Tableau are available here, and again, the same tables. If I want to take a quick look at what this metadata is, the catalog, the data management capabilities within Qubole allow me to browse the table metadata. This is a Hive table, which is backed by a location on S3. Here’s the location. This particular data set was imported by Sqoop.

Query Data

I can also get some interesting table insights here. For example, if I were an analyst trying to query this data for the first time, I might need some help. I might need to find out which columns I should be looking at, or whether there are any colleagues I can tap on the shoulder to get some help writing a query on this.

I can run this in here, and similarly, it’ll kick off an instance of that query and you’ll see some details about what’s happening. It’s submitting the query. It’s processing it. This is running on a Presto cluster. And then it spits back the results. I can download the results, or I can link to this particular query and share the results and the query with a colleague who might be interested, and they can have a look without having to rerun it.