User 101 will establish the foundation of knowledge needed to successfully interact with and leverage Qubole. The presentation will focus on platform usage and utility and the fundamentals of running jobs with big data clusters.
Qubole Education User 101 includes “Try It” sections with instructions you can follow in a personal account, provided you have access to the default_qubole_airline_origin_destination demonstration table. Contact your administrator if you have trouble accessing the sample datasets from your personal account.
Several lessons in Qubole Education User 101 end with quiz questions; you must answer them to complete the lessons and the course.
Estimated Time: 30 to 45 minutes
Qubole Communication (1 of 5)
Administrators will configure a Qubole account to communicate with Azure. During this process Qubole is provided with credentials for the storage and compute services offered by Azure. While you browse data through the platform, machines owned by Qubole read the data from customer cloud storage for display. As a result no [...]
Users & Accounts (2 of 5)
A user in Qubole is tied to the email address provided during signup and represents a single individual. Qubole also has Accounts, each tied to a specific set of cloud credentials and representing a distinct configuration. A single user can have multiple accounts, and users can [...]
Informing Qubole (3 of 5)
Schemas act as an abstraction layer between Qubole, users, and the data to reduce overhead and resource consumption. Schemas are defined for Data Files inside of the Cloud Storage option and are saved by Qubole. The platform contains a MySQL-backed Hive Metastore for all schema table definitions created at no cost to [...]
Qubole Clusters (4 of 5)
When a query or command requires MapReduce, the platform performs several steps. First, it checks whether any active clusters are available. If there is an active cluster and the job throughput is satisfactory, Qubole uses that cluster to process the data. If there [...]
Try It - Qubole Clusters (Azure) (5 of 5)
Overview: You will execute a SQL query using Hive on Hadoop in Qubole and review the available Resources to analyze the behavior of the environment. The SQL query will return the list of available airports from the dataset as well as a count of the number of records for each airport.
Code: Navigate to the [...]
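The kind of query described in the overview (one row per airport with a record count) can be sketched locally. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than Hive, and the table name `airline_origin_destination`, the `origin` column, and the sample rows are hypothetical stand-ins, not the actual schema of the demonstration table.

```python
import sqlite3

# Hypothetical stand-in rows; the real demonstration table lives in Hive
# and its column names may differ from the `origin` column assumed here.
rows = [("SFO",), ("JFK",), ("SFO",), ("ORD",), ("SFO",), ("JFK",)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airline_origin_destination (origin TEXT)")
conn.executemany("INSERT INTO airline_origin_destination VALUES (?)", rows)

# One row per airport, with the number of records for that airport.
query = """
    SELECT origin, COUNT(*) AS record_count
    FROM airline_origin_destination
    GROUP BY origin
    ORDER BY origin
"""
airport_counts = conn.execute(query).fetchall()
conn.close()

for airport, record_count in airport_counts:
    print(airport, record_count)
```

The `GROUP BY` with `COUNT(*)` pattern carries over to Hive; only the table and column names would need to match the real dataset.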
Qubole Templates (1 of 5)
Templates are a powerful tool in Qubole and can be developed to hold queries that are run repeatedly by a single user or by multiple users. Templates can be written in any of the available languages in Qubole and support user prompts, which are displayed as forms, as well as macros. Each template is assigned [...]
Try It - Qubole Template (2 of 5)
Overview: You will create a Template containing a SQL query with variables that modify the values in the WHERE clause. The user executing the Template may keep the default variable values or enter new values to modify the query.
Code: Navigate to the Templates interface, select New Template and make sure the [...]
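The idea of a query whose WHERE clause values come from variables with defaults can be sketched as follows. This does not show Qubole's own template syntax; it uses sqlite3 named parameters as a stand-in, and the `flights` table, the `year` and `quarter` columns, the default values, and the `run_template` helper are all hypothetical.

```python
import sqlite3

# Hypothetical template: a query plus default values for its variables.
TEMPLATE_QUERY = """
    SELECT origin, COUNT(*) AS record_count
    FROM flights
    WHERE year = :year AND quarter = :quarter
    GROUP BY origin
    ORDER BY origin
"""
TEMPLATE_DEFAULTS = {"year": 2007, "quarter": 1}


def run_template(conn, overrides=None):
    """Run the template; caller-supplied values replace the defaults."""
    params = {**TEMPLATE_DEFAULTS, **(overrides or {})}
    return conn.execute(TEMPLATE_QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (origin TEXT, year INTEGER, quarter INTEGER)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [("SFO", 2007, 1), ("JFK", 2007, 1), ("SFO", 2007, 2), ("SFO", 2007, 1)],
)

default_result = run_template(conn)                 # keeps the default values
custom_result = run_template(conn, {"quarter": 2})  # overrides one variable
conn.close()
```

Binding values as parameters, rather than pasting them into the SQL string, keeps user-entered values from changing the structure of the query.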
Qubole Notebooks (3 of 5)
Qubole Notebooks allow users to write programs and get results back in an interface that supports visualizations on top of the data. Notebooks may be built on clusters running either Spark or Presto, extending the versatility already offered by Qubole. To create a Notebook, the account must have access to at [...]
Try It - Qubole Notebooks (4 of 5)
Overview: You will create a Notebook pointing to a Spark cluster and run several SQL statements inside of the Notebook. The results can be formatted to make the Notebook appear more like a dashboard for reporting purposes. In order to edit the Notebook the associated cluster must be online.
Code: Navigate to the Notebook interface, select the [...]
Qubole Scheduler (5 of 5)
Qubole Scheduler allows developers to create recurring commands with a specified number of retries. Users may also specify a Fair Scheduler Pool and the Concurrency policy for the jobs triggered by the command. Users may configure the Schedule to be dependent on the availability of partitions in Hive or files in the cloud.
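As a rough illustration of what "a specified number of retries" means for a command, the sketch below runs a callable once and retries it on failure up to a fixed limit. It is not Qubole's scheduler logic; the `run_with_retries` helper and the `flaky_command` example are hypothetical.

```python
def run_with_retries(command, max_retries):
    """Call `command` once, then retry up to `max_retries` more times on failure.

    Returns a (result, attempts) pair; re-raises the last error once the
    retry budget is exhausted.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return command(), attempts
        except Exception:
            if attempts > max_retries:
                raise


# Hypothetical command that fails twice before succeeding.
calls = {"count": 0}


def flaky_command():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result, attempts = run_with_retries(flaky_command, max_retries=2)
print(result, attempts)
```

With `max_retries=2` the command may run at most three times in total: the initial attempt plus two retries.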