Tez 101 explores the basics of the Tez solution, including its advantages over Hive on MapReduce and the DAG that Tez creates in response to SQL.
Qubole Education Tez 101 includes “Try It” sections with instructions you can follow in your personal account if you have access to the default_qubole_airline_origin_destination demonstration table. Contact your administrator if you have trouble accessing the sample datasets from your personal account.
Several lessons in Qubole Education Tez 101 end with quiz questions; you must answer them to complete the lessons and the course.
Prerequisites:
- User 101
- Hive 101
Estimated Time: 30 to 45 minutes
Tez Solution (1 of 2)
Traditional MapReduce is limited by the need to write data back to disk between MapReduce jobs; as a result, data cannot be streamed between Reducers. Essentially, every MapReduce job must have both Mappers and Reducers, even if the Mappers are only reading the data needed for the Reducers. The additional writes and reads [...]
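To make the disk-write argument concrete, here is a deliberately simplified Python model; it is an illustration, not how either engine actually plans queries. It assumes a query that needs a linear chain of reduce stages over a single input: under MapReduce each reduce stage becomes its own job with its own map phase and each job boundary hands data off through HDFS, while Tez runs one DAG with a single Map vertex feeding chained Reducer vertices. The names `PlanShape`, `mapreduce_plan` and `tez_plan` are hypothetical. The model counts only job-boundary writes; shuffle data between a map phase and a reduce phase is still spilled to local disk in both engines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanShape:
    """Hypothetical summary of a linear query plan."""
    map_phases: int      # map phases (or Map vertices) that read data
    hdfs_handoffs: int   # intermediate results written to HDFS between jobs


def mapreduce_plan(reduce_stages: int) -> PlanShape:
    # Assumption: one MapReduce job per reduce stage. Every job needs its own
    # map phase, even if that phase only re-reads the previous job's output,
    # and every job except the last writes its result to HDFS for the next job.
    if reduce_stages < 1:
        raise ValueError("need at least one reduce stage")
    return PlanShape(map_phases=reduce_stages, hdfs_handoffs=reduce_stages - 1)


def tez_plan(reduce_stages: int) -> PlanShape:
    # Assumption: a single DAG with one Map vertex followed by chained Reducer
    # vertices (Map -> Reducer -> Reducer ...), so there are no job boundaries.
    if reduce_stages < 1:
        raise ValueError("need at least one reduce stage")
    return PlanShape(map_phases=1, hdfs_handoffs=0)


if __name__ == "__main__":
    for stages in (1, 2, 3):
        print(stages, mapreduce_plan(stages), tez_plan(stages))
```

With three reduce stages the model gives MapReduce three map phases and two HDFS hand-offs, against one Map vertex and none for Tez.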
Try It - Tez vs MapReduce (2 of 2)
Overview
You will run the same SQL statements on both MapReduce and Tez to compare the behavior of the two engines. The following must be completed in Hive 1.2 or greater against a Hadoop 2 cluster.
Code
In the Analyze interface, select Compose and make sure that the query type is set to Hive and [...]
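A minimal sketch of the engine switch, assuming the engine is chosen per session with Hive's `hive.execution.engine` property (`mr` or `tez`). The Python below only builds the two scripts you would paste into Compose; `engine_script` is a hypothetical helper, and the `SELECT COUNT(*)` statement is a placeholder, not the lesson's query.

```python
# The two engines compared in this exercise.
ENGINES = ("mr", "tez")

# Hypothetical placeholder query; substitute the statements from the lesson.
QUERY = "SELECT COUNT(*) FROM default_qubole_airline_origin_destination;"


def engine_script(engine: str, query: str = QUERY) -> str:
    """Prefix the query with a SET statement selecting the execution engine."""
    if engine not in ENGINES:
        raise ValueError(f"engine must be one of {ENGINES}, got {engine!r}")
    return f"SET hive.execution.engine={engine};\n{query}"


if __name__ == "__main__":
    # Print both variants so the same SQL can be run once per engine.
    for engine in ENGINES:
        print(engine_script(engine))
        print()
```

Keeping the query text identical and changing only the SET line keeps the comparison limited to the engine.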
Tez DAG Execution (1 of 3)
DAG Vertices
Tez represents the commands submitted by the user as a Directed Acyclic Graph (DAG), and the vertices in the graph are the Mapper and Reducer stages required for processing. The arrangement of the vertices in the DAG dictates the processing order and also represents the direction of data movement in the [...]
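The processing order implied by a DAG is a topological order: each vertex is listed after all of the vertices it receives data from. A small sketch using Python's standard-library `graphlib` (Python 3.9+) follows. The vertex names mimic the `Map N` / `Reducer N` style, but the graph itself is a hypothetical join followed by an aggregation, not a plan produced by Tez. Tez can overlap vertices at runtime, so this is a logical order rather than an exact schedule.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical DAG: each vertex maps to the set of upstream vertices whose
# output it consumes, so edges follow the direction in which data moves.
DAG = {
    "Map 1": set(),
    "Map 2": set(),
    "Reducer 3": {"Map 1", "Map 2"},  # e.g. a join of two inputs
    "Reducer 4": {"Reducer 3"},       # e.g. a final aggregation
}


def processing_order(graph: dict[str, set[str]]) -> list[str]:
    """Return the vertices so that every vertex follows all of its inputs.

    Raises graphlib.CycleError if the graph contains a cycle (is not a DAG).
    """
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(processing_order(DAG))
    try:
        processing_order({"A": {"B"}, "B": {"A"}})
    except CycleError:
        print("cycle detected: not a valid DAG")
```

The relative order of `Map 1` and `Map 2` is not fixed, because neither depends on the other; only the upstream-before-downstream constraint is guaranteed.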
Tez DAG Visualization (2 of 3)
Additional Reading:
- Tez Documentation
- Qubole Knowledge Base - Tuning Tez Queries
Try It - Tez Application UI (3 of 3)
Overview
You will analyze the Application UI details generated by Tez and observe the Directed Acyclic Graph that Tez generated for the SQL query you submitted earlier.
Code
In the History pane of the Analyze interface, identify the SQL query you previously submitted in Tez, select the Resources tab, and select the link to the [...]
Tez Memory Management (1 of 3)
Application Master out-of-memory errors may occur when an exceptionally large number of tasks is executed in parallel or when too many files are involved in the split computation. Tuning the Application Master configuration helps prevent these issues. The Application Master's memory is controlled by tez.am.resource.memory.mb, and this [...]
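As an illustration of the sizing arithmetic, the sketch below turns an Application Master container size into two session settings. `tez.am.resource.memory.mb` comes from this lesson; passing the JVM heap through `tez.am.launch.cmd-opts` and sizing it at 80% of the container are assumptions (a common rule of thumb that leaves room for non-heap memory), and `am_memory_settings` is a hypothetical helper. Setting `tez.am.launch.cmd-opts` this way replaces any default AM JVM options, and AM settings may only take effect when a new Tez session (and therefore a new AM) is launched, so check your cluster's defaults before applying them.

```python
# Assumed share of the AM container given to the JVM heap (rule of thumb,
# not a value from the lesson); the remainder covers non-heap memory.
HEAP_FRACTION = 0.8


def am_memory_settings(container_mb: int,
                       heap_fraction: float = HEAP_FRACTION) -> list[str]:
    """Build SET statements for the AM container size and a matching -Xmx."""
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mb = int(container_mb * heap_fraction)  # round down to whole MB
    return [
        f"SET tez.am.resource.memory.mb={container_mb};",
        # Note: this value replaces any existing AM launch options.
        f"SET tez.am.launch.cmd-opts=-Xmx{heap_mb}m;",
    ]


if __name__ == "__main__":
    for line in am_memory_settings(4096):
        print(line)
```

For a 4096 MB container this yields a 3276 MB heap, keeping the heap strictly below the container size so the JVM's non-heap overhead still fits.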
Tez Split Size (2 of 3)
Split Size
Split computation takes place in the Application Master. By default, the maximum split size is 1 GB and the minimum split size is 50 MB. Developers can change the split-sizing policy by modifying tez.grouping.max-size and tez.grouping.min-size. Tez uses the HiveInputFormat in conjunction with the grouping settings to ensure that the numbers [...]
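A simplified sketch of the min/max clamp described above, using the default sizes from this lesson. Given the total input size and a desired number of grouped splits, it raises or lowers the group count so the average group size stays between `tez.grouping.min-size` and `tez.grouping.max-size`. This is not Tez's actual grouping code, which also weighs data locality and available cluster capacity; `grouped_split_count` and `desired_splits` are hypothetical names.

```python
MB = 1024 ** 2
GB = 1024 ** 3

# Defaults quoted in this lesson for tez.grouping.min-size / tez.grouping.max-size.
DEFAULT_MIN_SIZE = 50 * MB
DEFAULT_MAX_SIZE = 1 * GB


def grouped_split_count(total_bytes: int, desired_splits: int,
                        min_size: int = DEFAULT_MIN_SIZE,
                        max_size: int = DEFAULT_MAX_SIZE) -> int:
    """Clamp the number of grouped splits so the average group size falls
    within [min_size, max_size] (simplified model, integer arithmetic)."""
    if total_bytes <= 0 or desired_splits < 1:
        raise ValueError("total_bytes and desired_splits must be positive")
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    if total_bytes > desired_splits * max_size:
        # Groups would exceed max_size: use more groups (ceiling division).
        return -(-total_bytes // max_size)
    if total_bytes < desired_splits * min_size:
        # Groups would fall below min_size: use fewer groups, but at least one.
        return max(1, total_bytes // min_size)
    return desired_splits


if __name__ == "__main__":
    print(grouped_split_count(10 * GB, 4))    # 10: 2.5 GB groups exceed 1 GB
    print(grouped_split_count(1 * GB, 100))   # 20: ~10 MB groups are below 50 MB
    print(grouped_split_count(2 * GB, 10))    # 10: ~205 MB groups are in range
```

Raising tez.grouping.min-size therefore tends to produce fewer, larger tasks for small inputs, while lowering tez.grouping.max-size produces more, smaller tasks for large inputs.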