White Papers

The Evolving Role of the Data Engineer

Issue link: https://www.qubole.com/resources/i/1243713


areas you can record the city in which a person lives, whereas in lower-population areas you should scrub the city out of the data and record only the county. In food stores, a record of a $15,000 purchase is outrageous and should be flagged for potential errors, whereas at an automobile dealership such a purchase is on the low side.

If the shape of data changes, you must help business users reevaluate its suitability for their applications.

Organizational needs change also. For instance, a speed-up in decision making or a push for finer-grained accuracy may call for more data and faster processing.

Business Intelligence (BI) and Serving the Analysts

Before the pressures of modern data processing led to constant, changing requirements for analytics, a kind of staged, waterfall approach to developing data was in effect. The analysts would think up a research project and ask for a graph or table of data. The DBA would find the data and provide it to programmers who would create the visualization for the analysts.

In today's data engineering environment, analysts want immediate access to data, although they will probably wait for the data engineer to clean it. The analysts will then create new tables of derived data, which they want updated quickly, perhaps on a real-time basis, with the latest data. So the data engineer will implement the analysts' transformations as a pipeline that accepts streaming or batch input and outputs the necessary visualizations. One way to look at the relationship between analysts and data engineers is that analysts create a prototype, after which data engineers create a production system.

Example of Data Exploration

Qubole used this kind of iterative process to create a Presto-as-a-service connector for Microsoft's business analytics tool Power BI, designed to help data engineers and data scientists uncover important fields and their relationships by running queries across multiple data sources.
Data analysis often combines fields from relational and nonrelational data. You can read about it in more detail on the Qubole blog.
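
The context-dependent rules described at the start of this page (record only the county in lower-population areas, and judge a purchase amount against the kind of store it came from) might be sketched in Python as follows. This is only an illustration under stated assumptions: the `Purchase` record, the population cutoff, and the per-store ceilings are all hypothetical values chosen for the example, not figures from the paper. Only the $15,000 purchase comes from the text.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical cutoff: below this population, a city name is treated as identifying.
MIN_CITY_POPULATION = 50_000

# Hypothetical ceilings for a plausible single purchase, in dollars, per store type.
MAX_PLAUSIBLE_PURCHASE = {
    "food_store": 1_000.0,
    "auto_dealership": 250_000.0,
}


@dataclass(frozen=True)
class Purchase:
    """Hypothetical record layout used only for this sketch."""
    store_type: str
    amount: float
    city: Optional[str]
    county: str
    city_population: int


def scrub_location(p: Purchase) -> Purchase:
    """Drop the city and keep only the county when the city is small."""
    if p.city_population < MIN_CITY_POPULATION:
        return replace(p, city=None)
    return p


def is_suspicious(p: Purchase) -> bool:
    """Flag amounts above the ceiling for this store type."""
    ceiling = MAX_PLAUSIBLE_PURCHASE.get(p.store_type)
    # Unknown store types are not flagged here; a real system would decide explicitly.
    return ceiling is not None and p.amount > ceiling


# The same $15,000 amount is judged differently depending on the store type.
grocery = Purchase("food_store", 15_000.0, "Smallville", "Lowell County", 4_200)
car = Purchase("auto_dealership", 15_000.0, "Metropolis", "Kent County", 2_000_000)

print(scrub_location(grocery).city, is_suspicious(grocery))
print(scrub_location(car).city, is_suspicious(car))
```

Keeping the thresholds in plain data rather than in the logic means they can be revisited when the shape of the data or the needs of the business change, without touching the checks themselves.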
