White Papers

The Evolving Role of the Data Engineer

Issue link: https://www.qubole.com/resources/i/1243713

2. Convert all text to uppercase for consistency.
3. Remove certain sensitive fields such as credit card information.
4. Check for misspellings by running analytics. If "Towne Grill" appears 600 times in a data set, and "Town Grill" appears three times, the analytics correct the three "Town Grill" instances to "Towne Grill."

The streaming data tools may have built-in functions for some of these tasks. For instance, in Flink you can issue UPPER for uppercase conversion, and one of several date and time functions to add the provenance metadata. Other functions can be written by data scientists or a data engineer who has learned some basic programming in Java or Python.

Development Best Practices

Like programmers, data engineers need a robust process for development, testing, bug fixing, and maintenance. Virtual machines and containers make it easy to set up multiple stages or tiers for your work: development, test, and production. You should also collect metrics that help indicate where you can improve efficiency or the use of your data.

Common Development Tools

To perform your task like a software engineer, you can adapt the popular tools that programmers now use for the tasks of development and deployment:

Version control
This can manage everything you write in support of your work: code, configuration files, test suites, and documentation. The version-control system ensures that old versions of all these resources can be retrieved quickly in case you find a bug and need to roll back a change. Old versions provide a valuable history, and you can use them to trace when changes were made in case of a problem. They also play a critical role in tying together a team, because everybody has access to the work done by everybody else. You can even use version control to manage contributions from outside your organization and free/open source projects.
