As data technologies have evolved, data engineering has become a distinct role that creates value for data scientists and data analysts, not to mention the many operational business units that rely on data supplied by data engineering pipelines to make key business decisions.
This site is dedicated to data engineers who are interested in ongoing education in big data technologies and related topics that affect the data engineering function. We will continue to add relevant content, including training and best practices, over time. While this site is sponsored by Qubole, our intent is that you should be able to get value from this Data Engineering Community whether or not you are a current Qubole customer.
Data engineers’ primary function is to build and maintain the pipelines that get data ready for use by the rest of the data team and by other business units. That means dealing with the variety of systems where data is generated; with differing formats, quality, governance, and timeliness; and with maintaining a scalable infrastructure, all within the budget allocated to the data engineering team.
There are three distinct types of data engineers, differentiated by the tools they use to build their data pipelines:
Regardless of which type you identify with, data engineers build the pipelines that deliver production-ready data sets, formatting and cleansing the data so that it's ready for use.
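A cleansing step of this kind can be sketched in Python. This is a minimal illustration only; the record structure, field names, and date format below are hypothetical, not taken from any particular pipeline:

```python
# A minimal sketch of a pipeline cleansing step, assuming raw records
# arrive as dicts with inconsistent casing, whitespace, and date formats.
# The field names ("email", "signup_date") are hypothetical examples.
from datetime import datetime


def cleanse(record):
    """Normalize a raw record; return None if it fails validation."""
    email = (record.get("email") or "").strip().lower()
    if "@" not in email:
        return None  # drop records without a usable email address

    raw_date = (record.get("signup_date") or "").strip()
    try:
        # normalize a US-style source format to ISO 8601
        signup = datetime.strptime(raw_date, "%m/%d/%Y").date().isoformat()
    except ValueError:
        signup = raw_date  # already ISO, or left for a later quality check

    return {"email": email, "signup_date": signup}


raw = [
    {"email": " Alice@Example.COM ", "signup_date": "03/14/2021"},
    {"email": "not-an-email", "signup_date": "01/01/2021"},
]
clean = [r for r in (cleanse(rec) for rec in raw) if r is not None]
```

Real pipelines would express the same idea in whatever engine the team uses (Spark, SQL, Airflow tasks, and so on); the point is that normalization and validation happen before the data reaches downstream consumers.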
Data engineers need skills in several areas to start: relational and non-relational data stores, file formats, ingestion tools, BI and visualization tools, persistent storage, cluster management, programming languages (SQL, Python, Scala, Java), and some basic knowledge of artificial intelligence.
Natural curiosity about data is also important: it drives constant improvement of data engineers' skill sets and tighter integration into the extended data team. That curiosity extends to a good understanding of the systems where the data originates, who uses it, and how it is consumed.
Free access to Qubole for 30 days to build data pipelines, bring machine learning to production, and analyze any data type from any data source.