White Papers

The Evolving Role of the Data Engineer

Issue link: https://www.qubole.com/resources/i/1243713

Batch Versus Streaming

Data engineers serve many types of users in different ways. Although streaming data, which comes in quickly and continuously, is the hot topic in data circles, most data analysts still derive insights from large quantities of data through batch processing.

When this report contrasts batch data with streaming data, it's not talking about some trait inherent in the data, but simply how the data is processed. If you store it in a file or database and run analytics over large numbers of records or rows relatively slowly, it's batch data. The very same data may be delivered in a steady stream and consumed one message at a time. Thus, streaming data tools also often operate on batches of data sent as files. Sales information and other transactions often come through this way.

Many applications treat the same data in a batch and streaming manner. For instance, a fraud detection application might run analytics over large data sets to assign traits to customers, then compare those traits to the same data received in real time in a streaming manner to determine whether a particular credit card transaction is fraudulent. "Streaming Data Processing" on page 41 lays out some popular tools for streaming data and their uses in data engineering.

Limitations on Data Use

With the rampant collection and crunching of data, ethical issues also arise. Governments, as well as the general public, are now taking security and privacy more seriously, as we see in the rush to prevent a repeat of the kind of use Cambridge Analytica made of Facebook data. More recently, Twitter has admitted to misusing data for advertising. Everybody understands the need for Twitter to collect personal data to serve its users, and Twitter advertising is also widely accepted; the problem comes when data is used for a purpose that it shouldn't be used for.
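
Returning briefly to the fraud detection example from the batch versus streaming discussion, the pattern of deriving traits in batch and then checking live messages against them can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular fraud system works: the record layout (`customer`, `amount`), the function names `build_profiles` and `is_suspicious`, the choice of average spend as the "trait," and the `factor` threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def build_profiles(history):
    """Batch step: scan stored transactions and compute each customer's average spend."""
    amounts = defaultdict(list)
    for tx in history:
        amounts[tx["customer"]].append(tx["amount"])
    return {customer: mean(values) for customer, values in amounts.items()}


def is_suspicious(tx, profiles, factor=5.0):
    """Streaming step: compare one incoming transaction with the batch-derived trait."""
    typical = profiles.get(tx["customer"])
    if typical is None:
        # No history for this customer; a real system would need its own policy here.
        return False
    # Hypothetical rule: flag anything far above the customer's usual spend.
    return tx["amount"] > factor * typical


# Stored records, processed all at once (the "batch" view of the data).
history = [
    {"customer": "alice", "amount": 20.0},
    {"customer": "alice", "amount": 30.0},
    {"customer": "bob", "amount": 100.0},
]
profiles = build_profiles(history)

# The same kind of records, now handled one message at a time (the "streaming" view).
incoming = [
    {"customer": "alice", "amount": 25.0},
    {"customer": "alice", "amount": 400.0},
    {"customer": "bob", "amount": 120.0},
]
flags = [is_suspicious(tx, profiles) for tx in incoming]
print(flags)  # [False, True, False]
```

Note that both steps read records of the same shape. What differs is only whether they are scanned in bulk or evaluated one at a time, which is the distinction the report draws between batch and streaming.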
As another example of the importance of social expectations, consider the famous case in which Target sent pregnancy-related offers to a 17-year-old who was trying to hide her pregnancy from her father. This public relations disaster highlights the differences between laws, ethics, and plain good sense about business goals. Legally, Target was perfectly entitled to send pregnancy-related offers. Although there's a difference between personal medical information
