White Papers

The Evolving Role of the Data Engineer

Issue link: https://www.qubole.com/resources/i/1243713


ing the European Union and California) presents a court order telling you to suppress data about the person, or if consent to store the data is withdrawn or expires for other reasons, you have to stop sharing data about the person, at least within that jurisdiction. You might comply with a "right to be forgotten" order by finding every instance of the data and permanently removing it, or by keeping it for internal use and aggregate data but making it unavailable within that jurisdiction.

The data you need to delete might already be in many different columns in a number of tables and data stores. Deleting data could also make the data store inconsistent with previously calculated aggregate data, such as an average, although the effect of a single deletion is probably negligible.

Some sites maintain PII in one place and link it to a random identifier under which the non-identifying information is stored. If instructed to forget the person, these sites can simply delete the link between the PII and the random identifier, anonymizing the person's information.

Beyond the firm conditions imposed by terms of use and regulations, it is valuable to think about the purpose of data use. Will the uses open up new opportunities for your employees and clients? Or will they exploit people and put them at a greater disadvantage in relation to large institutions that control aspects of their lives? Constraining people from doing bad things, such as committing fraud or posting false news stories, is necessary in order to give legitimate activities room to spread, but most data use should be aimed at supporting people in doing what they desire.

Data Is Different Today

When computer databases first became widespread, most data came from human input: paper forms sent in by customers, receipts from sales, and so forth. Nowadays we have a plethora of data sources: sensors, cameras, log files from web servers or other hubs, social media, and more.
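Returning to the approach described above, in which PII is kept in one place and linked to a random identifier, the following is a minimal sketch of how such a layout might look. The class name `PseudonymizedStore`, its methods, and the in-memory dictionaries are all hypothetical and stand in for whatever tables or data stores a real site uses. The choice to drop the PII together with the link is also an assumption; the text only says that the link is deleted.

```python
import uuid


class PseudonymizedStore:
    """Hypothetical layout: PII and non-identifying data live apart,
    joined only through a link table."""

    def __init__(self):
        self.pii = {}      # person_id -> identifying fields (name, email, ...)
        self.links = {}    # person_id -> random identifier
        self.records = {}  # random identifier -> non-identifying data

    def add_person(self, person_id, pii, record):
        # The random identifier carries no information about the person.
        random_id = uuid.uuid4().hex
        self.pii[person_id] = pii
        self.links[person_id] = random_id
        self.records[random_id] = record
        return random_id

    def lookup(self, person_id):
        # Non-identifying data can be reached only through the link.
        random_id = self.links.get(person_id)
        return None if random_id is None else self.records[random_id]

    def forget(self, person_id):
        # Delete the link (and, as an assumption, the PII itself).
        # The record stays under its random identifier for aggregates.
        self.links.pop(person_id, None)
        self.pii.pop(person_id, None)


store = PseudonymizedStore()
rid = store.add_person("p1", {"name": "Ada"}, {"purchases": 3})
store.forget("p1")
```

After `forget("p1")`, the purchase record is still available for aggregate calculations under its random identifier, but nothing in the store connects it to the person any longer.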
Naturally, the volume of data is much greater. Other changes in data are more subtle.

Architectural challenges

The types of errors generated by automated input are different from errors in human input. Error-checking must be done differently, and the enormous data sizes call for automating this error-checking.
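As a rough illustration of what automated error-checking on machine-generated input could look like, the sketch below scans a stream of sensor values and flags two kinds of problems. The error types chosen here (values outside a plausible range, and a sensor that repeats the same value too many times in a row) are illustrative assumptions rather than a list taken from the text, and the function name `check_readings` and its parameters are hypothetical.

```python
def check_readings(values, low, high, max_repeats):
    """Return (index, reason) pairs for readings that look suspicious.

    low, high   -- assumed plausible range for the sensor
    max_repeats -- longest run of identical values treated as normal
    """
    problems = []
    previous = None
    run_length = 0
    for i, value in enumerate(values):
        # Range check: catches spikes and nonsensical values.
        if not (low <= value <= high):
            problems.append((i, "out_of_range"))
        # Stuck-value check: count consecutive identical readings.
        if previous is not None and value == previous:
            run_length += 1
        else:
            run_length = 1
        if run_length > max_repeats:
            problems.append((i, "stuck_value"))
        previous = value
    return problems


temperatures = [20.0, 20.5, 999.0, 21.0, 21.0, 21.0, 21.0]
issues = check_readings(temperatures, low=-40.0, high=60.0, max_repeats=3)
```

A check like this runs without human review, which is the point when the input is too large to inspect by hand; the thresholds would need to be tuned for each data source.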
