White Papers

The Evolving Role of the Data Engineer

Issue link: https://www.qubole.com/resources/i/1243713


Best Practices

• Data meets different needs by being duplicated in different tables and databases, each sorted by a single key instead of providing multiple indexes. Because retrieval from such dedicated data stores has become simpler, users tend to interact with them through programming languages instead of SQL.

• Data is partitioned by key to allow each user to query a single node, or just a few nodes, to get the data needed for the application, such as the sales for a single country. Access is accelerated even more by reading a whole data set into memory, when possible.

• Partitions, whether done dynamically on streaming data or statically on stored data, allow parallel processing by short-lived processes that live in virtual machines or containers.

Because the data stores discussed in this report are used for analytics instead of transactions, users don't ask for strict consistency, such as Atomic, Consistent, Isolated, and Durable (ACID) guarantees. Data is processed in stages to provide better value to users, with the goal being to get fresh data to the users quickly. The complicated acknowledgements and checkpoints that would be needed to ensure that every copy is in sync would just slow down the process. However, to preserve historical accuracy, the raw data is usually preserved as long as space restrictions make it practical to do so.

Structuring Data

Modern data stores categorize data quite differently from relational databases. The basics of relational data are:

• The database stores each item of data as a record or row, containing a fixed set of fields or columns. Columns cannot be repeated within a row. For instance, if a customer makes two purchases, you create a separate table of purchases and use a join table to indicate which customer made which purchases.
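The customer and purchase example can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (customers, purchases, customer_purchases), the column names, the sample rows, and the helper functions build_demo_db and purchases_for are all hypothetical and do not come from the report.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical customer/purchase schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Each row has a fixed set of columns; no column repeats within a row.
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE purchases (
            purchase_id INTEGER PRIMARY KEY,
            item        TEXT NOT NULL,
            amount      REAL NOT NULL
        );
        -- Join table: records which customer made which purchase.
        CREATE TABLE customer_purchases (
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            purchase_id INTEGER NOT NULL REFERENCES purchases(purchase_id),
            PRIMARY KEY (customer_id, purchase_id)
        );
        """
    )
    # Illustrative sample data: one customer who makes two purchases.
    conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?, ?)",
        [(10, "keyboard", 45.0), (11, "monitor", 180.0)],
    )
    conn.executemany(
        "INSERT INTO customer_purchases VALUES (?, ?)",
        [(1, 10), (1, 11)],
    )
    conn.commit()
    return conn


def purchases_for(conn, customer_id):
    """Return (item, amount) rows for one customer by going through the join table."""
    return conn.execute(
        """
        SELECT p.item, p.amount
        FROM purchases AS p
        JOIN customer_purchases AS cp ON cp.purchase_id = p.purchase_id
        WHERE cp.customer_id = ?
        ORDER BY p.purchase_id
        """,
        (customer_id,),
    ).fetchall()
```

Calling purchases_for(build_demo_db(), 1) returns one row per purchase. The two purchases live in their own rows of the purchases table, not in repeated columns on the customer row, and the join table is what ties them back to the customer.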
