White Papers

The Evolving Role of the Data Engineer

Issue link: https://www.qubole.com/resources/i/1243713

tomer row. If you choose to create a separate table of purchases, that is probably not done to normalize the data but to speed up operations that search for and manipulate purchases. Nested data structures are common in data rows, along with data structures not recognized in the relational model, such as lists, arrays, and maps. Because storage is more ample and compute speeds are faster than they were in the 1970s, when the relational model was designed, variable-size fields are commonplace, and few data stores ask you to assign a size to a column.

Fields, Columns, and Schemas

The key difference between the relational model and big data projects is that the latter have a looser notion of a schema. Traditional projects involve long planning times to define relational schemas before data collection can begin. Big data projects are often described as "schema on read," which suggests that the data can be written in any format and structured later by the user. Even so, these projects need planning for data structures, too; that planning simply requires significantly different kinds of thinking.

MongoDB, Cassandra, and other nonrelational data stores by no means accept raw, unstructured data; they still expect input to have structure. But the structure can differ for each row. These kinds of databases are sometimes called document stores because they have nested key/value fields resembling the structures found in HTML or XML documents.
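To make the purchases example concrete, here is a minimal Python sketch. The customer records, the field names (`customer_id`, `attributes`, `sku`, `qty`), and the helpers `build_purchase_table` and `index_by_sku` are hypothetical and not drawn from any particular data store. Each customer row nests a list of purchases and a map of attributes. The separate purchase table copies those purchases out and indexes them by SKU, so the duplication serves lookup speed rather than normalization.

```python
from collections import defaultdict

# Hypothetical customer rows: each nests a list (purchases) and a map
# (attributes), structures the classic relational model does not recognize.
customers = [
    {
        "customer_id": 1,
        "name": "Ana",
        "attributes": {"tier": "gold", "region": "EU"},
        "purchases": [
            {"sku": "A-100", "qty": 2},
            {"sku": "B-200", "qty": 1},
        ],
    },
    {
        "customer_id": 2,
        "name": "Raj",
        "attributes": {"tier": "silver"},
        "purchases": [{"sku": "A-100", "qty": 5}],
    },
]


def build_purchase_table(rows):
    """Copy nested purchases into a flat table; the nested originals stay put.

    The data is duplicated on purpose: the goal is faster purchase
    searches, not normalization.
    """
    table = []
    for row in rows:
        for purchase in row["purchases"]:
            table.append({"customer_id": row["customer_id"], **purchase})
    return table


def index_by_sku(table):
    """Group purchase rows by SKU so a search need not scan every customer."""
    index = defaultdict(list)
    for purchase in table:
        index[purchase["sku"]].append(purchase)
    return dict(index)


purchases = build_purchase_table(customers)
by_sku = index_by_sku(purchases)
print(len(purchases))                                # 3
print([p["customer_id"] for p in by_sku["A-100"]])   # [1, 2]
```

Keeping both the nested rows and the flat table trades extra storage for cheaper searches, which matches the observation that storage is now more ample than when the relational model was designed.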
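The schema discussion can be illustrated with a toy sketch. It assumes an in-memory "store" modeled as a plain Python list and uses hypothetical helpers `write_row` and `read_row`; it does not model the API of MongoDB, Cassandra, or any real document store. The write step rejects input that is not a structured JSON object, yet each accepted document may carry different fields. The hypothetical `READ_SCHEMA` mapping stands for the planning that schema on read still demands: someone has to decide which fields a query expects and what to do when a document lacks them.

```python
import json

# Hypothetical documents: each is structured JSON, but the set of fields
# differs from row to row, as in a document store.
raw_rows = [
    '{"id": 1, "name": "Ana", "email": "ana@example.com"}',
    '{"id": 2, "name": "Raj", "phones": ["555-0100", "555-0199"]}',
    '{"id": 3, "name": "Li", "address": {"city": "Lyon", "zip": "69001"}}',
]


def write_row(store, text):
    """Accept a row only if it parses as a JSON object with an "id" key.

    No fixed schema is enforced, but the input must still have structure.
    json.loads raises json.JSONDecodeError (a ValueError) on malformed text.
    """
    doc = json.loads(text)
    if not isinstance(doc, dict) or "id" not in doc:
        raise ValueError("each row must be a JSON object with an 'id' key")
    store.append(doc)


# Hypothetical read-time schema: the fields one query needs, each with a rule
# for extracting it (and a default when the document lacks it).
READ_SCHEMA = {
    "id": lambda d: d["id"],
    "name": lambda d: d.get("name", ""),
    "city": lambda d: d.get("address", {}).get("city"),
    "phone_count": lambda d: len(d.get("phones", [])),
}


def read_row(doc):
    """Project one stored document onto READ_SCHEMA at read time."""
    return {field: extract(doc) for field, extract in READ_SCHEMA.items()}


store = []
for text in raw_rows:
    write_row(store, text)

for row in map(read_row, store):
    print(row)
```

The write step never needed to know about `city` or `phone_count`; that decision is deferred to whoever reads the data, which is the sense in which the structure is applied "on read."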