White Papers

The Evolving Role of the Data Engineer

Issue link: https://www.qubole.com/resources/i/1243713


streaming data so that a single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.

Presto is designed for distributed queries, working across many nodes in parallel. You can tune it to respect the amount of memory on nodes where its workers run, and you can set limits on resources used. You can combine nodes into resource groups and allocate different amounts of resources to queries on these resource groups. Authentication uses Kerberos, but a simple form of username/password authentication with Lightweight Directory Access Protocol (LDAP) is also available.

SQL has met critical research needs for decades, so it can still be valuable for issuing the kinds of queries where it functions well on both relational and nonrelational databases. Recognize, however, where other types of batch and streaming analytics work better with the newer tools and APIs.

Cloud Storage

More than three-quarters of organizations now use third-party cloud services, which offer a variety of free/open source and commercial data stores along with their own proprietary offerings. Some vendors claim that their proprietary data stores offer better performance than other data stores running in the cloud. Nevertheless, many data engineers avoid the vendor's unique offerings because they want data to cross cloud boundaries. They might be running a data store on-premises as well as in the cloud (a hybrid public/private deployment), or they might simply be afraid of vendor lock-in. You can easily transfer data in and out of the cloud, but your use of the data may become dependent on the optimizations offered by your cloud vendor.

How cloud storage differs and is similar to on-premises storage

Whether you deploy a standard data store or one of the cloud vendor's alternatives, the advantages in scalability and reduced administration certainly make the cloud appealing.
Security is probably controlled better in cloud environments than the staff in most organizations can muster, although the use of the cloud has no impact on the most common reasons for breaches: abuse by insid‐
