White Papers

The Evolving Role of the Data Engineer

Issue link: https://www.qubole.com/resources/i/1243713


Performance experts distinguish two major types of performance: throughput and latency. Throughput measures the amount of data you can deliver; it's important when a user needs to process a lot of information and can't afford to fall behind. Latency measures the time required for a single message or packet to reach its destination, and is important when a user wants to respond in real-time to a particular stimulus.

Often you can improve both throughput and latency by increasing resources. For instance, "Data Ingestion and Transfer: Message Brokers" on page 38 showed how to improve throughput by adding more instances of a streaming processing tool. You can improve latency by increasing physical resources or by moving data closer to the user (such as caching it on the local host).

Sometimes you must trade off between throughput and latency. A good analogy here is a traffic light. During rush hour, the traffic light changes once a minute or so, to let a lot of cars through at once and thus maximize throughput. Late at night when very few cars pass through, the traffic light is configured to change very quickly when a car comes to the red light, because throughput is no longer important and the critical variable is latency.

It's important to experiment with performance changes, because you may be surprised to find out that the impact you expected is not what you actually get. For instance, router manufacturers in the early 2000s reacted to congestion by adding more memory to devices to buffer traffic, but actually slowed down traffic by doing so, a phenomenon identified as bufferbloat. More commonly, an administrator tries to fix a problem by increasing some resource that actually was not the cause of the problem, thus wasting the resource while leaving the problem unchanged.

Orchestration

Orchestration means putting in place all the tools and processes you need for smooth operation. To understand what orchestration does, imagine you're an event caterer.
You are managing the catered food for a huge wedding party. Ten chefs are busy in the kitchen. As they finish the appetizers, they load the food onto trays and start working on the main course. Meanwhile, you notify five waiters to come and deliver the appetizers. Because several events are being hosted at the same venue, you want to make sure the waiters arrive exactly when the appetizers are ready, but no sooner. (I have never actually
