White Papers

TDWI Checklist - The Automation and Optimization of Advanced Analytics Based on Machine Learning

Issue link: https://www.qubole.com/resources/i/1161239


TDWI Research, tdwi.org. TDWI Checklist Report: The Automation and Optimization of Advanced Analytics Based on ML

Whew! That's a lot of steps, and all of them are complex and time-consuming. Although some steps are performed by the same person, many people are still required to complete the process. Until recently, this situation was exacerbated by the fact that technical personnel for data, analytics, testing, and production were on different teams, each fairly siloed. Communication among teams was slow, inaccurate, alienated from business goals, and a barrier to innovation.

DataOps cures team silos for better collaboration, speed, and alignment.

Outside of data management and analytics, application developers had similar silo problems. They cured them with DevOps, a practice of combining software engineering, quality assurance (QA), and operations into a single, agile, and collaborative team structure. Very recently, data and analytics people have adapted the principles of DevOps to create DataOps, a new way of managing data that promotes communication between, and integration of, formerly siloed data, teams, tools, technologies, and platforms. DataOps fosters collaboration among everyone handling data, including data developers, engineers, and scientists, as well as analysts and businesspeople.

Let's take another look at the ML model development process as described at the beginning of this section to see how DataOps simplifies, speeds up, and aligns ML development and production:

• Days and weeks of downtime may pass between adjacent steps. The collaborative relationships created by DataOps among technical team members reduce the downtime so that ML models and other solutions get into users' hands much sooner.

• In many cases, independent teams work from specifications that are impossible to keep current. The documentation approach, created for data modeling, does not adapt well to analytics modeling. Luckily, the direct communication that DataOps fosters eliminates the need for time-consuming, misleading, and distracting specs and documentation.

• Ideally, the unified DataOps team also has collaborative relationships with businesspeople who are committed to fast and frequent reviews of prototypes and iterations. With the right businesspeople involved at many stages in the process, building an analytics model that aligns with business goals is far more likely.

• An extended DataOps team concentrates substantial domain expertise, from analytics to enterprise data management, that can quickly and easily be tapped for the accelerated development of data-driven products. For example, with machine learning, a data scientist may lead development but be supported by colleagues who become the stewards of the data pipes that data scientists need for their ML algorithms.

Other modern team structures can accelerate machine learning development.

Note that there are other modern team structures and methods that achieve results similar to those of DataOps in improving the speed, efficiency, standards, and alignment of data-driven development and related services:

• The Agile Manifesto. This was originally written as a method for application developers. Its adaptation to data practices has revolutionized analytics and data set development, making them leaner, nimbler, and better aligned with business goals.

• Data stewardship. Originally a way for businesspeople to collaboratively guide data quality projects, stewardship has been adapted to data warehousing, data integration, and master data management projects. To assure analytics-to-business alignment, a data steward can provide business requirements and review iterative versions of analytics models.

• Data competency centers (or centers of excellence). These consolidate siloed data-driven teams into a single, centrally managed team based on a shared-services model. Specifics vary greatly, but most competency centers also enforce enterprise standards for data and are strongly allied with data governance efforts.

NUMBER SIX: REMEMBER THAT MACHINE LEARNING IS NOT JUST FOR PREDICTIVE ANALYTICS

Today, most efforts with machine learning are to support predictive analytics, especially when the analytics parses vast amounts of diverse big data. This is an important practice and it will continue to grow and mature. However, a few cutting-edge vendors and open source projects are embedding ML-driven intelligence into data management (DM) tools. Embedded within these DM tools, ML algorithms and models typically address three broad goals:

• Automation for well-understood but time-consuming development tasks such as mapping sources to targets, cataloging data, or onboarding new sources.

• Optimization of system performance by automatically selecting query optimization strategies, table join approaches, resource management schemes, and distribution methods for data (e.g., hot versus cold storage, memory versus disk, or replication across nodes).
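To make the automation goal more concrete, the sketch below suggests source-to-target column mappings by comparing column names. It is only an illustration: it uses a plain string-similarity heuristic from the Python standard library as a stand-in for the trained models a DM tool might embed, and the function names, the sample column names, and the 0.6 threshold are all hypothetical rather than taken from the report or from any vendor product.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a column name and drop separators, so 'Cust_ID' becomes 'custid'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized column names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def suggest_mapping(source_cols, target_cols, threshold=0.6):
    """Suggest a target column for each source column.

    Returns {source: (target_or_None, score)}. The target is None when even
    the best candidate scores below the (hypothetical) threshold, which
    flags the column for human review instead of guessing.
    """
    mapping = {}
    for src in source_cols:
        best = max(target_cols, key=lambda tgt: similarity(src, tgt))
        score = round(similarity(src, best), 2)
        mapping[src] = (best if score >= threshold else None, score)
    return mapping


if __name__ == "__main__":
    # Hypothetical column names for illustration only.
    source = ["cust_id", "FirstName", "zip"]
    target = ["customer_id", "first_name", "postal_code"]
    for src, (tgt, score) in suggest_mapping(source, target).items():
        print(f"{src} -> {tgt} (score {score})")
```

With these sample names, `cust_id` and `FirstName` are matched to `customer_id` and `first_name`, while `zip` is left unmapped because no target name resembles it. That gap is where a learned model, trained on data values and past mappings rather than names alone, would be expected to do better than this heuristic.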
