TDWI Checklist - The Automation and Optimization of Advanced Analytics Based on Machine Learning

Issue link: https://www.qubole.com/resources/i/1161239

TDWI RESEARCH tdwi.org
TDWI CHECKLIST REPORT: THE AUTOMATION AND OPTIMIZATION OF ADVANCED ANALYTICS BASED ON ML

FOREWORD

Several trends are driving organizations toward machine learning.

As with other forms of advanced analytics, the basic concepts and techniques of machine learning have been around for decades. However, a number of trends have converged to make machine learning (ML) suddenly more desirable and practical than ever before:

• User organizations need a wider range of analytics. The drive to be more competitive, profitable, agile, innovative, and growth oriented has spurred organizations to investigate new approaches, including predictive analytics (as enabled by machine learning).

• Moore's Law has taken us to a higher level of speed and scale. These improvements are required to get value from big data and other voluminous data sources, such as social media and the Internet of Things (IoT). High performance and ample training data accelerate the development of analytics models and algorithms, as well as the continuous tuning of these via machine learning during deployment.

• Analytics tools are better than ever. The vendor and open source communities have given us greater ease of use and design depth for the automation and optimization of machine learning.

• Data professionals are addressing the skills gap. Driven by business demand, technical staff and some data-savvy business users have developed new skills for data science, big data, and advanced forms of analytics such as machine learning. Closing the skills gap is a critical success factor for organizations seeking to compete on analytics and to leverage big data for organizational advantage.

Machine learning algorithms learn from large data sets to create predictive models.
Here's how it works: Machine learning algorithms consume and process large volumes of data to learn complex patterns about people, business processes, transactions, events, and so on. This intelligence is then incorporated into a predictive model. Comparisons to the model can reveal whether an entity is operating within acceptable parameters or is an anomaly. Machine learning is being used today to solve well-bounded tasks such as classification and clustering. Note that a machine learning algorithm learns from so-called training data during development; it also learns continuously from real-world data during deployment so the algorithm can improve its model with experience. [1]

Machine learning has serious data requirements that are critical to success.

• Machine learning demands large, diverse data sets. Prior to model design, a machine learning algorithm's learning process depends on large volumes of data, from which it draws many entities, relationships, and clusters. Volume aside, data integrated from diverse sources tends to broaden and enrich the correlations made by the algorithm.

• Large data sets demand large, diverse infrastructure for data management. Infrastructure for machine learning's training data typically involves multiple data platforms, tools, and processing engines, ranging from traditional (relational and columnar databases) to modern (Hadoop, Spark, and cloud storage). Multiple technologies are required to cope with training data's extreme size, multiple data structures, and (in some cases) multiple latencies.

In short, tools for machine learning are obviously important, but data management infrastructure is just as important. This report will drill into the data, tool, and platform requirements for machine learning with a focus on automating and optimizing ML's development environment, production systems, voracious appetite for data, and actionable output. The point is to provide useful information for organizations that need machine learning for business analytics and also need to get greater business value from big data and other new data sources.

[1] This definition of ML paraphrases two TDWI publications. For more information, see the 2017 TDWI Best Practices Report: Advanced Analytics: Moving Toward AI, Machine Learning, and Natural Language Processing and the 2018 TDWI Checklist Report: Seven Best Practices for Machine Learning on a Data Lake. Both are available at tdwi.org.
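
The learn-then-compare cycle described in the foreword (learn from training data, compare new observations against the model, keep learning during deployment) can be illustrated with a minimal sketch. It is not taken from the report: the class name RunningAnomalyModel, the helper score_and_learn, the three-standard-deviation threshold, and the sample values are all hypothetical, and a single running mean and spread stands in for what would be a far richer model in practice.

```python
import math


class RunningAnomalyModel:
    """Toy model (hypothetical): learns the mean and spread of one numeric feature."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # assumed cutoff, in standard deviations
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations (Welford's method)

    def learn(self, value):
        """Fold one observation into the model without storing past data."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def stddev(self):
        """Sample standard deviation of everything learned so far."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))

    def is_anomaly(self, value):
        """Compare a value to the model: is it outside acceptable parameters?"""
        spread = self.stddev()
        if spread == 0.0:
            return False  # no variation learned yet, so nothing is flagged
        return abs(value - self.mean) / spread > self.threshold


def score_and_learn(model, value):
    """Deployment step: flag the value, and learn from it only if it looks normal."""
    flagged = model.is_anomaly(value)
    if not flagged:
        model.learn(value)
    return flagged


# Development: learn from (made-up) training data.
training_data = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0]
model = RunningAnomalyModel()
for observation in training_data:
    model.learn(observation)

# Deployment: compare new observations to the model and keep learning.
for observation in [101.0, 150.0]:
    status = "anomaly" if score_and_learn(model, observation) else "normal"
    print(observation, status)
```

Skipping the update for flagged values is one possible design choice, made here so that outliers do not shift the learned baseline; a production system might instead route them for review before deciding whether to learn from them.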
