Download Book Automated Machine Learning PDF
This open access book offers a comprehensive and thorough introduction to almost all aspects of metalearning and automated machine learning (AutoML). It covers the basic concepts and architecture, evaluation, datasets, hyperparameter optimization, ensembles and workflows, and shows how this knowledge can be used to select, combine, compose, adapt and configure both algorithms and models to yield faster and better solutions to data mining and data science problems. It can thus help developers build systems that improve themselves through experience.
As one of the fastest-growing areas of research in machine learning, metalearning studies principled methods for obtaining efficient models and solutions by adapting machine learning and data mining processes. This adaptation usually exploits information from past experience on other tasks, and the adaptive processes can themselves involve machine learning approaches. AutoML, an area related to metalearning and currently a hot topic, is concerned with automating machine learning processes. Metalearning and AutoML can help AI systems learn to control the application of different learning methods and acquire new solutions faster, without unnecessary intervention from the user.
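To make the idea of exploiting past experience concrete, here is a minimal sketch of one common metalearning pattern, under simplifying assumptions: each previously solved task is summarized by a few meta-features (here, log-scaled row and feature counts plus class entropy, an illustrative choice) together with the algorithm that worked best on it, and a new task is matched to its nearest neighbours to produce a ranked shortlist of algorithms to try first. The names `PastTask`, `compute_meta_features` and `recommend`, as well as the stored tasks, are hypothetical and not taken from the book or from any library.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class PastTask:
    name: str
    meta_features: tuple[float, ...]  # summary statistics describing the dataset
    best_algorithm: str               # algorithm that performed best on this task


def compute_meta_features(n_rows: int, n_features: int, class_counts: list[int]) -> tuple[float, ...]:
    """Illustrative meta-features: log10(rows), log10(features), class entropy in bits."""
    total = sum(class_counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)
    return (math.log10(n_rows), math.log10(n_features), entropy)


def recommend(new_meta: tuple[float, ...], past_tasks: list[PastTask], k: int = 3) -> list[str]:
    """Rank algorithms by how similar (Euclidean distance) their source tasks are to the new task."""
    nearest = sorted(past_tasks, key=lambda t: math.dist(new_meta, t.meta_features))[:k]
    ranking: list[str] = []
    for task in nearest:  # nearest task first; repeated algorithms are skipped
        if task.best_algorithm not in ranking:
            ranking.append(task.best_algorithm)
    return ranking


# Hypothetical experience from earlier tasks (made-up values for illustration only).
past = [
    PastTask("task_a", compute_meta_features(1_000, 10, [500, 500]), "random_forest"),
    PastTask("task_b", compute_meta_features(100_000, 50, [90_000, 10_000]), "gradient_boosting"),
    PastTask("task_c", compute_meta_features(200, 5_000, [100, 100]), "linear_svm"),
    PastTask("task_d", compute_meta_features(1_500, 12, [700, 800]), "random_forest"),
]

new_task = compute_meta_features(1_200, 11, [600, 600])
print(recommend(new_task, past, k=3))  # ['random_forest', 'gradient_boosting']
```

In a full AutoML system, a shortlist like this would typically warm-start a search over algorithms and hyperparameters rather than replace it.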
This book is a substantial update of the first edition, published in 2009. It includes 18 chapters, more than twice as many as the previous edition. This enabled the authors to cover the most relevant topics in more depth and to incorporate overviews of recent research in each area. The book will be of interest to researchers and graduate students in machine learning, data mining, data science and artificial intelligence.
Pavel B. Brazdil is a senior researcher at LIAAD INESC TEC, Porto, and Full Professor at FEP, University of Porto, Portugal; since 2019 he has been Professor Emeritus. He obtained his PhD in machine learning in 1981 at the University of Edinburgh. Since the 1990s he has pioneered the area of metalearning and supervised various PhD students in this area. His main interests lie in machine learning, data mining, algorithm selection, metalearning, AutoML and text mining, among others. He has edited 6 books and published more than 110 papers referenced on Google Scholar, of which approximately 80 are also indexed in ISI/DBLP/Scopus. He was a program chair of various machine learning conferences (e.g., in 1992 and 2005), has co-organized various workshops on metalearning and acted as co-editor of two special issues of MLJ on this topic. He is a member of the editorial board of the Machine Learning Journal and a Fellow of EurAI.
Jan N. van Rijn obtained his PhD in Computer Science in 2016 at the Leiden Institute of Advanced Computer Science (LIACS), Leiden University (the Netherlands). During his PhD, he made several funded research visits to the University of Waikato (New Zealand) and the University of Porto (Portugal). After obtaining his PhD, he worked as a postdoctoral researcher in the Machine Learning lab at the University of Freiburg (Germany), headed by Prof. Dr. Frank Hutter, after which he moved to Columbia University in the City of New York (USA), also as a postdoctoral researcher. He currently holds a position as assistant professor at LIACS, Leiden University. His research aim is to democratize access to machine learning and artificial intelligence across societal institutions. He is one of the founders of OpenML.org, an open science platform for machine learning. His research interests include artificial intelligence, automated machine learning and metalearning.
Carlos Soares is an Associate Professor at the Faculty of Engineering of U. Porto. He is also an External Advisor for Intelligent Systems at Fraunhofer Portugal AICOS, a researcher at LIACC, a collaborator at LIAAD-INESC TEC and a lecturer at the Porto Business School. His research focuses on metalearning/AutoML, but he has a general interest in data science. He has participated in 20+ national and international R&ID and consulting projects, and he regularly collaborates with companies, including recent projects with Feedzai, Accenture and InovRetail. He has published/edited several books and 150+ papers in journals and conferences (90+/125+ indexed by ISI/Scopus) and supervised 10+/50+ Ph.D./M.Sc. theses. His recent participation in organizing events includes ECML PKDD 2015, IDA 2016 and Discovery Science 2021 as programme co-chair. In 2009, he was awarded the Scientific Merit and Excellence Award of the Portuguese AI Association.
Artificial intelligence (AI) and its underlying implementations of ML and deep learning help us not only find the metaphorical needle in the haystack but also see the underlying trends, seasonality, and patterns in these large data streams to make better predictions. In this book, we will cover one of the key emerging technologies in AI and ML: automated ML, or AutoML for short.
Before introducing you to automated ML, we should first define how we operationalize and scale ML experiments into production. To go beyond Hello World apps and works-on-my-machine-in-my-Jupyter-notebook projects, enterprises need to adopt a robust, reliable, and repeatable model development and deployment process. Just as with the software development life cycle (SDLC), the ML or data science life cycle is a multi-stage, iterative process.
Scikit-learn (also known as sklearn) is a popular ML library for Python. Part of this ecosystem and based on the paper Efficient and Robust Automated Machine Learning by Feurer et al., auto-sklearn is an automated ML toolkit that performs algorithm selection and hyperparameter tuning using Bayesian optimization, meta-learning, and ensemble construction.
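One technique used for the ensemble construction step is greedy ensemble selection (forward selection with replacement, in the style of Caruana et al.). The sketch below illustrates that general technique under the assumption that each candidate model has already produced positive-class probabilities on a validation set. It is written from scratch in plain Python and does not use the auto-sklearn API; the model names, the probabilities and the choice of the Brier score as the loss are made up for illustration.

```python
from __future__ import annotations


def brier(probs: list[float], labels: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 labels (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def ensemble_selection(model_probs: dict[str, list[float]], labels: list[int], n_rounds: int = 3):
    """Greedily add (with replacement) the model that most reduces the ensemble's loss."""
    selected: list[str] = []
    running_sum = [0.0] * len(labels)
    for _ in range(n_rounds):
        best_name, best_loss = None, float("inf")
        for name, probs in model_probs.items():
            # Average prediction of the current ensemble plus this candidate model.
            candidate = [(s + p) / (len(selected) + 1) for s, p in zip(running_sum, probs)]
            loss = brier(candidate, labels)
            if loss < best_loss:
                best_name, best_loss = name, loss
        selected.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, model_probs[best_name])]
    # A model picked more often receives a proportionally larger weight.
    weights = {name: selected.count(name) / len(selected) for name in model_probs}
    return selected, weights


# Hypothetical validation-set probabilities for the positive class.
labels = [1, 0, 1, 1, 0, 0]
model_probs = {
    "model_a": [0.9, 0.6, 0.8, 0.4, 0.2, 0.3],
    "model_b": [0.6, 0.1, 0.4, 0.9, 0.3, 0.6],
    "model_c": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
}
selected, weights = ensemble_selection(model_probs, labels, n_rounds=3)
print(selected)  # ['model_a', 'model_b', 'model_a']
```

Because models may be picked more than once, the selection counts act as ensemble weights; in this example `model_c`, which always predicts 0.5, is never chosen.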
The Tree-based Pipeline Optimization Tool, or TPOT for short (nice acronym, eh!), is a product of the Computational Genetics Lab at the University of Pennsylvania. TPOT is an automated ML tool written in Python that builds and optimizes ML pipelines using genetic programming. Built on top of scikit-learn, TPOT automates feature selection, preprocessing, feature construction, model selection, and parameter optimization by "exploring thousands of possible pipelines to find the best one". It is one of many toolkits with a gentle learning curve.
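The evolutionary idea behind TPOT can be illustrated with a deliberately simplified sketch. Unlike TPOT, which evolves tree-shaped pipelines with genetic programming, this example runs a plain genetic algorithm over a fixed-length pipeline description (preprocessor, feature selector, model, one hyperparameter), using selection of the fittest, uniform crossover and mutation. The search space and `toy_fitness` are hypothetical: the fitness function is a made-up stand-in for the cross-validated score a real tool would obtain by actually training each pipeline.

```python
import random

# Hypothetical search space: a pipeline is one choice per step.
SEARCH_SPACE = {
    "preprocessor": ["none", "standard_scaler", "min_max_scaler"],
    "selector": ["none", "variance_threshold", "select_k_best"],
    "model": ["decision_tree", "random_forest", "logistic_regression"],
    "max_depth": [2, 4, 8, 16],
}


def toy_fitness(pipeline: dict) -> float:
    """Made-up score standing in for the cross-validated accuracy of a trained pipeline."""
    score = {"decision_tree": 0.70, "random_forest": 0.80, "logistic_regression": 0.75}[pipeline["model"]]
    if pipeline["preprocessor"] == "standard_scaler":
        score += 0.03
    if pipeline["selector"] == "select_k_best":
        score += 0.02
    return score - 0.005 * abs(pipeline["max_depth"] - 8)


def random_pipeline(rng: random.Random) -> dict:
    return {step: rng.choice(options) for step, options in SEARCH_SPACE.items()}


def crossover(a: dict, b: dict, rng: random.Random) -> dict:
    """Uniform crossover: each step is inherited from one of the two parents."""
    return {step: rng.choice([a[step], b[step]]) for step in SEARCH_SPACE}


def mutate(pipeline: dict, rng: random.Random, rate: float = 0.2) -> dict:
    """Resample each step with probability `rate`."""
    child = dict(pipeline)
    for step, options in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[step] = rng.choice(options)
    return child


def evolve(generations: int = 10, population_size: int = 20, elite: int = 4, seed: int = 0):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=toy_fitness, reverse=True)
        best_per_generation.append(toy_fitness(population[0]))
        parents = population[:elite]  # elitism: the fittest pipelines survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - elite)
        ]
        population = parents + children
    return max(population, key=toy_fitness), best_per_generation


best, history = evolve()
print(best, round(toy_fitness(best), 3))
```

Keeping the best pipelines unchanged between generations (elitism) guarantees that the best score never decreases from one generation to the next.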
Uber's automated ML tool, Ludwig, is an open source deep learning toolbox used for experimentation, testing, and training ML models. Built on top of TensorFlow, Ludwig enables users to create model baselines and perform automated ML-style experiments with different network architectures and models. In its latest release (at the time of writing), Ludwig now integrates with CometML and supports BERT text encoders.
Developed by AWS Labs with the goal of democratizing ML, AutoGluon enables "easy-to-use and easy-to-extend AutoML with a focus on deep learning and real-world applications spanning image, text, or tabular data". AutoGluon, an integral part of AWS's automated ML strategy, enables both junior and seasoned data scientists to build deep learning models and end-to-end solutions with ease. Like other automated ML toolkits, AutoGluon offers network architecture search, model selection, and custom model improvements.
H2O's open source offerings were discussed earlier in the Open source platforms and books section. The commercial offering, H2O Driverless AI, is an automated ML platform that addresses the needs of feature engineering, architecture search, and pipeline generation. Its "bring your own recipe" feature, used to integrate custom algorithms, is unique (even though it is now being adopted by other vendors). The commercial product has extensive capabilities and a feature-rich user interface that helps data scientists get up to speed.
Other notable frameworks and tools in this space include Autoxgboost, RapidMiner Auto Model, BigML, MLJar, MLBox, Dataiku, and Salesforce Einstein (powered by TransmogrifAI). Links to these toolkits can be found in this book's Appendix. The following table is from Mark Lin's Awesome AutoML repository and outlines some of the most important automated machine learning toolkits, along with their corresponding links:
As the industry makes significant investments in automated ML, it is poised to become an important part of our enterprise data science workflows, if it isn't already. Serving as a valuable assistant, this apprentice will help data scientists and knowledge workers focus on the business problem while taking care of anything unwieldy and trivial. Even though the current focus is limited to automated feature engineering, architecture search, and hyperparameter optimization, we can expect meta-learning techniques to be introduced in other areas to help automate the automation process itself.
Adnan Masood, PhD, is an artificial intelligence and machine learning researcher, visiting scholar at Stanford AI Lab, software engineer, Microsoft MVP (Most Valuable Professional), and Microsoft's regional director for artificial intelligence. As chief architect of AI and machine learning at UST Global, he collaborates with Stanford AI Lab and MIT CSAIL, and leads a team of data scientists and engineers building artificial intelligence solutions that produce business value and insights for a range of businesses, products, and initiatives.
Automated machine learning, also referred to as automated ML or AutoML, is the process of automating the time-consuming, iterative tasks of machine learning model development. It allows data scientists, analysts, and developers to build ML models with high scale, efficiency, and productivity, all while sustaining model quality. Automated ML in Azure Machine Learning is based on a breakthrough from Microsoft Research.
Configure the automated machine learning parameters that determine how many iterations to run over different models and hyperparameter settings, which advanced preprocessing/featurization to apply, and which metrics to use when determining the best model.
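How these settings interact can be illustrated with a small, self-contained random-search loop. This is not the Azure Machine Learning SDK: `AutoMLSettings`, its fields and every function below are hypothetical, and the two model families (a one-feature threshold rule and k-nearest neighbours) and the dataset are toy choices. The sketch shows how an iteration budget, an optional featurization step and a primary metric together decide which trial ends up at the top of the leaderboard.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class AutoMLSettings:
    """Hypothetical settings object; field names are illustrative, not a real SDK API."""
    iterations: int = 20              # number of model/hyperparameter trials
    primary_metric: str = "accuracy"  # metric used to rank trials
    featurization: bool = True        # standardize the input feature first
    seed: int = 0                     # makes the random search reproducible


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Average per-class recall; less misleading than accuracy on imbalanced data."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


METRICS = {"accuracy": accuracy, "balanced_accuracy": balanced_accuracy}


def standardize(reference, values):
    """Scale values with the mean and standard deviation of the reference (training) data."""
    mean = sum(reference) / len(reference)
    std = (sum((v - mean) ** 2 for v in reference) / len(reference)) ** 0.5 or 1.0
    return [(v - mean) / std for v in values]


def knn_predict(train_x, train_y, xs, k):
    """Majority vote among the k nearest training points (one feature, binary labels)."""
    preds = []
    for x in xs:
        nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
        preds.append(1 if 2 * sum(train_y[i] for i in nearest) > k else 0)
    return preds


def run_automl(train, valid, settings):
    """Random search over two model families; returns a leaderboard, best trial first."""
    rng = random.Random(settings.seed)
    metric = METRICS[settings.primary_metric]
    (train_x, train_y), (valid_x, valid_y) = train, valid
    if settings.featurization:
        train_x, valid_x = standardize(train_x, train_x), standardize(train_x, valid_x)
    leaderboard = []
    for _ in range(settings.iterations):
        if rng.random() < 0.5:
            params = {"model": "threshold", "threshold": rng.uniform(min(train_x), max(train_x))}
            preds = [1 if x >= params["threshold"] else 0 for x in valid_x]
        else:
            params = {"model": "knn", "k": rng.choice([1, 3, 5])}
            preds = knn_predict(train_x, train_y, valid_x, params["k"])
        leaderboard.append((metric(valid_y, preds), params))
    leaderboard.sort(key=lambda trial: trial[0], reverse=True)
    return leaderboard


# Tiny made-up dataset: one numeric feature, binary label.
train = ([1, 2, 3, 4, 6, 7, 8, 9], [0, 0, 0, 0, 1, 1, 1, 1])
valid = ([2.5, 3.5, 6.5, 8.5], [0, 0, 1, 1])
settings = AutoMLSettings(iterations=20, primary_metric="balanced_accuracy")
leaderboard = run_automl(train, valid, settings)
print(leaderboard[0])
```

Changing `primary_metric` alters how trials are ranked, while raising `iterations` explores more of the search space at a higher compute cost; this is the trade-off such settings control.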