## 3 Books on Optimization for Machine Learning

Optimization is a field of mathematics concerned with finding a good or best solution among many candidates.

It is an important foundational topic in machine learning because most machine learning algorithms are fit on historical data using an optimization algorithm. Broader problems, such as model selection and hyperparameter tuning, can also be framed as optimization problems.

Although having some background in optimization is critical for machine learning practitioners, it can be a daunting topic given that it is often described using highly mathematical language.

In this post, you will discover top books on optimization that will be helpful to machine learning practitioners.

Let’s get started.

## Overview

The field of optimization is enormous as it touches many other fields of study.

As such, there are hundreds of books on the topic, and most are textbooks filled with math and proofs. This is fair enough given that it is a highly mathematical subject.

Nevertheless, there are books that provide a more approachable description of optimization algorithms.

Not all optimization algorithms are relevant to machine learning; instead, it is useful to focus on a small subset of algorithms.

Frankly, it is hard to group optimization algorithms as there are many competing concerns. Nevertheless, it is important to have some idea of the optimization that underlies simpler algorithms, such as linear regression and logistic regression (e.g. convex optimization, least squares, Newton’s method, etc.), and neural networks (first-order methods, gradient descent, etc.).

These are foundational optimization algorithms covered in most optimization textbooks.
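To make the link between model fitting and optimization concrete, here is a minimal sketch of fitting a one-variable linear regression by gradient descent on the mean squared error. The function name, learning rate, iteration count, and toy data are all illustrative assumptions, not taken from any of the books.

```python
def fit_line_gradient_descent(xs, ys, learning_rate=0.05, n_iters=2000):
    """Fit y = w * x + b by minimizing mean squared error with gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iters):
        # Residuals of the current line on every training point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Take a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line_gradient_descent(xs, ys)
print(w, b)
```

Because the squared-error objective is convex, this first-order method should approach the same slope and intercept that a closed-form least squares solution would give, provided the learning rate is small enough.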

Not all optimization problems in machine learning are well behaved, such as optimization used in AutoML and hyperparameter tuning. Therefore, knowledge of stochastic optimization algorithms is required (simulated annealing, genetic algorithms, particle swarm, etc.). Although these are optimization algorithms, they are also a type of learning algorithm referred to as biologically inspired computation or computational intelligence.
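As a rough illustration of a stochastic method on a poorly behaved objective, the sketch below applies simulated annealing to a one-dimensional function with many local minima. The test function, step size, and cooling schedule are assumptions chosen for the example; a real hyperparameter search would replace `objective` with a model evaluation.

```python
import math
import random


def objective(x):
    # Hypothetical multimodal test function: many local minima, global minimum at x = 0.
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))


def simulated_annealing(func, lower, upper, n_iters=5000, step=0.5,
                        start_temp=10.0, seed=1):
    rng = random.Random(seed)
    current = rng.uniform(lower, upper)
    current_val = func(current)
    best, best_val = current, current_val
    for i in range(n_iters):
        # Assumed cooling schedule: temperature decays with the iteration count.
        temp = start_temp / (i + 1)
        # Propose a nearby point with Gaussian noise, clipped to the bounds.
        candidate = min(upper, max(lower, current + rng.gauss(0.0, step)))
        candidate_val = func(candidate)
        diff = candidate_val - current_val
        # Always accept improvements; accept worse points with a
        # probability that shrinks as the temperature drops.
        if diff <= 0 or rng.random() < math.exp(-diff / temp):
            current, current_val = candidate, candidate_val
        if current_val < best_val:
            best, best_val = current, current_val
    return best, best_val


best_x, best_val = simulated_annealing(objective, -5.0, 5.0)
print(best_x, best_val)
```

Occasionally accepting a worse point is what lets the search climb out of a local minimum, which a purely greedy or gradient-based method cannot do on a function like this.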

Therefore, we will take a look at books that cover classical optimization algorithms as well as books on alternative optimization algorithms.

In fact, the first book we will look at covers both types of algorithms, and much more.

This book, “Algorithms for Optimization,” was written by Mykel Kochenderfer and Tim Wheeler and was published in 2019.

This book might be one of the very few textbooks that I’ve seen that broadly covers the field of optimization techniques relevant to modern machine learning.

This book provides a broad introduction to optimization with a focus on practical algorithms for the design of engineering systems. We cover a wide variety of optimization topics, introducing the underlying mathematical problem formulations and the algorithms for solving them. Figures, examples, and exercises are provided to convey the intuition behind the various approaches.

— Page xiiix, Algorithms for Optimization, 2019.

Importantly, the algorithms range from univariate methods (bisection, line search, etc.) to first-order methods (gradient descent), second-order methods (Newton’s method), direct methods (pattern search), stochastic methods (simulated annealing), and population methods (genetic algorithms, particle swarm), and so much more.

It includes both technical descriptions of algorithms with references and worked examples of algorithms in Julia. It’s a shame the examples are not in Python as this would make the book near perfect in my eyes.
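For a flavor of two of these method families in Python, here is a small sketch (my own, not code from the book) that minimizes a convex univariate function two ways: bisection on the sign of the derivative, and Newton’s method using the second derivative. The objective, bracket, and tolerances are assumptions picked for illustration.

```python
import math


def d_objective(x):
    # First derivative of the hypothetical objective f(x) = (x - 2)^2 + exp(-x).
    return 2.0 * (x - 2.0) - math.exp(-x)


def d2_objective(x):
    # Second derivative; always positive, so f is convex.
    return 2.0 + math.exp(-x)


def bisection_minimize(d_f, low, high, tol=1e-10):
    """Shrink [low, high] around the point where the derivative changes sign."""
    if d_f(low) > 0 or d_f(high) < 0:
        raise ValueError("derivative must be <= 0 at low and >= 0 at high")
    while high - low > tol:
        mid = (low + high) / 2.0
        if d_f(mid) > 0:
            high = mid  # minimum lies to the left of mid
        else:
            low = mid   # minimum lies to the right of mid
    return (low + high) / 2.0


def newton_minimize(d_f, d2_f, x, tol=1e-10, max_iters=50):
    """Repeatedly jump to the minimum of the local quadratic approximation."""
    for _ in range(max_iters):
        step = d_f(x) / d2_f(x)
        x -= step
        if abs(step) < tol:
            break
    return x


x_bisect = bisection_minimize(d_objective, 0.0, 5.0)
x_newton = newton_minimize(d_objective, d2_objective, 0.0)
print(x_bisect, x_newton)
```

Both routines should agree on the minimizer. Bisection only needs the sign of the derivative, whereas Newton’s method uses curvature and typically needs far fewer iterations on a smooth convex function like this one.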

• Chapter 01: Introduction
• Chapter 02: Derivatives and Gradients
• Chapter 03: Bracketing
• Chapter 04: Local Descent
• Chapter 05: First-Order Methods
• Chapter 06: Second-Order Methods
• Chapter 07: Direct Methods
• Chapter 08: Stochastic Methods
• Chapter 09: Population Methods
• Chapter 10: Constraints
• Chapter 11: Linear Constrained Optimization
• Chapter 12: Multiobjective Optimization
• Chapter 13: Sampling Plans
• Chapter 14: Surrogate Models
• Chapter 15: Probabilistic Surrogate Models
• Chapter 16: Surrogate Optimization
• Chapter 17: Optimization under Uncertainty
• Chapter 18: Uncertainty Propagation
• Chapter 19: Discrete Optimization
• Chapter 20: Expression Optimization
• Chapter 21: Multidisciplinary Optimization

I like this book a lot; it is full of valuable practical advice. I highly recommend it!

The second book, “Numerical Optimization,” was written by Jorge Nocedal and Stephen Wright and was published in 2006.

This book is focused on the math and theory of the optimization algorithms presented and does cover many of the foundational techniques used by common machine learning algorithms. It may be a little too heavy for the average practitioner.

The book is intended as a textbook for graduate students in mathematical subjects.

We intend that this book will be used in graduate-level courses in optimization, as offered in engineering, operations research, computer science, and mathematics departments.

— Page xviii, Numerical Optimization, 2006.

Even though it is highly mathematical, the descriptions of the algorithms are precise and may provide a useful alternative description to complement the other books listed.

• Chapter 01: Introduction
• Chapter 02: Fundamentals of Unconstrained Optimization
• Chapter 03: Line Search Methods
• Chapter 04: Trust-Region Methods
• Chapter 05: Conjugate Gradient Methods
• Chapter 06: Quasi-Newton Methods
• Chapter 07: Large-Scale Unconstrained Optimization
• Chapter 08: Calculating Derivatives
• Chapter 09: Derivative-Free Optimization
• Chapter 10: Least-Squares Problems
• Chapter 11: Nonlinear Equations
• Chapter 12: Theory of Constrained Optimization
• Chapter 13: Linear Programming: The Simplex Method
• Chapter 14: Linear Programming: Interior-Point Methods
• Chapter 15: Fundamentals of Algorithms for Nonlinear Constrained Optimization
• Chapter 16: Quadratic Programming
• Chapter 17: Penalty and Augmented Lagrangian Methods
• Chapter 18: Sequential Quadratic Programming
• Chapter 19: Interior-Point Methods for Nonlinear Programming

It’s a solid textbook on optimization.

If you do prefer the theoretical approach to the subject, another widely used mathematical book on optimization is “Convex Optimization” written by Stephen Boyd and Lieven Vandenberghe and published in 2004.

The third book, “Computational Intelligence: An Introduction,” was written by Andries Engelbrecht and published in 2007.

This book provides an excellent overview of the field of nature-inspired optimization algorithms, also referred to as computational intelligence. This includes fields such as evolutionary computation and swarm intelligence.

This book is far less mathematical than the previous textbooks. It focuses more on the natural metaphor that inspired each algorithm and on how to configure and use the specific algorithms, with plenty of pseudocode explanations.

While the material is introductory in nature, it does not shy away from details, and does present the mathematical foundations to the interested reader. The intention of the book is not to provide thorough attention to all computational intelligence paradigms and algorithms, but to give an overview of the most popular and frequently used models.

— Page xxix, Computational Intelligence: An Introduction, 2007.

Algorithms like genetic algorithms, genetic programming, evolution strategies, differential evolution, and particle swarm optimization are useful to know for machine learning model hyperparameter tuning and perhaps even model selection. They also form the core of many modern AutoML systems.
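To show how a population method might be applied to tuning, the sketch below runs a basic differential evolution loop (the common rand/1/bin variant) over two hyperparameters. The `validation_loss` function is a hypothetical stand-in for training and scoring a model, and the bounds, population size, and control parameters `F` and `CR` are assumed values.

```python
import random


def validation_loss(params):
    # Hypothetical stand-in for "train a model and return its validation loss".
    # Pretend the best settings are log10(learning rate) = -3 and log10(regularization) = -1.
    log_lr, log_reg = params
    return (log_lr + 3.0) ** 2 + 0.5 * (log_reg + 1.0) ** 2


def differential_evolution(loss, bounds, pop_size=20, n_gens=100,
                           F=0.8, CR=0.9, seed=1):
    rng = random.Random(seed)
    dim = len(bounds)
    # Start from a random population spread across the search bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [loss(ind) for ind in pop]
    for _ in range(n_gens):
        for i in range(pop_size):
            # Pick three distinct members other than the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one dimension is always mutated
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < CR:
                    # Mutation: base vector plus a scaled difference vector.
                    value = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(hi, max(lo, value))
                else:
                    value = pop[i][d]
                trial.append(value)
            # Selection: keep the trial only if it is at least as good.
            trial_score = loss(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=lambda j: scores[j])
    return pop[best], scores[best]


bounds = [(-6.0, 0.0), (-4.0, 2.0)]
best_params, best_loss = differential_evolution(validation_loss, bounds)
print(best_params, best_loss)
```

The search only ever calls `loss` on candidate settings, so it needs no gradients. That is why this family of methods suits objectives such as validation error, where derivatives with respect to the hyperparameters are usually unavailable.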

• Part I: Introduction
  • Chapter 01: Introduction to Computational Intelligence
• Part II: Artificial Neural Networks
  • Chapter 02: The Artificial Neuron
  • Chapter 03: Supervised Learning Neural Networks
  • Chapter 04: Unsupervised Learning Neural Networks
  • Chapter 05: Radial Basis Function Networks
  • Chapter 06: Reinforcement Learning
  • Chapter 07: Performance Issues (Supervised Learning)
• Part III: Evolutionary Computation
  • Chapter 08: Introduction to Evolutionary Computation
  • Chapter 09: Genetic Algorithms
  • Chapter 10: Genetic Programming
  • Chapter 11: Evolutionary Programming
  • Chapter 12: Evolution Strategies
  • Chapter 13: Differential Evolution
  • Chapter 14: Cultural Algorithms
  • Chapter 15: Coevolution
• Part IV: Computational Swarm Intelligence
  • Chapter 16: Particle Swarm Optimization
  • Chapter 17: Ant Algorithms
• Part V: Artificial Immune Systems
  • Chapter 18: Natural Immune System
  • Chapter 19: Artificial Immune Models
• Part VI: Fuzzy Systems
  • Chapter 20: Fuzzy Sets
  • Chapter 21: Fuzzy Logic and Reasoning

I’m a fan of this book and recommend it.

## Summary

In this post, you discovered books on optimization algorithms that are helpful to know for applied machine learning.

Did I miss a good book on optimization?
Let me know in the comments below.

Have you read any of the books listed?
Let me know what you think of them in the comments.