## Optimizing estimators with the ADSTuner: A hyperparameter optimization engine

**By Nupur Chatterji, Machine Learning Engineer, and John Peach, Principal Data Scientist**

A key step in model development is the optimization of hyperparameters. The ADSTuner class performs a hyperparameter search, sometimes called hyperparameter tuning. It does this by searching over a range or distribution of values looking for the best model. This powerful module is available as part of the most recent release of the Oracle Accelerated Data Science (ADS) library. This blog post shows you how to:

- Tune a model using the ADSTuner.
- Obtain tuning information.
- Customize the search space.
- Generate informative visualizations.

**Meet the ADSTuner**

Many machine learning models have parameters that control the learning process. These are called hyperparameters, and they are not learned directly from the data. Hyperparameter tuning is the process of searching for good hyperparameter values: a set of values is chosen, a model is fit and scored with them, and this is repeated for many different combinations until the best-scoring set is selected. The ADSTuner class is a hyperparameter optimization engine that is agnostic to the model class as long as the model implements the sklearn interface. Many popular models support this interface, including XGBoost, LightGBM, and sklearn Pipeline objects.
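The search loop described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how ADSTuner is implemented: `toy_score` is a hypothetical stand-in for fitting and cross-validating a model, and the candidate values are made up.

```python
import itertools

def toy_score(alpha, penalty):
    """Hypothetical stand-in for a cross-validated model score (higher is better)."""
    penalty_bonus = {"l1": 0.00, "l2": 0.02, "none": 0.01}[penalty]
    # In this made-up example the score peaks at alpha = 0.01.
    return 0.95 - abs(alpha - 0.01) + penalty_bonus

# Candidate values for each hyperparameter (illustrative only).
grid = {
    "alpha": [0.0001, 0.001, 0.01, 0.1],
    "penalty": ["l1", "l2", "none"],
}

best_params, best_score = None, float("-inf")
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    score = toy_score(**params)      # "fit and score" one combination
    if score > best_score:           # remember the best combination so far
        best_params, best_score = params, score

print(best_params, round(best_score, 4))
```

An exhaustive grid like this grows quickly with the number of hyperparameters, which is why a tuner that samples from distributions and stops early is attractive.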

Each class of model has its own set of hyperparameters, and a common problem is determining what range or distribution should be used in the search for the best hyperparameters. If the range is too broad, it takes a long time to carry out the search. If the range is too narrow, it may miss out on the globally optimal solution. ADSTuner addresses this problem by providing a sensible set of values that are specific to each supported model class, or you can specify your own values. This optimization process can be expensive if there is a large search space. Therefore, ADSTuner prunes trials that do not appear promising, thus reducing computational costs.
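To give a feel for what pruning means, the sketch below uses one common rule: stop a trial early when its intermediate score falls below the median of earlier trials at the same step. This rule is an assumption chosen for illustration, and it is not necessarily the rule that ADSTuner applies. The function name and the scores are hypothetical.

```python
from statistics import median

def should_prune(history, step, value):
    """Return True if `value` at `step` is below the median of earlier trials.

    `history` is a list of per-trial lists of intermediate scores.
    The median rule is an illustrative assumption, not ADSTuner's documented behavior.
    """
    earlier = [scores[step] for scores in history if len(scores) > step]
    if not earlier:          # nothing to compare against yet
        return False
    return value < median(earlier)

# Made-up intermediate scores of three finished trials.
history = [[0.80, 0.85, 0.90], [0.70, 0.75, 0.78], [0.60, 0.62, 0.65]]

new_trial = [0.55, 0.58, 0.60]   # an unpromising trial
pruned_at = None
for step, value in enumerate(new_trial):
    if should_prune(history, step, value):
        pruned_at = step         # stop early and skip the remaining steps
        break

print("pruned at step:", pruned_at)
```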

Let us start by using sklearn’s stochastic gradient descent classifier model, SGDClassifier(), and the classic Iris dataset. Before we tune the model, the data needs to be split into test and train datasets.

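    from sklearn.datasets import load_iris
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split

    model = SGDClassifier()
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y)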

Instantiating the ADSTuner object requires the model that is to be tuned. Optionally, we can specify the number of cross-validation folds used with each set of parameters, the search strategy, or both. In the ADS library, the class is imported with `from ads.hpo.search_cv import ADSTuner`.

`tuner = ADSTuner(model, cv=3)`

**Understanding the search space and score**

The only required argument to the tuner is the model itself. The search strategy and the number of cross-validation folds are optional. In the preceding code snippet, we set the number of cross-validation folds to 3 but did not explicitly define the search strategy. For each supported model class, there are two predefined strategies, perfunctory and detailed. You can also provide a dictionary of tuning parameters to customize your search.

The default strategy is perfunctory, which searches over a subset of the most important hyperparameters. Generally, this is used in the early stages of model assessment, when we are trying to determine which class of models shows the most promise. The perfunctory strategy reduces the cost of computing by not prematurely optimizing model classes that do relatively poorly. The detailed strategy creates a sensible though larger search space for ADSTuner. It generally finds a better solution than the perfunctory strategy, but the computational cost is usually much higher. The detailed strategy is useful when you know which model classes perform well on your data and you want to find an optimal solution. The following code snippet shows how to define a strategy when instantiating the tuner.

`tuner = ADSTuner(model, cv=5, strategy='detailed')`

At its core, the strategy parameter is a dictionary where the key is the name of the hyperparameter and the value is a distribution function. The perfunctory and detailed strategies are aliases for dictionaries that are specific to each model class. Later on, we will see how you can access these dictionaries and optionally modify them.

Each hyperparameter in the model can have an entry in the strategy parameter. If a hyperparameter is omitted, it is not tuned and its default value is used. In the following example, a logistic regression is tuned. This model has the C, solver, and max_iter hyperparameters. The C hyperparameter is sampled from a log-uniform distribution. The solver is fixed to 'saga' by using a categorical distribution with a single category, and max_iter is sampled from a uniform integer distribution between 500 and 2000.

ADSTuner takes a scoring parameter that determines how the model's performance is scored. By default, it falls back on the model's default sklearn scorer. The score is used to assess the quality of the model, and it can also serve as a stopping criterion (more on that soon). The following example tunes a logistic regression. Because we want to balance type I and type II errors, we score on the weighted F1 score.

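    from ads.hpo.distributions import (CategoricalDistribution,
                                       IntUniformDistribution,
                                       LogUniformDistribution)
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score, make_scorer

    model = LogisticRegression()
    tuner = ADSTuner(
        model,
        cv=3,
        strategy={
            'C': LogUniformDistribution(low=1e-05, high=1),
            'solver': CategoricalDistribution(['saga']),
            'max_iter': IntUniformDistribution(500, 2000, 50),
        },
        scoring=make_scorer(f1_score, average='weighted'),
    )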
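To make the three distributions in this strategy more concrete, the following sketch draws one candidate from each using only Python's standard library. The helper functions are hypothetical and are not part of ADS. In particular, treating the third argument of IntUniformDistribution as a step size of 50 is an assumption based on its position.

```python
import math
import random

rng = random.Random(0)   # seeded so the sketch is reproducible

def sample_log_uniform(low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def sample_categorical(choices):
    """Pick one of the listed categories with equal probability."""
    return rng.choice(choices)

def sample_int_uniform(low, high, step=1):
    """Pick an integer from low, low + step, ..., not exceeding high."""
    n_steps = (high - low) // step
    return low + step * rng.randint(0, n_steps)

# One candidate set of hyperparameters, mirroring the strategy dictionary.
candidate = {
    "C": sample_log_uniform(1e-05, 1),
    "solver": sample_categorical(["saga"]),
    "max_iter": sample_int_uniform(500, 2000, 50),
}
print(candidate)
```

Sampling C on a log scale spreads the draws evenly across orders of magnitude, so values near 0.0001 are as likely to be tried as values near 0.1.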

**Tuning with ADSTuner()**

To tune the model, call the tune() method while providing the observations and outcome values.

`tuner.tune(X_train, y_train)`

The stopping criteria can also be specified with the exit_criterion parameter. ADSTuner continues to search until one of the stopping criteria is reached. The supported stopping criteria are:

- NTrials(N): The maximum number of trials.
- TimeBudget(T): The maximum number of seconds that the search is allowed to run.
- ScoreValue(V): Stop once the model score reaches the value V.

It is possible and often desirable to combine these criteria. The values are passed as a list, and the search stops as soon as any one criterion is met.

    from ads.hpo.stopping_criterion import NTrials, ScoreValue, TimeBudget

    # Four alternative calls, each with a different exit_criterion list.
    tuner.tune(X_train, y_train, exit_criterion=[NTrials(30)])
    tuner.tune(X_train, y_train, exit_criterion=[TimeBudget(500)])
    tuner.tune(X_train, y_train, exit_criterion=[NTrials(5), TimeBudget(1000)])
    tuner.tune(X_train, y_train, exit_criterion=[NTrials(10), TimeBudget(500), ScoreValue(70)])
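The "stop when any criterion is met" behavior can be pictured with the following stand-alone sketch. The functions here are hypothetical stand-ins written for illustration, not the ADS classes with similar names, and `run_trial` is a made-up scoring function.

```python
import time

def n_trials(limit):
    """Stop after `limit` trials have completed."""
    return lambda state: state["trials"] >= limit

def time_budget(seconds):
    """Stop once `seconds` of wall-clock time have elapsed."""
    return lambda state: time.monotonic() - state["start"] >= seconds

def score_value(target):
    """Stop once the best score reaches `target`."""
    return lambda state: state["best"] >= target

def run_trial(i):
    """Made-up score that improves a little with each trial."""
    return (50 + 5 * i) / 100

def search(exit_criterion):
    state = {"trials": 0, "best": float("-inf"), "start": time.monotonic()}
    # Keep going until ANY one of the criteria is satisfied.
    while not any(done(state) for done in exit_criterion):
        state["best"] = max(state["best"], run_trial(state["trials"]))
        state["trials"] += 1
    return state

result = search([n_trials(10), time_budget(500), score_value(0.7)])
print(result["trials"], result["best"])
```

In this toy run the score target is reached before the trial limit or the time budget, so the score criterion is the one that ends the search.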

**The ADSTuner API**

ADSTuner exposes information about the tuning setup, and some of it can be modified. The tuner.strategy attribute returns a label (perfunctory or detailed) if one of the predefined strategies was used. Otherwise, it returns the customized strategy. The tuner.search_space() method returns a dictionary of the hyperparameters that are to be tuned. If perfunctory or detailed was provided, the actual hyperparameter distributions associated with that label, for the given model class, are returned. It is possible to capture this dictionary, modify it, and then call the tune() method to run a customized search that is based on the default settings.

The following example shows that for the model class being used with the perfunctory strategy, the alpha parameter is sampled from a log-uniform distribution between 0.0001 and 0.1, and three penalty types are tried: L1, L2, and no penalty.

`print('Search Space for default strategy "{}" is \n\n {}'.format(tuner.strategy, tuner.search_space()))`

Detailed information about the tuning trials is obtained from the tuner.trials attribute. It provides information like the trial number, model score, start/end times, and duration of the tuning step. It also provides information about the various hyperparameter values used in the trial.

The tuner.best_index attribute gives the index in tuner.trials where the model score was the best. This allows us to determine what hyperparameters should be used to train the final model, or to perform a finer-grained hyperparameter search near these values. Because the number of trials conducted depends on the stopping criteria, the tuner.n_trials attribute reports how many trials were run, which is also the number of records in the tuner.trials output.

`print(f"The index of the best trial is {tuner.best_index} out of a total of {tuner.n_trials} trials.")`

Each tuning operation uses a model score to assess the quality of the model and its associated hyperparameters. The tuner.scoring_name attribute provides the name of the metric that was used. The tuner.best_score attribute gives the numerical value of the best model score. In the following example, the quality of the models was assessed by the mean accuracy.

`print('So far the best {} score is {}'.format(tuner.scoring_name, tuner.best_score))`

The tuner.best_param attribute provides a dictionary of the tuned hyperparameters, with the values that resulted in the best model as assessed by the tuner.scoring_name metric and the tuner.best_score value.
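How these summary values relate to the table of trials can be shown with a small stand-alone sketch. The trial records below are made up, and their field names are illustrative. They do not mirror the exact columns of tuner.trials.

```python
# Hypothetical trial records (illustrative field names and values).
trials = [
    {"number": 0, "value": 0.91, "params": {"alpha": 0.05, "penalty": "l1"}},
    {"number": 1, "value": 0.98, "params": {"alpha": 0.001, "penalty": "none"}},
    {"number": 2, "value": 0.95, "params": {"alpha": 0.01, "penalty": "l2"}},
]

n_trials = len(trials)                                    # how many trials ran
best_index = max(range(n_trials), key=lambda i: trials[i]["value"])
best_score = trials[best_index]["value"]                  # highest score seen
best_hyperparameters = trials[best_index]["params"]       # values that produced it

print(f"Best trial is {best_index} of {n_trials}: "
      f"score={best_score}, params={best_hyperparameters}")
```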

**Stopping and resuming trials**

In long-running tuning operations, it is often desirable to halt the process and examine the results up to that point. After evaluation, the search process can be resumed. Generally, if you change the search space, the process must restart. However, it is possible to resume the process and apply a new exit_criterion. For example, the process might have run for 1000 trials without the hyperparameter search appearing to converge, and you want it to continue for an additional 500 trials. The tuner.resume() method can be used to do this, and the previous results are retained.

To improve the chances of finding the best hyperparameters, ADSTuner offers the option to stop tuning, monitor intermediate results, and resume tuning if there is potential for further improvement with additional runs. Perhaps in the prior run, the exit criterion limited tuning to only 5 trials or 200 seconds. With the option to resume, the tuning can continue for longer and possibly find better hyperparameters. The next code snippet shows how to do this.

`tuner.resume(exit_criterion=[NTrials(1500)])`

**Visualizing the tuning trials**

Without extensive experience with a particular model and dataset, it is challenging to set stopping criteria a priori. Instead, the progress that the tuner is making can be used to judge when the search has converged, or when it should be abandoned because the given class of model is not producing the required results. ADSTuner provides a number of standard plots that can be used to assess the tuning process. The tuner.plot_best_scores() method plots the scores achieved over tuning trials.

`tuner.plot_best_scores()`
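A best-score curve is, in essence, a running maximum of the trial scores (assuming a higher score is better). The sketch below computes that curve for a made-up sequence of scores and counts how many trials have passed since the last improvement, which is one rough way to judge whether a search has plateaued. The variable names and values are hypothetical.

```python
from itertools import accumulate

# Made-up scores in the order the trials were run.
trial_scores = [0.62, 0.90, 0.71, 0.95, 0.93, 0.98, 0.97]

# Running maximum: the best score seen up to and including each trial.
best_so_far = list(accumulate(trial_scores, max))

# Index of the trial that first reached the final best score.
last_improvement = best_so_far.index(best_so_far[-1])
trials_since_improvement = len(trial_scores) - 1 - last_improvement

print(best_so_far)
print("trials since last improvement:", trials_since_improvement)
```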

It is often helpful to understand not just the best score but also how the score varies with the hyperparameters. A contour map can be created with tuner.plot_contour_scores(), which takes the hyperparameters against which the score is to be plotted. In the following figure, the alpha value and penalty (L1, L2, and no penalty) are plotted. It shows that there are two alpha values (the dark regions on the left-hand side and the middle) where the score is near the lowest value. The figure also indicates that the L2 penalty performs slightly better than L1 and no penalty, but the model is not very sensitive to the penalty term.

`tuner.plot_contour_scores(params=['penalty', 'alpha'])`

Parallel coordinate plots can distill complex relationships into simple-to-understand representations. In the following plot, the model score is plotted with the penalty and alpha values. It shows that many of the lines pass through alpha values near 0.01 and 0.001, and that these lines lead to objective values near 0.98. This suggests that there is a local extremum when the alpha values are near 0.01 and 0.001 and the penalty is none. This information can be used to modify the search criteria and allow the tuner to search over a smaller space and still obtain a near-optimal set of hyperparameters.

`tuner.plot_parallel_coordinate_scores(params=['penalty', 'alpha'])`

The empirical cumulative distribution plot of the model score provides information on the distribution of score values. It can be used to assess the size of the solution space near the optimal score.

`tuner.plot_edf_scores()`
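For readers unfamiliar with the empirical distribution function, the sketch below computes it by hand for a made-up list of trial scores and uses it to estimate what share of trials landed close to the best score. The function name, the scores, and the 0.025 margin are all hypothetical choices for illustration.

```python
def ecdf(scores, x):
    """Fraction of trials whose score is less than or equal to x."""
    return sum(s <= x for s in scores) / len(scores)

# Made-up model scores from eight trials.
scores = [0.62, 0.71, 0.90, 0.95, 0.96, 0.97, 0.97, 0.98]

best = max(scores)
# Share of trials scoring within 0.025 of the best score.
share_near_best = 1 - ecdf(scores, best - 0.025)

print("share of trials near the best score:", share_near_best)
```

A large share suggests that many hyperparameter combinations perform about as well as the best one, so the search space near the optimum is broad.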

Detailed information on each trial is shown using tuner.plot_intermediate_scores(). The intermediate value is plotted against each step, and the various lines represent the hyperparameter sets.

`tuner.plot_intermediate_scores()`

**In summary**

This post provides an overview of the ADSTuner, a hyperparameter optimization engine that can be run in the Oracle Cloud Infrastructure Data Science service notebook session environment. Various code snippets detail how to use the engine, customize strategies, and visualize the tuning process.

I also invite you to participate in our upcoming Oracle Developer Live: AI and ML for Your Enterprise event on January 26, 28, and February 2, 2021. Learn how to optimize the machine learning lifecycle with technical sessions and hands-on labs, including two sessions on Accelerated Data Science.

**Keep in touch!**

– Visit our website

– Visit our service documentation

– (Oracle Internal) Join our slack channel #oci_datascience_users

– View our YouTube Playlist

– Visit our LiveLabs Hands-on Lab
