Advanced Options with Hyperopt for Tuning Hyperparameters in Neural Networks



Nicholas Lewis

For our data, we'll generate some from a First Order Plus Dead Time (FOPDT) model. FOPDT models are straightforward but powerful models that are often used in industry for preliminary results; they describe how a system's output responds to a changing input. For example, we can model how the speed of a car changes based on how much you press the gas pedal.
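Concretely, the simulation below integrates the FOPDT differential equation (with the dead-time term dropped, since the code doesn't use one):

\tau_m \frac{dy(t)}{dt} = -\bigl(y(t) - y_0\bigr) + K_m \bigl(u(t) - u_0\bigr)

where K_m is the model gain and \tau_m is the model time constant.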

import random
import numpy as np
from scipy.integrate import odeint
from sklearn.preprocessing import MinMaxScaler

def fopdt(y, t, um, Km, taum):
    # arguments
    #  y    = output
    #  t    = time
    #  um   = input value over this time step
    #  Km   = model gain
    #  taum = model time constant
    # calculate derivative
    dydt = (-(y - yp0) + Km * (um - u0)) / taum
    return dydt

def sim_model(Km, taum):
    # array for model values
    ym = np.zeros(ns)
    # initial condition
    ym[0] = yp0
    # loop through time steps
    for i in range(0, ns-1):
        ts = [t[i], t[i+1]]
        y1 = odeint(fopdt, ym[i], ts, args=(u[i], Km, taum))
        ym[i+1] = y1[-1]
    return ym

# Parameters and time for FOPDT model
ns = 10000
t = np.linspace(0, ns-1, ns)
u = np.zeros(ns)
# Additional FOPDT parameters
yp0 = 0.0
u0 = u[0]
Km = 0.67
taum = 160.0

# Generate step data for u
end = 60  # leave 1st minute of u as 0
while end <= ns:
    start = end
    end += random.randint(300, 900)  # hold each new u value for 5 to 15 minutes
    u[start:end] = random.randint(0, 100)

# Simulate FOPDT model
y = sim_model(Km, taum)

# Add Gaussian noise
noise = np.random.normal(0, 0.2, ns)
y += noise

# Scale data to the range (0, 1)
data = np.vstack((u, y)).T
s = MinMaxScaler(feature_range=(0, 1))
data_s = s.fit_transform(data)
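One note before the modeling: the snippets below expect windowed training and test arrays (Xtrain, ytrain, Xtest, ytest) produced by a format_data helper that isn't shown here. A minimal sketch of what such a helper could look like, assuming sliding windows over the scaled [u, y] data and a simple 80/20 train/test split:

# Sketch of a format_data helper (assumed, not shown in this post):
# builds sliding windows of [u, y] pairs to predict the next y
def format_data(window=20, train_frac=0.8):
    X, Y = [], []
    for i in range(window, len(data_s)):
        X.append(data_s[i-window:i, :])   # past `window` steps of [u, y]
        Y.append(data_s[i, 1])            # next scaled output y
    X, Y = np.array(X), np.array(Y)
    split = int(train_frac * len(X))
    return X[:split], Y[:split], X[split:], Y[split:]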

Keras is an excellent platform for constructing neural networks. We could keep this really basic, and do something like the following:

# Keras LSTM model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential()
model.add(LSTM(units=50,
               input_shape=(Xtrain.shape[1], Xtrain.shape[2])))
model.add(Dropout(rate=0.1))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')

# Stop training once the validation loss stops improving
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=15)

result = model.fit(Xtrain, ytrain,
                   verbose=0,
                   validation_split=0.1,
                   callbacks=[es],
                   batch_size=100,
                   epochs=200)

The purpose of this article isn't to introduce Hyperopt, but rather to expand on what you can do with it. Looking at the Keras block of code above, there are several hyperparameters we could pick out to optimize, such as units in the LSTM layer, rate in the Dropout layer, and batch_size when we're fitting. Finding optimal values for these would be covered in an introductory Hyperopt tutorial. However, we may find it useful to add some extra LSTM and Dropout layers, or to search for a more optimal window of data points to feed into the LSTM. We may even find it beneficial to change the objective function that we're trying to minimize.

from hyperopt import hp, STATUS_OK
from hyperopt.pyll.base import scope

# quniform returns a float, but some parameters require an int;
# wrap those in scope.int to force an integer
space = {'rate'       : hp.uniform('rate', 0.01, 0.5),
         'units'      : scope.int(hp.quniform('units', 10, 100, 5)),
         'batch_size' : scope.int(hp.quniform('batch_size', 100, 250, 25)),
         'layers'     : scope.int(hp.quniform('layers', 1, 6, 1)),
         'window'     : scope.int(hp.quniform('window', 10, 50, 5))
         }

def f_nn(params):
    # Generate data with the given window
    Xtrain, ytrain, Xtest, ytest = format_data(window=params['window'])

    # Keras LSTM model
    model = Sequential()

    if params['layers'] == 1:
        model.add(LSTM(units=params['units'],
                       input_shape=(Xtrain.shape[1], Xtrain.shape[2])))
        model.add(Dropout(rate=params['rate']))
    else:
        # First layer specifies input_shape and returns sequences
        model.add(LSTM(units=params['units'],
                       return_sequences=True,
                       input_shape=(Xtrain.shape[1], Xtrain.shape[2])))
        model.add(Dropout(rate=params['rate']))

        # Middle layers return sequences
        for i in range(params['layers'] - 2):
            model.add(LSTM(units=params['units'],
                           return_sequences=True))
            model.add(Dropout(rate=params['rate']))

        # Last LSTM layer doesn't return sequences
        model.add(LSTM(units=params['units']))
        model.add(Dropout(rate=params['rate']))

    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mean_squared_error')

    es = EarlyStopping(monitor='val_loss', mode='min',
                       verbose=1, patience=15)

    result = model.fit(Xtrain, ytrain,
                       verbose=0,
                       validation_split=0.1,
                       callbacks=[es],
                       batch_size=params['batch_size'],
                       epochs=200)

    # Get the lowest validation loss of the training epochs
    validation_loss = np.amin(result.history['val_loss'])
    print('Best validation loss of epoch:', validation_loss)

    return {'loss': validation_loss,
            'status': STATUS_OK,
            'model': model,
            'params': params}
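Before kicking off a long search, it can be worth a quick sanity check that the objective runs end to end with a single hand-picked set of hyperparameters (the values below are arbitrary picks from within the search space):

# One-off smoke test of the objective function
sample_params = {'rate': 0.1, 'units': 50, 'batch_size': 100,
                 'layers': 2, 'window': 20}
print(f_nn(sample_params)['loss'])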

To run the actual optimization, be prepared for some long run times. Training an LSTM always takes a bit of time, and what we’re doing is training it several times with different hyperparameter sets. This next part took about 12 hours to run on my personal computer. You can speed up the process significantly by using Google Colab’s GPU resources.

from hyperopt import fmin, tpe, Trials

trials = Trials()
best = fmin(f_nn,
            space,
            algo=tpe.suggest,
            max_evals=50,
            trials=trials)

# Pull the best (and, for comparison, the worst) trial out of the results
best_model = trials.results[np.argmin([r['loss'] for r in trials.results])]['model']
best_params = trials.results[np.argmin([r['loss'] for r in trials.results])]['params']
worst_model = trials.results[np.argmax([r['loss'] for r in trials.results])]['model']
worst_params = trials.results[np.argmax([r['loss'] for r in trials.results])]['params']
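If you want to hang on to the winning model and its hyperparameters for later, a minimal sketch (the file names here are arbitrary):

import json

# Persist the best Keras model and its hyperparameters
best_model.save('best_lstm_model.h5')
with open('best_params.json', 'w') as f:
    json.dump(best_params, f)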

Now that we’ve run the optimization and saved the best model (and, for good measure, its set of hyperparameters), it’s time to see how the model looks. We’ll look at two different approaches. The first takes the previous window of actual data points (pedal % and measured speed) and uses it to predict the next output (speed); we’ll call this the “prediction,” and it is found simply by applying the model.predict() function to our test data. The second, the “forecast,” feeds the model’s own previous predictions back in as the output history, which is a better test of how the model would behave in practice. It looks like this:

# Best window from the search
best_window = best_params['window']

# Format data with that window and get the one-step prediction
Xtrain, ytrain, Xtest, ytest = format_data(window=best_window)
Yp = best_model.predict(Xtest)

def forecast(Xtest, ytest, model, window):
    # Recursive forecast: feed the model's own predictions back in as the
    # output history instead of the measured values
    Yf = ytest.copy()
    for i in range(len(Yf)):
        if i < window:
            # Not enough history yet; keep the measured value
            pass
        else:
            Xu = Xtest[i, :, 0]     # actual inputs (pedal %)
            Xy = Yf[i-window:i]     # previously forecast outputs
            Xf = np.vstack((Xu, Xy)).T
            Xf = np.reshape(Xf, (1, Xf.shape[0], Xf.shape[1]))
            Yf[i] = model.predict(Xf)[0]
    return Yf
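With both in hand, it's easy to compare how the one-step prediction and the recursive forecast stack up on the test set (the array shapes here follow the format_data sketch above):

# Compare one-step prediction error against recursive forecast error
Yf = forecast(Xtest, ytest, best_model, best_window)
print('Prediction MSE:', np.mean((ytest - Yp[:, 0])**2))
print('Forecast MSE:  ', np.mean((ytest - Yf)**2))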


This is where the objective function comes in. You can get quite clever with the objective function you’re minimizing to account for all kinds of situations where you want to see better results. For example, what if we also want to account for how much time the model takes to train? We could change our loss score to include an element of time, perhaps by multiplying the loss by the training time so that fast training is rewarded.
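As a rough sketch of that idea, you could time the fit inside f_nn and fold the elapsed time into the returned loss (the simple multiplication here is just one possible weighting):

import time

# ...inside f_nn, replacing the fit and return shown earlier...
start = time.time()
result = model.fit(Xtrain, ytrain,
                   verbose=0,
                   validation_split=0.1,
                   callbacks=[es],
                   batch_size=params['batch_size'],
                   epochs=200)
train_time = time.time() - start

validation_loss = np.amin(result.history['val_loss'])

# Reward fast training by scaling the loss with the elapsed time
return {'loss': validation_loss * train_time,
        'status': STATUS_OK,
        'model': model,
        'params': params}

Another option, shown next, is to score each trial on its recursive forecast error over a held-out slice of the training data instead of the one-step validation loss: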

# ...at the end of f_nn, after training the model...
# Hold out a slice from the end of the training data
val_length = int(0.2 * len(ytest))
Xval, yval = Xtrain[-val_length:], ytrain[-val_length:]

# Evaluate the recursive forecast on that slice
Yf = forecast(Xval, yval, model, params['window'])
mse = np.mean((yval - Yf)**2)

return {'loss': mse,
        'status': STATUS_OK,
        'model': model,
        'params': params}

These are just a few examples of how you can utilize Hyperopt to get increased performance from your machine learning model. While the exact methods used here might not be used in your particular situation, I hope that some ideas were sparked and that you can see some more potential uses for Hyperopt. I’ve included the code from my different simulations on my Github repo.
