SciKeras Tutorial: A Multi-Input Multi-Output (MIMO) Wrapper for CapsNet Hyperparameter Tuning with Keras
Building on our discussion so far, the wrapper would need to override both BaseWrappers.feature_encoder() and BaseWrappers.target_encoder(). Depending on the type of transformation required, we could either write our own custom transformer or use one of the many transformers already offered in sklearn.preprocessing. For this tutorial, we will demonstrate both ways of transformation: we will write a custom transformer for the outputs and use a library transformer for the inputs.
Further, since the training mechanism of the Keras model cannot be strictly mirrored by that of a classifier or regressor (due to the reconstruction module), we will sub-class BaseWrapper when defining our estimator. Moreover, for the performance comparison of the model we need to consider two outputs, so a custom scorer will also be needed.
For our specific implementation, the outputs needed by the Keras model have to be in the form [y_true, X_true], while sklearn expects a single numpy array to be fed as the targets array. The transformer we define needs to interface seamlessly between the two. This is achieved by fitting the transformer to the outputs in the fit method, then using a transform method that reshapes the output into the list of arrays expected by Keras, and an inverse_transform method that reshapes the output back into the form expected by sklearn.
We create our custom transformer, MultiOutputTransformer, by sub-classing (inheriting from) the BaseEstimator and TransformerMixin classes of sklearn, and define a fit method. Depending on the type of outputs, this method could be used to incorporate multiple library encoders (like OneHotEncoder) into a single transformer, as demonstrated in the official tutorial. These encoders can be fit to the targets so that the inverse_transform method can work appropriately. In this method, it is necessary to set the self.n_outputs_expected_ attribute to inform SciKeras about the outputs of fit, while other parameters in meta can be optionally set. This method must return self.
In the code presented here, however, I have tried to demonstrate the implementation when no transformation is needed for the targets except for a possible separation and rearrangement. It should be noted that it would also be possible to define a FunctionTransformer over an identity function to achieve this (which is demonstrated in the next section).
A get_metadata method is optionally defined for cases where the meta parameter is accepted. Specific to this code, while the transform method is straightforward, in the inverse_transform method we need to define our custom inverse transformation, since we do not have any library encoders to rely on.
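Putting these pieces together, here is a minimal sketch of such a transformer. It assumes the targets arrive as a single stacked array whose first n_classes columns hold the one-hot labels and whose remaining columns hold the flattened image for the reconstruction head; the column split and the n_classes constructor argument are illustrative assumptions, not the article's exact code.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MultiOutputTransformer(BaseEstimator, TransformerMixin):
    # Illustrative assumption: targets come stacked as
    # [one-hot labels | flattened image], split at n_classes.
    def __init__(self, n_classes=10):
        self.n_classes = n_classes

    def fit(self, y):
        # No library encoders to fit here; just record the number
        # of outputs so SciKeras knows what the model returns.
        self.n_outputs_expected_ = 2
        return self  # fit must return self

    def transform(self, y):
        # Separate the single sklearn-style array into the list of
        # arrays expected by Keras: [y_true, X_true].
        return [y[:, : self.n_classes], y[:, self.n_classes :]]

    def inverse_transform(self, y):
        # Rearrange the list of Keras outputs back into a single
        # numpy array, as expected by sklearn.
        return np.column_stack(y)

    def get_metadata(self):
        # Optional: parameters exposed to SciKeras via `meta`.
        return {"n_outputs_expected_": self.n_outputs_expected_}
```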
For the input transformer, we will use a library transformer already available in sklearn.preprocessing: the FunctionTransformer. For the FunctionTransformer, it is possible to pass a lambda function as the func parameter of the transformer's constructor. But having a lambda function could cause issues with pickle, so we instead define a separate, named function to pass into func.
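As an illustration, the sketch below uses a named module-level function, which pickles cleanly; the (28, 28, 1) reshape is an assumed MNIST-style input shape for the CapsNet, not necessarily the article's exact transformation.

```python
from sklearn.preprocessing import FunctionTransformer

# A named (picklable) function instead of a lambda. The target
# shape (28, 28, 1) is an illustrative assumption.
def reshape_inputs(X):
    return X.reshape(-1, 28, 28, 1)

feature_encoder = FunctionTransformer(func=reshape_inputs)
```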
To finish up the wrapper, we subclass BaseWrapper, as mentioned previously, and override the feature_encoder and target_encoder functions, along with a custom scorer. Note that in the scorer function we only evaluate the output from the capsule layer, since this is the metric on which we want our cross-validation to optimize the network.
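A minimal sketch of what this subclass might look like follows, reusing reshape_inputs and MultiOutputTransformer from the sketches above. The class name, the fixed n_classes, and the accuracy-based scorer are illustrative assumptions; SciKeras expects scorer(y_true, y_pred, **kwargs).

```python
from sklearn.metrics import accuracy_score
from scikeras.wrappers import BaseWrapper

class CapsNetWrapper(BaseWrapper):  # hypothetical name
    @property
    def feature_encoder(self):
        return FunctionTransformer(func=reshape_inputs)

    @property
    def target_encoder(self):
        return MultiOutputTransformer(n_classes=10)

    @staticmethod
    def scorer(y_true, y_pred, **kwargs):
        # Score only the capsule head (the first n_classes columns
        # of the stacked arrays); ignore the reconstruction output.
        n_classes = 10  # assumed, for illustration
        return accuracy_score(
            y_true[:, :n_classes].argmax(axis=1),
            y_pred[:, :n_classes].argmax(axis=1),
        )
```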
The next steps are pretty similar to the first example using the wrappers in tf.keras. We instantiate our estimator, clf, with the model-building function get_model, and pass the (hyper)parameters to get_model as routed parameters (with the model__ prefix). These routed arguments also include the hyperparameters that we would like to tune using grid-search.
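For instance, the instantiation might look like the sketch below; the specific routed parameter names are hypothetical and must match get_model's keyword arguments.

```python
# Hypothetical routed parameters; names after model__ must match
# the keyword arguments of get_model.
clf = CapsNetWrapper(
    model=get_model,
    model__n_routings=3,
    epochs=10,
    batch_size=100,
    verbose=0,
)
```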
Next, we define the params dict containing the hyperparameter names and the corresponding values to try out as key-value pairs. We use clf as the estimator to create a GridSearchCV object, and then fit it to the data.
Care must be taken while specifying the cv argument for GridSearchCV to achieve a suitable relation between the number of training examples (n), the batch size (b), and the number of cross-validation folds (cv): n should be completely divisible by cv * b.
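The snippet below sketches this step; the grid values, cv=3, and batch size of 100 are illustrative, as are X_train and y_train (assuming, say, n = 60,000 MNIST training examples, the divisibility constraint holds).

```python
from sklearn.model_selection import GridSearchCV

# Illustrative grid; keys with the model__ prefix are routed to
# get_model, plain keys go to the wrapper itself.
params = {
    "model__n_routings": [2, 3],
    "batch_size": [100],
}

# With n = 60,000 examples, cv=3 and b=100 satisfy the
# divisibility constraint n % (cv * b) == 0.
gs = GridSearchCV(clf, params, cv=3)
gs_res = gs.fit(X_train, y_train)
```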
The results of the grid-search are accumulated in gs_res after the fit operation. The best estimator can be obtained via the best_estimator_ attribute of gs_res; similarly, best_score_ gives the best score, and best_params_ gives the best combination of hyperparameters.
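For example:

```python
print("Best score:", gs_res.best_score_)
print("Best hyperparameters:", gs_res.best_params_)
best_model = gs_res.best_estimator_  # refit on the full data by default
```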