SciKeras Tutorial: A Multi-Input Multi-Output (MIMO) Wrapper for CapsNet Hyperparameter Tuning with Keras
Building on our discussion so far, the wrapper would need to override both BaseWrappers.feature_encoder() and BaseWrappers.target_encoder(). Depending on the type of transformation required, we could either write our own custom transformer or use one of the many transformers already offered in sklearn.preprocessing. For this tutorial, we will demonstrate both ways of transformation: we will write a custom transformer for the outputs and use a library transformer for the inputs.
Further, since the training mechanism of the Keras model cannot be strictly mirrored by that of a classifier or regressor (due to the reconstruction module), we will sub-class BaseWrapper when defining our estimator. Moreover, for the performance comparison of the model we need to consider two outputs, so a custom scorer will also be needed.
For our specific implementation, the outputs needed by the Keras model have to be in the form [y_true, X_true], while sklearn expects a single numpy array to be fed as the targets array. The transformer we define needs to interface seamlessly between the two. This is achieved by fitting the transformer to the outputs in the fit method, then using a transform method that reshapes the output into the list of arrays expected by Keras, and an inverse_transform method that reshapes the output back into the form expected by sklearn.
We create our custom transformer, MultiOutputTransformer, by sub-classing (inheriting from) the BaseEstimator and TransformerMixin classes of sklearn, and define a fit method. Depending on the type of outputs, this method could be used to incorporate multiple library encoders (like OneHotEncoder) into a single transformer, as demonstrated in the official tutorial. These encoders can be fit to the targets so that the inverse_transform method can work appropriately. In this method, it is necessary to set the self.n_outputs_expected_ attribute to inform SciKeras about the outputs of fit, while other parameters in meta can be optionally set. This method must return self.
In the code presented here, however, I have tried to demonstrate the implementation when no transformation is needed for the targets except for a possible separation and rearrangement. It should be noted that it would also be possible to define a FunctionTransformer over an identity function to achieve this (which is demonstrated in the next section).
A get_metadata method is optionally defined for cases where the meta parameter is accepted. Specific to this code, while the transform method is straightforward, in the inverse_transform method we need to define our custom inverse transformation, since we do not have any library encoders to rely on.
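Putting these pieces together, here is a minimal sketch of such a transformer. It assumes the targets arrive as a single stacked array whose first n_classes columns hold the one-hot labels and whose remaining columns hold the flattened image for the reconstruction head; the column split and the n_classes constructor argument are illustrative assumptions, not the article's exact code.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MultiOutputTransformer(BaseEstimator, TransformerMixin):
    # Illustrative assumption: targets come stacked as
    # [one-hot labels | flattened image], split at n_classes.
    def __init__(self, n_classes=10):
        self.n_classes = n_classes

    def fit(self, y):
        # No library encoders to fit here; just record the number
        # of outputs so SciKeras knows what the model returns.
        self.n_outputs_expected_ = 2
        return self  # fit must return self

    def transform(self, y):
        # Separate the single sklearn-style array into the list of
        # arrays expected by Keras: [y_true, X_true].
        return [y[:, : self.n_classes], y[:, self.n_classes :]]

    def inverse_transform(self, y):
        # Rearrange the list of Keras outputs back into a single
        # numpy array, as expected by sklearn.
        return np.column_stack(y)

    def get_metadata(self):
        # Optional: parameters exposed to SciKeras via `meta`.
        return {"n_outputs_expected_": self.n_outputs_expected_}
```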
For the input transformer, we will use a library transformer already available in sklearn.preprocessing: the FunctionTransformer. For the FunctionTransformer, it is possible to pass a lambda function as the func parameter of the transformer's constructor. But having a lambda function could cause issues with pickle, so we instead define a separate, named function to pass into func.
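As an illustration, the sketch below uses a named module-level function, which pickles cleanly; the (28, 28, 1) reshape is an assumed MNIST-style input shape for the CapsNet, not necessarily the article's exact transformation.

```python
from sklearn.preprocessing import FunctionTransformer

# A named (picklable) function instead of a lambda. The target
# shape (28, 28, 1) is an illustrative assumption.
def reshape_inputs(X):
    return X.reshape(-1, 28, 28, 1)

feature_encoder = FunctionTransformer(func=reshape_inputs)
```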
To finish up the wrapper, we subclass BaseWrapper, as mentioned previously, and override the feature_encoder and target_encoder functions, along with a custom scorer. Note that in the scorer function we only evaluate the output from the capsule layer, since this is the metric on which we want our cross-validation to optimize the network.
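A minimal sketch of what this subclass might look like follows, reusing reshape_inputs and MultiOutputTransformer from the sketches above. The class name, the fixed n_classes, and the accuracy-based scorer are illustrative assumptions; SciKeras expects scorer(y_true, y_pred, **kwargs).

```python
from sklearn.metrics import accuracy_score
from scikeras.wrappers import BaseWrapper

class CapsNetWrapper(BaseWrapper):  # hypothetical name
    @property
    def feature_encoder(self):
        return FunctionTransformer(func=reshape_inputs)

    @property
    def target_encoder(self):
        return MultiOutputTransformer(n_classes=10)

    @staticmethod
    def scorer(y_true, y_pred, **kwargs):
        # Score only the capsule head (the first n_classes columns
        # of the stacked arrays); ignore the reconstruction output.
        n_classes = 10  # assumed, for illustration
        return accuracy_score(
            y_true[:, :n_classes].argmax(axis=1),
            y_pred[:, :n_classes].argmax(axis=1),
        )
```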
The next steps are pretty similar to the first example using the wrappers in tf.keras. We instantiate our estimator, clf, with the model-building function get_model, and pass the (hyper)parameters to get_model as routed parameters (with the model__ prefix). These routed arguments also include the hyperparameters that we would like to tune using grid-search.
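For instance, the instantiation might look like the sketch below; the specific routed parameter names are hypothetical and must match get_model's keyword arguments.

```python
# Hypothetical routed parameters; names after model__ must match
# the keyword arguments of get_model.
clf = CapsNetWrapper(
    model=get_model,
    model__n_routings=3,
    epochs=10,
    batch_size=100,
    verbose=0,
)
```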
Next, we define the params dict containing the hyperparameter names and the corresponding values to try out as key-value pairs. We use clf as the estimator to create a GridSearchCV object, and then fit it to the data.
Care must be taken while specifying the cv argument for GridSearchCV to achieve a suitable relation between the number of training examples (n), the batch size (b), and the number of cross-validation folds (cv): n should be completely divisible by cv * b.
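The snippet below sketches this step; the grid values, cv=3, and batch size of 100 are illustrative, as are X_train and y_train (assuming, say, n = 60,000 MNIST training examples, the divisibility constraint holds).

```python
from sklearn.model_selection import GridSearchCV

# Illustrative grid; keys with the model__ prefix are routed to
# get_model, plain keys go to the wrapper itself.
params = {
    "model__n_routings": [2, 3],
    "batch_size": [100],
}

# With n = 60,000 examples, cv=3 and b=100 satisfy the
# divisibility constraint n % (cv * b) == 0.
gs = GridSearchCV(clf, params, cv=3)
gs_res = gs.fit(X_train, y_train)
```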
The results of the grid-search are accumulated in gs_res after the fit operation. The best estimator can be obtained via the best_estimator_ attribute of gs_res; similarly, best_score_ gives the best score, and best_params_ gives the best combination of hyperparameters.
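For example:

```python
print("Best score:", gs_res.best_score_)
print("Best hyperparameters:", gs_res.best_params_)
best_model = gs_res.best_estimator_  # refit on the full data by default
```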