Quirky Keras: Custom and Asymmetric Loss Functions for Keras in R | by Ellie White | Oct, 2020
TL;DR: this tutorial shows you how to use wrapper functions to build custom loss functions that take arguments other than y_pred and y_true for Keras in R. See the example code for the linear exponential error (LINEXE) and the weighted least squared error (WLSE).
In statistical learning, the loss function is a translation of an informal philosophical modeling objective into the formal language of mathematics (Hennig & Kutlukaya, 2007). So, the choice of a loss function in estimation is somewhat subjective and depends on the specific application of the model or the decisions it is used to support. Here are some loss functions to consider:
- The Mean Squared Error (MSE) is the familiar objective function in simple least-squares regression. It is a convex function that emphasizes points far from the bulk of the data by squaring the error. MSE penalizes larger errors more than smaller ones; the function is steeper in the tails than in the middle.
- The Log Hyperbolic Cosine (LOGCOSH), log(cosh(x)), is approximately (x^2)/2 for small x and abs(x) - log(2) for large x. Therefore, LOGCOSH works much like the MSE but is less strongly affected by occasional wildly incorrect predictions, and in this respect it is similar to the Mean Absolute Error (MAE). LOGCOSH and MAE are useful when errors in the estimates of larger values do not need to be penalized more heavily by squaring, as they are in the MSE.
- The Mean Squared Percentage Error (MSPE) is useful in problems where relative errors are of more interest (e.g., an error of 10 in 100 is more interesting than 10 in 100,000). However, the MSPE is biased towards lower predictions and may not be suited to problems where the data are positively skewed (e.g., streamflows, which are always >=0).
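The behavior described in the list above can be checked numerically. This is a small base-R sketch on plain numeric vectors (not Keras tensors); the helper names `se`, `ae`, `logcosh`, and `mspe` are illustrative only.

```r
# Element-wise losses for an error e (plain R vectors, illustration only)
se      <- function(e) e^2             # squared error, averaged to give the MSE
ae      <- function(e) abs(e)          # absolute error, averaged to give the MAE
logcosh <- function(e) log(cosh(e))    # log hyperbolic cosine

e <- c(0.01, 0.1, 1, 5, 20)
print(signif(cbind(e, se = se(e), ae = ae(e), logcosh = logcosh(e)), 4))

# LOGCOSH is about e^2 / 2 for small e and about abs(e) - log(2) for large e
print(c(small = logcosh(0.01), half_square = 0.01^2 / 2))
print(c(large = logcosh(20), linear = 20 - log(2)))

# MSPE works on relative error: the same absolute error of 10 at two scales
mspe <- function(y_true, y_pred) mean(((y_true - y_pred) / y_true)^2)
print(c(at_100 = mspe(100, 110), at_100000 = mspe(100000, 100010)))
```

The last line compares 0.01 with 1e-08: identical absolute errors, very different relative errors. The table also shows LOGCOSH growing linearly rather than quadratically once the error is large.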
Here’s the custom code written for the MSPE. Note that the function takes only y_true and y_pred.
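A minimal sketch of such an MSPE loss, assuming the keras R package (2.x) and its backend functions `k_mean()`, `k_square()`, `k_abs()`, `k_maximum()`, and `k_epsilon()`. The floor on the denominator is an addition, not part of the plain MSPE definition: it avoids dividing by zero in months with zero observed flow. The name `loss_mspe` is hypothetical.

```r
# Custom MSPE loss: Keras calls it with exactly two tensors, y_true and y_pred.
# Calls are namespaced (keras::) so the function can be defined before keras is attached.
loss_mspe <- function(y_true, y_pred) {
  # Floor |y_true| at a tiny epsilon so zero-flow observations do not divide by zero
  denom <- keras::k_maximum(keras::k_abs(y_true), keras::k_epsilon())
  # Mean of the squared relative error over all observations in the batch
  keras::k_mean(keras::k_square((y_true - y_pred) / denom))
}
```

It is passed to `compile()` like a built-in loss, for example `loss = loss_mspe`.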
Symmetric functions produce the same loss when underpredicting and overpredicting by the same absolute error. An asymmetric loss function, however, applies a different penalty to each direction of error. For example, in hydrologic prediction, an asymmetric loss function can force the model to overpredict streamflows in times of floods and underpredict them in droughts, rather than the less desirable reverse. This approach leads water managers to more conservative decisions, since the models predict more extreme floods and droughts.
First, a simple classification model is needed to label observations as flood (FLOOD==1) and drought (FLOOD==0). For each basin, the mean precipitation across the full record can be used as a hard threshold: if the precipitation of a given month fell below this value, that observation was designated a “drought,” and if above, a “flood.” Given this designation, different losses can now be applied to the prediction error at different locations in the data.
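The threshold rule can be sketched in base R as follows. The data frame, its column names (`basin`, `precip`), the helper `label_flood()`, and the toy values are hypothetical; ties with the mean are assigned to drought here, which the description above leaves open.

```r
# Label each month as flood (1) or drought (0), using the basin's mean precipitation
# over the full record as a hard threshold
label_flood <- function(df) {
  basin_mean <- ave(df$precip, df$basin, FUN = mean)  # per-basin mean, repeated per row
  # Above the mean -> flood (1); otherwise -> drought (0). Ties go to drought (an assumption).
  as.integer(df$precip > basin_mean)
}

# Made-up monthly records for two basins
obs <- data.frame(
  basin  = c("A", "A", "A", "B", "B", "B"),
  precip = c(10, 50, 30, 200, 100, 150)
)
obs$FLOOD <- label_flood(obs)
print(obs)
```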
Let’s start with the WLSE (Equation 1), where alpha and beta take different values for the observations labeled flood and drought. So, we have alphad, betad, alphaf, and betaf as inputs to the loss function.
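Because Equation 1 is not reproduced here, the following base-R sketch *assumes* a common asymmetric form: each squared error is weighted by alpha when the model over-predicts and by beta when it under-predicts, with the (alpha, beta) pair chosen by the observation's flood/drought label. The weight values and the names `wlse_ref` and `loss_at` are illustrative, not the study's. A plain-R version like this is handy for checking a Keras loss by hand.

```r
# Reference WLSE in plain R, ASSUMING Equation 1 has the form
#   WLSE = mean( w_i * (yhat_i - y_i)^2 ),
#   w_i = alpha if yhat_i > y_i (over-prediction), beta otherwise,
# with (alpha, beta) = (alphaf, betaf) for floods and (alphad, betad) for droughts.
wlse_ref <- function(y_true, y_pred, flood, alphad, betad, alphaf, betaf) {
  err   <- y_pred - y_true                      # > 0 means over-prediction
  alpha <- ifelse(flood == 1, alphaf, alphad)   # weight on over-prediction
  beta  <- ifelse(flood == 1, betaf,  betad)    # weight on under-prediction
  w     <- ifelse(err > 0, alpha, beta)
  mean(w * err^2)
}

# Illustrative weights: floods punish under-prediction, droughts punish over-prediction
loss_at <- function(y_true, y_pred, flood) {
  wlse_ref(y_true, y_pred, flood, alphad = 3, betad = 1, alphaf = 1, betaf = 3)
}
under_flood   <- loss_at(100, 90,  flood = 1)   # 3 * 10^2 = 300
over_flood    <- loss_at(100, 110, flood = 1)   # 1 * 10^2 = 100
over_drought  <- loss_at(10, 15,   flood = 0)   # 3 * 5^2  = 75
under_drought <- loss_at(10, 5,    flood = 0)   # 1 * 5^2  = 25
```

With these weights, the same absolute error costs three times more in the "wrong" direction, which pushes flood predictions up and drought predictions down.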
Now, the quirk: losses in Keras can accept only two arguments, y_true and y_pred, which are the target tensor and the model output tensor, respectively. However, if we want the loss to depend on other tensors, like the alpha and beta vectors, we have to use function closures. Here, the wlse loss function takes in whatever arguments we want, and the wrapper returns the function that depends only on y_true and y_pred.
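A minimal sketch of the closure, assuming the keras R package (2.x) backend functions and an assumed form of Equation 1 (alpha weights over-predictions, beta weights under-predictions, each pair chosen by the flood/drought label). The outer function `wlse()` captures the weights and the flood labels; the inner function has the only signature Keras accepts. It also assumes full-batch, unshuffled training, so that row i of `y_true` is always observation i.

```r
# Wrapper that builds a Keras-compatible WLSE loss (form of Equation 1 ASSUMED, see above)
wlse <- function(alphad, betad, alphaf, betaf, flood) {
  # Evaluate the weights now rather than lazily at the first loss call
  force(alphad); force(betad); force(alphaf); force(betaf)
  flood_col <- matrix(as.numeric(flood), ncol = 1)   # n x 1 labels, 1 = flood, 0 = drought

  # The returned closure depends only on y_true and y_pred, as Keras requires
  function(y_true, y_pred) {
    f     <- keras::k_constant(flood_col)            # labels as a float tensor
    err   <- y_pred - y_true                         # > 0 means over-prediction
    over  <- keras::k_cast(keras::k_greater(err, 0), keras::k_floatx())
    alpha <- f * alphaf + (1 - f) * alphad           # per-observation over-prediction weight
    beta  <- f * betaf  + (1 - f) * betad            # per-observation under-prediction weight
    w     <- over * alpha + (1 - over) * beta
    keras::k_mean(w * keras::k_square(err))
  }
}
```

Usage would look like `loss = wlse(alphad = 3, betad = 1, alphaf = 1, betaf = 3, flood = FLOOD)`, where the weights are placeholders and `FLOOD` must hold the labels of the training rows in training order.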
Here’s the same concept, but with LINEXE:
The LINEXE (Equation 2) depends on phi, which takes different values for the observations labeled flood and drought. So, we have phid and phif as inputs to the loss function.
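Equation 2 is not shown here either, so this base-R sketch assumes the standard LINEX form, L(e) = exp(phi * e) - phi * e - 1 with e = y_pred - y_true, averaged over observations. With that sign convention, a negative `phif` makes under-predicting a flood expensive and a positive `phid` makes over-predicting a drought expensive. Because the penalty grows exponentially, targets are usually scaled before training; the small toy errors below reflect that. The name `linexe_ref` and the phi values are illustrative.

```r
# Reference LINEXE in plain R, ASSUMING Equation 2 is the standard LINEX form
#   L(e) = exp(phi * e) - phi * e - 1,  e = y_pred - y_true,
# with phi = phif for floods and phi = phid for droughts.
linexe_ref <- function(y_true, y_pred, flood, phid, phif) {
  err <- y_pred - y_true                   # > 0 means over-prediction
  phi <- ifelse(flood == 1, phif, phid)    # label-specific asymmetry parameter
  mean(exp(phi * err) - phi * err - 1)
}

# Illustrative parameters: phif = -1 (floods), phid = 1 (droughts)
under_flood   <- linexe_ref(1, 0, flood = 1, phid = 1, phif = -1)  # exp(1) - 2, about 0.718
over_flood    <- linexe_ref(1, 2, flood = 1, phid = 1, phif = -1)  # exp(-1), about 0.368
over_drought  <- linexe_ref(1, 2, flood = 0, phid = 1, phif = -1)  # exp(1) - 2
under_drought <- linexe_ref(1, 0, flood = 0, phid = 1, phif = -1)  # exp(-1)
```

For an error of the same size, the penalized direction costs roughly twice as much here, and the gap widens quickly as errors grow.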
…and the wrapper:
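A matching Keras closure, again a sketch assuming the keras R package (2.x) backend functions, the standard LINEX form stated in the comments (not confirmed as the article's Equation 2), and full-batch, unshuffled training so the captured labels line up with `y_true`.

```r
# Wrapper that builds a Keras-compatible LINEXE loss, ASSUMING
#   L(e) = exp(phi * e) - phi * e - 1,  e = y_pred - y_true,
# with phi = phif for floods and phi = phid for droughts.
linexe <- function(phid, phif, flood) {
  force(phid)
  force(phif)
  flood_col <- matrix(as.numeric(flood), ncol = 1)   # n x 1 labels, 1 = flood, 0 = drought

  function(y_true, y_pred) {
    f   <- keras::k_constant(flood_col)
    phi <- f * phif + (1 - f) * phid                 # per-observation phi
    a   <- phi * (y_pred - y_true)
    keras::k_mean(keras::k_exp(a) - a - 1)
  }
}
```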
Now, on to modeling. First, we define the architecture of the neural network (NN) model:
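The architecture itself is not given here, so the following is a hypothetical placeholder: a small fully connected network with one linear output for flow, assuming the keras R package (2.x). The layer sizes, activations, and the `build_model()` name are assumptions.

```r
# Hypothetical architecture: sizes and activations are placeholders, not the study's settings
build_model <- function(n_inputs, units = 32) {
  model <- keras::keras_model_sequential()
  keras::layer_dense(model, units = units, activation = "relu", input_shape = n_inputs)
  keras::layer_dense(model, units = units, activation = "relu")
  keras::layer_dense(model, units = 1)   # one linear output: predicted flow
  model                                  # layer_dense() adds each layer to the model in place
}
```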
Next, we have to compile and fit the model. For symmetric losses, we have:
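A sketch of compiling and fitting with a symmetric loss, assuming the keras R package (2.x). The optimizer, epoch count, batch size, validation split, and the `fit_symmetric()` name are placeholder choices. Because symmetric losses do not depend on labels, ordinary shuffled minibatches are fine.

```r
# Symmetric losses do not depend on labels, so ordinary shuffled minibatches are fine
fit_symmetric <- function(model, x_train, y_train, loss = "mse", epochs = 100) {
  keras::compile(model, optimizer = "adam", loss = loss)   # compiles the model in place
  keras::fit(model, x = x_train, y = y_train,
             epochs = epochs,
             batch_size = 32,          # minibatch size
             validation_split = 0.2,   # last 20% of rows held out for validation
             verbose = 0)              # returns the training history
}
```

`loss` can be a built-in name such as `"mse"`, `"mae"`, or `"logcosh"`, or a custom two-argument function such as an MSPE loss.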
…and our job is done for symmetric losses!
With asymmetric losses, since we now have labeled observations (floods or droughts), each label has to line up correctly with its y_true and y_pred. Therefore, we can no longer use minibatch training methods that shuffle the data, unless the shuffling is changed to carry the labels along. Minibatches (whose size is set by batch_size in fit(), not by the validation split) mainly help speed up model training. To still make accurate predictions without minibatches, we can simply train on the whole training set as one batch, increase the number of training epochs, and set shuffle=FALSE; then the flood and drought labels no longer fall out of line with the data.
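A sketch of the full-batch, unshuffled setup described above, assuming the keras R package (2.x). `loss` is expected to be a label-aware closure (such as the wlse or linexe wrappers) built from the flood labels of exactly the rows in `x_train`, in the same order. The `fit_full_batch()` name and the epoch count are placeholders.

```r
# Full-batch, unshuffled training so observation i always sits in row i of the batch,
# keeping the flood/drought labels captured inside `loss` aligned with y_true and y_pred
fit_full_batch <- function(model, x_train, y_train, loss, epochs = 1000) {
  y_train <- matrix(y_train, ncol = 1)          # n x 1 target, matching an n x 1 label column
  keras::compile(model, optimizer = "adam", loss = loss)
  # No validation_split here: held-out rows would no longer match the captured labels
  keras::fit(model, x = x_train, y = y_train,
             epochs = epochs,                   # more epochs to make up for one update per epoch
             batch_size = nrow(x_train),        # a single batch holding every training row
             shuffle = FALSE,                   # keep the original row order
             verbose = 0)
}
```

Full-batch training gives one gradient step per epoch, which is why the epoch count is raised relative to the minibatch case.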
Now, predictions can come from:
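A sketch of generating predictions and checking the direction of the errors by label, assuming the keras R package's `predict()` method for fitted models. The helper names `predict_flow()` and `signed_error_by_label()` are hypothetical, and the toy numbers only show the output format; they are not results from the study.

```r
# Predictions from a fitted keras model, flattened to a plain numeric vector
predict_flow <- function(model, x) as.vector(predict(model, x))

# Mean signed error (prediction minus observation) by label: positive means over-prediction
signed_error_by_label <- function(y_true, y_pred, flood) {
  lab <- factor(flood, levels = c(0, 1), labels = c("drought", "flood"))
  tapply(y_pred - y_true, lab, mean)
}

# Toy numbers only, to show the output format
print(signed_error_by_label(y_true = c(5, 8, 40, 60),
                            y_pred = c(4, 6, 45, 70),
                            flood  = c(0, 0, 1, 1)))
```

A negative value for droughts and a positive value for floods is the conservative pattern the asymmetric losses are meant to produce.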
Figure 1 shows the flow predictions for one river basin in California, the American River at North Fork Dam (White, 2020). We can see that with the WLSE and LINEXE asymmetric losses, the predictions consistently overpredict the floods and underpredict the droughts! This is what we wanted. It means we can make more conservative decisions and be prepared for more extreme conditions.
Losses in Keras can accept only two arguments, y_true and y_pred, which are the target tensor and the model output tensor, respectively. However, if we want the loss to depend on other tensors, as is the case with asymmetric losses, we have to use function closures. Here, the loss function takes in whatever arguments we want and returns a function that depends only on y_true and y_pred. Hence the name wrappers.
In general, the flexibility in choosing a loss function is especially useful in risk-based decision making, where the modeling aim is to accurately predict the probability distribution, particularly at its tails, where high-cost consequences may occur. Asymmetric loss functions prove useful in this regard.
C. Hennig & M. Kutlukaya, Some thoughts about the design of loss functions (2007). REVSTAT – Statistical Journal, 5(1), 19–39.
E. White, Statistical learning for unimpaired flow prediction in ungauged basins (2020). PhD dissertation.