A Machine Learning Approach to Estimating Reference Evapotranspiration | by Rouhin Mitra | Dec, 2020


We obtained the data from the California Irrigation Management Information System (CIMIS), which operates 145 in situ weather stations across California. These stations are placed on well-watered grass surfaces. Because the stations measure every parameter required to calculate reference evapotranspiration, we calculated it with the Penman-Monteith equation and used the result as our response variable.
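To make the response variable concrete, here is a minimal sketch of a daily Penman-Monteith calculation. It assumes the widely used FAO-56 form of the equation for a grass reference surface; the article does not state which variant was applied to the CIMIS data, and the function names, argument names, and sample inputs are illustrative only.

```python
import math


def saturation_vapor_pressure(temp_c):
    """Saturation vapor pressure (kPa) at air temperature temp_c (deg C)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def reference_et(t_mean, net_radiation, wind_2m, es, ea, elevation,
                 soil_heat_flux=0.0):
    """Daily reference evapotranspiration (mm/day), FAO-56 form (assumed).

    t_mean         mean air temperature (deg C)
    net_radiation  net radiation at the crop surface (MJ m^-2 day^-1)
    wind_2m        wind speed at 2 m height (m/s)
    es, ea         saturation and actual vapor pressure (kPa)
    elevation      station elevation above sea level (m)
    soil_heat_flux soil heat flux density (MJ m^-2 day^-1), ~0 for daily steps
    """
    # Slope of the saturation vapor pressure curve (kPa per deg C).
    delta = 4098.0 * saturation_vapor_pressure(t_mean) / (t_mean + 237.3) ** 2
    # Atmospheric pressure (kPa) from elevation, then psychrometric constant.
    pressure = 101.3 * ((293.0 - 0.0065 * elevation) / 293.0) ** 5.26
    gamma = 0.000665 * pressure

    radiation_term = 0.408 * delta * (net_radiation - soil_heat_flux)
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * wind_2m * (es - ea)
    return (radiation_term + aerodynamic_term) / (
        delta + gamma * (1.0 + 0.34 * wind_2m)
    )


# Illustrative inputs for a mild summer day (not taken from CIMIS).
et0 = reference_et(t_mean=16.9, net_radiation=13.28, wind_2m=2.078,
                   es=1.997, ea=1.409, elevation=100.0)
print(f"ET0 = {et0:.2f} mm/day")  # roughly 3.9 mm/day
```

Note that the equation needs net radiation and vapor pressure in addition to temperature, humidity, and wind speed, which is why only stations measuring the full set of parameters can supply the response variable.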

A map of all the weather stations used in our study. Image by Author.

The features chosen for our machine learning model were the atmospheric variables RH, TMAX, TMIN, TAVG, and wind speed, which describe the conditions at the site, and the elevation, which accounts for the terrain. However, since in situ measurements are being used, the dependence of reference evapotranspiration on these atmospheric variables could change with location. In California, we may expect stations close to the coast to behave similarly. To capture this variation in space, k-means clustering was used to group stations with similar atmospheric conditions.
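The clustering step can be sketched as follows, assuming scikit-learn and a table of per-station climate summaries. The data here are synthetic stand-ins for three hypothetical station groups, not CIMIS measurements, and the choice of three clusters is only for illustration; the article does not report the number of clusters or the exact clustering inputs.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic per-station summaries (NOT CIMIS data):
# columns are mean RH (%), mean TAVG (deg C), mean wind speed (m/s).
coastal = rng.normal([75.0, 15.0, 3.0], [3.0, 1.0, 0.3], size=(40, 3))
valley = rng.normal([50.0, 22.0, 2.0], [3.0, 1.0, 0.3], size=(40, 3))
desert = rng.normal([25.0, 28.0, 4.0], [3.0, 1.0, 0.3], size=(40, 3))
stations = np.vstack([coastal, valley, desert])

# Standardize so no single variable dominates the Euclidean distance.
scaled = StandardScaler().fit_transform(stations)

# Elbow method: record the within-cluster sum of squares for each k.
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"k={k}: inertia={value:.1f}")

# Fit the chosen k and attach a cluster label to every station.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)
```

Plotting `inertias` against k gives the elbow curve: the k after which the inertia stops dropping sharply is a reasonable number of clusters. The resulting label can then be supplied to the model as an additional feature, or used to fit one model per cluster.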

Elbow method to determine the number of clusters. Image by author
Importance of each feature based on random forest ensemble. Image by author
List of features. Table by author

In our analysis, we used an artificial neural network to predict the reference evapotranspiration. Since most of the features showed some linear relationship with our response variable, we initially used ridge regression for prediction. We chose ridge regression because some of the features are strongly linearly related to one another; for instance, temperature is linearly related to humidity. In the presence of such multicollinearity, ridge regression gives more stable coefficient estimates than ordinary linear regression.
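As a rough illustration of the ridge baseline, the sketch below fits scikit-learn's `Ridge` on deliberately collinear synthetic features. The data-generating coefficients, the noise levels, and `alpha=1.0` are assumptions chosen for the example, not values from the study.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 500

# Synthetic, deliberately collinear features (NOT CIMIS data).
tavg = rng.uniform(5.0, 35.0, n)
tmax = tavg + rng.normal(6.0, 0.5, n)
tmin = tavg - rng.normal(6.0, 0.5, n)
rh = 90.0 - 1.8 * tavg + rng.normal(0.0, 3.0, n)  # humidity tracks temperature
wind = rng.uniform(0.5, 5.0, n)
X = np.column_stack([rh, tmax, tmin, tavg, wind])

# Hypothetical linear response, for illustration only.
y = 0.15 * tavg - 0.02 * rh + 0.4 * wind + rng.normal(0.0, 0.3, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale first: the L2 penalty is sensitive to the units of each feature.
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
ridge.fit(X_train, y_train)

r2_test = ridge.score(X_test, y_test)
print(f"Ridge test R^2: {r2_test:.3f}")
```

The L2 penalty shrinks the coefficients of correlated features toward each other instead of letting them take large offsetting values, which is the property that makes ridge a sensible first model here. In practice `alpha` would be tuned, for example by cross-validation.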

The neural network used for our study. Image by author
The loss function of MLP regressor

Model predictions were compared with the true values, and the results show that the model explains 93% of the variability in the reference evapotranspiration.

Comparison of true and predicted values. Image by author
Performance of the tested models. Image by author

All three models showed high R² values, but the ANN performed slightly better than the random forest and ridge regression models. The errors in the model could be partially attributed to using a limited feature space and excluding the wind speed component. The error for each day (every row) was also plotted against the features to investigate the presence of systematic bias in the model. However, the variation of the errors with the features did not show any bias in predicting the reference evapotranspiration.
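The evaluation described above can be sketched with NumPy alone. The helper names and the synthetic predictions below are hypothetical; the correlation check is only a numeric stand-in for the error-versus-feature plots and will not catch nonlinear patterns.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Share of the variance in y_true that the predictions explain."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def residual_feature_correlation(y_true, y_pred, features):
    """Pearson correlation between the daily error and each feature."""
    residuals = y_true - y_pred
    return {
        name: float(np.corrcoef(residuals, values)[0, 1])
        for name, values in features.items()
    }


rng = np.random.default_rng(1)
n = 1000

# Synthetic daily values (NOT results from the study).
features = {
    "TAVG": rng.uniform(5.0, 35.0, n),
    "RH": rng.uniform(20.0, 90.0, n),
}
y_true = 0.2 * features["TAVG"] - 0.02 * features["RH"] + 2.0
y_pred = y_true + rng.normal(0.0, 0.3, n)  # predictions with unbiased noise

r2 = r_squared(y_true, y_pred)
correlations = residual_feature_correlation(y_true, y_pred, features)
print(f"R^2 = {r2:.3f}")
for name, corr in correlations.items():
    print(f"corr(error, {name}) = {corr:+.3f}")
```

Correlations close to zero are consistent with no linear dependence of the error on a feature; a clearly non-zero value would suggest the model systematically over- or under-predicts at one end of that feature's range.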
