How to Know if a Neural Network is Right for Your Machine Learning Initiative
Deep learning models (aka neural nets) have grown very popular over the last couple of years and now power everything from self-driving cars to video recommendations on a YouTube feed. Despite that popularity, the technology has known drawbacks. One is the deep learning “reproducibility crisis”: it is very common for researchers at one institution to be unable to recreate a set of results published by another, even on the same data set. Another is cost, which would give any company pause, as the FAANG companies have spent over $30,000 to train just a single (very) deep net. Even the largest tech companies on the planet struggle with the scale, depth, and complexity of venturing into neural nets, and the same problems are even more pronounced for smaller data science organizations, for which neural nets can be both time- and cost-prohibitive. There is also no guarantee that neural nets will outperform benchmark models like logistic regression or gradient-boosted trees, as neural nets are finicky and typically require additional data and engineering complexity.
With these concerns in mind, remember that there must be a business reason for even considering neural nets, and that reason should not be that the C-suite is feeling a bad case of FOMO.
When thinking about the need for neural nets, it’s useful to think about your data as coming in three flavors:
- Continuous variables, which consist of numeric, decimal-pointed values.
- Categorical variables, which offer a limited number of possible values, typically semantic.
- Text sequences, which provide unstructured semantic information.
Decision tree-based algorithms do not efficiently handle text sequences or categorical variables with many distinct values, so a company has to find creative ways to encode these values into numeric features for its models. That can mean a lot of manual feature engineering, and the approach adds considerable complexity to model pipelines once the number of potential values exceeds a handful.
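To make the encoding problem concrete, here is a minimal Python sketch of two common ways to turn a categorical column into numbers. The brand names and the helper functions (build_vocab, one_hot, to_index) are hypothetical and exist only for illustration. One-hot encoding adds one column per distinct value, which is what bloats a tree-based pipeline as the number of values grows, while a single integer id per value is the kind of input an embedding layer in a neural net typically consumes.

```python
from collections import Counter


def build_vocab(values, min_count=1):
    """Map each distinct category to an integer id.

    Id 0 is reserved for unseen or rare values (an assumption of this sketch).
    """
    counts = Counter(values)
    vocab = {}
    for value, count in sorted(counts.items()):
        if count >= min_count:
            vocab[value] = len(vocab) + 1
    return vocab


def one_hot(value, vocab):
    """Return a dense one-hot row: one column per known category plus one for unknowns."""
    row = [0] * (len(vocab) + 1)
    row[vocab.get(value, 0)] = 1
    return row


def to_index(value, vocab):
    """Return the single integer id for a category (0 if it was never seen)."""
    return vocab.get(value, 0)


# Hypothetical product-brand column from an e-commerce dataset.
brands = ["acme", "globex", "acme", "initech", "umbrella", "globex"]
vocab = build_vocab(brands)

encoded_rows = [one_hot(b, vocab) for b in brands]  # width grows with the vocabulary
indices = [to_index(b, vocab) for b in brands]      # always one integer per row
```

With four distinct brands, each one-hot row already has five columns (one is reserved for unseen values). A column with thousands of distinct values would produce thousands of columns, while the integer representation stays at one value per row.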
Neural networks generally have a much easier time learning from these so-called “sparse features.” Any organization that is thinking about venturing into deep learning for that reason might want to consider these tips first:
- Try to stick with pre-built, plug-and-play solutions (at least at first), such as models that have been pre-assembled with TensorFlow. It’s always tempting to react to an exciting new paper by attempting to pursue the absolute state of the art (SOTA), but if the model doesn’t come from a package with at least some regular users on GitHub/GitLab, it’s probably not going to work well out of the box for your dataset.
- Avoid implementing any algorithm that does not come with accompanying code. Paperswithcode.com keeps up with the SOTA in deep learning/AI and conveniently links papers describing neural net architectures to their implementations on GitHub. If your team is considering a model based on a paper that has no corresponding open-source implementation (like the paper from Google on a neural net architecture called seq2slate), then don’t. Recreating the model and matching the performance claims made in the paper will be very difficult and time-consuming, and the lack of code likely means that very few people are actually using what the paper proposes.
- Don’t give up if you don’t see immediate results! We can’t stress enough the value of keeping precise records of configurations and results. There may be dozens of hyperparameters to tune, and while some of them will have little effect on the outcome, others will change your results dramatically. Keep a careful eye on your choice of optimizer and learning rate, because this combination can determine whether training makes zero, little, or a lot of progress. A good rule of thumb is to start with a very small learning rate and check that the loss decreases during training.
- Neural nets can take a long time to train using just CPUs, but training on a GPU (graphics processing unit) can speed things up by a factor of 10. You can get access to a free GPU on Google Colab, which conveniently comes with TensorFlow already installed. When evaluating a new model, your data science team should generally do feature extraction and feature engineering on your own VPC, and then upload the training sets to Google Drive to train models in Colab notebooks. This keeps costs down and avoids additional infrastructure for DevOps to maintain. Finally, using TensorFlow’s TensorBoard utility, data scientists can study model training results directly from the cloud and compare all of their different experiments.
- Write reusable code. In a notebook development setting, it’s much too easy to tweak one of the dozens of neural net hyperparameters, or to subtly manipulate the training set in a way that greatly impacts model results, and then lose track of which settings led to which results. Maintaining central dataset creation and model training scripts allows you to record the best settings.
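The record-keeping and reusable-code tips above can be sketched as follows. This is only an illustration under stated assumptions: a tiny linear model trained with plain gradient descent stands in for a neural net so the example runs without a deep learning library, and TrainConfig, make_dataset, train, and run_experiment are hypothetical names. The point is the structure: one function builds the dataset, one function trains, and every run appends its configuration and results to a log, starting from a very small learning rate.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Every setting that can change a result lives here, so it can be logged."""
    learning_rate: float
    epochs: int = 50


def make_dataset(n=20):
    """Central dataset creation: y = 3x + 1 on a fixed grid, identical for every run."""
    xs = [i / n for i in range(n)]
    ys = [3.0 * x + 1.0 for x in xs]
    return xs, ys


def train(config, xs, ys):
    """Gradient descent on mean squared error for y = w*x + b.

    A stand-in for a neural net training loop; returns the loss at each epoch.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(config.epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / n)
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= config.learning_rate * grad_w
        b -= config.learning_rate * grad_b
    return losses


def run_experiment(config, log):
    """Train once and append the configuration and outcome to the log as a JSON line."""
    xs, ys = make_dataset()
    losses = train(config, xs, ys)
    record = {
        "config": asdict(config),
        "first_loss": losses[0],
        "final_loss": losses[-1],
    }
    log.append(json.dumps(record, sort_keys=True))
    return record


# Start with a very small learning rate, confirm the loss decreases, then go larger.
experiment_log = []
results = [
    run_experiment(TrainConfig(learning_rate=lr), experiment_log)
    for lr in (0.001, 0.01, 0.1)
]
best = min(results, key=lambda r: r["final_loss"])
```

In a real project the log would more likely be written to a file or an experiment tracker than kept in a list, but the habit is the same: no result exists without the configuration that produced it.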
Bio: Frank Fineis is the Lead Data Scientist at Avatria, a digital commerce firm and developer of e-commerce solutions. The company builds data-driven products that leverage machine learning to provide actionable insights for its B2C and B2B customers’ e-commerce needs.