How to encode categorical features for ML with scikit-learn
In order to include categorical features in your Machine Learning model, you have to encode them numerically using "dummy" or "one-hot" encoding. But how do you do this correctly using scikit-learn?
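To make "one-hot" encoding concrete, here is a minimal sketch using scikit-learn's OneHotEncoder. The toy `embarked` column is a made-up example, not the dataset used in the video.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy column with three distinct categories
embarked = np.array([["S"], ["C"], ["S"], ["Q"], ["C"]])

encoder = OneHotEncoder()
# The encoder returns a sparse matrix by default; convert it for display
encoded = encoder.fit_transform(embarked).toarray()

# Categories learned during fit, stored in sorted order
categories = encoder.categories_[0]
print(categories)
print(encoded)
```

Each row ends up with a single 1 in the column for its category and 0 everywhere else, so the model sees numbers without any implied ordering between categories.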
In this 28-minute video, you'll learn:
- How to use ColumnTransformer to encode your categorical features and prepare your feature matrix in a single step
- How to include this step within a Pipeline so that you can cross-validate your model and preprocessing steps simultaneously
- Why you should use scikit-learn (rather than pandas) for preprocessing your dataset
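As a rough sketch of the first two points, the snippet below encodes two categorical columns with a ColumnTransformer, chains it with a model in a Pipeline, and cross-validates the whole thing. The toy DataFrame, its column names, and the choice of LogisticRegression are assumptions for illustration and are not taken from the video's notebook.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data (20 rows), not the dataset from the video
df = pd.DataFrame({
    "Sex": ["male", "female"] * 10,
    "Embarked": ["S", "C", "Q", "S"] * 5,
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05] * 4,
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0, 1, 0] * 2,
})
X = df.drop(columns="Survived")
y = df["Survived"]

# One-hot encode the categorical columns; pass numeric columns through unchanged
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["Sex", "Embarked"]),
    remainder="passthrough",
)

# Two-step pipeline: preprocessing, then the model
pipe = make_pipeline(column_trans, LogisticRegression(max_iter=1000))

# The encoder is re-fit on the training portion of every fold
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```

Because the preprocessing lives inside the pipeline, each cross-validation fold fits the encoder only on its own training rows, which keeps information from the held-out rows out of the preprocessing step.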
If you want to follow along with the code, you can download the Jupyter notebook from GitHub.
Click on a timestamp below to jump to a particular section:
0:22 Why should you use a Pipeline?
2:30 Preview of the lesson
3:35 Loading and preparing a dataset
6:11 Cross-validating a simple model
10:00 Encoding categorical features with OneHotEncoder
15:01 Selecting columns for preprocessing with ColumnTransformer
19:00 Creating a two-step Pipeline
19:54 Cross-validating a Pipeline
21:44 Making predictions on new data
23:43 Recap of the lesson
24:50 Why should you use scikit-learn (rather than pandas) for preprocessing?
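The last two topics in the list, predicting on new data and preferring scikit-learn over pandas, can be sketched together. The example below is an assumption-laden illustration rather than the video's own code: the toy `train` and `new` DataFrames are made up, and the contrast with `pd.get_dummies` shows one commonly cited reason for the preference, which may not be the only argument made in the video.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data and new, unlabeled data
train = pd.DataFrame({
    "Embarked": ["S", "C", "S", "C", "S", "C"],
    "Fare": [7.25, 71.28, 8.05, 53.10, 7.92, 60.00],
    "Survived": [0, 1, 0, 1, 0, 1],
})
new = pd.DataFrame({"Embarked": ["C", "Q"], "Fare": [65.00, 7.50]})  # "Q" never seen in training

pipe = make_pipeline(
    make_column_transformer(
        # Categories unseen during fit are encoded as all zeros instead of raising an error
        (OneHotEncoder(handle_unknown="ignore"), ["Embarked"]),
        remainder="passthrough",
    ),
    LogisticRegression(max_iter=1000),
)
pipe.fit(train[["Embarked", "Fare"]], train["Survived"])

# The fitted pipeline applies the training-time encoding to the new rows
predictions = pipe.predict(new)
print(predictions)

# pandas derives dummy columns from whatever values are present,
# so the training and new frames end up with different columns
train_dummies = pd.get_dummies(train[["Embarked"]])
new_dummies = pd.get_dummies(new[["Embarked"]])
print(list(train_dummies.columns))
print(list(new_dummies.columns))
```

With the pipeline, the new rows always produce the same feature columns the model was trained on. With `get_dummies`, you would have to align the columns yourself before predicting.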
P.S. Want to master Machine Learning in Python? Enroll in my online course, Machine Learning with Text in Python!