 ## Machine Learning Tutorial Python – 11 Random Forest

Random forest is a popular algorithm for both regression and classification. In this tutorial we will see how it works for a classification problem in machine learning. It builds multiple decision trees on the underlying data and takes a majority vote of their predictions. We will go over some theory first and then solve the digits classification problem using sklearn's RandomForestClassifier. At the end there is an exercise for you to solve.
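The workflow described above can be sketched in a few lines. This is a minimal sketch, not the exact notebook code; the split ratio, tree count, and random_state are illustrative assumptions:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the 8x8 digit images as flat 64-feature vectors
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=10)

# A forest of 20 decision trees; the final prediction is a majority vote
model = RandomForestClassifier(n_estimators=20)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

The score printed at the end is the fraction of test digits classified correctly; it varies slightly from run to run because each tree is grown on a random bootstrap sample.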

#MachineLearning #PythonMachineLearning #MachineLearningTutorial #Python #PythonTutorial #PythonTraining #MachineLearningCourse #MachineLearningAlgorithm #RandomForest

Exercise: The exercise description is available in the notebook above, towards the end.

Topics that are covered in this Video:
0:07 Random forest algorithm
0:50 How to build multiple decision trees based on single data set?
2:34 Use of sklearn digits data set to make a classification using random forest
3:04 Coding (Start) (Use sklearn digits dataset for classification using random forest)
7:10 sklearn.ensemble RandomForestClassifier
10:36 Confusion Matrix (sklearn.metrics confusion_matrix)
12:04 Exercise (Classify iris flower using sklearn iris flower dataset and random forest classifier)
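The confusion-matrix step listed above can be sketched as follows. This is an illustrative sketch, not the notebook code; variable names and parameter values are assumptions:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=10)

model = RandomForestClassifier(n_estimators=20).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are the true digits 0-9, columns are the predicted digits;
# a perfect classifier would put every count on the diagonal.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

The off-diagonal cells show exactly which digits get confused with which, which is more informative than the single accuracy number from model.score().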

Next Video:
Machine Learning Tutorial Python 12 – K Fold Cross Validation: https://www.youtube.com/watch?v=gJo0uNL-5Qw&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw&index=13

To download the csv and code for all tutorials: go to https://github.com/codebasics/py, click on the green button to clone or download the entire repository, and then go to the relevant folder to access that specific file.

Website: http://codebasicshub.com/

### Comment List

• codebasics
November 25, 2020

This video was amazing. Thanks!

• codebasics
November 25, 2020

please make a video on hyperparameter tuning
it would be a great help

• codebasics
November 25, 2020

Hey, hope you're doing well! I have a query regarding the random forest algorithm. I trained a random forest model with a 70/30 train/test split, but how can I make predictions for the next 30 days? Is there a variable or parameter for that?
Looking forward to hearing from you soon!
Thanks!

• codebasics
November 25, 2020

Sir, in my Jupyter notebook the random forest hyperparameters are not displayed after running this code. What should I do?

• codebasics
November 25, 2020

n_estimators = 1 is the best in my model, which is hard to explain; maybe it's because the dataset is so small?

• codebasics
November 25, 2020

With the default 100 n_estimators or with n_estimators=20, each case gives 1.0 accuracy. Well, after landing on this channel, I can feel the warmth on the tip of my fingers.

• codebasics
November 25, 2020

I didn't understand the "number of random forests".

• codebasics
November 25, 2020

You are funny! Thanks for the tutorial

• codebasics
November 25, 2020

At n_estimator = 100, I got a score of 1.0

• codebasics
November 25, 2020

100 % accuracy with 30 trees in the forest

• codebasics
November 25, 2020

I am not afraid of you, but I respect you!
So I am gonna do the exercise right now!

• codebasics
November 25, 2020

score is 93 with n_estimators=10
score is 96 with n_estimators=20

• codebasics
November 25, 2020

Thank you so much for this tutorial
my accuracy score is 0.9666667 with n_estimators at 40

• codebasics
November 25, 2020

I am always getting output like this
RandomForestClassifier()

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None, min_impurity_split=1e-07, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=20, n_jobs=1, oob_score=False, random_state=None, verbose=0, warm_start=False)

Then what should I do to get the output with all the parameters shown?
Please help.
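On the recurring question of RandomForestClassifier() printing without its parameters: newer sklearn versions show only parameters that differ from their defaults in the repr. One way to restore the full listing, as a sketch (behavior depends on the installed sklearn version), is sklearn.set_config:

```python
import sklearn
from sklearn.ensemble import RandomForestClassifier

# Newer sklearn prints only non-default parameters in the repr;
# this switch restores the full parameter listing.
sklearn.set_config(print_changed_only=False)
print(RandomForestClassifier(n_estimators=20))
```

Nothing about the model changes either way; only the printed representation differs.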

• codebasics
November 25, 2020

Smart Dewds !!!

• codebasics
November 25, 2020

I got the solution, but when I plot the confusion matrix I get 0, 1, 2 for setosa, versicolor, virginica in the graph. How do I convert the numbers into their names in the graph?

• codebasics
November 25, 2020

X: Hi! I have no coding background but I want to learn data science. Do you know any YouTube channel with a well-structured video series?
codebasics subscriber: Go to codebasics. First he explains the basics, then shows how to implement it, then gives you an exercise.
X: But I am poor at math.
codebasics subscriber: Ha ha ha.
X: He was right. He saved me a lot of money.

• codebasics
November 25, 2020

Sir, I got score = 1.0 for n_estimators=10
and random_state=10.
Very nice explanation 👌👌👌

• codebasics
November 25, 2020

I got 100% accuracy with default estimator and random_state=10. Thanks a lot Sir

• codebasics
November 25, 2020

accuracy = 96.66% (n_estimators=10) and it is maximum in my case

• codebasics
November 25, 2020

Thank you sir… I got 100% accuracy with n_estimators=90

• codebasics
November 25, 2020

Sir, I don't know why, but whenever I apply any algorithm to the iris dataset I get model.score() = 1.0, which means 100% accuracy. Is that possible, or is there a problem with my program? I am using train_test_split, and using everything in the appropriate places. Please answer ASAP.

• codebasics
November 25, 2020

n_estimators = 10, criterion = 'entropy' led to a 100% accurate model !! Thanks!

• codebasics
November 25, 2020

The default value of n_estimators changed from 10 to 100 in version 0.22 of sklearn. I got an accuracy of 95.56 with n_estimators=10, and the same for 100.

• codebasics
November 25, 2020

When I fit the data to the model I didn't get the output like yours, with all the parameters included. It just showed that the model was fitted, nothing else. What can I do to see the full details of the model when fitting?

• codebasics
November 25, 2020

Awesome channel.

I have a question though.

To find the optimal n_estimators I made a loop that went from n_estimators = 1 up to a number of my choice (number_trees).

But I thought that a lucky train_test_split could give a very good score to a poor model. So I made an inner loop that repeats the split, trains the model, scores it, and keeps the best and worst scores, up to a number of my choice (number_sets).

The result is that I see absolutely no trend in the score depending on n_estimators.

For example, with n_estimators = 3 and doing the split 5 times, the worst I get is 0.97 accuracy, which is great.

But with n_estimators = 4, the worst I get is 0.89, which is worse.

But then again, with n_estimators = 10 I get 0.97.

And so on and so forth.

My question is: why don't I see a trend in the score depending on n_estimators? I was expecting the score to go up until a certain n_estimators and then stop changing.

CODE (YouTube doesn't allow copy-paste, so there might be a typo):

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    number_trees = 100
    number_sets = 5

    pd.set_option("display.max_rows", None)
    results = pd.DataFrame(columns=["min_score", "max_score"])

    for i in range(1, number_trees + 1):
        modeli = RandomForestClassifier(n_estimators=i)
        min_score = 1
        max_score = 0
        for j in range(number_sets):
            # X, y are the features and labels defined earlier
            X_train, X_test, y_train, y_test = train_test_split(X, y)
            modeli.fit(X_train, y_train)
            score = modeli.score(X_test, y_test)
            if score > max_score:
                max_score = score
            if score < min_score:
                min_score = score
        results.loc[i, "min_score"] = min_score
        results.loc[i, "max_score"] = max_score

    results

• codebasics
November 25, 2020

Is it possible to predict the set of numbers a random number generator will output by finding its algorithm, in order to duplicate the same pattern of results?

• codebasics
November 25, 2020

When using dir(digits), it displays something else, like the output below:
['__annotations__',
'__call__',
'__class__',
'__closure__',
'__code__',
'__defaults__',
'__delattr__',
'__dict__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__get__',
'__getattribute__',
'__globals__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__kwdefaults__',
'__le__',
'__lt__',
'__module__',

• codebasics
November 25, 2020

I got around 95% with n_estimators=30. Is that good or not? When I change the estimator it decreases my accuracy, and my score on (x_test, y_test) = 1.

• codebasics
November 25, 2020

Awesome! Can we define a random forest as a "combination or group of decision trees"? Thank you.

• codebasics
November 25, 2020

It's good, but if you explained which parameters need to be fine-tuned, it would be great.

• codebasics
November 25, 2020

Hi Sir,
Can we use any other model (e.g. SVM) with the random forest approach, that is, by creating an ensemble of 10 SVM models and taking a majority vote?
Thank you for the wonderful video.
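In principle yes: sklearn's BaggingClassifier wraps any base estimator in this bootstrap-and-aggregate scheme. A sketch with 10 SVMs on the iris data (parameter values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# Train 10 SVMs, each on its own bootstrap sample of the training set;
# the ensemble aggregates their individual predictions.
ensemble = BaggingClassifier(SVC(), n_estimators=10).fit(X_train, y_train)
print(ensemble.score(X_test, y_test))
```

Random forest is essentially this idea specialized to decision trees, with the extra twist of also randomizing the features considered at each split.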

• codebasics
November 25, 2020

n_estimators = 1 (and also 290 or bigger) even gives 100% accuracy, but as we all know, these datasets are prepared for learning purposes, so getting 100% accuracy is easy.

• codebasics
November 25, 2020

I achieved an accuracy of 0.9736. Earlier I got an accuracy of 0.9 when the test size was 0.2, and changing the number of trees wasn't changing the accuracy much. So I tweaked the test size to 0.25 and tried different numbers of trees. The best I got was 0.9736 with n_estimators=60, and criterion='entropy' gives a better result.
Thank you so much sir for the series. This is the best Youtube Series on Machine Learning out there!!

• codebasics
November 25, 2020

With n_estimators = 70, score = 97.8

• codebasics
November 25, 2020

Awesome training; I like all your videos. The spelling of Random is incorrect on your starting video page; please correct it. It is spelt as RENDOM.

• codebasics
November 25, 2020

100% accuracy on the given exercise. I used n_estimators = 1