## Machine Learning Tutorial Python – 11 Random Forest


Random forest is a popular regression and classification algorithm. In this tutorial we will see how it works for a classification problem in machine learning. It builds multiple decision trees underneath and takes a majority vote among them for the final prediction. We will go over some theory first and then solve a digits classification problem using sklearn's RandomForestClassifier. At the end there is an exercise for you to solve.
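As a quick preview, the digits classification from the video can be sketched like this (a minimal sketch assuming scikit-learn is installed; the exact parameters used in the notebook may differ):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42)

# Each of the n_estimators trees is trained on a bootstrap sample of the
# training data; the forest predicts by majority vote across the trees.
model = RandomForestClassifier(n_estimators=20, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

Because each tree sees a different random sample, accuracy varies a little from run to run unless random_state is fixed.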

#MachineLearning #PythonMachineLearning #MachineLearningTutorial #Python #PythonTutorial #PythonTraining #MachineLearningCourse #MachineLearningAlgorithm #RandomForest

Code: https://github.com/codebasics/py/blob/master/ML/11_random_forest/11_random_forest.ipynb

Exercise: The exercise description is available in the above notebook, towards the end.

Exercise solution: https://github.com/codebasics/py/blob/master/ML/11_random_forest/Exercise/random_forest_exercise.ipynb

Topics that are covered in this Video:

0:07 Random forest algorithm

0:50 How to build multiple decision trees based on single data set?

2:34 Use of sklearn digits data set to make a classification using random forest

3:04 Coding (Start) (Use sklearn digits dataset for classification using random forest)

7:10 sklearn.ensemble RandomForestClassifier

10:36 Confusion Matrix (sklearn.metrics confusion_matrix)

12:04 Exercise (Classify iris flower using sklearn iris flower dataset and random forest classifier)
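The iris exercise above can be sketched along the same lines (a minimal outline, assuming scikit-learn; the solution notebook may choose different parameters):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=1)

# Try different n_estimators values and compare the scores
model = RandomForestClassifier(n_estimators=40, random_state=1)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

Iris is a small, easy dataset, which is why many commenters below report scores at or near 1.0.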

Next Video:

Machine Learning Tutorial Python 12 – K Fold Cross Validation: https://www.youtube.com/watch?v=gJo0uNL-5Qw&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw&index=13

Popular Playlists:

Data Science Full Course: https://www.youtube.com/playlist?list=PLeo1K3hjS3us_ELKYSj_Fth2tIEkdKXvV

Data Science Project: https://www.youtube.com/watch?v=rdfbcdP75KI&list=PLeo1K3hjS3uu7clOTtwsp94PcHbzqpAdg

Machine learning tutorials: https://www.youtube.com/watch?v=gmvvaobm7eQ&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw

Pandas: https://www.youtube.com/watch?v=CmorAWRsCAw&list=PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy

matplotlib: https://www.youtube.com/watch?v=qqwf4Vuj8oM&list=PLeo1K3hjS3uu4Lr8_kro2AqaO6CFYgKOl

Python: https://www.youtube.com/watch?v=eykoKxsYtow&list=PLeo1K3hjS3uv5U-Lmlnucd7gqF-3ehIh0&index=1

Jupyter Notebook: https://www.youtube.com/watch?v=q_BzsPxwLOE&list=PLeo1K3hjS3uuZPwzACannnFSn9qHn8to8

To download the CSV files and code for all tutorials: go to https://github.com/codebasics/py, click on the green button to clone or download the entire repository, then go to the relevant folder to access the specific file.

Website: http://codebasicshub.com/

Facebook: https://www.facebook.com/codebasicshub

Twitter: https://twitter.com/codebasicshub


This video was amazing. Thanks!

please make a video on hyperparameter tuning

it would be a great help

Hey, hope you're doing well! I have a query regarding the random forest algorithm. I trained a random forest model with a 70/30 train-test split, but how can I specify predictions for the next 30 days? Is there a variable or parameter for that?

Looking forward to hearing from you soon!

Thanks!

Sir, in my Jupyter notebook the random forest hyperparameters are not displayed after running this program. What should I do now?

n_estimators = 1 is the best in my model, which is so hard to explain. Maybe it is because the dataset is so small?

With the default 100 n_estimators or with 20, I get 1.0 accuracy in each case. After getting on this channel, I can feel the warmth on the tip of my fingers.

I didn't understand the "number of random forests".

You are funny! Thanks for the tutorial

At n_estimators = 100, I got a score of 1.0.

100 % accuracy with 30 trees in the forest

I am not afraid of you, but I respect you!

So I am gonna do the exercise right now!

Score is 93 with n_estimators = 10.

Score is 96 with n_estimators = 20.

Thank you so much for this tutorial

my accuracy score is 0.9666667 with n_estimators at 40

I am always getting output like this:

RandomForestClassifier()

instead of the output below:

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini', max_depth=None, max_features='auto', max_leaf_nodes=None, min_impurity_split=1e-07, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=20, n_jobs=1, oob_score=False, random_state=None, verbose=0, warm_start=False)

What should I do to get the output with all the parameters shown? Help me.

Smart Dewds !!!

I got the solution, but when I run a confusion matrix I get 0, 1, 2 for setosa, versicolor, virginica in the graph. How do I convert the numbers into their names in the graph?
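One way to handle the label question above, assuming the iris exercise setup: wrap the confusion matrix in a pandas DataFrame indexed by iris.target_names, so the class names appear on the axes when it is plotted as a heatmap.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)
model = RandomForestClassifier(n_estimators=40, random_state=0)
model.fit(X_train, y_train)

# Label rows and columns with class names instead of 0/1/2
cm = pd.DataFrame(confusion_matrix(y_test, model.predict(X_test)),
                  index=iris.target_names, columns=iris.target_names)
print(cm)
```

Passing this DataFrame to sn.heatmap(cm, annot=True), as in the tutorial, keeps the names on both axes.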

X: Hi! I have no coding background but I want to learn data science. Do you know any YouTube channel with a well-structured set of videos?

codebasics subscriber: Go to codebasics. First he explains the basics, then shows how to implement them, then gives you an exercise.

X: But I am poor at math.

codebasics subscriber: Ha ha ha.

X: He was right. He saved me a lot of money.

Sir, I got score = 1.0 for n_estimators = 10 and random_state = 10.

Very nice explanation👌👌👌

I got 100% accuracy with default estimator and random_state=10. Thanks a lot Sir

accuracy = 96.66% (n_estimators=10) and it is maximum in my case

Thank you sir… I got 100% accuracy with n_estimators = 90.

Sir, I don't know why, but whenever I apply any algorithm to the iris dataset I get model.score() = 1.0, which means 100% accuracy. Can that be right, or is there a problem with my program? I am using train_test_split, and I am using everything at the appropriate positions. Please answer ASAP.

n_estimators = 10, criterion = 'entropy' led to a 100% accurate model !! Thanks!

The default value of n_estimators changed from 10 to 100 in version 0.22 of sklearn. I got an accuracy of 95.56 with n_estimators = 10, and the same for 100.

When I fit the data to the model, I didn't get output like yours with all the parameters included. It just showed that the model was fitted, nothing else. What can I do to see the full details of the model after fitting?

Awesome channel.

I have a question though.

To find the optimal n_estimators I made a loop that went from n_estimators = 1 up to a number of my choice (number_trees).

But I thought that a lucky train_test_split could give a very good score to a bad model. So I made an inner loop that, up to a number of my choice (number_sets), repeats the split, trains the model, scores it, and keeps the best and worst scores.

The result is that I see absolutely no tendency in the score depending on n_estimators.

For example, with n_estimators = 3 and doing the split 5 times, the worst I get is 0.97 accuracy, which is great.

But with n_estimators = 4, the worst I get is 0.89, which is worse.

But then again, with n_estimators = 10 I get 0.97.

And so on and so forth.

My question is, why don't I see a tendency in the score depending on n_estimators? I was expecting the score to go up until a certain n_estimators and then stop changing.

CODE (YouTube doesn't allow copy-paste, so there might be a typo):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

number_trees = 100
number_sets = 5
pd.set_option("display.max_rows", None)
results = pd.DataFrame(columns=["min_score", "max_score"])

for i in range(1, number_trees + 1):
    modeli = RandomForestClassifier(n_estimators=i)
    min_score = 1
    max_score = 0
    for j in range(number_sets):
        # A fresh random split each iteration, so scores vary between runs
        X_train, X_test, y_train, y_test = train_test_split(X, y)
        modeli.fit(X_train, y_train)
        score = modeli.score(X_test, y_test)
        if score > max_score:
            max_score = score
        if score < min_score:
            min_score = score
    results.loc[i, "min_score"] = min_score
    results.loc[i, "max_score"] = max_score

results
```

Is it possible to predict the set of numbers that a random number generator will output by finding its algorithm, in order to duplicate the same pattern of results?

When using dir(digits), it is displaying something else, like the below:

['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__kwdefaults__', '__le__', '__lt__', '__module__',

Can anyone please help me out? I am completely new to this.

I got around 95% with n_estimators = 30. Is it good or not? When I change the estimator it decreases my accuracy, and my (x_test, y_test = 1).

Awesome! Can we define random forest as "a combination or group of decision trees"? Thank you.

It's good, but it would be great if you explained which parameters need to be fine-tuned.

Hi Sir,

Can we use any other model (eg: svm) with the random forest approach, that is, by creating an ensemble out of 10 svm models and getting a majority vote?

Thank you for the wonderful video.
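On the question above: yes, scikit-learn's BaggingClassifier applies the same bootstrap-and-vote idea to any base estimator, including SVMs. A minimal sketch (assuming the digits dataset from the video; in older scikit-learn versions the base estimator keyword is named base_estimator, so it is passed positionally here):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# 10 SVMs, each trained on a bootstrap sample; prediction is by vote
bag = BaggingClassifier(SVC(), n_estimators=10, random_state=0)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))
```

Random forest is essentially this pattern specialized to decision trees, plus random feature selection at each split.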

n_estimators = 1 (also 290 or bigger) even gave 100% accuracy, but as we all know, these datasets are prepared for learning purposes, so getting 100% accuracy is quite easy.

I achieved an accuracy of 0.9736. Earlier, I got an accuracy of 0.9 when the test size was 0.2, and changing the number of trees wasn't changing the accuracy much. So I tweaked the test size to 0.25 and tried different numbers of trees. The best I got was 0.9736 with n_estimators = 60; criterion = 'entropy' gives a better result.

Thank you so much sir for the series. This is the best Youtube Series on Machine Learning out there!!

With n_estimators = 70, score = 97.8.

Awesome training, I like all your videos. The spelling of "Random" is incorrect on your starting video page; it is spelled "RENDOM". Please correct it.

100% accuracy on the given exercise. I used n_estimators = 1