Machine Learning Tutorial Python – 11 Random Forest





Random forest is a popular algorithm for both regression and classification. In this tutorial we will see how it works for a classification problem in machine learning. Under the hood it builds multiple decision trees and takes a majority vote across them. We will go over some theory first and then solve the digits classification problem using sklearn's RandomForestClassifier. At the end there is an exercise for you to solve.
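As a quick preview of the coding section, a minimal sketch of training sklearn's RandomForestClassifier on the digits dataset could look like this (the 80/20 split and n_estimators=40 are illustrative choices, not necessarily the exact values used in the video):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the 8x8 handwritten-digit images (1797 samples, 10 classes)
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42)

# A forest of 40 decision trees; prediction is a majority vote across trees
model = RandomForestClassifier(n_estimators=40, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out 20%
```

The notebook linked below walks through the same steps in more detail.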

#MachineLearning #PythonMachineLearning #MachineLearningTutorial #Python #PythonTutorial #PythonTraining #MachineLearningCourse #MachineLearningAlgorithm #RandomForest

Code: https://github.com/codebasics/py/blob/master/ML/11_random_forest/11_random_forest.ipynb

Exercise: The exercise description is available towards the end of the above notebook

Exercise solution: https://github.com/codebasics/py/blob/master/ML/11_random_forest/Exercise/random_forest_exercise.ipynb

Topics that are covered in this Video:
0:07 Random forest algorithm
0:50 How to build multiple decision trees based on single data set?
2:34 Use of sklearn digits data set to make a classification using random forest
3:04 Coding (Start) (Use sklearn digits dataset for classification using random forest)
7:10 sklearn.ensemble RandomForestClassifier
10:36 Confusion Matrix (sklearn.metrics confusion_matrix)
12:04 Exercise (Classify iris flower using sklearn iris flower dataset and random forest classifier)
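For reference, the confusion-matrix step (10:36) can be sketched as follows; n_estimators=40, random_state=0, and the default split ratio are illustrative assumptions:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

model = RandomForestClassifier(n_estimators=40, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true labels, columns are predicted labels; diagonal entries
# are correct predictions, off-diagonal entries are misclassifications
cm = confusion_matrix(y_test, y_pred)
print(cm.shape)  # (10, 10): one row/column per digit class
```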

Next Video:
Machine Learning Tutorial Python 12 – K Fold Cross Validation: https://www.youtube.com/watch?v=gJo0uNL-5Qw&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw&index=13

Popular Playlists:
Data Science Full Course: https://www.youtube.com/playlist?list=PLeo1K3hjS3us_ELKYSj_Fth2tIEkdKXvV

Data Science Project: https://www.youtube.com/watch?v=rdfbcdP75KI&list=PLeo1K3hjS3uu7clOTtwsp94PcHbzqpAdg

Machine learning tutorials: https://www.youtube.com/watch?v=gmvvaobm7eQ&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw

Pandas: https://www.youtube.com/watch?v=CmorAWRsCAw&list=PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy

matplotlib: https://www.youtube.com/watch?v=qqwf4Vuj8oM&list=PLeo1K3hjS3uu4Lr8_kro2AqaO6CFYgKOl

Python: https://www.youtube.com/watch?v=eykoKxsYtow&list=PLeo1K3hjS3uv5U-Lmlnucd7gqF-3ehIh0&index=1

Jupyter Notebook: https://www.youtube.com/watch?v=q_BzsPxwLOE&list=PLeo1K3hjS3uuZPwzACannnFSn9qHn8to8

To download the CSV files and code for all tutorials: go to https://github.com/codebasics/py, click the green button to clone or download the entire repository, and then navigate to the relevant folder to access that specific file.

Website: http://codebasicshub.com/
Facebook: https://www.facebook.com/codebasicshub
Twitter: https://twitter.com/codebasicshub


Comment List

  • codebasics
    November 25, 2020

    This video was amazing. Thanks!

  • codebasics
    November 25, 2020

    please make a video on hyperparameter tuning
    it would be a great help

  • codebasics
    November 25, 2020

    Hey, hope you're doing well! I have a query regarding the random forest algorithm. I trained a random forest with a 70/30 train/test split, but how can I make predictions for the next 30 days? Is there a variable or parameter for that?
    Looking forward to hearing from you soon!
    Thank you!

  • codebasics
    November 25, 2020

    Sir, in my Jupyter notebook, after running this program the random forest hyperparameters are not displayed. What should I do?

  • codebasics
    November 25, 2020

    n_estimators = 1 is the best in my model, which is hard to explain. Maybe because the dataset is so small?

  • codebasics
    November 25, 2020

    With the default 100 n_estimators or with 20, each case gives 1.0 accuracy. Well, after getting on this channel, I can feel the warmth on the tip of my fingers.

  • codebasics
    November 25, 2020

    I didn't understand the "number of random forests".

  • codebasics
    November 25, 2020

    You are funny! Thanks for the tutorial

  • codebasics
    November 25, 2020

    At n_estimator = 100, I got a score of 1.0

  • codebasics
    November 25, 2020

    100 % accuracy with 30 trees in the forest

  • codebasics
    November 25, 2020

    I am not afraid of you, but I respect you!
    So I am gonna do the exercise right now!

  • codebasics
    November 25, 2020

    score is 93 with n_estimators = 10
    score is 96 with n_estimators = 20

  • codebasics
    November 25, 2020

    Thank you so much for this tutorial
    my accuracy score is 0.9666667 with n_estimators at 40

  • codebasics
    November 25, 2020

    I am always getting output like this:
    RandomForestClassifier()

    instead of the output below:

    RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
    max_depth=None, max_features='auto', max_leaf_nodes=None, min_impurity_split=1e-07, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=20, n_jobs=1, oob_score=False, random_state=None, verbose=0, warm_start=False)

    What should I do to get the output with all the parameters shown?
    Please help.

  • codebasics
    November 25, 2020

    Smart Dewds !!!

  • codebasics
    November 25, 2020

    I got the solution, but when I run a confusion matrix I get 0, 1, 2 for setosa, versicolor, virginica in the graph. How do I convert the numbers into their names in the graph?

  • codebasics
    November 25, 2020

    X: Hi! I have no coding background but I want to learn data science. Do you know any YouTube channel with a well-structured set of videos?
    codebasics subscriber: Go to codebasics. First he explains the basics, then shows how to implement it, then gives you an exercise.
    X: But I am poor at math.
    codebasics subscriber: Ha ha ha.
    X: He was right. He saved me a lot of money.

  • codebasics
    November 25, 2020

    Sir I got score=1.0 for estimator=10
    And random_state=10
    Very nice explanation👌👌👌

  • codebasics
    November 25, 2020

    I got 100% accuracy with default estimator and random_state=10. Thanks a lot Sir

  • codebasics
    November 25, 2020

    accuracy = 96.66% (n_estimators=10) and it is maximum in my case

  • codebasics
    November 25, 2020

    Thank you sir…I got 100% accuracy with n_estimator 90

  • codebasics
    November 25, 2020

    Sir, I don't know why, but whenever I apply any algorithm to the iris dataset I get model.score() = 1.0, which means 100% accuracy. Is that possible, or is there a problem with my program? I am using train_test_split, and I am using everything at the appropriate positions. Please answer ASAP.

  • codebasics
    November 25, 2020

    n_estimators = 10, criterion = 'entropy' led to a 100% accurate model !! Thanks!

  • codebasics
    November 25, 2020

    The default value of n_estimators changed from 10 to 100 in version 0.22 of sklearn. I got an accuracy of 95.56 with n_estimators = 10, and the same for 100.

  • codebasics
    November 25, 2020

    When I fit the data to the model I didn't get the output like yours, with all the parameters included. It just showed that the model was fitted, nothing else. What can I do to see the full details of the model after fitting?

  • codebasics
    November 25, 2020

    Awesome channel.

    I have a question though.

    To find the optimal n_estimators I made a loop that went from n_estimators=1 until a number of my choice (number_trees)

    But I thought that a lucky train_test_split could give a very good score to a bad model. So I made an inner loop that runs up to a number of my choice (number_sets): split, train the model, score, and keep the best and worst scores.

    The result is that I see absolutely no tendency on the score depending on n_estimators.

    For example, with n_iterations = 3 and doing the split 5 times, the worst i get is 0.97 accuracy, which is great

    But with n_iterations = 4, the worst i get is 0.89, which is worse

    But then again, n_iterations = 10 i get 0.97

    And so on so forth.

    My question is, why do not I see a tendency on the score depending on n_estimators? I was expecting the score to go up up to a certain n_estimators and then not changing.

    CODE (YouTube doesn't allow copy-paste, so there might be a typo)

    number_trees = 100
    number_sets = 5

    pd.set_option("display.max_rows", None)
    results = pd.DataFrame(columns=["min_score", "max_score"])

    for i in range(1, number_trees + 1):
        modeli = RandomForestClassifier(n_estimators=i)
        min_score = 1
        max_score = 0
        for j in range(number_sets):
            X_train, X_test, y_train, y_test = train_test_split(X, y)
            modeli.fit(X_train, y_train)
            score = modeli.score(X_test, y_test)
            if score > max_score:
                max_score = score
            if score < min_score:
                min_score = score
        results.loc[i, "min_score"] = min_score
        results.loc[i, "max_score"] = max_score

    results

  • codebasics
    November 25, 2020

    Is it possible to predict a set of numbers that will output from a random number generator, finding the algorithm, in order to duplicate the same pattern of results?

  • codebasics
    November 25, 2020

    When using dir(digits) . It is displaying something else like the below
    ['__annotations__',
    '__call__',
    '__class__',
    '__closure__',
    '__code__',
    '__defaults__',
    '__delattr__',
    '__dict__',
    '__dir__',
    '__doc__',
    '__eq__',
    '__format__',
    '__ge__',
    '__get__',
    '__getattribute__',
    '__globals__',
    '__gt__',
    '__hash__',
    '__init__',
    '__init_subclass__',
    '__kwdefaults__',
    '__le__',
    '__lt__',
    '__module__',

    Can anyone please help me out? I'm completely new to this.

  • codebasics
    November 25, 2020

    I got around 95% with n_estimators = 30. Is that good or not? When I change the estimator it decreases my accuracy, and my score on (x_test, y_test) = 1.

  • codebasics
    November 25, 2020

    Awesome. Can we define random forest as "a combination or group of decision trees"? Thank you.

  • codebasics
    November 25, 2020

    It's good. But if you explained which parameters need to be fine-tuned, it would be great.

  • codebasics
    November 25, 2020

    Hi Sir,
    Can we use any other model (eg: svm) with the random forest approach, that is, by creating an ensemble out of 10 svm models and getting a majority vote?
    Thank you for the wonderful video.

  • codebasics
    November 25, 2020

    n_estimators = 1 (also 290 or bigger) even gave 100% accuracy, but as we all know, this type of dataset is prepared for learning purposes, so reaching 100% accuracy is easy.

  • codebasics
    November 25, 2020

    I achieved an accuracy of .9736. Earlier, I got an accuracy of .9 when the test size was 0.2, and changing the number of trees wasn't changing the accuracy much. So I tweaked the test size to .25 and tried different numbers of trees. The best I got was .9736 with n_estimators = 60; criterion = entropy gives a better result.
    Thank you so much sir for the series. This is the best Youtube Series on Machine Learning out there!!

  • codebasics
    November 25, 2020

    With n_estimators = 70, and Score = 97.8

  • codebasics
    November 25, 2020

    Awesome training. I like all your videos. The spelling of "Random" is incorrect on your starting video page; please do correct it. It is spelt as RENDOM.

  • codebasics
    November 25, 2020

    100% accuracy on the given exercise. I used n_estimators = 1
