Machine Learning with Python – Part 2: Decision Tree





Learn Full-Stack development through my flagship Udemy Course using .NET Core 3 + Vue.js + PostgreSQL
https://www.udemy.com/course/learn-full-stack-vue-net-core-postgres/

In this series, we’ll explore machine learning with Python by building a classifier to determine whether or not we might like a song based on its attributes, which are provided by the Spotify API. We’ll use an existing data set from Kaggle to explore and implement various classifiers.

In Part 2, we’ll create a Decision Tree classifier and visualize it using graphviz, pydotplus, scipy, and matplotlib! I’ll speak briefly about the advantages and disadvantages of Decision Tree classifiers.

If you enjoy my videos, support me on Patreon!
https://www.patreon.com/wesdoyle

Throughout this series, we’ll:

– Perform Exploratory Data Analysis (EDA) in a Jupyter Notebook using pandas, NumPy, matplotlib, and other commonly used libraries

– Build a Decision Tree classifier using scikit-learn

– Build a Random Forest classifier using scikit-learn

– Build an Artificial Neural Network classifier using Keras

Links:

Link to the dataset on Kaggle:
https://www.kaggle.com/geomack/spotifyclassification

Intro track is adapted from “despondency” by Fog Lake, used with permission from the artist. Go check out his music; it’s fantastic!
https://foglake.bandcamp.com/


Comment List

  • Wes Doyle
    December 11, 2020

    @Wes Doyle, during model.fit (c.fit), I get a ValueError:
    could not convert string to float: 'GodLovesUgly'. Any idea how to solve this?
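
This error usually means a text column ended up among the features passed to fit. Below is a minimal sketch of one way around it, assuming the offending value comes from a text column such as a song title. The tiny DataFrame and its column names are made up for illustration and are not the Kaggle data.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Tiny stand-in for the real data; column names and values are hypothetical.
df = pd.DataFrame({
    "danceability": [0.83, 0.41, 0.67, 0.29],
    "energy": [0.43, 0.88, 0.59, 0.35],
    "song_title": ["GodLovesUgly", "Track B", "Track C", "Track D"],
    "target": [1, 0, 1, 0],
})

# Keep only numeric columns as features; text columns cannot be cast to float.
features = (
    df.drop(columns=["target"])
    .select_dtypes(include="number")
    .columns.tolist()
)

c = DecisionTreeClassifier(random_state=0)
c.fit(df[features], df["target"])  # no ValueError: song_title is excluded
print(features)
```

If a text column carries useful signal, it would need to be encoded as numbers first rather than dropped.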

  • Wes Doyle
    December 11, 2020

    A good tutorial and explanation of decision trees.
    I would like to know why train_test_split was not used. It lets you control the percentage of testing and training data, as well as the random_state value, which may give better accuracy for the tree.
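
For reference, a short sketch of what that split could look like with scikit-learn's train_test_split. The data here is synthetic (generated with make_classification), and the 25% test size and the seed values are arbitrary choices for the example, not values from the video.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 200 rows, 5 numeric features, binary target.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out 25% of the rows for testing; stratify keeps the class balance similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

c = DecisionTreeClassifier(random_state=0)
c.fit(X_train, y_train)

# Accuracy measured on rows the tree never saw during training.
accuracy = c.score(X_test, y_test)
print(len(X_train), len(X_test), round(accuracy, 3))
```

Note that random_state only makes the shuffle reproducible. A different seed gives a different split and therefore a slightly different score, but it is not a setting to tune for accuracy.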

  • Wes Doyle
    December 11, 2020

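    scipy.misc.imread can no longer be used because imread has been removed from later versions of SciPy.

    Use the following code instead, which reads the image with imageio:

    import io

    import imageio
    import matplotlib.pyplot as plt
    import pydotplus
    from sklearn.tree import export_graphviz

    def show_tree(tree, features, path):
        # Write the tree as Graphviz DOT text into an in-memory buffer
        f = io.StringIO()
        export_graphviz(tree, out_file=f, feature_names=features)
        # Render the DOT text to a PNG file, then load and display it
        pydotplus.graph_from_dot_data(f.getvalue()).write_png(path)
        img = imageio.imread(path)
        plt.rcParams["figure.figsize"] = [20, 20]
        plt.imshow(img)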

  • Wes Doyle
    December 11, 2020

    Thank you, very good explanation. If anyone reading this comment is from Indonesia or wants to see a tutorial on decision trees in Bahasa Indonesia, I also have another good video.
    https://youtu.be/sgnOaSpB5co

    (Coding a decision tree in Python 3 + optimization)

  • Wes Doyle
    December 11, 2020

    Is there a Part 3?

  • Wes Doyle
    December 11, 2020

    Man!!!!

  • Wes Doyle
    December 11, 2020

    Thanks for the video. Too bad that I didn't find this channel earlier. GREAT!

  • Wes Doyle
    December 11, 2020

    Very interesting to see the decision tree visually. Thanks for the code logic!

  • Wes Doyle
    December 11, 2020

    name 'pydotplus' is not defined

  • Wes Doyle
    December 11, 2020

    Can anyone recommend a consumer analysis dataset suitable for a decision tree classifier model?

  • Wes Doyle
    December 11, 2020

    You can get a sneak peek of decision trees using AI here: https://youtu.be/Cc690rwRzVY

  • Wes Doyle
    December 11, 2020

    Part 3?

  • Wes Doyle
    December 11, 2020

    Problems I ran into, with solutions:

    export_graphviz: check your import. It should be: from sklearn.tree import DecisionTreeClassifier, export_graphviz

    scipy.misc.imread isn't available anymore. I used matplotlib's imread instead, which is plt.imread (if you imported matplotlib.pyplot as plt, as in the video).

  • Wes Doyle
    December 11, 2020

    Hi. Could you please explain why you used f = io.StringIO()?
    I spent some 2 weeks trying to understand it. Now I have a good idea of what it's used for, but I want to hear it from you.

    I'd be grateful.

    Thanks!
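
A small sketch of what the io.StringIO buffer is doing in show_tree, as far as can be told from the code quoted in this thread: export_graphviz writes Graphviz DOT text to whatever file-like object it receives, and StringIO lets that text stay in memory instead of going to a file on disk. The data and the feature names below are made up for the example.

```python
import io  # without this import, io.StringIO() raises NameError

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Synthetic stand-in data with four numeric features.
X, y = make_classification(n_samples=50, n_features=4, random_state=0)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# An in-memory text buffer that behaves like an open text file.
f = io.StringIO()

# export_graphviz writes the DOT description of the tree into the buffer.
export_graphviz(tree, out_file=f, feature_names=["a", "b", "c", "d"])

# getvalue() returns everything written so far as one string;
# this string is what gets handed to pydotplus.graph_from_dot_data.
dot_text = f.getvalue()
print(dot_text[:60])
```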

  • Wes Doyle
    December 11, 2020

    Once you figure out the problem with the Graphviz path, work around the SciPy function that doesn't work anymore, and install imageio, it is a very nice tutorial. 🙂
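
For anyone stuck on those setup steps, here is a sketch of a different route that is not the one used in the video: scikit-learn's own plot_tree draws the tree with matplotlib only, so Graphviz, pydotplus, and an image reader are not needed. The data, feature names, and output filename are assumptions for the example.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen; drop this line inside a notebook

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Synthetic stand-in data with four numeric features.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
dt = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

fig, ax = plt.subplots(figsize=(20, 20))

# plot_tree draws one box per node and returns the text annotations it created.
annotations = plot_tree(
    dt, feature_names=["a", "b", "c", "d"], filled=True, ax=ax
)

fig.savefig("dec_tree_plot.png")  # hypothetical output filename
```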

  • Wes Doyle
    December 11, 2020

    Great video!

  • Wes Doyle
    December 11, 2020

    Hello Wes, two good videos on machine learning using Python and scikit-learn! Looking forward to more machine learning videos from you!
    Terry

  • Wes Doyle
    December 11, 2020

    part 3??

  • Wes Doyle
    December 11, 2020

    Hi Wes, nice demo, thanks for sharing. Quick question: for the first feature to split on (instrumentalness), how was 0.0001 selected as the splitting value? Internally, would sklearn loop through each distinct value of instrumentalness, calculate the Gini impurity, and choose 0.0001 because that value gives the best split? Could you explain more? Thanks again. Looking forward to more videos on decision trees.
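
The idea in that question can be sketched in a few lines. This is a simplified illustration, not scikit-learn's actual implementation: it tries the midpoint between each pair of neighbouring distinct values of one feature and keeps the threshold with the lowest weighted Gini impurity. The function names and the sample values are made up.

```python
import numpy as np


def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum(p_k ** 2)."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))


def best_threshold(values, labels):
    """Scan candidate thresholds on one feature; return (threshold, impurity)."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    distinct = np.unique(values)  # sorted distinct values
    candidates = (distinct[:-1] + distinct[1:]) / 2.0  # midpoints
    best_t, best_impurity = None, np.inf
    for t in candidates:
        left, right = labels[values <= t], labels[values > t]
        # Impurity of the two children, weighted by their share of samples.
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best_impurity:
            best_t, best_impurity = float(t), weighted
    return best_t, best_impurity


# Hypothetical values: songs with near-zero instrumentalness are disliked.
instrumentalness = [0.0, 0.0, 0.00002, 0.0002, 0.3, 0.8]
liked = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(instrumentalness, liked)
print(threshold, impurity)
```

A real tree repeats this search over every feature at every node and picks the best feature and threshold pair, which is why a tiny cut-off can win when it separates the classes well.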

  • Wes Doyle
    December 11, 2020

    I am getting this error. Can anyone help?

    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    <ipython-input-374-5e5a7e995826> in <module>()
    ----> 1 show_tree(dt, features, "dec_tree_01.png")

    <ipython-input-373-1fe163d2f1a9> in show_tree(tree, features, path)
          1 def show_tree(tree, features, path):
    ----> 2     f = io.StringIO()
          3     export_graphviz(tree, out_file=f, feature_names=features)
          4     pydotplus.graph_from_dot_data(f.getvalue()).write_png(path)
          5     img = imageio.imread(path)

    NameError: name 'io' is not defined

  • Wes Doyle
    December 11, 2020

    I mean the videos on random forests and artificial neural networks using Python.

  • Wes Doyle
    December 11, 2020

    Hey, this is an awesome video. Where can I find Part 3?

  • Wes Doyle
    December 11, 2020

    Is there a package in Python for building a Soft Decision Tree?

  • Wes Doyle
    December 11, 2020

    It would be helpful if you uploaded the code.

  • Wes Doyle
    December 11, 2020

    Can anyone help me here? I have a few dummy variables for columns such as gender, education_qualification, marital status, etc. If a split shows as MaritalStatus_M <= 0.5201, how would I interpret that? The other "things" (sorry, I don't know the technical term) in that split are as follows:
    gini = 0.0245
    samples = 161
    value = [2, 159]
    class = Churn (customer who left the company)
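
A small sketch that may help with reading such a node, assuming MaritalStatus_M is a 0/1 dummy column. The six-row dataset is made up. With a 0/1 feature, any threshold strictly between 0 and 1 separates the same rows, so a test like `<= 0.5` is true exactly for rows where the dummy is 0. Why the tree in the question shows 0.5201 rather than 0.5 cannot be determined from the comment alone.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical dummy column: 1 = married, 0 = not married.
marital_status_m = np.array([[0], [0], [0], [1], [1], [1]])
churn = np.array([0, 0, 0, 1, 1, 1])

dt = DecisionTreeClassifier(random_state=0).fit(marital_status_m, churn)

# Threshold at the root: the midpoint between the two values, 0 and 1.
root_threshold = dt.tree_.threshold[0]

# Reading the node from the question: value = [2, 159] is the per-class count
# of the 161 samples that reach the node, and gini follows from those counts.
node_gini = 1 - (2 / 161) ** 2 - (159 / 161) ** 2

print(root_threshold, round(node_gini, 4))
```

In other words, samples is how many training rows reach the node, value is how those rows divide between the classes, gini measures how mixed they are, and class is the majority class at that node.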

  • Wes Doyle
    December 11, 2020

    Not sure if you are still answering questions, but:
    How would you find the accuracy for "1" and "0" separately? How accurate were your predictions for music you liked versus music you don't like? For example, is your accuracy of 74.9 for 1 or for 0? In other words, what does the confusion matrix look like?
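
One way to get those per-class numbers is scikit-learn's confusion_matrix. The sketch below uses short made-up label lists rather than the predictions from the video. A single accuracy figure covers both classes together, while the matrix shows how each class fared on its own.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true labels and predictions (1 = liked, 0 = not liked).
y_test = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes, in the order [0, 1].
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

# Share of each true class that was predicted correctly (per-class recall).
accuracy_on_0 = tn / (tn + fp)
accuracy_on_1 = tp / (tp + fn)

print(cm)
print(accuracy_on_0, accuracy_on_1)
print(classification_report(y_test, y_pred, labels=[0, 1]))
```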

  • Wes Doyle
    December 11, 2020

    Can someone please explain each line of the show_tree() function that was created to visualize the tree? Thanks.

  • Wes Doyle
    December 11, 2020

    To everyone attempting to follow this part: it appears that the imported modules changed between Part 1 and Part 2. Specifically, at 12:37 in Part 1 there was no mention of graphviz, pydotplus, io, or misc, which only appear in Part 2 at 12:22. I banged my head a few times until I replayed it and read through everything again.

  • Wes Doyle
    December 11, 2020

    Great content, great editing, really nice, man. Thanks for putting this content on YouTube.
