Machine Learning with Scikit-learn – Data Analysis with Python and Pandas p.6

How to include the Pandas data analysis library into your machine learning workflow.

Text-based tutorial: https://pythonprogramming.net/machine-learning-python3-pandas-data-analysis/

Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join
Discord: https://discord.gg/sentdex
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex
G+: https://plus.google.com/+sentdex

Comment List

  • sentdex
    November 13, 2020

    Why didn't he also scale the y?
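Scaling y is not required for ordinary least squares. With an intercept, rescaling the target only rescales the fitted coefficients, so predictions converted back to the original units are the same. If you do want to scale the target (for example, for a model that is sensitive to the target's scale), scikit-learn's TransformedTargetRegressor fits the regressor on the transformed y and inverts the transform inside predict(). A minimal sketch on synthetic data (the data and coefficients are made up for illustration):

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 3 features, target on a large, price-like scale
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) * 1000 + 5000

# Plain linear regression on the unscaled target
plain = LinearRegression().fit(X, y)

# Same model, but y is standardized for fitting and inverse-transformed in predict()
scaled_y = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=StandardScaler(),
).fit(X, y)

# Both return predictions in the original target units
same = np.allclose(plain.predict(X), scaled_y.predict(X))
print(same)
```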

  • sentdex
    November 13, 2020

    A piece of advice: instead of hard-coding a fixed number of test rows (test_size = 200), I prefer to use this:
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
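The two approaches differ in more than the size argument. Slicing off the last rows keeps a fixed count and depends on row order, while train_test_split takes a fraction, shuffles by default, and can be made reproducible with random_state. A minimal sketch on a toy array (the data and random_state=42 are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumption): 500 samples, 2 features
X = np.arange(1000).reshape(500, 2)
y = np.arange(500)

# Fixed hold-out: the last 200 rows become the test set (order-dependent)
test_size = 200
X_train_fixed, X_test_fixed = X[:-test_size], X[-test_size:]

# Proportional, shuffled split; random_state makes it repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
print(len(X_train), len(X_test))
```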

  • sentdex
    November 13, 2020

    I think `X = preprocessing.scale(X)` causes data leakage, because the scaling statistics (mean and standard deviation) are computed with the test rows included. I think we should fit the scaler on the training data only: `s = StandardScaler().fit(X_train)`, then `X1 = s.transform(X_train)` and `X2 = s.transform(X_test)`.
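A minimal sketch of the fit-on-train-only approach, on synthetic data (the array shapes and random_state are assumptions for illustration). Wrapping the scaler and model in a Pipeline gives the same behavior automatically, since the pipeline only fits the scaler inside fit():

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features and target (assumption)
rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(300, 4))
y = rng.normal(size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Mean and std are learned from the training rows only
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # test rows reuse the training statistics

# Equivalent, less error-prone: the pipeline fits the scaler on X_train inside fit()
model = make_pipeline(StandardScaler(), LinearRegression()).fit(X_train, y_train)
test_score = model.score(X_test, y_test)
```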

  • sentdex
    November 13, 2020

    X = df.drop("price", axis=1).values

    What does .values do here?
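.values converts the DataFrame returned by drop() into a plain NumPy array, dropping the column labels and index; scikit-learn estimators accept either form. The pandas documentation now recommends DataFrame.to_numpy() for the same conversion. A small sketch with a few made-up rows shaped like the diamonds columns:

```python
import pandas as pd

# Toy rows (assumption) with two feature columns and the price target
df = pd.DataFrame({"carat": [0.23, 0.21], "depth": [61.5, 59.8], "price": [326, 326]})

X_df = df.drop("price", axis=1)              # still a DataFrame, with column labels
X = df.drop("price", axis=1).values          # plain NumPy ndarray, labels dropped
X_alt = df.drop("price", axis=1).to_numpy()  # the form pandas recommends today

print(type(X_df).__name__, type(X).__name__, X.shape)
```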

  • sentdex
    November 13, 2020

    Easy to understand. Thank you~

  • sentdex
    November 13, 2020

    Best channel ever😍 … Upload more videos on data analysis

  • sentdex
    November 13, 2020

    7:11 to 11:06:

    unique_counts = pd.DataFrame.from_records([(col, df[col].nunique()) for col in df.columns], columns=['Column_Name', 'Num_Unique']).sort_values(by=['Num_Unique'])

    unique_counts.sort_values(by='Num_Unique', ascending=False)

        Column_Name  Num_Unique
    1   cut          5
    2   color        7
    3   clarity      8
    5   table        127
    4   depth        184
    0   carat        273
    9   z            375
    8   y            552
    7   x            554
    6   price        11602

    Then:

    for col in df.columns:
        if df[col].nunique() < 60:
            df[col] = df[col].astype('category')

    df.dtypes

    carat float64
    cut category
    color category
    clarity category
    depth float64
    table float64
    price int64
    x float64
    y float64
    z float64
    dtype: object

  • sentdex
    November 13, 2020

    I watched a 20-minute ad. Does that help more than watching a 30-second one lol?

  • sentdex
    November 13, 2020

    Thank you soooo much, you are such a great teacher

  • sentdex
    November 13, 2020

    After an hour of searching I found the answer here, you are my god!!!
    Press Shift+Enter.

  • sentdex
    November 13, 2020

    What is cat.codes?
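For a Series with the category dtype, .cat.codes returns the integer position of each value within .cat.categories (missing values get -1). When the categories are inferred from strings they are sorted alphabetically, so the codes follow alphabetical order rather than any quality order. A small sketch with made-up values:

```python
import pandas as pd

# Toy column (assumption); categories are inferred from the values
cut = pd.Series(["Ideal", "Premium", "Good", "Ideal"], dtype="category")

print(cut.cat.categories.tolist())  # ['Good', 'Ideal', 'Premium']
print(cut.cat.codes.tolist())       # [1, 2, 0, 1]
```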

  • sentdex
    November 13, 2020

    I needed to add "df = df.reset_index()" here in order for the index column to be numerical like yours is showing. index_col=0 was not sufficient. Thanks a lot for doing all of this, bro.
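reset_index() replaces the current index with a default 0..n-1 integer index and moves the old index into a regular column (named "index" when the old index had no name); pass drop=True if you do not want that column. A minimal sketch with a hypothetical non-sequential index, such as one left over after filtering rows:

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1
df = pd.DataFrame({"carat": [0.23, 0.21, 0.29]}, index=[10, 11, 12])

df = df.reset_index()        # old index becomes a column named "index"
print(df.index.tolist())     # [0, 1, 2]
```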

  • sentdex
    November 13, 2020

    Is it common to map the strings to numbers using dicts, or are there other popular methods?
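A dict with Series.map is a common choice when you want to control the order yourself. Two other widely used options are scikit-learn's OrdinalEncoder (with an explicit category order) and one-hot encoding with pd.get_dummies for categories that have no natural order. A sketch on a few toy values, assuming the worst-to-best cut grades Fair, Good, Very Good, Premium, Ideal:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy column (assumption)
df = pd.DataFrame({"cut": ["Ideal", "Premium", "Good", "Fair"]})
cut_order = ["Fair", "Good", "Very Good", "Premium", "Ideal"]  # worst to best (assumed)

# 1) Explicit dict + Series.map: codes start at 1, order is yours
df["cut_dict"] = df["cut"].map({name: i + 1 for i, name in enumerate(cut_order)})

# 2) OrdinalEncoder with an explicit order: codes start at 0 and come back as floats
enc = OrdinalEncoder(categories=[cut_order])
df["cut_ord"] = enc.fit_transform(df[["cut"]]).ravel()

# 3) One-hot encoding: one 0/1 column per category, no order implied
dummies = pd.get_dummies(df["cut"], prefix="cut")
```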

  • sentdex
    November 13, 2020

    I got this error at 22:30: ValueError: Found array with dim 3. Estimator expected <= 2…. I can't decipher it.

  • sentdex
    November 13, 2020

    You have to normalize the data before running linear regression; then the resulting numbers will be correct, between 0 and 1.
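One way to check this for plain linear regression is to fit the same model with and without MinMaxScaler on the features. MinMaxScaler maps each feature to the 0 to 1 range; it does not rescale the predictions, which stay in the target's units. For ordinary least squares with an intercept, the predictions come out the same either way and only the coefficients change; scaling matters more for regularized models (Ridge, Lasso) and for gradient-based or distance-based methods. A minimal sketch on synthetic data (made-up values):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic features on a large scale and a noisy linear target (assumption)
rng = np.random.default_rng(1)
X = rng.uniform(0, 1000, size=(200, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=200)

raw = LinearRegression().fit(X, y)
scaled = make_pipeline(MinMaxScaler(), LinearRegression()).fit(X, y)

# Same predictions; the pipeline's coefficients refer to the 0-1 scaled features
same_predictions = np.allclose(raw.predict(X), scaled.predict(X))
print(same_predictions)
```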

  • sentdex
    November 13, 2020

    You are the best, Would love to see more videos in this series!

  • sentdex
    November 13, 2020

    Great tutorial series. I have one question: you gave value = 1 to the worst grade for "cut" and "color", but for "clarity" you gave value = 1 to the best one. Shouldn't it be FL = 1 and I3 = 11? Or does it not matter here?
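For a linear regression, reversing an ordinal code (1 = best versus 1 = worst) is an affine change, so it only flips the sign of that feature's coefficient and leaves the fitted predictions unchanged; the direction matters for readability and consistency rather than accuracy. What does matter is that the codes follow the grade order. A pandas ordered categorical is one way to encode that, sketched here assuming the eight clarity grades that appear in the diamonds dataset, from worst (I1) to best (IF):

```python
import pandas as pd

# Clarity grades, worst to best (assumed to match the diamonds dataset)
clarity_order = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

# Toy values (assumption)
clarity = pd.Series(["IF", "SI1", "I1", "VS2"])
clarity_cat = pd.Categorical(clarity, categories=clarity_order, ordered=True)

# .codes starts at 0 in category order; +1 gives 1 = worst, like cut and color
codes = clarity_cat.codes + 1
print(codes.tolist())  # [8, 3, 1, 4]
```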

  • sentdex
    November 13, 2020

    "Not only is it useless, but also in string form…. fascinating"
    I laughed too loud in the coffee shop and got stared at….

  • sentdex
    November 13, 2020

    @ 9:13: instead of copying the values from the text into a dictionary, you can do:

    def col_txt_dict(col):
        return {value: i + 1 for i, value in enumerate(df[col].unique())}
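One caveat with generating the dictionary from unique(): the numbering follows the order in which values first appear in the data, not their quality order. A sketch of a hypothetical variant of the helper with an optional order argument (the order parameter and toy data are illustrative assumptions, not part of the tutorial):

```python
import pandas as pd

# Toy column (assumption)
df = pd.DataFrame({"cut": ["Ideal", "Premium", "Good", "Premium", "Fair"]})

def col_txt_dict(col, order=None):
    """Map each distinct value of df[col] to 1..n.

    Hypothetical extension: without `order`, numbering follows first
    appearance in the data; pass `order` (worst to best) for an ordinal scale.
    """
    values = order if order is not None else df[col].unique()
    return {value: i + 1 for i, value in enumerate(values)}

by_appearance = col_txt_dict("cut")
by_quality = col_txt_dict("cut", order=["Fair", "Good", "Very Good", "Premium", "Ideal"])
df["cut_code"] = df["cut"].map(by_quality)
```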
