Machine Learning Tutorial Python – 13: K Means Clustering





K-means is an unsupervised machine learning technique used to cluster data points. In this tutorial we go over some of the theory behind how k-means works and then solve an income-group clustering problem using sklearn's KMeans in Python. The elbow method is a technique for determining the optimal value of k; we review that method as well.


Code: https://github.com/codebasics/py/blob/master/ML/13_kmeans/13_kmeans_tutorial.ipynb
Data link: https://github.com/codebasics/py/tree/master/ML/13_kmeans

Exercise solution: https://github.com/codebasics/py/blob/master/ML/13_kmeans/Exercise/13_kmeans_exercise.ipynb

Topics covered in this video:
0:00 Introduction
0:08 Theory: supervised vs. unsupervised learning and how k-means clustering works (k-means is unsupervised learning)
5:00 Elbow method
7:33 Coding starts (clustering people by age and income)
9:38 sklearn.cluster KMeans model creation and training
14:56 Using MinMaxScaler from sklearn
24:07 Exercise (cluster iris flowers using their petal width and length)
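
For readers who want to try the coding steps in the outline before opening the notebook, here is a minimal sketch of the scale-then-cluster workflow. The data is made up for illustration (it is not the tutorial's CSV), and the choice of `n_clusters=3`, `n_init=10` and `random_state=0` is an assumption, not necessarily what the video uses.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Made-up toy data (NOT the tutorial's CSV): three loose age/income groups.
df = pd.DataFrame({
    "Age": [26, 27, 29, 28, 40, 41, 43, 39, 36, 35, 37, 38],
    "Income($)": [60000, 58000, 61000, 63000,
                  155000, 160000, 162000, 150000,
                  80000, 82000, 79000, 83000],
})

# Scale both features to [0, 1] so income does not dominate the distances.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(df[["Age", "Income($)"]])

# Fit k-means on the scaled features and store each row's cluster ID.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = km.fit_predict(scaled)

print(df)
print(km.cluster_centers_)  # centroids, expressed in the scaled feature space
```

Without the scaling step, income (tens of thousands) would swamp age (tens) in the Euclidean distance, which is why the outline lists MinMaxScaler as its own step.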


Next Video:
Machine Learning Tutorial Python – 14: Naive Bayes Part 1: https://www.youtube.com/watch?v=PPeaRc-r1OI&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw&index=15

Popular playlists:
Data Science Full Course: https://www.youtube.com/playlist?list=PLeo1K3hjS3us_ELKYSj_Fth2tIEkdKXvV

Data Science Project: https://www.youtube.com/watch?v=rdfbcdP75KI&list=PLeo1K3hjS3uu7clOTtwsp94PcHbzqpAdg

Machine learning tutorials: https://www.youtube.com/watch?v=gmvvaobm7eQ&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw

Pandas: https://www.youtube.com/watch?v=CmorAWRsCAw&list=PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy

matplotlib: https://www.youtube.com/watch?v=qqwf4Vuj8oM&list=PLeo1K3hjS3uu4Lr8_kro2AqaO6CFYgKOl

Python: https://www.youtube.com/watch?v=eykoKxsYtow&list=PLeo1K3hjS3uv5U-Lmlnucd7gqF-3ehIh0&index=1

Jupyter Notebook: https://www.youtube.com/watch?v=q_BzsPxwLOE&list=PLeo1K3hjS3uuZPwzACannnFSn9qHn8to8

To download the CSV files and code for all tutorials, go to https://github.com/codebasics/py, click the green button to clone or download the entire repository, and then open the relevant folder to find the specific file.

Website: http://codebasicshub.com/
Facebook: https://www.facebook.com/codebasicshub
Twitter: https://twitter.com/codebasicshub


Comment List

  • codebasics
    January 22, 2021

    Very Very Good tutorial! You have explained each and every concept very nicely. Thank you so much😌😌😌

  • codebasics
    January 22, 2021

    life saver thank you sir

  • codebasics
    January 22, 2021

    thank you soooo much .It REALLY helped me!!!

  • codebasics
    January 22, 2021

    nicely explained! Thank you!

  • codebasics
    January 22, 2021

    When I train a KMeans model on the iris dataset and then cross-check the clusters against the official dataset, I find that some labels come out wrong.
    Why is this?
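
On the question above, one likely reason is that k-means never sees the species labels, so the cluster numbers it returns are arbitrary IDs rather than class labels, and species that overlap (versicolor and virginica) may not be split perfectly. A hedged sketch of how one might compare clusters with the known species using a contingency table, using scikit-learn's bundled iris data and assumed settings (`n_init=10`, `random_state=0`):

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, 2:4]  # petal length and petal width, as in the exercise

km = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = km.fit_predict(X)

# Rows are the true species, columns are the arbitrary cluster IDs.
species = iris.target_names[iris.target]
table = pd.crosstab(species, clusters, rownames=["species"], colnames=["cluster"])
print(table)
```

Reading the table row by row shows which cluster ID corresponds to which species; comparing `clusters` to `iris.target` element by element is misleading because the IDs need not line up.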

  • codebasics
    January 22, 2021

    I am sorry, I want to ask about importing data. Where do we have to save our data so that it can be imported into Jupyter?
    Thank you….

  • codebasics
    January 22, 2021

    Sir can you please upload a separate video on KNN.

  • codebasics
    January 22, 2021

    Very good explanation, thanks a lot !!

  • codebasics
    January 22, 2021

    Best channel for explanations on ML algorithms. Thank you so much 🙂 , definitely subscribed .

  • codebasics
    January 22, 2021

    Hi, I have a data set with data from two populations that turn out to form clusters, BUT one data point lost the label saying where it comes from. I should use clustering to determine where it comes from. Can you help me?

  • codebasics
    January 22, 2021

    superbbbbb mann truly superbbbbb

  • codebasics
    January 22, 2021

    24:56,

    I got a perfect cluster at k=2,
    and we need to use scaler preprocessing, otherwise, only one point is miscalculated.

  • codebasics
    January 22, 2021

    Excellent video sir

  • codebasics
    January 22, 2021

    How does the preprocessing module work, and why do we use it here?

  • codebasics
    January 22, 2021

    Very good step by step explanation along with python codes. I need to be an expert in python coding how to go about it.

  • codebasics
    January 22, 2021

    Who else got this error, and did you solve it?

    Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

  • codebasics
    January 22, 2021

    Thanks

  • codebasics
    January 22, 2021

    hi, thanks for the tutorial. i have a question. i want to be able to iterate my data frame to assign each data to corresponding cluster label.

    For instance, in your example, the first row (let's say a customer) has Income and Age features and a cluster label. Instead of appending the cluster label as a column at the end, I want to print
    X customer | Age | Age_cluster | Income | Income_cluster
    Y customer | Age | Age_cluster | Income | Income_cluster ….

    In other words, if Income has 2 different cluster groups, and age let’s say 4 different cluster groups I want to show as X customer’s age 25, and falls under 1st cluster (out of 2) and Income 250K and falls under 3rd cluster (out of 4)

    I haven’t been able to find a good iteration method as such, I was wondering if you could show an example as such.
    Thanks!
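
One possible way to get the per-feature layout asked about above is to fit a separate one-dimensional k-means model for each column, with no row-by-row iteration. This is only a sketch: the customers are made up, and the number of clusters per feature in `clusters_per_feature` is a hypothetical choice.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up customers, not the tutorial's CSV.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Age": [25, 27, 26, 45, 47, 46],
    "Income($)": [50000, 52000, 120000, 125000, 250000, 255000],
})

# Hypothetical choice of k for each feature.
clusters_per_feature = {"Age": 2, "Income($)": 3}

for column, k in clusters_per_feature.items():
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    # Double brackets keep the input 2D, as scikit-learn expects.
    df[column + "_cluster"] = km.fit_predict(df[[column]])

print(df[["Name", "Age", "Age_cluster", "Income($)", "Income($)_cluster"]])
```

Each `*_cluster` column holds an arbitrary cluster ID for that feature alone, so the IDs are not ordered by age or income unless you sort the centroids yourself.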

  • codebasics
    January 22, 2021

    Very clean explanations. Thank you. You should be more visible on YouTube!

  • codebasics
    January 22, 2021

    If you have more than 2 attributes, select them with double brackets, e.g. X = df[['name', 'example']], and likewise for Y.

  • codebasics
    January 22, 2021

    Why does this line always raise an error: df['Income($)'] = scaler.transform(df['Income($)'])
    The error is: expected 2D array, got 1D array instead
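
The "expected 2D array" error most likely comes from passing a single pandas Series (1D) to the scaler, which expects a 2D input of shape (rows, features). A minimal sketch of two ways around it, on made-up income values:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up values for illustration.
df = pd.DataFrame({"Income($)": [70000, 90000, 61000, 150000]})

scaler = MinMaxScaler()

# df["Income($)"] is a 1D Series; df[["Income($)"]] is a 2D DataFrame with one column.
scaler.fit(df[["Income($)"]])
scaled = scaler.transform(df[["Income($)"]])  # shape (4, 1)

# Alternative that follows the error message's hint: reshape the 1D values to 2D.
scaled_alt = scaler.transform(df["Income($)"].to_numpy().reshape(-1, 1))

# Flatten back to 1D before storing the result as a column.
df["Income($)"] = scaled.ravel()
print(df)
```

The double-bracket form is usually the simpler fix because it also preserves the column name that the scaler saw during `fit`.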

  • codebasics
    January 22, 2021

    Ur just awesome guy!

  • codebasics
    January 22, 2021

    Hi sir, is there a proper way to get the k value from the elbow plot instead of just estimating it with the naked eye? Thanks!
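
There is no single official rule for reading the elbow, but one common heuristic is to pick the point on the curve that lies farthest from the straight line joining its first and last points. The sketch below implements that heuristic in plain Python; the function name `elbow_k` is hypothetical and the SSE values are made up, not taken from the video. Other criteria, such as the silhouette score, can give a different answer.

```python
def elbow_k(ks, sse):
    """Return the k whose (k, sse) point is farthest from the line
    joining the first and last points of the curve."""
    if len(ks) != len(sse) or len(ks) < 3:
        raise ValueError("need at least three (k, sse) pairs of equal length")
    if sse[0] == sse[-1]:
        raise ValueError("flat curve: no elbow to find")

    best_k, best_gap = ks[0], 0.0
    for k, s in zip(ks, sse):
        # Normalize both axes to [0, 1] so the SSE scale does not dominate.
        x = (k - ks[0]) / (ks[-1] - ks[0])
        y = (s - sse[-1]) / (sse[0] - sse[-1])
        # The line from (0, 1) to (1, 0) is x + y = 1; the gap below it is
        # proportional to the perpendicular distance from the line.
        gap = 1.0 - x - y
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k


# Made-up SSE values for k = 1..9, shaped like a typical elbow curve.
ks = list(range(1, 10))
sse = [5.43, 2.09, 0.47, 0.35, 0.27, 0.22, 0.17, 0.14, 0.10]
print(elbow_k(ks, sse))
```

Treat the result as a suggestion to check against the plot rather than a definitive answer, since noisy curves can have more than one plausible bend.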

  • codebasics
    January 22, 2021

    Excellent, amazing . You make it so easy. Thank you sir

  • codebasics
    January 22, 2021

    You are probably one of the best teachers I have come across. Thank you so much!

  • codebasics
    January 22, 2021

    unfortunately SSE was not explained and also why to perform standardization of the data did not come out clearly , sir

  • codebasics
    January 22, 2021

    What is the use of k-means clustering?

  • codebasics
    January 22, 2021

    Great tutorial, excellent explanation! I have a dataset with 4 features which I want to categorize into two classes (k=2) but I want to be able to specify the starting centroid values. Is this possible? Any sort of explanation will be very well appreciated. Many thanks
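
On specifying the starting centroids: scikit-learn's `KMeans` accepts an array of shape (n_clusters, n_features) for its `init` parameter. A minimal sketch with made-up 4-feature data and assumed starting points follows; `n_init=1` is set because the start is fixed, so repeated restarts would add nothing.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up data: two well-separated blobs with 4 features each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(20, 4)),
    rng.normal(5.0, 0.5, size=(20, 4)),
])

# Assumed starting centroids, one row per cluster.
start = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [5.0, 5.0, 5.0, 5.0],
])

km = KMeans(n_clusters=2, init=start, n_init=1)
labels = km.fit_predict(X)

print(labels)
print(km.cluster_centers_)  # final centroids after the iterations
```

The starting values only seed the algorithm; the centroids still move during fitting, so `cluster_centers_` will generally differ from `start`.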

  • codebasics
    January 22, 2021

    I have started loving machine learning due to the simplicity of explanations.

  • codebasics
    January 22, 2021

    What about distance measures? I know someone who said they use a "Pearson 1-R" distance for kmeans. What does that mean and how would that be done?

  • codebasics
    January 22, 2021

    how to do clustering with more than 5 variables ?

  • codebasics
    January 22, 2021

    it is really awesome, please do a video for KNN, thanks in advance

  • codebasics
    January 22, 2021

    Thank you so much sir, from Bangladesh.

  • codebasics
    January 22, 2021

    Summarizing the algorithm for k-means clustering based on this video:
    1. Start with k centroids placed at random points (here k = 2)

    2. Compute the distance of every point from each centroid and assign each point to the cluster of its nearest centroid

    3. Adjust the centroids so they become the center of gravity of their clusters

    4. Recluster every point based on its distance to the adjusted centroids

    5. Adjust the centroids again

    6. Repeat steps 4 and 5 until the data points stop changing cluster
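
The steps summarized above can be sketched in plain Python. This is an illustrative implementation, not the code from the video or from scikit-learn: the function name `kmeans` is hypothetical, the starting centroids are drawn from the data points themselves (one common choice), and the sample points are made up.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of coordinate tuples into k groups."""
    rng = random.Random(seed)
    # Start with k centroids, here chosen at random from the data points.
    centroids = rng.sample(points, k)
    labels = None

    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels

        # Move each centroid to the mean (center of gravity) of its cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))

    return labels, centroids


# Made-up points forming two obvious groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8.5, 9.5)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Library implementations add refinements such as smarter initialization and multiple restarts, which this sketch leaves out for readability.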

  • codebasics
    January 22, 2021

    Hello sir, if I put k = 5 and do the same steps, what would happen with the elbow method? The elbow method keeps saying it is at k = 3. Can you explain, sir?

  • codebasics
    January 22, 2021

    sir, can u suggest me a project to work with so that through reverse engineering , i can have really gud grasp on initials of subset(ML, Data-Science ), and can mention it in my resume too.

  • codebasics
    January 22, 2021

    Exercise done. Viewing the initial plot, n_clusters seems equal to 2, but the elbow method makes it clear to use n_clusters = 3. Enjoying this holiday!

  • codebasics
    January 22, 2021

    The way you explained it makes it really understandable… Keep uploading more and more videos on ML with case studies. Thanks in advance

  • codebasics
    January 22, 2021

    sir, this is the best explanation, thankyou sir

  • codebasics
    January 22, 2021

    Sir, you say the best k is 3, but at 23:45 the sum of squared errors is lowest at k = 9. This confuses me.
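
On the point above: the sum of squared errors keeps shrinking as k grows (it reaches zero when every point is its own cluster), so the lowest SSE is not the criterion. The elbow method looks for the k after which the decrease levels off. A sketch of the elbow loop on made-up data with three blobs, reading the SSE from `inertia_`:

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up data: three blobs of 30 points each.
rng = np.random.default_rng(42)
centers = ([0, 0], [4, 0], [2, 4])
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 2)) for c in centers])

k_range = range(1, 10)
sse = []
for k in k_range:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse.append(km.inertia_)  # sum of squared distances to the nearest centroid

for k, s in zip(k_range, sse):
    print(k, round(s, 2))
```

In the printed values the SSE at k = 9 is still the smallest, but the large drops happen up to k = 3 and the gains after that are small, which is what the bend in the plot shows.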

  • codebasics
    January 22, 2021

    19:15 so by doing minmax scaler, k-means algorithm is more accurate in grouping data, right?

  • codebasics
    January 22, 2021

    10:35 Sir, in the k-means algorithm, if the values of an attribute are strings / categories, must they be converted to numeric?
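
Since scikit-learn's KMeans works on numeric arrays, string columns do need to be converted first. One common approach, sketched below with a made-up `City` column, is one-hot encoding with `pandas.get_dummies`. Whether Euclidean distance on one-hot columns is meaningful for a given problem is a separate judgment call.

```python
import pandas as pd

# Made-up data with one numeric and one categorical column.
df = pd.DataFrame({
    "Age": [25, 40, 31],
    "City": ["Pune", "Delhi", "Pune"],
})

# Replace the string column with one 0/1 column per category.
encoded = pd.get_dummies(df, columns=["City"], dtype=int)
print(encoded)
```

The encoded frame contains only numbers and can then be scaled and passed to KMeans like any other feature matrix.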

  • codebasics
    January 22, 2021

    my grad school professor explains this very badly. You explain things very well with patience, you are the definition of a good teacher

  • codebasics
    January 22, 2021

    Dear sir, whenever you make a video, please also mention the data link so that we can use the same data

  • codebasics
    January 22, 2021

    Hi Dhaval ji – excellent video on KMC. Very precise in presenting, Particularly liked the cluster_centers_ and inertia_ concepts. The final elbow plot with for loop being the starting point was unparalleled in clarity. Thanks a lot

  • codebasics
    January 22, 2021

    thanks for this tutorial, optimal value of K=3
