## Machine Learning Tutorial Python – 13: K Means Clustering

The K-means algorithm is an unsupervised machine learning technique used to cluster data points. In this tutorial we will go over some of the theory behind how k-means works and then solve an income-group clustering problem using sklearn, KMeans and Python. The elbow method is a technique used to determine the optimal value of k; we will review that method as well.


Topics covered in this video:
- 0:00 Introduction
- 0:08 Theory: supervised vs. unsupervised learning, and how k-means clustering (an unsupervised learning technique) works
- 5:00 Elbow method
- 7:33 Coding starts (clustering people's income based on age)
- 9:38 Creating and training a KMeans model from sklearn.cluster
- 14:56 Using MinMaxScaler from sklearn
- 24:07 Exercise (cluster iris flowers using their petal width and length)
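The income-grouping workflow in the timeline (scale Age and Income($) with MinMaxScaler, then fit a KMeans model) can be sketched roughly as below. The data values are made up and are not the tutorial's CSV. The choice of k = 3 is also an assumption.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Hypothetical age/income data standing in for the tutorial's CSV
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", "G", "H", "I"],
    "Age": [27, 29, 28, 41, 40, 42, 33, 35, 32],
    "Income($)": [70000, 90000, 61000, 150000, 155000, 162000, 48000, 53000, 51000],
})

# Scale both features to [0, 1] so income does not dominate the distance
scaler = MinMaxScaler()
df[["Age", "Income($)"]] = scaler.fit_transform(df[["Age", "Income($)"]])

# Fit k-means (k = 3 assumed) and attach each row's cluster label
km = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = km.fit_predict(df[["Age", "Income($)"]])

print(df)
print(km.cluster_centers_)  # centroids, expressed in the scaled space
```

The cluster numbers (0, 1, 2) are arbitrary names. Only the grouping itself is meaningful.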

Next Video:
Machine Learning Tutorial Python – 14: Naive Bayes Part 1: https://www.youtube.com/watch?v=PPeaRc-r1OI&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw&index=15

To download the CSV files and code for all tutorials, go to https://github.com/codebasics/py, click the green button to clone or download the entire repository, and then open the relevant folder to access that specific file.

Website: http://codebasicshub.com/

### Comment List

• codebasics
January 22, 2021

Very Very Good tutorial! You have explained each and every concept very nicely. Thank you so much!

• codebasics
January 22, 2021

Life saver, thank you sir!

• codebasics
January 22, 2021

Thank you soooo much. It REALLY helped me!!!

• codebasics
January 22, 2021

Nicely explained! Thank you!

• codebasics
January 22, 2021

When I train a KMeans model on the iris dataset and then cross-check the clusters against the official dataset labels, I find that some labels come out wrong.
Why is this?
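There are two likely reasons. First, k-means never sees the species labels, so its cluster IDs 0/1/2 are arbitrary names that need not line up with the dataset's 0/1/2. Second, versicolor and virginica overlap, so some flowers genuinely fall into the "wrong" cluster. The sketch below compares the two labelings in two ways. It uses `adjusted_rand_score`, which ignores how clusters are numbered, and it also renames each cluster after its majority species. It assumes all four iris features, k = 3 and `random_state=0`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

iris = load_iris()
X, y = iris.data, iris.target

km = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = km.fit_predict(X)

# ARI ignores cluster numbering: 1.0 means perfect agreement
ari = adjusted_rand_score(y, clusters)

# Rename each cluster to the species that is most common inside it
mapping = {c: np.bincount(y[clusters == c]).argmax() for c in np.unique(clusters)}
predicted_species = np.array([mapping[c] for c in clusters])
accuracy = (predicted_species == y).mean()

print(f"ARI: {ari:.3f}, accuracy after relabeling: {accuracy:.3f}")
```

Even after relabeling, you should expect some mismatches. The mistakes are concentrated in the versicolor/virginica overlap rather than spread at random.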

• codebasics
January 22, 2021

Sorry, I want to ask about importing data. Where do we have to save our data so that it can be imported into Jupyter?
Thank you…
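`pd.read_csv` resolves a bare file name against the current working directory. Jupyter usually sets that directory to the folder containing the notebook. You can therefore save the CSV next to the notebook, or pass a full path. The sketch below writes a small hypothetical `income.csv` into a temporary folder to show both options.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd

# Where relative paths are resolved from (usually the notebook's folder)
print("Working directory:", os.getcwd())

# Hypothetical CSV written to a temporary folder for this demo
folder = Path(tempfile.mkdtemp())
csv_path = folder / "income.csv"
csv_path.write_text("Name,Age,Income($)\nA,27,70000\nB,29,90000\n")

# Option 1: pass the full path to the file
df = pd.read_csv(csv_path)

# Option 2: make the file's folder the working directory, then use the bare name
os.chdir(folder)
df_relative = pd.read_csv("income.csv")
print(df_relative)
```
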

• codebasics
January 22, 2021

Very good explanation, thanks a lot!!

• codebasics
January 22, 2021

Best channel for explanations of ML algorithms. Thank you so much, definitely subscribed.

• codebasics
January 22, 2021

Hi, I have a dataset containing data from two populations that form clusters, but one data point has lost the label saying which population it comes from. I should use clustering to determine where it comes from. Can you help me?
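Here is one possible approach, assuming the two populations really do form two separate clusters. Fit KMeans with k = 2 on all points. Name each cluster after the population that dominates it among the labeled points. Then look up the cluster of the orphan point. The data below are synthetic and hypothetical. Because the other labels are known, a supervised classifier (for example a nearest-centroid or k-NN model) is a natural alternative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical data: population "A" around (0, 0), population "B" around (5, 5)
X_known = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels_known = np.array(["A"] * 50 + ["B"] * 50)
unlabeled_point = np.array([[4.2, 5.3]])  # the point that lost its label

# Cluster the labeled points together with the orphan point
km = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = km.fit_predict(np.vstack([X_known, unlabeled_point]))

# Name each cluster after the most frequent population among its labeled points
known_clusters = clusters[:-1]
cluster_to_pop = {}
for c in range(2):
    values, counts = np.unique(labels_known[known_clusters == c], return_counts=True)
    cluster_to_pop[c] = values[counts.argmax()]

print("Unlabeled point most likely belongs to:", cluster_to_pop[clusters[-1]])
```
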

• codebasics
January 22, 2021

superbbbbb mann truly superbbbbb

• codebasics
January 22, 2021

I got a perfect clustering at k = 2,
but we need to use scaler preprocessing; otherwise one point is assigned to the wrong cluster.

• codebasics
January 22, 2021

Excellent video sir

• codebasics
January 22, 2021

How does the preprocessing module work, and why do we use it here?
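`MinMaxScaler` from `sklearn.preprocessing` rescales each column to [0, 1] with x' = (x - min) / (max - min). Scaling matters because k-means relies on Euclidean distance. Without it, an income difference of a few thousand dollars swamps an age difference of decades. The sketch below uses hypothetical [age, income] rows to check the formula by hand and to show how scaling can change which point counts as "nearest".

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical [age, income] rows
X = np.array([[25, 40000], [26, 80000], [60, 41000], [45, 120000]], dtype=float)

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# The same transform by hand: (x - column min) / (column max - column min)
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def dist(a, b):
    return np.linalg.norm(a - b)

# Unscaled: row 2 looks closer to row 0, because income dominates the distance
print(dist(X[0], X[1]), dist(X[0], X[2]))
# Scaled: the 35-year age gap now counts, and row 1 becomes the closer one
print(dist(X_scaled[0], X_scaled[1]), dist(X_scaled[0], X_scaled[2]))
```
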

• codebasics
January 22, 2021

Very good step-by-step explanation along with the Python code. I want to become an expert in Python coding; how should I go about it?

• codebasics
January 22, 2021

Who else got this error and managed to solve it?

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
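scikit-learn estimators and scalers expect a 2D input of shape (n_samples, n_features). A single column pulled out as a 1D array has shape (n_samples,), which is what triggers this message. For one feature, `reshape(-1, 1)` turns the array into a single column, as the message suggests. The sketch below uses hypothetical incomes.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

incomes = np.array([70000, 90000, 61000, 150000])  # hypothetical, 1D, shape (4,)

scaler = MinMaxScaler()
# scaler.fit(incomes) would raise "Expected 2D array, got 1D array instead"
column = incomes.reshape(-1, 1)                 # shape (4, 1): 4 samples, 1 feature
scaled = scaler.fit_transform(column).ravel()   # flatten back to 1D afterwards
print(scaled)
```
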

• codebasics
January 22, 2021

Thanks

• codebasics
January 22, 2021

Hi, thanks for the tutorial. I have a question. I want to be able to iterate over my data frame and assign each data point to its corresponding cluster label.

For instance, in your example, the first row (let's say a customer) has Income and Age features and a cluster label. Instead of appending the cluster label as a column at the end, I want to print
X customer | Age | Age_cluster | Income | Income_cluster
Y customer | Age | Age_cluster | Income | Income_cluster …

In other words, if Age has 2 different cluster groups and Income, let's say, 4 different cluster groups, I want to show that customer X's age is 25 and falls under the 1st cluster (out of 2), and their income is 250K and falls under the 3rd cluster (out of 4).

I haven't been able to find a good iteration method for this, and I was wondering if you could show an example.
Thanks!
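One possible approach, not taken from the video, is to fit a separate KMeans model on each feature and store the labels in `Age_cluster` and `Income_cluster`. You can then loop over the rows with `itertuples`. The helper `cluster_column` is hypothetical. It renumbers clusters 1..k from the lowest to the highest centroid so that "1st cluster" means something. The customer data are also made up.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def cluster_column(values, k):
    """Hypothetical helper: cluster one feature and number the clusters
    1..k from the lowest to the highest centroid."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = km.fit_predict(values.to_numpy().reshape(-1, 1))
    order = np.argsort(km.cluster_centers_.ravel())  # cluster ids sorted by centroid
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(1, k + 1)                # cluster id -> 1-based rank
    return rank[labels]

# Hypothetical customers
df = pd.DataFrame({
    "Customer": ["X", "Y", "Z", "W", "V", "U", "T", "S"],
    "Age": [25, 27, 26, 52, 55, 24, 50, 53],
    "Income": [250000, 40000, 90000, 45000, 160000, 95000, 255000, 150000],
})

df["Age_cluster"] = cluster_column(df["Age"], k=2)
df["Income_cluster"] = cluster_column(df["Income"], k=4)
df = df[["Customer", "Age", "Age_cluster", "Income", "Income_cluster"]]

for row in df.itertuples(index=False):
    print(f"{row.Customer} customer | Age {row.Age} (cluster {row.Age_cluster} of 2) | "
          f"Income {row.Income} (cluster {row.Income_cluster} of 4)")
```

Clustering each feature on its own answers "which age band and which income band" separately. Clustering both columns together, as in the video, groups customers by the combination.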

• codebasics
January 22, 2021

Very clean explanations. Thank you. You should be more visible on YouTube!

• codebasics
January 22, 2021

If you have more than 2 attributes, select them all with a list of column names inside double brackets, e.g. `df[['name', 'example']]`.

• codebasics
January 22, 2021

Why do I always get an error on the line `df['Income($)'] = scaler.transform(df['Income($)'])`?
The error is: Expected 2D array, got 1D array instead:
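With single brackets, `df['Income($)']` returns a 1D Series. With a list inside the brackets, `df[['Income($)']]` returns a one-column DataFrame, which is the 2D input the scaler expects. The sketch below uses a small hypothetical DataFrame with the tutorial's column names.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data using the tutorial's column names
df = pd.DataFrame({"Age": [27, 29, 41], "Income($)": [70000, 90000, 150000]})

print(df["Income($)"].shape)    # (3,)   Series, 1D: causes the error
print(df[["Income($)"]].shape)  # (3, 1) DataFrame, 2D: what the scaler expects

scaler = MinMaxScaler()
scaler.fit(df[["Income($)"]])
# transform returns shape (3, 1); ravel() flattens it to fit back into one column
df["Income($)"] = scaler.transform(df[["Income($)"]]).ravel()
print(df)
```
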

• codebasics
January 22, 2021

You're just an awesome guy!

• codebasics
January 22, 2021

Hi sir, is there a proper way to get the k value from the elbow plot instead of just estimating it with the naked eye? Thanks!
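One common heuristic is to normalize the elbow curve and pick the k whose point lies farthest from the straight line joining the first and last points. This is similar in spirit to the "Kneedle" method, which the third-party `kneed` package implements. `silhouette_score` from `sklearn.metrics` is a different, also common, criterion. The sketch below implements only the distance-to-chord idea, using a hypothetical helper `elbow_k` on synthetic three-blob data. Treat the result as a suggestion, not a definitive answer.

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_k(ks, sse):
    """Hypothetical helper: return the k whose normalized SSE point lies
    farthest from the chord joining the first and last points of the curve."""
    ks = np.asarray(ks, dtype=float)
    sse = np.asarray(sse, dtype=float)
    x = (ks - ks[0]) / (ks[-1] - ks[0])        # runs from 0 to 1
    y = (sse - sse[-1]) / (sse[0] - sse[-1])   # runs from 1 to 0
    # In these coordinates the chord goes from (0, 1) to (1, 0), i.e. x + y = 1
    distance = np.abs(x + y - 1) / np.sqrt(2)
    return int(ks[np.argmax(distance)])

# Hypothetical data: three tight blobs
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [5, 9]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

ks = list(range(1, 11))
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]
best_k = elbow_k(ks, sse)
print("Elbow at k =", best_k)
```
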

• codebasics
January 22, 2021

Excellent, amazing. You make it so easy. Thank you sir.

• codebasics
January 22, 2021

You are probably one of the best teachers I have come across. Thank you so much!

• codebasics
January 22, 2021

Unfortunately, SSE was not explained, and why we should standardize the data did not come out clearly, sir.
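SSE (sum of squared errors) is the sum, over every point, of the squared Euclidean distance from that point to the centroid of its own cluster. After fitting, scikit-learn's KMeans exposes this value as `inertia_`, and the elbow method plots it against k. The sketch below recomputes it by hand on hypothetical random data to show what the number means.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))  # hypothetical 2-feature data

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Squared distance from every point to the centroid of the cluster it was assigned to
assigned_centers = km.cluster_centers_[km.labels_]
sse_manual = ((X - assigned_centers) ** 2).sum()

print(sse_manual, km.inertia_)  # equal up to floating-point rounding
```
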

• codebasics
January 22, 2021

What is k-means clustering used for?

• codebasics
January 22, 2021

Great tutorial, excellent explanation! I have a dataset with 4 features which I want to categorize into two classes (k=2) but I want to be able to specify the starting centroid values. Is this possible? Any sort of explanation will be very well appreciated. Many thanks
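Yes: scikit-learn's `KMeans` accepts an array of shape (n_clusters, n_features) as `init`. With explicit starting centroids only one run makes sense, so pass `n_init=1` (scikit-learn otherwise warns and performs a single run anyway). The dataset and the chosen starting centroids below are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical dataset: 4 features, two groups
X = np.vstack([rng.normal(0, 1, (40, 4)), rng.normal(6, 1, (40, 4))])

# Your chosen starting centroids: one row per cluster, one column per feature
initial_centroids = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [5.0, 5.0, 5.0, 5.0],
])

km = KMeans(n_clusters=2, init=initial_centroids, n_init=1)
labels = km.fit_predict(X)
print(km.cluster_centers_)  # final centroids after the algorithm has moved them
```

The starting values are only a starting point. The algorithm still moves each centroid to the mean of its cluster.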

• codebasics
January 22, 2021

I have started loving machine learning due to the simplicity of explanations.

• codebasics
January 22, 2021

What about distance measures? I know someone who said they use a "Pearson 1-R" distance for kmeans. What does that mean and how would that be done?
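The "1 - r" distance treats two samples as close when their feature profiles are strongly correlated (Pearson r near 1), regardless of each profile's offset and scale. scikit-learn's KMeans supports only Euclidean distance. A common workaround, which is only an approximation, is to standardize each row to mean 0 and standard deviation 1. After that, the squared Euclidean distance between two rows equals 2d(1 - r), where d is the number of features, so nearest-by-Euclidean equals nearest-by-(1 - r). The centroids, being averages, are not re-standardized. For exact correlation-distance clustering, a method that accepts a custom metric is needed, such as SciPy's hierarchical clustering with `metric="correlation"`. The data below are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def row_standardize(X):
    """Give every row mean 0 and population std 1 (rows must not be constant)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))  # hypothetical: 60 samples, 8 features
Z = row_standardize(X)

# Check the identity ||z_a - z_b||^2 == 2 * d * (1 - r) on the first two rows
d = X.shape[1]
r = np.corrcoef(X[0], X[1])[0, 1]
sq_dist = np.sum((Z[0] - Z[1]) ** 2)
print(sq_dist, 2 * d * (1 - r))

# Euclidean k-means on row-standardized data approximates "1 - r" clustering
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)
```
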

• codebasics
January 22, 2021

How do we do clustering with more than 5 variables?
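The procedure stays the same. KMeans works on any number of numeric columns, so you scale all the features and pass them together as one 2D array or DataFrame. The sketch below generates a synthetic six-feature dataset with `make_blobs`, and the feature names are hypothetical. With more than two or three features you can no longer eyeball a scatter plot, so the elbow plot of SSE becomes the main guide for choosing k.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler

# Hypothetical dataset with 6 numeric features
X, _ = make_blobs(n_samples=200, n_features=6, centers=3, random_state=0)
df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(1, 7)])

features = list(df.columns)  # every column you want to cluster on
scaled = MinMaxScaler().fit_transform(df[features])

km = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = km.fit_predict(scaled)

print(km.cluster_centers_.shape)  # (3, 6): one centroid per cluster, one value per feature
```
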

• codebasics
January 22, 2021

It is really awesome. Please do a video on KNN, thanks in advance.

• codebasics
January 22, 2021

Thank you so much sir, from Bangladesh.

• codebasics
January 22, 2021

Summarizing the algorithm for k-means clustering based on this video:
1. Start with k centroids by placing them at random points (here k = 2).

2. Compute the distance of every point from each centroid and assign each point to the nearest one.

3. Adjust each centroid so it becomes the center of gravity of its cluster.

4. Recluster every point based on its distance to the adjusted centroids.

5. Repeat until data points stop changing clusters.
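These five steps map onto a short NumPy loop, known as Lloyd's algorithm. The sketch below is for learning only. It is not scikit-learn's implementation, which adds k-means++ initialization and multiple restarts. It assumes the starting centroids are k distinct data points chosen at random, and that an empty cluster keeps its previous centroid. The data are hypothetical.

```python
import numpy as np

def kmeans(X, k, seed=0, max_iter=100):
    rng = np.random.default_rng(seed)
    # Step 1: start with k centroids placed on randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Steps 2 and 4: assign every point to its nearest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 5: stop once no point changes cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each centroid to the center of gravity (mean) of its cluster
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Hypothetical data: two well-separated groups, k = 2 as in the video
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)
```
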

• codebasics
January 22, 2021

Hello sir, if I put k = 5 and do the same steps, what happens with the elbow method? The elbow method keeps saying it is at k = 3. Can you explain, sir?

• codebasics
January 22, 2021

Sir, can you suggest a project I could work on so that, through reverse engineering, I can get a really good grasp of the basics of these fields (ML, data science) and also mention it on my resume?

• codebasics
January 22, 2021

Exercise done. From the initial plot, n_clusters seems to be 2, but the elbow method makes it clear to use n_clusters = 3. Enjoying this holiday!
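Here is a minimal sketch of the exercise: cluster the iris flowers on petal length and width, then compute SSE for a range of k values for the elbow plot. It assumes scikit-learn's bundled iris dataset, MinMaxScaler as in the tutorial, k from 1 to 9, and `random_state=0`. The final k = 3 is the value read off the elbow, as in the comment.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df = df[["petal length (cm)", "petal width (cm)"]]

# Scale both petal measurements to [0, 1]
scaled = MinMaxScaler().fit_transform(df)

# Elbow method: SSE (inertia_) for k = 1..9
k_range = range(1, 10)
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled).inertia_ for k in k_range]

# Plot k_range against sse (e.g. with matplotlib) and look for the bend
for k, value in zip(k_range, sse):
    print(k, round(value, 3))

# Final model with the k read off the elbow plot
km = KMeans(n_clusters=3, n_init=10, random_state=0)
df = df.assign(cluster=km.fit_predict(scaled))
```
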

• codebasics
January 22, 2021

The way you explain it is really understandable… Keep uploading more and more videos on ML, with case studies. Thanks in advance.

• codebasics
January 22, 2021

Sir, this is the best explanation, thank you sir.

• codebasics
January 22, 2021

Sir, you say the best k is 3, but at 23:45 I see that the lowest sum of squared errors is at k = 9, and this makes me dizzy.
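For the best clustering at each k, SSE can only go down as k grows, because extra centroids let every point sit at least as close to some centroid. When k equals the number of points, each point is its own centroid and SSE is 0. The lowest SSE therefore always sits at the largest k you try, and the elbow method looks instead for the k after which adding clusters stops paying off. The sketch below shows this on eight hypothetical points.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: 8 distinct points in 2 loose groups
X = np.array([[1, 1], [1.5, 2], [2, 1.2], [1.8, 1.7],
              [8, 8], [8.5, 9], [9, 8.2], [8.2, 8.7]])

sse = []
for k in range(1, len(X) + 1):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse.append(km.inertia_)

# Huge drop from k=1 to k=2, small gains afterwards,
# and 0 at k = 8 (one centroid per point)
for k, value in enumerate(sse, start=1):
    print(k, round(value, 4))
```
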

• codebasics
January 22, 2021

19:15 So by applying MinMaxScaler, the k-means algorithm groups the data more accurately, right?

• codebasics
January 22, 2021

10:35 Sir, in the k-means algorithm, must an attribute whose values are strings/categories be converted to numeric?
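Yes. k-means computes means and Euclidean distances, so every input column must be numeric. A common approach is to one-hot encode each categorical column with `pd.get_dummies`, which creates one 0/1 column per category, then scale the numeric columns and cluster. This works poorly when there are many categories. Algorithms designed for categorical data, such as k-modes in the third-party `kmodes` package, are an alternative. The column names and data below are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data with one categorical column
df = pd.DataFrame({
    "Age": [25, 32, 47, 51, 23, 45],
    "Income($)": [40000, 60000, 120000, 130000, 35000, 110000],
    "City": ["Pune", "Mumbai", "Pune", "Delhi", "Mumbai", "Delhi"],
})

# Replace the text column with one 0/1 column per city
encoded = pd.get_dummies(df, columns=["City"], dtype=int)

# Scale the numeric columns to 0-1 so they sit on the same range as the dummies
encoded[["Age", "Income($)"]] = MinMaxScaler().fit_transform(encoded[["Age", "Income($)"]])

km = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = km.fit_predict(encoded)
print(encoded.columns.tolist())
```
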

• codebasics
January 22, 2021

My grad school professor explains this very badly. You explain things very well and with patience; you are the definition of a good teacher.

• codebasics
January 22, 2021

Dear sir, whenever you make a video, please also mention the data link so that we can use the same data.

• codebasics
January 22, 2021

Hi Dhaval ji, excellent video on KMC (k-means clustering). Very precise presentation; I particularly liked the cluster_centers_ and inertia_ concepts. The final elbow plot, with the for loop as the starting point, was unparalleled in clarity. Thanks a lot.

• codebasics
January 22, 2021

Thanks for this tutorial. The optimal value of k is 3.