Machine Learning Tutorial Python – 13: K Means Clustering
The K Means algorithm is an unsupervised machine learning technique used to cluster data points. In this tutorial we will go over some of the theory behind how k-means works and then solve an income group clustering problem using sklearn, k-means and Python. The elbow method is a technique used to determine the optimal number of clusters k; we will review that method as well.
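A minimal sketch of the scikit-learn workflow described above. The tutorial's own income data lives in the linked repository and is not reproduced here, so this assumes synthetic points from `make_blobs` as a stand-in. `n_init=10` is passed explicitly because the default changed between scikit-learn versions.

```python
# Minimal k-means run with scikit-learn.
# Assumption: make_blobs data stands in for the tutorial's income CSV.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# 300 two-dimensional points drawn around 3 centers
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)  # cluster index (0, 1 or 2) for every point

print(km.cluster_centers_)  # one (x, y) centroid per cluster
print(np.bincount(labels))  # number of points assigned to each cluster
```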
Code: https://github.com/codebasics/py/blob/master/ML/13_kmeans/13_kmeans_tutorial.ipynb
data link: https://github.com/codebasics/py/tree/master/ML/13_kmeans
Exercise solution: https://github.com/codebasics/py/blob/master/ML/13_kmeans/Exercise/13_kmeans_exercise.ipynb
Topics covered in this video:
0:00 introduction
0:08 Theory – supervised vs. unsupervised learning, and how k-means clustering (an unsupervised learning algorithm) works
5:00 Elbow method
7:33 Coding (start): cluster people's income based on age
9:38 sklearn.cluster KMeans model creation and training
14:56 Use MinMaxScaler from sklearn
24:07 Exercise (Cluster iris flowers using their petal width and length)
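A sketch of the exercise under stated assumptions: it loads the iris data bundled with scikit-learn (`load_iris`) rather than a CSV, keeps only petal length and petal width (columns 2 and 3 of `iris.data`), scales them with `MinMaxScaler`, and clusters with k = 3. The column names `petal_length` and `petal_width` are chosen here for readability.

```python
# Exercise sketch: cluster iris flowers on petal length and petal width.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

iris = load_iris()
# Columns 2 and 3 of iris.data are petal length and petal width (cm)
df = pd.DataFrame(iris.data[:, 2:4], columns=["petal_length", "petal_width"])

# Scale each column to [0, 1] so neither feature dominates the distance
scaled = MinMaxScaler().fit_transform(df)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = km.fit_predict(scaled)
print(df["cluster"].value_counts())
```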
Next Video:
Machine Learning Tutorial Python – 14: Naive Bayes Part 1: https://www.youtube.com/watch?v=PPeaRc-r1OI&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw&index=15
Popular Playlists:
Data Science Full Course: https://www.youtube.com/playlist?list=PLeo1K3hjS3us_ELKYSj_Fth2tIEkdKXvV
Data Science Project: https://www.youtube.com/watch?v=rdfbcdP75KI&list=PLeo1K3hjS3uu7clOTtwsp94PcHbzqpAdg
Machine learning tutorials: https://www.youtube.com/watch?v=gmvvaobm7eQ&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw
Pandas: https://www.youtube.com/watch?v=CmorAWRsCAw&list=PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy
matplotlib: https://www.youtube.com/watch?v=qqwf4Vuj8oM&list=PLeo1K3hjS3uu4Lr8_kro2AqaO6CFYgKOl
Python: https://www.youtube.com/watch?v=eykoKxsYtow&list=PLeo1K3hjS3uv5U-Lmlnucd7gqF-3ehIh0&index=1
Jupyter Notebook: https://www.youtube.com/watch?v=q_BzsPxwLOE&list=PLeo1K3hjS3uuZPwzACannnFSn9qHn8to8
To download the CSV files and code for all tutorials, go to https://github.com/codebasics/py, click the green button to clone or download the entire repository, and then go to the relevant folder to access that specific file.
Website: http://codebasicshub.com/
Facebook: https://www.facebook.com/codebasicshub
Twitter: https://twitter.com/codebasicshub
Very Very Good tutorial! You have explained each and every concept very nicely. Thank you so much!
Life saver, thank you sir!
Thank you soooo much. It REALLY helped me!!!
nicely explained! Thank you!
When I train a KMeans model on the iris dataset and then cross-check the clusters against the official labels, I find that some labels come out wrong. Why is this?
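Part of the answer is that k-means numbers its clusters arbitrarily: cluster 0 need not correspond to species 0, so comparing raw cluster ids with the official labels looks wrong even when the grouping is good. A sketch, assuming the iris data bundled with scikit-learn and only the petal features, that relabels each cluster with the most common true species inside it before comparing. Some points can still disagree afterwards, because k-means never sees the labels and two of the species overlap; the majority mapping can also assign two clusters to the same species.

```python
# Cluster ids are arbitrary; map each cluster to its majority species before comparing.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, 2:4]  # petal length and petal width
y = iris.target        # official species labels (0, 1, 2)

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# For every cluster id, find the species that occurs most often among its members
mapping = {c: np.bincount(y[clusters == c]).argmax() for c in np.unique(clusters)}
predicted = np.array([mapping[c] for c in clusters])

accuracy = (predicted == y).mean()
print(f"agreement after relabelling: {accuracy:.2%}")
```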
Sorry, I want to ask about importing data. Where do we have to save our data so that it can be imported into Jupyter?
Thank you…
Sir, can you please upload a separate video on KNN?
Very good explanation, thanks a lot !!
Best channel for explanations of ML algorithms. Thank you so much! Definitely subscribed.
Hi, I have a data set with data from two populations that form clusters, but one data point lost the label saying where it comes from. I should use clustering to determine where it comes from. Can you help me?
superbbbbb mann truly superbbbbb
At 24:56: I got a perfect clustering at k=2. We need the scaler preprocessing, though; without it, one point ends up in the wrong cluster.
Excellent video sir
How does the preprocessing module work, and why do we use it here?
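`MinMaxScaler` from `sklearn.preprocessing` rescales each column independently with x' = (x − min) / (max − min), so every feature ends up in [0, 1]. Without it, a column like income (tens of thousands) dominates the Euclidean distance k-means uses, and age barely matters. A sketch on hypothetical age/income rows that checks the scaler against the formula written out by hand:

```python
# MinMaxScaler maps each column onto [0, 1] with (x - min) / (max - min).
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical [age, income] rows on very different scales
data = np.array([[27, 70000], [29, 90000], [41, 150000], [35, 60000]], dtype=float)

scaled = MinMaxScaler().fit_transform(data)

# The same transformation written out by hand, column by column
manual = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))
print(np.allclose(scaled, manual))  # True
```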
Very good step-by-step explanation along with Python code. I need to become an expert in Python coding; how should I go about it?
Who else got this error, and did anyone solve it?
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Thanks
Hi, thanks for the tutorial. I have a question. I want to be able to iterate over my data frame to assign each data point to its corresponding cluster label.
For instance, in your example, the first row (let's say a customer) has Income and Age features and a cluster label. Instead of appending the cluster label as a column at the end, I want to print:
X customer | Age | Age_cluster | Income | Income_cluster
Y customer | Age | Age_cluster | Income | Income_cluster …
In other words, if Age has 2 different cluster groups and Income, let's say, 4 different cluster groups, I want to show that customer X's age is 25 and falls under the 1st cluster (out of 2), and their income is 250K and falls under the 3rd cluster (out of 4).
I haven't been able to find a good way to iterate like this; I was wondering if you could show an example.
Thanks!
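One way to get that layout is to fit a separate `KMeans` model per feature (each on a one-column frame) and then loop over the rows with `itertuples`. A sketch on hypothetical customers; the values, the column names, and the choice of 2 clusters for age and 3 for income are assumptions for illustration only (the number of clusters per feature is a free choice).

```python
# Cluster Age and Income separately, then print one line per customer.
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical customers (made-up values, not the tutorial's CSV)
df = pd.DataFrame({
    "Customer": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "Age": [22, 25, 27, 29, 51, 55, 58, 61],
    "Income": [40000, 42000, 90000, 95000, 98000, 150000, 155000, 160000],
})

# Each feature gets its own model and its own number of clusters
df["Age_cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(df[["Age"]])
df["Income_cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(df[["Income"]])

# Iterate row by row and print in the requested layout
for row in df.itertuples(index=False):
    print(f"{row.Customer} | {row.Age} | {row.Age_cluster} | {row.Income} | {row.Income_cluster}")
```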
Very clean explanations. Thank you. You should be more visible on YouTube!
If you have more than 2 attributes, use: df([['name', 'example']] > X, [['name', 'example']] > Y)
Why do I always get an error on the line * df['Income($)'] = scaler.transform(df['Income($)']) *?
The error is: Expected 2D array, got 1D array instead:
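The cause is the single brackets: `df['Income($)']` is a 1-D pandas Series, while scikit-learn scalers expect a 2-D input of shape (n_samples, n_features). Double brackets, `df[['Income($)']]`, select a one-column DataFrame, which is 2-D. A sketch with hypothetical rows:

```python
# Scalers need 2-D input: use df[['col']] (a one-column DataFrame), not df['col'] (a Series).
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows; column names follow the tutorial's CSV
df = pd.DataFrame({"Age": [27, 29, 41, 35], "Income($)": [70000, 90000, 150000, 60000]})

scaler = MinMaxScaler()
# scaler.fit(df['Income($)']) would raise "Expected 2D array, got 1D array instead"
scaler.fit(df[["Income($)"]])
df["Income($)"] = scaler.transform(df[["Income($)"]])
print(df)
```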
You're just an awesome guy!
Hi sir, is there a proper way to get the k value from the elbow plot instead of just estimating it by eye? Thanks!
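SSE (`inertia_`) keeps shrinking as k grows, so the smallest SSE is not the answer; the elbow is where the drop levels off. One heuristic (not from the tutorial, and only one of several) normalizes both axes of the elbow plot to [0, 1] and picks the k whose point lies farthest below the straight line joining the first and last points. A sketch on synthetic data, assuming three well-separated blobs generated with NumPy:

```python
# Elbow loop plus a simple "farthest from the chord" rule to pick k automatically.
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: three well-separated 2-D blobs of 100 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + rng.normal(size=(100, 2)) for c in centers])

ks = np.arange(1, 10)
sse = np.array([KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks])

# Normalize both axes so the chord from the first to the last point is y = 1 - x
x = (ks - ks[0]) / (ks[-1] - ks[0])
y = (sse - sse[-1]) / (sse[0] - sse[-1])
distance = (1 - x) - y  # vertical gap between the chord and the SSE curve
elbow_k = int(ks[np.argmax(distance)])
print(elbow_k)
```

On real data the curve is often less clear-cut, so it is still worth looking at the plot.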
Excellent, amazing. You make it so easy. Thank you sir
You are probably one of the best teachers I have come across. Thank you so much!
Unfortunately, SSE was not explained, and why the data should be standardized did not come out clearly, sir.
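SSE (sum of squared errors) is the quantity plotted in the elbow method: for every point, take the squared Euclidean distance to the centroid of the cluster it was assigned to, and add them all up, SSE = Σᵢ ‖xᵢ − μ_c(i)‖². scikit-learn reports this as `inertia_`. A sketch on hypothetical random data that computes it by hand and compares:

```python
# SSE computed by hand equals KMeans.inertia_.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))  # hypothetical data

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Centroid assigned to each point, then squared distances summed over all points
assigned = km.cluster_centers_[km.labels_]
sse = np.sum((X - assigned) ** 2)
print(np.isclose(sse, km.inertia_))  # True
```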
What is the use of k-means clustering?
Great tutorial, excellent explanation! I have a dataset with 4 features which I want to categorize into two classes (k=2) but I want to be able to specify the starting centroid values. Is this possible? Any sort of explanation will be very well appreciated. Many thanks
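scikit-learn's `KMeans` accepts starting centroids directly: pass an array of shape (n_clusters, n_features) as `init`. Since there is then only one starting position, `n_init=1` is the natural setting. A sketch with hypothetical 4-feature data and hypothetical starting values:

```python
# Supplying your own starting centroids to KMeans via the init parameter.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))  # hypothetical data with 4 features

# One row per cluster, one column per feature (hypothetical values)
start = np.array([
    [-1.0, -1.0, -1.0, -1.0],
    [1.0, 1.0, 1.0, 1.0],
])

# An explicit init array gives a single starting position, so n_init=1
km = KMeans(n_clusters=2, init=start, n_init=1).fit(X)
print(km.cluster_centers_)  # final centroids after iterating from `start`
```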
I have started loving machine learning due to the simplicity of explanations.
What about distance measures? I know someone who said they use a "Pearson 1-R" distance for kmeans. What does that mean and how would that be done?
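"Pearson 1-R" presumably means the distance 1 − r, where r is the Pearson correlation between two rows. scikit-learn's `KMeans` itself only uses Euclidean distance. A common workaround (an assumption here, and only approximate, because centroids are not re-standardized during fitting) is to z-score each row first: for two rows of length n standardized this way, the squared Euclidean distance equals 2n(1 − r), so ordinary k-means then groups rows by correlation. A sketch that checks the identity numerically on hypothetical data:

```python
# For z-scored rows, squared Euclidean distance = 2n(1 - r), r = Pearson correlation.
import numpy as np

def zscore_rows(M):
    # Standardize each row: mean 0, population standard deviation 1
    return (M - M.mean(axis=1, keepdims=True)) / M.std(axis=1, keepdims=True)

rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 8))  # two hypothetical rows with 8 measurements each
n = a.size

r = np.corrcoef(a, b)[0, 1]  # Pearson correlation
za, zb = zscore_rows(np.vstack([a, b]))
squared_euclidean = np.sum((za - zb) ** 2)

print(np.isclose(squared_euclidean, 2 * n * (1 - r)))  # True
```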
How do you do clustering with more than 5 variables?
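`KMeans` does not limit the number of variables: pass a 2-D table with one column per variable, scaled as in the tutorial. A sketch with a hypothetical six-column DataFrame of random values; the column names are made up for illustration.

```python
# k-means on more than two variables: just select all the feature columns.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Hypothetical table with six numeric columns
df = pd.DataFrame(rng.normal(size=(60, 6)), columns=[f"feature_{i}" for i in range(1, 7)])

features = list(df.columns)  # or any subset, e.g. ["feature_1", "feature_4", "feature_6"]
X = MinMaxScaler().fit_transform(df[features])  # scale every column to [0, 1]

df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(df["cluster"].value_counts())
```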
It is really awesome; please do a video on KNN. Thanks in advance!
Thank you so much sir, from Bangladesh.
Summarizing the k-means clustering algorithm based on this video:
1. Start with k centroids placed at random points (here k = 2)
2. Compute the distance of every point from each centroid and assign each point to its nearest centroid
3. Move each centroid to the center of gravity (mean) of its cluster
4. Recluster every point based on its distance from the adjusted centroids
5. Adjust the centroids again
6. Repeat steps 4 and 5 until no data point changes cluster
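The steps above can be sketched from scratch in NumPy (this is the classic Lloyd iteration). Assumptions: initial centroids are random data points, as in the summary (scikit-learn itself defaults to the smarter "k-means++" initialization), and the function name `kmeans` is hypothetical.

```python
# From-scratch k-means following the summarized steps.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start with k centroids placed on randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Assign every point to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Stop once no point changes cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Move each centroid to the center of gravity (mean) of its cluster
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave a centroid in place if its cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids

X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
labels, centroids = kmeans(X, k=2)
print(labels)
```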
Hello sir, if I put k = 5 and do the same steps, what would happen with the elbow method? The elbow method kept saying it is at k = 3. Can you explain, sir?
Sir, can you suggest a project to work on so that, through reverse engineering, I can get a really good grasp of the basics of ML and data science, and also mention it in my resume?
Exercise done. From the initial plot, n_clusters seems to be 2, but the elbow method makes it clear to use n_clusters = 3. Enjoying this holiday!!!
The way it is explained, it is really understandable… Keep uploading more and more videos on ML, with case studies. Thanks in advance
Sir, this is the best explanation, thank you sir
Sir, you say the best k is 3, but at 23:45 I see that the lowest sum of squared errors is at k = 9. This makes me dizzy.
19:15 So by applying the MinMaxScaler, the k-means algorithm groups the data more accurately, right?
10:35 Sir, in the k-means algorithm, if an attribute's values are strings/categories, must they be converted to numeric?
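Yes: scikit-learn's `KMeans` only works on numeric arrays, so string or category columns have to be encoded first. One common option, assumed here rather than taken from the video, is one-hot encoding with `pandas.get_dummies`, which turns each category into its own 0/1 column. The data and the `City` column are hypothetical; `Age` is min-max scaled so it stays on a scale comparable to the 0/1 columns.

```python
# One-hot encode a categorical column before k-means.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data with one categorical (string) column
df = pd.DataFrame({
    "Age": [25, 32, 47, 51, 23, 38],
    "City": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Mumbai"],
})

# Each city becomes its own 0/1 column
X = pd.get_dummies(df, columns=["City"], dtype=int)
X["Age"] = MinMaxScaler().fit_transform(X[["Age"]]).ravel()
print(list(X.columns))  # ['Age', 'City_Delhi', 'City_Mumbai', 'City_Pune']

df["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(df)
```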
My grad school professor explains this very badly. You explain things very well and with patience; you are the definition of a good teacher.
Dear sir, whenever you make a video, please mention the data link too, so that we can use the same data.
Hi Dhaval ji – excellent video on KMC. Very precise presentation; I particularly liked the cluster_centers_ and inertia_ concepts. The final elbow plot, with the for loop as the starting point, was unparalleled in clarity. Thanks a lot
Thanks for this tutorial; the optimal value is k = 3.