Machine learning with Python and sklearn – Hierarchical Clustering (E-commerce dataset example)




In this Machine Learning & Python video tutorial I demonstrate the Hierarchical Clustering method.

Hierarchical Clustering is a Machine Learning technique and belongs to the clustering family of methods:
– Connectivity-based clustering (hierarchical clustering)
– Centroid-based clustering (K-Means Clustering) – https://www.youtube.com/watch?v=iybATqk6LNI
– Distribution-based clustering
– Density-based clustering

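In data mining and statistics, Hierarchical Clustering, also called hierarchical cluster analysis (HCA), is a method of cluster analysis that seeks to build a hierarchy of clusters. In this video I demonstrate how Agglomerative Hierarchical Clustering works. The agglomerative variant is bottom-up: every observation starts as its own cluster, and the two closest clusters are merged repeatedly until only one remains.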
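The bottom-up merging can be made visible with a minimal sketch. It uses SciPy's `linkage` function rather than the code from the video, and the four 2-D points are made-up toy data chosen only so that the merge order is easy to follow.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical toy data: two tight pairs of 2-D points, far apart from each other.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])

# Each row of Z records one merge: [cluster_a, cluster_b, distance, new_cluster_size].
# Original points have ids 0..n-1; clusters created by merges get ids n, n+1, ...
Z = linkage(X, method="ward")

for step, (a, b, dist, size) in enumerate(Z, start=1):
    print(f"step {step}: merge {int(a)} and {int(b)} "
          f"at distance {dist:.2f} -> cluster of {int(size)} points")
```

With n points there are always n - 1 merges, so the last row of `Z` describes the single cluster that contains everything.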

To work with Hierarchical Clustering you need to understand dendrograms. A dendrogram is a tree diagram of the successive merges, and it helps you decide on a suitable number of clusters for your dataset.
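One common way to read a cluster count off a dendrogram is to cut the tree where the jump between successive merge heights is largest. The sketch below illustrates that heuristic with SciPy on synthetic data (three made-up blobs). It is an assumption about how the dendrogram might be used, not the exact procedure from the video.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)

# Hypothetical data: three well-separated blobs of 20 points each.
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

Z = linkage(X, method="ward")

# Compute the dendrogram layout only; remove no_plot=True to draw it with matplotlib.
tree = dendrogram(Z, no_plot=True)

# Heuristic: find the largest gap between consecutive merge heights and
# stop merging just before that jump.
heights = Z[:, 2]
gaps = np.diff(heights)
n_clusters = int(len(X) - (np.argmax(gaps) + 1))

# Cut the tree into that many flat clusters.
labels = fcluster(Z, t=n_clusters, criterion="maxclust")
print("suggested number of clusters:", n_clusters)
```

The largest-gap rule is only a heuristic. On real data it is worth inspecting the plotted dendrogram as well, because noisy datasets rarely show one clean jump.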

To carry out the task in Python I used:
– the sklearn library, which provides Machine Learning algorithms.
– the ward linkage method, also known as the minimum variance method.
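As a rough illustration of these two choices together, here is a minimal sketch with sklearn's `AgglomerativeClustering` and `linkage="ward"`. The data is synthetic: the two columns stand in for hypothetical customer features such as income and spending score, and the choice of three clusters is an assumption for this example, not a value taken from the Kaggle dataset.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(42)

# Hypothetical customers: column 0 ~ annual income, column 1 ~ spending score.
X = np.vstack([
    rng.normal(loc=[20, 80], scale=3, size=(30, 2)),
    rng.normal(loc=[60, 50], scale=3, size=(30, 2)),
    rng.normal(loc=[100, 20], scale=3, size=(30, 2)),
])

# Ward linkage merges the pair of clusters that least increases
# the total within-cluster variance; it works with Euclidean distances.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)

print("cluster sizes:", np.bincount(labels))
```

In practice the value of `n_clusters` would come from inspecting the dendrogram first.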

If you are interested in learning more about Hierarchical Clustering, read my article on LinkedIn, where I describe my experiment combining Machine Learning (Hierarchical Clustering) with GIS (Geographical Information Systems): https://www.linkedin.com/pulse/machine-learning-gis-hierarchical-clustering-urban-bielinskas

The dataset for this example is taken from https://www.kaggle.com. There you can find many datasets for a wide range of Machine Learning tasks.

Hierarchical Clustering is widely useful for solving Data Analysis, Data Mining and Statistics problems.

If you have any questions or comments, please write them below.

Do not forget to subscribe if you want to follow my new videos about Machine Learning, Data Science, Python programming and related topics.

Follow me on LinkedIn: https://www.linkedin.com/in/bielinskas/


Comment List

  • Dr. Vytautas Bielinskas
    January 21, 2021

    Which editor are you using? Thanks!

  • Dr. Vytautas Bielinskas
    January 21, 2021

    I got this error:

    in the future, 0-d boolean arrays will be interpreted as a valid boolean index

    How can I resolve this?

  • Dr. Vytautas Bielinskas
    January 21, 2021

    Thank you so much for your explanation! I now have a clear idea of hierarchical clustering. I was trying the same code with the 'centroid' method, as it deals with the outlier issue, but when I set linkage='centroid' it gave me an error showing 'Euclidean metric contains only ward, average and complete linkage'.
    How can I solve this? Could you please give me any idea?

  • Dr. Vytautas Bielinskas
    January 21, 2021

    Hello, thank you very much for your video. I am trying to apply this to my own data set but when I try to select the columns for X, I get the following error:

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py in _is_valid_list_like(self, key, axis)
    1646 # so don't treat a tuple as a valid indexer
    1647 if isinstance(key, tuple):
    -> 1648 raise IndexingError('Too many indexers')
    1649
    1650 # coerce the key to not exceed the maximum size of the index

    IndexingError: Too many indexers

    Could you please guide me with a possible solution?
