Clustering algorithm implementations from scratch with Python (k-means, EM-GMM, mean-shift, agglomerative)

DolbyUUU/clustering_algorithm_implementation_python

Clustering Algorithm Implementation and Visualization from Scratch with Python

Overview

This project implements four popular clustering algorithms from scratch in Python. Each implementation is designed to work on datasets with d >= 2 dimensions and k >= 2 clusters. The implementations are tested on 2D datasets, and their results are compared visually with those of scikit-learn's implementations to evaluate correctness and performance.

Implemented Clustering Algorithms

  1. K-Means Clustering
  2. Gaussian Mixture Model (GMM) using Expectation-Maximization (EM)
  3. Mean-Shift Clustering
  4. Agglomerative Clustering
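
As an illustration of what "from scratch" can look like for the first algorithm in the list, here is a minimal sketch of K-Means (Lloyd's algorithm) using only the Python standard library. It is not the code in KMeans.py; the function name `kmeans`, its parameters (`init`, `n_iter`, `seed`), and the random-data-point initialisation are assumptions made for this example.

```python
import math
import random


def kmeans(points, k, init=None, n_iter=100, seed=0):
    """Cluster d-dimensional points with Lloyd's algorithm.

    Returns (centroids, labels). Hypothetical helper, for illustration only.
    """
    if init is None:
        # Assumed initialisation: k distinct data points chosen at random.
        init = random.Random(seed).sample(list(points), k)
    centroids = [tuple(c) for c in init]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
            else:
                mean = centroids[j]  # leave an empty cluster's centroid in place
            new_centroids.append(mean)
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels
```

Because points are plain tuples of any length, the sketch is not tied to 2D data. Passing explicit starting centroids through `init` makes a run reproducible without relying on the random seed.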

Python Implementations

  • KMeans.py: K-Means clustering.
  • KMeans_Ver0.py: K-Means clustering (2nd version).
  • GaussianMM.py: EM-GMM.
  • GaussianMM_Ver0.py: EM-GMM with AIC, BIC, and predict functions (2nd version).
  • MeanShift.py: Mean-Shift clustering.
  • Agglomerative.py: Agglomerative clustering.
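
For readers unfamiliar with the AIC and BIC functions mentioned for GaussianMM_Ver0.py, the sketch below shows the standard definitions of the two criteria. The function names are hypothetical, and the parameter count assumes full covariance matrices; the README does not say which covariance structure the repository uses.

```python
import math


def gmm_n_params(k, d):
    """Free parameters of a k-component, d-dimensional GMM.

    Assumes a full covariance matrix per component (an assumption,
    not something stated in the README).
    """
    weights = k - 1                      # mixing weights sum to 1
    means = k * d                        # one d-vector per component
    covariances = k * d * (d + 1) // 2   # symmetric d x d matrix per component
    return weights + means + covariances


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2p - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: p ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood
```

Both criteria trade goodness of fit against model size, so they are typically evaluated over a range of k to choose the number of components.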

Evaluations and Tests

  • test_2d_visualization.py:
    Tests each implementation on 2D datasets with visualization, comparing the results to scikit-learn's equivalent algorithms.
  • data_2d_test/:
    Contains the datasets used for testing.
  • test_2d_visualization_results/:
    Stores the output images of the clustering results.
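
The comparison described above is visual. If a numeric check were wanted as a complement, one option is the Rand index, which measures agreement between two labelings without requiring the cluster IDs to match. The sketch below is an assumption for illustration, not part of test_2d_visualization.py, and `rand_index` is a hypothetical helper name.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree.

    A pair "agrees" when both clusterings put the two points together,
    or both put them apart. The result is invariant to label renaming.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two points: trivially identical
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

A value of 1.0 means the two clusterings partition the data identically, even if one calls a cluster 0 and the other calls it 1. The pairwise loop is quadratic in the number of points, which is acceptable for small 2D test sets.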

Visualization Results

Blobs Dataset

[Image table: for each algorithm (Agglomerative, EM-GMM, K-Means, Mean-Shift), the result of my implementation shown next to the scikit-learn result.]

Moons and Stars Dataset

[Image table: for each algorithm (Agglomerative, EM-GMM, K-Means, Mean-Shift), the result of my implementation shown next to the scikit-learn result.]

Sticks Dataset

[Image table: for each algorithm (Agglomerative, EM-GMM, K-Means, Mean-Shift), the result of my implementation shown next to the scikit-learn result.]
