This project implements four popular clustering algorithms from scratch in Python, designed to work on datasets with d >= 2 dimensions and k >= 2 clusters. The implementations are tested on 2D datasets and compared visually with scikit-learn's implementations to evaluate correctness and performance.
- K-Means Clustering
- Gaussian Mixture Model (GMM) using Expectation-Maximization (EM)
- Mean-Shift Clustering
- Agglomerative Clustering
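
To illustrate what "from scratch" and "d >= 2 dimensions, k >= 2 clusters" can look like in practice, here is a minimal sketch of K-Means (Lloyd's algorithm) using only the standard library. It is not the code in this repository: the function name `kmeans`, its parameters, and the random initialisation strategy are assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative Lloyd's algorithm on a list of equal-length tuples.

    Works for any dimension d, because distances and means are computed
    coordinate by coordinate. Hypothetical helper, not the repository's API.
    """
    rng = random.Random(seed)
    # Initialise centroids by sampling k distinct data points (an assumption;
    # other schemes such as k-means++ are common).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Two well-separated groups of 3D points.
points = [(0.0, 0.0, 0.0), (0.1, 0.2, 0.0), (0.2, 0.1, 0.1),
          (5.0, 5.0, 5.0), (5.1, 4.9, 5.2), (4.9, 5.1, 5.0)]
labels, centroids = kmeans(points, k=2)
```

The cluster ids themselves are arbitrary: which group receives label 0 depends on the initial centroids, so only the grouping is meaningful.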
- KMeans.py: K-Means clustering.
- KMeans_Ver0.py: K-Means clustering (2nd version).
- GaussianMM.py: EM-GMM.
- GaussianMM_Ver0.py: EM-GMM with AIC, BIC, and predict functions (2nd version).
- MeanShift.py: Mean-Shift clustering.
- Agglomerative.py: Agglomerative clustering.

- test_2d_visualization.py: Tests each implementation on 2D datasets with visualization, comparing the results to scikit-learn's equivalent algorithms.
- data_2d_test/: Contains the datasets used for testing.
- test_2d_visualization_results/: Stores the output images of the clustering results.
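
The comparison in this project is visual. If a numerical check is also wanted, one complication is that two clusterings can be identical while using different label ids. The sketch below shows one possible way to measure agreement up to relabelling; the helper `best_label_agreement` is hypothetical and not part of this repository, and the brute-force search over permutations is only practical for small k.

```python
from itertools import permutations


def best_label_agreement(labels_a, labels_b):
    """Fraction of points on which two clusterings agree, maximised over
    every way of renaming the cluster ids in labels_b.

    Hypothetical helper for illustration; brute force, so small k only.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    ids_a = sorted(set(labels_a))
    ids_b = sorted(set(labels_b))
    # If labels_b has more clusters than labels_a, pad the targets with
    # unique placeholders that can never match a real label.
    padding = [object() for _ in range(max(0, len(ids_b) - len(ids_a)))]
    targets = ids_a + padding
    best = 0
    for perm in permutations(targets, len(ids_b)):
        mapping = dict(zip(ids_b, perm))
        matches = sum(mapping[b] == a for a, b in zip(labels_a, labels_b))
        best = max(best, matches)
    return best / len(labels_a)


mine = [0, 0, 1, 1]
reference = [1, 1, 0, 0]  # same grouping, label ids swapped
agreement = best_label_agreement(mine, reference)
```

For larger k, a permutation-invariant score such as scikit-learn's `adjusted_rand_score` avoids the factorial search.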

[Images omitted: three comparison tables, each with the layout below.]

Algorithm | My Implementation | Scikit-learn |
---|---|---|
Agglomerative | (image) | (image) |
EM-GMM | (image) | (image) |
K-Means | (image) | (image) |
Mean-Shift | (image) | (image) |