Customer segmentation is the process of grouping customers by shared characteristics derived from their interactions with the business. In most cases these interactions are measured through purchase behavior and patterns. The resulting groups are useful for marketing campaigns, for identifying potentially profitable customers, and for developing customer loyalty.
The objective is to identify clusters of customers based on their purchase behaviour, taking into account the recency, frequency and monetary value of their transactions.
Each row of data represents a transaction and each column contains a transaction's attributes.
InvoiceNo : A unique identifier for the invoice. An invoice number shared across rows means that those transactions were performed in a single invoice (multiple purchases).
StockCode : Identifier for items contained in an invoice.
Description : Textual description of each stock item.
Quantity : The quantity of the item purchased.
InvoiceDate : Date of purchase.
UnitPrice : Value of each item.
CustomerID : Identifier for customer making the purchase.
Country : Country of customer.
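To make the schema concrete, here is a minimal sketch with pandas. The rows are made-up sample values, not taken from the dataset, and the `Amount` column is an assumption: it treats the value of a row as `Quantity * UnitPrice`.

```python
import pandas as pd

# Hypothetical sample rows that follow the column layout described above.
transactions = pd.DataFrame(
    {
        "InvoiceNo": ["536365", "536365", "536366"],
        "StockCode": ["A100", "B200", "A100"],
        "Description": ["Item A", "Item B", "Item A"],
        "Quantity": [6, 2, 1],
        "InvoiceDate": pd.to_datetime(
            ["2011-01-04 08:26", "2011-01-04 08:26", "2011-01-05 09:00"]
        ),
        "UnitPrice": [2.5, 4.0, 2.5],
        "CustomerID": [17850, 17850, 13047],
        "Country": ["United Kingdom", "United Kingdom", "France"],
    }
)

# Assumption: the monetary value of a row is quantity times unit price.
transactions["Amount"] = transactions["Quantity"] * transactions["UnitPrice"]

# Rows that share an InvoiceNo belong to the same invoice, so summing by
# InvoiceNo gives the total value of each invoice.
invoice_totals = transactions.groupby("InvoiceNo")["Amount"].sum()
print(invoice_totals)
```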
- Loading Dependencies
- Loading Data
- Data Exploration
- Data Processing
- Focussing on One Market (UK in this case)
- Building Recency Feature
- Calculating Frequency and Monetary Values
- Customer Segmentation (K-means Algorithm, Silhouette Score Metric)
- Visualize Customer Segments
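The feature-building steps in this list can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's exact code: the helper name `build_rfm` is hypothetical, recency is measured in days from the latest invoice date in the filtered data, frequency counts distinct invoices per customer, and the monetary value is the sum of `Quantity * UnitPrice`.

```python
import pandas as pd


def build_rfm(transactions, country="United Kingdom"):
    """Return one row per customer with Recency, Frequency and Amount."""
    # Focus on a single market.
    df = transactions[transactions["Country"] == country].copy()
    # Assumption: row value is quantity times unit price.
    df["Amount"] = df["Quantity"] * df["UnitPrice"]
    # Assumption: recency is counted back from the latest purchase in the data.
    reference_date = df["InvoiceDate"].max()
    return df.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda dates: (reference_date - dates.max()).days),
        Frequency=("InvoiceNo", "nunique"),
        Amount=("Amount", "sum"),
    )


# Made-up sample transactions for illustration only.
sample = pd.DataFrame(
    {
        "InvoiceNo": ["A", "A", "B", "C", "D"],
        "CustomerID": [1, 1, 1, 2, 3],
        "Country": ["United Kingdom"] * 4 + ["France"],
        "InvoiceDate": pd.to_datetime(
            ["2011-01-01", "2011-01-01", "2011-01-10", "2011-01-05", "2011-01-20"]
        ),
        "Quantity": [1, 1, 1, 5, 1],
        "UnitPrice": [3.0, 3.0, 4.0, 1.0, 9.0],
    }
)

rfm = build_rfm(sample)
print(rfm)
```

Counting distinct invoices rather than rows keeps a single large basket from inflating a customer's frequency.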
We use the silhouette score to find the optimal number of clusters during our clustering process.
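A minimal sketch of that selection loop with scikit-learn is shown below. The function name `score_cluster_counts`, the synthetic data and the standardisation step are assumptions made for illustration; the notebook may preprocess the features differently.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def score_cluster_counts(features, candidate_ks=(3, 4, 5), random_state=42):
    """Fit K-means for each candidate k and return {k: silhouette score}."""
    # Assumption: features are standardised so no single column dominates
    # the Euclidean distances used by K-means.
    scaled = StandardScaler().fit_transform(features)
    scores = {}
    for k in candidate_ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(scaled)
        scores[k] = silhouette_score(scaled, labels)
    return scores


# Synthetic stand-in for a recency/frequency/amount table:
# three well-separated groups of 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0], [-10.0, 10.0, 0.0]])
features = np.vstack([c + rng.normal(scale=0.5, size=(50, 3)) for c in centers])

scores = score_cluster_counts(features)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The silhouette score lies between -1 and 1, and higher values indicate tighter, better-separated clusters.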
Sales By Country
Top 15 customers contributing to 10.5% of total sales
Sales Recency
Data with Recency, Frequency and Monetary feature
Models with 3, 4 and 5 clusters were built and compared by their silhouette scores. The results are as follows:
Clusters 3
- There is a stark difference in the monetary value of customers between clusters.
- Cluster 2 contains the high-value customers who shop frequently and is certainly an important segment for any business.
- Clusters 0 and 1 contain the customer groups with low and medium spend.
Clusters 4
- The high-value customers are subdivided into two groups: one with lower spend and lower frequency (represented by cluster 0) and another with higher spend and higher frequency but lower recency (represented by cluster 1).
Clusters 5
- With 5 clusters we again have two subgroups of higher-spend customers, plus three subgroups of lower-spend customers with varying frequency and recency.
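Cluster descriptions like the ones above are typically read off a per-cluster summary of the features. The sketch below shows one way to produce such a summary. The helper name `profile_clusters` and the sample values are hypothetical, and the column names `Recency`, `Frequency` and `Amount` are assumed.

```python
import pandas as pd


def profile_clusters(rfm, labels):
    """Return the mean Recency, Frequency and Amount plus size per cluster."""
    labelled = rfm.assign(Cluster=labels)
    profile = labelled.groupby("Cluster")[["Recency", "Frequency", "Amount"]].mean()
    profile["Customers"] = labelled.groupby("Cluster").size()
    return profile


# Made-up feature table and cluster labels for illustration only.
rfm = pd.DataFrame(
    {
        "Recency": [10, 20, 2, 4],
        "Frequency": [1, 3, 10, 14],
        "Amount": [100.0, 300.0, 5000.0, 7000.0],
    }
)
labels = [0, 0, 1, 1]

profile = profile_clusters(rfm, labels)
print(profile)
```

Comparing the rows of such a table shows which clusters hold the high-spend, high-frequency customers.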
Amount vs Frequency
Recency vs Amount
Recency vs Frequency
Going by the mathematical metric alone, the silhouette score is highest for 3 clusters, suggesting that 3 is the optimal number of clusters for this dataset. However, we need to include business metrics and domain insights in our modelling process to obtain the best-suited, data-focussed solution for the business problem at hand :-)