Clustering Analysis for Market Segmentation: Making It (Actually) Useful


In this blog post, we offer some tips on bridging the gap between clustering analysis and real-life business value. It will be most useful to machine learning practitioners who want their output to resonate with non-technical audiences (in this author's opinion, that should be a goal for everyone building machine learning models!).


I recently attended a conference talk on “Market Segmentation using Clustering Analysis”. As if predestined by nerdy serendipity, the presenter’s application of clustering analysis bore an uncanny resemblance to my current client project. The speaker explained that her main challenge was identifying segments that were both statistically significant and compelling from a business perspective. Put another way, to make clustering analysis useful, one must craft a coherent, resonant business narrative around the results.

Clustering is a machine learning technique that discovers structure within data: it groups records automatically according to how similar they are to one another. Applied to market segmentation, this means comparing customers’ attributes, behavioral patterns, and/or derived features all at once.

Feature engineering, preprocessing, and tuning are all essential steps in building a quality clustering model. This post, however, discusses what comes afterward: what considerations help connect a model’s output to real-life business needs?

The Process of Belief

Modeling market segments in an “unsupervised” way is an excellent opportunity to confirm, enhance, or refine an organization's institutional knowledge. It is incumbent on the modeler to discern what market characteristics are important to business stakeholders.

To some extent, organizations already hold established beliefs about their market. These beliefs may be codified as dogma within executive teams or in sales initiatives (e.g., ‘Gold Level users are those who have ten logins a month’). However, a company’s intuition about its market may be less explicit and surface only during meetings and conversations.

Leverage the organization's beliefs about the market to make your modeling output more impactful. How closely do the segments’ attributes align with the organization’s rules of thumb? Do key performance indicators (KPIs) differ meaningfully across the clusters? Which ongoing business processes could benefit from this new, higher-resolution insight? Asking these questions lets your clustering analysis apply scientific rigor to reinforce or enhance an organization’s beliefs.
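To make the rule-of-thumb comparison concrete, here is a minimal Python sketch that profiles each cluster against the ‘Gold Level’ example quoted earlier. It assumes each customer is a dict with hypothetical `cluster` and `monthly_logins` keys, and it reads ‘ten logins a month’ as ‘at least ten’; substitute your own fields, KPIs, and thresholds.

```python
from collections import defaultdict

# Hypothetical threshold: "ten logins a month", read here as "at least ten".
GOLD_LOGIN_THRESHOLD = 10


def profile_against_rule(customers, threshold=GOLD_LOGIN_THRESHOLD):
    """Summarize each cluster as (size, share meeting the rule, mean monthly logins).

    `customers` is an iterable of dicts with hypothetical keys
    "cluster" (the model's label) and "monthly_logins" (a behavioral KPI).
    """
    totals = defaultdict(lambda: {"n": 0, "hits": 0, "logins": 0})
    for customer in customers:
        t = totals[customer["cluster"]]
        t["n"] += 1
        # A boolean adds as 0 or 1, so this counts customers meeting the rule.
        t["hits"] += customer["monthly_logins"] >= threshold
        t["logins"] += customer["monthly_logins"]
    return {
        cluster: (t["n"], t["hits"] / t["n"], t["logins"] / t["n"])
        for cluster, t in sorted(totals.items())
    }
```

If one cluster meets the rule almost entirely while the others rarely do, the model is echoing an existing belief; a cluster split roughly down the middle suggests the rule of thumb is coarser than the data.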

Preexisting Conditions

As part of regular operations, organizations often maintain classifications of their own. These tend to develop organically to support basic business functions like fulfillment, sales compensation, or marketing execution, and include product catalogs, sales territories, and user taxonomies.

While it may be tempting to include preexisting classifications as categorical features*, holding them out and comparing your model’s output against them can be far more insightful. Contrasting how clusters ‘naturally’ separate with these formal categories can illuminate how an organization thinks about its market. If the established classifications do not emerge from the modeling process, the analysis may point toward better ones.

*Additionally, including categorical variables in a clustering model can make the modeling process unnecessarily difficult and complex, because popular distance-based algorithms such as k-means are designed for numeric features.
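One way to compare a model’s output with an existing classification (a sales territory, say) is to cross-tabulate the two labelings and summarize their agreement with the adjusted Rand index, which is 1.0 for identical groupings up to relabeling and near 0 for chance-level agreement. The plain-Python sketch below is illustrative; `crosstab` and `adjusted_rand_index` are hypothetical helper names, and scikit-learn’s `adjusted_rand_score` computes the same index if you already use that library.

```python
from collections import Counter
from math import comb


def crosstab(cluster_labels, existing_labels):
    """Count records in each (cluster, existing class) pair."""
    return Counter(zip(cluster_labels, existing_labels))


def adjusted_rand_index(cluster_labels, existing_labels):
    """Chance-corrected agreement between two labelings (assumes >= 2 records)."""
    n = len(cluster_labels)
    table = crosstab(cluster_labels, existing_labels)
    # Pairs of records grouped together under BOTH labelings.
    pair_index = sum(comb(count, 2) for count in table.values())
    sum_rows = sum(comb(c, 2) for c in Counter(cluster_labels).values())
    sum_cols = sum(comb(c, 2) for c in Counter(existing_labels).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one group.
        return 1.0
    return (pair_index - expected) / (max_index - expected)
```

A high index means the model largely rediscovered the existing classes; a low one is the interesting case, and the crosstab shows exactly which classes the clusters cut across.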

Human Interpretability

Clustering models require creativity on the modeler’s part to craft a compelling narrative that is actionable for business stakeholders. Limiting the number of clusters your model produces can make the results more digestible for the audience. The challenge is weighing both technical and business factors when tuning the model.

From a pure modeling perspective, choosing the number of clusters means balancing a low within-cluster sum of squares against a high silhouette score. The sum of squares keeps shrinking as clusters are added (one cluster per customer scores zero), so it cannot settle the question alone. To be valuable, however, the number of market segments must also be interpretable by a human: it would be difficult to explain hundreds of market segments to even the most savvy business stakeholders.
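The trade-off can be sketched in plain Python: run k-means for several candidate cluster counts, record the within-cluster sum of squares (WSS) and mean silhouette for each, and only consider counts below a ceiling stakeholders can realistically digest. This is an illustrative sketch under stated assumptions, not a production implementation: it uses a deterministic farthest-first initialization for reproducibility (not k-means++), and the `max_k_for_humans=8` ceiling and the helper names are hypothetical. In practice, scikit-learn’s `KMeans` (whose `inertia_` attribute is the WSS) and `silhouette_score` do the same job.

```python
import math


def _assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with farthest-first initialization; returns (labels, WSS)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its closest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(max_iter):
        labels = _assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous center if a cluster ends up empty.
            new_centers.append(
                tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    labels = _assign(points, centers)
    wss = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, wss


def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        if labels.count(labels[i]) == 1:
            scores.append(0.0)
            continue
        # Mean distance from p to each cluster's members (excluding p itself).
        mean_dist = {}
        for c in clusters:
            dists = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            mean_dist[c] = sum(dists) / len(dists)
        a = mean_dist[labels[i]]
        b = min(d for c, d in mean_dist.items() if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, candidate_ks, max_k_for_humans=8):
    """Return (best k by silhouette, [(k, wss, silhouette), ...]) within the ceiling."""
    results = []
    for k in candidate_ks:
        if 2 <= k <= min(max_k_for_humans, len(points) - 1):
            labels, wss = kmeans(points, k)
            results.append((k, wss, silhouette(points, labels)))
    best_k = max(results, key=lambda r: r[2])[0]
    return best_k, results
```

WSS is returned for an elbow-style inspection rather than used to pick k directly, since it keeps falling as k grows; the silhouette does the picking, and the ceiling encodes the human-interpretability constraint.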


Using the tricks above, I created persona-based customer cohorts. These customer ‘types’ have very different characteristics, behaviors, and motivations, which let our client target and incentivize each cohort at a granular level. By focusing on crafting a compelling business narrative, modelers can make clustering analysis highly useful in real-life applications.


Brendan Freehart is a Data Engineer at Silectis.
You can find him on LinkedIn.