Automatic Cluster Discovery
One of the biggest challenges in unsupervised learning is choosing the number of clusters (K).
While standard K-Means requires you to specify K upfront, X-Means (available in DruidJS) automatically selects the number of clusters using the Bayesian Information Criterion (BIC).
How it works
- Initial K: X-Means starts with a minimum number of clusters (K_min, typically 2).
- Recursive Splitting: For each cluster, the algorithm attempts to split it into two sub-clusters using K-Means.
- Statistical Validation: It compares the BIC score of the original cluster versus the split. If the split provides a significantly better fit, the new clusters are kept.
- Iterative Refinement: This process repeats until the cluster count reaches the maximum K (K_max) or no split improves the fit any further.
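The split test in the steps above can be sketched in plain JavaScript. This is a minimal illustration, not DruidJS's internal implementation: it assumes a spherical Gaussian model with a single shared variance, uses a BIC formulation in which a higher score is better, and replaces the K-Means split with a small deterministic 2-means. The names `sqDist`, `mean`, `bic`, `twoMeans`, and `shouldSplit` are hypothetical helpers introduced only for this example.

```javascript
// Squared Euclidean distance between two points of equal dimension.
const sqDist = (a, b) => a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0);

// Component-wise mean of a non-empty list of points.
const mean = (points) =>
  points[0].map((_, j) => points.reduce((sum, p) => sum + p[j], 0) / points.length);

// BIC of a clustering under a spherical Gaussian model with one shared variance.
// Assumption: in this formulation a HIGHER score means a better model.
function bic(clusters) {
  const K = clusters.length;
  const R = clusters.reduce((sum, c) => sum + c.length, 0); // total number of points
  const M = clusters[0][0].length; // number of dimensions
  let sse = 0; // summed squared distance of every point to its own centroid
  for (const c of clusters) {
    const mu = mean(c);
    for (const p of c) sse += sqDist(p, mu);
  }
  // Per-dimension variance estimate; the floors avoid division by zero and log(0).
  const variance = Math.max(sse / (M * Math.max(R - K, 1)), 1e-12);
  let logLik = -sse / (2 * variance) - ((R * M) / 2) * Math.log(2 * Math.PI * variance);
  for (const c of clusters) logLik += c.length * Math.log(c.length / R);
  const freeParams = (K - 1) + K * M + 1; // mixing weights + centroids + variance
  return logLik - (freeParams / 2) * Math.log(R);
}

// Deterministic 2-means: seed with the first point and the point farthest from it.
function twoMeans(points, iterations = 10) {
  let a = points[0];
  let b = points.reduce((far, p) => (sqDist(p, a) > sqDist(far, a) ? p : far), points[0]);
  let left = [];
  let right = [];
  for (let it = 0; it < iterations; it++) {
    left = points.filter((p) => sqDist(p, a) <= sqDist(p, b));
    right = points.filter((p) => sqDist(p, a) > sqDist(p, b));
    if (left.length === 0 || right.length === 0) break;
    a = mean(left);
    b = mean(right);
  }
  return [left, right];
}

// Keep the split only if the two-cluster model scores better than the parent.
function shouldSplit(points) {
  const [left, right] = twoMeans(points);
  if (left.length === 0 || right.length === 0) return false;
  return bic([left, right]) > bic([points]);
}

const twoBlobs = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]];
const evenLine = [[0], [1], [2], [3], [4], [5], [6], [7]];
console.log(shouldSplit(twoBlobs)); // true: two well-separated groups
console.log(shouldSplit(evenLine)); // false: evenly spread points, the penalty outweighs the gain
```

The penalty term grows with the number of free parameters, so a split has to reduce the within-cluster spread enough to pay for the extra centroid. That is why the evenly spaced points stay together while the two distant groups are separated.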
Benefits
- Zero Configuration: No need to build "elbow plots" or search manually for K; you only set a search range.
- Objective Fit: Uses a statistical criterion (BIC) that penalizes extra clusters, which discourages over-fitting.
- Speed: Built on top of K-Means, it remains efficient for large datasets.
How-to (Code)
Use XMeans when you want the algorithm to determine the cluster count for you. You only need to provide a search range (K_min and K_max).
```javascript
import * as druid from "@saehrimnir/druidjs";

const data = [
  /* ... high-dimensional data ... */
];

// Initialize XMeans with a search range for clusters
const xmeans = new druid.XMeans(data, {
  K_min: 2,
  K_max: 10,
});

// Run the discovery algorithm
// It returns an array of cluster indices for each point
const clusters = xmeans.get_cluster_list();

// You can check how many clusters were found
console.log(`Optimal number of clusters: ${xmeans.k}`);
```
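Once you have one cluster index per point, a common next step is to group the original points by cluster. The sketch below assumes the result is a flat array of labels aligned with the input rows, as the comment in the example above describes. The sample `points` and `labels` arrays and the `groupByLabel` helper are hypothetical and stand in for real data and real XMeans output.

```javascript
// Hypothetical input rows and labels; in practice the labels would come from
// the clustering result, with one label per input row.
const points = [[0.1, 0.2], [5.0, 5.1], [0.3, 0.1], [9.8, 0.2], [5.2, 4.9], [0.2, 0.3]];
const labels = [0, 1, 0, 2, 1, 0];

// Collect the points that share a label into one array per cluster.
function groupByLabel(points, labels) {
  const groups = new Map();
  labels.forEach((label, i) => {
    if (!groups.has(label)) groups.set(label, []);
    groups.get(label).push(points[i]);
  });
  return groups;
}

const groups = groupByLabel(points, labels);
for (const [label, members] of groups) {
  console.log(`cluster ${label}: ${members.length} point(s)`);
}
```

A `Map` keeps the clusters in the order their labels first appear and does not require the labels to be consecutive integers.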