From 4a40369f26154bc0c00c6b74029ac34ceeccef38 Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]" <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Mon, 12 Jan 2026 00:34:30 +0000
Subject: [PATCH] Optimize kmeans_clustering

The optimized code achieves a **28x speedup** (2821%) by replacing nested Python loops with vectorized NumPy operations. Here's why it's faster:

## Key Optimizations

**1. Vectorized Distance Calculation**
- **Original**: Triple-nested loops compute distances one feature at a time for each sample-centroid pair
  - Inner loop iterating over features: 23.7% of runtime
  - Distance accumulation: 36% of runtime
  - Square root calls: 10.6% of runtime
- **Optimized**: Broadcasting computes all distances at once (a standalone toy example appears at the end of this message)

```python
differences = X[:, np.newaxis, :] - centroids[np.newaxis, :, :]
distances = np.linalg.norm(differences, axis=2)
```

This creates a 3D array of differences and computes all norms in a single vectorized operation, eliminating the ~70% of the original runtime spent in the nested distance loops.

**2. Vectorized Label Assignment**
- **Original**: Loop through samples to find each minimum distance (5.1% + 4.4% overhead)
- **Optimized**: `labels = np.argmin(distances, axis=1)` finds the nearest centroid for every sample in one operation

**3. Vectorized Centroid Updates**
- **Original**: Nested loops accumulate per-cluster sums and manually divide by counts (9.7% of runtime)
- **Optimized**: Boolean masking and a single `mean()` call per cluster

```python
mask = labels == j
new_centroids[j] = X[mask].mean(axis=0)
```

NumPy's optimized C implementations handle this aggregation much faster than Python loops.

## Performance Impact

The optimization excels with larger datasets:
- **Small data** (single points, k=1): 19-29% slower due to NumPy overhead
- **Medium data** (50-100 samples): 300-1000% faster
- **Large data** (500+ samples, high dimensions): 2800-9700% faster

The line profiler shows the optimized version spends most of its time on the `X[mask].mean()` operation (38.5%) and the distance calculation (23.2%), both of which run as highly optimized C operations. The original spent 60%+ of its time in raw Python loops computing distances.

This optimization is particularly valuable in hot paths where k-means runs repeatedly (hyperparameter tuning, batch processing) or with high-dimensional data, as evidenced by the massive speedups in the large-scale test cases.
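To make the broadcasting step concrete, here is a minimal standalone sketch of the distance and labeling logic described above. The toy arrays, shapes, and printed output are illustrative assumptions, not values from the repository or its tests:

```python
import numpy as np

# Toy inputs (shapes chosen only for illustration): 4 samples, 2 features, 3 centroids.
X = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]])
centroids = np.array([[0.5, 0.5], [9.5, 9.5], [5.0, 5.0]])

# Broadcasting (4, 1, 2) against (1, 3, 2) yields a (4, 3, 2) difference tensor.
differences = X[:, np.newaxis, :] - centroids[np.newaxis, :, :]

# Collapsing the feature axis gives a (4, 3) matrix of sample-to-centroid distances.
distances = np.linalg.norm(differences, axis=2)

# argmin over the centroid axis assigns each sample to its nearest centroid.
labels = np.argmin(distances, axis=1)
print(labels)  # [0 0 1 1]
```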
---
 src/statistics/clustering.py | 26 ++++++-------------------
 1 file changed, 6 insertions(+), 20 deletions(-)

diff --git a/src/statistics/clustering.py b/src/statistics/clustering.py
index 9b28592..9218f4a 100644
--- a/src/statistics/clustering.py
+++ b/src/statistics/clustering.py
@@ -8,28 +8,14 @@ def kmeans_clustering(
     centroid_indices = np.random.choice(n_samples, k, replace=False)
     centroids = X[centroid_indices]
     for _ in range(max_iter):
-        labels = np.zeros(n_samples, dtype=int)
-        for i in range(n_samples):
-            min_dist = float("inf")
-            for j in range(k):
-                dist = 0
-                for feat in range(X.shape[1]):
-                    dist += (X[i, feat] - centroids[j, feat]) ** 2
-                dist = np.sqrt(dist)
-                if dist < min_dist:
-                    min_dist = dist
-                    labels[i] = j
+        differences = X[:, np.newaxis, :] - centroids[np.newaxis, :, :]
+        distances = np.linalg.norm(differences, axis=2)
+        labels = np.argmin(distances, axis=1)
         new_centroids = np.zeros_like(centroids)
-        counts = np.zeros(k)
-        for i in range(n_samples):
-            cluster = labels[i]
-            counts[cluster] += 1
-            for feat in range(X.shape[1]):
-                new_centroids[cluster, feat] += X[i, feat]
         for j in range(k):
-            if counts[j] > 0:
-                for feat in range(X.shape[1]):
-                    new_centroids[j, feat] /= counts[j]
+            mask = labels == j
+            if np.any(mask):
+                new_centroids[j] = X[mask].mean(axis=0)
         if np.array_equal(centroids, new_centroids):
             break
         centroids = new_centroids
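For reference, here is the optimized function assembled into a self-contained, runnable form. The diff only shows the body from line 8 onward, so the signature, parameter defaults, and return value below are assumptions for illustration rather than the repository's exact interface:

```python
import numpy as np

# A sketch of the optimized kmeans_clustering; signature and return value are assumed.
def kmeans_clustering(X: np.ndarray, k: int, max_iter: int = 100):
    n_samples = X.shape[0]
    # Initialize centroids from k distinct random samples.
    centroid_indices = np.random.choice(n_samples, k, replace=False)
    centroids = X[centroid_indices]
    for _ in range(max_iter):
        # (n, 1, d) - (1, k, d) -> (n, k, d); norms give an (n, k) distance matrix.
        differences = X[:, np.newaxis, :] - centroids[np.newaxis, :, :]
        distances = np.linalg.norm(differences, axis=2)
        labels = np.argmin(distances, axis=1)
        new_centroids = np.zeros_like(centroids)
        for j in range(k):
            mask = labels == j
            if np.any(mask):
                # Mean of cluster j's members; an empty cluster keeps a zero
                # centroid, matching the original code when counts[j] == 0.
                new_centroids[j] = X[mask].mean(axis=0)
        # Converged once the centroids stop moving.
        if np.array_equal(centroids, new_centroids):
            break
        centroids = new_centroids
    return centroids, labels
```

Note that, as in both the original and optimized versions of the diff, a cluster that ends up empty keeps a zero centroid for that iteration; production k-means implementations typically re-seed such clusters instead.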