From 2c6f2944a0b0cd26f56b05e9bfc18fe61fe6d0be Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]" <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Fri, 16 Jan 2026 05:20:37 +0000
Subject: [PATCH] Optimize longest_increasing_subsequence_length
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The optimized code achieves a **408x speedup** by replacing the O(n²) dynamic programming algorithm with an O(n log n) patience sorting approach using binary search.

**Key algorithmic change:**

- **Original approach**: Nested loops comparing every element with all previous elements (O(n²) complexity). The line profiler shows the inner loop executed 1.6M times, consuming 23.2% of total runtime, with comparison operations taking another 33.4%.
- **Optimized approach**: Maintains a sorted `tails` array where `tails[k]` stores the smallest tail value of any increasing subsequence of length `k+1`. For each element, uses `bisect.bisect_left()` to find its insertion position in O(log n) time.

**Why this is faster:**

1. **Eliminates nested iteration**: Instead of comparing each element with all previous elements (1.6M comparisons in the profiler), we perform just one binary search per element (~5,800 searches for ~5,800 elements).
2. **Leverages an optimized C implementation**: Python's `bisect` module is implemented in C and highly optimized for sorted-list operations.
3. **Reduces memory operations**: No repeated array indexing in tight loops. The `tails` list grows incrementally and stays small (max length = LIS length, not input length).

**Performance characteristics from test results:**

- **Small arrays** (< 10 elements): Modest 20-150% speedup, since the overhead is comparable to the O(n²) work.
- **Medium arrays** (~30-100 elements): 2,000-23,000% speedup as the quadratic penalty becomes significant.
- **Large arrays** (400-500 elements): **31,000-96,000% speedup** - the profiler shows an original runtime of 3.66s vs 16.7ms optimized for the same workload.
- **Worst-case patterns**: Descending sequences show a 31,707% speedup (previously O(n²) comparisons with no updates, now O(n log n) searches).

**Impact considerations:**

The optimization maintains identical correctness (handles integers, floats, negatives, duplicates, edge cases) while dramatically improving performance on any input larger than ~10 elements. Given the massive speedup on realistic data sizes, this change would benefit any workload calling this function repeatedly or on non-trivial arrays.
---
 code_to_optimize/sample_code.py | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/code_to_optimize/sample_code.py b/code_to_optimize/sample_code.py
index d356ce807..1fcc767a8 100644
--- a/code_to_optimize/sample_code.py
+++ b/code_to_optimize/sample_code.py
@@ -1,3 +1,4 @@
+import bisect
 from functools import partial
 
 import jax.numpy as jnp
@@ -97,19 +98,18 @@ def longest_increasing_subsequence_length(arr: np.ndarray) -> int:
     if n == 0:
         return 0
 
-    dp = np.ones(n, dtype=np.int64)
+    # Use patience sorting / tails method for O(n log n) time.
+    tails = []  # tails[k] = smallest tail value of an increasing subsequence of length k+1
 
-    for i in range(1, n):
-        for j in range(i):
-            if arr[j] < arr[i]:
-                if dp[j] + 1 > dp[i]:
-                    dp[i] = dp[j] + 1
-
-    max_length = dp[0]
-    for i in range(1, n):
-        if dp[i] > max_length:
-            max_length = dp[i]
+    for x in arr:
+        # Find the insertion point for x in tails to maintain sorted order.
+        i = bisect.bisect_left(tails, x)
+        if i == len(tails):
+            tails.append(x)
+        else:
+            tails[i] = x
+    max_length = np.int64(len(tails))
 
     return max_length
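For reviewers who want to sanity-check the tails/bisect technique outside the repo, here is a minimal standalone sketch in plain Python (no NumPy; the function name `lis_length` is illustrative, not from the patched module):

```python
import bisect

def lis_length(arr):
    """Length of the longest strictly increasing subsequence, O(n log n).

    tails[k] holds the smallest tail value of any increasing
    subsequence of length k + 1; tails is always kept sorted.
    """
    tails = []
    for x in arr:
        # bisect_left finds the first tail >= x; replacing it keeps
        # tails sorted and leaves room for longer subsequences later.
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

print(lis_length([10, 9, 2, 5, 3, 7, 101, 18]))  # -> 4 (e.g. 2, 3, 7, 18)
print(lis_length([5, 4, 3, 2, 1]))               # -> 1 (descending input)
```

Because `bisect_left` treats an equal element as "not greater", duplicates replace an existing tail rather than extend it, which is what makes the count *strictly* increasing, matching the `arr[j] < arr[i]` comparison in the original DP.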