From 2c6f2944a0b0cd26f56b05e9bfc18fe61fe6d0be Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]" <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Fri, 16 Jan 2026 05:20:37 +0000
Subject: [PATCH] Optimize longest_increasing_subsequence_length
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The optimized code achieves a **408x speedup** by replacing the O(n²) dynamic programming algorithm with an O(n log n) patience sorting approach using binary search.

**Key algorithmic change:**

- **Original approach**: Nested loops comparing every element with all previous elements (O(n²) complexity). The line profiler shows the inner loop executed 1.6M times, consuming 23.2% of total runtime, with comparison operations taking another 33.4%.
- **Optimized approach**: Maintains a sorted `tails` array where `tails[k]` stores the smallest tail value of any increasing subsequence of length `k+1`. For each element, uses `bisect.bisect_left()` to find its insertion position in O(log n) time.

**Why this is faster:**

1. **Eliminates nested iteration**: Instead of comparing each element with all previous elements (1.6M comparisons in the profiler), we perform just one binary search per element (~5,800 searches for ~5,800 elements).
2. **Leverages an optimized C implementation**: Python's `bisect` module is implemented in C and highly optimized for sorted-list operations.
3. **Reduces memory operations**: No repeated array indexing in tight loops. The `tails` list grows incrementally and stays small (max length = LIS length, not input length).

**Performance characteristics from test results:**

- **Small arrays** (< 10 elements): Modest 20-150% speedup, since the overhead is comparable to the O(n²) work.
- **Medium arrays** (~30-100 elements): 2,000-23,000% speedup as the quadratic penalty becomes significant.
- **Large arrays** (400-500 elements): **31,000-96,000% speedup** - the profiler shows an original runtime of 3.66s vs 16.7ms optimized for the same workload.
- **Worst-case patterns**: Descending sequences show a 31,707% speedup (previously O(n²) comparisons with no updates, now O(n log n) searches).

**Impact considerations:**

The optimization maintains identical correctness (handles integers, floats, negatives, duplicates, edge cases) while dramatically improving performance on any input larger than ~10 elements. Given the massive speedup on realistic data sizes, this change would benefit any workload calling this function repeatedly or on non-trivial arrays.
---
 code_to_optimize/sample_code.py | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/code_to_optimize/sample_code.py b/code_to_optimize/sample_code.py
index d356ce807..1fcc767a8 100644
--- a/code_to_optimize/sample_code.py
+++ b/code_to_optimize/sample_code.py
@@ -1,3 +1,4 @@
+import bisect
 from functools import partial
 
 import jax.numpy as jnp
@@ -97,19 +98,18 @@ def longest_increasing_subsequence_length(arr: np.ndarray) -> int:
     if n == 0:
         return 0
 
-    dp = np.ones(n, dtype=np.int64)
+    # Use patience sorting / tails method for O(n log n) time.
+    tails = []  # tails[k] = smallest tail value of an increasing subsequence of length k+1
 
-    for i in range(1, n):
-        for j in range(i):
-            if arr[j] < arr[i]:
-                if dp[j] + 1 > dp[i]:
-                    dp[i] = dp[j] + 1
-
-    max_length = dp[0]
-    for i in range(1, n):
-        if dp[i] > max_length:
-            max_length = dp[i]
+    for x in arr:
+        # Find the insertion point for x in tails to maintain sorted order.
+        i = bisect.bisect_left(tails, x)
+        if i == len(tails):
+            tails.append(x)
+        else:
+            tails[i] = x
+    max_length = np.int64(len(tails))
 
     return max_length
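For reviewers who want to sanity-check the tails/bisect technique outside the repo, here is a minimal standalone sketch in plain Python (no NumPy; the function name `lis_length` is illustrative, not from the patched module):

```python
import bisect

def lis_length(arr):
    """Length of the longest strictly increasing subsequence, O(n log n).

    tails[k] holds the smallest tail value of any increasing
    subsequence of length k + 1; tails is always kept sorted.
    """
    tails = []
    for x in arr:
        # bisect_left finds the first tail >= x; replacing it keeps
        # tails sorted and leaves room for longer subsequences later.
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

print(lis_length([10, 9, 2, 5, 3, 7, 101, 18]))  # -> 4 (e.g. 2, 3, 7, 18)
print(lis_length([5, 4, 3, 2, 1]))               # -> 1 (descending input)
```

Because `bisect_left` treats an equal element as "not greater", duplicates replace an existing tail rather than extend it, which is what makes the count *strictly* increasing, matching the `arr[j] < arr[i]` comparison in the original DP.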