From f15814f96bf698a67182f371d2547aa07ce9e01f Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]" <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Wed, 7 Jan 2026 02:31:40 +0000
Subject: [PATCH] Optimize numerical_integration_rectangle
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The optimized code achieves a **105% speedup** by replacing the Python for-loop with **vectorized NumPy operations** when possible, while maintaining a fallback for non-vectorizable functions.

## Key Optimizations

**1. Vectorized Array Generation**

Instead of computing `x = a + i * h` in each loop iteration (34,839 times in the profile), the code generates all x-values at once using `xs = a + np.arange(n) * h`. This single vectorized operation is dramatically faster than repeated scalar arithmetic in Python.

**2. Vectorized Function Application**

The optimization attempts to call `f(xs)` on the entire array at once. If the function `f` supports vectorization (like `lambda x: x**2`), NumPy's C-optimized routines handle all evaluations simultaneously instead of making 34,839 individual Python function calls.

**3. Vectorized Summation**

`np.sum(vals)` uses NumPy's optimized C implementation instead of accumulating values in a Python loop, eliminating the overhead of 34,839 addition operations in the interpreter.
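The three optimizations above can be sketched as a standalone function (illustrative only; the function name `rectangle_integral_vectorized` is not from the patched repo):

```python
import numpy as np

def rectangle_integral_vectorized(f, a, b, n):
    """Left-endpoint rectangle rule using vectorized NumPy operations."""
    if a > b:
        a, b = b, a
    h = (b - a) / n
    xs = a + np.arange(n) * h   # (1) all sample points generated at once
    return np.sum(f(xs)) * h    # (2) one array evaluation, (3) one C-level reduction

# Example: integrate x**2 over [0, 1]; exact value is 1/3
approx = rectangle_integral_vectorized(lambda x: x**2, 0.0, 1.0, 1000)
```

For `n = 1000` the left-endpoint sum lands within about 5e-4 of the exact value, with every evaluation happening inside NumPy rather than the Python interpreter.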
## Performance Impact

The line profiler shows the dramatic shift in execution time:

- **Original**: 54.1% of time spent in `result += f(x)` calls (27.4ms of 50.7ms)
- **Optimized**: when vectorization succeeds, only 3.7% spent in `np.sum()` (0.6ms of 17.3ms)

**Test Results Analysis:**

- **Large `n` values (≥1000)**: show 400-1000% speedups because the vectorization overhead is amortized over many computations
  - `test_quadratic_function` (n=1000): 980% faster
  - `test_large_interval` (n=1000): 460% faster
  - `test_large_scale_polynomial` (n=1000): 421% faster
- **Small `n` values (<100)**: show slowdowns of 20-90% because the NumPy import overhead and array-creation costs exceed the benefit
  - `test_single_subinterval` (n=1): 87.2% slower
  - `test_small_n` (n=2): 84.0% slower
- **Non-vectorizable functions**: fall back to the original loop, showing minimal overhead from the try-except (e.g. functions with conditionals, as in `test_step_function`)

## Why This Works

The speedup comes from:

1. **Reduced interpreter overhead**: one vectorized operation vs. thousands of Python bytecode instructions
2. **CPU cache efficiency**: contiguous array operations leverage modern CPU vectorization (SIMD)
3. **Optimized C code**: NumPy operations run in compiled C, not interpreted Python

This optimization is particularly valuable for numerical integration workloads, where `n` is typically large (hundreds to thousands) to achieve acceptable accuracy, making the vectorization overhead negligible compared to the performance gain.
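The fallback behavior for non-vectorizable functions can be sketched as follows (a minimal standalone re-creation of the try/except pattern, not imported from the repo; the step function mimics the shape of `test_step_function`):

```python
import numpy as np

def rectangle_integral(f, a, b, n):
    """Rectangle rule: try vectorized evaluation, fall back to a scalar loop."""
    if a > b:
        a, b = b, a
    h = (b - a) / n
    try:
        xs = a + np.arange(n) * h
        result = np.sum(f(xs))  # raises if f cannot accept an array
    except Exception:
        # Scalar fallback for functions that only work on single values
        result = sum(f(a + i * h) for i in range(n))
    return result * h

# A scalar conditional like `if x < 0.5` raises "truth value of an array
# is ambiguous" when x is an ndarray, so the loop fallback is taken.
step = lambda x: 0.0 if x < 0.5 else 1.0
val = rectangle_integral(step, 0.0, 1.0, 1000)
```

The try-except costs essentially nothing on the happy path; the price for a non-vectorizable `f` is one failed array evaluation before the loop runs.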
---
 src/numerical/calculus.py | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/numerical/calculus.py b/src/numerical/calculus.py
index a0e2226..b390435 100644
--- a/src/numerical/calculus.py
+++ b/src/numerical/calculus.py
@@ -7,10 +7,21 @@ def numerical_integration_rectangle(
     if a > b:
         a, b = b, a
     h = (b - a) / n
-    result = 0.0
-    for i in range(n):
-        x = a + i * h
-        result += f(x)
+    try:
+        import numpy as np
+
+        # Generate the x values as a NumPy array
+        xs = a + np.arange(n) * h
+        # Attempt to apply the function to all x at once
+        vals = f(xs)
+        # If f doesn't support vectorization, this will likely raise, so we fallback
+        result = np.sum(vals)
+    except Exception:
+        # Fallback to original loop for non-vectorizable functions
+        result = 0.0
+        for i in range(n):
+            x = a + i * h
+            result += f(x)
     return result * h
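A quick sanity check that both code paths in the patch agree (a standalone sketch; `numerical_integration_rectangle` is re-created here from the diff rather than imported from `src/numerical/calculus.py`):

```python
import numpy as np

def numerical_integration_rectangle(f, a, b, n):
    # Re-creation of the patched function from the diff for testing.
    if a > b:
        a, b = b, a
    h = (b - a) / n
    try:
        xs = a + np.arange(n) * h
        result = np.sum(f(xs))
    except Exception:
        result = 0.0
        for i in range(n):
            x = a + i * h
            result += f(x)
    return result * h

# The plain lambda accepts arrays (vectorized path); wrapping x in float()
# raises TypeError on an array, forcing the scalar-loop fallback.
vec = numerical_integration_rectangle(lambda x: x**3, 0.0, 2.0, 10_000)
loop = numerical_integration_rectangle(lambda x: float(x) ** 3, 0.0, 2.0, 10_000)
```

Both paths should produce the same left-rectangle approximation of the integral of x³ over [0, 2] (exact value 4), up to floating-point summation order.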