From 4002294bcfad3437bef38dc92d6bf0c827bac41e Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]"
 <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Mon, 12 Jan 2026 00:13:24 +0000
Subject: [PATCH] Optimize monte_carlo_pi
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Brief: The only change replaces the float power expressions x**2 and y**2 with x * x and y * y. This micro-optimization reduces per-iteration overhead in the sampling loop, giving a ~36% speedup in the measured run (2.88 ms -> 2.11 ms), with the largest gains when num_samples is large. Behavior is unchanged.

What changed
- Replaced x**2 and y**2 with x * x and y * y in the distance check inside the sampling loop.

Why this speeds things up
- The body of this function is a tight hot loop: the distance check (x**2 + y**2 <= 1) runs num_samples times, so any per-iteration overhead accumulates.
- x**2 goes through Python's general power machinery (PyNumber_Power, and for floats the C pow() routine with its special-case handling), which is heavier than a plain multiplication. Float multiplication is a very cheap C fast path. On CPython 3.10 and earlier these compile to BINARY_POWER and BINARY_MULTIPLY; from 3.11 on, both compile to BINARY_OP with different arguments, and float multiplication can additionally be specialized by the adaptive interpreter.
- Using x * x and y * y keeps the distance check on the cheaper multiplication path. The bytecode length changes little, but each operation does less work at the C level per iteration.
- Line-profiler shows the total time on the conditional line dropping from ~8.5e6 ns to ~6.84e6 ns (about 20% lower) in the measured run. Overall runtime went from 2.88 ms to 2.11 ms, a 36% speedup, or about 27% less time. The savings are concentrated in the conditional inside the loop.
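
To reproduce the effect in isolation, a minimal micro-benchmark sketch follows. The helper names (check_pow, check_mul, opnames, time_check) are hypothetical and not part of this patch, and absolute timings will vary by machine and CPython version.

```python
import dis
import timeit


def check_pow(x, y):
    # Distance check as written before this patch.
    return x**2 + y**2 <= 1


def check_mul(x, y):
    # Distance check as written after this patch.
    return x * x + y * y <= 1


def opnames(func):
    # Opcode names for func; the exact names depend on the CPython version.
    return [ins.opname for ins in dis.get_instructions(func)]


def time_check(func, repeats=5, number=200_000):
    # Best-of-N total time in seconds for `number` calls with fixed inputs.
    return min(timeit.repeat(lambda: func(0.3, -0.4), repeat=repeats, number=number))


if __name__ == "__main__":
    print("pow opcodes:", opnames(check_pow))
    print("mul opcodes:", opnames(check_mul))
    print("pow seconds:", time_check(check_pow))
    print("mul seconds:", time_check(check_mul))
```

The lambda wrapper adds the same call overhead to both variants, so the ratio here understates the difference in the bare expressions.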

Impact on workloads and tests
- Big wins when monte_carlo_pi is called with large num_samples (see annotated_tests: 1000-sample tests show ~33–40% faster). For micro-calls (num_samples small or 0/negative), the improvement is negligible because fixed call overhead dominates or there are no iterations.
- Behavior is unchanged, and numeric results are expected to be identical: x and y are floats in [-1, 1], where x * x and x**2 should give the same value on common platforms. The optimization is purely local and safe for floats. All regression tests remain valid.
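
One way to spot-check the equivalence is to run both forms against the same random stream. This is a sketch, not the project's test suite: the two functions below are hypothetical copies of the loop shown in the diff, and they take a seed and use a private random.Random instance (an assumption for reproducibility; the patched function uses the module-level random.uniform). num_samples is assumed positive.

```python
import random


def monte_carlo_pi_pow(num_samples, seed):
    # Pre-patch form of the distance check, with a seeded generator.
    rng = random.Random(seed)
    inside_circle = 0
    for _ in range(num_samples):
        x = rng.uniform(-1, 1)
        y = rng.uniform(-1, 1)
        if x**2 + y**2 <= 1:
            inside_circle += 1
    return 4 * inside_circle / num_samples


def monte_carlo_pi_mul(num_samples, seed):
    # Post-patch form of the distance check, same seeded generator.
    rng = random.Random(seed)
    inside_circle = 0
    for _ in range(num_samples):
        x = rng.uniform(-1, 1)
        y = rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            inside_circle += 1
    return 4 * inside_circle / num_samples


if __name__ == "__main__":
    before = monte_carlo_pi_pow(10_000, seed=42)
    after = monte_carlo_pi_mul(10_000, seed=42)
    print(before, after, before == after)
```

With the same seed both functions draw identical samples, so any difference in the returned estimate would come from the distance check alone.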

Risks / notes
- No API or semantic change. Readability stays clear; this is a standard micro-optimization for numeric loops in Python.
- This is most valuable in hot paths where the function is invoked many times or with large sample counts.
---
 src/numerical/monte_carlo.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/numerical/monte_carlo.py b/src/numerical/monte_carlo.py
index 070ae2f..e62146e 100644
--- a/src/numerical/monte_carlo.py
+++ b/src/numerical/monte_carlo.py
@@ -9,7 +9,7 @@ def monte_carlo_pi(num_samples: int) -> float:
         x = random.uniform(-1, 1)
         y = random.uniform(-1, 1)
 
-        if x**2 + y**2 <= 1:
+        if x * x + y * y <= 1:
             inside_circle += 1
 
     return 4 * inside_circle / num_samples