⚡️ Speed up function `image_rotation` by 271% #253

codeflash-ai · 2026-01-12T00:18:28Z

📄 271% (3.71x) speedup for `image_rotation` in `src/signal/image.py`

⏱️ Runtime: 2.70 milliseconds → 728 microseconds (best of 250 runs)

📝 Explanation and details

Below is a brief explanation of why and how the optimized source code is faster than the original source code.

What changed (key optimizations)

- Removed the nested Python loops over every output pixel and replaced them with NumPy vectorized operations:
  - Constructed the coordinate grids once with ys = arange(...)[:, None] and xs = arange(...)[None, :].
  - Computed the mapped coordinates original_yf and original_xf as whole 2D arrays using broadcasted arithmetic.
  - Converted them to integer indices (astype(int)) and built a boolean valid_mask for bounds checking in one step.
  - Performed a single advanced-indexed assignment: rotated[valid_mask] = image[original_y[valid_mask], original_x[valid_mask]].
- Added an early exit when new_height == 0 or new_width == 0 to avoid unnecessary array work.
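The steps above can be put together in a minimal sketch. It is a reconstruction under assumptions, not the code in `src/signal/image.py`: the function name `rotate_nearest_vectorized`, the bounding-box formula for `new_height`/`new_width`, the integer center convention, and the sign convention of the inverse mapping are all assumed (the bounding-box formula and the float output mirror what the generated tests compute).

```python
import numpy as np


def rotate_nearest_vectorized(image: np.ndarray, angle: float) -> np.ndarray:
    """Hypothetical nearest-neighbour rotation built from the vectorized steps above."""
    height, width = image.shape[:2]
    theta = np.radians(angle)
    cos_t, sin_t = np.cos(theta), np.sin(theta)

    # Assumed bounding box of the rotated image.
    new_height = int(abs(height * cos_t) + abs(width * sin_t))
    new_width = int(abs(width * cos_t) + abs(height * sin_t))

    # Float output; trailing channel axes (if any) are carried over unchanged.
    rotated = np.zeros((new_height, new_width) + image.shape[2:], dtype=float)
    if new_height == 0 or new_width == 0:
        return rotated  # early exit: nothing to map

    center_y, center_x = height // 2, width // 2
    new_center_y, new_center_x = new_height // 2, new_width // 2

    # Coordinate grids built once; broadcasting (H, 1) against (1, W) yields (H, W).
    ys = np.arange(new_height)[:, None] - new_center_y
    xs = np.arange(new_width)[None, :] - new_center_x

    # Inverse mapping: source coordinates for every output pixel (assumed signs).
    original_yf = ys * cos_t - xs * sin_t + center_y
    original_xf = ys * sin_t + xs * cos_t + center_x

    # astype(int) truncates toward zero, like int() in a per-pixel loop.
    original_y = original_yf.astype(int)
    original_x = original_xf.astype(int)

    # One boolean mask replaces the per-pixel bounds check.
    valid_mask = (
        (original_y >= 0) & (original_y < height)
        & (original_x >= 0) & (original_x < width)
    )

    # Single advanced-indexed copy; works for 2D and (H, W, C) inputs alike.
    rotated[valid_mask] = image[original_y[valid_mask], original_x[valid_mask]]
    return rotated
```

For example, `rotate_nearest_vectorized(np.arange(6).reshape(2, 3), 90.0)` returns a float array of shape `(3, 2)`, and an empty `(0, 0)` input returns immediately through the early exit.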

Why this is faster (technical reasoning)

- Eliminates per-pixel Python overhead: the original code spent almost all of its time in Python-level loops doing arithmetic, int conversions, and bounds checks for each (x, y) pair, and every iteration paid interpreter overhead. The line profiler shows that those inner loops dominated the runtime.
- Moves the heavy work into NumPy's compiled loops: building ys/xs, the elementwise coordinate arithmetic, astype, mask creation, and the indexed copy all run in C (tight loops, often vectorized). This replaces thousands of Python-level operations with a few array operations.
- Fewer conditional checks and function calls per pixel: bounds checks are done with a single vectorized boolean mask rather than an if per pixel.
- The final assignment copies only the valid pixels (selected via the mask) in one bulk operation, instead of one Python-level item assignment per pixel.
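The per-pixel versus vectorized difference can be isolated on just the bounds check. The sketch below is illustrative only: the random index arrays, the 31x31 size, and the helper names `mask_with_loop` and `mask_vectorized` are hypothetical, and it makes no claim about absolute timings, which depend on the machine.

```python
import timeit

import numpy as np

# Hypothetical integer source coordinates standing in for original_y / original_x,
# deliberately including out-of-bounds values on both sides.
rng = np.random.default_rng(0)
height, width = 31, 31
original_y = rng.integers(-10, height + 10, size=(height, width))
original_x = rng.integers(-10, width + 10, size=(height, width))


def mask_with_loop(oy, ox, h, w):
    """Per-pixel bounds check: one Python-level `if` per output pixel."""
    mask = np.zeros(oy.shape, dtype=bool)
    for y in range(oy.shape[0]):
        for x in range(oy.shape[1]):
            if 0 <= oy[y, x] < h and 0 <= ox[y, x] < w:
                mask[y, x] = True
    return mask


def mask_vectorized(oy, ox, h, w):
    """The same check as four array comparisons evaluated in compiled code."""
    return (oy >= 0) & (oy < h) & (ox >= 0) & (ox < w)


loop_s = timeit.timeit(lambda: mask_with_loop(original_y, original_x, height, width), number=20)
vec_s = timeit.timeit(lambda: mask_vectorized(original_y, original_x, height, width), number=20)
print(f"loop: {loop_s:.4f} s, vectorized: {vec_s:.4f} s")
```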

Measured results that support this

- Wall-clock: original 2.70 ms -> optimized 0.728 ms, i.e. about 3.7x as fast, which is the reported 271% speedup.
- Line profiler: the original is dominated by the nested loops and per-pixel arithmetic; the optimized version's cost is concentrated in a few NumPy lines (array creation, arithmetic, mask, and one bulk assignment), which are much cheaper overall.
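The headline figures mix two conventions: "% faster" is (old - new) / new, while the multiplier is old / new. Applied to the reported runtimes:

```python
original_ms = 2.70   # reported original runtime
optimized_ms = 0.728  # reported optimized runtime

ratio = original_ms / optimized_ms  # how many times as fast
percent_faster = (original_ms - optimized_ms) / optimized_ms * 100

print(f"{ratio:.2f}x as fast, {percent_faster:.0f}% faster")
# prints: 3.71x as fast, 271% faster
```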

Behavioral/compatibility notes

- Behavior is preserved: integer truncation semantics are the same (astype(int) truncates toward zero, like the original int()). Channel handling is preserved: the single advanced-index assignment works for both grayscale (2D) and multi-channel images, because NumPy advanced indexing returns one value per selected pixel for 2D input and one channel vector per selected pixel for 3D input.
- The early return for zero-dimension outputs avoids creating unnecessary arrays. It is safe, and in the tests empty-input timings are essentially unchanged (4.67μs -> 4.79μs, 4.71μs -> 4.71μs).
- Memory tradeoff: the optimized code allocates several full-size temporaries (original_yf/original_xf, the integer original_y/original_x, and valid_mask) plus the small ys/xs grids. Peak memory therefore grows in proportion to the output image size, whereas the original per-pixel loop needed almost no temporary memory. For typical small-to-moderate images the CPU time saved outweighs the extra temporaries; for extremely large images, memory pressure could become a concern.
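Both compatibility claims can be checked in isolation with plain NumPy. This sketch uses made-up coordinates and small arrays (all names are illustrative). Note that truncation toward zero sends slightly negative coordinates such as -0.5 to index 0, which then passes the bounds check; this holds equally for astype(int) and int().

```python
import numpy as np

# 1) Truncation: astype(int) and int() both round toward zero, including negatives.
coords = np.array([-1.7, -0.5, 0.0, 0.5, 2.9])
vectorized = coords.astype(int).tolist()
per_pixel = [int(v) for v in coords]
print(vectorized, per_pixel)  # prints [-1, 0, 0, 0, 2] [-1, 0, 0, 0, 2]

# 2) Channel handling: the same indexing expression works for 2D and 3D images.
ys = np.array([0, 1, 2])
xs = np.array([2, 0, 1])
gray = np.arange(12).reshape(3, 4)
color = np.arange(36).reshape(3, 4, 3)
print(gray[ys, xs].shape)   # prints (3,): one value per selected pixel
print(color[ys, xs].shape)  # prints (3, 3): one channel vector per selected pixel

# 3) Masked assignment: rotated[mask] has shape (n_true, C), matching the RHS.
rotated = np.zeros((2, 2, 3))
mask = np.array([[True, False], [False, True]])
rotated[mask] = color[ys[:2], xs[:2]]  # fills pixels (0, 0) and (1, 1) in row-major order
```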

When this optimization helps most / tradeoffs

- Big wins for medium-to-large images, where Python loop overhead dominates: the two 31x31 tests run roughly 24x and 27x faster (942μs -> 39.8μs and 1.15ms -> 42.2μs).
- For very small images (a single pixel or a few pixels), the fixed cost of allocating intermediate NumPy arrays dominates, and the generated regression tests below show the optimized version running slower, by up to about 4.3x (22.0μs -> 94.8μs for five rotations of a 1x1 image). If the function is frequently called with very small images in a tight loop, consider:
  - keeping the original loop-based path below a size threshold,
  - reusing preallocated arrays, or
  - using a JIT compiler such as Numba if both memory use and per-call overhead must be minimized.
- For typical usage (a hot path with many pixels), the vectorized approach is preferable.
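One way to combine both paths is to dispatch on output size. This is a hypothetical sketch: `SMALL_IMAGE_PIXELS`, `_geometry`, `rotate_loop`, `rotate_vectorized`, and `rotate` are invented names, and the geometry (bounding box, centers, inverse-mapping signs) is assumed rather than taken from `src/signal/image.py`. Both paths share the same cos/sin values and operation order, so they produce identical outputs.

```python
import numpy as np

# Hypothetical cut-over point (output pixels); re-measure on the target machine.
SMALL_IMAGE_PIXELS = 25


def _geometry(image, angle):
    """Shared setup so both paths use identical floating-point inputs."""
    height, width = image.shape[:2]
    theta = np.radians(angle)
    cos_t, sin_t = float(np.cos(theta)), float(np.sin(theta))
    new_height = int(abs(height * cos_t) + abs(width * sin_t))
    new_width = int(abs(width * cos_t) + abs(height * sin_t))
    return height, width, cos_t, sin_t, new_height, new_width


def rotate_loop(image, angle):
    """Per-pixel path: cheap setup, high per-pixel cost."""
    h, w, cos_t, sin_t, nh, nw = _geometry(image, angle)
    rotated = np.zeros((nh, nw) + image.shape[2:], dtype=float)
    cy, cx, ncy, ncx = h // 2, w // 2, nh // 2, nw // 2
    for y in range(nh):
        for x in range(nw):
            dy, dx = y - ncy, x - ncx
            oy = int(dy * cos_t - dx * sin_t + cy)
            ox = int(dy * sin_t + dx * cos_t + cx)
            if 0 <= oy < h and 0 <= ox < w:
                rotated[y, x] = image[oy, ox]
    return rotated


def rotate_vectorized(image, angle):
    """Array path: fixed setup cost, low per-pixel cost."""
    h, w, cos_t, sin_t, nh, nw = _geometry(image, angle)
    rotated = np.zeros((nh, nw) + image.shape[2:], dtype=float)
    if nh == 0 or nw == 0:
        return rotated
    cy, cx, ncy, ncx = h // 2, w // 2, nh // 2, nw // 2
    ys = np.arange(nh)[:, None] - ncy
    xs = np.arange(nw)[None, :] - ncx
    oy = (ys * cos_t - xs * sin_t + cy).astype(int)
    ox = (ys * sin_t + xs * cos_t + cx).astype(int)
    mask = (oy >= 0) & (oy < h) & (ox >= 0) & (ox < w)
    rotated[mask] = image[oy[mask], ox[mask]]
    return rotated


def rotate(image, angle):
    """Dispatch on output size: loop for tiny outputs, arrays otherwise."""
    *_, nh, nw = _geometry(image, angle)
    if nh * nw <= SMALL_IMAGE_PIXELS:
        return rotate_loop(image, angle)
    return rotate_vectorized(image, angle)
```

The threshold is a placeholder. In the generated tests, a 5x5 input rotated by 360 degrees is roughly break-even (24.8μs -> 24.5μs) while smaller inputs are slower on the vectorized path, so a cut-over near 25 output pixels may suit that machine, but it should be re-measured on the target hardware.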

Summary

The optimized version converts thousands of Python-level per-pixel operations into a handful of NumPy array operations executed in C, eliminating interpreter overhead and per-pixel branching. This is why the measured runtime improves by about 3.7x (271%) on representative tests, with the largest wins on larger images and an acceptable tradeoff (extra temporaries) for most workloads.

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 33 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%

🌀 Generated Regression Tests

from typing import Any, List

import numpy as np  # used to create and manipulate image arrays

# imports
import pytest  # used for our unit tests
from src.signal.image import image_rotation

# Helper utilities for tests ---------------------------------------------------


def _to_nested_list(arr: np.ndarray) -> Any:
    """
    Convert a numpy array to nested Python lists for equality comparisons that
    avoid numpy-specific assert helpers.
    """
    return arr.tolist()


def _nested_allclose(a: Any, b: Any, tol: float = 1e-6) -> bool:
    """
    Recursively compare nested lists (or scalars) with tolerance for floats.
    Returns True if all numeric elements differ by at most tol.
    """
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            return False
        return all(_nested_allclose(x, y, tol) for x, y in zip(a, b))
    # scalar case
    try:
        # works for ints and floats
        return abs(float(a) - float(b)) <= tol
    except Exception:
        # non-numeric fallback to exact equality
        return a == b


# Unit tests ------------------------------------------------------------------


def test_identity_rotation_grayscale_zero_degrees():
    # Basic: check that 0-degree rotation returns same pixel values (numerically).
    # Use a small int-valued 3x2 image for clarity.
    image = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int32)
    # Call the function
    codeflash_output = image_rotation(image, 0.0)
    rotated = codeflash_output  # 10.7μs -> 23.8μs (55.0% slower)


def test_identity_rotation_color_zero_degrees_preserves_channels():
    # Basic: verify color image with channels keeps the channel dimension and values.
    image = np.array(
        [[[10, 20, 30], [40, 50, 60]], [[70, 80, 90], [100, 110, 120]]], dtype=np.int16
    )  # 2x2x3
    codeflash_output = image_rotation(image, 0.0)
    rotated = codeflash_output  # 10.0μs -> 25.4μs (60.4% slower)


def test_rotate_90_single_row_to_column():
    # Basic but precise: rotating a single-row image by +90 degrees should produce
    # a single-column image with the same left-to-right order as top-to-bottom.
    # Using a 1x3 array makes expected result easy to compute by hand.
    image = np.array([[1, 2, 3]], dtype=np.int32)  # shape (1, 3)
    codeflash_output = image_rotation(image, 90.0)
    rotated = codeflash_output  # 8.17μs -> 23.3μs (64.9% slower)
    # Expected: column [[1], [2], [3]] from manual derivation of the algorithm
    expected = np.array([[1], [2], [3]], dtype=float)  # function produces floats


def test_rotate_180_square_color_matches_flip_both_axes():
    # Basic: rotating by 180 degrees should equal flipping both axes for square images.
    # Use a 3x3 color image with distinct values to detect orientation issues.
    image = np.zeros((3, 3, 2), dtype=np.int32)
    # fill with distinct values per channel so we can detect wrong mappings
    for y in range(3):
        for x in range(3):
            image[y, x, 0] = y * 10 + x  # channel 0
            image[y, x, 1] = 100 + y * 10 + x  # channel 1
    codeflash_output = image_rotation(image, 180.0)
    rotated = codeflash_output  # 14.4μs -> 26.2μs (45.0% slower)
    # Expected using pure Python slicing (no numpy-specific assertions)
    expected = image[::-1, ::-1, :]


def test_empty_image_returns_empty():
    # Edge: handle empty images without raising exceptions.
    image = np.zeros((0, 0), dtype=np.float64)
    codeflash_output = image_rotation(image, 45.0)
    rotated = codeflash_output  # 4.67μs -> 4.79μs (2.61% slower)


def test_single_pixel_rotations_keep_value():
    # Edge: single pixel images should keep the pixel value regardless of angle,
    # because the center maps to the center for rotated array when dims remain 1x1.
    image = np.array([[7]], dtype=np.int8)
    for angle in (0.0, 45.0, 90.0, 179.0, 270.0):
        codeflash_output = image_rotation(image, angle)
        rotated = codeflash_output  # 22.0μs -> 94.8μs (76.8% slower)


def test_negative_angle_equivalence_to_positive_complement():
    # Edge: -90 degrees should behave identically to +270 degrees for same input.
    image = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)  # 2x3 image
    codeflash_output = image_rotation(image, -90.0)
    rotated_neg90 = codeflash_output  # 10.4μs -> 23.9μs (56.6% slower)
    codeflash_output = image_rotation(image, 270.0)
    rotated_270 = codeflash_output  # 8.21μs -> 20.5μs (60.0% slower)


def test_input_image_is_not_modified():
    # Edge: verify that the input array is not mutated by the function.
    image = np.arange(12).reshape((3, 4)).astype(np.int32)
    image_copy = image.copy()  # keep an explicit copy
    codeflash_output = image_rotation(image, 33.0)
    _ = codeflash_output  # 18.0μs -> 23.8μs (24.5% slower)


def test_rotated_center_contains_original_center_value_for_arbitrary_angle():
    # Large Scale (but kept under limits): ensure the center pixel of the original
    # maps to the center pixel of the rotated image for arbitrary angles.
    # Use a 31x31 image (961 pixels, under the 1000-element guideline) with unique ints.
    size = 31
    image = np.arange(size * size, dtype=np.int32).reshape((size, size))
    angle = 13.0  # arbitrary non-multiple-of-90 angle
    codeflash_output = image_rotation(image, angle)
    rotated = codeflash_output  # 942μs -> 39.8μs (2266% faster)
    # Compute centers used by the algorithm
    center_y, center_x = size // 2, size // 2
    new_height = int(
        abs(size * np.cos(np.radians(angle))) + abs(size * np.sin(np.radians(angle)))
    )
    new_width = int(
        abs(size * np.cos(np.radians(angle))) + abs(size * np.sin(np.radians(angle)))
    )
    # center positions as in the implementation
    new_center_y, new_center_x = (rotated.shape[0] // 2, rotated.shape[1] // 2)
    # The algorithm maps rotated[new_center] <- original[center], so the center value must appear there.
    # If rotated has channels, index accordingly; here it is 2D.
    if rotated.size > 0:
        # Compare numerically with tolerance due to dtype change to float
        rotated_center_value = float(rotated[new_center_y, new_center_x])
        original_center_value = float(image[center_y, center_x])


def test_multi_channel_preserves_channel_count_and_mapping_at_center():
    # Large-ish: multi-channel image (3 channels) retains channel dimension and the
    # center mapping property (center of rotated corresponds to center of original).
    h, w, c = 17, 13, 3  # keep total elements under limits
    # Fill with distinct values per-channel so equality checks are meaningful
    image = np.zeros((h, w, c), dtype=np.int32)
    for ch in range(c):
        image[..., ch] = ch * 1000 + np.arange(h * w).reshape((h, w))
    angle = 37.0
    codeflash_output = image_rotation(image, angle)
    rotated = codeflash_output  # 310μs -> 42.8μs (624% faster)
    # center mapping property: rotated center should equal original center (per-channel)
    center_y, center_x = h // 2, w // 2
    new_center_y, new_center_x = rotated.shape[0] // 2, rotated.shape[1] // 2
    # Compare all channels numerically
    for ch in range(c):
        rotated_val = float(rotated[new_center_y, new_center_x, ch])
        original_val = float(image[center_y, center_x, ch])


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import numpy as np  # used to construct test images and expected results

# imports
import pytest  # used for our unit tests
from src.signal.image import image_rotation

# unit tests


def test_rotate_zero_degrees_grayscale():
    # Basic: rotating by 0 degrees should return an array equal to the input
    img = np.array(
        [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], dtype=int
    )  # 3x4 grayscale image with distinct integers
    codeflash_output = image_rotation(img, 0.0)
    rotated = codeflash_output  # 15.2μs -> 24.0μs (36.7% slower)


def test_rotate_zero_degrees_rgb():
    # Basic: 0-degree rotation for multi-channel image must preserve all channels and pixels
    img = np.zeros((2, 3, 3), dtype=int)  # 2x3 RGB-like image
    # give each pixel a distinct RGB triple
    for y in range(2):
        for x in range(3):
            img[y, x] = [y, x, y + x]
    codeflash_output = image_rotation(img, 0.0)
    rotated = codeflash_output  # 11.7μs -> 25.2μs (53.7% slower)


def test_rotate_full_circle_identity():
    # Basic: rotating by 360 degrees should be identity (within integer rounding of implementation)
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(5, 5), dtype=int)  # small random grayscale image
    codeflash_output = image_rotation(img, 360.0)
    rotated = codeflash_output  # 24.8μs -> 24.5μs (1.19% faster)


def test_rotate_90_degrees_square_matches_numpy_rot90_clockwise():
    # Basic/behavioral: for odd-sized square images, this implementation rotates clockwise for +90 deg.
    img = np.arange(9).reshape((3, 3)).astype(int)  # values 0..8 in 3x3 layout
    # The implementation (per math in src) returns the image rotated clockwise for +90 degrees.
    expected = np.rot90(img, k=-1)  # numpy's clockwise rotation
    codeflash_output = image_rotation(img, 90.0)
    rotated = codeflash_output  # 12.7μs -> 23.8μs (46.7% slower)


def test_rotate_90_degrees_non_square_swaps_dimensions_and_matches_rot90():
    # Basic: non-square images should swap height and width for a 90 degree rotation
    img = np.array([[10, 11, 12], [20, 21, 22]], dtype=int)  # shape (2, 3)
    expected = np.rot90(img, k=-1)  # use numpy's clockwise rotation as reference
    codeflash_output = image_rotation(img, 90.0)
    rotated = codeflash_output  # 9.71μs -> 23.6μs (58.8% slower)


def test_rotate_negative_angle_equivalence_to_positive():
    # Edge: rotating by -90 should be equivalent to rotating by 270 in the same implementation
    img = np.arange(12).reshape((3, 4)).astype(int)  # 3x4 image with distinct integers
    codeflash_output = image_rotation(img, -90.0)
    r_neg90 = codeflash_output  # 14.6μs -> 23.8μs (38.7% slower)
    codeflash_output = image_rotation(img, 270.0)
    r_pos270 = codeflash_output  # 12.1μs -> 20.5μs (40.9% slower)


def test_rotate_empty_image_shapes_no_exceptions():
    # Edge: images with zero height or zero width should not raise and should return shapes consistent with geometry
    img_zero_height = np.zeros((0, 5), dtype=int)  # height 0, width 5
    img_zero_width = np.zeros((5, 0), dtype=int)  # height 5, width 0

    # rotating by 0 degrees should keep zero dimension intact
    codeflash_output = image_rotation(img_zero_height, 0.0)
    r1 = codeflash_output  # 4.71μs -> 4.71μs (0.021% faster)
    # rotating an all-zero-width input should result in width possibly zero (no crash)
    codeflash_output = image_rotation(img_zero_width, 0.0)
    r2 = codeflash_output  # 3.50μs -> 3.21μs (9.07% faster)

    # rotating by 45 degrees: function might compute new extents, but must return a valid ndarray and not raise
    codeflash_output = image_rotation(img_zero_height, 45.0)
    r3 = codeflash_output  # 8.33μs -> 21.1μs (60.5% slower)
    codeflash_output = image_rotation(img_zero_width, 45.0)
    r4 = codeflash_output  # 8.12μs -> 19.3μs (57.9% slower)


def test_rotate_single_pixel_preserved_under_various_angles():
    # Edge: a 1x1 image should map to a 1x1 image and keep its single pixel value for typical angles
    img = np.array([[42]], dtype=int)
    for angle in (0.0, 45.0, 90.0, 123.4, 360.0):
        codeflash_output = image_rotation(img, angle)
        rotated = codeflash_output  # 21.9μs -> 94.5μs (76.8% slower)


def test_preserves_channels_for_rgb_and_maps_some_key_pixels():
    # Edge/Basic: for multi-channel images values per channel must be preserved when mapped from original pixels
    h, w = 5, 5
    img = np.zeros((h, w, 3), dtype=int)
    # Fill each pixel with a unique RGB triple derived from coordinates to be able to identify pixels
    for y in range(h):
        for x in range(w):
            img[y, x] = [y, x, y * 10 + x]
    # rotate by a non-trivial angle (45 degrees) and ensure we have an ndarray back
    codeflash_output = image_rotation(img, 45.0)
    rotated = codeflash_output  # 45.9μs -> 28.8μs (59.3% faster)
    # check that at least the center pixel value from the original appears somewhere in the rotated image
    center_val = img[h // 2, w // 2].tolist()
    # flatten rotated into list of triples (safe because number of pixels < 1000 here)
    flat_rotated = [v.tolist() for v in rotated.reshape(-1, rotated.shape[-1])]


def test_large_scale_31x31_unique_values_content_and_shape():
    # Large Scale: test algorithm with a relatively large image within the given constraints (<1000 elements)
    # Use 31x31 = 961 pixels which is under the 1000-element guideline.
    h, w = 31, 31
    # create a unique-value grayscale image so we can reason about content
    img = np.arange(h * w, dtype=int).reshape((h, w))
    # rotate by a non-trivial angle (33.0 degrees)
    codeflash_output = image_rotation(img, 33.0)
    rotated = codeflash_output  # 1.15ms -> 42.2μs (2618% faster)
    # ensure at least one original pixel value appears in the rotated image (basic content preservation)
    # pick the original center pixel and check for its presence somewhere in rotated (no loops >1000)
    center_value = int(img[h // 2, w // 2])


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from src.signal.image import image_rotation

To edit these changes, run `git checkout codeflash/optimize-image_rotation-mkaezso6` and push.


codeflash-ai bot requested a review from KRRT7 January 12, 2026 00:18

codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Jan 12, 2026
