@codeflash-ai codeflash-ai bot commented Oct 1, 2025

📄 22% (0.22x) speedup for _addindent in doctr/utils/repr.py

⏱️ Runtime : 579 microseconds → 475 microseconds (best of 315 runs)

📝 Explanation and details

The optimization achieves a 22% speedup by eliminating redundant operations and reducing the number of intermediate variables:

Key Changes:

  1. Pre-compute the indent string once: `indent = " " * num_spaces` is computed up front instead of re-evaluating `num_spaces * " "` for every line in the list comprehension.
  2. Single-pass join: instead of mutating the list with `pop(0)`, building a new list, joining it, and then concatenating the result with the first line, the optimized version constructs the final string directly with `"\n".join([s[0]] + [indent + line for line in s[1:]])`.
  3. Eliminate intermediate variables: `first` and the repeated reassignments of `s` are no longer needed.
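The three changes above can be sketched side by side. This PR page does not show the source of `_addindent`, so both functions below are reconstructions from the description, and the names `addindent_original` and `addindent_optimized` are hypothetical. The early return for single-line input is an assumption based on the note that single-line strings are returned unchanged.

```python
def addindent_original(s_: str, num_spaces: int) -> str:
    """Assumed shape of the pre-optimization logic (not copied from doctr)."""
    s = s_.split("\n")
    if len(s) == 1:
        # Single-line input is returned unchanged (assumption).
        return s_
    first = s.pop(0)
    # The indent string is rebuilt for every remaining line.
    s = [(num_spaces * " ") + line for line in s]
    s = "\n".join(s)
    return first + "\n" + s


def addindent_optimized(s_: str, num_spaces: int) -> str:
    """Sketch of the optimized logic as described in the key changes."""
    s = s_.split("\n")
    if len(s) == 1:
        return s_
    indent = " " * num_spaces  # computed once per call
    # One join builds the whole result; the first line stays unindented.
    return "\n".join([s[0]] + [indent + line for line in s[1:]])


sample = "First line\nSecond line\nThird line"
print(repr(addindent_optimized(sample, 2)))
```

Both versions evaluate the string multiplication before any result is returned for multi-line input, so an invalid `num_spaces` such as `None` still raises `TypeError` in either one.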

Why it's faster:

  • Fewer string operations: the original code evaluates `num_spaces * " "` once per line (49.2% of total time), while the optimized version evaluates it once per call.
  • Fewer list operations: the `pop(0)` call and the intermediate list reassignment are gone.
  • Direct construction: the result is built by a single `join` instead of a `join` followed by further concatenation.
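The effect of hoisting the multiplication out of the comprehension can be checked in isolation with `timeit`. This is a rough micro-benchmark sketch, not the measurement Codeflash ran: the helper names and the 1000-line input are illustrative assumptions, and the absolute numbers will vary by machine and Python version.

```python
import timeit

lines = [f"Line {i}" for i in range(1000)]
num_spaces = 4


def per_line_multiply():
    # Rebuilds the indent string for every line.
    return [(num_spaces * " ") + line for line in lines]


def precomputed_indent():
    # Builds the indent string once and reuses it.
    indent = " " * num_spaces
    return [indent + line for line in lines]


# Both variants must produce identical output before timings mean anything.
assert per_line_multiply() == precomputed_indent()

t_per_line = min(timeit.repeat(per_line_multiply, number=200, repeat=5))
t_precomputed = min(timeit.repeat(precomputed_indent, number=200, repeat=5))
print(f"per-line multiply: {t_per_line:.6f}s  precomputed: {t_precomputed:.6f}s")
```

Taking the minimum over several repeats reduces the influence of scheduling noise on the comparison.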

Performance characteristics from tests:

  • Large-scale improvements: the biggest gains appear with many lines (23.6% faster for 1000 lines, 63.1% faster for 1000 empty lines).
  • Best for multi-line strings: the change matters when `len(s) > 1`; single-line inputs show differences within measurement noise (from 9.80% slower to 6.25% faster in the tests).
  • Scales with line count: valid inputs of only a few lines stay within roughly ±10%, while inputs of 100 to 1000 lines run 9.56% to 63.1% faster.

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 53 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import pytest  # used for our unit tests
from doctr.utils.repr import _addindent

# unit tests

# -------------------------
# Basic Test Cases
# -------------------------

def test_single_line_no_indent():
    # Single line string should be unchanged regardless of num_spaces
    codeflash_output = _addindent("Hello world", 4) # 718ns -> 796ns (9.80% slower)
    codeflash_output = _addindent("Test", 0) # 493ns -> 479ns (2.92% faster)
    codeflash_output = _addindent("", 10) # 276ns -> 267ns (3.37% faster)

def test_multi_line_basic_indent():
    # Multi-line string should indent all but the first line
    input_str = "First line\nSecond line\nThird line"
    expected = "First line\n  Second line\n  Third line"
    codeflash_output = _addindent(input_str, 2) # 2.58μs -> 2.41μs (7.18% faster)

def test_multi_line_zero_indent():
    # Zero indentation should not add any spaces
    input_str = "A\nB\nC"
    expected = "A\nB\nC"
    codeflash_output = _addindent(input_str, 0) # 2.16μs -> 2.25μs (4.13% slower)

def test_multi_line_one_space_indent():
    # Indent with one space
    input_str = "Header\nItem1\nItem2"
    expected = "Header\n Item1\n Item2"
    codeflash_output = _addindent(input_str, 1) # 2.20μs -> 2.19μs (0.778% faster)

def test_multi_line_varied_content():
    # Indent with three spaces, with special characters
    input_str = "Title\n\tIndented\n!@#$%^&*()"
    expected = "Title\n   \tIndented\n   !@#$%^&*()"
    codeflash_output = _addindent(input_str, 3) # 2.23μs -> 2.21μs (1.04% faster)

# -------------------------
# Edge Test Cases
# -------------------------

def test_empty_string():
    # Empty string should return empty string
    codeflash_output = _addindent("", 5) # 710ns -> 702ns (1.14% faster)

def test_string_with_only_newlines():
    # String with only newlines should indent all but the first (empty) line
    input_str = "\n\n\n"
    expected = "\n     \n     \n     "
    codeflash_output = _addindent(input_str, 5) # 2.26μs -> 2.34μs (3.25% slower)

def test_string_with_trailing_newline():
    # Trailing newline should result in an indented empty line at the end
    input_str = "abc\ndef\n"
    expected = "abc\n  def\n  "
    codeflash_output = _addindent(input_str, 2) # 2.21μs -> 2.13μs (3.80% faster)

def test_string_with_leading_newline():
    # Leading newline: first line is empty, rest get indented
    input_str = "\nabc\ndef"
    expected = "\n  abc\n  def"
    codeflash_output = _addindent(input_str, 2) # 2.23μs -> 2.17μs (2.95% faster)

def test_string_with_multiple_consecutive_newlines():
    # Multiple consecutive newlines between lines
    input_str = "a\n\nb\n\nc"
    expected = "a\n  \n  b\n  \n  c"
    codeflash_output = _addindent(input_str, 2) # 2.49μs -> 2.54μs (2.08% slower)

def test_negative_indent():
    # Negative indentation should result in no spaces (equivalent to zero)
    input_str = "x\ny\nz"
    expected = "x\ny\nz"
    codeflash_output = _addindent(input_str, -5) # 2.06μs -> 2.13μs (3.48% slower)

def test_indent_is_none_or_nonint():
    # Non-integer num_spaces should raise TypeError
    with pytest.raises(TypeError):
        _addindent("foo\nbar", None) # 2.46μs -> 1.73μs (42.0% faster)
    with pytest.raises(TypeError):
        _addindent("foo\nbar", "2") # 1.34μs -> 1.09μs (22.8% faster)
    with pytest.raises(TypeError):
        _addindent("foo\nbar", 2.5) # 1.27μs -> 1.01μs (26.6% faster)

def test_string_with_unicode():
    # Unicode characters should be preserved
    input_str = "你好\n世界\n🌍"
    expected = "你好\n  世界\n  🌍"
    codeflash_output = _addindent(input_str, 2) # 3.73μs -> 3.63μs (2.70% faster)

def test_string_with_tabs_and_spaces():
    # Tabs and spaces in content should not be affected
    input_str = "A\n\tB\n    C"
    expected = "A\n  \tB\n      C"
    codeflash_output = _addindent(input_str, 2) # 2.29μs -> 2.23μs (3.15% faster)

def test_string_with_only_spaces():
    # String with only spaces and newlines
    input_str = "   \n   \n   "
    expected = "   \n     \n     "
    codeflash_output = _addindent(input_str, 2) # 2.23μs -> 2.12μs (4.99% faster)

# -------------------------
# Large Scale Test Cases
# -------------------------

def test_large_multiline_string():
    # Large multiline string (1000 lines)
    lines = [f"Line {i}" for i in range(1000)]
    input_str = "\n".join(lines)
    expected_lines = [lines[0]] + ["    " + line for line in lines[1:]]
    expected = "\n".join(expected_lines)
    codeflash_output = _addindent(input_str, 4) # 84.6μs -> 68.4μs (23.6% faster)

def test_large_multiline_empty_lines():
    # Large multiline string with empty lines
    lines = [""] * 1000
    input_str = "\n".join(lines)
    expected_lines = [lines[0]] + ["  " for _ in lines[1:]]
    expected = "\n".join(expected_lines)
    codeflash_output = _addindent(input_str, 2) # 53.3μs -> 32.7μs (63.1% faster)

def test_large_multiline_varied_content():
    # Large multiline string with varied content
    lines = []
    for i in range(1000):
        if i % 3 == 0:
            lines.append(f"Line {i}")
        elif i % 3 == 1:
            lines.append("")
        else:
            lines.append("    ")
    input_str = "\n".join(lines)
    expected_lines = [lines[0]] + ["  " + line for line in lines[1:]]
    expected = "\n".join(expected_lines)
    codeflash_output = _addindent(input_str, 2) # 76.6μs -> 58.0μs (32.0% faster)

def test_large_indent_value():
    # Large indent value (100 spaces)
    input_str = "start\nmiddle\nend"
    expected = "start\n" + (" " * 100) + "middle\n" + (" " * 100) + "end"
    codeflash_output = _addindent(input_str, 100) # 2.39μs -> 2.17μs (10.1% faster)

def test_large_string_with_long_lines():
    # Large string with long lines (each line 500 chars, 10 lines)
    line = "x" * 500
    lines = [line for _ in range(10)]
    input_str = "\n".join(lines)
    expected_lines = [lines[0]] + ["   " + l for l in lines[1:]]
    expected = "\n".join(expected_lines)
    codeflash_output = _addindent(input_str, 3) # 6.23μs -> 5.59μs (11.3% faster)

def test_large_string_with_mixed_newlines():
    # Large string with mixed newlines and content
    lines = []
    for i in range(1000):
        if i % 10 == 0:
            lines.append("")
        else:
            lines.append(f"Content {i}")
    input_str = "\n".join(lines)
    expected_lines = [lines[0]] + [" " + line for line in lines[1:]]
    expected = "\n".join(expected_lines)
    codeflash_output = _addindent(input_str, 1) # 74.1μs -> 67.0μs (10.6% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pytest  # used for our unit tests
from doctr.utils.repr import _addindent

# unit tests

# =========================
# BASIC TEST CASES
# =========================

def test_single_line_no_indent():
    # Single line string should be unchanged
    codeflash_output = _addindent("hello world", 4) # 714ns -> 672ns (6.25% faster)

def test_multi_line_basic_indent():
    # Basic multi-line string, indent with 2 spaces
    input_str = "first line\nsecond line\nthird line"
    expected = "first line\n  second line\n  third line"
    codeflash_output = _addindent(input_str, 2) # 2.48μs -> 2.24μs (10.7% faster)

def test_multi_line_zero_indent():
    # Zero indentation should not add spaces
    input_str = "a\nb\nc"
    expected = "a\nb\nc"
    codeflash_output = _addindent(input_str, 0) # 2.13μs -> 2.16μs (1.30% slower)

def test_multi_line_one_indent():
    # Indent with 1 space
    input_str = "x\ny\nz"
    expected = "x\n y\n z"
    codeflash_output = _addindent(input_str, 1) # 2.14μs -> 2.29μs (6.38% slower)

def test_multi_line_indent_with_empty_lines():
    # Indent with 3 spaces, includes empty lines
    input_str = "first\n\nthird"
    expected = "first\n   \n   third"
    codeflash_output = _addindent(input_str, 3) # 2.31μs -> 2.27μs (2.21% faster)

# =========================
# EDGE TEST CASES
# =========================

def test_empty_string():
    # Empty string should return empty string
    codeflash_output = _addindent("", 5) # 690ns -> 703ns (1.85% slower)

def test_only_newlines():
    # String with only newlines
    input_str = "\n\n"
    expected = "\n   \n   "
    codeflash_output = _addindent(input_str, 3) # 2.10μs -> 2.15μs (2.32% slower)

def test_leading_trailing_newlines():
    # Leading and trailing newlines in string
    input_str = "\nfoo\nbar\n"
    expected = "\n  foo\n  bar\n  "
    codeflash_output = _addindent(input_str, 2) # 2.47μs -> 2.35μs (5.16% faster)

def test_indent_is_negative():
    # Negative indentation should result in no spaces (as " " * negative == "")
    input_str = "a\nb\nc"
    expected = "a\nb\nc"
    codeflash_output = _addindent(input_str, -5) # 2.08μs -> 2.18μs (4.63% slower)

def test_indent_is_large():
    # Large indentation (but not excessive)
    input_str = "start\nmiddle\nend"
    expected = "start\n" + (" " * 50) + "middle\n" + (" " * 50) + "end"
    codeflash_output = _addindent(input_str, 50) # 2.31μs -> 2.21μs (4.29% faster)

def test_lines_with_spaces():
    # Lines already have leading spaces
    input_str = "foo\n  bar\nbaz"
    expected = "foo\n    bar\n  baz"
    codeflash_output = _addindent(input_str, 2) # 2.23μs -> 2.11μs (5.72% faster)

def test_lines_with_tabs():
    # Lines with tabs, indent with 4 spaces
    input_str = "foo\n\tbar\nbaz"
    expected = "foo\n    \tbar\n    baz"
    codeflash_output = _addindent(input_str, 4) # 2.25μs -> 2.13μs (5.58% faster)

def test_multiline_with_unicode():
    # Unicode characters in lines
    input_str = "α\nβ\nγ"
    expected = "α\n  β\n  γ"
    codeflash_output = _addindent(input_str, 2) # 3.10μs -> 2.96μs (4.87% faster)

def test_multiline_with_mixed_whitespace():
    # Lines with mixed whitespace
    input_str = "foo\n bar\n\tbaz"
    expected = "foo\n    bar\n   \tbaz"
    codeflash_output = _addindent(input_str, 3) # 2.19μs -> 2.07μs (6.09% faster)

def test_multiline_with_long_lines():
    # Lines of considerable length
    long_line = "x" * 500
    input_str = f"start\n{long_line}\nend"
    expected = f"start\n  {long_line}\n  end"
    codeflash_output = _addindent(input_str, 2) # 2.64μs -> 2.47μs (6.80% faster)

# =========================
# LARGE SCALE TEST CASES
# =========================

def test_large_number_of_lines():
    # Test with 1000 lines, indent with 1 space
    lines = [f"line {i}" for i in range(1000)]
    input_str = "\n".join(lines)
    # First line is not indented
    expected = lines[0] + "\n" + "\n".join([" " + line for line in lines[1:]])
    codeflash_output = _addindent(input_str, 1) # 70.9μs -> 64.7μs (9.56% faster)

def test_large_line_length():
    # Test with a single line of 1000 characters (should not be indented)
    input_str = "a" * 1000
    codeflash_output = _addindent(input_str, 4) # 1.10μs -> 1.09μs (1.01% faster)

def test_large_multiline_long_lines():
    # 500 lines, each 100 characters, indent with 3 spaces
    lines = ["x" * 100 for _ in range(500)]
    input_str = "\n".join(lines)
    expected = lines[0] + "\n" + "\n".join(["   " + line for line in lines[1:]])
    codeflash_output = _addindent(input_str, 3) # 60.3μs -> 51.5μs (17.0% faster)

def test_large_indent_and_lines():
    # 100 lines, indent with 30 spaces
    lines = [f"foo{i}" for i in range(100)]
    input_str = "\n".join(lines)
    expected = lines[0] + "\n" + "\n".join([" " * 30 + line for line in lines[1:]])
    codeflash_output = _addindent(input_str, 30) # 11.6μs -> 9.82μs (18.1% faster)

def test_large_empty_lines():
    # 1000 empty lines, indent with 5 spaces
    input_str = "\n".join(["" for _ in range(1000)])
    expected = "" + "\n" + "\n".join(["     " for _ in range(999)])
    codeflash_output = _addindent(input_str, 5) # 54.2μs -> 33.9μs (59.8% faster)

# =========================
# ADDITIONAL EDGE CASES
# =========================

def test_newline_at_end_only():
    # String ends with a newline only
    input_str = "foo\n"
    expected = "foo\n  "
    codeflash_output = _addindent(input_str, 2) # 1.88μs -> 1.92μs (1.88% slower)

def test_newline_at_start_only():
    # String starts with a newline only
    input_str = "\nbar"
    expected = "\n  bar"
    codeflash_output = _addindent(input_str, 2) # 1.78μs -> 1.95μs (8.81% slower)

def test_newline_at_start_and_end():
    # String starts and ends with newline
    input_str = "\nbar\n"
    expected = "\n  bar\n  "
    codeflash_output = _addindent(input_str, 2) # 2.17μs -> 2.10μs (3.29% faster)

def test_all_lines_empty():
    # All lines are empty
    input_str = "\n\n\n"
    expected = "\n  \n  \n  "
    codeflash_output = _addindent(input_str, 2) # 2.14μs -> 2.08μs (2.74% faster)

def test_indent_is_none():
    # num_spaces is None (should raise TypeError)
    with pytest.raises(TypeError):
        _addindent("foo\nbar", None) # 2.49μs -> 1.85μs (34.3% faster)

def test_indent_is_float():
    # num_spaces is float (should raise TypeError)
    with pytest.raises(TypeError):
        _addindent("foo\nbar", 2.5) # 2.35μs -> 1.67μs (40.5% faster)

def test_indent_is_string():
    # num_spaces is string (should raise TypeError)
    with pytest.raises(TypeError):
        _addindent("foo\nbar", "2") # 2.16μs -> 1.54μs (40.6% faster)

def test_input_is_not_string():
    # s_ is not a string (should raise AttributeError)
    with pytest.raises(AttributeError):
        _addindent(123, 2) # 1.50μs -> 1.43μs (5.19% faster)
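As displayed, the generated tests assign `codeflash_output` and define `expected` but never compare the two. A minimal sketch of explicit checks might look like the following. The `addindent` function here is a hypothetical local stand-in written from the PR description, not an import of `doctr.utils.repr._addindent`, and the cases are a small selection of the inputs used above.

```python
def addindent(s_, num_spaces):
    # Hypothetical stand-in for doctr.utils.repr._addindent (assumed behavior).
    lines = s_.split("\n")
    if len(lines) == 1:
        return s_
    indent = " " * num_spaces
    return "\n".join([lines[0]] + [indent + line for line in lines[1:]])


# (input, num_spaces, expected output)
CASES = [
    ("Hello world", 4, "Hello world"),        # single line is unchanged
    ("A\nB\nC", 0, "A\nB\nC"),                # zero indent adds nothing
    ("abc\ndef\n", 2, "abc\n  def\n  "),      # trailing newline yields an indented empty line
    ("x\ny\nz", -5, "x\ny\nz"),               # negative indent behaves like zero
    ("foo\n  bar\nbaz", 2, "foo\n    bar\n  baz"),  # existing leading spaces are kept
]

for text, n, expected in CASES:
    assert addindent(text, n) == expected, (text, n)

# Non-integer indent values should raise TypeError for multi-line input.
for bad in (None, "2", 2.5):
    try:
        addindent("foo\nbar", bad)
    except TypeError:
        pass
    else:
        raise AssertionError(f"expected TypeError for num_spaces={bad!r}")
```

In a real test module the same table could be fed to `pytest.mark.parametrize` against the actual function.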

To edit these changes, run `git checkout codeflash/optimize-_addindent-mg7rh1z8` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 1, 2025 09:05
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 1, 2025