@codeflash-ai codeflash-ai bot commented Jan 15, 2026

⚡️ This pull request contains optimizations for PR #1059

If you approve this dependent PR, these changes will be merged into the original PR branch feat/agentic-codeflash.

This PR will be automatically closed if the original PR is merged.


📄 124% (1.24x) speedup for Memory.get_total_tokens in codeflash/agent/memory.py

⏱️ Runtime : 513 microseconds → 229 microseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves a 124% speedup by eliminating function call overhead and avoiding floating-point arithmetic:

Key Optimizations

  1. Replaced float multiplication with integer division in encoded_tokens_len:

    • Original: int(len(s) * 0.25) performs floating-point multiplication then truncates
    • Optimized: len(s) // 4 uses native integer floor division
    • This is mathematically equivalent for positive integers and avoids the float conversion overhead
  2. Inlined computation in get_total_tokens to eliminate function calls:

    • Original: Called encoded_tokens_len() once per message (4,368 calls in profiler), creating generator overhead plus function call cost
    • Optimized: Directly computes len(message["content"]) // 4 in a simple loop
    • Removes ~4,200 function calls and the sum() generator machinery
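The two changes above can be sketched side by side. This is a minimal standalone reconstruction based on the description in this PR, not the actual module code: the real implementations live in codeflash/code_utils/code_utils.py and codeflash/agent/memory.py, so the suffixed function names and the free-standing `messages` parameter here are illustrative.

```python
# Illustrative before/after reconstruction of the optimization described in
# this PR; names and signatures are placeholders, not the real module code.

def encoded_tokens_len_original(s: str) -> int:
    # Original: float multiply, then truncate toward zero via int()
    return int(len(s) * 0.25)

def encoded_tokens_len_optimized(s: str) -> int:
    # Optimized: native integer floor division; identical result for
    # non-negative lengths, no float conversion
    return len(s) // 4

def get_total_tokens_optimized(messages: list) -> int:
    # Optimized get_total_tokens: the helper is inlined into a plain loop,
    # avoiding one function call per message and the sum() generator object
    total = 0
    for message in messages:
        total += len(message["content"]) // 4
    return total

msgs = [{"content": "hello"}, {"content": "a" * 20}]
print(get_total_tokens_optimized(msgs))  # 5 // 4 + 20 // 4 = 1 + 5 = 6
```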

Why This Is Faster

  • Function call elimination: Python function calls have significant overhead (stack frame creation, argument passing, return value handling). The line profiler shows the original encoded_tokens_len was called 4,368 times at ~429ns per call. The optimized version eliminates most of these calls.
  • Float arithmetic avoidance: Integer operations are generally faster than floating-point operations on CPUs. The original code performed a floating-point multiplication for every message, while the optimized version uses pure integer division.
  • Reduced memory allocations: The generator expression in sum() creates an iterator object; the simple loop avoids this allocation.
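The arithmetic claim is easy to spot-check with a micro-benchmark sketch (this uses the stdlib timeit module, not the line profiler the PR used, and absolute timings are machine-dependent):

```python
import timeit

# Compare float multiply+truncate against integer floor division on a
# representative string length. Timings vary by machine and Python version.
n = 1000
float_t = timeit.timeit("int(n * 0.25)", globals={"n": n}, number=1_000_000)
intdiv_t = timeit.timeit("n // 4", globals={"n": n}, number=1_000_000)
print(f"float path: {float_t:.3f}s  integer path: {intdiv_t:.3f}s")
```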

Test Results Indicate

The optimization benefits all workloads uniformly:

  • Small datasets (empty/single message): 100-140% faster
  • Medium datasets (50-200 messages): 120-150% faster
  • Large datasets (500-800 messages): 108-153% faster

The speedup is consistent because the optimization reduces per-message overhead proportionally—whether processing 1 message or 1,000, each message benefits equally from eliminated function calls and faster arithmetic.

Behavior Preservation

The mathematical equivalence int(x * 0.25) == x // 4 for non-negative integers ensures identical results across all test cases, including edge cases with empty strings, Unicode, and large content.
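That equivalence can be spot-checked exhaustively over realistic string lengths (a quick sanity sketch, not part of the PR's test suite). One caveat: for integers too large to represent exactly as a float the two expressions can diverge, but real message lengths are far below that range.

```python
# int(x * 0.25) == x // 4 for every non-negative int that converts to float
# exactly: x * 0.25 is then an exact power-of-two scaling, and int() truncation
# equals floor division for non-negative values.
assert all(int(x * 0.25) == x // 4 for x in range(100_000))

# Beyond 2**53, converting x to float can round, so the float path can drift
# while floor division stays exact.
x = 2**55 + 4
print(int(x * 0.25) == x // 4)  # float(x) rounds before the multiply
```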

Correctness verification report:

Test                            Status
⚙️ Existing Unit Tests          🔘 None Found
🌀 Generated Regression Tests   93 Passed
⏪ Replay Tests                 🔘 None Found
🔎 Concolic Coverage Tests      2 Passed
📊 Tests Coverage               100.0%
🌀 Generated Regression Tests
# imports
import pytest  # used for our unit tests

from codeflash.agent.memory import Memory


# function to test
# (This replicates the original module implementations so the tests can execute deterministically.
# In a real codebase the tests would import these from their modules, but the user requested the
# test suite to also define the function under test.)
# file: codeflash/code_utils/code_utils.py
def encoded_tokens_len(s: str) -> int:
    """Return the approximate length of the encoded tokens.

    It's an approximation of BPE encoding (https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).
    """
    return int(len(s) * 0.25)


# unit tests
# Write your test functions here, e.g.:
# def test_basic_functionality():
#     ...

# Basic Test Cases


def test_empty_messages_returns_zero():
    """If there are no messages, total tokens should be zero."""
    mem = Memory()
    # _messages defaults to empty list
    codeflash_output = mem.get_total_tokens()  # 1.07μs -> 531ns (102% faster)


def test_single_short_message_counts_correctly():
    """A single short message's characters should be converted to tokens via encoded_tokens_len."""
    mem = Memory()
    # "hello" has length 5 -> int(5 * 0.25) == 1
    mem._messages = [{"content": "hello"}]
    codeflash_output = mem.get_total_tokens()  # 1.84μs -> 781ns (136% faster)


def test_multiple_messages_sum_tokens():
    """Multiple messages should have their token counts summed correctly."""
    mem = Memory()
    # create two messages:
    # - "aaaa" length 4 -> int(4*0.25) == 1
    # - "bbbbbbbb" length 8 -> int(8*0.25) == 2
    mem._messages = [{"content": "aaaa"}, {"content": "bbbbbbbb"}]
    codeflash_output = mem.get_total_tokens()  # 2.04μs -> 841ns (143% faster)


# Edge Test Cases


def test_empty_string_message_contributes_zero():
    """An empty string content should contribute zero tokens."""
    mem = Memory()
    mem._messages = [{"content": ""}]
    codeflash_output = mem.get_total_tokens()  # 1.76μs -> 802ns (120% faster)


def test_missing_content_key_raises_keyerror():
    """If a message dict lacks 'content', get_total_tokens should raise KeyError."""
    mem = Memory()
    # message missing "content" key should cause a KeyError when accessed
    mem._messages = [{"text": "no content here"}]  # incorrect key
    with pytest.raises(KeyError):
        mem.get_total_tokens()  # 3.12μs -> 2.32μs (34.1% faster)


def test_non_string_content_raises_typeerror():
    """If a message's content is not a string, encoded_tokens_len / len() will raise TypeError."""
    mem = Memory()
    # Use an integer for content; len(123) raises TypeError
    mem._messages = [{"content": 123}]
    with pytest.raises(TypeError):
        mem.get_total_tokens()  # 3.48μs -> 2.52μs (38.2% faster)


def test_unicode_and_combining_characters_counting():
    """Ensure that characters with combining marks and emojis are counted according to Python's len()."""
    mem = Memory()
    # '😊' has len 1 -> int(1*0.25) == 0
    # 'e\u0301' (e + combining acute) has len 2 -> int(2*0.25) == 0
    # 'abcd' has len 4 -> int(4*0.25) == 1
    mem._messages = [{"content": "😊"}, {"content": "e\u0301"}, {"content": "abcd"}]
    # Expected tokens: 0 + 0 + 1 = 1
    codeflash_output = mem.get_total_tokens()  # 2.42μs -> 1.03μs (135% faster)


def test_additional_message_fields_are_ignored():
    """Messages may contain extra keys; only the 'content' key should be used for token counting."""
    mem = Memory()
    mem._messages = [
        {"content": "xxxx", "role": "user", "meta": {"foo": "bar"}},  # len 4 -> 1 token
        {"content": "yy", "role": "assistant"},  # len 2 -> int(2*0.25) == 0 tokens
    ]
    codeflash_output = mem.get_total_tokens()  # 2.02μs -> 842ns (140% faster)


def test_idempotent_multiple_calls_without_mutation():
    """Calling get_total_tokens repeatedly without changing messages should return the same value."""
    mem = Memory()
    mem._messages = [{"content": "a" * 20}]  # len 20 -> int(20*0.25) == 5
    codeflash_output = mem.get_total_tokens()
    first = codeflash_output  # 1.77μs -> 762ns (133% faster)
    codeflash_output = mem.get_total_tokens()
    second = codeflash_output  # 821ns -> 390ns (111% faster)
    codeflash_output = mem.get_total_tokens()
    third = codeflash_output  # 611ns -> 290ns (111% faster)


def test_mutating_messages_updates_total():
    """After changing the _messages list, get_total_tokens should reflect new content."""
    mem = Memory()
    mem._messages = [{"content": "aaaa"}]  # len 4 -> 1 token
    codeflash_output = mem.get_total_tokens()  # 1.73μs -> 741ns (134% faster)
    # Append a new message and verify the total increases accordingly
    mem._messages.append({"content": "b" * 8})  # len 8 -> 2 tokens
    codeflash_output = mem.get_total_tokens()  # 992ns -> 431ns (130% faster)


def test_max_tokens_attribute_does_not_affect_computation():
    """Memory.max_tokens should not influence the computed total from get_total_tokens."""
    mem = Memory()
    mem._messages = [{"content": "x" * 100}]  # len 100 -> int(100*0.25) == 25
    # artificially change max_tokens to a low value; computation should remain the same
    mem.max_tokens = 10
    codeflash_output = mem.get_total_tokens()  # 1.69μs -> 711ns (138% faster)


# Large Scale Test Cases
def test_large_scale_many_messages_under_1000_elements():
    """Test correctness and reasonable performance on a large but bounded number of messages.

    We create 500 messages each of length 1000 characters (500 * 1000 = 500k chars).
    Each message's token estimate: int(1000 * 0.25) == 250
    Total expected tokens: 500 * 250 == 125000
    """
    mem = Memory()
    # Build 500 messages; 500 is below the 1000-element constraint and avoids long loops.
    n_messages = 500
    single_length = 1000
    content_piece = "x" * single_length  # each message is this long
    mem._messages = [{"content": content_piece} for _ in range(n_messages)]
    expected_per_message = int(single_length * 0.25)  # 250
    expected_total = expected_per_message * n_messages  # 125000
    codeflash_output = mem.get_total_tokens()  # 73.4μs -> 35.2μs (108% faster)


# Defensive Test: ensure encoded_tokens_len behavior when given empty and whitespace strings
def test_whitespace_only_strings_count_based_on_length():
    """Whitespace characters should be counted like any other character when computing tokens."""
    mem = Memory()
    mem._messages = [
        {"content": " "},  # single space -> len 1 -> int(0.25) -> 0
        {"content": "   "},  # three spaces -> len 3 -> int(0.75) -> 0
        {"content": "\n" * 4},  # four newlines -> len 4 -> int(1.0) -> 1
    ]
    # Expected tokens: 0 + 0 + 1 = 1
    codeflash_output = mem.get_total_tokens()  # 2.23μs -> 972ns (130% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from codeflash.agent.memory import Memory
from codeflash.code_utils.code_utils import encoded_tokens_len

# Test Suite for Memory.get_total_tokens()
# ==========================================
# This test suite covers basic functionality, edge cases, and large-scale scenarios


# ============================================================================
# BASIC TEST CASES - Fundamental functionality under normal conditions
# ============================================================================


def test_get_total_tokens_empty_memory():
    """Test that an empty memory returns 0 tokens."""
    memory = Memory()
    codeflash_output = memory.get_total_tokens()  # 1.13μs -> 511ns (122% faster)


def test_get_total_tokens_single_message():
    """Test token calculation with a single message."""
    memory = Memory()
    test_content = "Hello, world!"
    memory._messages = [{"content": test_content}]
    expected_tokens = encoded_tokens_len(test_content)
    codeflash_output = memory.get_total_tokens()  # 1.60μs -> 751ns (113% faster)


def test_get_total_tokens_multiple_messages():
    """Test token calculation with multiple messages."""
    memory = Memory()
    contents = ["Hello", "World", "Test"]
    memory._messages = [{"content": content} for content in contents]
    expected_tokens = sum(encoded_tokens_len(content) for content in contents)
    codeflash_output = memory.get_total_tokens()  # 1.51μs -> 931ns (62.5% faster)


def test_get_total_tokens_accumulation():
    """Test that tokens accumulate correctly as messages are added."""
    memory = Memory()
    total = 0

    # Add messages incrementally and verify accumulation
    for i in range(5):
        message_content = f"Message number {i}"
        memory._messages.append({"content": message_content})
        total += encoded_tokens_len(message_content)
        codeflash_output = memory.get_total_tokens()  # 5.55μs -> 2.67μs (108% faster)


def test_get_total_tokens_with_numeric_content():
    """Test token calculation with numeric content."""
    memory = Memory()
    content = "12345"
    memory._messages = [{"content": content}]
    expected_tokens = encoded_tokens_len(content)
    codeflash_output = memory.get_total_tokens()  # 1.46μs -> 731ns (100% faster)


def test_get_total_tokens_with_special_characters():
    """Test token calculation with special characters."""
    memory = Memory()
    content = "!@#$%^&*()_+-={}[]|:;<>,.?/"
    memory._messages = [{"content": content}]
    expected_tokens = encoded_tokens_len(content)
    codeflash_output = memory.get_total_tokens()  # 1.51μs -> 742ns (104% faster)


# ============================================================================
# EDGE CASE TEST CASES - Extreme or unusual conditions
# ============================================================================


def test_get_total_tokens_empty_string_content():
    """Test that empty string content contributes 0 tokens."""
    memory = Memory()
    memory._messages = [{"content": ""}]
    codeflash_output = memory.get_total_tokens()  # 1.78μs -> 852ns (109% faster)


def test_get_total_tokens_multiple_empty_messages():
    """Test that multiple empty messages result in 0 total tokens."""
    memory = Memory()
    memory._messages = [{"content": ""}, {"content": ""}, {"content": ""}]
    codeflash_output = memory.get_total_tokens()  # 2.17μs -> 1.08μs (101% faster)


def test_get_total_tokens_mixed_empty_and_filled():
    """Test mixing empty and non-empty messages."""
    memory = Memory()
    filled_content = "Test"
    memory._messages = [{"content": ""}, {"content": filled_content}, {"content": ""}]
    expected_tokens = encoded_tokens_len(filled_content)
    codeflash_output = memory.get_total_tokens()  # 1.93μs -> 1.07μs (80.3% faster)


def test_get_total_tokens_single_character():
    """Test with messages containing a single character."""
    memory = Memory()
    content = "a"
    memory._messages = [{"content": content}]
    expected_tokens = encoded_tokens_len(content)
    codeflash_output = memory.get_total_tokens()  # 1.46μs -> 751ns (94.8% faster)


def test_get_total_tokens_whitespace_only():
    """Test that whitespace-only content is counted correctly."""
    memory = Memory()
    content = "   "
    memory._messages = [{"content": content}]
    expected_tokens = encoded_tokens_len(content)
    codeflash_output = memory.get_total_tokens()  # 1.50μs -> 741ns (103% faster)


def test_get_total_tokens_newlines_and_tabs():
    """Test that newlines and tabs are counted correctly."""
    memory = Memory()
    content = "Line1\nLine2\tTabbed"
    memory._messages = [{"content": content}]
    expected_tokens = encoded_tokens_len(content)
    codeflash_output = memory.get_total_tokens()  # 1.48μs -> 671ns (121% faster)


def test_get_total_tokens_unicode_characters():
    """Test with Unicode characters."""
    memory = Memory()
    content = "Hello 世界 مرحبا мир"
    memory._messages = [{"content": content}]
    expected_tokens = encoded_tokens_len(content)
    codeflash_output = memory.get_total_tokens()  # 1.49μs -> 752ns (98.5% faster)


def test_get_total_tokens_emoji():
    """Test with emoji characters."""
    memory = Memory()
    content = "😀😃😄😁😆"
    memory._messages = [{"content": content}]
    expected_tokens = encoded_tokens_len(content)
    codeflash_output = memory.get_total_tokens()  # 1.51μs -> 771ns (96.2% faster)


def test_get_total_tokens_very_long_single_message():
    """Test with a very long message."""
    memory = Memory()
    content = "a" * 10000
    memory._messages = [{"content": content}]
    expected_tokens = encoded_tokens_len(content)
    codeflash_output = memory.get_total_tokens()  # 1.65μs -> 942ns (75.5% faster)


def test_get_total_tokens_message_with_code():
    """Test with message containing code."""
    memory = Memory()
    content = 'def hello():\n    print("Hello, world!")'
    memory._messages = [{"content": content}]
    expected_tokens = encoded_tokens_len(content)
    codeflash_output = memory.get_total_tokens()  # 1.52μs -> 771ns (97.5% faster)


def test_get_total_tokens_message_with_json():
    """Test with message containing JSON."""
    memory = Memory()
    content = '{"name": "John", "age": 30, "city": "New York"}'
    memory._messages = [{"content": content}]
    expected_tokens = encoded_tokens_len(content)
    codeflash_output = memory.get_total_tokens()  # 1.46μs -> 742ns (97.2% faster)


def test_get_total_tokens_zero_is_nonnegative():
    """Test that total tokens is always non-negative."""
    memory = Memory()
    codeflash_output = memory.get_total_tokens()  # 1.08μs -> 531ns (104% faster)


def test_get_total_tokens_sum_property():
    """Test that the sum is actually the sum of individual messages."""
    memory = Memory()
    contents = ["First", "Second", "Third", "Fourth"]
    memory._messages = [{"content": content} for content in contents]

    individual_sums = [encoded_tokens_len(content) for content in contents]
    codeflash_output = memory.get_total_tokens()
    total_tokens = codeflash_output  # 1.93μs -> 1.01μs (91.0% faster)


# ============================================================================
# LARGE SCALE TEST CASES - Performance and scalability with large data
# ============================================================================


def test_get_total_tokens_many_messages_small_content():
    """Test with many messages containing small content."""
    memory = Memory()
    num_messages = 500
    content = "x"
    memory._messages = [{"content": content} for _ in range(num_messages)]

    expected_tokens = num_messages * encoded_tokens_len(content)
    codeflash_output = memory.get_total_tokens()  # 68.1μs -> 27.6μs (147% faster)


def test_get_total_tokens_many_messages_medium_content():
    """Test with many messages containing medium-sized content."""
    memory = Memory()
    num_messages = 200
    content = "This is a medium-sized message with several words in it."
    memory._messages = [{"content": content} for _ in range(num_messages)]

    expected_tokens = num_messages * encoded_tokens_len(content)
    codeflash_output = memory.get_total_tokens()  # 28.5μs -> 12.3μs (131% faster)


def test_get_total_tokens_few_messages_large_content():
    """Test with few messages containing large content."""
    memory = Memory()
    content = "x" * 50000
    memory._messages = [{"content": content}, {"content": content}, {"content": content}]

    expected_tokens = 3 * encoded_tokens_len(content)
    codeflash_output = memory.get_total_tokens()  # 2.16μs -> 1.25μs (72.8% faster)


def test_get_total_tokens_mixed_sizes_scaling():
    """Test with messages of varying sizes to verify linear scaling."""
    memory = Memory()
    sizes = [100, 500, 1000, 2000, 5000]
    memory._messages = [{"content": "a" * size} for size in sizes]

    expected_tokens = sum(encoded_tokens_len("a" * size) for size in sizes)
    codeflash_output = memory.get_total_tokens()  # 2.05μs -> 1.50μs (36.8% faster)


def test_get_total_tokens_incremental_large_scale():
    """Test incremental addition of many messages."""
    memory = Memory()
    total_expected = 0

    for i in range(300):
        content = f"Message {i}: " + "a" * (i % 100)
        memory._messages.append({"content": content})
        total_expected += encoded_tokens_len(content)

    codeflash_output = memory.get_total_tokens()  # 42.3μs -> 18.5μs (129% faster)


def test_get_total_tokens_consistent_across_calls():
    """Test that multiple calls return the same result (determinism)."""
    memory = Memory()
    contents = [f"Message {i}" for i in range(100)]
    memory._messages = [{"content": content} for content in contents]

    codeflash_output = memory.get_total_tokens()
    result1 = codeflash_output  # 15.4μs -> 6.18μs (149% faster)
    codeflash_output = memory.get_total_tokens()
    result2 = codeflash_output  # 14.0μs -> 5.67μs (147% faster)
    codeflash_output = memory.get_total_tokens()
    result3 = codeflash_output  # 13.8μs -> 5.46μs (153% faster)


def test_get_total_tokens_with_realistic_conversation():
    """Test with a realistic conversation scenario."""
    memory = Memory()
    conversation = [
        {"content": "What is the capital of France?"},
        {"content": "The capital of France is Paris. It's located in the north-central part of the country."},
        {"content": "Tell me more about Paris."},
        {
            "content": "Paris is known for its iconic landmarks such as the Eiffel Tower, Notre-Dame, and the Louvre Museum."
        },
    ]
    memory._messages = conversation

    expected_tokens = sum(encoded_tokens_len(msg["content"]) for msg in conversation)
    codeflash_output = memory.get_total_tokens()  # 1.54μs -> 931ns (65.6% faster)


def test_get_total_tokens_boundary_near_max():
    """Test with content that approaches typical token limits."""
    memory = Memory()
    # Create messages that approach but don't exceed max_tokens
    num_messages = 50
    content_per_message = "This is a test message. " * 10  # Medium-length messages
    memory._messages = [{"content": content_per_message} for _ in range(num_messages)]

    codeflash_output = memory.get_total_tokens()
    total_tokens = codeflash_output  # 8.61μs -> 3.87μs (123% faster)


def test_get_total_tokens_performance_does_not_degrade():
    """Test that performance remains acceptable with large number of messages."""
    memory = Memory()

    # Add a large number of messages
    for i in range(800):
        memory._messages.append({"content": f"Message number {i}"})

    # Call the function and verify it completes and returns correct value
    codeflash_output = memory.get_total_tokens()
    total = codeflash_output  # 108μs -> 47.1μs (131% faster)

    # Verify the result is correct
    expected = sum(encoded_tokens_len(f"Message number {i}") for i in range(800))


def test_get_total_tokens_with_very_varied_content():
    """Test with highly varied message content types."""
    memory = Memory()
    varied_contents = [
        "Simple text",
        "123456789",
        "!@#$%^&*()",
        "Line1\nLine2\nLine3",
        "Mixed123!@#abc",
        "🎉🎊🎈",
        "café naïve résumé",
        "a" * 5000,
        "",
        "   spaces   ",
    ]

    # Repeat the varied contents to create a large dataset
    for _ in range(50):
        for content in varied_contents:
            memory._messages.append({"content": content})

    codeflash_output = memory.get_total_tokens()
    total_tokens = codeflash_output  # 71.4μs -> 31.9μs (124% faster)
    expected_tokens = 50 * sum(encoded_tokens_len(content) for content in varied_contents)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from codeflash.agent.memory import Memory


def test_Memory_get_total_tokens():
    Memory.get_total_tokens(Memory())
🔎 Concolic Coverage Tests

Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup
codeflash_concolic_ik_zub_8/tmpr3v2j3zs/test_concolic_coverage.py::test_Memory_get_total_tokens | 1.02μs | 510ns | 100% ✅

To edit these changes git checkout codeflash/optimize-pr1059-2026-01-15T14.23.38 and push.

@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Jan 15, 2026
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-pr1059-2026-01-15T14.23.38 branch January 16, 2026 04:13