From 100ea7c32b16375a1e848d7a76218e8bb4921a27 Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]" <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Thu, 15 Jan 2026 14:23:42 +0000
Subject: [PATCH] Optimize Memory.get_total_tokens
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The optimized code achieves a **123% speedup** by eliminating function call overhead and avoiding floating-point arithmetic.

## Key Optimizations

1. **Replaced float multiplication with integer division in `encoded_tokens_len`**:
   - Original: `int(len(s) * 0.25)` performs a floating-point multiplication, then truncates
   - Optimized: `len(s) // 4` uses native integer floor division
   - The two are equivalent for non-negative integers within float precision (which covers all realistic string lengths), and the integer form avoids the int→float→int round trip

2. **Inlined the computation in `get_total_tokens` to eliminate function calls**:
   - Original: Called `encoded_tokens_len()` once per message (4,368 calls in the profiler run), paying generator overhead plus per-call cost
   - Optimized: Directly computes `len(message["content"]) // 4` in a simple loop
   - Removes ~4,200 function calls and the `sum()` generator machinery

## Why This Is Faster

- **Function call elimination**: Python function calls carry significant overhead (stack-frame creation, argument passing, return-value handling). The line profiler shows the original `encoded_tokens_len` was called 4,368 times at ~429 ns per call; the optimized version eliminates almost all of these calls.
- **Float arithmetic avoidance**: The original code performed a floating-point multiplication plus an `int()` truncation for every message; the optimized version uses a single integer floor division, skipping the conversions entirely.
- **Reduced memory allocations**: The generator expression passed to `sum()` allocates an iterator object; the simple loop avoids this allocation.
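The before/after shapes can be sketched side by side. This is a minimal standalone sketch, not the actual module: the `messages` structure (dicts with a `"content"` key) and the ~4-characters-per-token heuristic are taken from the diff, but the function names with `_original`/`_optimized` suffixes are invented here for comparison.

```python
# Standalone sketch comparing the original and optimized token counting.
def encoded_tokens_len_original(s: str) -> int:
    # Original heuristic: ~4 characters per token, via float multiply.
    return int(len(s) * 0.25)


def get_total_tokens_original(messages: list[dict[str, str]]) -> int:
    # Original: one function call per message inside a sum() generator.
    return sum(encoded_tokens_len_original(m["content"]) for m in messages)


def get_total_tokens_optimized(messages: list[dict[str, str]]) -> int:
    # Optimized: inlined integer floor division, no per-message call.
    total = 0
    for m in messages:
        total += len(m["content"]) // 4
    return total


messages = [{"content": "hello world"}, {"content": ""}, {"content": "x" * 1000}]
assert get_total_tokens_original(messages) == get_total_tokens_optimized(messages)
```

Both versions truncate toward zero for non-negative lengths, so the inlined loop is a drop-in behavioral replacement.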
## Test Results

The optimization benefits **all workloads uniformly**:

- Small datasets (empty/single message): 100-140% faster
- Medium datasets (50-200 messages): 120-150% faster
- Large datasets (500-800 messages): 108-153% faster

The speedup is consistent because the optimization reduces per-message overhead proportionally: whether processing 1 message or 1,000, each message benefits equally from the eliminated function calls and faster arithmetic.

## Behavior Preservation

The mathematical equivalence `int(x * 0.25) == x // 4` for non-negative integers within float precision (i.e., all realistic string lengths) ensures identical results across all test cases, including edge cases with empty strings, Unicode, and large content.

---
 codeflash/agent/memory.py | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/codeflash/agent/memory.py b/codeflash/agent/memory.py
index 84d0a4933..ab905bc97 100644
--- a/codeflash/agent/memory.py
+++ b/codeflash/agent/memory.py
@@ -2,8 +2,6 @@
 from pathlib import Path
 from typing import Any
 
-from codeflash.code_utils.code_utils import encoded_tokens_len
-
 json_primitive_types = (str, float, int, bool)
 
 
@@ -44,4 +42,7 @@ def get_messages(self) -> list[dict[str, str]]:
         return self._messages
 
     def get_total_tokens(self) -> int:
-        return sum(encoded_tokens_len(message["content"]) for message in self._messages)
+        total = 0
+        for message in self._messages:
+            total += len(message["content"]) // 4
+        return total
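The behavior-preservation claim can be checked directly. A sketch of that check (the sampling bounds here are assumptions; only lengths far below 2**53 occur in practice):

```python
# Verify int(x * 0.25) == x // 4 for non-negative integers within float precision.
import random


def agrees(x: int) -> bool:
    return int(x * 0.25) == x // 4


# Edge cases around the divisor boundary, plus zero and a large value.
assert all(agrees(x) for x in [0, 1, 2, 3, 4, 5, 7, 8, 1_000_000])

# Random sample of realistic string lengths.
random.seed(0)
assert all(agrees(random.randrange(10**9)) for _ in range(10_000))

# Caveat: beyond 2**53 the int-to-float conversion can round, breaking the
# identity. String lengths this large do not occur for chat messages.
assert agrees(2**60 + 4) is False
```

This is why the equivalence is hedged to "realistic string lengths" rather than stated for all non-negative integers.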