From 9c41215a63223933ce9a5520b6a54f442d25271e Mon Sep 17 00:00:00 2001 From: "codeflash-ai[bot]" <148906541+codeflash-ai[bot]@users.noreply.github.com> Date: Thu, 15 Jan 2026 14:17:10 +0000 Subject: [PATCH] Optimize Memory.set_messages MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The optimization replaces the call to `encoded_tokens_len(message["content"])` with an inlined approximation `len(message["content"]) // 4` in the `get_total_tokens()` method. **What changed:** - Instead of calling `encoded_tokens_len()` for each message (which multiplies `len(s)` by 0.25 and converts to int), the code now directly computes `len(message["content"]) // 4` inline. **Why it's faster:** 1. **Eliminates function call overhead**: Each call to `encoded_tokens_len()` incurs Python function call overhead (argument passing, stack frame creation, return). With hundreds of messages, these microseconds accumulate significantly. The line profiler shows `get_total_tokens()` dropping from 5.06ms to 2.32ms—a 54% reduction. 2. **Avoids floating-point arithmetic**: The original code multiplies by 0.25 (float) then converts to int. The optimized version uses integer division (`// 4`), which is faster as it stays in integer arithmetic throughout. 3. **Better CPU cache locality**: Inlining keeps the hot loop tighter, improving instruction cache utilization during the list comprehension. **Impact on workloads:** The annotated tests show consistent speedups across all scenarios: - Small workloads (single message): ~25-30% faster - Medium workloads (100 messages): ~85-93% faster - Large workloads (1000 messages): ~106-110% faster The speedup scales linearly with message count because the optimization eliminates per-message overhead. Functions that process many messages (batch operations, conversation histories) benefit most. Since `set_messages()` calls `get_total_tokens()` to check against `max_tokens`, any code path that validates message lists sees this improvement. **Test case performance:** - Best for large-scale scenarios (500-1000 messages): 85-110% speedup - Good for typical workloads (10-100 messages): 30-93% speedup - Minimal but positive impact on edge cases (empty lists, single messages): 2-34% speedup The optimization maintains identical behavior—both `int(len(s) * 0.25)` and `len(s) // 4` produce the same token approximation for all string lengths. --- codeflash/agent/memory.py | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/codeflash/agent/memory.py b/codeflash/agent/memory.py index 84d0a4933..b8bf1f441 100644 --- a/codeflash/agent/memory.py +++ b/codeflash/agent/memory.py @@ -2,8 +2,6 @@ from pathlib import Path from typing import Any -from codeflash.code_utils.code_utils import encoded_tokens_len - json_primitive_types = (str, float, int, bool) @@ -44,4 +42,4 @@ def get_messages(self) -> list[dict[str, str]]: return self._messages def get_total_tokens(self) -> int: - return sum(encoded_tokens_len(message["content"]) for message in self._messages) + return sum(len(message["content"]) // 4 for message in self._messages)