⚡️ Speed up method Memory.set_messages by 88% in PR #1059 (feat/agentic-codeflash)
#1060
⚡️ This pull request contains optimizations for PR #1059

If you approve this dependent PR, these changes will be merged into the original PR branch `feat/agentic-codeflash`.

📄 88% (0.88x) speedup for `Memory.set_messages` in `codeflash/agent/memory.py`

⏱️ Runtime: 659 microseconds → 350 microseconds (best of 189 runs)

📝 Explanation and details
The optimization replaces the call to `encoded_tokens_len(message["content"])` with an inlined approximation, `len(message["content"]) // 4`, in the `get_total_tokens()` method.

**What changed:** Instead of calling `encoded_tokens_len()` for each message (which multiplies `len(s)` by 0.25 and converts the result to int), the code now computes `len(message["content"]) // 4` directly inline, as the sketch below illustrates.
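A minimal before/after sketch, reconstructed from this description; the `Memory` class layout and the body of `encoded_tokens_len()` are assumptions, not the repository's actual code:

```python
# Hedged sketch of the change described above. The class layout and the
# encoded_tokens_len() body are assumptions based on the PR description.
def encoded_tokens_len(s: str) -> int:
    # Rough estimate: ~4 characters per token.
    return int(len(s) * 0.25)


class Memory:
    def __init__(self) -> None:
        self.messages: list[dict[str, str]] = []

    # Before: one Python-level helper call per message.
    def get_total_tokens_old(self) -> int:
        return sum(encoded_tokens_len(m["content"]) for m in self.messages)

    # After: the approximation is inlined as integer division, so the hot
    # loop never leaves the generator expression.
    def get_total_tokens(self) -> int:
        return sum(len(m["content"]) // 4 for m in self.messages)
```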
**Why it's faster:**

- **Eliminates function call overhead:** Each call to `encoded_tokens_len()` incurs Python function call overhead (argument passing, stack frame creation, return). With hundreds of messages, these microseconds accumulate significantly. The line profiler shows `get_total_tokens()` dropping from 5.06 ms to 2.32 ms, a 54% reduction.
- **Avoids floating-point arithmetic:** The original code multiplies by 0.25 (a float) and then converts to int. The optimized version uses integer division (`// 4`), which is faster because it stays in integer arithmetic throughout.
- **Better CPU cache locality:** Inlining keeps the hot loop tighter, improving instruction cache utilization during the list comprehension.

The micro-benchmark sketch below isolates the first of these effects.
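A standalone sketch (not the project's benchmark harness) that reproduces the per-call overhead on synthetic message bodies:

```python
# Micro-benchmark sketch: per-item helper call vs. inlined integer division.
# The helper body is an assumption taken from the description above.
import timeit

def encoded_tokens_len(s: str) -> int:
    return int(len(s) * 0.25)  # float multiply, then truncation to int

contents = ["x" * n for n in range(1, 501)]  # 500 synthetic message bodies

print("helper call:", timeit.timeit(
    lambda: sum(encoded_tokens_len(s) for s in contents), number=2_000))
print("inlined //4:", timeit.timeit(
    lambda: sum(len(s) // 4 for s in contents), number=2_000))
```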
**Impact on workloads:**

The annotated tests show consistent speedups across all scenarios. The speedup scales linearly with message count because the optimization eliminates per-message overhead, so functions that process many messages (batch operations, conversation histories) benefit most. Since `set_messages()` calls `get_total_tokens()` to check against `max_tokens`, any code path that validates message lists sees this improvement; a hedged sketch of that path follows.
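A hypothetical sketch of the validation path just described; the real `set_messages()` signature and failure handling in `codeflash/agent/memory.py` may differ:

```python
# Hypothetical sketch only: max_tokens default, field names, and the
# ValueError on overflow are assumptions, not the repository's behavior.
class Memory:
    def __init__(self, max_tokens: int = 4096) -> None:
        self.max_tokens = max_tokens
        self.messages: list[dict[str, str]] = []

    def get_total_tokens(self) -> int:
        return sum(len(m["content"]) // 4 for m in self.messages)

    def set_messages(self, messages: list[dict[str, str]]) -> None:
        self.messages = messages
        # Every call revalidates the full list, which puts the per-message
        # token estimate squarely on the hot path.
        if self.get_total_tokens() > self.max_tokens:
            raise ValueError("message history exceeds max_tokens")
```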
The optimization maintains identical behavior: both `int(len(s) * 0.25)` and `len(s) // 4` produce the same token approximation for all string lengths (the check below exercises this equivalence).
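A quick exhaustive spot check of that claim:

```python
# 0.25 is an exact power of two, so n * 0.25 is computed exactly at these
# magnitudes and truncation matches floor division for non-negative n.
for n in range(1_000_000):
    assert int(n * 0.25) == n // 4, n
print("int(len(s) * 0.25) == len(s) // 4 for every tested length")
```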
✅ Correctness verification report:

🌀 Generated Regression Tests (collapsed in the original report)
🔎 Concolic Coverage Tests: `codeflash_concolic_ik_zub_8/tmpj95ssauz/test_concolic_coverage.py::test_Memory_set_messages`

To edit these changes, `git checkout codeflash/optimize-pr1059-2026-01-15T14.17.06` and push.