⚡️ Speed up method Memory.get_total_tokens by 124% in PR #1059 (feat/agentic-codeflash)
#1061
⚡️ This pull request contains optimizations for PR #1059
If you approve this dependent PR, these changes will be merged into the original PR branch feat/agentic-codeflash.

📄 124% (1.24x) speedup for Memory.get_total_tokens in codeflash/agent/memory.py
⏱️ Runtime: 513 microseconds → 229 microseconds (best of 250 runs)

📝 Explanation and details
The optimized code achieves a 124% speedup by eliminating function call overhead and avoiding floating-point arithmetic:
Key Optimizations
- Replaced float multiplication with integer division in encoded_tokens_len: int(len(s) * 0.25) performs a floating-point multiplication and then truncates, whereas len(s) // 4 uses native integer floor division.
- Inlined the computation in get_total_tokens to eliminate function calls: previously encoded_tokens_len() was called once per message (4,368 calls in the profiler run), adding function call cost plus generator overhead; now len(message["content"]) // 4 is computed in a simple loop, bypassing the sum() generator machinery.
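To make the change concrete, here is a minimal before/after sketch. The actual Memory class in codeflash/agent/memory.py is not reproduced in this PR description, so the class layout and field names below are assumptions for illustration only.

```python
from typing import TypedDict


class Message(TypedDict):
    role: str
    content: str


class Memory:
    # Hypothetical shape of the class; only the token-counting logic matters here.

    def __init__(self) -> None:
        self.messages: list[Message] = []

    # Before: helper call per message, float multiply, sum() over a generator.
    def get_total_tokens_before(self) -> int:
        def encoded_tokens_len(s: str) -> int:
            # Rough estimate: ~4 characters per token.
            return int(len(s) * 0.25)

        return sum(encoded_tokens_len(m["content"]) for m in self.messages)

    # After: inlined integer floor division in a plain accumulator loop.
    def get_total_tokens(self) -> int:
        total = 0
        for m in self.messages:
            total += len(m["content"]) // 4
        return total
```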
Why This Is Faster
encoded_tokens_len was called 4,368 times at roughly 429 ns per call; the optimized version eliminates most of those calls. In addition, sum() over a generator expression allocates an iterator object, which the simple loop avoids.
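These two effects (per-call overhead and generator allocation) can be observed in isolation with a small, self-contained timeit comparison; the data below is synthetic and absolute numbers will vary by machine:

```python
import timeit

contents = ["x" * 200] * 1_000  # stand-in for message contents


def estimate(s: str) -> int:
    # Original style: float multiply, then truncate.
    return int(len(s) * 0.25)


def total_with_helper_and_sum() -> int:
    # One function call per item plus sum() over a generator.
    return sum(estimate(s) for s in contents)


def total_inlined_loop() -> int:
    # No helper call, integer floor division, plain accumulator loop.
    total = 0
    for s in contents:
        total += len(s) // 4
    return total


assert total_with_helper_and_sum() == total_inlined_loop()
print("helper + sum():", timeit.timeit(total_with_helper_and_sum, number=2_000))
print("inlined loop  :", timeit.timeit(total_inlined_loop, number=2_000))
```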
Test Results Indicate
The optimization benefits all workloads uniformly. The speedup is consistent because the per-message overhead is reduced proportionally: whether processing 1 message or 1,000, each message benefits equally from the eliminated function calls and faster arithmetic.
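As a sanity check on that scaling claim, one could time both variants at a few message counts and compare the ratios; this is a sketch with made-up message contents, not the project's benchmark harness:

```python
import timeit


def old_total(messages) -> int:
    return sum(int(len(m["content"]) * 0.25) for m in messages)


def new_total(messages) -> int:
    total = 0
    for m in messages:
        total += len(m["content"]) // 4
    return total


for n in (1, 100, 1_000):
    msgs = [{"role": "user", "content": "x" * 500} for _ in range(n)]
    t_old = timeit.timeit(lambda: old_total(msgs), number=5_000)
    t_new = timeit.timeit(lambda: new_total(msgs), number=5_000)
    print(f"{n:>5} messages: old/new = {t_old / t_new:.2f}x")
```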
Behavior Preservation
The mathematical equivalence int(x * 0.25) == x // 4 for non-negative integers ensures identical results across all test cases, including edge cases with empty strings, Unicode, and large content.
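That equivalence is easy to spot-check: 0.25 is a power of two, so x * 0.25 is computed exactly for any x that converts exactly to a float (i.e., any realistic string length), and truncating a non-negative quotient is the same as floor division. A quick verification sketch:

```python
# Spot-check int(x * 0.25) == x // 4 for non-negative integers.
for x in range(1_000_000):
    assert int(x * 0.25) == x // 4

# Including the edge cases mentioned above: empty, Unicode, and large content.
for s in ["", "é" * 3, "🚀" * 7, "a" * 100_000]:
    assert int(len(s) * 0.25) == len(s) // 4

print("equivalence holds")
```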
✅ Correctness verification report:
🌀 Generated Regression Tests
🔎 Concolic Coverage Tests: codeflash_concolic_ik_zub_8/tmpr3v2j3zs/test_concolic_coverage.py::test_Memory_get_total_tokens

To edit these changes, run git checkout codeflash/optimize-pr1059-2026-01-15T14.23.38 and push.