⚡️ Speed up function find_last_node by 16,667%
#240
📄 16,667% (166.67x) speedup for `find_last_node` in `src/algorithms/graph.py`
⏱️ Runtime: 98.3 milliseconds → 586 microseconds (best of 166 runs)
📝 Explanation and details
The optimized code achieves a 167x speedup (from 98.3ms to 586μs) by eliminating a critical algorithmic bottleneck.
What Changed:
The original implementation uses a nested loop structure: for each node, it checks `all(e["source"] != n["id"] for e in edges)`, which scans the edges list. This creates O(nodes × edges) time complexity.
The optimized version precomputes a set of all source IDs with `sources = {e["source"] for e in edges}`, then checks membership with `n["id"] not in sources`. Set membership is O(1) on average, reducing overall complexity to O(nodes + edges).
Why This Is Faster:
In Python, `all()` with a generator expression rescans the edges list for every node. It stops early only at the first edge whose source matches the node, so a node with no outgoing edges costs a full pass over the edges. With many edges, this becomes extremely expensive. The line profiler shows the original code spending 770ms in that single line across 59 calls.
The optimized approach builds the set once (561μs), then performs fast lookups (346μs total for all nodes). Set construction is O(edges) and each lookup is O(1) on average, making the total O(nodes + edges) instead of O(nodes × edges).
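For reference, the two approaches can be sketched side by side. This is a reconstruction from the expressions quoted above, not the actual diff: the function names, the signature, and the "return the first node with no outgoing edge, else None" behavior are assumptions made for illustration.

```python
from typing import Any, Optional

# Assumed data shapes: nodes carry an "id" key, edges carry a "source" key.
Node = dict[str, Any]
Edge = dict[str, Any]


def find_last_node_scan(nodes: list[Node], edges: list[Edge]) -> Optional[Node]:
    """Nested-scan variant: rescans the edges for every node, O(nodes * edges)."""
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,  # assumption: None when every node has an outgoing edge
    )


def find_last_node_set(nodes: list[Node], edges: list[Edge]) -> Optional[Node]:
    """Set-based variant: one pass over edges, then O(1) average lookups."""
    sources = {e["source"] for e in edges}  # built once, O(edges)
    return next((n for n in nodes if n["id"] not in sources), None)
```

Both variants should return the same node for the same input; only the cost of the per-node check differs.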
Performance Characteristics:
`test_large_linear_chain` (1000 nodes, 999 edges) shows a 327x speedup (18.5ms → 56.4μs), and `test_large_complete_graph` (100 nodes, 9900 edges) shows an 88x speedup (17.3ms → 193μs).
In `test_many_nodes_few_edges`, the optimization still helps (139% faster), since building a small set is cheap and eliminates redundant edge traversals.
The optimization is beneficial across all test cases, with particularly dramatic improvements when the edge count is high relative to nodes.
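A rough way to reproduce the linear-chain comparison locally is sketched below. It is not the PR's test suite: the helper `make_linear_chain`, the function names, and the `"target"` key are hypothetical, and timings will vary by machine, so no expected numbers are given.

```python
import timeit


def make_linear_chain(n):
    """Hypothetical helper: nodes 0..n-1 linked as 0 -> 1 -> ... -> n-1."""
    nodes = [{"id": i} for i in range(n)]
    # The "target" key is an assumption; only "source" is read below.
    edges = [{"source": i, "target": i + 1} for i in range(n - 1)]
    return nodes, edges


def last_node_scan(nodes, edges):
    # Rescans the edge list for each node.
    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)


def last_node_set(nodes, edges):
    # Builds the set of source IDs once, then does membership checks.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


nodes, edges = make_linear_chain(1000)

# Both variants must agree before timing them.
assert last_node_scan(nodes, edges) == last_node_set(nodes, edges)

# Best-of-5 wall-clock time per call, in seconds.
scan_s = min(timeit.repeat(lambda: last_node_scan(nodes, edges), number=1, repeat=5))
set_s = min(timeit.repeat(lambda: last_node_set(nodes, edges), number=100, repeat=5)) / 100

print(f"scan: {scan_s * 1e3:.3f} ms, set: {set_s * 1e6:.1f} us, ratio: {scan_s / set_s:.0f}x")
```

Taking the minimum over repeats mirrors the "best of N runs" style of reporting used above and reduces noise from other processes.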
✅ Correctness verification report:
To edit these changes, `git checkout codeflash/optimize-find_last_node-mk3e8gso` and push.