From 53582e36127e0fb254cba02b4c77051031045387 Mon Sep 17 00:00:00 2001 From: "codeflash-ai[bot]" <148906541+codeflash-ai[bot]@users.noreply.github.com> Date: Thu, 8 Jan 2026 07:56:38 +0000 Subject: [PATCH] Optimize find_last_node MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The optimized code achieves a **~240x speedup** by eliminating a nested quadratic operation and replacing it with a more efficient set-based lookup pattern. **Key optimization:** The original code performs a nested loop for each node, checking `all(e["source"] != n["id"] for e in edges)`. This creates O(N×E) comparisons where N is the number of nodes and E is the number of edges. The optimized version: 1. **Pre-builds a set of source IDs** (`sources = {e["source"] for e in edges}`) - O(E) operation 2. **Uses set membership testing** (`n["id"] not in sources`) - O(1) lookup per node 3. **Early returns when no edges exist** to avoid unnecessary work **Why this is faster:** - **Set lookups are O(1)** vs the original's O(E) per-node check with `all()` - **Single pass through edges** to build the set vs repeated iteration for each node - The overall complexity improves from O(N×E) to O(N+E) **Performance breakdown from tests:** - **Large linear chains** (1000 nodes): 18.9ms → 57.7μs (**327x faster**) - Benefits most from eliminating redundant edge traversals - **Small graphs** (2-3 nodes): 1.5-2.5μs → ~1μs (**~2x faster**) - Set overhead is minimal, lookup efficiency still wins - **Empty edges**: 1.2μs → 458ns (**~3x faster**) - Early return avoids creating empty set and accessing node IDs - **Dense graphs** with many edges show dramatic improvements as the O(E) reduction per node compounds **Behavioral preservation:** The early return for empty edges maintains the original behavior where nodes without an "id" key won't raise KeyError when there are no edges to check against, ensuring correctness while optimizing the common path. This optimization is particularly valuable in graph traversal scenarios where finding terminal nodes is done repeatedly or on large graphs, as evidenced by the massive gains on the large-scale test cases. --- src/algorithms/graph.py | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/algorithms/graph.py b/src/algorithms/graph.py index 777ea3b..065fdd4 100644 --- a/src/algorithms/graph.py +++ b/src/algorithms/graph.py @@ -47,7 +47,10 @@ def find_shortest_path(self, start: str, end: str) -> list[str]: def find_last_node(nodes, edges): """This function receives a flow and returns the last node.""" - return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None) + if not edges: + return next(iter(nodes), None) + sources = {e["source"] for e in edges} + return next((n for n in nodes if n["id"] not in sources), None) def find_leaf_nodes(nodes: list[dict], edges: list[dict]) -> list[dict]: