From a043f1611036775bb026916619bf33d191699754 Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]" <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Wed, 7 Jan 2026 05:19:40 +0000
Subject: [PATCH] Optimize find_last_node
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The optimization achieves a **53x speedup** by eliminating nested iteration, replacing O(n×m) complexity with O(n+m) complexity, where n is the number of nodes and m is the number of edges.

**What changed:**
The original code checks every edge for every node using nested iteration: `all(e["source"] != n["id"] for e in edges)`. The optimized version pre-computes a set of all source node IDs once: `sources = {e["source"] for e in edges}`, then performs O(1) average-case membership lookups: `n["id"] not in sources`.

**Why this is faster:**
1. **Set lookup vs. linear scan**: Python set membership (`in`) is O(1) in the average case because sets are hash tables, while the `all()` check scans the edge list for each node, costing O(m) per node in the worst case
2. **Single pass vs. repeated iteration**: The set is built once in O(m) time, then reused for all n nodes. The original code iterates through the m edges again for each of the n nodes
3. **Algorithmic complexity reduction**: Total complexity drops from O(n×m) to O(n+m), and the gap widens as graph size grows

**Performance characteristics by test case:**
- **Small graphs** (2-5 nodes/edges): 26-100% faster - overhead of set creation is negligible
- **Medium graphs** (100 nodes/edges): 97-3193% faster - set lookup advantage becomes clear
- **Large graphs** (500+ nodes/edges): 418-16229% faster - the O(n×m) vs O(n+m) difference dominates
- **Dense graphs** (many edges, few nodes): 112-6874% faster - particularly benefits from avoiding repeated edge iteration

The line profiler confirms this: the original code spent 100% of its time (46.7ms) in the nested iteration, while the optimized version spends 43.8% (149μs) building the set and 56.2% (191μs) doing lookups, for a total of 340μs vs 46.7ms.

**Impact considerations:**
Without `function_references`, we cannot determine whether this function is in a hot path. Given that it processes graph structures (potentially in flow/workflow systems, based on "flow" in the docstring), any system that repeatedly queries for terminal nodes would benefit significantly, especially as graph size scales.
---
 src/algorithms/graph.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/algorithms/graph.py b/src/algorithms/graph.py
index 777ea3b..f23d356 100644
--- a/src/algorithms/graph.py
+++ b/src/algorithms/graph.py
@@ -47,7 +47,8 @@ def find_shortest_path(self, start: str, end: str) -> list[str]:
 
 def find_last_node(nodes, edges):
     """This function receives a flow and returns the last node."""
-    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)
+    sources = {e["source"] for e in edges}
+    return next((n for n in nodes if n["id"] not in sources), None)
 
 
 def find_leaf_nodes(nodes: list[dict], edges: list[dict]) -> list[dict]:
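Reviewer note: the change can be sanity-checked locally with a small script along the following lines. This is only a sketch and not part of the patch. The names `find_last_node_nested`, `find_last_node_set`, and `make_chain` are hypothetical, the chain-shaped graph is an assumed test input, and the timings it prints will vary by machine, so they are not expected to reproduce the figures quoted in the commit message.

```python
import timeit


def find_last_node_nested(nodes, edges):
    """Pre-patch logic: scans the edge list once per node, O(n*m) worst case."""
    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)


def find_last_node_set(nodes, edges):
    """Patched logic: one pass to collect sources, then average O(1) lookups."""
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


def make_chain(size):
    """Hypothetical test graph 0 -> 1 -> ... -> size-1; only the last node has no outgoing edge."""
    nodes = [{"id": i} for i in range(size)]
    edges = [{"source": i, "target": i + 1} for i in range(size - 1)]
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = make_chain(500)

    # Both versions should return the same node for the same input.
    assert find_last_node_nested(nodes, edges) == find_last_node_set(nodes, edges)

    # Rough timing comparison; absolute numbers depend on the machine.
    for fn in (find_last_node_nested, find_last_node_set):
        elapsed = timeit.timeit(lambda: fn(nodes, edges), number=20)
        print(f"{fn.__name__}: {elapsed:.4f}s for 20 calls")
```

One behavioral difference worth checking during review: the set-based version requires every `e["source"]` value to be hashable, whereas the nested version only compares values with `!=`. For string or integer IDs the two should behave identically.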