From a043f1611036775bb026916619bf33d191699754 Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]"
 <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Wed, 7 Jan 2026 05:19:40 +0000
Subject: [PATCH] Optimize find_last_node
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This change achieves a **53x speedup** by replacing a nested scan over nodes and edges, which is O(n×m) in the worst case for n nodes and m edges, with one pass over each collection, which is O(n+m).

**What changed:**
The original code scans the edge list for every node: `all(e["source"] != n["id"] for e in edges)`. The optimized version builds a set of all source node IDs once, `sources = {e["source"] for e in edges}`, and then tests each node with an average-case O(1) membership lookup: `n["id"] not in sources`. The result is the same as long as the source IDs are hashable, which the set requires.

**Why this is faster:**
1. **Set lookup vs. linear scan**: Python set membership (`in`) is O(1) on average because it uses a hash table, while the `all()` check walks the edge list until it finds a matching source, which costs up to O(m) per node.
2. **Single pass vs. repeated iteration**: The set is built once in O(m) time and reused for all n nodes. The original code restarts its scan of the m edges for each of the n nodes.
3. **Algorithmic complexity reduction**: Worst-case total work drops from O(n×m) to O(n+m), so the gap widens as the graph grows.
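
The two variants can be compared side by side in a minimal sketch. The function names `find_last_node_nested` and `find_last_node_set` and the three-node sample graph are made up for illustration; only the function bodies come from this patch.

```python
def find_last_node_nested(nodes, edges):
    # Original approach: rescan the edge list for every node.
    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)


def find_last_node_set(nodes, edges):
    # Optimized approach: collect the source IDs once, then do set lookups.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


# Hypothetical sample flow: a -> b -> c, so "c" has no outgoing edge.
nodes = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
edges = [{"source": "a", "target": "b"}, {"source": "b", "target": "c"}]

last_nested = find_last_node_nested(nodes, edges)
last_set = find_last_node_set(nodes, edges)
assert last_nested == last_set == {"id": "c"}
```

Both versions return the first node that never appears as an edge source, or `None` if every node has an outgoing edge.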

**Performance characteristics by test case:**
- **Small graphs** (2-5 nodes/edges): 26-100% faster - overhead of set creation is negligible
- **Medium graphs** (100 nodes/edges): 97-3193% faster - set lookup advantage becomes clear
- **Large graphs** (500+ nodes/edges): 418-16229% faster - the O(n×m) vs O(n+m) difference dominates
- **Dense graphs** (many edges, few nodes): 112-6874% faster - particularly benefits from avoiding repeated edge iteration

The line profiler confirms this: the original code spent 100% of time (46.7ms) in the nested iteration, while the optimized version spends only 43.8% (149μs) building the set and 56.2% (191μs) doing lookups - a total of 340μs vs 46.7ms.
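
A rough way to reproduce this kind of comparison locally is a `timeit` run on a synthetic chain graph. This is only a sketch: `build_chain`, the graph size of 500, and the repeat count are assumptions chosen for illustration, not the benchmark behind the numbers above, and absolute timings will vary by machine.

```python
import timeit


def find_last_node_nested(nodes, edges):
    # Original approach from the patch: nested scan.
    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)


def find_last_node_set(nodes, edges):
    # Optimized approach from the patch: precomputed set of sources.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


def build_chain(n):
    # Hypothetical helper: a chain 0 -> 1 -> ... -> n-1, where only the
    # final node has no outgoing edge, so the nested scan does the most work.
    nodes = [{"id": i} for i in range(n)]
    edges = [{"source": i, "target": i + 1} for i in range(n - 1)]
    return nodes, edges


nodes, edges = build_chain(500)

# Both variants must agree before timing them.
assert find_last_node_nested(nodes, edges) == find_last_node_set(nodes, edges)

t_nested = timeit.timeit(lambda: find_last_node_nested(nodes, edges), number=5)
t_set = timeit.timeit(lambda: find_last_node_set(nodes, edges), number=5)
print(f"nested: {t_nested:.6f}s  set: {t_set:.6f}s")
```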

**Impact considerations:**
No call-site information (`function_references`) was available, so it is unclear whether this function sits on a hot path. The docstring mentions a "flow", which suggests it is used in a flow or workflow system. Any caller that repeatedly queries for terminal nodes would benefit, and the benefit grows with graph size.
---
 src/algorithms/graph.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/algorithms/graph.py b/src/algorithms/graph.py
index 777ea3b..f23d356 100644
--- a/src/algorithms/graph.py
+++ b/src/algorithms/graph.py
@@ -47,7 +47,8 @@ def find_shortest_path(self, start: str, end: str) -> list[str]:
 
 def find_last_node(nodes, edges):
     """This function receives a flow and returns the last node."""
-    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)
+    sources = {e["source"] for e in edges}
+    return next((n for n in nodes if n["id"] not in sources), None)
 
 
 def find_leaf_nodes(nodes: list[dict], edges: list[dict]) -> list[dict]: