codeflash-ai bot commented on Jan 7, 2026

📄 13,556% (135.56x) speedup for find_last_node in src/algorithms/graph.py

⏱️ Runtime : 25.4 milliseconds → 186 microseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves a 135x speedup by replacing a repeated scan of the edge list with a single precomputed set of source IDs.

Key Optimization:
The original code uses a nested iteration pattern that may scan the edge list once for every node, resulting in O(n*m) worst-case complexity, where n is the number of nodes and m is the number of edges. For each candidate node, it iterates through the edges to verify that none has that node as its source.

The optimized version pre-computes a set of all source node IDs once (O(m) operation), then performs constant-time membership checks (O(1) per node) as it iterates through nodes (O(n) total). This reduces the overall complexity to O(n+m).
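
The diff itself is not reproduced in this description, so the following is only a minimal sketch of the two approaches as described above. The names `find_last_node_nested` and `find_last_node_with_set` are hypothetical, and both bodies are assumptions reconstructed from the explanation rather than the actual contents of src/algorithms/graph.py.

```python
from typing import Any, Dict, List, Optional

Node = Dict[str, Any]
Edge = Dict[str, Any]


def find_last_node_nested(nodes: List[Node], edges: List[Edge]) -> Optional[Node]:
    """Assumed shape of the original: rescan the edges for every candidate node."""
    for node in nodes:
        # all() walks the edge list until it finds an edge leaving this node.
        if all(edge["source"] != node["id"] for edge in edges):
            return node
    return None


def find_last_node_with_set(nodes: List[Node], edges: List[Edge]) -> Optional[Node]:
    """Assumed shape of the optimized version: one pass over edges, then set lookups."""
    source_ids = {edge["source"] for edge in edges}  # built once, O(m)
    for node in nodes:
        if node["id"] not in source_ids:  # average-case O(1) membership test
            return node  # early exit on the first node with no outgoing edge
    return None


nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}]
print(find_last_node_nested(nodes, edges))
print(find_last_node_with_set(nodes, edges))
```

Both sketches return the first node in list order that never appears as an edge source, or `None` when every node has an outgoing edge.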

Why This Matters:

  • Set membership is O(1) vs iterating through edges which is O(m)
  • Single pass through edges instead of up to n passes (one per node evaluated)
  • Early exit is preserved: iteration stops at the first qualifying node

Performance Characteristics:
The speedup is most dramatic on large graphs:

  • Long chains (999 nodes): 18.3ms → 59.1μs (309x faster)
  • Binary tree (500 nodes): 2.24ms → 18.5μs (120x faster)
  • Small graphs: 1-2μs → 0.5-0.8μs (still 2-3x faster)
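
A rough operation count may help explain why long chains benefit the most. The sketch below counts comparisons for a nested scan on a linear chain and compares that with the number of set insertions and membership checks. The helper names are hypothetical, and the nested loop is an assumed model of the original code (one that stops scanning as soon as it finds an outgoing edge), not a copy of it.

```python
def count_nested_comparisons(n: int) -> int:
    """Count source-vs-id comparisons a nested scan makes on a chain 0 -> 1 -> ... -> n-1."""
    nodes = [{"id": i} for i in range(n)]
    edges = [{"source": i, "target": i + 1} for i in range(n - 1)]
    comparisons = 0
    for node in nodes:
        has_outgoing = False
        for edge in edges:
            comparisons += 1
            if edge["source"] == node["id"]:
                has_outgoing = True
                break  # mirrors a short-circuiting scan
        if not has_outgoing:
            break  # first node without an outgoing edge found
    return comparisons


def count_set_operations(n: int) -> int:
    """Count set insertions plus membership checks for the same chain."""
    insertions = n - 1  # one per edge
    membership_checks = n  # every node is checked; only the last one qualifies
    return insertions + membership_checks


for n in (500, 999):
    print(n, count_nested_comparisons(n), count_set_operations(n))
```

Under these assumptions the nested scan performs about n²/2 comparisons on a chain (499,499 for 999 nodes versus 125,249 for 500 nodes), which is in line with the roughly 4x gap between the original timings reported for the 999-node (18.3ms) and 500-node (4.66ms) chain tests. The set-based count grows linearly instead (1,997 versus 999 operations).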

Even tiny test cases improve consistently (roughly 2-3x) because building the source set is cheap, while the original must scan the edge list for each node it examines, and must scan it in full to confirm the node it returns, even when that node comes first.
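
To get a feel for these measurements locally, something like the following could be used. It is only a sketch: it relies on the standard-library `timeit` module rather than Codeflash's own harness, `nested_scan` and `set_lookup` are hypothetical stand-ins for the original and optimized functions, and absolute numbers will differ from machine to machine.

```python
import timeit


def nested_scan(nodes, edges):
    # Hypothetical stand-in for the original: rescans edges for each node.
    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)


def set_lookup(nodes, edges):
    # Hypothetical stand-in for the optimized version: one set, then lookups.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


def make_chain(size):
    """Build a linear chain 0 -> 1 -> ... -> size-1."""
    nodes = [{"id": i} for i in range(size)]
    edges = [{"source": i, "target": i + 1} for i in range(size - 1)]
    return nodes, edges


def best_time(func, nodes, edges, repeat=3, number=5):
    """Best per-call time in seconds across `repeat` batches of `number` calls."""
    timer = timeit.Timer(lambda: func(nodes, edges))
    return min(timer.repeat(repeat=repeat, number=number)) / number


results = {}
for size in (3, 999):
    nodes, edges = make_chain(size)
    slow = best_time(nested_scan, nodes, edges)
    fast = best_time(set_lookup, nodes, edges)
    results[size] = (slow, fast)
    print(f"{size} nodes: nested {slow * 1e6:.2f} us, set {fast * 1e6:.2f} us")
```

Taking the minimum over several batches follows the same "best of N runs" idea as the runtime figure quoted at the top of this description.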

Trade-offs:
The optimization adds minimal memory overhead (one set storing source IDs) but dramatically reduces CPU cycles, making it beneficial across all workload sizes tested.
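
One further consequence of using a set, assuming the optimized code stores the `edge["source"]` values in a plain Python `set` as described: those values must be hashable. None of the generated tests exercise unhashable IDs (they use integers, strings, tuples and `None`), so the sketch below is illustrative only, and `nested_scan` and `set_lookup` are hypothetical stand-ins rather than the PR's code.

```python
def nested_scan(nodes, edges):
    # Hypothetical stand-in: compares IDs with !=, so any comparable value works.
    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)


def set_lookup(nodes, edges):
    # Hypothetical stand-in: stores sources in a set, so they must be hashable.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


# List-valued IDs are unusual, but they show the difference in requirements.
nodes = [{"id": ["a"]}, {"id": ["b"]}]
edges = [{"source": ["a"], "target": ["b"]}]

print(nested_scan(nodes, edges))  # the equality-based scan accepts list IDs

try:
    set_lookup(nodes, edges)
except TypeError as exc:
    print(f"set-based version failed: {exc}")
```

If callers only ever pass hashable IDs, as in all the tests here, this difference does not arise.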

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 46 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
from __future__ import annotations

import copy  # used to check that inputs are not mutated

# imports
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node

# unit tests

# -----------------------
# Basic Test Cases
# -----------------------


def test_basic_linear_chain():
    # Simple linear chain A -> B -> C. The last node (no outgoing edges) should be 'C'.
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 2.42μs -> 833ns (190% faster)


def test_empty_edges_returns_first_node():
    # When there are no edges, every node has no outgoing edges.
    # The implementation picks the first node in the nodes list.
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.25μs -> 542ns (131% faster)


def test_multiple_candidates_returns_first_candidate():
    # Multiple nodes may have no outgoing edges.
    # The function must return the first such node in the nodes list.
    nodes = [{"id": 10}, {"id": 20}, {"id": 30}]
    edges = [{"source": 20, "target": 999}]  # only node 20 has outgoing edge
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.54μs -> 667ns (131% faster)


# -----------------------
# Edge Test Cases
# -----------------------


def test_missing_edge_source_key_raises_keyerror():
    # If an edge dict lacks the 'source' key, accessing e['source'] should raise KeyError.
    nodes = [{"id": "A"}]
    edges = [{"target": "B"}]  # missing 'source'
    with pytest.raises(KeyError):
        find_last_node(nodes, edges)  # 2.42μs -> 875ns (176% faster)


def test_edges_referencing_nonexistent_node_returns_first_node():
    # Edges may reference node ids that are not present in nodes.
    # That should not cause problems; nodes that aren't referenced as sources are candidates.
    nodes = [{"id": "X"}, {"id": "Y"}]
    edges = [{"source": "Z", "target": "X"}]  # 'Z' not in nodes
    # Since no edge has source 'X' or 'Y', the first node (X) is returned
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.62μs -> 667ns (144% faster)


def test_duplicate_node_ids_with_outgoing_edges_results_in_none():
    # If duplicate node ids exist and there is an outgoing edge from that id,
    # all nodes with that id should be considered to have outgoing edges,
    # so none of them should qualify; expect None.
    nodes = [{"id": "dup"}, {"id": "dup"}]
    edges = [{"source": "dup", "target": "other"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.88μs -> 667ns (181% faster)


def test_supports_non_string_ids_like_tuples_and_none():
    # Node ids need not be strings; ensure tuples and None work correctly.
    tuple_node = {"id": (1, 2)}
    none_node = {"id": None}
    nodes = [tuple_node, none_node]
    edges = [{"source": None, "target": "X"}]  # only the None id has an outgoing edge
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.62μs -> 667ns (144% faster)


def test_inputs_are_not_mutated_by_function():
    # Ensure the function does not modify the nodes or edges collections.
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "B"}]
    nodes_copy = copy.deepcopy(nodes)
    edges_copy = copy.deepcopy(edges)
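    codeflash_output = find_last_node(nodes, edges)
    _ = codeflash_output  # 1.88μs -> 666ns (182% faster)
    # The deep copies taken before the call must still match the inputs.
    assert nodes == nodes_copy
    assert edges == edges_copy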


def test_type_mismatch_between_ids_and_sources_is_handled_by_equality():
    # If ids and sources are of different types (e.g., int vs str), Python's equality will decide.
    nodes = [{"id": 1}, {"id": "1"}]
    edges = [{"source": "1", "target": "X"}]
    # Only the second node has an outgoing edge (source == "1"), so the first node (id 1) is candidate.
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 1.54μs -> 666ns (132% faster)


def test_non_iterable_nodes_raises_typeerror():
    # Passing something that is not iterable for nodes should raise a TypeError.
    nodes = None  # not iterable
    edges = []
    with pytest.raises(TypeError):
        find_last_node(nodes, edges)  # 1.25μs -> 875ns (42.9% faster)


# -----------------------
# Large Scale Test Cases (performance and scalability under size constraints)
# - Keep structures under 1000 elements and avoid loops exceeding 1000 iterations.
# -----------------------


def test_large_scale_long_chain_999_nodes():
    # Create a long chain of 999 nodes: 0 -> 1 -> 2 -> ... -> 997 -> 998
    # The last node (id 998) has no outgoing edges and should be returned.
    size = 999  # below the 1000-element constraint
    nodes = [{"id": i} for i in range(size)]
    # Create edges for the chain up to the second-to-last node
    edges = [{"source": i, "target": i + 1} for i in range(size - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 18.3ms -> 59.1μs (30924% faster)


def test_large_scale_many_candidates_returns_first_among_many():
    # Large node list where many nodes have no outgoing edges; ensure first such node is returned.
    size = 500  # well under 1000
    # All nodes have ids 0..499
    nodes = [{"id": i} for i in range(size)]
    # Make outgoing edges only for nodes with odd ids
    edges = [{"source": i, "target": -i} for i in range(size) if i % 2 == 1]
    # Even id 0 is the first node and has no outgoing edge -> should be returned.
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output  # 11.6μs -> 7.88μs (47.6% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node

# unit tests


class TestFindLastNodeBasic:
    """Basic test cases for normal flow scenarios."""

    def test_single_node_no_edges(self):
        """Test with a single node and no edges - node should be the last node."""
        nodes = [{"id": 1, "name": "Node1"}]
        edges = []
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.29μs -> 500ns (158% faster)

    def test_two_nodes_one_edge(self):
        """Test linear flow: Node1 -> Node2, where Node2 is the last node."""
        nodes = [{"id": 1}, {"id": 2}]
        edges = [{"source": 1, "target": 2}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.88μs -> 709ns (164% faster)

    def test_three_nodes_linear_chain(self):
        """Test linear chain: Node1 -> Node2 -> Node3, where Node3 is last."""
        nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
        edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 2.29μs -> 833ns (175% faster)

    def test_string_node_ids(self):
        """Test with string node IDs instead of integers."""
        nodes = [{"id": "start"}, {"id": "end"}]
        edges = [{"source": "start", "target": "end"}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.92μs -> 708ns (171% faster)

    def test_nodes_with_additional_properties(self):
        """Test that additional node properties are preserved in result."""
        nodes = [
            {"id": 1, "label": "Start", "data": {"x": 0}},
            {"id": 2, "label": "End", "data": {"x": 100}},
        ]
        edges = [{"source": 1, "target": 2, "label": "flow"}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.88μs -> 709ns (164% faster)


class TestFindLastNodeMultiplePaths:
    """Test cases for graphs with multiple paths and branches."""

    def test_tree_structure_single_leaf(self):
        """Test tree with one root and two leaves - returns first leaf found."""
        nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
        edges = [{"source": 1, "target": 2}, {"source": 1, "target": 3}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.88μs -> 833ns (125% faster)

    def test_diamond_pattern(self):
        """Test diamond: 1 -> 2,3 -> 4, where node 4 is last."""
        nodes = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
        edges = [
            {"source": 1, "target": 2},
            {"source": 1, "target": 3},
            {"source": 2, "target": 4},
            {"source": 3, "target": 4},
        ]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 2.67μs -> 959ns (178% faster)

    def test_multiple_disconnected_components(self):
        """Test graph with multiple disconnected components."""
        nodes = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
        edges = [{"source": 1, "target": 2}, {"source": 3, "target": 4}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.88μs -> 833ns (125% faster)


class TestFindLastNodeEmptyAndNone:
    """Edge cases for empty or None inputs."""

    def test_empty_nodes_list(self):
        """Test with empty nodes list - should return None."""
        nodes = []
        edges = []
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 875ns -> 417ns (110% faster)

    def test_empty_edges_with_single_node(self):
        """Test single node with empty edges list - node is last."""
        nodes = [{"id": 1}]
        edges = []
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.29μs -> 583ns (122% faster)

    def test_empty_edges_with_multiple_nodes(self):
        """Test multiple nodes with no edges - returns first node."""
        nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
        edges = []
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.29μs -> 542ns (138% faster)


class TestFindLastNodeNoLastNode:
    """Test cases where no last node exists."""

    def test_circular_graph_two_nodes(self):
        """Test circular graph: 1 -> 2 -> 1, no last node exists."""
        nodes = [{"id": 1}, {"id": 2}]
        edges = [{"source": 1, "target": 2}, {"source": 2, "target": 1}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.92μs -> 750ns (156% faster)

    def test_circular_graph_three_nodes(self):
        """Test circular graph: 1 -> 2 -> 3 -> 1, no last node."""
        nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
        edges = [
            {"source": 1, "target": 2},
            {"source": 2, "target": 3},
            {"source": 3, "target": 1},
        ]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 2.21μs -> 792ns (179% faster)

    def test_fully_connected_graph(self):
        """Test where every node has outgoing edges."""
        nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
        edges = [
            {"source": 1, "target": 2},
            {"source": 2, "target": 3},
            {"source": 3, "target": 1},
            {"source": 1, "target": 3},
            {"source": 2, "target": 1},
            {"source": 3, "target": 2},
        ]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 2.21μs -> 1.00μs (121% faster)


class TestFindLastNodeMultipleLastNodes:
    """Test cases with multiple valid last nodes."""

    def test_two_last_nodes_returns_first(self):
        """Test with two nodes having no outgoing edges - returns first."""
        nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
        edges = [{"source": 1, "target": 2}]
        # Both nodes 2 and 3 have no outgoing edges
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.75μs -> 750ns (133% faster)

    def test_all_nodes_are_last_nodes(self):
        """Test where all nodes have no outgoing edges."""
        nodes = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
        edges = []
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.29μs -> 542ns (138% faster)

    def test_multiple_leaf_nodes_in_tree(self):
        """Test tree structure with multiple leaf nodes."""
        nodes = [{"id": 0}, {"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
        edges = [
            {"source": 0, "target": 1},
            {"source": 0, "target": 2},
            {"source": 1, "target": 3},
            {"source": 1, "target": 4},
        ]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 2.38μs -> 958ns (148% faster)


class TestFindLastNodeComplexStructures:
    """Test cases for complex graph structures."""

    def test_self_loop_not_last_node(self):
        """Test node with self-loop - it has outgoing edge so not last."""
        nodes = [{"id": 1}, {"id": 2}]
        edges = [{"source": 1, "target": 1}, {"source": 1, "target": 2}]  # Self-loop
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.88μs -> 833ns (125% faster)

    def test_self_loop_only_node(self):
        """Test single node with only self-loop - not a last node."""
        nodes = [{"id": 1}]
        edges = [{"source": 1, "target": 1}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.38μs -> 666ns (106% faster)

    def test_multiple_edges_same_source(self):
        """Test multiple edges from the same source node."""
        nodes = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
        edges = [
            {"source": 1, "target": 2},
            {"source": 1, "target": 3},
            {"source": 1, "target": 4},
        ]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.96μs -> 875ns (124% faster)

    def test_mixed_id_types(self):
        """Test with mixed ID types (int and string)."""
        nodes = [{"id": 1}, {"id": "two"}, {"id": 3}]
        edges = [{"source": 1, "target": "two"}, {"source": "two", "target": 3}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 2.62μs -> 875ns (200% faster)

    def test_node_not_in_any_edge(self):
        """Test node that appears in nodes but not in any edge."""
        nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
        edges = [{"source": 1, "target": 2}]
        # Node 3 is not involved in any edge
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.75μs -> 667ns (162% faster)


class TestFindLastNodeDataStructureVariations:
    """Test variations in data structure formats."""

    def test_numeric_zero_as_id(self):
        """Test with 0 as a node ID (falsy but valid)."""
        nodes = [{"id": 0}, {"id": 1}]
        edges = [{"source": 0, "target": 1}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.83μs -> 750ns (144% faster)

    def test_negative_numeric_ids(self):
        """Test with negative numbers as IDs."""
        nodes = [{"id": -1}, {"id": -2}, {"id": -3}]
        edges = [{"source": -1, "target": -2}, {"source": -2, "target": -3}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 2.25μs -> 1.25μs (80.0% faster)

    def test_edges_with_extra_properties(self):
        """Test edges with additional properties beyond source/target."""
        nodes = [{"id": 1}, {"id": 2}]
        edges = [
            {
                "source": 1,
                "target": 2,
                "weight": 5,
                "label": "connection",
                "metadata": {"color": "blue"},
            }
        ]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.79μs -> 750ns (139% faster)

    def test_empty_string_as_id(self):
        """Test with empty string as node ID."""
        nodes = [{"id": ""}, {"id": "node1"}]
        edges = [{"source": "", "target": "node1"}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.92μs -> 709ns (170% faster)


class TestFindLastNodeLargeScale:
    """Large scale test cases for performance and scalability."""

    def test_large_linear_chain(self):
        """Test with a long linear chain of 500 nodes."""
        # Create chain: 0 -> 1 -> 2 -> ... -> 499
        n = 500
        nodes = [{"id": i} for i in range(n)]
        edges = [{"source": i, "target": i + 1} for i in range(n - 1)]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 4.66ms -> 29.5μs (15685% faster)

    def test_large_tree_structure(self):
        """Test with a tree of 500 nodes (binary tree structure)."""
        # Create binary tree: node i has children 2i+1 and 2i+2
        n = 500
        nodes = [{"id": i} for i in range(n)]
        edges = []
        for i in range(n // 2):
            if 2 * i + 1 < n:
                edges.append({"source": i, "target": 2 * i + 1})
            if 2 * i + 2 < n:
                edges.append({"source": i, "target": 2 * i + 2})

        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 2.24ms -> 18.5μs (12027% faster)

    def test_large_star_topology(self):
        """Test with star topology - one central node connected to 800 nodes."""
        # Central node 0 connects to nodes 1-800
        n = 800
        nodes = [{"id": i} for i in range(n + 1)]
        edges = [{"source": 0, "target": i} for i in range(1, n + 1)]

        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 31.3μs -> 16.0μs (95.8% faster)

    def test_many_edges_same_nodes(self):
        """Test with many edges between a small set of nodes."""
        nodes = [{"id": i} for i in range(10)]
        edges = []
        # Create dense graph with 500 edges
        for i in range(500):
            source = i % 9  # Sources are 0-8
            target = (i % 9) + 1  # Targets are 1-9
            edges.append({"source": source, "target": target})

        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 24.0μs -> 13.5μs (77.8% faster)

    def test_large_disconnected_components(self):
        """Test with 100 disconnected linear chains."""
        # Create 100 chains, each of length 5
        nodes = []
        edges = []
        for chain in range(100):
            for i in range(5):
                node_id = chain * 5 + i
                nodes.append({"id": node_id})
                if i < 4:  # Connect to next in chain
                    edges.append({"source": node_id, "target": node_id + 1})

        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 19.2μs -> 12.8μs (49.4% faster)


class TestFindLastNodeOrderDependence:
    """Test that function returns first matching node in iteration order."""

    def test_returns_first_last_node_in_list(self):
        """Verify that the first last node in the list is returned."""
        # Create scenario where nodes 1, 2, 3 all have no outgoing edges
        nodes = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 0}]
        edges = [{"source": 0, "target": 1}]
        codeflash_output = find_last_node(nodes, edges)
        result = codeflash_output  # 1.54μs -> 667ns (131% faster)

    def test_node_order_affects_result(self):
        """Test that different node orderings can produce different results."""
        edges = [{"source": "a", "target": "b"}]

        # Test with one ordering
        nodes_v1 = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
        codeflash_output = find_last_node(nodes_v1, edges)
        result_v1 = codeflash_output  # 1.96μs -> 750ns (161% faster)

        # Test with different ordering
        nodes_v2 = [{"id": "c"}, {"id": "b"}, {"id": "a"}]
        codeflash_output = find_last_node(nodes_v2, edges)
        result_v2 = codeflash_output  # 750ns -> 333ns (125% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
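
How Codeflash compares `codeflash_output` between the two versions is not shown in this description. As an illustration of the idea only, a behavioral equivalence check could look like the sketch below, where `original_impl` and `optimized_impl` are hypothetical stand-ins for the code before and after the change.

```python
def original_impl(nodes, edges):
    # Hypothetical stand-in for the pre-optimization behavior.
    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)


def optimized_impl(nodes, edges):
    # Hypothetical stand-in for the set-based behavior.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


# (nodes, edges) pairs modeled on scenarios from the generated tests.
cases = [
    # Linear chain A -> B -> C
    (
        [{"id": "A"}, {"id": "B"}, {"id": "C"}],
        [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}],
    ),
    # Two-node cycle: no last node exists
    ([{"id": 1}, {"id": 2}], [{"source": 1, "target": 2}, {"source": 2, "target": 1}]),
    # Empty graph
    ([], []),
    # Duplicate IDs that share an outgoing edge
    ([{"id": "dup"}, {"id": "dup"}], [{"source": "dup", "target": "other"}]),
    # Mismatched ID types: 1 versus "1"
    ([{"id": 1}, {"id": "1"}], [{"source": "1", "target": "X"}]),
]

for nodes, edges in cases:
    expected = original_impl(nodes, edges)
    actual = optimized_impl(nodes, edges)
    assert actual == expected, (nodes, edges)

print("all cases agree")
```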

To edit these changes git checkout codeflash/optimize-find_last_node-mk3kwjau and push.

codeflash-ai bot requested a review from KRRT7 on Jan 7, 2026 at 05:29
codeflash-ai bot added the labels "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) on Jan 7, 2026