Feature/compute trace #88
Conversation
Commits in this PR:
- …ndling with headers, error handling, and thread safety
- Fix star imports and add error handling for NVML initialization
- Fix the CSV reading to match the format written by execution.py
- Return a structured error message or dictionary instead
- Use file locking
Reviewer's Guide

This PR instruments the inference pipeline to record per-image execution time and GPU metadata into a persistent CSV trace, exposes an internal endpoint to compute a compact time-and-GPU badge, and integrates a frontend badge component for real-time display.

Class diagram for the InferenceBadge frontend component

```mermaid
classDiagram
    class InferenceBadge {
        - badge: string
        + useEffect()
        + setBadge()
        + render()
    }
```

Class diagram for backend inference timing and CSV trace

```mermaid
classDiagram
    class Execution {
        + execute()
        + get_gpu_info()
        + write_inference_trace()
    }
    class CSVTrace {
        + node_id
        + inference_time
        + gpu_name
        + gpu_driver
    }
    Execution -- CSVTrace : writes to
```

Class diagram for the API badge endpoint

```mermaid
classDiagram
    class InternalRoutes {
        + get_inference_badge(request)
    }
```
Walkthrough

A new backend API endpoint and React component were added to provide real-time inference speed and GPU information as a badge. The backend now logs inference timing and GPU details to a CSV file, and exposes this data via an internal API. The frontend displays the badge by fetching from this endpoint. Minor dependency updates were also made.
Sequence Diagrams

Inference Badge Fetch Flow

```mermaid
sequenceDiagram
    participant User
    participant Badge as InferenceBadge (React)
    participant API as API Server
    participant CSV as CSV File
    User ->> Badge: Load component
    Badge ->> API: GET /internal/inference/badge
    API ->> CSV: Read inference_trace.csv
    CSV -->> API: Return timing & GPU info
    API -->> Badge: JSON { badge: "..." }
    Badge -->> User: Display badge
```

Inference Execution Logging Flow

```mermaid
sequenceDiagram
    participant Engine as Execution Engine
    participant GPU
    participant CSV as CSV File
    Engine ->> GPU: Query name & driver
    Engine ->> Engine: Run node, measure wall time
    Engine ->> CSV: Append timing, GPU info
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~18 minutes
Hey @seacoastcreates - I've reviewed your changes and found some issues that need to be addressed.

Blocking issues:

- Function `get_inference_badge` is defined inside a function but never used (link)
- Variance calculation and CSV appending may cause file corruption due to an incorrect file handle (link)

General comments:

- The timing and CSV I/O logic is deeply embedded in the `execute` function—consider extracting it into a separate tracing service or helper that appends rows rather than rewriting the file on every node execution.
- There’s an inconsistency between your CSV header (`inference_time`) and the badge endpoint’s field name (`wall_time_sec`), and the hardcoded `inference_trace.csv` path reduces flexibility—align these names and make the trace file location configurable.
- The `get_gpu_info()` function is never actually called before writing traces, so `gpu_name`/`gpu_driver` may be undefined—invoke it once at startup (or cache its result) instead of inside the hot execution path.
## Individual Comments

### Comment 1

Location: `ComfyUI/execution.py:270`

Code context:

```python
    else:
        return str(x)

def get_gpu_info():
    try:
        import pynvml
```

Issue: `get_gpu_info()` is defined but never called in the new code. If you intend to record GPU info in the CSV, call this function and use its output. Otherwise, consider removing it as unused code.
### Comment 2

Location: `ComfyUI/execution.py:418`

Code context:

```python
if execution_list.is_empty():
```

Issue: Variance calculation and CSV appending may cause file corruption due to an incorrect file handle. `CSV_LOCK` is a `threading.Lock`, not a file path, so passing it to `open()` will raise an exception. Use `CSV_PATH` in the `open()` call instead.
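To make the failure mode concrete, here is a minimal sketch (the `CSV_LOCK`/`CSV_PATH` names are taken from the diff) showing that passing a `threading.Lock` where a path is expected fails immediately:

```python
import threading

CSV_LOCK = threading.Lock()
CSV_PATH = "inference_trace.csv"  # assumed trace-file path from the PR

# open() expects str, bytes, or os.PathLike -- a Lock object is none of
# these, so the buggy call raises TypeError before any data is written.
try:
    open(CSV_LOCK, "a")
except TypeError as exc:
    print(f"open(CSV_LOCK) raises: {exc}")
# The corrected call is open(CSV_PATH, "a"), inside a `with CSV_LOCK:` block.
```

Because the exception fires on every summary write, no corrupt rows are produced, but the summary is silently lost inside the surrounding `except` block.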
### Comment 3

Location: `ComfyUI/api_server/routes/internal/internal_routes.py:68`

Code context:

```python
@self.routes.get('/inference/badge')
```

Issue: The badge endpoint averages all rows, including possible summary rows. Explicitly filter out rows with `node_id` values like 'mean', 'stddev', or 'tolerance' to ensure summary statistics are not included in the average, rather than relying on catching `ValueError`.

Suggested implementation:

```python
# Read CSV without assuming a header; columns match the writing format
with open(csv_path, newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        if len(row) < 3:
            continue
        node_id, time, gpu = row[0], row[1], row[2]
        # Explicitly filter out summary/statistic rows
        if node_id in ('mean', 'stddev', 'tolerance'):
            continue
        try:
            int(node_id)  # keep only rows with a numeric node id
            times.append(float(time))
            gpu_name = gpu
        except ValueError:
            continue
```
## Security Issues

### Issue 1

Location: `ComfyUI/api_server/routes/internal/internal_routes.py:68`

**security (python.lang.maintainability.useless-inner-function):** function `get_inference_badge` is defined inside a function but never used

*Source: opengrep*
```python
    else:
        return str(x)

def get_gpu_info():
```

issue: `get_gpu_info()` is defined but never called in the new code. If you intend to record GPU info in the CSV, call this function and use its output. Otherwise, consider removing it as unused code.
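One way to act on this (and on the general comment about moving the query out of the hot path) is to cache the result after a single pynvml probe. A sketch, assuming the return shape from the diff; it falls back to "Unknown" whenever pynvml or a GPU is absent:

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_gpu_info_cached():
    """Query GPU name/driver once; later calls return the cached tuple."""
    try:
        import pynvml
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        name = pynvml.nvmlDeviceGetName(handle)
        driver = pynvml.nvmlSystemGetDriverVersion()
        # Older pynvml builds return bytes, newer ones return str
        decode = lambda v: v.decode() if isinstance(v, bytes) else v
        return decode(name), decode(driver)
    except Exception:
        return "Unknown", "Unknown"

# In the execution path, before timing starts:
# gpu_name, gpu_driver = get_gpu_info_cached()
```

With `lru_cache`, only the first node execution pays the NVML initialization cost; every later call is a dictionary lookup.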
```python
        if execution_list.is_empty():
            import pandas as pd
            import numpy as np

            try:
                df = pd.read_csv(CSV_PATH, names=["node_id", "inference_time", "gpu_name", "gpu_driver"])
                mean = np.mean(df["inference_time"])
                stddev = np.std(df["inference_time"])
                tolerance = stddev / mean
```

issue (bug_risk): Variance calculation and CSV appending may cause file corruption due to an incorrect file handle. `CSV_LOCK` is a `threading.Lock`, not a file path, so passing it to `open()` will raise an exception. Use `CSV_PATH` in the `open()` call instead.
```python
@self.routes.get('/inference/badge')
async def get_inference_badge(request):
    csv_path = "inference_trace.csv"
    if not os.path.exists(csv_path):
        return web.json_response({"badge": "N/A"})

    times = []
    gpu_name = "Unknown"

    # Read CSV with explicit fieldnames to match writing format (no header)
    with open(csv_path, newline='') as csvfile:
        reader = csv.DictReader(csvfile, fieldnames=["node_id", "wall_time_sec", "gpu_name", "gpu_driver"])
        for row in reader:
            # Skip summary rows if present
            try:
                times.append(float(row["wall_time_sec"]))
            except ValueError:
                continue
            gpu_name = row["gpu_name"]

    if not times:
        return web.json_response({"badge": "N/A"})

    avg_time = sum(times) / len(times)
    badge = f"{avg_time:.2f} s per image · {gpu_name}"
    return web.json_response({"badge": badge})
```

suggestion (bug_risk): The badge endpoint averages all rows, including possible summary rows. Explicitly filter out rows with `node_id` values like 'mean', 'stddev', or 'tolerance' rather than relying on catching `ValueError` (see Comment 3 above for the suggested implementation).

security (python.lang.maintainability.useless-inner-function): function `get_inference_badge` is defined inside a function but never used

*Source: opengrep*
Actionable comments posted: 3
🧹 Nitpick comments (4)

ComfyUI/execution.py (2)

360-365: `elapsed_time` computed but not recorded if the write fails

`elapsed_time` is only passed to `write_inference_trace`; if that inner call raises, the outer try swallows the error and continues, losing the timing. Consider an early return on failure, or move the timing logic outside the broad try-except.

366-384: CSV write reads the entire file on each node – O(N²)

Reading and rewriting the whole CSV on every node execution is O(rows²). Append-only writes with `f.seek(0, 2)` and a simple row counter (or rotating files) would scale better.

dream_layer_frontend/src/components/InferenceBadge.tsx (1)

13-23: Surface backend errors in the UI

Right now any non-Abort error logs to the console and shows a generic "⚠️ Error". Consider exposing the actual message in a tooltip for easier debugging.

ComfyUI/api_server/routes/internal/internal_routes.py (1)

78-93: Blocking file I/O inside an aiohttp handler

`open()` and CSV parsing are synchronous; large trace files will block the event loop. Use `aiofiles` or run the read in a thread executor.
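The thread-executor variant can be sketched as follows. This is a minimal illustration, not the PR's code: `_read_average` and `get_inference_badge_data` are hypothetical names, and the column layout is assumed from the trace format in the diff.

```python
import asyncio
import csv
import os

def _read_average(csv_path):
    """Synchronous CSV scan; returns (avg_time, gpu_name) or None."""
    if not os.path.exists(csv_path):
        return None
    times, gpu_name = [], "Unknown"
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 3:
                continue
            if row[0] in ("mean", "stddev", "tolerance"):
                continue  # skip summary rows
            try:
                times.append(float(row[1]))
            except ValueError:
                continue  # skip header rows
            gpu_name = row[2]
    if not times:
        return None
    return sum(times) / len(times), gpu_name

async def get_inference_badge_data(csv_path="inference_trace.csv"):
    # Python 3.9+: run the blocking parse in a worker thread,
    # keeping the aiohttp event loop responsive.
    return await asyncio.to_thread(_read_average, csv_path)
```

Inside the aiohttp handler the coroutine would be awaited in place of the direct `open()` call; the event loop is free while the worker thread scans the file.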
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

⛔ Files ignored due to path filters (1)

- `dream_layer_frontend/package-lock.json` is excluded by `!**/package-lock.json`

📒 Files selected for processing (4)

- `ComfyUI/api_server/routes/internal/internal_routes.py` (2 hunks)
- `ComfyUI/execution.py` (6 hunks)
- `dream_layer_frontend/package.json` (1 hunks)
- `dream_layer_frontend/src/components/InferenceBadge.tsx` (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-14T22:55:51.063Z
Learnt from: rockerBOO
PR: DreamLayer-AI/DreamLayer#28
File: dream_layer_frontend/src/components/WorkflowCustomNode.tsx:88-106
Timestamp: 2025-07-14T22:55:51.063Z
Learning: In the DreamLayer frontend codebase, the team prefers to rely on TypeScript's type system for data validation rather than adding defensive programming checks, when the types are well-defined and data flow is controlled.
Applied to files:
dream_layer_frontend/package.json
📚 Learning: 2025-07-14T23:02:36.431Z
Learnt from: rockerBOO
PR: DreamLayer-AI/DreamLayer#28
File: dream_layer_frontend/package.json:44-44
Timestamp: 2025-07-14T23:02:36.431Z
Learning: xyflow/react version 12.8.2 is a valid and current version available on the npm registry as of July 2025, contrary to previous incorrect information suggesting it wasn't available.
Applied to files:
dream_layer_frontend/package.json
🪛 Ruff (0.12.2)
ComfyUI/api_server/routes/internal/internal_routes.py
67-67: Blank line contains whitespace
Remove whitespace from blank line
(W293)
ComfyUI/execution.py
269-269: Blank line contains whitespace
Remove whitespace from blank line
(W293)
272-272: Trailing whitespace
Remove trailing whitespace
(W291)
364-364: Local variable elapsed_time is assigned to but never used
Remove assignment to unused variable elapsed_time
(F841)
366-366: Trailing whitespace
Remove trailing whitespace
(W291)
370-370: Blank line contains whitespace
Remove whitespace from blank line
(W293)
391-391: Blank line contains whitespace
Remove whitespace from blank line
(W293)
430-430: Undefined name gpu_name
(F821)
430-430: Undefined name gpu_driver
(F821)
431-431: Undefined name gpu_name
(F821)
431-431: Undefined name gpu_driver
(F821)
432-432: Undefined name gpu_name
(F821)
432-432: Undefined name gpu_driver
(F821)
🔇 Additional comments (2)

dream_layer_frontend/package.json (1)

69-71: Dev-dependency bump looks good

The upgrade to `@types/react@^18.3.23` is safe and aligns with the current React 18 line.

ComfyUI/execution.py (1)

366-391: Lock is local and path variable misused

- `CSV_LOCK = threading.Lock()` is re-created for every node ⇒ no mutual exclusion across threads/processes.
- Inside the lock you do `with open(CSV_LOCK, "a")` (later), which passes the lock object as a file path — will raise `TypeError`.
- `CSV_PATH` / column names mismatch (`inference_time` vs `wall_time_sec`).

```diff
-CSV_LOCK = threading.Lock()
+CSV_LOCK = globals().setdefault("_INFERENCE_CSV_LOCK", threading.Lock())
 ...
-with open(CSV_LOCK, "a") as f:
+with open(CSV_PATH, "a") as f:
```

And keep field names consistent with the header written.

Likely an incorrect or invalid review comment.
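The two fixes asked for here can also be sketched together as a small helper: a lock created once at module import (so every node execution shares it) and an append-only write against the path. `append_trace` is a hypothetical name; the column layout follows the trace format in the diff.

```python
import csv
import threading

# Module level: created once at import time, shared by all executions
# in the process (the diff re-created the lock per node, defeating it).
CSV_PATH = "inference_trace.csv"  # assumed; make configurable in practice
CSV_LOCK = threading.Lock()

def append_trace(node_id, inference_time, gpu_name, gpu_driver, path=None):
    """Append one trace row under the shared lock; never rewrites the file."""
    with CSV_LOCK:
        with open(path or CSV_PATH, "a", newline="") as f:  # path, not the lock
            csv.writer(f).writerow(
                [node_id, f"{inference_time:.4f}", gpu_name, gpu_driver]
            )
```

Note the lock only serializes threads within one process; cross-process safety would need file locking (e.g. `fcntl.flock`) as the commit messages mention.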
```python
reader = csv.DictReader(csvfile, fieldnames=["node_id", "wall_time_sec", "gpu_name", "gpu_driver"])
for row in reader:
    # Skip summary rows if present
    try:
```

Field-name mismatch with the writer

The writer uses `inference_time`, the reader expects `wall_time_sec`; the average will always be empty when the names differ. Align the column names.

🤖 Prompt for AI Agents

In ComfyUI/api_server/routes/internal/internal_routes.py around lines 79 to 82, the CSV reader uses the field name "wall_time_sec" while the writer uses "inference_time", causing a mismatch and resulting in empty averages. To fix this, update the reader's fieldnames list to use "inference_time" instead of "wall_time_sec" so that the column names align correctly.
```python
def get_gpu_info():
    try:
        import pynvml
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        gpu_name = pynvml.nvmlDeviceGetName(handle).decode()
        gpu_driver = pynvml.nvmlSystemGetDriverVersion().decode()
    except Exception:
        gpu_name = "Unknown"
        gpu_driver = "Unknown"
    return gpu_name, gpu_driver
```

🛠️ Refactor suggestion

`get_gpu_info()` never used

The helper returns GPU data, but nothing assigns `gpu_name, gpu_driver = get_gpu_info()` before they are referenced later, leading to a `NameError`.

```diff
+    gpu_name, gpu_driver = get_gpu_info()
```

Place this immediately before `start_time = time.perf_counter()`.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools

🪛 Ruff (0.12.2): 272-272 Trailing whitespace (W291)

🤖 Prompt for AI Agents

In ComfyUI/execution.py around lines 270 to 280, the function get_gpu_info() is defined but never called, causing gpu_name and gpu_driver to be undefined later and resulting in a NameError. To fix this, add a call to get_gpu_info() and assign its returned values to gpu_name and gpu_driver variables immediately before the line where start_time = time.perf_counter() is set, ensuring these variables are properly initialized before use.
```python
try:
    df = pd.read_csv(CSV_PATH, names=["node_id", "inference_time", "gpu_name", "gpu_driver"])
    mean = np.mean(df["inference_time"])
    stddev = np.std(df["inference_time"])
    tolerance = stddev / mean

    with CSV_LOCK:
        with open(CSV_LOCK, "a") as f:
            f.write(f"mean,{mean:.4f},{gpu_name},{gpu_driver}\n")
            f.write(f"stddev,{stddev:.4f},{gpu_name},{gpu_driver}\n")
            f.write(f"tolerance,{tolerance:.4f},{gpu_name},{gpu_driver}\n")
except Exception as e:
    logging.error(f"Failed to append variance to CSV: {e}")
```

Undefined identifiers & summary write bug

`gpu_name` and `gpu_driver` used here are out of scope (see the first comment) and will crash. Additionally, `with open(CSV_LOCK, "a")` repeats the path error.

🧰 Tools

🪛 Ruff (0.12.2): 430-432 Undefined name `gpu_name` / `gpu_driver` (F821)

🤖 Prompt for AI Agents

In ComfyUI/execution.py around lines 422 to 434, the variables gpu_name and gpu_driver are used but not defined in this scope, causing a crash. Also, the code mistakenly uses CSV_LOCK as the file path in the open() call instead of the CSV_PATH. To fix this, ensure gpu_name and gpu_driver are properly retrieved or passed into this block before use, and replace CSV_LOCK with CSV_PATH in the open() call to append to the correct CSV file.
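A corrected version of this summary block can be sketched as below. `append_summary` is a hypothetical helper name; `statistics` stands in for the pandas/numpy pair so the sketch stays self-contained, with `pstdev` matching `np.std`'s population default. It assumes GPU info is passed in (e.g. from a startup-time `get_gpu_info()` call) and writes to the CSV path, not the lock.

```python
import csv
import statistics
import threading

CSV_LOCK = threading.Lock()  # assumed module-level, shared across executions

def append_summary(csv_path, gpu_name, gpu_driver):
    """Recompute mean/stddev over the data rows and append summary lines."""
    times = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            # Skip earlier summary rows so re-runs don't skew the stats
            if not row or row[0] in ("mean", "stddev", "tolerance"):
                continue
            try:
                times.append(float(row[1]))
            except (IndexError, ValueError):
                continue
    if len(times) < 2:
        return
    mean = statistics.mean(times)
    stddev = statistics.pstdev(times)  # population stddev, matching np.std
    tolerance = stddev / mean
    with CSV_LOCK:
        with open(csv_path, "a", newline="") as f:  # the path, not the lock
            w = csv.writer(f)
            w.writerow(["mean", f"{mean:.4f}", gpu_name, gpu_driver])
            w.writerow(["stddev", f"{stddev:.4f}", gpu_name, gpu_driver])
            w.writerow(["tolerance", f"{tolerance:.4f}", gpu_name, gpu_driver])
```

Filtering summary rows before recomputing also keeps the statistics stable when the function runs more than once on the same file.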
Description
Captures wall time per image for the inference block and records device information such as GPU name and driver. UI shows a compact badge like “2.4 s per image · A6000”. CSV includes the same fields. Variance tolerance is documented.
Evidence

Generated Image: No API credits to generate image.
Summary by Sourcery

Introduce end-to-end inference tracing by measuring per-node execution time and GPU details, storing the data in a CSV, and exposing an API and UI badge for average inference time and GPU identification.