-
Notifications
You must be signed in to change notification settings - Fork 27
[Optimization 1/n] Add Profiling Module (NCU Wrapper Generation & Kernel Profiling) #70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| try: | ||
| from jinja2 import Template | ||
|
|
||
| HAS_JINJA2 = True | ||
| except ImportError: | ||
| HAS_JINJA2 = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we do this check in main too, but we don't need to
We should assume that users have a correct environment for common imports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - fixed
| logger: Logger instance | ||
| """ | ||
| self.logger = logger | ||
| self._template_cache: Optional[Template] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer | None
| if self._template_cache is not None: | ||
| return self._template_cache | ||
|
|
||
| if not HAS_JINJA2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto above
| HAS_JINJA2 = False | ||
|
|
||
|
|
||
| class NCUWrapperGenerator: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generator is a loaded term maybe Factory?
| class NCUWrapperGenerator: | |
| class NCUWrapperFactory: |
| __all__ = [ | ||
| "METRICS", | ||
| "METRIC_COLUMNS", | ||
| "profile_triton_kernel", | ||
| "load_ncu_metrics", | ||
| "metrics_to_prompt", | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we need this?
| metrics_file = self.logs_dir / f"round{round_num:03d}_ncu_metrics.json" | ||
|
|
||
| # Build metadata | ||
| metadata = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See other comment, let's make a dataclass to pass around
| benchmark_script: Path, | ||
| workdir: Path, | ||
| out_csv: str = "ncu_output.csv", | ||
| python_executable: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we passed this to NCU command to specify the python executable for the benchmark script
| sub = pd.concat(results, ignore_index=True) | ||
| else: | ||
| sub = pd.DataFrame(columns=keep_cols) | ||
| elif select in ("first", "last", "max_cycles"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this an enum
|
Thanks again for modularizing everything! Few question/comments:
|
Thanks for reviewing this PR! For each |
|
Could you rebase/update the series onto latest |
29fc219 to
543453a
Compare
|
|
||
| [tool.setuptools.packages.find] | ||
| include = ["triton_kernel_agent*", "Fuser*", "scripts", "utils"] | ||
| include = ["triton_kernel_agent*", "kernel_perf_agent*", "Fuser*", "scripts", "utils"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing namespace config or init.py to pick up kernel_perf_agent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Add the init.py in the latest commit
|
|
||
| # Build NCU command | ||
| cmd = [ | ||
| "sudo", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we consider not forcing hardcode sudo in library code and make it opt-in instead? We can change profile_triton_kernel(... ) to accept use_sudo: bool = False (and/or read env like KERNELAGENT_NCU_USE_SUDO=1. If the non-sudo run fails with a known permission error (NCU often reports profiling permission restrictions), raise a clearer error: “NCU requires admin profiling permissions on this system; rerun with use_sudo=True or adjust driver setting / cluster policy.”
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great feedback! I've updated this in the latest commit
This PR introduces a new modular profiling infrastructure for Triton kernel performance analysis using NVIDIA Nsight Compute (NCU). The module provides wrapper script generation, profiling execution, and metrics collection.
Core Components
1. NCUWrapperGenerator (
ncu_wrapper_generator.py)2. KernelProfiler (
kernel_profiler.py)3. Jinja2 Template (
ncu_wrapper_template.j2)kernel_function(*inputs)ModelinstancesTest Results:
The profiling module successfully extracts NCU metrics:
{ "metrics": { "void at::vectorized_elementwise_kernel<4, ...>": { "dram__throughput.avg.pct_of_peak_sustained_elapsed": 91.25, "sm__warps_active.avg.pct_of_peak_sustained_active": 88.6, ... } }, "metadata": { "kernel_file": "...", "round_num": 1, "timestamp": "2026-01-07T22:25:46.928773Z" } }Next steps:
This module is designed to add profiling functionality to future integration into
opt_worker.py. Future PRs will:opt_worker.py