chore: upgrade HolisticTraceAnalysis to fix the AMD GPU issue #4
Summary
This pull request upgrades the Holistic Trace Analysis (HTA) dependency from commit d731cc2e2249976c97129d409a83bd53d93051f6 to version v0.5.0, which includes native AMD GPU support and resolves critical path analysis issues in ROCm environments.
Problem Addressed: The previous HTA version had a known issue where TraceCounters._get_queue_length_time_series_for_rank() could return None when processing ROCm/HIP traces, causing TypeError: 'NoneType' object is not subscriptable in facebookresearch/holistictraceanalysis/tree/main/hta/analyzers/critical_path_analysis.py. This required manual patching to handle the None case.
Solution: HTA v0.5.0 (released May 29, 2024) officially added "support for AMD GPUs" and includes fixes for queue length processing issues. The official codebase now computes queue length data from the full trace (t_full) rather than from clipped dataframes, eliminating the need for manual patches.
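For reviewers unfamiliar with the failure mode, the kind of None guard that the manual patch had to add can be sketched as follows. This is a minimal, self-contained illustration: lookup_queue_lengths and max_queue_length are hypothetical stand-ins, not HTA functions, and the sketch does not reproduce the contents of fix_hta_critical_path.py.

```python
from typing import Dict, List, Optional


def lookup_queue_lengths(
    traces: Dict[int, Optional[List[int]]], rank: int
) -> Optional[List[int]]:
    """Hypothetical stand-in for a per-rank lookup that may return None."""
    return traces.get(rank)


def max_queue_length(traces: Dict[int, Optional[List[int]]], rank: int) -> int:
    """Return the largest queue length for a rank, treating missing data as 0."""
    series = lookup_queue_lengths(traces, rank)
    if series is None:
        # Without this guard, indexing the result (e.g. series[0]) raises
        # TypeError: 'NoneType' object is not subscriptable.
        return 0
    return max(series, default=0)
```

With HTA v0.5.0 this sort of caller-side guard should no longer be needed, since the queue length data is taken from the full trace upstream.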
Changes Made:
Updated the Dockerfile to use HTA v0.5.0 instead of the legacy commit
Removed dependency on manual HTA patches (fix_hta_critical_path.py can now be safely deleted)
Updated project documentation to reflect the complete PyTorch → Chakra ET → ASTRA-sim workflow
Benefits:
Native ROCm/AMD GPU compatibility without manual patches
More stable critical path analysis for ROCm traces
Cleaner codebase with fewer workarounds
Better alignment with official HTA development
Test Plan
Testing Environment:
Docker environment with ROCm/PyTorch base image
AMD GPU hardware with ROCm 6.x drivers
Multi-GPU distributed training setup (2 GPUs)
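One possible sanity check inside the rebuilt container is to confirm that the installed HTA version is at least 0.5.0. The sketch below is an assumption-laden example rather than part of this PR: it assumes the distribution is registered under the name HolisticTraceAnalysis and that the version string is plain dotted numbers with three components (a build installed from a git tag may report something different).

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain numeric version such as 'v0.5.0' or '0.5.0' into ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def version_at_least(installed: str, required: str) -> bool:
    """Compare two plain numeric versions component by component."""
    return parse_version(installed) >= parse_version(required)


def installed_hta_version(dist_name: str = "HolisticTraceAnalysis") -> Optional[str]:
    """Return the installed version, or None if the distribution is absent.

    The default distribution name is an assumption; adjust it if the
    package is registered under a different name in your image.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_hta_version()
    if found is None:
        print("HTA distribution not found")
    elif version_at_least(found, "0.5.0"):
        print(f"HTA {found} is new enough")
    else:
        print(f"HTA {found} predates v0.5.0")
```

Running this as a script in the Docker image gives a quick signal before starting the multi-GPU trace collection.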