chore: the HolisticTraceAnalysis fix the issue about AMD GPUs #4

jjasoncool · 2025-10-13T05:52:20Z

Summary

This pull request upgrades the Holistic Trace Analysis (HTA) dependency from commit d731cc2e2249976c97129d409a83bd53d93051f6 to version v0.5.0, which includes native AMD GPU support and resolves critical path analysis issues in ROCm environments.

Problem Addressed: The previous HTA version had a known issue where TraceCounters._get_queue_length_time_series_for_rank() could return None when processing ROCm/HIP traces, causing TypeError: 'NoneType' object is not subscriptable in hta/analyzers/critical_path_analysis.py (facebookresearch/HolisticTraceAnalysis, main branch). Working around this required a manual patch to handle the None case.
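
For readers unfamiliar with this failure mode, here is a minimal, self-contained sketch of it. It does not use HTA: get_queue_length_series and max_queue_length are hypothetical stand-ins invented for illustration, and the event layout is an assumption. It only shows why subscripting a None return raises the TypeError quoted above, and what a guard of the kind a manual patch would need looks like.

```python
from typing import Dict, List, Optional


def get_queue_length_series(events: List[dict]) -> Optional[Dict[int, List[int]]]:
    """Hypothetical stand-in for a helper that may return None.

    Returns None when there are no events to build a series from,
    mirroring the behaviour described for ROCm/HIP traces.
    """
    if not events:
        return None
    series: Dict[int, List[int]] = {}
    for event in events:
        series.setdefault(event["stream"], []).append(event["queue_length"])
    return series


def max_queue_length(events: List[dict], stream: int) -> int:
    """Guarded caller: treat a missing series as an empty queue."""
    series = get_queue_length_series(events)
    if series is None:
        # Without this check, series[stream] raises
        # TypeError: 'NoneType' object is not subscriptable
        return 0
    return max(series.get(stream, [0]))
```

An unguarded call such as get_queue_length_series([])[7] reproduces the TypeError, while max_queue_length([], 7) falls back to 0.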

Solution: HTA v0.5.0 (released May 29, 2024) officially added "support for AMD GPUs" and includes fixes for queue length processing issues. The official codebase now properly handles queue length data on clipped dataframes by using the full trace (t_full) instead, eliminating the need for manual patches.

Changes Made:

Updated the Dockerfile to use HTA v0.5.0 instead of the legacy commit
Removed dependency on manual HTA patches (fix_hta_critical_path.py can now be safely deleted)
Updated project documentation to reflect the complete PyTorch → Chakra ET → ASTRA-sim workflow
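
One way to confirm inside the container that the upgraded dependency is the one actually installed is a small version check like the sketch below. The distribution name "HolisticTraceAnalysis" is an assumption about how the package is registered with pip, and version_tuple and hta_is_at_least are hypothetical helpers, not part of HTA or of this PR.

```python
from importlib import metadata
from itertools import takewhile
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse 'v0.5.0' or '0.5' into a three-part tuple such as (0, 5, 0)."""
    parts = []
    for piece in version.lstrip("v").split("."):
        # Keep only the leading digits, so '1rc1' is read as 1.
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple((parts + [0, 0, 0])[:3])


def hta_is_at_least(minimum: str = "0.5.0",
                    dist_name: str = "HolisticTraceAnalysis") -> bool:
    """Return True if the installed distribution meets the minimum version.

    dist_name is an assumed distribution name; adjust it if the package
    is installed under a different name in your image.
    """
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)
```

Calling hta_is_at_least() during image build or at container start would flag an image that still carries the legacy commit. Note that a package installed directly from a git commit may report a version string that does not reflect the tag, so treat the result as a hint and not as proof.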

Benefits:

Native ROCm/AMD GPU compatibility without manual patches
More stable critical path analysis for ROCm traces
Cleaner codebase with fewer workarounds
Better alignment with official HTA development

Test Plan

Testing Environment:

Docker environment with ROCm/PyTorch base image
AMD GPU hardware with ROCm 6.x drivers
Multi-GPU distributed training setup (2 GPUs)

chore: the HolisticTraceAnalysis fix the issue about AMD GPUs

73e4fd5
