add collective dependencies as data_deps #191
Conversation
Hi @theodorbadea. Thank you for this PR. I would also like to better understand the redundant check.
Hey @JoongunPark, thanks for your feedback on this PR. That check is redundant because child nodes are only added to the stack if they are not in the visited list:
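For illustration, a minimal sketch of the traversal pattern in question (not the actual converter code; the node fields are assumed). Because children are marked visited at push time, a second membership check after popping can never trigger:

```python
def traverse(root):
    """Iterative DFS that marks nodes visited when they are pushed."""
    visited = {id(root)}
    stack = [root]
    while stack:
        node = stack.pop()
        # process(node) ...
        for child in node.children:
            if id(child) not in visited:  # the only check that is needed
                visited.add(id(child))
                stack.append(child)
```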
Ah, yes. You are right. I agree that the original logic seems a bit redundant and hard to understand.
Thank you for your patience, @theodorbadea! And I am sorry for the delayed testing. Result log from ASTRA-Sim:
After discussing it with the group in our last Chakra meeting, we concluded that this PR needs further discussion.
@theodorbadea @JoongunPark |
I examined a trace collected from data-parallel distributed training and found that when different processes belonging to the same communication group perform an allreduce, the recorded pg_name may not be equal. Therefore, it is not entirely accurate to infer the dependency from pg_name. kineto_trace_zbpp_gpt.0.json
@9LLPPLL6 My understanding is that two collectives from the same PG cannot overlap; in reality, at least in NCCL's case, they do not, because NCCL takes care of scheduling them. We ran into overlapping collectives in emulation, which is why I proposed this change: to capture these dependencies explicitly. There have been opinions that the engine should be responsible for this scheduling. In your traces, did you notice any overlaps, in terms of start time plus duration, between collectives in the same PG or between collectives from different PGs?
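For reference, a rough sketch of how one could scan a Kineto trace for such overlaps (a sketch only; the name filter and the process-group key in args are assumptions and may differ across PyTorch versions):

```python
import json
from itertools import combinations

PG_KEY = "Process Group Name"  # assumed args key; verify against your trace

def find_overlapping_collectives(path):
    """Return pairs of collective events whose [ts, ts + dur) intervals overlap."""
    with open(path) as f:
        events = json.load(f)["traceEvents"]
    colls = [e for e in events if "nccl" in e.get("name", "").lower() and "dur" in e]
    overlaps = []
    for a, b in combinations(colls, 2):  # O(n^2), fine for a quick check
        if a["ts"] < b["ts"] + b["dur"] and b["ts"] < a["ts"] + a["dur"]:
            same_pg = a.get("args", {}).get(PG_KEY) == b.get("args", {}).get(PG_KEY)
            overlaps.append((a["name"], b["name"], same_pg))
    return overlaps
```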
Perhaps there was a gap in my understanding. Thank you for your answer; I understand what you mean.
Force-pushed from fef88bb to 89d8266.
Closing the PR for now. The consensus was to keep the assumption that one process group can issue only one collective at any given time. Let's reopen if we want to revisit.



Currently, there is no explicit dependency between collectives belonging to the same process group. Their sequential execution is inferred only from the duration of their parent compute node and their own duration.
This PR proposes appending the previous collective belonging to the same PG to the data_deps list.
I am attaching an example containing five All_Reduce collectives in the same PG. Before the proposed change, each collective had only one node in its data_deps, i.e., the parent COMP_NODE named "record_param_comms". After the change, the first All_Reduce in the PG remains unchanged, but each subsequent one has two nodes in data_deps: the COMP_NODE "record_param_comms" and the previous collective communication node.
I am also attaching part of a diagram showing the collectives without explicit dependencies (they are all leaf nodes) and with the newly added dependencies. The cropped-out nodes above the collectives are the parent COMP_NODE.
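For illustration, a minimal sketch of the idea (assumed node shape with `id`, `pg_name`, and `data_deps` attributes; not the actual converter code):

```python
def link_collectives_by_pg(collective_nodes):
    """Append the previous collective of the same process group to each node's data_deps."""
    last_per_pg = {}
    for node in collective_nodes:  # assumed to be in issue order for one rank
        prev = last_per_pg.get(node.pg_name)
        if prev is not None and prev not in node.data_deps:
            node.data_deps.append(prev)
        last_per_pg[node.pg_name] = node.id
```

The first collective in each PG keeps its original data_deps; every later one gains one extra dependency, matching the attached new_chakra_jsonized.json example.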


kineto.json
new_chakra_jsonized.json
original_chakra_jsonized.json
p_trace_rank=0_.json