[REQUEST]some question about memory and latency analysis

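Your open-source project llm-analysis has helped me a great deal, but I still have a few questions and would appreciate it if you could take the time to answer them.
      1. In analysis.py, in the function "get_memory_optimizer_state_and_gradient_per_layer", "self.get_num_params_per_layer_layernorm()" is divided by "self.parallelism_config.tp_size" to obtain "memory_optimizer_state_others_per_layer". Does tensor parallelism (TP) really shard the optimizer states and gradients of the LayerNorm (LN) layers?
      2. In "get_latency_fwd_per_tp_comm" in analysis.py, the efficiency of inter-node communication is not taken into account, although it is taken into account everywhere else. Why is that?
      3. In "get_latency_fwd_per_layer_shared_dp_comm" in analysis.py, inter-node communication is used if dp_size <= 8, and intra-node communication is used otherwise. Is this a mistake? dp_size does not seem to be related to the choice of communication link.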
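To make question 1 concrete, here is a minimal sketch of the arithmetic I have in mind. It is not code from llm-analysis: the function name, the assumption of two LayerNorms per layer (each with a weight and a bias of size hidden_dim), and the 12 bytes of optimizer state per parameter are all my own illustrative assumptions. It only compares the per-GPU result when the LN parameter count is divided by tp_size against the result when it is not.

```python
def layernorm_optimizer_state_bytes(
    hidden_dim: int,
    tp_size: int,
    shard_across_tp: bool,
    num_layernorms_per_layer: int = 2,   # assumption: two LayerNorms per transformer layer
    bytes_per_param: int = 12,           # assumption: fp32 master copy + two Adam moments
) -> float:
    """Per-GPU optimizer-state bytes for the LayerNorm parameters of one layer.

    Hypothetical helper for illustration only; it is not part of llm-analysis.
    """
    # Each LayerNorm is assumed to hold a weight and a bias vector of size hidden_dim.
    num_params = num_layernorms_per_layer * 2 * hidden_dim
    if shard_across_tp:
        # Mirrors the division by tp_size that question 1 asks about.
        num_params = num_params / tp_size
    return num_params * bytes_per_param


if __name__ == "__main__":
    hidden_dim, tp_size = 4096, 8
    sharded = layernorm_optimizer_state_bytes(hidden_dim, tp_size, shard_across_tp=True)
    replicated = layernorm_optimizer_state_bytes(hidden_dim, tp_size, shard_across_tp=False)
    print(f"sharded across TP:    {sharded:.0f} bytes per layer per GPU")
    print(f"replicated across TP: {replicated:.0f} bytes per layer per GPU")
```

Under these assumptions the two results differ by exactly a factor of tp_size, so the absolute difference is small compared with the linear layers, but I would still like to understand which one is intended.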
      I would be very grateful for your guidance.
      Best regards!



[REQUEST]some question about memory and latency analysis #27
