When using context parallelism to fine-tune gpt-oss, no attention backend is supported for this configuration with the default `cp_comm_type` (`p2p`). I have to change `cp_comm_type` to `a2a` to enable FusedAttention, but `a2a` is potentially less efficient than `p2p` at large context lengths (say 128k), since the ring-style `p2p` exchange can overlap communication with attention compute.
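For reference, the workaround I'm using looks roughly like the launch sketch below. This assumes recent Megatron-LM argument names (`--context-parallel-size`, `--cp-comm-type`); verify the exact flags against your version's `arguments.py`, and the other arguments here are placeholders:

```shell
# Sketch of the workaround: force all-to-all CP communication so that
# FusedAttention is selected instead of falling back / erroring out.
# Flag names assume a recent Megatron-LM; model/data flags are elided.
torchrun --nproc_per_node=8 pretrain_gpt.py \
    --context-parallel-size 8 \
    --cp-comm-type a2a \
    --seq-length 131072 \
    ...  # remaining model, data, and optimizer flags
```

With `--cp-comm-type a2a`, activations are redistributed head-wise via all-to-all before attention, which at 128k sequence length moves a large volume of data in one shot rather than overlapping ring sends with compute as `p2p` does.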