Merge group related improvements for convolution operations #3439
base: develop
18ead08 to cf8a4d4
5fa418d to 706ca76
706ca76 to 6fe2eaa
    // implement on demand
    static_assert(NumGroupsToMerge == 1);
Can you add a message that merge group doesn't support GNHWK?
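A minimal sketch of what the requested message could look like. The layout tags, the `IsMergeSupported` helper, and the message wording are all hypothetical stand-ins, not the actual CK types or the text the PR ended up using; only `NumGroupsToMerge` and the `== 1` condition come from the snippet under review.

```cpp
#include <type_traits>

using index_t = int;

// Hypothetical layout tags standing in for the real CK layout types.
struct GNHWK {};
struct NHWGK {};

// Hypothetical helper: compiles only when the merge factor is supported for the layout.
template <typename Layout, index_t NumGroupsToMerge>
constexpr bool IsMergeSupported()
{
    if constexpr(std::is_same_v<Layout, GNHWK>)
    {
        // The second argument is printed by the compiler when the assertion fails,
        // so the user sees which combination is unsupported.
        static_assert(NumGroupsToMerge == 1,
                      "Merging groups (NumGroupsToMerge > 1) is not supported for GNHWK");
    }
    return true;
}
```

With this shape, `IsMergeSupported<GNHWK, 2>()` would fail to compile and print the message, while other layouts are unaffected.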
    // implement on demand
    static_assert(NumGroupsToMerge == 1);
Same here and in the other parts
    Sequence<3>{},
    Sequence<4>{}));

    #if CK_USE_CUSTOM_TENSOR_TRANSFORM_FOR_BWD_DATA_OUT == 0
what's the purpose of this?
    index_t Ho,
    index_t Wo,
    index_t K,
    [[maybe_unused]] index_t YDot,
Why is this needed?
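The thread does not record the answer. For context, one common reason a parameter gets `[[maybe_unused]]` is that it is read on only one side of a preprocessor branch, so the other configuration would trigger an unused-parameter warning. The sketch below illustrates that general pattern; the macro `USE_CUSTOM_PATH` and the function `OutputLength` are hypothetical, and whether this is the reason in this PR is an assumption.

```cpp
using index_t = int;

// Hypothetical feature switch, standing in for a build-time configuration macro.
#ifndef USE_CUSTOM_PATH
#define USE_CUSTOM_PATH 0
#endif

// YDot is read only when USE_CUSTOM_PATH != 0. Without the attribute, the
// default configuration would emit an unused-parameter warning.
constexpr index_t OutputLength(index_t Ho, [[maybe_unused]] index_t YDot)
{
#if USE_CUSTOM_PATH == 0
    return Ho;
#else
    return Ho * YDot;
#endif
}
```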
    }
    };

    /**
Is there a way to avoid duplication with the struct above? Afaik, these functions are only used in two places, so I would recommend always calling the MG variant with a default value (and ignoring the GStep return value) to avoid duplicating the rest of the logic.
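A rough sketch of the suggestion, assuming the MG variant returns a descriptor together with a group step. All names here (`DescAndGStep`, `MakeOutDescMG`, `MakeOutDesc`) and the arithmetic are hypothetical placeholders, not the CK functions touched by this PR; the point is only the default template argument and the discarded second result.

```cpp
using index_t = int;

// Hypothetical result of the merged-group (MG) variant.
struct DescAndGStep
{
    index_t merged_k; // stand-in for the real tensor descriptor
    index_t g_step;   // stand-in for the GStep value returned alongside it
};

// Single implementation. NumGroupsToMerge defaults to 1, which reproduces
// the non-merged behaviour without a second copy of the logic.
template <index_t NumGroupsToMerge = 1>
constexpr DescAndGStep MakeOutDescMG(index_t K)
{
    static_assert(NumGroupsToMerge >= 1, "NumGroupsToMerge must be positive");
    return DescAndGStep{K * NumGroupsToMerge, NumGroupsToMerge};
}

// Non-merged call site: relies on the default and ignores g_step.
constexpr index_t MakeOutDesc(index_t K)
{
    return MakeOutDescMG<>(K).merged_k;
}
```

The trade-off is that every caller goes through the MG code path, so any cost or constraint specific to merging has to be a no-op when `NumGroupsToMerge == 1`.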
A collection of several depthwise convolution related improvements.
Proposed changes
The list of todos came from the investigation of 2d convolution performance on fp32 data input. It turns out CK has limited support for merged group convolutions. The purpose of this PR is to add some of the missing functionality.
Checklist
Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
clang-format on all changed files
Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered