@Giuseppe5
Collaborator

Reason for this PR

Similar to inference mode, we enter a context manager that switches the model to Wave kernels.

Some open issues:

  • Because the Wave layers hold their own pre-quantized weights, we risk duplicating the model in memory. To avoid this, we remove the weights from the original layers.
  • The current interface of the Wave QuantLayers could be revised.
  • Integrating more quantization algorithms will require careful coordination to avoid a combinatorial explosion of supported configurations.
  • Another option could be to leverage `__torch_function__` within QuantTensor, but it would clash with some of the options above.
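
To make the first open issue concrete, here is a minimal sketch of the pattern in plain Python. It is not the code in this PR: `QuantLinear`, `KernelLinear`, and `kernel_mode` are hypothetical stand-ins (no Brevitas or Wave API is used), the "model" is a plain dict of layers, and weights are lists of floats. It only illustrates swapping layers inside a context manager and dropping the original weights so that two copies are never held at once.

```python
from contextlib import contextmanager


class QuantLinear:
    """Hypothetical stand-in for a fake-quantized layer (not a Brevitas class)."""

    def __init__(self, weight, scale):
        self.weight = weight  # float weights, quantized on the fly in forward
        self.scale = scale

    def forward(self, x):
        q = [round(w / self.scale) * self.scale for w in self.weight]
        return sum(qi * xi for qi, xi in zip(q, x))


class KernelLinear:
    """Hypothetical kernel-backed layer that owns pre-quantized integer weights."""

    def __init__(self, int_weight, scale):
        self.int_weight = int_weight
        self.scale = scale

    def forward(self, x):
        return self.scale * sum(w * xi for w, xi in zip(self.int_weight, x))


@contextmanager
def kernel_mode(model):
    """Swap QuantLinear layers for KernelLinear ones while the context is active."""
    originals = {}
    try:
        for name, layer in list(model.items()):
            if isinstance(layer, QuantLinear):
                int_weight = [round(w / layer.scale) for w in layer.weight]
                originals[name] = layer
                model[name] = KernelLinear(int_weight, layer.scale)
                # Drop the float weights so the model is not duplicated.
                layer.weight = None
        yield model
    finally:
        for name, layer in originals.items():
            kernel = model[name]
            # Assumption of this sketch: restore from the integer weights.
            # This is lossy, since only the quantized values come back.
            layer.weight = [w * kernel.scale for w in kernel.int_weight]
            model[name] = layer
```

Restoring from the integer weights on exit is one possible design choice. It keeps memory flat, but the original float weights are not recoverable afterwards, which is part of why the interaction with other quantization algorithms needs care.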

Changes Made in this PR

Testing Summary

Risk Highlight

  • This PR includes code from another work (please detail).
  • This PR contains API-breaking changes.
  • This PR depends on work in another PR (please provide links/details).
  • This PR introduces new dependencies (please detail).
  • There are coverage gaps not covered by tests.
  • Documentation updates required in subsequent PR.

Checklist

  • Code comments added to any hard-to-understand areas, if applicable.
  • Changes generate no new warnings.
  • Updated any relevant tests, if applicable.
  • No conflicts with destination dev branch.
  • I reviewed my own code changes.
  • Initial CI/CD passing.
  • 1+ reviews given, and any review issues addressed and approved.
  • Post-review full CI/CD passing.

@nickfraser nickfraser mentioned this pull request May 20, 2025
28 tasks
2 participants