KEP-5055: DRA device taints: document state in 1.35 and plan beta in 1.36 #5716
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: pohly. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
e92153d to
725970b
Compare
| Automated upgrade/downgrade testing verifies that: | ||
| - A DeviceTaintRule created before a downgrade prevents pod scheduling after a downgrade. | ||
| - A pod which gets scheduled because of a toleration after the downgrade | ||
| is kept running after an upgrade. |
I've written this in present tense ("verifies") because I am assuming that it will be true soon, maybe even before this PR gets merged.
However, it is not true yet. I'm currently rewriting the upgrade/downgrade tests in kubernetes/kubernetes#135664 and will add the test case described here soon.
I have it implemented and noticed one small difference: the KEP asks about "upgrade->downgrade->upgrade" (three cluster changes). The automated test only does "upgrade->downgrade" (two). Does this really matter?
Maybe, but as I cannot imagine what else can go wrong for "upgrade->downgrade->upgrade" that doesn't already go wrong for just "upgrade", I am not sure what I should test for.
My simplified test does:
- A pod which gets scheduled on the previous release because of a toleration is kept running after an upgrade.
- A DeviceTaintRule created to evict the pod before a downgrade prevents pod scheduling after a downgrade.
liggitt
left a comment
updates reflecting API decisions made during 1.35 lgtm
| Usage of DeviceTaintRules can be seen in the apiserver's | ||
| `apiserver_resource_objects` metric with labels `group=resource.k8s.io` and | ||
| `resource=deviceTaintRules`. |
Suggested change:
- `resource=deviceTaintRules`.
+ `resource=devicetaintrules`.
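To illustrate why the casing of the `resource` label matters, here is a minimal sketch of looking up the object count in Prometheus text-format output. The sample text, its values, and the helper `object_count` are made up for illustration; only the metric name and the `group`/`resource` labels come from the KEP text quoted above. Label values are compared case-sensitively, as Prometheus does.

```python
import re

# Hypothetical excerpt of apiserver /metrics output. The numbers and the
# help text are invented for illustration, not taken from a real cluster.
SAMPLE_METRICS = """\
# HELP apiserver_resource_objects (illustrative help text)
# TYPE apiserver_resource_objects gauge
apiserver_resource_objects{group="resource.k8s.io",resource="devicetaintrules"} 3
apiserver_resource_objects{group="resource.k8s.io",resource="resourceslices"} 12
apiserver_resource_objects{group="",resource="pods"} 40
"""

# One sample line: metric name, label set in braces, then the value.
LINE_RE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def object_count(metrics_text, group, resource, metric="apiserver_resource_objects"):
    """Return the sample value for the given labels, or None if absent."""
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        # Exact, case-sensitive comparison of label values.
        if labels.get("group") == group and labels.get("resource") == resource:
            return float(match.group("value"))
    return None


if __name__ == "__main__":
    print(object_count(SAMPLE_METRICS, "resource.k8s.io", "devicetaintrules"))
    print(object_count(SAMPLE_METRICS, "resource.k8s.io", "deviceTaintRules"))
```

With this sample, the lower-case lookup finds the series, while the camel-case spelling matches nothing, which is the situation the suggested change avoids.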
One-line PR description: DRA device taints: document state in 1.35 and plan beta in 1.36
Issue link: DRA: device taints and tolerations #5055
Other comments:
The actual implementation in 1.35 differed somewhat from what was planned. This PR documents those differences and adds the changes planned for beta.