@pohly pohly commented Dec 3, 2025

The actual implementation in 1.35 differed a bit from what was planned. This PR reflects that and adds the changes for beta.

@k8s-ci-robot

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 3, 2025
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
Once this PR has been reviewed and has the lgtm label, please assign sanposhiho, wojtek-t for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Dec 3, 2025
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 3, 2025
@pohly pohly mentioned this pull request Dec 3, 2025
15 tasks
@pohly pohly force-pushed the dra-device-taints-1.36 branch from e92153d to 725970b Compare December 11, 2025 07:35
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 11, 2025
@pohly pohly changed the title WIP: KEP-5055: DRA device taints: document state in 1.35 and plan beta in 1.36 KEP-5055: DRA device taints: document state in 1.35 and plan beta in 1.36 Dec 11, 2025
@pohly pohly marked this pull request as ready for review December 11, 2025 07:35
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 11, 2025
Automated upgrade/downgrade testing verifies that:
- A DeviceTaintRule created before a downgrade prevents pod scheduling after a downgrade.
- A pod which gets scheduled because of a toleration after the downgrade
is kept running after an upgrade.
Contributor Author

I've written this in present tense ("verifies") because I am assuming that it will be true soon, maybe even before this PR gets merged.

However, it is not true yet. I'm currently rewriting the upgrade/downgrade tests in kubernetes/kubernetes#135664 and will add the test case described here soon.

Contributor Author

I have it implemented and noticed one small difference: the KEP asks about "upgrade->downgrade->upgrade" (three cluster changes). The automated test only does "upgrade->downgrade" (two). Does this really matter?

Maybe, but since I cannot imagine what else can go wrong for "upgrade->downgrade->upgrade" that doesn't already go wrong for just "upgrade", I am not sure what I should test for.

My simplified test does:

  • A pod which gets scheduled on the previous release because of a toleration is kept running after an upgrade.
  • A DeviceTaintRule created to evict the pod before a downgrade prevents pod scheduling after a downgrade.


@liggitt liggitt left a comment

updates reflecting API decisions made during 1.35 lgtm


Usage of DeviceTaintRules can be seen in the apiserver's
`apiserver_resource_objects` metric with labels `group=resource.k8s.io` and
`resource=deviceTaintRules`.
Member

Suggested change
- `resource=deviceTaintRules`.
+ `resource=devicetaintrules`.
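
To illustrate why the label spelling matters, here is a minimal sketch of reading the DeviceTaintRule count out of a metrics dump in the Prometheus text format. It assumes the lowercase plural `devicetaintrules` from the suggestion above is the value the apiserver emits; the function name `count_device_taint_rules` and the sample dump are made up for illustration and are not taken from the KEP or from Kubernetes code.

```python
import re

# One sample line of the Prometheus text exposition format looks like:
#   metric_name{label="value",...} 12
_SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def count_device_taint_rules(metrics_text):
    """Return the DeviceTaintRule object count from a metrics dump, or None.

    Hypothetical helper: it looks for the apiserver_resource_objects sample
    whose labels are group=resource.k8s.io and resource=devicetaintrules.
    """
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_LINE.match(line)
        if not match or match.group("name") != "apiserver_resource_objects":
            continue
        labels = dict(_LABEL.findall(match.group("labels")))
        if (labels.get("group") == "resource.k8s.io"
                and labels.get("resource") == "devicetaintrules"):
            return float(match.group("value"))
    return None


# Made-up dump for illustration only; real output has many more series.
EXAMPLE_DUMP = """\
# TYPE apiserver_resource_objects gauge
apiserver_resource_objects{group="",resource="pods"} 12
apiserver_resource_objects{group="resource.k8s.io",resource="devicetaintrules"} 2
"""

if __name__ == "__main__":
    print(count_device_taint_rules(EXAMPLE_DUMP))
```

The label comparison is an exact, case-sensitive string match, so a query written with `deviceTaintRules` would find nothing in a dump that uses the lowercase form.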

