Feat/delete os pod #206

alvlkov · 2024-12-13T11:07:29Z

What type of PR is this?

This adds a new managed script to delete a pod from one of OpenShift's reserved namespaces.

What this PR does / Why we need it?

This will help fix errors related to OpenShift reserved namespaces, mainly when a pod restart is required.

Which Jira/Github issue(s) does this PR fix?

OSD_20528

Special notes for your reviewer

Pre-checks (if applicable)

Validated the changes in a ROSA stage cluster
Included documentation changes with PR

openshift-ci · 2024-12-13T11:07:41Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: alvlkov
Once this PR has been reviewed and has the lgtm label, please assign wanghaoran1988 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-bot · 2025-03-14T01:01:07Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

alvlkov · 2025-03-18T13:16:33Z

/remove-lifecycle stale

iamkirkbater

Requested change relates to the name of the script. The additional check for a replicaset would be more of a nice-to-have, but we can also add that after this is merged so that we can start using this sooner rather than later.

iamkirkbater · 2025-04-01T12:19:06Z

scripts/CEE/delete-os-pod/README.md

@@ -0,0 +1,21 @@
+# Delete Openshift Pod Script


Can we simplify this to just delete-pod instead of delete-os-pod? From a UX perspective, the closer the script name is to the actual oc command, the easier it will be to remember.

iamkirkbater · 2025-04-01T12:22:11Z

scripts/CEE/delete-os-pod/script.sh

+
+
+main(){
+  delete_pod


Would it be a huge lift here to validate that a pod is owned by a ReplicaSet before proceeding? We might also need to add a "force" flag/parameter to bypass that check. It would be a nice protection for the rare case where a pod in an OpenShift namespace isn't managed: by default we make sure the pod will come back, but we keep the option to bypass the check if we need to.
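For illustration, the ownership check suggested above could look roughly like the sketch below. This is not the code from this PR: the function names (`get_owner_kinds`, `may_delete_pod`) and the way the force value is passed are assumptions made for the example. The only cluster call is `oc get pod` with a JSONPath expression over `.metadata.ownerReferences`.

```shell
#!/bin/bash
# Hypothetical sketch; function names and argument order are illustrative only.

# Print the kinds of all owner references of a pod, space-separated.
# Needs a logged-in oc session; not called by may_delete_pod itself.
get_owner_kinds() {
  local namespace="$1" pod="$2"
  oc get pod "$pod" -n "$namespace" \
    -o jsonpath='{.metadata.ownerReferences[*].kind}'
}

# Return 0 if deletion may proceed: the pod is owned by a ReplicaSet,
# or the caller explicitly passed force="true".
#   $1: space-separated owner kinds (may be empty)
#   $2: "true" to bypass the ownership check (default "false")
may_delete_pod() {
  local owner_kinds="$1" force="${2:-false}"
  local kind
  # Intentionally unquoted so the list splits on whitespace.
  for kind in $owner_kinds; do
    [ "$kind" = "ReplicaSet" ] && return 0
  done
  [ "$force" = "true" ]
}
```

A caller would do something like `may_delete_pod "$(get_owner_kinds "$NAMESPACE" "$POD")" "$FORCE" || exit 1` before running the delete, where `NAMESPACE`, `POD` and `FORCE` are assumed variable names.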

iamkirkbater · 2025-04-01T13:07:51Z

scripts/CEE/delete-os-pod/metadata.yaml

+author: Alex Volkov
+allowedGroups:
+  - CEE
+  - SREP


Suggested change

-  - SREP
+  - SREP
+  - MCSTierTwo

alvlkov · 2025-04-07

Added the suggestions, thanks @iamkirkbater.

feichashao · 2025-04-01T13:32:15Z

Thanks @iamkirkbater for the review!

I would suggest we add the safeguard in this PR to validate that a pod is backed by a ReplicaSet; otherwise the delete operation can be too broad.
The protection can be "we are not making the situation worse":

If we are deleting a non-healthy pod, go ahead and delete it.
If we are deleting a healthy pod,
- If it is the only healthy pod in the replicaset, stop, raise a ticket and review it.
- If there's another healthy pod besides the one we are going to delete, it is ok to delete.
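The rules above reduce to a small decision function. The sketch below is only an illustration of that logic; the function name `safe_to_delete` and its two inputs are assumptions, and how the inputs are gathered from the cluster is left out.

```shell
#!/bin/bash
# Hypothetical sketch of the "do not make the situation worse" rule.

# Return 0 when deleting the target pod is acceptable.
#   $1: "true" if the target pod is healthy, anything else otherwise
#   $2: number of other healthy pods in the same ReplicaSet
safe_to_delete() {
  local target_healthy="$1" other_healthy="$2"
  if [ "$target_healthy" != "true" ]; then
    return 0   # non-healthy pod: go ahead and delete it
  fi
  # Healthy pod: only delete if at least one other healthy pod remains.
  [ "$other_healthy" -ge 1 ]
}
```

With this shape, the "stop, raise a ticket and review it" branch corresponds to a non-zero return that the caller turns into an error exit.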

Another nice-to-have is to keep a list of allowed namespaces instead of matching openshift-*. This sounds like toil, but it gives us an opportunity to review whether we want to allow deletion when a new namespace appears. (This can be a follow-up PR.)
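As a rough sketch of what an explicit allowlist could look like: the variable `ALLOWED_NAMESPACES`, the function `is_allowed_namespace`, and the three namespaces listed are examples only, not a proposed list.

```shell
#!/bin/bash
# Hypothetical allowlist; the namespaces listed here are examples only.
ALLOWED_NAMESPACES="openshift-monitoring openshift-ingress openshift-dns"

# Return 0 if the given namespace is on the allowlist.
is_allowed_namespace() {
  local candidate="$1" ns
  # Intentionally unquoted so the list splits on whitespace.
  for ns in $ALLOWED_NAMESPACES; do
    [ "$candidate" = "$ns" ] && return 0
  done
  return 1
}
```

Compared with a prefix match on openshift-*, a new namespace is rejected until someone adds it to the list, which is the review step described above.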

iamkirkbater · 2025-04-01T13:51:23Z

@feichashao - a few questions:

Can you expand on what you mean by a "non-healthy" pod? If we're asking for this in this PR, I'd like to be explicit about what we are looking for. For example, do we just mean that a "healthy" pod is one in a "Running" state, while a non-healthy one is in "Error", "Completed", "Pending", etc.?
What specifically do you mean by "raise a ticket"? Do you mean a JIRA here? Or would exiting with an error (if no FORCE parameter is set) work?
For the list of allowed namespaces, one thing I'd like to keep in mind is that CEE/MCS have a wider scope of what they support than SREP does. While SREP may limit itself to specific managed namespaces, CEE/MCS will be supporting additional things like openshift-virtualization, etc. So limiting them to managed namespaces may not be as efficient as we think.

feichashao · 2025-04-01T14:18:36Z

Can you expand on what you mean by "non-healthy" pod? If we're asking for this in this PR I'd like to be explicit to what we are looking for. For example, if we just mean a "healthy" pod is one in a "Running" state, vs non-healthy which would be "Error", "Completed", "Pending" - etc.

I would say healthy = a pod with all containers in the running state. Anything else should be non-healthy, e.g. Pending, CrashLoopBackOff, or a pod in the Running state where not all containers are running, which shows up like:

kube-apiserver-ip-10-119-135-4.ec2.internal           4/5     Running

(I mocked this)
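One way to turn that definition into a check is sketched below. It is an assumption that the READY and STATUS columns of `oc get pods` output are a good enough proxy (READY counts ready containers, which is slightly stricter than "running"); the function name `is_pod_healthy` is made up for the example.

```shell
#!/bin/bash
# Hypothetical sketch: classify a pod from its READY ("4/5") and STATUS columns.

# Return 0 only if the status is Running and every container is ready.
#   $1: READY column value, e.g. "5/5"
#   $2: STATUS column value, e.g. "Running"
is_pod_healthy() {
  local ready="$1" status="$2"
  local ready_count="${ready%/*}"   # part before the slash
  local total="${ready#*/}"         # part after the slash
  [ "$status" = "Running" ] \
    && [ "$total" -gt 0 ] \
    && [ "$ready_count" -eq "$total" ]
}
```

The mocked line above, with 4/5 and Running, would be classified as non-healthy by this check.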

alvlkov · 2025-04-07T16:37:50Z

Added replicaset check and --force flag.

- Successfully deleted a pod owned by a ReplicaSet, regardless of the --force flag
- Couldn't delete a pod not owned by a ReplicaSet without the --force flag
- Successfully deleted a pod not owned by a ReplicaSet with the --force flag

alvlkov · 2025-07-02T07:33:52Z

/retest

typeid · 2025-07-10T12:15:40Z

scripts/CEE/delete-pod/metadata.yaml

+  clusterRoleRules:
+    - apiGroups:
+        - ""
+      resources:
+        - "pods"
+      verbs:
+        - "delete"
+        - "get"


This is valid for all namespaces. It is not limited to OpenShift's reserved namespaces, as mentioned above.

This permission extends beyond the scope even SRE-P has.

alvlkov · 2025-07-11

The above limitation applies to the NAMESPACE parameter, to avoid deleting Openshift related pods. AFAIK I can't scope namespaces within clusterRoleRules. Please elaborate on the suggestion.

typeid · 2025-07-22T08:55:54Z

/retest

openshift-ci · 2025-07-22T09:17:19Z

@alvlkov: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

typeid · 2025-07-28T07:38:12Z

/lgtm

typeid · 2025-07-28T07:39:28Z

Code LGTM. Pending approval from compliance: https://issues.redhat.com/browse/HCMSEC-611

typeid · 2025-07-28T07:39:37Z

/hold for compliance approve

typeid · 2025-09-18T20:43:43Z

/unhold

Merging this as we have not received any feedback from compliance. This does not provide read access to customer data so I'm okay just stamping this off.

typeid · 2025-09-18T20:46:43Z

/retest

openshift-bot · 2025-12-19T01:00:44Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

adding managed script for os pod deletion

4ad4789

openshift-ci bot requested review from MitaliBhalla and samanthajayasinghe December 13, 2024 11:07

adding SREP to allowedGroups

a4275c1

openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 14, 2025

openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 18, 2025

iamkirkbater requested changes Apr 1, 2025

openshift-ci bot assigned iamkirkbater Apr 1, 2025

iamkirkbater reviewed Apr 1, 2025

Alex Volkov added 2 commits April 7, 2025 12:50

adding --force flag and replicaset ownership check

eca9f00

adding replicaset check and --force flag for bypassing it, fixed scri…

f65f957

…pt name

alvlkov requested a review from iamkirkbater April 10, 2025 21:19

typeid reviewed Jul 10, 2025

Update scripts/CEE/delete-pod/metadata.yaml

86eaec2

Co-authored-by: typeid <github@typeid.org>

openshift-ci bot assigned typeid Jul 28, 2025

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 28, 2025

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 28, 2025

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 18, 2025

openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 19, 2025


