We recently rolled out a change that broke SILO on staging, but we didn't notice because the old pods stayed up, and we then rolled it out to production. We should try to avoid that. In process terms, this means checking Argo CD for health before rolling out (when there is significant risk), but it would be better to add a technical safeguard that checks health on the cluster automatically, e.g. as a GitHub Action.
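
As a rough illustration of what such a check could look like, here is a minimal Python sketch that a GitHub Action step might run before promoting to production. It assumes the step can obtain the Argo CD `Application` resource for staging as JSON (for example via `kubectl get applications.argoproj.io <app-name> -n <namespace> -o json`, where the app name and namespace are placeholders) and that we only want to proceed when Argo CD reports the app as both `Healthy` and `Synced`. The function names `is_safe_to_promote` and `exit_code_for` are hypothetical, not part of any existing tooling. A rollout where the new pods never become ready should normally show up as `Progressing` or `Degraded` rather than `Healthy`, which is the case we missed.

```python
import json


def is_safe_to_promote(app: dict) -> bool:
    """Return True only if Argo CD reports the app as Healthy and Synced.

    `app` is the parsed JSON of an Argo CD Application resource.
    Missing fields are treated as "not safe".
    """
    status = app.get("status", {})
    health = status.get("health", {}).get("status")
    sync = status.get("sync", {}).get("status")
    return health == "Healthy" and sync == "Synced"


def exit_code_for(app_json: str) -> int:
    """Hypothetical helper: map the Application JSON to a CI exit code.

    0 means the staging app looks fine; 1 means the rollout should stop.
    """
    app = json.loads(app_json)
    if is_safe_to_promote(app):
        print("Staging is Healthy and Synced; OK to continue.")
        return 0
    status = app.get("status", {})
    print(
        "Staging is not ready: health=%s, sync=%s"
        % (
            status.get("health", {}).get("status"),
            status.get("sync", {}).get("status"),
        )
    )
    return 1
```

In a workflow, the step would read the JSON (from stdin or a file), pass it to `exit_code_for`, and call `sys.exit` with the result so that a non-zero code blocks the production rollout. Whether to also retry while the status is `Progressing` is a design choice this sketch leaves open.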