feat: add Cleanup section and Slurm cleanup how-to #105
Conversation
AshleyCliff
left a comment
Looks good, the cleanup section is a good idea! A couple of questions about tying it to the cloud cleanup, and a suggestion for the title.
howto/cleanup/cleanup-slurm.md
Outdated
@@ -0,0 +1,67 @@
(howto-cleanup-slurm)=
# How to clean up Slurm
The title feels a bit off somehow, maybe 'How to clean up slurm deployments'?
Do any references to the Slurm cleanup page need to be added to this page? What happens if the cloud resource cleanup happens without the Slurm cleanup happening first?
The cloud is the "layer below", so cleaning it up will clean up your Slurm model as well as all other models you have on the cloud (since we tell users to run `juju destroy-controller --destroy-all-models`). It's not necessary to individually clean up the Slurm models first if you want to start from a completely clean slate - you just need to destroy the controller.
Removing just the Slurm model is useful if you want to destroy an old cluster and redeploy a new one on the same backing cloud with the same controller, e.g. you're testing out the Slurm charms on your laptop and don't want to go through `juju bootstrap` each time to set up a new LXD controller for VMs/containers.
We should definitely mention this somewhere - it's a good catch. We don't want to give the impression users must go through each cleanup how-to in order. Any thoughts on where it would best fit?
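Roughly, the two paths look like this (the controller and model names below are placeholders, not the ones used in the docs):

```shell
# Full teardown: destroys the controller and every model on the backing cloud,
# including the Slurm model (this is what the cloud cleanup page walks through).
juju destroy-controller my-controller --destroy-all-models

# Slurm-only teardown: removes just the Slurm model and keeps the controller
# around, so a new cluster can be deployed on the same cloud without
# re-bootstrapping.
juju destroy-model slurm
```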
Maybe a 'Clean up controller' section at the bottom of the Slurm page that describes the steps needed to just destroy the controller and then points to the cloud cleanup for further steps? We'd also want some info at the top of the Slurm cleanup page mentioning what removing the model vs. the controller accomplishes.
I've gone with an admonition at the top of the Slurm page to direct users to the cloud cleanup page if what they really want is to destroy their entire environment. Let me know what you think.
My thought process is that the majority of the cloud cleanup is destroying the controller, so it makes sense to keep that with the cloud docs. In fact, it's the same step for every cloud, so we should probably move the `juju destroy-controller` line on the cloud cleanup page into a static section above the cloud-specific instructions for removing credentials, etc., but that's for a future PR.
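As a rough sketch of what that shared section could contain (the controller name is a placeholder):

```shell
# Same first step regardless of backing cloud: destroy the controller and
# every model on it.
juju destroy-controller my-controller --destroy-all-models

# The cloud-specific instructions that follow would then only need to cover
# removing credentials and other leftover cloud resources.
```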
:::

See the [Juju `destroy-model` documentation](https://documentation.ubuntu.com/juju/3.6/reference/juju-cli/list-of-juju-cli-commands/destroy-model/)
for the implications of this flag and details of further available options.
Do we want a Next steps section pointing to optional cloud clean up? Once we have MaaS docs we'd want to add some pointers for that as well.
I've left off a "Next steps" section to avoid suggesting an ordering of the how-tos; I'd expect removing a Slurm model to be done mostly in isolation. For MAAS cleanup, I'd say we could add a new tab next to Azure, AWS, and GCP in the cloud cleanup docs, so as long as we direct people to that page, we should be future-proofed for when we have our MAAS docs.
AshleyCliff
left a comment
Tweaking the wording in the admonition.
:::{admonition} Removing all Charmed HPC resources?
:class: note

You do not need to follow this guide if planning to tear down the entire Charmed HPC environment.

Follow {ref}`howto-cleanup-cloud-resources` instead to remove all resources, including Slurm, in a
single step.
:::
Suggested change:

:::{admonition} Removing all Charmed HPC resources?
:class: note
If you are planning to tear down the entire Charmed HPC environment - all controllers, modules, XXX - you can jump to {ref}`howto-cleanup-cloud-resources` instead to remove all resources, including Slurm, in a
single step.
:::
Some details need to be filled in on what's included in 'entire'. Also, the cloud resources page points to specific clouds for the process so we don't currently have an obvious option for someone deploying/testing on bare metal/MaaS.
Admonition has been updated.
AshleyCliff
left a comment
Great job, thanks!
Pre-submission checklist
Summary of Changes
This PR:
This is useful for people looking to remove their Slurm model when it is no longer necessary, and the new Cleanup section means the how-to sections reflect the full lifecycle of a Charmed HPC cluster, from first initialization to end-of-life.
The documentation tests currently fail due to the known issue of spellcheck applying to code/command blocks.
Related Issues, PRs, and Discussions
Closes #79