@dsloanm (Contributor) commented Dec 12, 2025

Pre-submission checklist

  • I read and followed the CONTRIBUTING guidelines.
  • I have ensured that the documentation tests complete successfully.

Summary of Changes

This PR:

  • adds a new Cleanup section after the existing Setup, Integrate, Manage, and Use how-to sections
  • adds a new how-to for cleaning up a Slurm deployment to the Cleanup section
  • moves the existing cloud resources cleanup how-to to the Cleanup section

This is useful for people who want to remove their Slurm model once it is no longer needed. The new Cleanup section also keeps the how-to sections aligned with the full lifecycle of a Charmed HPC cluster, from first initialization to end-of-life.

The documentation tests currently fail due to the known issue of spellcheck applying to code/command blocks.

Related Issues, PRs, and Discussions

Closes #79

@dsloanm requested a review from a team as a code owner December 12, 2025 14:03
@dsloanm requested review from AshleyCliff and NucciTheBoss and removed request for a team December 12, 2025 14:03

@AshleyCliff (Contributor) left a comment

Looks good, the cleanup section is a good idea! A couple of questions about tying this to the cloud cleanup, and a suggestion for the title.

@@ -0,0 +1,67 @@
(howto-cleanup-slurm)=
# How to clean up Slurm
@AshleyCliff (Contributor):

The title feels a bit off somehow, maybe 'How to clean up slurm deployments'?

@AshleyCliff (Contributor):

Do any references to the Slurm cleanup page need to be added to this page? What happens if the cloud resource cleanup is run without the Slurm cleanup happening first?

@dsloanm (Contributor Author):

The cloud is the "layer below", so cleaning it up will also clean up your Slurm model, along with every other model you have on that cloud (since we tell users to run `juju destroy-controller --destroy-all-models`). You don't need to clean up the Slurm models individually first if you want to start from a completely clean slate; you just need to destroy the controller.

Removing just the Slurm model is useful if you want to destroy an old cluster and redeploy a new one on the same backing cloud with the same controller, e.g. you're testing out the Slurm charms on your laptop and don't want to go through `juju bootstrap` each time to set up a new LXD controller for VMs/containers.
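
As a rough sketch of the difference between the two paths: the model name `slurm` and the controller name `charmed-hpc-controller` below are placeholders I've assumed for illustration, not names taken from the docs in this PR. The `run` helper is also just an illustrative wrapper that prints each command unless `APPLY=1` is set, so the snippet can be tried without a live controller.

```shell
#!/usr/bin/env bash
# Sketch only: the model and controller names used here are hypothetical.
set -euo pipefail

# Print each command instead of executing it unless APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "would run: $*"
  fi
}

# Remove only the Slurm model. The controller and backing cloud stay in
# place, so a new cluster can be redeployed without another `juju bootstrap`.
cleanup_slurm_model() {
  run juju destroy-model "$1"
}

# Remove the controller and every model it hosts, Slurm included.
cleanup_everything() {
  run juju destroy-controller "$1" --destroy-all-models
}

cleanup_slurm_model slurm
cleanup_everything charmed-hpc-controller
```

With `APPLY=1`, Juju would normally still ask for confirmation before destroying anything, and the model command may need further options depending on the deployment, which is why the how-to links to the `destroy-model` reference.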

We should definitely mention this somewhere - it's a good catch. We don't want to give the impression users must go through each cleanup how-to in order. Any thoughts on where it would best fit?

@AshleyCliff (Contributor):

Maybe a 'Clean Up controller' section at the bottom of the Slurm page that describes the steps needed to just destroy the controller, and then points to the cloud cleanup for further steps? We'd also want some info at the top of the Slurm cleanup page explaining what removing the model accomplishes versus removing the controller.

@dsloanm (Contributor Author):

I've gone with an admonition at the top of the Slurm page to direct users to the cloud cleanup page if what they really want is to destroy their entire environment. Let me know what you think.

My thinking is that most of the cloud cleanup consists of destroying the controller, so it makes sense to keep that with the cloud docs. In fact, it's the same step for every cloud, so we should probably move the `juju destroy-controller` line on the cloud cleanup page into a static section above the cloud-specific instructions for removing credentials, etc., but that's for a future PR.

:::

See the [Juju `destroy-model` documentation](https://documentation.ubuntu.com/juju/3.6/reference/juju-cli/list-of-juju-cli-commands/destroy-model/)
for the implications of this flag and details of further available options.
No newline at end of file
@AshleyCliff (Contributor):

Do we want a "Next steps" section pointing to the optional cloud cleanup? Once we have MAAS docs we'd want to add some pointers for that as well.

@dsloanm (Contributor Author):

I've left off a "Next steps" section to avoid suggesting an ordering of the how-tos. I'd expect removing a Slurm model to be done mostly in isolation. For MAAS cleanup, we could add a new tab next to Azure, AWS, and GCP in the cloud cleanup docs, so as long as we direct people to that page, we should be future-proofed for when we have our MAAS docs.

@dsloanm requested a review from AshleyCliff December 12, 2025 18:13

@AshleyCliff (Contributor) left a comment

Tweaking the wording in the admonition.

Comment on lines 4 to 11
:::{admonition} Removing all Charmed HPC resources?
:class: note

You do not need to follow this guide if planning to tear down the entire Charmed HPC environment.

Follow {ref}`howto-cleanup-cloud-resources` instead to remove all resources, including Slurm, in a
single step.
:::
@AshleyCliff (Contributor):

Suggested change
-:::{admonition} Removing all Charmed HPC resources?
-:class: note
-You do not need to follow this guide if planning to tear down the entire Charmed HPC environment.
-Follow {ref}`howto-cleanup-cloud-resources` instead to remove all resources, including Slurm, in a
-single step.
-:::
+:::{admonition} Removing all Charmed HPC resources?
+:class: note
+If you are planning to tear down the entire Charmed HPC environment - all controllers, modules, XXX - you can jump to {ref}`howto-cleanup-cloud-resources` instead to remove all resources, including Slurm, in a
+single step.
+:::

@AshleyCliff (Contributor) commented Dec 12, 2025

Some details need to be filled in on what's included in 'entire'. Also, the cloud resources page points to specific clouds for the process, so we don't currently have an obvious option for someone deploying or testing on bare metal/MAAS.

@dsloanm (Contributor Author) commented Dec 12, 2025

Admonition has been updated.

@dsloanm requested a review from AshleyCliff December 12, 2025 18:45

@AshleyCliff (Contributor) left a comment

Great job, thanks!

@AshleyCliff merged commit 4bcadf9 into charmed-hpc:main Dec 12, 2025
2 of 3 checks passed
Development

Successfully merging this pull request may close these issues.

Add steps to destroy slurm model

2 participants