@timmc-edx
Contributor

No description provided.

Contributor

@MoisesGSalas left a comment

Hi @timmc-edx, sorry for the late response.

I left a question in my comment. I'm mostly thinking about running codejail in a Kubernetes cluster with multiple independent openedx instances.

Comment on lines +276 to +285
* The ``NPROC`` limit constrains the ability of the *current* process to
create new threads and processes, but the usage count (how many processes
already exist) is the sum across *all* processes with the same UID, even in
other containers on the same host where the UID may be mapped to a different
username. This constraint also applies to the app user due to how the
rlimits are applied. Even if UIDs are chosen so they aren't used by other
software on the host, multiple codejail sandbox processes on the same host
will share this usage pool and can reduce each other's ability to create
processes. In this situation, ``NPROC`` will need to be set higher than it
would be for a single codejail instance taking a single request at a time.
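
As a rough illustration of the first sentence above (not codejail's actual code), the sketch below lowers the soft ``NPROC`` limit in a forked child just before exec. The limit is an attribute of that one process and its descendants, while the count the kernel compares it against is kept per real UID. `DESIRED_NPROC` and the helper names are hypothetical, and the example assumes a Linux host.

```python
import resource
import subprocess
import sys

# Hypothetical value for illustration only; not a recommended setting.
DESIRED_NPROC = 50


def capped_soft_limit(desired, hard):
    """Return a soft limit that never exceeds the existing hard limit."""
    if hard == resource.RLIM_INFINITY:
        return desired
    return min(desired, hard)


def limit_nproc():
    """Runs in the forked child, just before exec."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    # Only the soft limit is lowered; the hard limit is left unchanged.
    resource.setrlimit(
        resource.RLIMIT_NPROC, (capped_soft_limit(DESIRED_NPROC, hard), hard)
    )


def child_soft_nproc():
    """Start a child with the limit applied and return the soft limit it sees."""
    code = "import resource; print(resource.getrlimit(resource.RLIMIT_NPROC)[0])"
    result = subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=limit_nproc,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("parent soft limit:", resource.getrlimit(resource.RLIMIT_NPROC)[0])
    print("child soft limit: ", child_soft_nproc())
```

The parent's own limit is unaffected, but any other process running under the same UID on the host still counts toward the total the child is measured against.
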
Contributor

So if I'm getting this right: if the app user spawns multiple sandboxes (for example, the codejail service handling multiple requests), the process pool is shared between them. On top of that, the same pool is shared across different containers on the same host. Is that correct? If so, when one codejail instance runs alongside other instances and I set NPROC to a low value, could it always fail?

Contributor Author

That's correct, yes. It's a fundamental limit of how rlimit operates. One option would be to ensure that your codejail pods are spread out over several hosts (using Kubernetes' anti-affinity mechanism). Also see the notes here on how to choose UIDs for the app and sandbox users: https://github.com/openedx/codejail-service/blob/main/docs/deployment.rst#app-user-uid
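
To make the shared pool concrete, here is a back-of-the-envelope sizing sketch. It is only an assumption-laden estimate, not a rule from the codejail docs: the function name and all three inputs are hypothetical, it assumes every sandbox on the host runs under the same UID, and it ignores any other processes owned by that UID.

```python
def min_nproc(procs_per_execution, concurrent_executions_per_pod, pods_on_host):
    """Rough lower bound for NPROC when sandboxes share one UID on a host.

    All sandbox processes with the same UID count against the same pool,
    so the limit has to cover the worst case across every co-located pod.
    """
    for value in (procs_per_execution, concurrent_executions_per_pod, pods_on_host):
        if value < 1:
            raise ValueError("all inputs must be at least 1")
    return procs_per_execution * concurrent_executions_per_pod * pods_on_host


# One pod, one request at a time, 3 processes per execution.
print(min_nproc(3, 1, 1))  # 3

# Four co-located pods, each handling 2 requests at once.
print(min_nproc(3, 2, 4))  # 24
```

Spreading pods across hosts brings `pods_on_host` back toward 1, which is why anti-affinity lets a lower limit work.
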

I think a longer-term solution would be to replace the current codejail mechanism with something that spins up a container per execution (giving better memory confinement) and that also uses systemd's virtual-user mechanism (which creates an ephemeral user with a randomized UID, for better NPROC isolation).

@timmc-edx merged commit cc731d4 into master on Apr 3, 2025
4 checks passed
@timmc-edx deleted the timmc/doc-nproc branch on April 3, 2025 19:37
3 participants