Skip to content

Conversation

@timmc-edx
Copy link
Contributor

I just grabbed the commit message for f4cb95b/PR #36. I haven't actually verified whether this is still accurate -- note that this is from 2015, written for Python 2. For all I know this performance concern is no longer relevant. It also may not be relevant for codejail-service, which is a much smaller process than edxapp.

I just grabbed the commit message for f4cb95b/PR #36. I haven't
actually verified whether this is still accurate -- note that this is from
2015, written for Python 2. For all I know this performance concern is no
longer relevant. It also may not be relevant for codejail-service, which
is a much smaller process than edxapp.
@rayzhou-bit rayzhou-bit self-requested a review March 13, 2025 22:55
Copy link
Contributor

@MoisesGSalas MoisesGSalas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, even if it no longer applies, having the rationale at hand is much better than doing archeology.

@MoisesGSalas MoisesGSalas merged commit a0d0f7e into master Mar 14, 2025
4 checks passed
@MoisesGSalas MoisesGSalas deleted the timmc/doc-proxy branch March 14, 2025 13:15
@timmc-edx
Copy link
Contributor Author

timmc-edx commented Mar 14, 2025

I ended up doing some more research -- and yeah, I suspect this is no longer needed.

Ned confirmed that this was put in place because long-lived edxapp processes would eventually consume more than 50% of RAM and would then be unable to fork. Apparently subprocess execution involves first forking the parent, then having the child run exec to become another process. (I had no idea until today. Seems a bit Rube Goldberg!) Forking causes a copy of the process page tables, and while it's nominally a copy-on-write system, that doesn't help much if the parent is highly active. Ned also noted that Python does a lot of memory writes to update refcounts.

However, I looked into the implementation of subprocess, and it has changed considerably since then! In Python 2.7, it explicitly called os.fork() and then os.execvpe(), as described above. But in 3.11, it tries to call vfork if possible. vfork does not copy the page tables, and instead temporarily blocks the parent and is intended to be used when the fork will then immediately call execve.

So yeah, this looks like an optimization that should now be unnecessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants