@QuantuMope QuantuMope commented Dec 10, 2025

Discovered a bug recently where DDP worker log files were being created but not written to.

The root cause turned out to be that spawned subprocesses start a new interpreter, so they fall back to Python's standard logging module by default instead of the absl handler. Adding absl's logging.use_absl_handler() (suggested by Wei) solves this issue.
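For anyone wanting to see the underlying behaviour in isolation, here is a minimal stdlib-only sketch. It is not the code from this PR: it launches a fresh interpreter with subprocess rather than going through the DDP launcher, and the helper names (configure_parent_logging, count_child_root_handlers) are made up for illustration. It only shows that a handler attached in the parent is not present in a newly started interpreter, which is why the worker has to set up its own handler again.

```python
import logging
import subprocess
import sys

# Code executed by the fresh interpreter: print how many handlers
# are attached to its root logger at startup.
CHILD_CODE = "import logging; print(len(logging.getLogger().handlers))"


def configure_parent_logging() -> int:
    """Attach a handler to the root logger in this (parent) process.

    Returns the number of handlers now on the parent's root logger.
    """
    root = logging.getLogger()
    root.addHandler(logging.NullHandler())
    return len(root.handlers)


def count_child_root_handlers() -> int:
    """Start a new Python interpreter and return its root handler count.

    The -I flag runs the child in isolated mode so that environment
    variables and the user site directory do not influence the result.
    """
    result = subprocess.run(
        [sys.executable, "-I", "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("parent root handlers:", configure_parent_logging())
    print("child root handlers:", count_child_root_handlers())
```

The parent reports at least one handler, while the child starts without the handler the parent added. In the actual fix the worker calls absl's logging.use_absl_handler() to re-attach the absl handler; that call is left out here to keep the sketch free of third-party dependencies.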

One remaining mystery is that hobot1 projects didn't seem to have this issue, and no recent code changes to train.py offer any hints. Figuring this out is low priority for now, since this change fixes the current main.

emailweixu previously approved these changes Dec 10, 2025
@QuantuMope
Contributor Author

Sorry, this needs a restamp. Found out that pretty file names result in duplicate log files, so I removed them.

@QuantuMope QuantuMope merged commit 8a3b37e into pytorch Dec 10, 2025
2 checks passed
@QuantuMope QuantuMope deleted the PR/andrew/fix-ddp-logging branch December 10, 2025 02:48
3 participants