cfsmp3 (Contributor) commented Dec 30, 2025

Summary

  • Remove blocking wait_for_operation in delete_expired_instances() (VM deletion now fire-and-forget)
  • Remove blocking wait_for_operation in start_test() (VM creation now optimistic)
  • Update tests to remove unused mocks

Problem

These blocking calls were causing 504 timeouts on webhook deliveries because:

  1. GitHub has a 10-second webhook timeout
  2. When cron jobs run wait_for_operation (which can block for up to 30 minutes), the gunicorn workers serving those requests stay blocked for the whole wait
  3. Webhook requests queue behind the blocked workers and exceed the timeout
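
The queueing effect described above can be illustrated with a small arithmetic sketch. It is not taken from the codebase: the function names, the worker count, and the 0.5-second handler time are assumptions chosen for illustration. Only the 10-second timeout and the 30-minute worst case come from this description.

```python
from typing import Sequence

GITHUB_WEBHOOK_TIMEOUT_S = 10.0   # webhook timeout stated in this PR
WORST_CASE_WAIT_S = 30 * 60.0     # wait_for_operation can block up to 30 minutes


def webhook_wait_seconds(worker_busy_until: Sequence[float], arrival: float) -> float:
    """Seconds a request arriving at `arrival` waits for the first free worker.

    `worker_busy_until` holds, per worker, the time at which it becomes free.
    """
    first_free = min(worker_busy_until)
    return max(0.0, first_free - arrival)


def webhook_times_out(worker_busy_until: Sequence[float], arrival: float,
                      handler_seconds: float = 0.5) -> bool:
    """True if queueing time plus handling time exceeds the webhook timeout."""
    total = webhook_wait_seconds(worker_busy_until, arrival) + handler_seconds
    return total > GITHUB_WEBHOOK_TIMEOUT_S


# Assumed setup: two workers, both stuck in a 30-minute wait that started at t=0.
all_blocked = [WORST_CASE_WAIT_S, WORST_CASE_WAIT_S]
# Same setup, but one worker is idle (free since t=0).
one_free = [WORST_CASE_WAIT_S, 0.0]
```

With every worker blocked, a webhook arriving at t=5 waits 1795 seconds and times out. With even one worker free, it is served immediately.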

Solution

  • delete_expired_instances: Delete VMs without waiting for confirmation (fire-and-forget)
  • start_test: Check the create response for immediate errors, then record the instance optimistically. If VM creation ultimately fails, the test won't report progress and will be cleaned up by the expired-instances cron job.
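
The two patterns can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeCompute`, `start_test_optimistic`, `delete_expired_fire_and_forget`, and the shape of the operation dictionaries are all hypothetical stand-ins for the real compute client and functions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeCompute:
    """Hypothetical stand-in for a cloud compute client (not a real API)."""
    insert_response: Dict[str, Any] = field(
        default_factory=lambda: {"name": "op-1", "status": "RUNNING"})
    calls: List[str] = field(default_factory=list)

    def insert_instance(self, name: str) -> Dict[str, Any]:
        self.calls.append(f"insert:{name}")
        return self.insert_response

    def delete_instance(self, name: str) -> Dict[str, Any]:
        self.calls.append(f"delete:{name}")
        return {"name": f"op-delete-{name}", "status": "RUNNING"}


def start_test_optimistic(compute: FakeCompute, name: str,
                          recorded: List[str]) -> bool:
    """Request VM creation and record the instance without waiting."""
    operation = compute.insert_instance(name)
    # Only errors returned immediately by the create call are handled here.
    if "error" in operation:
        return False
    # Optimistic: assume creation will succeed. A VM that never comes up
    # is left for the expired-instances cleanup to remove.
    recorded.append(name)
    return True


def delete_expired_fire_and_forget(compute: FakeCompute,
                                   expired: List[str]) -> List[str]:
    """Issue one delete per expired VM and return the operation names."""
    # No wait on any operation: the request returns as soon as the
    # deletes have been issued.
    return [compute.delete_instance(name)["name"] for name in expired]
```

The trade-off in this sketch is that a failed creation is detected late, by the cleanup job, rather than at request time, in exchange for a request that never blocks on the cloud operation.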

Test plan

  • Verify existing tests pass (removed unused wait_for_operation mocks)
  • CI tests pass
  • Manual verification on staging that tests still run correctly
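
One way to guard against the blocking wait being reintroduced is a test that asserts it is never called. The sketch below assumes a simplified, hypothetical `delete_expired_instances(compute, names)` signature and a `wait_for_operation` method on the client. The real function and tests in this repository may be structured differently.

```python
from typing import Iterable
from unittest import mock


def delete_expired_instances(compute, names: Iterable[str]) -> None:
    """Hypothetical simplified version: issue deletes and never wait."""
    for name in names:
        compute.delete_instance(name)


def test_delete_does_not_wait() -> None:
    # mock.Mock creates attributes on demand, so no explicit
    # wait_for_operation mock has to be configured.
    compute = mock.Mock()
    delete_expired_instances(compute, ["vm-1", "vm-2"])
    assert compute.delete_instance.call_count == 2
    compute.wait_for_operation.assert_not_called()
```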

🤖 Generated with Claude Code

This PR removes the last two blocking wait_for_operation calls that were
causing gunicorn workers to be blocked for extended periods:

1. delete_expired_instances() - VM deletion is now fire-and-forget
2. start_test() - VM creation is now optimistic (recorded immediately)

These blocking calls were causing 504 timeouts on webhook deliveries
because GitHub has a 10-second webhook timeout. When cron jobs were
running wait_for_operation (which can take up to 30 minutes), all
gunicorn workers could become blocked, causing webhook requests to
queue and exceed the timeout.

Changes:
- delete_expired_instances: Remove blocking wait, log operation initiation
- start_test: Check for immediate errors, then record instance optimistically
- Tests: Remove unused wait_for_operation mocks from affected tests

If VM creation ultimately fails, the test won't report progress and will
be cleaned up by the expired instances cron job.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>