Eval runtimes are force-stopped after ~20–24 minutes of runtime-api inactivity, causing mid-run 404s #320

@simonrosenberg

Description

During evaluations, the runtime-api’s idle GC force-stops runtimes after ~20–24 minutes without runtime-api traffic. Once the controller stops the pod/service, any subsequent conversation/exec/health request returns "404 Remote conversation not found", which aborts the run. The stop happens while the evaluation is still in progress; the only trigger is that the runtime-api has received no requests for ~20+ minutes.
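To make the failure mode concrete, here is a minimal sketch of how an idle GC of this kind could behave. It is not the runtime-api's actual code: the `Runtime` class, `gc_idle`, `handle_request`, and the 20-minute `IDLE_TIMEOUT_S` threshold are all hypothetical names and values chosen for illustration, based only on the behavior described above.

```python
from dataclasses import dataclass

# Assumed threshold; the issue only reports stops after roughly 20-24 minutes.
IDLE_TIMEOUT_S = 20 * 60


@dataclass
class Runtime:
    """Hypothetical record the API keeps per runtime."""
    runtime_id: str
    last_request_at: float  # timestamp (seconds) of the last API request
    running: bool = True


def gc_idle(runtimes, now, idle_timeout_s=IDLE_TIMEOUT_S):
    """Stop every running runtime whose last API request is too old.

    Note that only API traffic counts as activity here; work happening
    inside the runtime does not reset the idle clock.
    """
    stopped = []
    for rt in runtimes:
        if rt.running and now - rt.last_request_at >= idle_timeout_s:
            rt.running = False
            stopped.append(rt.runtime_id)
    return stopped


def handle_request(rt, now):
    """Return an HTTP-like status: 404 once the runtime has been stopped."""
    if not rt.running:
        return 404
    rt.last_request_at = now  # any request resets the idle clock
    return 200
```

Under these assumptions, an evaluation that sends a request at t=60s and then works for 21 minutes without touching the API would be stopped by the next GC pass, and its following request would get a 404.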

Impact

  • Evaluations fail mid-run with 404s.
  • No conversation artifacts are saved for the affected runs.

Evidence (from runtime-api logs)

  • Repeated pattern: Stopping idle runtime (idle for N seconds) → Force-stopping runtime … → Runtime stopped successfully → pod/service removal messages (Calico/felix, log tailers closing).
  • The 404s occur immediately after the forced stop.
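The pattern above can be checked against a log dump with a small script. The sketch below assumes the idle-stop message contains the text "Stopping idle runtime" followed later on the same line by "(idle for N seconds)", as quoted above; the rest of the log line layout is unknown, so the regular expression and the `idle_stop_durations` helper are illustrative only.

```python
import re

# Matches the idle-stop message quoted in the issue; anything between the
# two fragments (e.g. a runtime id) is an assumption and is skipped lazily.
IDLE_STOP_RE = re.compile(
    r"Stopping idle runtime.*?\(idle for (\d+(?:\.\d+)?) seconds\)"
)


def idle_stop_durations(lines):
    """Return the reported idle duration (seconds) for each idle-stop line."""
    return [
        float(m.group(1))
        for line in lines
        if (m := IDLE_STOP_RE.search(line))
    ]
```

Feeding the runtime-api log lines through `idle_stop_durations` and comparing the resulting values with the timestamps of the 404s would show whether every failure follows an idle stop.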
