@cdbartholomew
Contributor

Summary

  • Background tasks (async retain, consolidation, reflections) fail silently in multi-tenant deployments.
  • Root cause: The worker executes tasks without setting the tenant schema context, causing two cascading failures:
    1. The cancellation check in execute_task queries public.async_operations instead of the tenant's schema, finds no row, and skips the task as "cancelled" even though it was never cancelled.
    2. Even if that check were bypassed, _authenticate_tenant would raise AuthenticationError because background tasks carry no API key or token.
  • Observed symptom: Every async operation logs "Skipping cancelled operation: <id>" immediately after being picked up.

Changes

  • poller.py: Pass task.schema into task_dict["_schema"] so the executor receives it
  • memory_engine.py (execute_task): Pop _schema from task dict and set _current_schema before the cancellation check
  • memory_engine.py (_authenticate_tenant): Skip tenant extension auth for internal=True requests when schema is already set
  • memory_engine.py (task handlers): Use RequestContext(internal=True) for all 4 background task handlers
  • task_backend.py: Add schema_getter for dynamic schema resolution in submit_task and wait_for_result
  • http.py: Pass tenant_extension to WorkerPoller in create_app

Test plan

  • Deploy to dev with multi-tenant config
  • Trigger async retain — verify task completes instead of logging "Skipping cancelled operation"
  • Trigger consolidation — verify it runs in the correct tenant schema
  • Trigger reflection creation — verify background generation completes
  • Verify single-tenant (no extension) deployments are unaffected

Background tasks (async retain, consolidation, reflections) fail in
multi-tenant deployments because the worker executes tasks without
setting the tenant schema context. This causes two failures:

1. The cancellation check in execute_task queries public.async_operations
   instead of the tenant's schema, finds no row, and skips the task as
   "cancelled" even though it was never cancelled.

2. Even if that were fixed, _authenticate_tenant would raise
   AuthenticationError because background tasks have no API key.

Changes:
- Poller passes task.schema into task_dict so execute_task can set it
- execute_task sets _current_schema before the cancellation check
- Task handlers use RequestContext(internal=True) to signal background ops
- _authenticate_tenant skips extension auth for internal requests when
  schema is already set
- BrokerTaskBackend uses schema_getter for dynamic schema resolution
  when submitting tasks and waiting for results
- Pass tenant_extension to WorkerPoller in create_app
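
The schema_getter change can be pictured as replacing a schema fixed at construction time with a callable consulted on every call. The class below is a hypothetical sketch of that pattern: SketchTaskBackend, its constructor arguments, the _resolve_schema helper, and the in-memory submitted list are assumptions for illustration and do not reflect the actual BrokerTaskBackend internals.

```python
from typing import Callable, List, Optional, Tuple


class SketchTaskBackend:
    """Hypothetical backend that resolves the schema per call, not per instance."""

    def __init__(
        self,
        schema: str = "public",
        schema_getter: Optional[Callable[[], Optional[str]]] = None,
    ) -> None:
        self._schema = schema                # static fallback
        self._schema_getter = schema_getter  # dynamic resolver, if provided
        self.submitted: List[Tuple[str, dict]] = []

    def _resolve_schema(self) -> str:
        """Prefer the getter's current value; fall back to the static schema."""
        if self._schema_getter is not None:
            resolved = self._schema_getter()
            if resolved:
                return resolved
        return self._schema

    def submit_task(self, task_dict: dict) -> str:
        """Record the task under the schema that is active right now."""
        schema = self._resolve_schema()
        self.submitted.append((schema, dict(task_dict)))
        return schema
```

With a getter that reads the active tenant, two consecutive submit_task calls made under different tenants land in different schemas, while a backend built without a getter keeps its single-tenant behaviour.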