Coding Agent Pipeline: Phases 2-4C + code/task entry point #262

joelteply · 2026-02-02T04:37:32Z

Summary

Phase 2: Rust file engine foundation (FileEngine, ChangeGraph, PathSecurity) + TypeScript code/* commands (read, write, edit, search, tree, undo, history, diff) with ts-rs type generation
Phase 3: Single-agent coding pipeline — CodingModelSelector (task→model routing), PlanFormulator (LLM plan generation with risk assessment), CodeAgentOrchestrator (DAG execution with budget tracking)
Phase 4 Foundation: CodingPlanEntity with hierarchical persistence, real-time step status tracking
Phase 4A: Sandbox security — SecurityTier (4-tier access control), ToolAllowlistEnforcer (per-tier tool filtering), ExecutionSandbox (process isolation)
Phase 4B: Self-modifying skills — skill/propose, skill/generate, skill/validate, skill/activate, skill/list commands with SkillEntity lifecycle
Phase 4C: Multi-agent coordination — CodeCoordinationStream (file-level MUTEX locking), CodeTaskDelegator (union-find cluster decomposition), PlanGovernance (risk-based approval routing), DryRun mode
code/task command: Entry point that wires the full pipeline together — validates params, builds CodingTask, invokes CodeAgentOrchestrator, maps CodingResult to CommandResult
Governance wiring: High-risk plans now route through PlanGovernance for team approval before execution (returns pending_approval status)
Fixes parameter validation bugs in should-respond-fast, activity/join, activity/create
Fixes coordination system killing AI engagement (removed mechanical throttles from InferenceCoordinator)

Test plan

TypeScript compiles cleanly (npm run build:ts)
342 coding system unit tests pass (npx vitest run tests/unit/code/)
Deploy with npm start and verify code/task command is discoverable via ./jtag help code/task
Test code/task dry-run mode with a simple description
Verify skill/* commands register and appear in help
Integration test: code/task with real LLM for a single-file edit

🤖 Generated with Claude Code

…type gen

Rust Foundation (continuum-core/src/code/):
- FileEngine: read/write/edit/delete with per-persona workspace scoping
- ChangeGraph: DAG of ChangeNodes with undo via reverse diff
- DiffEngine: unified diff computation (similar crate)
- PathSecurity: workspace isolation, path traversal guard, extension allowlist
- CodeSearch: regex + glob search with .gitignore support (ignore crate)
- Tree: recursive directory tree generation
- GitBridge: git status and diff operations
- IPC handlers for all 12 code/* endpoints (359 tests passing)

TypeScript Commands (8 generated via CommandGenerator):
- code/read, code/write, code/edit, code/diff
- code/search, code/tree, code/undo, code/history
- Each with Types.ts, ServerCommand.ts, BrowserCommand.ts, README, tests

Type Safety (ts-rs single source of truth):
- 14 Rust types exported via #[derive(TS)] → shared/generated/code/
- Zero hand-written wire type duplicates
- All object/any casts eliminated from code/* commands
- CommandParams.userId used as canonical identity field

RAG Integration:
- CodeToolSource: dynamic coding workflow guidance in persona system prompts
- Only shows tools persona has permission to use
- Budget-aware with minimal fallback
- 15 unit tests passing

Infrastructure fixes:
- PersonaToolExecutor now injects userId (standard CommandParams field)
- CLAUDE.md documents ts-rs pattern and regeneration workflow

Delete old pre-Rust development/code/read and development/code/pattern-search commands that caused TS2300 duplicate identifier collision with new code/* commands. Remove legacy CodeDaemon methods (readFile, searchCode, getGitLog, clearCache, getCacheStats, getRepositoryRoot), their types, and the PathValidator/FileReader modules, all superseded by Rust IPC workspace ops.

- Delete commands/development/code/ (7 files)
- Delete daemons/code-daemon/server/modules/ (PathValidator, FileReader)
- Clean CodeDaemonTypes.ts: remove 222 lines of legacy types
- Clean CodeDaemon.ts: remove 7 legacy static methods
- Clean CodeDaemonServer.ts: remove old CodeDaemonImpl class
- Fix cli.ts: replace CODE_COMMANDS import with string literals
- Fix PersonaToolDefinitions.ts: update essentialTools to code/*
- Regenerate server/generated.ts and command constants

…strator

CodingModelSelector routes coding tasks to frontier models with provider fallback. PlanFormulator decomposes tasks into executable step DAGs via LLM. CodeAgentOrchestrator executes plans with budget enforcement, retry logic, and dependency-ordered step execution. 51 unit tests.
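
As a rough illustration of dependency-ordered step execution, the sketch below orders a step DAG with Kahn's algorithm. The DemoPlanStep shape and the executionOrder helper are hypothetical names made up for this example; the actual PlanFormulator output and CodeAgentOrchestrator scheduling are not shown in this PR text and may differ.

```typescript
// Hypothetical step shape, assumed for illustration only.
interface DemoPlanStep {
  id: number;
  dependsOn: number[];
}

// Returns step ids in an order where every step comes after its dependencies.
// Throws on unknown dependencies or on a dependency cycle.
function executionOrder(steps: DemoPlanStep[]): number[] {
  const unmet = new Map<number, number>();        // step id -> unmet dependency count
  const dependents = new Map<number, number[]>(); // step id -> steps waiting on it
  for (const step of steps) {
    unmet.set(step.id, step.dependsOn.length);
    dependents.set(step.id, []);
  }
  for (const step of steps) {
    for (const dep of step.dependsOn) {
      const waiting = dependents.get(dep);
      if (!waiting) throw new Error(`Step ${step.id} depends on unknown step ${dep}`);
      waiting.push(step.id);
    }
  }

  const ready = steps.filter(s => s.dependsOn.length === 0).map(s => s.id);
  const order: number[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as number;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const left = (unmet.get(next) ?? 0) - 1;
      unmet.set(next, left);
      if (left === 0) ready.push(next);
    }
  }
  if (order.length !== steps.length) throw new Error("Plan contains a dependency cycle");
  return order;
}

const demoOrder = executionOrder([
  { id: 3, dependsOn: [1, 2] },
  { id: 1, dependsOn: [] },
  { id: 2, dependsOn: [1] },
]);
console.log(demoOrder);
```

A real orchestrator would presumably interleave budget checks and retries between steps; this sketch only covers the ordering.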

CodingPlanEntity is a first-class persistent entity for coding plans. Supports hierarchical delegation (parentPlanId), team assignment (assignees + leadId), governance integration (proposalId), and real-time execution tracking. CodeAgentOrchestrator now persists plans via DataDaemon with best-effort semantics (works without DB in unit tests). 80 unit tests passing.

…ordination

Phase 4A (Sandbox & Security Tiers):
- SecurityTier: 4-tier access control (discovery/read/write/system)
- ToolAllowlistEnforcer: per-tier command filtering with glob matching
- ExecutionSandbox: process-isolated code execution with timeout/output limits
- Risk assessment integrated into PlanFormulator output

Phase 4B (Self-Modifying Skills):
- SkillEntity: persistent skill registry with full lifecycle
- skill/propose: AI creates command specifications
- skill/generate: programmatic CommandGenerator invocation
- skill/validate: sandbox compilation + test execution
- skill/activate: dynamic tool registration
- skill/list: query skill registry

Phase 4C (Multi-Agent Coordination & Delegation):
- CodeCoordinationStream: file-level MUTEX via BaseCoordinationStream
- PlanGovernance: risk-based approval routing (auto-approve low risk, require approval for multi-agent/high-risk/system-tier)
- CodeTaskDelegator: union-find plan decomposition into parallel file clusters, load-balanced agent assignment, sub-plan creation, result consolidation
- DryRun mode: execute plans read-only, mock write operations

342 tests across 12 test files, all passing.
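
The union-find decomposition mentioned for CodeTaskDelegator can be sketched roughly as follows: steps that touch a common file end up in the same cluster, and separate clusters can in principle run in parallel. DemoFileStep, DemoUnionFind, and clusterSteps are hypothetical names for this illustration, not the delegator's real types.

```typescript
// Hypothetical step shape: each step lists the files it reads or writes.
interface DemoFileStep {
  id: number;
  files: string[];
}

class DemoUnionFind {
  private parent: number[];

  constructor(size: number) {
    this.parent = Array.from({ length: size }, (_, i) => i);
  }

  find(x: number): number {
    while (this.parent[x] !== x) {
      this.parent[x] = this.parent[this.parent[x]]; // path halving
      x = this.parent[x];
    }
    return x;
  }

  union(a: number, b: number): void {
    const rootA = this.find(a);
    const rootB = this.find(b);
    if (rootA !== rootB) this.parent[rootB] = rootA;
  }
}

// Groups step ids into clusters of steps that are linked by shared files.
function clusterSteps(steps: DemoFileStep[]): number[][] {
  const uf = new DemoUnionFind(steps.length);
  const firstOwner = new Map<string, number>(); // file -> index of first step touching it
  steps.forEach((step, index) => {
    for (const file of step.files) {
      const owner = firstOwner.get(file);
      if (owner === undefined) firstOwner.set(file, index);
      else uf.union(owner, index);
    }
  });

  const clusters = new Map<number, number[]>();
  steps.forEach((step, index) => {
    const root = uf.find(index);
    const members = clusters.get(root) ?? [];
    members.push(step.id);
    clusters.set(root, members);
  });
  return Array.from(clusters.values());
}

const demoClusters = clusterSteps([
  { id: 1, files: ["a.ts"] },
  { id: 2, files: ["b.ts"] },
  { id: 3, files: ["a.ts", "c.ts"] },
  { id: 4, files: ["c.ts"] },
]);
console.log(JSON.stringify(demoClusters)); // [[1,3,4],[2]]
```

Steps 1, 3, and 4 are linked through a.ts and c.ts, so they form one cluster; step 2 touches only b.ts and stands alone.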

Two mechanical throttle layers were overriding AI cognition:

1. Temperature system: each AI "servicing" a room subtracted 0.2 from its temperature, so 14 personas crashed rooms to 0.00 in 35 seconds. Fixed by flipping to +0.05 warmth (active conversation stays alive). Removed hard -0.1 priority penalty for cold rooms.
2. InferenceCoordinator: gating calls consumed per-message "cards" in messageResponders, so when actual response generation tried to acquire a slot with the same messageId, every persona was denied. Rewrote from 489→197 lines, removing 6 mechanical rules (card dealing, responder caps, reserved slots, cooldowns, stagger delays, auto-thinning). Kept only hardware capacity protection.

Result: AIs respond within seconds instead of being silenced.

…n, activity/create

- should-respond-fast: params.messageText crash (toLowerCase on undefined) when AI calls without messageText. Now returns graceful false result.
- activity/join: activityId undefined → "Activity not found: undefined". Now validates activityId before DB lookup.
- activity/create: recipeId undefined → "Recipe not found: undefined". Now validates recipeId before DB lookup.

All three were AIs calling tools with missing params, getting either crashes or confusing error messages instead of clear validation errors.
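
The pattern behind these three fixes can be sketched as a small guard that runs before any lookup or string operation. Everything below (requireStringParam, demoActivityJoin, demoShouldRespondFast, the error wording, and the keyword heuristic) is a hypothetical illustration, not the commands' actual code.

```typescript
type DemoParamCheck =
  | { ok: true; value: string }
  | { ok: false; error: string };

// Returns the parameter if it is a non-empty string, otherwise a clear validation error.
function requireStringParam(params: Record<string, unknown>, key: string): DemoParamCheck {
  const value = params[key];
  if (typeof value !== "string" || value.trim() === "") {
    return { ok: false, error: `Missing required parameter: ${key}` };
  }
  return { ok: true, value };
}

// Validate before the (omitted) DB lookup, so the caller never sees
// "Activity not found: undefined".
function demoActivityJoin(params: Record<string, unknown>): string {
  const check = requireStringParam(params, "activityId");
  if (!check.ok) return check.error;
  return `looking up activity ${check.value}`;
}

// Return a graceful negative result instead of calling toLowerCase on undefined.
function demoShouldRespondFast(
  params: Record<string, unknown>
): { shouldRespond: boolean; reason: string } {
  const check = requireStringParam(params, "messageText");
  if (!check.ok) return { shouldRespond: false, reason: check.error };
  const text = check.value.toLowerCase();
  // Placeholder heuristic, assumed for this sketch only.
  return { shouldRespond: text.includes("help"), reason: "keyword heuristic" };
}

console.log(demoActivityJoin({}));
console.log(demoShouldRespondFast({}));
```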

Add code/task command as the entry point for the full coding agent pipeline. Wire PlanGovernance approval flow and CodeTaskDelegator into orchestrator. Add pending_approval status to CodingResult for high-risk plan gating.
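
A minimal sketch of the approval gate, assuming only the routing rules stated in this PR (approval for multi-agent, high-risk, or system-tier plans; everything else auto-approved). The DemoPlan fields, the "medium" risk level, and the "approved" status are assumptions for illustration; only the pending_approval status and the four tier names come from the PR text.

```typescript
type DemoRisk = "low" | "medium" | "high"; // "medium" is an assumed intermediate level
type DemoTier = "discovery" | "read" | "write" | "system";
type DemoStatus = "pending_approval" | "approved";

// Hypothetical summary of a plan, as the gate might see it.
interface DemoPlan {
  riskLevel: DemoRisk;
  requiredTier: DemoTier;
  assigneeCount: number;
}

// Collects every reason a plan would need team approval.
function approvalReasons(plan: DemoPlan): string[] {
  const reasons: string[] = [];
  if (plan.assigneeCount > 1) reasons.push("multi-agent plan");
  if (plan.riskLevel === "high") reasons.push("high risk");
  if (plan.requiredTier === "system") reasons.push("system tier");
  return reasons;
}

// Plans with no approval reasons proceed; the rest are held for approval.
function gatePlan(plan: DemoPlan): { status: DemoStatus; reasons: string[] } {
  const reasons = approvalReasons(plan);
  return { status: reasons.length > 0 ? "pending_approval" : "approved", reasons };
}

console.log(gatePlan({ riskLevel: "low", requiredTier: "write", assigneeCount: 1 }));
console.log(gatePlan({ riskLevel: "high", requiredTier: "system", assigneeCount: 1 }));
```

Returning the reasons alongside the status keeps the decision auditable, which seems useful when a proposal is later shown to approvers.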

Copilot

Pull request overview

This PR implements a comprehensive coding agent pipeline with multi-phase capabilities spanning file operations, security enforcement, multi-agent coordination, and self-modifying skills.

Changes:

Establishes Rust-based file engine with workspace isolation, change tracking, and TypeScript type generation via ts-rs
Implements single-agent coding pipeline with LLM-powered planning, risk assessment, and DAG-based execution with budget controls
Adds multi-tier sandbox security (discovery/read/write/system) with tool allowlists and governance approval routing for high-risk operations

Reviewed changes

Copilot reviewed 151 out of 206 changed files in this pull request and generated no comments.


File	Description
ChatCoordinationStream.ts	Fixes temperature adjustment logic to warm rooms on AI activity instead of cooling them
ToolAllowlistEnforcer.ts	New security gateway enforcing tier-based tool restrictions with audit logging
SecurityTier.ts	Defines 4-tier access control system with command allowlists and risk-to-tier mapping
PlanGovernance.ts	Routes high-risk plans through team approval via DecisionProposal integration
CodingModelSelector.ts	Maps coding task types to frontier models with provider fallback chains
CodeDaemonTypes.ts	Refactored to re-export Rust-generated workspace types via ts-rs
CodeDaemon.ts	Updated API surface for workspace-scoped operations backed by Rust IPC
CodeTaskServerCommand.ts	Entry point wiring task validation, orchestrator invocation, and result mapping
All code/* commands	New workspace commands (read/write/edit/search/tree/undo/history/diff) with Rust backend
All skill/* commands	Self-modifying skill lifecycle (propose/generate/validate/activate/list)
EntityRegistry.ts	Registers CodingPlanEntity and SkillEntity for persistence
activity/join, activity/create	Adds missing parameter validation for activityId and recipeId
should-respond-fast	Adds missing messageText validation
version.ts, package.json	Version bumps to 1.0.7521
generated files	Command registry updates for new code/* and skill/* commands

Files not reviewed (1)

src/debug/jtag/package-lock.json: Language not supported


CodeAgentOrchestrator.ensureWorkspace() now creates workspace directory and registers it in Rust backend before first code/* operation. Personas get writable workspace + read-only codebase access for discovery.

…ees, git write ops, iterative dev loop

Phase 1: Orchestrator reads CLAUDE.md + architecture docs during discovery so AI plans follow project conventions.

Phase 2: New code/verify command: runs tsc --noEmit via ExecutionSandbox, parses TypeScript errors, auto-verifies after write/edit steps. Added to write tier.

Phase 3: WorkspaceStrategy abstraction routes sandbox (isolated dir) vs worktree (git sparse checkout on real repo). CodingTask extended with workspaceMode/sparsePaths. code/task command validates and passes through.

Phase 4: Rust git_bridge extended with git_add/commit/push. IPC handlers + CodeDaemon methods for all git write ops. New code/git command (status/diff/log/add/commit/push) with SecurityTier gating. PlanFormulator knows commit action.

Phase 5: Verify→re-plan iteration loop in orchestrator. When auto-verify fails, re-plans with error context (quick-fix mode), executes fix, re-verifies. Configurable via autoVerify/maxVerifyIterations on ExecutionOptions.

387 TypeScript tests (15 files), 362 Rust tests, all passing.
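
To make Phases 2 and 5 concrete, here is a rough sketch of parsing non-pretty tsc output lines of the form `file(line,col): error TSxxxx: message` and of a bounded verify/fix loop. DemoTscError, parseTscOutput, and verifyLoop are hypothetical names; how code/verify and the orchestrator actually parse errors and count iterations is not shown in this PR text.

```typescript
interface DemoTscError {
  file: string;
  line: number;
  column: number;
  code: string;
  message: string;
}

// Matches lines such as: src/a.ts(12,5): error TS2300: Duplicate identifier 'Foo'.
const DEMO_TSC_ERROR = /^(.+?)\((\d+),(\d+)\): error (TS\d+): (.+)$/;

function parseTscOutput(output: string): DemoTscError[] {
  const errors: DemoTscError[] = [];
  for (const rawLine of output.split(/\r?\n/)) {
    const match = DEMO_TSC_ERROR.exec(rawLine.trim());
    if (!match) continue; // skip summary lines and anything unrecognized
    errors.push({
      file: match[1],
      line: Number(match[2]),
      column: Number(match[3]),
      code: match[4],
      message: match[5],
    });
  }
  return errors;
}

// Assumed semantics: at most maxVerifyIterations fix attempts, then one final check.
function verifyLoop(
  verify: () => DemoTscError[],
  fix: (errors: DemoTscError[]) => void,
  maxVerifyIterations: number
): boolean {
  for (let i = 0; i < maxVerifyIterations; i++) {
    const errors = verify();
    if (errors.length === 0) return true;
    fix(errors); // stand-in for "re-plan with error context and execute the fix"
  }
  return verify().length === 0;
}

const demoErrors = parseTscOutput(
  "src/a.ts(12,5): error TS2300: Duplicate identifier 'Foo'.\nFound 1 error."
);
console.log(demoErrors);
```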

…spaces

Rust: Notify-based blocking watch (no timeouts, no polling). Each ExecutionState gets Arc<Notify>; reader tasks call notify_one() on every output line. watch_execution() blocks on Notify until output arrives, classifies lines through CompiledSentinel (pre-compiled regex, first-match-wins), advances cursors, returns classified batch.

Types (ts-rs exported): OutputClassification, SentinelAction, SentinelRule, ClassifiedLine, ShellWatchResponse, all generated from Rust structs to shared/generated/code/*.ts.

IPC: code/shell-watch (async bridge via rt_handle.block_on after releasing DashMap lock) and code/shell-sentinel (synchronous, brief lock).

TS bridge: RustCoreIPC, CodeDaemon, CodeDaemonServer wired.

Commands: code/shell/watch and code/shell/sentinel generated via CommandGenerator with full server implementations delegating to CodeDaemon. Registered in schemas, constants, executors, structure.

Workspace handle: sentinel(), watch(), execWatch(cmd, rules?, onLine?), a composed convenience that runs exec → sentinel → watch loop.

Tests: 389 Rust lib tests pass, 183 TS unit tests pass (36 Workspace tests including 5 new watch/sentinel/execWatch tests), 0 failures.
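
The first-match-wins classification can be illustrated in TypeScript, although the real CompiledSentinel lives in Rust. The DemoRule and DemoClassifiedLine shapes, the classification labels, and the "raw" fallback are assumptions for this sketch; the fields of the generated SentinelRule and ClassifiedLine types are not shown in this PR text.

```typescript
// Hypothetical rule and result shapes, for illustration only.
interface DemoRule {
  pattern: string;
  classification: string;
}

interface DemoClassifiedLine {
  text: string;
  classification: string;
}

interface DemoCompiledRule {
  regex: RegExp;
  classification: string;
}

// Compile every pattern once, so classifying a line never re-parses a regex.
function compileRules(rules: DemoRule[]): DemoCompiledRule[] {
  return rules.map(rule => ({
    regex: new RegExp(rule.pattern), // no "g" flag, so test() stays stateless
    classification: rule.classification,
  }));
}

// First-match-wins: rule order is significant, later rules are never consulted.
function classifyLine(
  text: string,
  compiled: DemoCompiledRule[],
  fallback: string = "raw"
): DemoClassifiedLine {
  for (const rule of compiled) {
    if (rule.regex.test(text)) return { text, classification: rule.classification };
  }
  return { text, classification: fallback };
}

const demoSentinel = compileRules([
  { pattern: "error TS\\d+", classification: "ts-error" },
  { pattern: "error", classification: "generic-error" },
  { pattern: "^warning", classification: "warning" },
]);

console.log(classifyLine("a.ts(1,1): error TS2300: dup", demoSentinel));
console.log(classifyLine("compiled ok", demoSentinel));
```

Because the specific TypeScript rule is listed before the generic one, a tsc error line is labeled "ts-error" even though it also matches "error".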

…ls in system prompt

DataDaemon.read<T>() now returns T|null (consistent with store/update), eliminating the .data.data unwrapping pattern across ~30 call sites. Recipe tools and strategy rules flow from JSON through ChatRAGBuilder into RAGContext and are injected into PersonaResponseGenerator system prompt as activity context for the LLM. PersonaUser supports per-room workspaces via Map<string, Workspace> with room-aware mode selection (worktree for code rooms). RecipeToolDeclaration type added to RecipeTypes, making the tools field in recipe JSONs visible to TypeScript.

…batch-micro-tune to GenomeJobCreate; fix VoiceService STT + GeminiLive cancel

Seed script (1265→622 lines):
- Bulk load all users/rooms in single subprocess calls
- Parallel Promise.all() for config updates (was 48+ sequential spawns)
- Removed 550 lines of local function duplicates shadowing imports
- Cleaned unused imports

Code room:
- Added CODE to DEFAULT_ROOMS in DefaultEntities.ts
- Added code room to seed script with coding recipe
- Added to ALL_EXPECTED_ROOMS for idempotent creation

Auto-workspace bootstrapping:
- PersonaToolExecutor auto-creates workspace when code/* tools invoked
- Added ensureCodeWorkspace callback through MotorCortex→PersonaUser
- Personas no longer need manual workspace creation to use code tools

Validated: Together Assistant created index.html (365 bytes) using code/write and verified with code/tree. DeepSeek, Groq, Grok all using code/* tools correctly in the code room.

Two fixes for AI personas calling code/* tools with wrong parameter names:

1. PersonaToolDefinitions: PARAM_DESCRIPTION_OVERRIDES map provides meaningful descriptions (e.g. "Relative path to file within workspace") instead of generic "filePath parameter". Applied at tool definition build time so LLMs see useful descriptions in their function schemas.
2. PersonaToolExecutor: PARAM_CORRECTIONS map silently corrects common wrong names (path→filePath, contents→content, query→pattern, etc.) before validation. Covers code/write, code/read, code/edit, code/search, code/tree, code/git.

Result: Together and Groq both successfully created calculator.html files using code/write with correct parameters after these fixes.
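
A correction pass of this kind might look like the sketch below. The three renames (path→filePath, contents→content, query→pattern) come from the commit message; the per-tool grouping, the DEMO_PARAM_CORRECTIONS name, and the rule that a correctly named parameter is never overwritten are assumptions for illustration.

```typescript
// Hypothetical per-tool correction table; the real PARAM_CORRECTIONS layout may differ.
const DEMO_PARAM_CORRECTIONS = new Map<string, Map<string, string>>([
  ["code/write", new Map([["path", "filePath"], ["contents", "content"]])],
  ["code/read", new Map([["path", "filePath"]])],
  ["code/search", new Map([["query", "pattern"]])],
]);

// Renames commonly mistaken parameter names before validation runs.
function correctParams(
  toolName: string,
  params: Record<string, unknown>
): Record<string, unknown> {
  const corrections = DEMO_PARAM_CORRECTIONS.get(toolName);
  if (!corrections) return params;

  const corrected: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(params)) {
    const target = corrections.get(key) ?? key;
    // Assumed policy: if the caller also passed the correct name, keep that value.
    if (target !== key && target in params) continue;
    corrected[target] = value;
  }
  return corrected;
}

const demoCorrected = correctParams("code/write", { path: "calc.html", contents: "<p>1+1</p>" });
console.log(JSON.stringify(demoCorrected));
```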

…tization, meta-tool exclusion

- ToolRegistry: load descriptions via includeDescription:true (was empty strings)
- PersonaToolExecutor: strip CDATA wrappers (Together), regex HTML entity decode (Groq), file_name correction
- PersonaResponseGenerator: cap native tools at 64, prioritize recipe tools, exclude search_tools/list_tools/working_memory from native specs to prevent Claude meta-tool loops
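
The two argument cleanups in PersonaToolExecutor could be approximated as below. This is a sketch under assumptions: stripCdata and decodeEntities are hypothetical helper names, and the set of entities handled here (lt, gt, quot, amp, #39) is a guess at a reasonable minimum rather than the executor's actual list.

```typescript
const DEMO_ENTITY_MAP: Record<string, string> = {
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
  "&amp;": "&",
};

// Unwraps a value that consists entirely of one CDATA section; otherwise returns it unchanged.
function stripCdata(value: string): string {
  const match = /^\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*$/.exec(value);
  return match ? match[1] : value;
}

// Decodes the listed entities in a single pass, so "&amp;lt;" becomes "&lt;" and not "<".
function decodeEntities(value: string): string {
  return value.replace(/&(?:lt|gt|quot|amp|#39);/g, entity => DEMO_ENTITY_MAP[entity] ?? entity);
}

console.log(stripCdata("<![CDATA[<h1>Hi</h1>]]>"));
console.log(decodeEntities("&lt;div class=&quot;a&quot;&gt;ok&lt;/div&gt;"));
```

Doing the decode in one regex pass, rather than chaining several replace calls, avoids accidentally decoding text twice.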

…summaries, permissions, ToolNameCodec

Four bugs fixed in the persona tool execution pipeline:

1. Message accumulation: each tool loop iteration rebuilt messages from scratch, so the model never saw previous tool calls or results. Now accumulates assistant+result messages across iterations.
2. Stale toolCalls: after regeneration, only aiResponse.text was updated; the old toolCalls carried over, causing infinite re-execution. Now updates both text and toolCalls from regenerated response.
3. Lean summary vagueness: tool result summaries said "completed" and closed with "provide your analysis" (inviting more tool calls). Now produces human-readable summaries ("Wrote 1218 bytes to hello.html") with context-dependent stop signals that prevent re-calling succeeded tools.
4. Permission filtering: role permissions used inconsistent suffixes (:read, :search, :write) that never matched the {category}:execute format tools actually check. All roles now use {category}:execute consistently.

Additional improvements:
- ToolNameCodec: bidirectional encoder/decoder for API tool names. Models mangle names (code__write, $FUNCTIONS.code_write, code-write); the codec handles all variants via reverse lookup map populated at registration time.
- 3-tier tool prioritization: recipe tools → essentials → fill (was flat). Tighter cap (32) when recipe tools present.
- Loop detection threshold lowered from 3 to 2 identical calls.
- Tools stripped from regeneration request when all calls succeeded, forcing text-only response.
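
The reverse-lookup idea behind ToolNameCodec can be sketched as follows. DemoToolNameCodec, its method names, and the choice of "_" as the encoded separator are assumptions for this example; only the three mangled variants (code__write, $FUNCTIONS.code_write, code-write) are taken from the commit message.

```typescript
class DemoToolNameCodec {
  // Maps every known spelling of a tool name back to its canonical form.
  private readonly reverse = new Map<string, string>();

  // Called once per tool at registration time to populate the reverse map.
  register(toolName: string): void {
    const segments = toolName.split("/");
    for (const separator of ["/", "_", "__", "-"]) {
      this.reverse.set(segments.join(separator).toLowerCase(), toolName);
    }
  }

  // Canonical name -> API-safe name (assumed convention: "/" becomes "_").
  encode(toolName: string): string {
    return toolName.split("/").join("_");
  }

  // Whatever the model sent -> canonical name, or undefined if unknown.
  decode(rawName: string): string | undefined {
    const cleaned = rawName.trim().replace(/^\$FUNCTIONS\./, "").toLowerCase();
    return this.reverse.get(cleaned);
  }
}

const demoCodec = new DemoToolNameCodec();
demoCodec.register("code/write");

console.log(demoCodec.encode("code/write"));
console.log(demoCodec.decode("$FUNCTIONS.code_write"));
```

Populating the map at registration time means decoding is a single lookup and never guesses at names that were not registered. One caveat of this simplified version: two tools whose names differ only by separator would collide in the map.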

Joel added 9 commits February 1, 2026 17:05

Register skill/* commands in generated files, version bump

80d906d

Copilot AI review requested due to automatic review settings February 2, 2026 04:37

Copilot AI reviewed Feb 2, 2026


github-actions bot added the size: XL label Feb 2, 2026

Joel added 10 commits February 1, 2026 23:06

Workspace bootstrapping: auto-create per-persona Rust workspaces

e80783a


Close training circuit: wire PersonaTrainingManager, TrainingDaemon, …

8337b5c

…batch-micro-tune to GenomeJobCreate; fix VoiceService STT + GeminiLive cancel

Fix workspace/tree → code/tree tool redirect for confused LLMs

1ec5519


2 participants
