From 4e912bdb9039ffab669e23996d848d08848e47b1 Mon Sep 17 00:00:00 2001 From: Christopher Tso Date: Mon, 8 Dec 2025 10:24:24 +1100 Subject: [PATCH] add spec --- .../changes/add-eval-generator/design.md | 50 ---------------- .../changes/add-eval-generator/proposal.md | 17 ------ .../add-eval-generator/specs/cli/spec.md | 35 ----------- .../changes/add-eval-generator/tasks.md | 8 --- .../adopt-ts-template-prompts/design.md | 59 +++++++++++++++++++ .../adopt-ts-template-prompts/proposal.md | 21 +++++++ .../specs/custom-evaluator-prompts/spec.md | 56 ++++++++++++++++++ .../adopt-ts-template-prompts/tasks.md | 9 +++ docs/openspec/specs/eval-execution/spec.md | 14 ++--- docs/openspec/specs/evaluation/spec.md | 17 +++--- .../multiturn-messages-lm-provider/spec.md | 19 +++--- docs/openspec/specs/validation/spec.md | 28 ++++++--- 12 files changed, 192 insertions(+), 141 deletions(-) delete mode 100644 docs/openspec/changes/add-eval-generator/design.md delete mode 100644 docs/openspec/changes/add-eval-generator/proposal.md delete mode 100644 docs/openspec/changes/add-eval-generator/specs/cli/spec.md delete mode 100644 docs/openspec/changes/add-eval-generator/tasks.md create mode 100644 docs/openspec/changes/adopt-ts-template-prompts/design.md create mode 100644 docs/openspec/changes/adopt-ts-template-prompts/proposal.md create mode 100644 docs/openspec/changes/adopt-ts-template-prompts/specs/custom-evaluator-prompts/spec.md create mode 100644 docs/openspec/changes/adopt-ts-template-prompts/tasks.md diff --git a/docs/openspec/changes/add-eval-generator/design.md b/docs/openspec/changes/add-eval-generator/design.md deleted file mode 100644 index 7cca5d8..0000000 --- a/docs/openspec/changes/add-eval-generator/design.md +++ /dev/null @@ -1,50 +0,0 @@ -# Design: Eval Generator - -## Architecture - -### CLI Command -`agentv generate [options]` -- `prompt`: The high-level description (e.g., "Test the refund policy for a shoe store"). -- `--output, -o`: Output file path (default: `generated-eval.yaml`). -- `--count, -n`: Number of cases to generate (default: 5). -- `--target`: The LLM target to use for generation (default: `default`). -- `--context, -c`: One or more file/directory paths to include as context (e.g., `-c ./docs -c ./src/policy.ts`). - -### Generator Logic -1. **Context Gathering**: - * Iterate through paths provided via `--context`. - * Read file contents (respecting `.gitignore` if possible, or just simple recursive read). - * Concatenate file contents into a "Context Block". -2. **Prompt Construction**: - * Load a system prompt (`templates/generator.prompt.md`) that instructs the LLM to be an expert QA engineer. - * Inject the user's `` and the gathered "Context Block". - * Instruct the LLM to output a JSON object matching the `EvalCase` schema. -3. **Execution**: - * **Option A (Default)**: Use the existing `AxAI` provider infrastructure to call a standard LLM (e.g., GPT-4). - * **Option B (Agentic)**: If the target is `vscode` or `vscode-insiders`, reuse the `VSCodeProvider` logic from `packages/core/src/evaluation/providers/vscode.ts`. - * This allows us to dispatch the generation task to a real VS Code agent (like GitHub Copilot) running inside a subagent instance. - * The prompt will be sent as a user query to the agent. - * The agent will be instructed to write the YAML to a file or return it in the response. -4. **Parsing & Validation**: - * Parse the JSON/YAML output. - * Validate against the `EvalCase` schema. -5. **Output**: - * Write the YAML to the specified output file. 
- -### Adversarial Judge Configuration -The generator will be instructed to automatically add an `llm_judge` evaluator to each generated case. -* It should reference a standard "adversarial judge" prompt (which we will ship as a template). -* Example generated YAML: - ```yaml - - id: generated-1 - task: "I want a refund now!" - evaluators: - - name: adversarial_judge - type: llm_judge - prompt: prompts/adversarial-judge.md - ``` - -### Templates -We need to add two new templates to the `apps/cli/src/templates/` directory: -1. `generator.prompt.md`: The system prompt for the generator. -2. `adversarial-judge.md`: The standard judge prompt for checking policy violations. diff --git a/docs/openspec/changes/add-eval-generator/proposal.md b/docs/openspec/changes/add-eval-generator/proposal.md deleted file mode 100644 index b85dac9..0000000 --- a/docs/openspec/changes/add-eval-generator/proposal.md +++ /dev/null @@ -1,17 +0,0 @@ -# Add Eval Generator - -## Summary -Implement a new `agentv generate` command that uses an LLM to automatically generate adversarial evaluation cases from a high-level user description. - -## Problem -Creating high-quality, adversarial evaluation cases manually is time-consuming and requires creativity to think of edge cases. Users often know *what* they want to test (e.g., "refund policy") but struggle to write 10 different tricky scenarios in the correct YAML format. - -## Solution -Add a `generate` command to the CLI that: -1. Takes a natural language description of the test goal. -2. Uses a specialized "Generator Prompt" to act as an adversarial attacker. -3. Produces a valid `agentv` YAML file with multiple test cases. -4. Leverages the new `evaluators` list (from `implement-custom-evaluators`) to automatically configure an adversarial judge for these cases. - -## Dependencies -- Depends on `implement-custom-evaluators` to ensure the generated YAML can use custom adversarial judges. diff --git a/docs/openspec/changes/add-eval-generator/specs/cli/spec.md b/docs/openspec/changes/add-eval-generator/specs/cli/spec.md deleted file mode 100644 index 813ef07..0000000 --- a/docs/openspec/changes/add-eval-generator/specs/cli/spec.md +++ /dev/null @@ -1,35 +0,0 @@ -# Spec: Eval Generator - -## ADDED Requirements - -### Requirement: Generate Command -The CLI SHALL support a `generate` command that creates evaluation files from natural language descriptions. - -#### Scenario: Generate adversarial cases -Given a user runs `agentv generate "Test the refund policy"` -When the command completes -Then a YAML file is created containing multiple `EvalCase` entries -And each case has an `llm_judge` evaluator configured -And the cases represent adversarial attempts to test the policy. - -#### Scenario: Generate with file context -Given a user runs `agentv generate "Test the policy" --context ./policy.md` -When the command completes -Then the generated cases reflect the rules defined in `policy.md`. - -#### Scenario: Generate using VS Code agent -Given a user runs `agentv generate "Test the policy" --target vscode` -When the command completes -Then the generation task is dispatched to a VS Code subagent -And the resulting YAML is saved to the output file. - -#### Scenario: Custom output path -Given a user runs `agentv generate "Test" --output my-test.yaml` -When the command completes -Then the file `my-test.yaml` exists with the generated content. 
- -#### Scenario: Init includes judge template -Given a user runs `agentv init` -When the command completes -Then the file `prompts/adversarial-judge.md` (or similar) exists in the project -So that generated cases can reference it. diff --git a/docs/openspec/changes/add-eval-generator/tasks.md b/docs/openspec/changes/add-eval-generator/tasks.md deleted file mode 100644 index 6e8a553..0000000 --- a/docs/openspec/changes/add-eval-generator/tasks.md +++ /dev/null @@ -1,8 +0,0 @@ -# Tasks: Add Eval Generator - -- [ ] **Create Templates**: Add `generator.prompt.md` and `adversarial-judge.md` to `apps/cli/src/templates/`. -- [ ] **Implement CLI Command**: Create `apps/cli/src/commands/generate/index.ts` and register it in `cli.ts`. -- [ ] **Implement Generator Logic**: Write the logic to invoke the LLM with the generator prompt and parse the response. -- [ ] **Implement YAML Writer**: Ensure the output is written to a valid YAML file. -- [ ] **Add Init Support**: Update `agentv init` to copy the `adversarial-judge.md` template to the user's project. -- [ ] **Test Generation**: Verify that `agentv generate` produces valid YAML that can be run with `agentv eval`. diff --git a/docs/openspec/changes/adopt-ts-template-prompts/design.md b/docs/openspec/changes/adopt-ts-template-prompts/design.md new file mode 100644 index 0000000..b2e2dae --- /dev/null +++ b/docs/openspec/changes/adopt-ts-template-prompts/design.md @@ -0,0 +1,59 @@ +# Design: TypeScript Template Literals for Evaluator Prompts + +## Architecture +The core idea is to leverage TypeScript's first-class function capabilities to replace the need for a custom string templating engine for code-defined evaluators. + +### `PromptTemplate` Type +We will define a type alias for the prompt generation function: + +```typescript +export type PromptTemplate = (context: EvaluationContext) => string; +``` + +### `LlmJudgeEvaluator` Updates +The `LlmJudgeEvaluator` currently holds an optional `evaluatorTemplate` string. We will expand this to a union type: + +```typescript +export interface LlmJudgeEvaluatorOptions { + // ... + readonly evaluatorTemplate?: string | PromptTemplate; +} +``` + +In the `evaluateWithPrompt` method, we will check the type of `evaluatorTemplate`: +1. If it's a function, we call it with the current `EvaluationContext`. +2. If it's a string (or undefined, falling back to default), we proceed with the existing string substitution logic. + +### Evaluator Loading +To support the "simplified DX" where users just export a function, we need to update `resolveCustomPrompt` (or the relevant loading logic) in `orchestrator.ts`. + +We will use a library like `jiti` to dynamically import TypeScript files at runtime. + +**User Contract:** +The user's TypeScript file should export a function named `prompt` or a default export that matches the `PromptTemplate` signature. + +```typescript +// my-evaluator.ts +import type { EvaluationContext } from '@agentv/core'; + +export const prompt = (context: EvaluationContext) => ` + Question: ${context.promptInputs.question} + ... +`; +``` + +**Loader Logic:** +1. Check if `prompt` path ends in `.ts` or `.js`. +2. Use `jiti` (or dynamic import) to load the module. +3. Extract the `prompt` export or `default` export. +4. Validate it is a function. +5. Return it as the `evaluatorTemplateOverride`. + +## Trade-offs +- **Runtime Dependency**: Adding `jiti` adds a dependency, but enables a seamless TS experience. +- **Security**: Loading and executing user code implies trust. 
This is already the case with `code` evaluators, but now we are running it in-process (if using `jiti`). This is acceptable for a CLI tool intended to be run by developers on their own code. + +## Alternatives +- **Nunjucks/Handlebars**: We could integrate a full templating engine, but that adds runtime weight and doesn't solve the type safety issue for code-defined evaluators. +- **JSX/Svelte**: We could use component-based rendering, but that requires a build step and is overkill for the current needs of the CLI agent. + diff --git a/docs/openspec/changes/adopt-ts-template-prompts/proposal.md b/docs/openspec/changes/adopt-ts-template-prompts/proposal.md new file mode 100644 index 0000000..8a39c59 --- /dev/null +++ b/docs/openspec/changes/adopt-ts-template-prompts/proposal.md @@ -0,0 +1,21 @@ +# Adopt TypeScript Template Literals for Custom Evaluator Prompts + +## Summary +Enable the use of native TypeScript template literals for defining custom evaluator prompts in code-based evaluators. This provides type safety, better developer experience, and performance benefits over string-based templating with placeholders. + +## Problem +Currently, `LlmJudgeEvaluator` relies on string templates with `{{variable}}` placeholders. This approach: +- Lacks type safety: No compile-time check if variables exist in the context. +- Has limited logic: Conditional logic or loops require complex template syntax or are impossible. +- Is error-prone: Typos in placeholders are only caught at runtime. + +## Solution +1. Introduce a `PromptTemplate` function type that accepts `EvaluationContext` and returns a string. +2. Update `LlmJudgeEvaluator` to accept this function type. +3. Enhance the evaluator loader to support loading `.ts` files directly. Users can simply export a `prompt` function from a TypeScript file, and AgentV will load and use it as the evaluator template. + +## Impact +- **Core**: `LlmJudgeEvaluator`, `orchestrator.ts`, and loader logic. +- **DX**: Developers can write custom evaluators as simple TypeScript functions without boilerplate. +- **Dependencies**: May require adding a library like `jiti` to support runtime TypeScript loading. +- **Backward Compatibility**: Existing string-based templates will continue to work. diff --git a/docs/openspec/changes/adopt-ts-template-prompts/specs/custom-evaluator-prompts/spec.md b/docs/openspec/changes/adopt-ts-template-prompts/specs/custom-evaluator-prompts/spec.md new file mode 100644 index 0000000..47954f7 --- /dev/null +++ b/docs/openspec/changes/adopt-ts-template-prompts/specs/custom-evaluator-prompts/spec.md @@ -0,0 +1,56 @@ +# Spec: Custom Evaluator Prompts + +## ADDED Requirements + +### Requirement: Support Function-Based Prompt Templates +The `LlmJudgeEvaluator` MUST support `PromptTemplate` functions in addition to string templates. 
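A minimal sketch (editor's illustration, not the shipped implementation) of how the evaluator could resolve the two template forms into the final judge prompt; the helper names `resolvePrompt` and `substitutePlaceholders` and the specific placeholders are assumptions:

```typescript
import type { EvaluationContext } from '@agentv/core';

// Mirrors the proposed PromptTemplate type: a function from context to prompt text.
export type PromptTemplate = (context: EvaluationContext) => string;

// Resolve either template form. Function templates are called with the full
// EvaluationContext; strings (or undefined, falling back to the default) keep
// the existing {{placeholder}} substitution path.
export function resolvePrompt(
  template: string | PromptTemplate | undefined,
  context: EvaluationContext,
  defaultTemplate: string,
): string {
  if (typeof template === 'function') {
    return template(context);
  }
  return substitutePlaceholders(template ?? defaultTemplate, context);
}

// Stand-in for the existing string substitution logic (assumed placeholder names).
function substitutePlaceholders(template: string, context: EvaluationContext): string {
  return template
    .replaceAll('{{question}}', String(context.promptInputs.question ?? ''))
    .replaceAll('{{candidate}}', String(context.candidate ?? ''));
}
```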
+ +#### Scenario: Using a function template +Given a custom evaluator defined in code +When I pass a function as `evaluatorTemplate` that uses template literals +Then the evaluator should use the output of that function as the prompt +And the function should receive the full `EvaluationContext` + +```typescript +const myEvaluator = new LlmJudgeEvaluator({ + resolveJudgeProvider: myResolver, + evaluatorTemplate: (context) => ` + Analyze the following: + Question: ${context.promptInputs.question} + Answer: ${context.candidate} + ` +}); +``` + +#### Scenario: Backward compatibility with string templates +Given an existing evaluator configuration using a string template +When I run the evaluator +Then it should continue to function using the string substitution logic + +### Requirement: Load Prompt from TypeScript File +The system MUST support loading a `PromptTemplate` function from a user-provided TypeScript file. + +#### Scenario: Loading a named export +Given a TypeScript file `my-prompt.ts` that exports a `prompt` function +And an eval case configuration that points to this file +When the evaluator runs +Then it should load the file and use the exported `prompt` function as the template + +#### Scenario: Loading a default export +Given a TypeScript file `my-prompt.ts` that has a default export of a function +And an eval case configuration that points to this file +When the evaluator runs +Then it should load the file and use the default export as the template + +#### Scenario: Runtime TypeScript support +Given the agentv CLI running in a standard Node.js environment +When it loads a `.ts` prompt file +Then it should successfully compile/transpile and load the module without requiring the user to pre-compile it + +### Requirement: Type Definitions +The `PromptTemplate` type MUST be exported and available for consumers. + +#### Scenario: Type checking +Given a developer writing a custom prompt function +When they type the function argument +Then TypeScript should infer the `EvaluationContext` type and provide autocomplete for properties like `candidate`, `promptInputs`, etc. diff --git a/docs/openspec/changes/adopt-ts-template-prompts/tasks.md b/docs/openspec/changes/adopt-ts-template-prompts/tasks.md new file mode 100644 index 0000000..2134540 --- /dev/null +++ b/docs/openspec/changes/adopt-ts-template-prompts/tasks.md @@ -0,0 +1,9 @@ +# Tasks: Adopt TypeScript Template Literals for Custom Evaluator Prompts + +- [ ] Add `jiti` (or equivalent) dependency to `@agentv/core` +- [ ] Define `PromptTemplate` type in `packages/core/src/evaluation/types.ts` +- [ ] Update `LlmJudgeEvaluatorOptions` in `packages/core/src/evaluation/evaluators.ts` to accept `PromptTemplate` +- [ ] Update `LlmJudgeEvaluator` implementation to handle `PromptTemplate` functions +- [ ] Update `resolveCustomPrompt` in `orchestrator.ts` to load `.ts` files using `jiti` +- [ ] Add unit tests for function-based prompt templates and loading +- [ ] Create a sample custom evaluator using the new pattern diff --git a/docs/openspec/specs/eval-execution/spec.md b/docs/openspec/specs/eval-execution/spec.md index fa1fcee..4ec4652 100644 --- a/docs/openspec/specs/eval-execution/spec.md +++ b/docs/openspec/specs/eval-execution/spec.md @@ -7,11 +7,11 @@ Define how eval execution formats `input_messages` into `raw_request` for candid The system SHALL format `input_messages` to preserve conversation turn boundaries when there is actual conversational structure requiring role disambiguation. 
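As an editorial aside, a minimal sketch of the turn formatting these scenarios describe; the message shape, the function name, and the exact placement of content relative to the marker are assumptions based on the role markers listed in the scenarios below:

```typescript
type Role = 'system' | 'user' | 'assistant' | 'tool';

interface InputMessage {
  role: Role;
  content: string; // file attachments assumed to already be embedded inline in the turn
}

const ROLE_MARKERS: Record<Role, string> = {
  system: '@[System]:',
  user: '@[User]:',
  assistant: '@[Assistant]:',
  tool: '@[Tool]:',
};

// Prefix each turn with its role marker on its own line; blank lines separate turns.
function formatQuestion(messages: readonly InputMessage[]): string {
  return messages
    .map((message) => `${ROLE_MARKERS[message.role]}\n${message.content}`)
    .join('\n\n');
}

// e.g. formatQuestion([{ role: 'user', content: 'Hi' }, { role: 'assistant', content: 'Hello' }])
// => "@[User]:\nHi\n\n@[Assistant]:\nHello"
```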
-#### Scenario: Single system and user message (backward compatibility) +#### Scenario: Single system and user message - **WHEN** an eval case has one system message with text content and one user message in `input_messages` -- **THEN** the `raw_request.question` field contains the system text followed by the user message content, flattened without role markers -- **AND** maintains current backward-compatible behavior +- **THEN** the `raw_request.question` field contains the system text and user message formatted with role markers +- **AND** the role markers are `@[System]:` and `@[User]:` #### Scenario: System file attachment with user message (no role markers needed) @@ -24,7 +24,7 @@ The system SHALL format `input_messages` to preserve conversation turn boundarie - **WHEN** an eval case has `input_messages` including one or more non-user messages (assistant, tool, etc.) - **THEN** the `raw_request.question` field contains all messages formatted with clear turn boundaries -- **AND** each turn is prefixed with its role (`[System]:`, `[User]:`, `[Assistant]:`, `[Tool]:`) on its own line +- **AND** each turn is prefixed with its role (`@[System]:`, `@[User]:`, `@[Assistant]:`, `@[Tool]:`) on its own line - **AND** file attachments in content blocks are embedded inline within their respective turn - **AND** blank lines separate consecutive turns for readability @@ -60,14 +60,14 @@ The system SHALL preserve conversation structure when building evaluator prompts #### Scenario: Multi-turn conversation evaluation - **WHEN** the evaluator judges a candidate answer from a multi-turn conversation -- **THEN** the `evaluator_raw_request.prompt` under `[[ ## question ## ]]` contains the formatted conversation with role markers -- **AND** the role markers match the format used in `raw_request.question` (e.g., `[System]:`, `[User]:`, `[Assistant]:`) +- **THEN** the `evaluator_provider_request.prompt` under `[[ ## question ## ]]` contains the formatted conversation with role markers +- **AND** the role markers match the format used in `raw_request.question` (e.g., `@[System]:`, `@[User]:`, `@[Assistant]:`) - **AND** the evaluator can distinguish between different conversation turns #### Scenario: Single-turn conversation evaluation - **WHEN** the evaluator judges a candidate answer from a single-turn interaction -- **THEN** the `evaluator_raw_request.prompt` under `[[ ## question ## ]]` contains the flat-formatted question without role markers +- **THEN** the `evaluator_provider_request.prompt` under `[[ ## question ## ]]` contains the flat-formatted question without role markers - **AND** maintains backward compatibility with existing evaluations #### Scenario: Evaluator receives same context as candidate diff --git a/docs/openspec/specs/evaluation/spec.md b/docs/openspec/specs/evaluation/spec.md index 2eda22a..a36bf3b 100644 --- a/docs/openspec/specs/evaluation/spec.md +++ b/docs/openspec/specs/evaluation/spec.md @@ -51,39 +51,38 @@ The system SHALL instruct LLM judge evaluators to emit a single JSON object and #### Scenario: LLM judge evaluation failure - **WHEN** an LLM judge evaluator fails to parse a valid score -- **THEN** the system logs a warning -- **AND** returns a score of 0 with the raw response for debugging +- **THEN** the system returns a score of 0 ### Requirement: Provider Integration -The system SHALL support multiple LLM providers with environment-based configuration and optional retry settings. 
+The system SHALL support multiple LLM providers with configuration resolved from target definitions (typically referencing environment variables) and optional retry settings. #### Scenario: Azure OpenAI provider - **WHEN** a test case uses the "azure-openai" provider -- **THEN** the system reads `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, and `AZURE_DEPLOYMENT_NAME` from environment +- **THEN** the system resolves `endpoint`, `api_key`, and `deployment` from the target configuration - **AND** invokes Azure OpenAI with the configured settings - **AND** applies any retry configuration specified in the target definition #### Scenario: Anthropic provider - **WHEN** a test case uses the "anthropic" provider -- **THEN** the system reads `ANTHROPIC_API_KEY` from environment +- **THEN** the system resolves `api_key` from the target configuration - **AND** invokes Anthropic Claude with the configured settings - **AND** applies any retry configuration specified in the target definition #### Scenario: Google Gemini provider - **WHEN** a test case uses the "gemini" provider -- **THEN** the system reads `GOOGLE_API_KEY` from environment -- **AND** optionally reads `GOOGLE_GEMINI_MODEL` to override the default model +- **THEN** the system resolves `api_key` from the target configuration +- **AND** optionally resolves `model` to override the default model - **AND** invokes Google Gemini with the configured settings - **AND** applies any retry configuration specified in the target definition #### Scenario: VS Code Copilot provider - **WHEN** a test case uses the "vscode-copilot" provider -- **THEN** the system generates a structured prompt file with preread block and SHA tokens +- **THEN** the system generates a structured prompt file with preread block - **AND** invokes the subagent library to execute the prompt - **AND** captures the Copilot response @@ -91,7 +90,7 @@ The system SHALL support multiple LLM providers with environment-based configura - **WHEN** a test case uses the "codex" provider - **THEN** the system locates the Codex CLI executable (default `codex`, overrideable via the target) -- **AND** it mirrors guideline and attachment files into a scratch workspace, emitting the same preread block links used by the VS Code provider so Codex opens every referenced file before answering +- **AND** it generates a prompt file with references to the original guideline and attachment files, emitting the same preread block links used by the VS Code provider so Codex opens every referenced file before answering - **AND** it renders the eval prompt into a single string and launches `codex exec --json` plus any configured profile, model, approval preset, and working-directory overrides defined on the target - **AND** it verifies the Codex executable is available while delegating profile/config resolution to the CLI itself - **AND** it parses the emitted JSONL event stream to capture the final assistant message as the provider response, attaching stdout/stderr when the CLI exits non-zero or returns malformed JSON diff --git a/docs/openspec/specs/multiturn-messages-lm-provider/spec.md b/docs/openspec/specs/multiturn-messages-lm-provider/spec.md index 2d6cecf..9407202 100644 --- a/docs/openspec/specs/multiturn-messages-lm-provider/spec.md +++ b/docs/openspec/specs/multiturn-messages-lm-provider/spec.md @@ -41,6 +41,7 @@ input_messages: Converts to: ```typescript [ + { role: "system", content: "You are a careful assistant. Follow all provided instructions and do not fabricate results." 
}, { role: "user", content: "Debug this code" }, { role: "assistant", content: "I can help with that" }, { role: "user", content: "Thanks, here's the code" } @@ -66,7 +67,7 @@ Converts to: [ { role: "system", - content: "You are a careful assistant.\n\n[[ ## Guidelines ## ]]\n\nAlways be concise" + content: "You are a careful assistant. Follow all provided instructions and do not fabricate results.\n\n[[ ## Guidelines ## ]]\n\nAlways be concise" }, { role: "user", content: "Review this code\n" } ] @@ -89,9 +90,10 @@ With file content "console.log('test')": Converts to: ```typescript [ + { role: "system", content: "You are a careful assistant. Follow all provided instructions and do not fabricate results." }, { role: "user", - content: "Review this:\n=== ./code.js ===\nconsole.log('test')" + content: "Review this:\n\nconsole.log('test')\n" } ] ``` @@ -117,7 +119,7 @@ With `guideline_patterns: ["**/*.instructions.md"]`: System message includes guideline content, user message has reference marker: ```typescript [ - { role: "system", content: "[[ ## Guidelines ## ]]\n\n" }, + { role: "system", content: "You are a careful assistant. Follow all provided instructions and do not fabricate results.\n\n[[ ## Guidelines ## ]]\n\n" }, { role: "user", content: "\nWrite a function" } ] ``` @@ -139,7 +141,7 @@ Both extracted to system message: [ { role: "system", - content: "[[ ## Guidelines ## ]]\n\n=== python.instructions.md ===\n\n\n=== security.instructions.md ===\n" + content: "You are a careful assistant. Follow all provided instructions and do not fabricate results.\n\n[[ ## Guidelines ## ]]\n\n\n\n\n\n\n\n" }, { role: "user", @@ -164,11 +166,12 @@ input_messages: value: guidelines.instructions.md ``` -If guideline file is the only content, that user message is empty after extraction: +If guideline file is the only content, that user message is empty after extraction. However, since only one message has visible content (the system message), this falls back to single-turn processing where the system message becomes part of the user prompt: + ```typescript [ - { role: "system", content: "System context\n\n[[ ## Guidelines ## ]]\n\n" } - // User message omitted - had no non-guideline content + { role: "system", content: "You are a careful assistant. Follow all provided instructions and do not fabricate results.\n\n[[ ## Guidelines ## ]]\n\n" }, + { role: "user", content: "System context\n" } ] ``` @@ -270,7 +273,7 @@ const request: ProviderRequest = { Provider prepends default system message: ```typescript [ - { role: "system", content: "You are a careful assistant." }, + { role: "system", content: "You are a careful assistant. Follow all provided instructions and do not fabricate results." }, { role: "user", content: "Hello" }, { role: "assistant", content: "Hi" }, { role: "user", content: "Help me" } diff --git a/docs/openspec/specs/validation/spec.md b/docs/openspec/specs/validation/spec.md index 4a3cdd9..e6d82c8 100644 --- a/docs/openspec/specs/validation/spec.md +++ b/docs/openspec/specs/validation/spec.md @@ -51,9 +51,14 @@ The system SHALL detect file types using the `$schema` field. 
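An illustrative sketch of the `$schema`-based detection this hunk updates; the function name and the treatment of unrecognised values are assumptions (only the schema identifiers named in the spec are matched):

```typescript
type FileKind = 'targets' | 'config' | 'unknown';

// Detect the file type from the parsed YAML document's $schema field.
function detectFileKind(doc: Record<string, unknown>): FileKind {
  switch (doc['$schema']) {
    case 'agentv-targets-v2.2':
      return 'targets';
    case 'agentv-config-v2':
      return 'config';
    default:
      // Missing or unknown $schema values are covered by the "Missing schema field"
      // scenario; returning 'unknown' here is an assumption about that path.
      return 'unknown';
  }
}

// e.g. detectFileKind({ $schema: 'agentv-config-v2' }) === 'config'
```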
#### Scenario: Targets file detection via schema field -- **WHEN** file contains `$schema: agentv-targets-v2.1` +- **WHEN** file contains `$schema: agentv-targets-v2.2` - **THEN** it is validated as a targets configuration file +#### Scenario: Config file detection via schema field + +- **WHEN** file contains `$schema: agentv-config-v2` +- **THEN** it is validated as a config file + #### Scenario: Missing schema field - **WHEN** file missing `$schema` field @@ -94,7 +99,7 @@ The system SHALL validate targets files against the v2 schema. #### Scenario: Schema field required -- **WHEN** targets file has `$schema: agentv-targets-v2.1` +- **WHEN** targets file has `$schema: agentv-targets-v2.2` - **THEN** validation proceeds with targets schema rules #### Scenario: Targets array required @@ -112,6 +117,20 @@ The system SHALL validate targets files against the v2 schema. - **WHEN** target has unknown provider - **THEN** warning issued about unknown provider (non-fatal) +### Requirement: Config File Schema Validation + +The system SHALL validate config files against the v2 schema. + +#### Scenario: Schema field required + +- **WHEN** config file has `$schema: agentv-config-v2` +- **THEN** validation proceeds with config schema rules + +#### Scenario: Guideline patterns validation + +- **WHEN** config file has `guideline_patterns` +- **THEN** it validates that it is an array of strings + ### Requirement: File Reference Validation The system SHALL validate that file URLs referenced in eval files exist. @@ -183,11 +202,6 @@ The system SHALL format validation output for developer readability. The system SHALL validate files efficiently for large codebases. -#### Scenario: Parallel validation - -- **WHEN** validating directory with many files -- **THEN** files are validated in parallel up to CPU core count - #### Scenario: Early exit on critical error - **WHEN** file cannot be parsed as YAML