This repository was archived by the owner on Nov 15, 2025. It is now read-only.

[Epic] Prompt Optimization Framework – bbeval opt #15

@christso


Goal

Transform bbeval into a complete, production-grade prompt optimization platform that:

  • Generates high-quality, structured prompts in Markdown, YAML, or SudoLang
  • Enforces company guidelines (attached via --guidelines)
  • Works across all agent types with ≤10 LLM calls per run
  • Outputs Copilot-ready .prompt.md files

Scope: 3 Modes + 1 Engine

| Mode | Use Case | Input | Output | Optimizer |
|---|---|---|---|---|
| direct-llm | Non-agentic reviews (SQL, schema, docs) | Base .prompt.md + guidelines | Optimized .prompt.md | BootstrapRS |
| dspy-agent | ReAct, tool-calling agents | DSPy module + AgentGateway | YAML config + MCP routes | BootstrapRS |
| external-wrapper | Copilot, Codex, Claude Desktop | Mock output + refiner | Post-process .py + .prompt.md | COPRO / SIMBA |

Core Requirements

  1. bbeval opt subcommand

    bbeval opt tests.yaml base.prompt.md \
      --mode direct-llm \
      --guidelines constraints.yaml \
      --format markdown \
      --output optimized.prompt.md
  2. Guideline Reinforcement Engine (Child Issue #101)

    • Enforces rules during optimization
    • Supports Markdown, YAML, SudoLang
    • Injects attachment contents (e.g., {{guidelines}} is replaced with text extracted from an attached PDF)
  3. Universal Metric

    • Uses bbeval run --jsonl output as the scoring oracle
    • Supports code-execution, SQL-linting, and semantic-match scoring
  4. Output Format (Auto-Selected)

    | Format | When | Example |
    |---|---|---|
    | Markdown (default) | Single workflow | .prompt.md with headers, tools |
    | YAML | Multi-workflow | workflow: [step: analyze, tools: [...] ] |
    | SudoLang | Loops, state, branching | for each file, lint(file) |
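The bbeval opt invocation in requirement 1 can be sketched as an argument parser. This is a hypothetical Python sketch assuming the CLI is built on `argparse`; the flag names come from the example command, the `--mode` and `--format` choices come from the mode and format tables above, and the defaults (other than Markdown, which the format table marks as the default) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical sketch of the `bbeval opt` argument surface."""
    parser = argparse.ArgumentParser(prog="bbeval")
    subcommands = parser.add_subparsers(dest="command", required=True)

    opt = subcommands.add_parser("opt", help="optimize a base prompt against a test suite")
    opt.add_argument("tests", help="test suite, e.g. tests.yaml")
    opt.add_argument("base_prompt", help="prompt to optimize, e.g. base.prompt.md")
    # Choices mirror the three modes in the scope table; the default is an assumption.
    opt.add_argument("--mode", choices=["direct-llm", "dspy-agent", "external-wrapper"],
                     default="direct-llm")
    opt.add_argument("--guidelines", help="guideline file to enforce, e.g. constraints.yaml")
    # Markdown is the documented default output format.
    opt.add_argument("--format", choices=["markdown", "yaml", "sudolang"], default="markdown")
    opt.add_argument("--output", help="where to write the optimized prompt")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args([
        "opt", "tests.yaml", "base.prompt.md",
        "--mode", "direct-llm",
        "--guidelines", "constraints.yaml",
        "--format", "markdown",
        "--output", "optimized.prompt.md",
    ])
    print(args)
```

With `choices`, an unknown mode or format is rejected by `argparse` before any optimization (and any LLM call) starts.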
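The Guideline Reinforcement Engine in requirement 2 combines two steps: substituting the {{guidelines}} placeholder with attachment text, and rejecting candidate prompts that break a rule. The sketch below is a minimal illustration under assumed names (`inject_guidelines`, `GuidelineRules`, `violations`, `filter_candidates` are hypothetical) and assumes rules can be expressed as required and forbidden phrases; the real engine (Child Issue #101) may use a richer rule format.

```python
import re
from dataclasses import dataclass, field

# Matches {{guidelines}} with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*guidelines\s*\}\}")


def inject_guidelines(template: str, guidelines_text: str) -> str:
    """Replace every {{guidelines}} placeholder with the extracted attachment text."""
    # A function replacement avoids re interpreting backslashes in the text.
    return PLACEHOLDER.sub(lambda _m: guidelines_text.strip(), template)


@dataclass
class GuidelineRules:
    """Hypothetical rule set: phrases that must or must not appear."""
    required: list[str] = field(default_factory=list)
    forbidden: list[str] = field(default_factory=list)


def violations(prompt: str, rules: GuidelineRules) -> list[str]:
    """Return one message per broken rule; an empty list means compliant."""
    found = [f"missing required phrase: {p!r}" for p in rules.required if p not in prompt]
    found += [f"contains forbidden phrase: {p!r}" for p in rules.forbidden if p in prompt]
    return found


def filter_candidates(candidates: list[str], rules: GuidelineRules) -> list[str]:
    """Keep only the candidate prompts that satisfy every rule."""
    return [c for c in candidates if not violations(c, rules)]


if __name__ == "__main__":
    template = "You are a DBA.\n\n{{guidelines}}\n\n## Task\nReview the schema."
    prompt = inject_guidelines(template, "Always include a migration plan.")
    rules = GuidelineRules(required=["migration plan"], forbidden=["{{guidelines}}"])
    print(violations(template, rules))
    print(violations(prompt, rules))
```

Listing the raw placeholder as a forbidden phrase catches candidates where injection never happened.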
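Using bbeval run --jsonl as the scoring oracle (requirement 3) means the optimizer only has to reduce the JSONL records to one number per candidate. This sketch assumes, hypothetically, that each non-empty line is a JSON object with a numeric per-test score under a `score` key; the actual record schema is not specified in this issue and may differ.

```python
import json
from statistics import mean


def score_from_jsonl(jsonl_text: str, score_key: str = "score") -> float:
    """Average the per-test scores in JSONL output.

    Assumes (hypothetically) one JSON object per line with a numeric
    value under `score_key`.
    """
    scores = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank or trailing lines
        record = json.loads(line)
        if score_key not in record:
            raise ValueError(f"line {line_no}: no {score_key!r} field")
        scores.append(float(record[score_key]))
    if not scores:
        raise ValueError("no scored records found")
    return mean(scores)


if __name__ == "__main__":
    sample = (
        '{"test_id": "index-coverage", "score": 1.0}\n'
        '{"test_id": "no-select-star", "score": 0.5}\n'
    )
    print(score_from_jsonl(sample))
```

Because code execution, SQL linting, and semantic match all end up as a per-test score, the optimizer can compare candidates without knowing which check produced each number.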
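The auto-selection rule in requirement 4 reduces to a small decision function. The sketch below is hypothetical: `WorkflowSpec` is an invented summary of the task, and because the table does not say which rule wins when a task is both multi-workflow and needs loops or branching, it assumes control flow takes precedence.

```python
from dataclasses import dataclass


@dataclass
class WorkflowSpec:
    """Hypothetical summary of what the optimized prompt must express."""
    workflow_count: int = 1
    has_loops: bool = False
    has_state: bool = False
    has_branching: bool = False


def select_format(spec: WorkflowSpec) -> str:
    """Pick an output format following the table above."""
    # Loops, state, or branching map to SudoLang (assumed to take precedence).
    if spec.has_loops or spec.has_state or spec.has_branching:
        return "sudolang"
    # Several workflows map to YAML steps.
    if spec.workflow_count > 1:
        return "yaml"
    # A single linear workflow falls back to the Markdown default.
    return "markdown"


if __name__ == "__main__":
    print(select_format(WorkflowSpec()))
    print(select_format(WorkflowSpec(workflow_count=3)))
    print(select_format(WorkflowSpec(has_loops=True)))
```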

Success Metrics

| Metric | Target |
|---|---|
| Manual tuning time | 90% reduction |
| Guideline compliance | 100% |
| LLM calls per run | ≤10 |
| Copilot-ready output | Drop-in .prompt.md |
| Format correctness | Zero invalid outputs |
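One way to hold the ≤10 LLM calls per run target is to route every model call through a counter that refuses further calls once the budget is spent. This is a minimal sketch; `CallBudget` and `LLMCallBudgetExceeded` are hypothetical names, and `llm_fn` stands in for whatever client function the optimizer actually uses.

```python
class LLMCallBudgetExceeded(RuntimeError):
    """Raised when a run tries to exceed its LLM call budget."""


class CallBudget:
    """Counts LLM calls for one optimization run and refuses calls past the limit."""

    def __init__(self, limit: int = 10):
        self.limit = limit
        self.used = 0

    def call(self, llm_fn, *args, **kwargs):
        if self.used >= self.limit:
            raise LLMCallBudgetExceeded(f"budget of {self.limit} LLM calls exhausted")
        # Count before calling, so a call that fails still consumes budget.
        self.used += 1
        return llm_fn(*args, **kwargs)


if __name__ == "__main__":
    budget = CallBudget(limit=2)
    fake_llm = lambda prompt: prompt.upper()  # stand-in for a real model client
    print(budget.call(fake_llm, "first"))
    print(budget.call(fake_llm, "second"))
    try:
        budget.call(fake_llm, "third")
    except LLMCallBudgetExceeded as exc:
        print(exc)
```

Failing loudly, rather than silently skipping calls, makes a run that needs more than the budget visible in test results.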

Example: Optimized .prompt.md

````markdown
---
description: 'Review SQL schema changes'
mode: 'agent'
tools: ['runInTerminal', 'getTerminalOutput', 'edit']
---

# SQL Schema Change Reviewer

You are a fintech DBA with 15 years in high-frequency trading systems.

{{guidelines}}

## Task
Review `${selection}` for:
- Index coverage on WHERE/JOIN columns
- No `SELECT *` in production views
- Partitioning on date columns

## Instructions
1. Run `EXPLAIN ANALYZE` on critical queries
2. Check for missing indexes
3. If issue found: emit fix with `edit`
4. Persist until all checks pass

## Output

```diff
- -- Missing index
+ CREATE INDEX idx_orders_date ON orders(order_date);
```
````
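The "Copilot-ready output" and "Zero invalid outputs" targets suggest a pre-write check on each generated file. The sketch below validates the frontmatter of a .prompt.md file like the example above. It assumes the three keys shown in the example (description, mode, tools) are mandatory for bbeval output, which is a project-level assumption rather than a VS Code requirement, and it reads values with `ast.literal_eval` as a shortcut instead of a real YAML parser. `parse_frontmatter` and `validate_prompt_file` are hypothetical names.

```python
import ast

# Assumption: mirrors the keys used in the example prompt file.
REQUIRED_KEYS = ("description", "mode", "tools")


def parse_frontmatter(text: str) -> dict:
    """Read `key: value` lines between the opening and closing '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' frontmatter fence")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("missing closing '---' frontmatter fence")
    meta = {}
    for raw in lines[1:end]:
        if not raw.strip():
            continue
        key, sep, value = raw.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {raw!r}")
        value = value.strip()
        try:
            # Quoted strings and flow lists like ['a', 'b'] happen to be valid
            # Python literals; this is a shortcut, not a YAML parser.
            meta[key.strip()] = ast.literal_eval(value)
        except (ValueError, SyntaxError, TypeError):
            meta[key.strip()] = value  # fall back to the bare string
    return meta


def validate_prompt_file(text: str) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    try:
        meta = parse_frontmatter(text)
    except ValueError as exc:
        return [str(exc)]
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in meta]
    tools = meta.get("tools")
    if tools is not None and not (isinstance(tools, list) and all(isinstance(t, str) for t in tools)):
        problems.append("tools must be a list of strings")
    return problems
```

A file that fails this check can be sent back through the refinement step instead of being written out.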
