
Description
Problem
Black-box agents (e.g., GitHub Copilot, legacy Codex) cannot be optimized directly, yet users want DSPy-like tuning of their outputs (e.g., refining generated code against coding guidelines).
Proposed Solution
Introduce --mode external-wrapper in bbeval opt:
- Workflow: Mock the external agent's output → DSPy post-processor (e.g., a ChainOfThought verifier plus refiner) optimized on the testset (see the post-processor sketch after this list).
- Input: An API mock of the Copilot output (e.g., via a VS Code extension hook), plus the attached guidelines.
- Optimization: Tune the refiner prompt within a small budget (≤5 trials) for tasks like "Fix Copilot's SQL per the rules" (see the optimization sketch below).
- Output: A JSON wrapper script (e.g., a Python callable) plus the optimized refine prompt (see the artifact sketch below).
- Validation: Use code_execution to test the refined outputs (the metric in the optimization sketch below stands in for it).
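
A minimal sketch of the DSPy post-processor this mode would wrap around the agent, assuming DSPy's standard Signature/Module API; the signature fields and the `ExternalWrapper` name are illustrative, not part of bbeval:

```python
import dspy

class VerifyAgainstGuidelines(dspy.Signature):
    """Check the external agent's output against the attached guidelines."""
    guidelines = dspy.InputField(desc="guidelines to enforce")
    agent_output = dspy.InputField(desc="raw output from the black-box agent (e.g., Copilot)")
    violations = dspy.OutputField(desc="guideline violations found, or 'none'")

class RefineOutput(dspy.Signature):
    """Rewrite the agent's output so it satisfies the guidelines."""
    guidelines = dspy.InputField()
    agent_output = dspy.InputField()
    violations = dspy.InputField()
    refined_output = dspy.OutputField(desc="corrected code")

class ExternalWrapper(dspy.Module):
    """Post-processor for a black-box agent: verify, then refine."""
    def __init__(self):
        super().__init__()
        self.verify = dspy.ChainOfThought(VerifyAgainstGuidelines)
        self.refine = dspy.ChainOfThought(RefineOutput)

    def forward(self, guidelines, agent_output):
        check = self.verify(guidelines=guidelines, agent_output=agent_output)
        fix = self.refine(guidelines=guidelines, agent_output=agent_output,
                          violations=check.violations)
        return dspy.Prediction(refined_output=fix.refined_output,
                               violations=check.violations)
```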
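A sketch of the optimization step, assuming a recent DSPy release (top-level `dspy.LM` and `dspy.MIPROv2`); `run_code_execution` is a toy stand-in for bbeval's code_execution check, whose real interface isn't specified here:

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # example model choice

def run_code_execution(code: str, guidelines: str) -> bool:
    """Toy stand-in for the code_execution check: here we only assert
    the refined SQL avoids `SELECT *`."""
    return "select *" not in code.lower()

def guideline_metric(example, prediction, trace=None):
    # Score 1.0 when the refined output passes the execution/lint check.
    return float(run_code_execution(prediction.refined_output, example.guidelines))

# Mocked Copilot outputs paired with guidelines; a real testset would be larger.
trainset = [
    dspy.Example(
        guidelines="Use parameterized queries; never SELECT *.",
        agent_output="SELECT * FROM users WHERE id = %s;",
    ).with_inputs("guidelines", "agent_output"),
    dspy.Example(
        guidelines="Use parameterized queries; never SELECT *.",
        agent_output="SELECT * FROM orders WHERE status = 'open';",
    ).with_inputs("guidelines", "agent_output"),
]

# auto="light" keeps the run small; a hard <=5-trial cap could instead be set
# via the optimizer's trial budget (an assumption about MIPROv2's knobs).
optimizer = dspy.MIPROv2(metric=guideline_metric, auto="light")
tuned = optimizer.compile(ExternalWrapper(), trainset=trainset)
```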
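And one plausible shape for the emitted artifact, assuming the tuned module's state (optimized refine prompt, demos) is persisted with DSPy's save/load; the `wrap_copilot` callable is hypothetical:

```python
# Persist the tuned post-processor's state as the JSON artifact the proposal mentions.
tuned.save("copilot_wrapper.json")

def wrap_copilot(agent_output: str, guidelines: str) -> str:
    """Hypothetical Python callable bbeval could emit alongside the JSON:
    reload the tuned wrapper and refine a fresh Copilot output."""
    wrapper = ExternalWrapper()
    wrapper.load("copilot_wrapper.json")
    return wrapper(guidelines=guidelines, agent_output=agent_output).refined_output
```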