
Description
Problem: Users need to optimize system prompts for simple, direct LLM calls (e.g., via the OpenAI API) for tasks like SQL code review or schema change validation, while incorporating company-specific guidelines (e.g., attached PDF/YAML files with style rules).
Proposed Solution: Extend bbeval opt to support direct LLM optimization:
- Input: base system prompt plus an attached guidelines file (e.g., --guidelines coding_rules.pdf).
- Use: BootstrapRS (≤5 trials) to tune the prompt for accuracy on test tasks (e.g., "Review this SQL: [code]" scored against expected feedback); see the sketch after this list.
- Output: a JSON prompt with enforced rules (e.g., required placeholders like {{guidelines}}, banned phrases).
- Integration: a --mode direct-llm flag; validate SQL syntax via the code_execution tool.
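
To make the intended flow concrete, below is a minimal Python sketch of the trial loop, assuming a BootstrapRS-style random search over prompt variations with a small trial budget. The call_llm stub, score_feedback metric, and all names are hypothetical placeholders for illustration, not existing bbeval APIs.

```python
"""Minimal sketch of the proposed direct-LLM optimization loop.

All helpers here (call_llm, score_feedback, is_valid, optimize) are
hypothetical and only illustrate the intended flow of
`bbeval opt --mode direct-llm`.
"""
import json
import random

BANNED_PHRASES = ["as an AI language model"]             # rules enforced on the output prompt
REQUIRED_PLACEHOLDERS = ["{{guidelines}}", "{{task}}"]   # placeholders that must survive optimization

def call_llm(system_prompt: str, task: str) -> str:
    """Placeholder for a direct OpenAI-style chat completion call."""
    return "stub feedback"

def score_feedback(actual: str, expected: str) -> float:
    """Toy metric: token overlap with the expected review feedback."""
    expected_tokens = set(expected.lower().split())
    if not expected_tokens:
        return 0.0
    return len(expected_tokens & set(actual.lower().split())) / len(expected_tokens)

def is_valid(prompt: str) -> bool:
    """Enforce required placeholders and banned phrases on a candidate prompt."""
    return all(p in prompt for p in REQUIRED_PLACEHOLDERS) and not any(
        b.lower() in prompt.lower() for b in BANNED_PHRASES
    )

def optimize(base_prompt: str, guidelines: str, tasks: list[dict], trials: int = 5) -> dict:
    """Random search over small prompt variations, keeping the best-scoring one."""
    variations = [
        base_prompt,
        base_prompt + "\nCite the relevant guideline for every finding.",
        base_prompt + "\nRespond only with numbered review comments.",
    ]
    best = {"prompt": base_prompt, "score": -1.0}
    for _ in range(trials):  # BootstrapRS-style budget: <=5 trials
        candidate = random.choice(variations)
        if not is_valid(candidate):
            continue
        filled = candidate.replace("{{guidelines}}", guidelines)
        score = sum(
            score_feedback(call_llm(filled.replace("{{task}}", t["task"]), t["task"]), t["expected"])
            for t in tasks
        ) / len(tasks)
        if score > best["score"]:
            best = {"prompt": candidate, "score": score}
    return best

if __name__ == "__main__":
    tasks = [{"task": "Review this SQL: SELECT * FROM users;", "expected": "avoid SELECT *"}]
    result = optimize("Follow these rules:\n{{guidelines}}\n\nTask:\n{{task}}", "No SELECT *.", tasks)
    print(json.dumps(result, indent=2))  # JSON prompt artifact with enforced rules
```

The corresponding invocation would presumably look like `bbeval opt --mode direct-llm --guidelines coding_rules.pdf`, with the remaining flags (test task file, trial count) still to be defined.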
Benefits: Enables quick tuning for non-agentic reviews and reduces manual prompt engineering for guideline adherence by 80%.