Description:
The AFlowOptimizer has been integrated into our framework to enable automatic optimization of agent workflows. It currently works only with humaneval_evaluator. To extend it to all supported benchmarks, each evaluator must be made compatible with the optimizer's requirements.
Proposed Evaluators to Extend
- aime_evaluator.py
- bbh_evaluator.py
- drop_evaluator.py
- gaia_evaluator.py
- gsm8k_evaluator.py
- hotpotqa_evaluator.py
- ifeval_evaluator.py
- math_evaluator.py
- mbpp_evaluator.py
- mmlu_pro_evaluator.py
- swebench_evaluator.py
Implementation Considerations:
- Implement the async_evaluate Method
- Define and Load Datasets for Optimization and Testing
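As a rough illustration of the two considerations above, here is a minimal sketch of what an optimizer-compatible evaluator might look like. Everything in it is an assumption made for illustration: the class name `ExactMatchEvaluator`, the `Workflow` type, the `question`/`answer` field names, the `dev`/`test` split names, and the signature of `async_evaluate` are hypothetical and are not taken from the framework or from humaneval_evaluator. The real interface expected by AFlowOptimizer should be checked against the existing humaneval_evaluator implementation.

```python
import asyncio
import random
from typing import Awaitable, Callable, Dict, List, Tuple

# Hypothetical type: a workflow is any async callable that maps a question to an answer.
Workflow = Callable[[str], Awaitable[str]]
Example = Dict[str, str]


def split_examples(
    examples: List[Example], dev_size: int, seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    """Shuffle deterministically, then split into (optimization set, test set)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:dev_size], shuffled[dev_size:]


class ExactMatchEvaluator:
    """Hypothetical evaluator sketch; not the framework's actual base class."""

    def __init__(self, examples: List[Example], dev_size: int, seed: int = 0):
        # Keep the optimization ("dev") and test data disjoint so the final
        # score is reported on examples the optimizer never saw.
        self.dev_data, self.test_data = split_examples(examples, dev_size, seed)

    def score(self, prediction: str, label: str) -> float:
        """Return 1.0 on an exact match after trimming whitespace, else 0.0."""
        return 1.0 if prediction.strip() == label.strip() else 0.0

    async def async_evaluate(self, workflow: Workflow, example: Example) -> float:
        """Run the workflow on one example and score its answer."""
        prediction = await workflow(example["question"])
        return self.score(prediction, example["answer"])

    async def evaluate_split(self, workflow: Workflow, split: str = "dev") -> float:
        """Score every example in a split concurrently and return the mean."""
        data = self.dev_data if split == "dev" else self.test_data
        if not data:
            return 0.0
        scores = await asyncio.gather(
            *(self.async_evaluate(workflow, ex) for ex in data)
        )
        return sum(scores) / len(scores)
```

In this sketch the per-example method is asynchronous so that many examples can be scored concurrently with `asyncio.gather`, which matters when each workflow call waits on a model response. A benchmark-specific evaluator would replace `score` with its own metric (for example, test execution for mbpp or F1 for drop) and load its examples from the benchmark's data files rather than from an in-memory list.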