Evaluate chatEval's Performance in MasArena on Standard Multi-Agent Benchmarks #44

@RuishanFang

Description

Benchmarks to Use:

  • math
  • AIME
  • DROP
  • MMLU_pro
  • BBH
  • HumanEval
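
One way to organize the evaluation is to loop over the benchmarks above and collect one score per benchmark. The following is a minimal sketch only: `run_benchmark` is a hypothetical callable that you would supply to wrap however MasArena is actually invoked, and `evaluate_all`, `RunResult`, and `format_table` are illustrative names, not part of MasArena. The benchmark strings are copied from the list above and may not match the identifiers MasArena expects.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Benchmark names copied verbatim from the issue. The exact identifiers
# MasArena expects may differ (assumption).
BENCHMARKS: Sequence[str] = ("math", "AIME", "DROP", "MMLU_pro", "BBH", "HumanEval")


@dataclass
class RunResult:
    """Score of one agent system on one benchmark."""
    benchmark: str
    accuracy: float


def evaluate_all(
    run_benchmark: Callable[[str, str], float],
    agent_system: str = "chatEval",
    benchmarks: Sequence[str] = BENCHMARKS,
) -> List[RunResult]:
    """Call the user-supplied runner once per benchmark and collect the scores.

    `run_benchmark(agent_system, benchmark)` is hypothetical: it should run
    the evaluation however your setup does it and return an accuracy in [0, 1].
    """
    return [RunResult(b, run_benchmark(agent_system, b)) for b in benchmarks]


def format_table(results: Sequence[RunResult]) -> str:
    """Render the collected results as a plain-text table."""
    lines = [f"{'benchmark':<10} accuracy"]
    lines += [f"{r.benchmark:<10} {r.accuracy:.3f}" for r in results]
    return "\n".join(lines)
```

Passing the runner in as a callable keeps the loop independent of how MasArena is launched, so the same sketch works whether the evaluation is started from a Python entry point or a subprocess call.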
