UniBench

Real-world software engineering demands the ability to analyze large-scale GitHub code repositories, where understanding the complex interactions between variables, functions, and classes spread across multiple files is essential. With the growing reliance on software engineering (SWE) agents for automated development and maintenance, it is crucial that such models demonstrate strong capabilities in understanding and resolving issues within large and complex codebases. In this work, we benchmark existing SWE agents (Mini-SWE-Agent, CodeActAgent & Qwen3-Coder) with enterprise (GPT-5) and open-source LLMs on issue-resolution tasks from the SWEBench dataset. We stress-test the models with novel metrics such as Null Resolution, Fault-Line IOU, ETS@x & Error-Resolution@x, which process SWEBench in creative ways to quantify the versatility of SWE agents. We provide the code for our unified benchmarking harness, UniBench, here.
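To illustrate the kind of overlap measurement a metric like Fault-Line IOU (intersection over union) suggests, here is a minimal Python sketch. It assumes a "fault line" is a (file, post-image line number) pair edited by a unified-diff patch; the helper names, the diff-parsing details, and that interpretation are our own illustrative assumptions, not part of the UniBench API.

```python
import re

def edited_lines(patch: str) -> set[tuple[str, int]]:
    """Collect (file, post-image line number) pairs added or changed by a
    unified diff. Treating these pairs as a patch's "fault lines" is an
    assumption for illustration; UniBench may localize faults differently."""
    touched: set[tuple[str, int]] = set()
    current_file, lineno = None, None
    for line in patch.splitlines():
        if line.startswith("+++ "):
            # New file section: record its name, wait for a hunk header.
            current_file = line[4:].removeprefix("b/")
            lineno = None
        elif line.startswith("@@"):
            # Hunk header: "@@ -old_start,old_len +new_start,new_len @@"
            m = re.match(r"@@ -\d+(?:,\d+)? \+(\d+)", line)
            lineno = int(m.group(1)) if m else None
        elif lineno is not None:
            if line.startswith("+"):
                touched.add((current_file, lineno))
                lineno += 1
            elif not line.startswith("-"):
                lineno += 1  # context line: present in both images
    return touched

def fault_line_iou(pred_patch: str, gold_patch: str) -> float:
    """Intersection-over-union of the fault-line sets of two patches."""
    pred, gold = edited_lines(pred_patch), edited_lines(gold_patch)
    union = pred | gold
    return len(pred & gold) / len(union) if union else 1.0
```

Comparing sets of (file, line) pairs rather than raw hunks makes the score insensitive to how a diff happens to be chunked, which is one reasonable design choice for rewarding agents that edit the right locations even when their patch text differs from the gold patch.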

About

SWE Agents and Code LLM Evaluation Framework
