
📣 News: mini, the 100-line AI agent that still gets 65% on SWE-bench Verified!
📣 New benchmark: CodeClash (website, GitHub) evaluates SWE agents on goals, not tasks



Software engineering agents, benchmarks, and models.
Built and maintained by researchers from Princeton University and Stanford University.


More information about the projects

Main projects:

  • SWE-agent, a system that automatically solves GitHub issues using an LM agent.
  • mini-SWE-agent, a 100-line AI agent that still gets 65% on SWE-bench Verified!
  • SWE-bench, a benchmark for evaluating AI systems on real-world GitHub issues.
  • SWE-smith, a toolkit for generating SWE training data at scale.

Also check out the supporting infrastructure for working with SWE-* projects:

  • SWE-ReX, infrastructure supporting sandboxed code execution for AI agents.
  • sb-cli, a command-line interface for running evaluations in the cloud.