📣 News: mini, the 100-line AI agent that still gets 65% on SWE-bench Verified!
📣 New benchmark: CodeClash evaluates SWE agents on goals, not tasks.
Software engineering agents, benchmarks, and models.
Built and maintained by researchers from Princeton University and Stanford University.
More information about the projects
Main projects:
- SWE-agent, a system that automatically solves GitHub issues using an LM agent.
- mini-SWE-agent, a 100-line AI agent that still gets 65% on SWE-bench Verified!
- SWE-bench, a benchmark for evaluating AI systems on real-world GitHub issues.
- SWE-smith, a toolkit for generating SWE training data at scale.
Also check out the supporting infrastructure for working with SWE-* projects.