📣 News: mini, the 100-line AI agent that still gets 65% on SWE-bench Verified!
📣 New benchmark: CodeClash (website, GitHub) evaluates SWE agents on goals, not tasks.

Software engineering agents, benchmarks, and models.

Built and maintained by researchers from Princeton University and Stanford University.

More information about the projects

Main projects:

SWE-agent, a system that automatically solves GitHub issues using an LM agent.
mini-SWE-agent, a 100-line AI agent that still gets 65% on SWE-bench Verified!
SWE-bench, a benchmark for evaluating AI systems on real-world GitHub issues.
SWE-smith, a toolkit for generating SWE training data at scale.

Also check out the supporting infrastructure for working with SWE-* projects:

SWE-ReX, infrastructure supporting sandboxed code execution for AI agents.
sb-cli, a command-line interface for running evaluations in the cloud.

