A one-stop shop for evaluating AI agents, bundles, and recipes across the Amplifier ecosystem. Provides an evaluation mode and supporting context for running structured evaluations across a broad range of use cases.
Example Uses:
- "/evaluation I have changes to an Amplifier bundle and would like to evaluate their impact. Can you help me measure it?"
- "/evaluation I have a custom agent that does Y"
- "/evaluation I built a memory system and want to know if it improves my agent"
Evaluations of this bundle live under `.amplifier/evaluations/`. Each one is a self-contained scenario with a single-script runner. The first is `01-evaluate-amplifier-bundle`.
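To illustrate the single-script-runner pattern, here is a minimal sketch of how such a runner might aggregate task results. The names (`TaskResult`, `summarize`) are illustrative assumptions, not the actual contents of `01-evaluate-amplifier-bundle`:

```python
from dataclasses import dataclass

# Hypothetical result record for one evaluation task; the real scenario
# scripts in .amplifier/evaluations/ may use a different shape entirely.
@dataclass
class TaskResult:
    task_id: str
    passed: bool

def summarize(results: list[TaskResult]) -> dict:
    """Aggregate per-task pass/fail outcomes into a simple score."""
    passed = sum(r.passed for r in results)
    return {
        "total": len(results),
        "passed": passed,
        "pass_rate": passed / len(results) if results else 0.0,
    }
```

A self-contained runner like this keeps each scenario independently executable, which is what makes the evaluations easy to copy and adapt.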
Prerequisites:
- An existing Amplifier installation
- A bundle that provides a runtime (e.g. amplifier-foundation) composed in the same session
- amplifier-bundle-modes composed in the same session, since the evaluation mode is delivered through that capability
- The industry benchmarking capability recommends using Humanity's Last Exam for sample tasks. This dataset is gated: you must create a HuggingFace account and an access token with the permission "Read access to contents of all public gated repos you can access". Please protect the integrity of this benchmark by not publicly sharing, re-uploading, or distributing the dataset.
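Supplying the gated-dataset token is typically done through an environment variable. The sketch below assumes the HuggingFace `datasets` library (whose `load_dataset` accepts a `token` keyword in recent versions); the variable name `HF_TOKEN` and the helper name are illustrative assumptions:

```python
import os

def gated_load_kwargs(dataset_id: str) -> dict:
    """Hypothetical helper: build the keyword arguments you would pass to
    datasets.load_dataset() for a gated dataset such as Humanity's Last Exam.
    Fails loudly if no access token is configured."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "Set HF_TOKEN to a HuggingFace access token with "
            "'Read access to contents of all public gated repos you can access'"
        )
    # e.g. load_dataset(**gated_load_kwargs("<dataset-id>")) would then
    # authenticate the download against the gated repository.
    return {"path": dataset_id, "token": token}
```

Keeping the token in the environment rather than in scenario scripts avoids accidentally committing credentials alongside the evaluation code.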
To compose it onto an existing setup:
```
amplifier bundle add "git+https://github.com/microsoft/amplifier-bundle-evaluation@main#subdirectory=behaviors/evaluation.yaml" --app
```

`--app` composes the bundle onto every Amplifier session. Remove it to only register the bundle for later activation with `amplifier bundle use`.
If you also need the modes capability (required for the evaluation mode to be discoverable):
```
amplifier bundle add "git+https://github.com/microsoft/amplifier-bundle-modes@main#subdirectory=behaviors/modes.yaml" --app
```

Note:
This project is not currently accepting external contributions, but we're actively working toward opening this up. We value community input and look forward to collaborating in the future. For now, feel free to fork and experiment!
Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit Contributor License Agreements.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.