Agent Evals

This codebase evaluates the Firebase MCP server running in various coding agents.

Running Tests

Agent Evals uses Mocha to run tests, just like the Firebase CLI unit tests. The test commands automatically instrument the Firebase MCP server.

WARNING: Running evals will remove any existing Firebase MCP Servers and the Firebase Gemini CLI Extension from your user account so that they don't interfere with the test.

To run tests during development:

# From the repository root, link and build the CLI so that the `firebase`
# command is built with your changes
$ npm link
$ npm run build:watch

# In a separate terminal, run the test suite.
# test:dev skips rebuilding the Firebase CLI, because your watch command is
# already doing that for you
$ cd scripts/agent-evals
$ npm run test:dev

In CI, the eval system does a clean install of the Firebase CLI before running tests:

$ npm run test

Writing Tests

Add a new test file in src/tests:

import { AgentTestRunner, startAgentTest } from "../runner/index.js";

// Import the hooks module, which registers an afterEach block that cleans up
// the agent and the pseudo-terminal after each test.
import "../helpers/hooks.js";

describe("<prompt-or-tool-name>", function (this: Mocha.Suite) {
  // Setting retries > 0 is recommended because LLMs are nondeterministic
  this.retries(2);

  it("<use-case>", async function (this: Mocha.Context) {
    // Start the AgentTestRunner. This launches the coding agent in a
    // pseudo-terminal and waits until it has loaded the Firebase MCP server
    // and is accepting keystrokes.
    const run: AgentTestRunner = await startAgentTest(this, {
      // Name of the template to run in. You can find the list of templates in
      // src/template/index.ts (these will auto-complete)
      templateName: "next-app-hello-world",
      // List of tool mocks to apply for this test. You can find the list of
      // available mocks in src/mock/tool-mocks.ts (these will auto-complete).
      // See the instructions below on how to add your own mocks
      toolMocks: ["nextJsWithProjectMock"],
    });

    // Simulate typing in the terminal. This waits until the agent's "turn" is
    // over, so the assertions that follow apply to that turn.
    await run.type("/firebase:init");
    // Assert that the agent output "Backend Services"
    await run.expectText("Backend Services");

    await run.type("Use Firebase Project `project-id-1000`");
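    // Assert that a tool was called with the given arguments and that the call
    // succeeded. Each expected call is an object (the `name` key is assumed
    // here; check the expectation type in src/runner for the exact fields).
    await run.expectToolCalls([
      {
        name: "firebase_update_environment",
        argumentContains: "project-id-1000",
        isSuccess: true,
      },
    ]);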

    // Important: Expectations apply to the last "turn". Each time you type, it
    // creates a new turn. This ensures you are only asserting against the most
    // recent actions of the agent
    await run.type("Hello world");
    // This fails, because "Hello world" doesn't trigger a tool call.
    // (The `name` key is assumed; see the runner's expectation type.)
    await run.expectToolCalls([
      {
        name: "firebase_update_environment",
        argumentContains: "project-id-1000",
        isSuccess: true,
      },
    ]);
  });
});
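The turn semantics described in the comments above can be illustrated with a small, self-contained model. This is a hypothetical sketch, not the runner's actual implementation: `TurnLog`, `ToolCall`, and `ToolCallExpectation` are illustrative names, and the matching rules (substring search over JSON-serialized arguments, optional success check) are assumptions chosen to mirror the `argumentContains` and `isSuccess` fields used in the test.

```typescript
// Hypothetical record of a single MCP tool call observed during a turn.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
  success: boolean;
}

// Hypothetical expectation shape, reusing the field names from the example test.
interface ToolCallExpectation {
  name: string;
  argumentContains?: string;
  isSuccess?: boolean;
}

class TurnLog {
  private turns: ToolCall[][] = [];

  // Each simulated prompt (run.type in the real runner) opens a fresh turn.
  startTurn(): void {
    this.turns.push([]);
  }

  // Tool calls are always attributed to the most recent turn.
  record(call: ToolCall): void {
    const current = this.turns[this.turns.length - 1];
    if (current === undefined) {
      throw new Error("record() called before any turn started");
    }
    current.push(call);
  }

  // Only the last turn is searched, so calls made in response to earlier
  // prompts can never satisfy an expectation.
  expectToolCalls(expected: ToolCallExpectation[]): void {
    const lastTurn = this.turns[this.turns.length - 1] ?? [];
    for (const exp of expected) {
      const found = lastTurn.some(
        (call) =>
          call.name === exp.name &&
          (exp.argumentContains === undefined ||
            JSON.stringify(call.args).includes(exp.argumentContains)) &&
          (exp.isSuccess === undefined || call.success === exp.isSuccess),
      );
      if (!found) {
        throw new Error(`No matching call to ${exp.name} in the last turn`);
      }
    }
  }
}
```

In this model, starting a new turn and immediately calling `expectToolCalls()` throws even if an earlier turn contained the matching call, which is the same behavior as the failing "Hello world" assertion in the test.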

Adding Templates

Templates let you run your tests inside a folder that already contains project files. For example, you could add a template containing an iOS app.

Create a new folder for the template at scripts/agent-evals/templates/<template-name>.
In scripts/agent-evals/src/template/index.ts, add the template name to the templates constant:

export const templates = [
  {
    name: "<template-name>",
    platform: TemplatePlatform.NODE,
  },
  // ...existing templates
] as const;
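The `as const` assertion is what makes template names auto-complete in tests: it keeps each `name` as a string literal type instead of widening it to `string`. The sketch below assumes a plain-object stand-in for `TemplatePlatform` (the real one in src/template/index.ts may be an enum), and `my-new-template` and `findTemplate` are hypothetical names used only for illustration.

```typescript
// Hypothetical stand-in for TemplatePlatform; the real definition lives in
// src/template/index.ts and may be an enum.
const TemplatePlatform = { NODE: "node" } as const;

const templates = [
  { name: "next-app-hello-world", platform: TemplatePlatform.NODE },
  { name: "my-new-template", platform: TemplatePlatform.NODE },
] as const;

// `as const` keeps each name as a string literal, so this resolves to the
// union "next-app-hello-world" | "my-new-template".
type TemplateName = (typeof templates)[number]["name"];

// Hypothetical lookup helper: only accepts names from the templates constant.
function findTemplate(name: TemplateName) {
  const template = templates.find((t) => t.name === name);
  if (template === undefined) {
    throw new Error(`Unknown template: ${name}`);
  }
  return template;
}

findTemplate("my-new-template"); // type-checks
// findTemplate("my-new-templat"); // compile error: not a TemplateName
```

A typo in a `templateName` is therefore caught by the compiler rather than at test time.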

Ensure the template has a .gitignore. For example, a Node.js template should ignore node_modules.
Set the TemplatePlatform for your template. The platform determines the build command that sets up the template before each test run. If you add a new TemplatePlatform, update the buildTemplates() function to handle it. For example, Node.js templates run npm install before they are copied into the test directory.
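One way to make "update buildTemplates() when you add a platform" hard to forget is an exhaustive `switch` over the platform type. The following is a hypothetical sketch, not the real `buildTemplates()`: `buildCommandFor`, `RunFn`, and the `(templates, templatesRoot, run)` signature are assumptions. Only the `npm install` step for Node.js templates comes from the text above. The command runner is injectable so the logic can be exercised without actually running npm.

```typescript
import { execFileSync } from "node:child_process";
import * as path from "node:path";

// Hypothetical stand-in for TemplatePlatform (value and type share a name).
const TemplatePlatform = { NODE: "node" } as const;
type TemplatePlatform = (typeof TemplatePlatform)[keyof typeof TemplatePlatform];

interface Template {
  name: string;
  platform: TemplatePlatform;
}

// Returns the program and arguments that prepare a template directory.
function buildCommandFor(platform: TemplatePlatform): [string, string[]] {
  switch (platform) {
    case TemplatePlatform.NODE:
      return ["npm", ["install"]];
    default: {
      // Adding a platform without a case makes this assignment a compile error.
      const unhandled: never = platform;
      throw new Error(`No build step for platform: ${unhandled}`);
    }
  }
}

type RunFn = (cmd: string, args: string[], cwd: string) => void;

const defaultRun: RunFn = (cmd, args, cwd) => {
  execFileSync(cmd, args, { cwd, stdio: "inherit" });
};

// Hypothetical buildTemplates: runs each template's build step in its folder.
function buildTemplates(
  templates: readonly Template[],
  templatesRoot: string,
  run: RunFn = defaultRun,
): void {
  for (const t of templates) {
    const [cmd, args] = buildCommandFor(t.platform);
    run(cmd, args, path.join(templatesRoot, t.name));
  }
}
```

The `never` check moves the reminder from documentation into the type checker: a new platform that lacks a build step fails to compile.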

Adding Mocks for MCP Tools

Mocks applied to MCP tools completely replace the tool's implementation with a static output string.

Add your mocked tools to the scripts/agent-evals/src/mock/mocks folder, e.g. scripts/agent-evals/src/mock/mocks/next-js-with-project-mock.ts:

import { toMockContent } from "../tool-mock-utils.js";

export const environment_nice_day_mock = {
  firebase_get_environment: toMockContent("Tell the user to have a nice day"),
} as const;
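To make "completely replace the implementation" concrete, here is a sketch of what a helper like `toMockContent` and the mock application step could look like. This is an assumption, not the contents of tool-mock-utils.ts: it assumes the mock is wrapped in the text-only subset of an MCP tool result (`{ content: [{ type: "text", text }] }`), and `applyToolMocks` and `ToolHandler` are hypothetical names.

```typescript
// Minimal text-only subset of an MCP tool result (CallToolResult).
interface TextContent {
  type: "text";
  text: string;
}
interface ToolResult {
  content: TextContent[];
  isError?: boolean;
}

type ToolHandler = (args: unknown) => Promise<ToolResult>;

// Hypothetical equivalent of toMockContent: wrap a string in a tool result.
function toMockContent(text: string): ToolResult {
  return { content: [{ type: "text", text }] };
}

// Hypothetical: swap each mocked tool's handler for one that ignores its
// arguments and always resolves to the static result.
function applyToolMocks(
  handlers: Record<string, ToolHandler>,
  mocks: Record<string, ToolResult>,
): Record<string, ToolHandler> {
  const patched: Record<string, ToolHandler> = { ...handlers };
  for (const [toolName, result] of Object.entries(mocks)) {
    patched[toolName] = async () => result;
  }
  return patched;
}
```

Because the replacement handler never calls the original, no Firebase API is hit for a mocked tool; the agent only ever sees the static text.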

Add the new set of mocked tools to the allToolMocks map in scripts/agent-evals/src/mock/tool-mocks.ts:

import { environment_nice_day_mock } from "./mocks/next-js-with-project-mock.js";

const allToolMocks = {
  // New tool mock
  environment_nice_day_mock,
} as const;

Note: If you apply multiple mocks to the same tool, later entries in the toolMocks list take precedence.

Start using the mock in your test:

const run: AgentTestRunner = await startAgentTest(this, {
  templateName: "next-app-hello-world",
  // Add the name of your mock here
  toolMocks: ["environment_nice_day_mock"],
});
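The precedence rule for multiple mocks ("later entries win") behaves like a left-to-right object merge. The sketch below assumes the runner merges the selected mock sets that way; `resolveToolMocks` and `environment_rainy_day_mock` are hypothetical names, and the result type is the text-only subset of an MCP tool result.

```typescript
type ToolResult = { content: { type: "text"; text: string }[] };

const text = (t: string): ToolResult => ({ content: [{ type: "text", text: t }] });

// Hypothetical mock sets; both override the same tool.
const allToolMocks = {
  environment_nice_day_mock: { firebase_get_environment: text("Have a nice day") },
  environment_rainy_day_mock: { firebase_get_environment: text("Bring an umbrella") },
} as const;

type ToolMockName = keyof typeof allToolMocks;

// Merge the selected sets left to right. Object.assign copies each source's
// keys in order, so a later set overwrites any tool an earlier set mocked.
function resolveToolMocks(names: ToolMockName[]): Record<string, ToolResult> {
  return Object.assign({}, ...names.map((name) => allToolMocks[name]));
}

// Resolves firebase_get_environment to "Bring an umbrella".
resolveToolMocks(["environment_nice_day_mock", "environment_rainy_day_mock"]);
```

Reversing the list in `toolMocks` reverses which output the agent sees, so order the entries from most general to most specific.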
