From e7cb02f83796c79cb7c337252105fd5467097702 Mon Sep 17 00:00:00 2001 From: Ed Burns Date: Tue, 16 Jun 2026 18:21:52 -0400 Subject: [PATCH 01/12] Start iterating on #1682 On branch edburns/1682-java-tool-ergonomics new file: 1682-java-tool-ergonomics-prompts-remove-before-merge/20260615-prompts.md Signed-off-by: Ed Burns --- .../20260615-prompts.md | 605 ++++++++++++++++++ 1 file changed, 605 insertions(+) create mode 100644 1682-java-tool-ergonomics-prompts-remove-before-merge/20260615-prompts.md diff --git a/1682-java-tool-ergonomics-prompts-remove-before-merge/20260615-prompts.md b/1682-java-tool-ergonomics-prompts-remove-before-merge/20260615-prompts.md new file mode 100644 index 000000000..30ad01d76 --- /dev/null +++ b/1682-java-tool-ergonomics-prompts-remove-before-merge/20260615-prompts.md @@ -0,0 +1,605 @@ +### 18:28 Prompt + +Consider this C# usage of our API: + +```C# + [DisplayName("set_current_phase")] + [Description("Sets the current phase of the agent. Use this to report progress.")] + private void SetCurrentPhase(Phase phase) + { + Phase = phase; + updateUi(); + } + +... + + Session = await client.CreateSessionAsync(new() + { + OnPermissionRequest = PermissionHandler.ApproveAll, + SystemMessage = systemMessageConfig, + AvailableTools = new ToolSet().AddCustom("*").AddBuiltIn("web_fetch"), + Tools = [ + CopilotTool.DefineTool(SetCurrentPhase), + CopilotTool.DefineTool(ReportIntent, new() { OverridesBuiltInTool = true }), + CopilotTool.DefineTool(database.SearchProperties), + ], + }); +``` + +How would one do this with the Java API? + +### 18:30 Response + +Searched for regex `public static ToolDefinition create` (`**/java/src/main/java/com/github/copilot/rpc/ToolDefinition.java`), 4 results + +Searched for regex `class ToolSet|addCustom|addBuiltIn|AddCustom|AddBuiltIn` (`**/java/src/main/java/com/github/copilot/rpc/ToolSet.java`), 8 results + +Searched for regex `setAvailableTools|setTools|List` (`**/java/src/main/java/com/github/copilot/rpc/SessionConfig.java`), 5 results + +Here's the Java equivalent: + +```java +var session = client.createSession(new SessionConfig() + .setOnPermissionRequest(PermissionHandler.APPROVE_ALL) + .setSystemMessage(systemMessageConfig) + .setAvailableTools(new ToolSet().addCustom("*").addBuiltIn("web_fetch")) + .setTools(List.of( + ToolDefinition.create("set_current_phase", "Sets the current phase of the agent. Use this to report progress.", + Map.of("type", "object", + "properties", Map.of("phase", Map.of("type", "string", "enum", List.of("searching", "analyzing", "done"))), + "required", List.of("phase")), + invocation -> { + Phase phase = invocation.getArgumentsAs(PhaseArgs.class).phase(); + this.phase = phase; + updateUi(); + return CompletableFuture.completedFuture("Phase set to " + phase); + }), + ToolDefinition.createOverride("report_intent", "Reports the agent's intent", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> { /* ... */ }), + ToolDefinition.create("search_properties", "Search property listings", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> database.searchProperties(invocation.getArguments())) + ))).get(30, TimeUnit.SECONDS); +``` + +The key differences from the C# version: + +| C# | Java | +|---|---| +| `CopilotTool.DefineTool(SetCurrentPhase)` — infers name, description, and schema from the method's attributes via reflection | `ToolDefinition.create(name, description, schema, handler)` — all explicit | +| `new() { OverridesBuiltInTool = true }` (options object) | `ToolDefinition.createOverride(...)` (dedicated factory) | +| `[DisplayName]` / `[Description]` attributes on the method | No annotation-based discovery; you provide the strings directly | +| Implicit JSON Schema generation from parameter types | You supply the `Map` schema manually | + +The Java SDK does not currently have a reflection/annotation-based `DefineTool` that can inspect a method and auto-generate the name, description, and parameter schema. Everything is explicit. + +### 18:31 Prompt + +Oof. That is very ugly. So you're saying there is no `DefineTool` analog in Java? + +### 18:34 Response + +Searched for regex `defineTool|DefineTool|@Tool|@CopilotTool|annotation.*tool|reflect.*tool|schema.*generat` (`**/java/src/main/java/**`), no results + +Correct — there is no `DefineTool` analog in the Java SDK today. No reflection-based discovery, no annotations, no automatic JSON Schema generation from parameter types. The only API is `ToolDefinition.create(name, description, schema, handler)` where you supply everything manually. + +The C# version gets its ergonomics from `CopilotTool.DefineTool(MethodGroup)` which uses .NET reflection to pull `[DisplayName]`, `[Description]`, and parameter types to auto-generate the schema. The Go SDK has `DefineTool[T, U](name, desc, handler)` which uses generics to derive the schema from the input type `T`. Python has `@define_tool` which introspects the decorated function's signature and docstring. + +Java *could* support something similar — e.g. a `DefineTool.from(instance, "methodName")` that reflects on the method's parameters and annotations, or a generic approach using a record type for args (similar to Go). But that doesn't exist yet. + +### 18:34 Prompt + +Given that `ToolDefinition` is already a record, run with that fact and sketch out what a `DefineTool` analog might look like. + +### 18:38 Response + +Here's a sketch leveraging records as the args type (similar to Go's `DefineTool[T, U]`): + +```java +// ─── User defines args as a record ─────────────────────────────────────────── + +record WeatherArgs( + @Param("City name") String location, + @Param(value = "Temperature unit", required = false) String unit +) {} + +// ─── Usage: one-liner tool definition ──────────────────────────────────────── + +var tool = ToolDefinition.define("get_weather", "Get current weather for a location", + WeatherArgs.class, + (args, invocation) -> { + String weather = fetchWeather(args.location(), args.unit()); + return CompletableFuture.completedFuture(weather); + }); +``` + +The handler receives the **deserialized record** directly — no more `invocation.getArgumentsAs(...)` boilerplate. + +--- + +### Implementation sketch + +```java +// ─── New annotation ────────────────────────────────────────────────────────── + +package com.github.copilot.rpc; + +import java.lang.annotation.*; + +@Target(ElementType.RECORD_COMPONENT) +@Retention(RetentionPolicy.RUNTIME) +public @interface Param { + /** Description of this parameter. */ + String value(); + /** Whether this parameter is required. Default true. */ + boolean required() default true; +} + +// ─── New typed handler functional interface ────────────────────────────────── + +@FunctionalInterface +public interface TypedToolHandler { + CompletableFuture handle(T args, ToolInvocation invocation); +} + +// ─── New factory method on ToolDefinition ──────────────────────────────────── + +public record ToolDefinition(...) { + + // Existing factories... + + /** + * Defines a tool with automatic JSON Schema generation from a record type. + * + *

Record components become the schema's {@code properties}. Use + * {@link Param} to annotate descriptions and optional flags. + * + * @param the args record type + * @param name tool name + * @param description tool description + * @param argsType the record class (e.g. {@code WeatherArgs.class}) + * @param handler receives the deserialized record + invocation context + * @return a fully constructed ToolDefinition with auto-generated schema + */ + public static ToolDefinition define( + String name, + String description, + Class argsType, + TypedToolHandler handler) { + + Map schema = SchemaGenerator.fromRecord(argsType); + + ToolHandler rawHandler = invocation -> { + T args = invocation.getArgumentsAs(argsType); + return handler.handle(args, invocation); + }; + + return new ToolDefinition(name, description, schema, rawHandler, null, null, null); + } +} +``` + +--- + +### Schema generation from record components + +```java +package com.github.copilot.rpc; + +import java.lang.reflect.RecordComponent; +import java.util.*; + +final class SchemaGenerator { + + static Map fromRecord(Class type) { + RecordComponent[] components = type.getRecordComponents(); + Map properties = new LinkedHashMap<>(); + List required = new ArrayList<>(); + + for (RecordComponent comp : components) { + Map prop = new LinkedHashMap<>(); + prop.put("type", jsonType(comp.getType())); + + Param param = comp.getAnnotation(Param.class); + if (param != null) { + prop.put("description", param.value()); + if (param.required()) { + required.add(comp.getName()); + } + } else { + // No annotation → required by default + required.add(comp.getName()); + } + + properties.put(comp.getName(), prop); + } + + Map schema = new LinkedHashMap<>(); + schema.put("type", "object"); + schema.put("properties", properties); + if (!required.isEmpty()) { + schema.put("required", required); + } + return schema; + } + + private static String jsonType(Class type) { + if (type == String.class) return "string"; + if (type == int.class || type == Integer.class + || type == long.class || type == Long.class) return "integer"; + if (type == double.class || type == Double.class + || type == float.class || type == Float.class) return "number"; + if (type == boolean.class || type == Boolean.class) return "boolean"; + if (type.isArray() || List.class.isAssignableFrom(type)) return "array"; + return "object"; + } +} +``` + +--- + +### The BRK206 example rewritten + +```java +record PhaseArgs(@Param("The phase to transition to") Phase phase) {} + +record SearchArgs(@Param("Search query") String query, + @Param(value = "Max results", required = false) Integer limit) {} + +var session = client.createSession(new SessionConfig() + .setOnPermissionRequest(PermissionHandler.APPROVE_ALL) + .setSystemMessage(systemMessageConfig) + .setAvailableTools(new ToolSet().addCustom("*").addBuiltIn("web_fetch")) + .setTools(List.of( + ToolDefinition.define("set_current_phase", + "Sets the current phase of the agent. Use this to report progress.", + PhaseArgs.class, + (args, inv) -> { + this.phase = args.phase(); + updateUi(); + return CompletableFuture.completedFuture("Phase set to " + args.phase()); + }), + ToolDefinition.define("report_intent", + "Reports the agent's intent", + ReportIntentArgs.class, + (args, inv) -> { /* ... */ }) + .withOverridesBuiltInTool(true), // would need a wither + ToolDefinition.define("search_properties", + "Search property listings", + SearchArgs.class, + (args, inv) -> database.searchProperties(args.query(), args.limit())) + ))).get(30, TimeUnit.SECONDS); +``` + +--- + +### Trade-offs + +| Pro | Con | +|-----|-----| +| Schema auto-generated from record components | Requires runtime reflection (one-time, at definition) | +| Type-safe handler — no manual `getArgumentsAs()` | Nested/complex schemas (arrays of objects) need more work | +| `@Param` keeps description co-located with the field | New annotation + new interface = new public API surface | +| Records are already idiomatic for value objects in Java 17+ | Enums in records (like `Phase`) need custom mapping logic | +| Very close to Go's `DefineTool[T, U]` ergonomics | Can't infer name/description from the method itself (unlike C#) | + +The main limitation vs C# is that Java doesn't have method-group-to-delegate conversion, so you can't just pass `this::setCurrentPhase` and have it figure out the name and schema. You still provide the name and description strings — but the schema becomes free. + +### 18:41 Prompt + +I have added the folder `langchain4j-00-main` to this VS Code workspace. + +Look at the design of `./langchain4j-core/src/main/java/dev/langchain4j/agent/tool/Tool.java`. Explain how that design differs from the record based one you proposed. + +### 18:46 Response + +Searched for files matching `**/langchain4j-core/src/main/java/dev/langchain4j/agent/tool/Tool.java`, 1 match + +Read [](file:///home/edburns/workareas/langchain4j-00-main/langchain4j-core/src/main/java/dev/langchain4j/agent/tool/Tool.java#1-1), lines 1 to 50 + +Searched for text `@interface P` (`**/langchain4j-core/src/main/java/dev/langchain4j/agent/tool/P.java`), 1 result + +Searched for regex `class ToolSpecification|record ToolSpecification|String name|String description|JsonObjectSchema` (`**/langchain4j-core/src/main/java/dev/langchain4j/agent/tool/ToolSpecification.java`), 14 results + +Read [](file:///home/edburns/workareas/langchain4j-00-main/langchain4j-core/src/main/java/dev/langchain4j/agent/tool/Tool.java#50-50), lines 50 to 60 + +Searched for regex `toolSpecificationFrom|toolSpecificationsFrom|class ToolSpecifications` (`**/langchain4j-core/src/main/java/dev/langchain4j/agent/tool/ToolSpecifications.java`), 6 results + +Read [](file:///home/edburns/workareas/langchain4j-00-main/langchain4j-core/src/main/java/dev/langchain4j/agent/tool/P.java#1-1), lines 1 to 60 + +Read [](file:///home/edburns/workareas/langchain4j-00-main/langchain4j-core/src/main/java/dev/langchain4j/agent/tool/ToolSpecifications.java#100-100), lines 100 to 160 + +Here's how the two designs differ: + +--- + +### LangChain4j: **Annotation-on-method** approach + +```java +@Tool("Get current weather for a location") +String getWeather(@P("City name") String location, + @P(value = "Unit", required = false) String unit) { + return fetchWeather(location, unit); +} +``` + +The framework discovers tools by scanning an **object's methods** for `@Tool`. It then: +1. Reflects on the `Method` to get the name (from `@Tool(name=...)` or the method name itself) +2. Reflects on each `Parameter` to build the JSON Schema (using `@P` for description/required) +3. Generates a `ToolSpecification` (name + description + schema) automatically +4. At invocation time, dispatches to the annotated method with deserialized args passed as method parameters + +**The user never writes a schema or a handler lambda** — the method IS the handler. + +--- + +### My record-based proposal: **Record-as-schema + explicit lambda** approach + +```java +record WeatherArgs(@Param("City name") String location, + @Param(value = "Unit", required = false) String unit) {} + +var tool = ToolDefinition.define("get_weather", "Get current weather", + WeatherArgs.class, + (args, invocation) -> CompletableFuture.completedFuture(fetchWeather(args.location(), args.unit()))); +``` + +The record defines the schema shape; the lambda is the handler. + +--- + +### Key differences + +| Aspect | LangChain4j (`@Tool` on method) | Record-based proposal | +|--------|--------------------------------|----------------------| +| **Discovery** | Automatic — scan object for `@Tool` methods | Manual — you explicitly call `define(...)` | +| **Tool name** | Derived from method name or `@Tool(name=...)` | Provided as a string argument | +| **Description** | `@Tool("description")` on the method | Provided as a string argument | +| **Parameter schema** | Generated from method parameter types + `@P` annotations | Generated from record components + `@Param` annotations | +| **Handler** | The annotated method itself — no lambda | Explicit `TypedToolHandler` lambda | +| **Arg delivery** | Each parameter is a separate method argument | Single deserialized record object | +| **Return type** | Any (String sent as-is, others serialized to JSON) | Always `CompletableFuture` | +| **Registration** | Implicit (pass object to framework, it finds all `@Tool` methods) | Explicit (build a `List`) | +| **Async** | Synchronous method (framework handles threading) | Explicitly async via `CompletableFuture` | + +--- + +### The fundamental philosophical difference + +LangChain4j treats **the method as the tool**. You annotate existing methods and the framework does everything else. This is closest to C#'s `CopilotTool.DefineTool(SetCurrentPhase)`. + +My record-based proposal treats **the record as the schema** but still requires explicit wiring. It's closer to Go's `DefineTool[T, U]` — you get schema generation for free but still wire things manually. + +If you wanted the LangChain4j-style ergonomics in the Copilot Java SDK, it would look something like: + +```java +class MyTools { + @CopilotTool("Sets the current phase of the agent") + CompletableFuture setCurrentPhase(@Param("The phase") Phase phase) { + this.phase = phase; + updateUi(); + return CompletableFuture.completedFuture("Done"); + } +} + +// Registration: scan an object for @CopilotTool methods +var tools = ToolDefinition.fromObject(myToolsInstance); +``` + +That's the most ergonomic option but requires the most framework machinery (method scanning, reflective invocation, argument marshaling per-parameter rather than per-record). + +### 18:46 Prompt + +Yes, I like the langchain4j approach much better. Let's start by filing an issue in `copilot-sdk`. + +Title + +Java: Ergonomics: Defining tools + +Description + +The current way of defining tools in Java is quite awful. + + +```java +var session = client.createSession(new SessionConfig() + .setOnPermissionRequest(PermissionHandler.APPROVE_ALL) + .setSystemMessage(systemMessageConfig) + .setAvailableTools(new ToolSet().addCustom("*").addBuiltIn("web_fetch")) + .setTools(List.of( + ToolDefinition.create("set_current_phase", "Sets the current phase of the agent. Use this to report progress.", + Map.of("type", "object", + "properties", Map.of("phase", Map.of("type", "string", "enum", List.of("searching", "analyzing", "done"))), + "required", List.of("phase")), + invocation -> { + Phase phase = invocation.getArgumentsAs(PhaseArgs.class).phase(); + this.phase = phase; + updateUi(); + return CompletableFuture.completedFuture("Phase set to " + phase); + }), + ToolDefinition.createOverride("report_intent", "Reports the agent's intent", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> { /* ... */ }), + ToolDefinition.create("search_properties", "Search property listings", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> database.searchProperties(invocation.getArguments())) + ))).get(30, TimeUnit.SECONDS); +``` + +Compare that with what you get in langchain4j: + +```java +@Tool("Get current weather for a location") +String getWeather(@P("City name") String location, + @P(value = "Unit", required = false) String unit) { + return fetchWeather(location, unit); +} +``` + +We simply must do better. + +### 18:51 Response + +Ran terminal command: cd /home/edburns/workareas/copilot-sdk && gh issue create \ + --repo github/copilot-sdk \ + --title "Java: Ergonomics: Defining tools" \ + --body 'The current way of defining tools in Java is quite awful. + +```java +var session = client.createSession(new SessionConfig() + .setOnPermissionRequest(PermissionHandler.APPROVE_ALL) + .setSystemMessage(systemMessageConfig) + .setAvailableTools(new ToolSet().addCustom("*").addBuiltIn("web_fetch")) + .setTools(List.of( + ToolDefinition.create("set_current_phase", "Sets the current phase of the agent. Use this to report progress.", + Map.of("type", "object", + "properties", Map.of("phase", Map.of("type", "string", "enum", List.of("searching", "analyzing", "done"))), + "required", List.of("phase")), + invocation -> { + Phase phase = invocation.getArgumentsAs(PhaseArgs.class).phase(); + this.phase = phase; + updateUi(); + return CompletableFuture.completedFuture("Phase set to " + phase); + }), + ToolDefinition.createOverride("report_intent", "Reports the agent'''s intent", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> { /* ... */ }), + ToolDefinition.create("search_properties", "Search property listings", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> database.searchProperties(invocation.getArguments())) + ))).get(30, TimeUnit.SECONDS); +``` + +Compare that with what you get in [langchain4j](https://github.com/langchain4j/langchain4j): + +```java +@Tool("Get current weather for a location") +String getWeather(@P("City name") String location, + @P(value = "Unit", required = false) String unit) { + return fetchWeather(location, unit); +} +``` + +We simply must do better.' + +Filed: [#1682](https://github.com/github/copilot-sdk/issues/1682) — Java: Ergonomics: Defining tools. + +### 18:52 Prompt + +Define a new ADR in `java/docs/adr/adr-005-tool-definition.md`. Look at the existing ADRs in there, and follow ADR best practice. Create the new ADR describing the approaches we evaluated and how we ended up choosing the langchain4j approach as shown in https://github.com/langchain4j/langchain4j/blob/main/docs/docs/tutorials/tools.md#high-level-tool-api . + +Be explicit about how we looked at the record approach, the current approach, and chose the langchain4j approach for its Java idiomatic qualities. + +### 19:02 Response + +Searched for files matching `**/java/docs/adr/*.md`, 4 matches + +Read [](file:///home/edburns/workareas/copilot-sdk/java/docs/adr/adr-004-copilotexperimental.md#1-1), lines 1 to 80 + +Now let me fetch the langchain4j tools tutorial for reference: + + + +Fetched https://github.com/langchain4j/langchain4j/blob/main/docs/docs/tutorials/tools.md + +Now I have all the context. Let me create the ADR: + +Created [](file:///home/edburns/workareas/copilot-sdk/java/docs/adr/adr-005-tool-definition.md) + +Created adr-005-tool-definition.md. It documents all three options evaluated (status quo, record-as-schema, annotation-on-method), the rationale for choosing the langchain4j-style approach, and links to the issue and reference implementations. + +Made changes. + + +### 19:02 Prompt + +Is there a way around this? + +----- +- Build requirement: recommend `-parameters` javac flag in documentation. Without it, developers must use `@Param(name="...")` explicitly (same trade-off as langchain4j, Spring, etc.). +----- + +We did an automatic annotation processor for `@CopilotExperimental`. Can't we do something similar here? + +### 19:06 Response + +Yes — this is completely solvable with an annotation processor. + +The key insight: **`javax.lang.model.element.VariableElement.getSimpleName()` always returns the real parameter name at compile time**, regardless of whether `-parameters` is passed to `javac`. The `-parameters` flag only controls whether those names survive into the `.class` bytecode for runtime reflection. An annotation processor sees the source-level names unconditionally. + +### How it would work + +The processor runs at compile time, finds all `@CopilotTool`-annotated methods, and generates a metadata class (or resource) that bakes in the parameter names, types, and descriptions: + +```java +// GENERATED — do not edit +final class MyTools$$CopilotToolMeta { + static List definitions(MyTools instance) { + return List.of( + new ToolDefinition("set_current_phase", + "Sets the current phase of the agent.", + Map.of("type", "object", + "properties", Map.of("phase", Map.of("type", "string", "description", "The phase to transition to")), + "required", List.of("phase")), + invocation -> { + Phase phase = invocation.getArgumentsAs(Phase.class); + return CompletableFuture.completedFuture( + instance.setCurrentPhase(phase)); + }, null, null, null) + ); + } +} +``` + +At runtime, `ToolDefinition.fromObject(myTools)` just loads the generated `$$CopilotToolMeta` class — zero reflection, zero dependency on `-parameters`. + +### Precedent in the ecosystem + +| Framework | Approach | +|-----------|----------| +| **Micronaut** | Annotation processor generates all DI metadata at compile time — no runtime reflection, no `-parameters` needed | +| **Dagger 2** | Processor generates `_Factory` / `_MembersInjector` classes | +| **MapStruct** | Processor generates mapper implementations from interface method signatures | +| **Our own `@CopilotExperimental`** | Processor walks declared elements via JSR 269 | + +### Advantages over the `-parameters` fallback + +1. **No build configuration requirement** — works out of the box with any `javac` invocation. +2. **No runtime reflection** — the generated code calls the method directly. Faster, GraalVM-native-friendly, no `setAccessible` issues. +3. **Compile-time validation** — the processor can emit errors if `@Param` is missing on a parameter, if a type isn't serializable, if two tools have the same name, etc. (same as we did for `@CopilotExperimental`). +4. **Schema correctness guaranteed** — the schema is computed once from the source AST, not guessed at runtime from erased generics. + +### Trade-off vs. pure runtime reflection (langchain4j default) + +| | Annotation processor | Runtime reflection | +|---|---|---| +| Requires `-parameters`? | No | Yes (or `@P(name=...)`) | +| GraalVM native-image friendly? | Yes | Needs reflection config | +| Compile-time error checking? | Yes | Fails at runtime | +| Extra generated source files? | Yes | None | +| Works without running the processor? | No — but fails loudly at compile time | Yes (degraded) | + +So yes: we can follow the same JSR 269 pattern as `@CopilotExperimental` to eliminate the `-parameters` requirement entirely. The processor sees the real names from source, bakes them into generated code, and the runtime never needs to call `Parameter.getName()`. + +Want me to update the ADR to reflect this? + +### 19:07 Prompt + +Yes please. + +### 19:14 Response + +Read [](file:///home/edburns/workareas/copilot-sdk/java/docs/adr/adr-005-tool-definition.md#195-195), lines 195 to 220 + +Read [](file:///home/edburns/workareas/copilot-sdk/java/docs/adr/adr-005-tool-definition.md#218-218), lines 218 to 240 + +Replacing 15 lines with 10 lines in [](file:///home/edburns/workareas/copilot-sdk/java/docs/adr/adr-005-tool-definition.md) + + +``` + +``` +Done. Added the `@CopilotExperimental` decision as a consequence bullet — the new annotations and utility methods will be gated behind experimental opt-in until the API stabilizes. + +Made changes. From 4b8d2a709e20c24a9b366251588fdd49bd0b8970 Mon Sep 17 00:00:00 2001 From: Ed Burns Date: Tue, 16 Jun 2026 19:48:07 -0400 Subject: [PATCH 02/12] Add new-java-e2e-test-yaml-and-test skill Packages the knowledge of creating net-new Java E2E integration tests with handcrafted replay proxy YAML snapshots into a reusable Copilot skill. New files: - .github/skills/new-java-e2e-test-yaml-and-test/SKILL.md Main skill instructions covering: YAML snapshot format, proxy matching logic, Java IT test template, verification commands, key classes/files reference table, and common pitfalls. Includes the critical constraint that Java's CapiProxy always sets GITHUB_ACTIONS=true so snapshots must be handcrafted rather than recorded. - .github/skills/new-java-e2e-test-yaml-and-test/examples.md Two working examples from the codebase: a simple single-turn test (Botanica identity REPLACE) and a multi-turn test with tool calls (system message transform with view tool). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../new-java-e2e-test-yaml-and-test/SKILL.md | 221 ++++++++++++++++++ .../examples.md | 177 ++++++++++++++ .../#20260616-prompts.md# | 93 ++++++++ .../.#20260616-prompts.md | 1 + .../20260616-prompts.md | 93 ++++++++ 5 files changed, 585 insertions(+) create mode 100644 .github/skills/new-java-e2e-test-yaml-and-test/SKILL.md create mode 100644 .github/skills/new-java-e2e-test-yaml-and-test/examples.md create mode 100644 1682-java-tool-ergonomics-prompts-remove-before-merge/#20260616-prompts.md# create mode 120000 1682-java-tool-ergonomics-prompts-remove-before-merge/.#20260616-prompts.md create mode 100644 1682-java-tool-ergonomics-prompts-remove-before-merge/20260616-prompts.md diff --git a/.github/skills/new-java-e2e-test-yaml-and-test/SKILL.md b/.github/skills/new-java-e2e-test-yaml-and-test/SKILL.md new file mode 100644 index 000000000..a1044762f --- /dev/null +++ b/.github/skills/new-java-e2e-test-yaml-and-test/SKILL.md @@ -0,0 +1,221 @@ +--- +name: new-java-e2e-test-yaml-and-test +description: "Use this skill when creating a new Java E2E integration test (failsafe IT) that requires a new replay proxy YAML snapshot file in test/snapshots/" +--- + +# Creating a New Java E2E Test with a Replay Proxy YAML Snapshot + +This skill covers the complete workflow for adding a new Java failsafe +integration test backed by a handcrafted YAML snapshot for the replay proxy. + +## Overview + +The Java E2E tests use a **replay proxy** (`test/harness/replayingCapiProxy.ts`) +that intercepts HTTP calls to the Copilot API and returns pre-recorded responses +from YAML snapshot files. This avoids needing real authentication in CI. + +**Key constraint:** Java's `CapiProxy.java` always sets `GITHUB_ACTIONS=true` +(line 104), which forces the replay proxy into read-only mode. You **cannot** +record snapshots by running Java tests — you must handcraft the YAML. + +## Step-by-Step Workflow + +### Step 1: Choose a snapshot category and test name + +- Category = a directory under `test/snapshots/` (e.g., `system_message_sections`) +- Test name = the Java method name, converted to snake_case (e.g., + `shouldUseReplacedIdentitySectionInResponse` → `should_use_replaced_identity_section_in_response`) +- Resulting file: `test/snapshots//.yaml` + +### Step 2: Create the YAML snapshot file + +The format is: + +```yaml +models: + - claude-sonnet-4.5 +conversations: + - messages: + - role: system + content: ${system} + - role: user + content: + - role: assistant + content: +``` + +**Rules:** +- `${system}` is a placeholder that matches ANY system message content +- `${workdir}` in tool arguments is substituted with the actual temp workDir +- Each conversation entry represents one request-response exchange +- For multi-turn, add multiple conversation entries +- For tool calls, include `tool_calls` on assistant messages and `role: tool` for results +- The user content must **exactly match** what your test sends (after normalization) + +### Step 3: Create the Java IT test class + +Place it in `java/src/test/java/com/github/copilot/` with an `IT` suffix +(e.g., `MyFeatureIT.java`). The failsafe plugin picks up `*IT.java` files. + +**Template:** + +```java +package com.github.copilot; + +import static org.junit.jupiter.api.Assertions.*; + +import java.util.concurrent.TimeUnit; + +import org.junit.jupiter.api.AfterAll; +import org.junit.jupiter.api.BeforeAll; +import org.junit.jupiter.api.Test; + +import com.github.copilot.generated.AssistantMessageEvent; +import com.github.copilot.rpc.MessageOptions; +import com.github.copilot.rpc.PermissionHandler; +import com.github.copilot.rpc.SessionConfig; +// ... other imports as needed + +class MyFeatureIT { + + private static E2ETestContext ctx; + + @BeforeAll + static void setUp() throws Exception { + ctx = E2ETestContext.create(); + } + + @AfterAll + static void tearDown() throws Exception { + if (ctx != null) { + ctx.close(); + } + } + + @Test + void myTestMethod() throws Exception { + // 1. Configure the proxy to use your snapshot + ctx.configureForTest("my_category", "my_test_method"); + + // 2. Create a client (uses fake token + proxy automatically) + try (CopilotClient client = ctx.createClient()) { + + // 3. Create a session with desired config + CopilotSession session = client.createSession(new SessionConfig() + .setOnPermissionRequest(PermissionHandler.APPROVE_ALL)) + .get(30, TimeUnit.SECONDS); + + try { + // 4. Send the prompt (must match YAML exactly) + AssistantMessageEvent response = session + .sendAndWait(new MessageOptions().setPrompt("Your prompt here"), 60_000) + .get(90, TimeUnit.SECONDS); + + // 5. Assert on the response + assertNotNull(response); + String content = response.getData().content(); + assertTrue(content.contains("expected text")); + } finally { + session.close(); + } + } + } +} +``` + +### Step 4: Verify + +```sh +cd java +mvn spotless:apply +mvn failsafe:integration-test -Dit.test="MyFeatureIT#myTestMethod" -Denforcer.skip=true +``` + +Then run the full build to confirm no regressions: + +```sh +mvn clean verify +``` + +## Key Classes and Files + +| What | Where | +|------|-------| +| Test context (manages proxy, workDir, CLI) | `java/src/test/java/com/github/copilot/E2ETestContext.java` | +| Java proxy wrapper | `java/src/test/java/com/github/copilot/CapiProxy.java` | +| Replay proxy (TypeScript) | `test/harness/replayingCapiProxy.ts` | +| Proxy server entry point | `test/harness/server.ts` | +| Snapshot files | `test/snapshots//.yaml` | +| Existing IT tests for reference | `java/src/test/java/com/github/copilot/*IT.java` | + +## How the Proxy Matches Requests + +1. The proxy normalizes the incoming request's messages +2. It compares against each conversation in the YAML: + - System message matches if YAML has `${system}` (wildcard) + - User messages are compared by content (exact text match) + - Tool results are compared after normalizing `${workdir}` paths +3. If a match is found, the **last assistant message** in that conversation is returned +4. If no match, in CI mode (`GITHUB_ACTIONS=true`) it errors with "No cached response found" + +## YAML Format for Tool Calls + +If your test involves tool use: + +```yaml +conversations: + # First exchange: model wants to call a tool + - messages: + - role: system + content: ${system} + - role: user + content: Read the file test.txt + - role: assistant + content: I'll read that file. + tool_calls: + - id: toolcall_0 + type: function + function: + name: view + arguments: '{"path":"${workdir}/test.txt"}' + # Second exchange: after tool result is provided, model gives final answer + - messages: + - role: system + content: ${system} + - role: user + content: Read the file test.txt + - role: assistant + content: I'll read that file. + tool_calls: + - id: toolcall_0 + type: function + function: + name: view + arguments: '{"path":"${workdir}/test.txt"}' + - role: tool + tool_call_id: toolcall_0 + content: "1. Hello world!" + - role: assistant + content: The file test.txt contains "Hello world!" +``` + +**Important:** When the model calls tools like `view`, the CLI actually executes +them locally. The file must exist in the test's workDir. Create it in your test +before sending the prompt: + +```java +Files.writeString(ctx.getWorkDir().resolve("test.txt"), "Hello world!\n"); +``` + +## Common Pitfalls + +1. **Prompt mismatch** — The user content in YAML must exactly match what + `session.sendAndWait(new MessageOptions().setPrompt("..."))` sends. +2. **Forgetting `${system}`** — Always use `${system}` for the system role content + unless testing a specific system message matching scenario. +3. **Tool execution** — If the snapshot has the model calling `view` or other + built-in tools, the CLI will actually execute those tools. Files must exist. +4. **Snake_case conversion** — `configureForTest("category", "myMethodName")` + converts `myMethodName` to `my_method_name` for the filename lookup. +5. **Cannot record via Java** — `CapiProxy.java` forces `GITHUB_ACTIONS=true`. + Always handcraft snapshots or use the Node.js proxy directly for recording. diff --git a/.github/skills/new-java-e2e-test-yaml-and-test/examples.md b/.github/skills/new-java-e2e-test-yaml-and-test/examples.md new file mode 100644 index 000000000..af82ef4db --- /dev/null +++ b/.github/skills/new-java-e2e-test-yaml-and-test/examples.md @@ -0,0 +1,177 @@ +# Examples: New Java E2E Test with YAML Snapshot + +## Example 1: Simple single-turn conversation (no tool calls) + +### Snapshot YAML + +File: `test/snapshots/system_message_sections/should_use_replaced_identity_section_in_response.yaml` + +```yaml +models: + - claude-sonnet-4.5 +conversations: + - messages: + - role: system + content: ${system} + - role: user + content: Who are you? + - role: assistant + content: >- + I'm Botanica, your helpful gardening assistant! I'm here to help you + with all things related to plants and gardening. Whether you have + questions about plant care, garden design, soil preparation, pest + management, or anything else in the world of gardening, I'm happy to + help. What would you like to know about plants or gardening today? +``` + +### Corresponding Java test method + +```java +@Test +void shouldUseReplacedIdentitySectionInResponse() throws Exception { + ctx.configureForTest("system_message_sections", "should_use_replaced_identity_section_in_response"); + + var systemMessage = new SystemMessageConfig().setMode(SystemMessageMode.CUSTOMIZE) + .setSections(Map.of(SystemMessageSections.IDENTITY, + new SectionOverride().setAction(SectionOverrideAction.REPLACE) + .setContent("You are a helpful gardening assistant called Botanica. " + + "You only answer questions about plants and gardening."))); + + try (CopilotClient client = ctx.createClient()) { + CopilotSession session = client.createSession(new SessionConfig().setSystemMessage(systemMessage) + .setOnPermissionRequest(PermissionHandler.APPROVE_ALL)).get(30, TimeUnit.SECONDS); + + try { + AssistantMessageEvent response = session + .sendAndWait(new MessageOptions().setPrompt("Who are you?"), 60_000).get(90, TimeUnit.SECONDS); + + assertNotNull(response, "Expected a response from the assistant"); + String content = response.getData().content().toLowerCase(); + assertTrue(content.contains("botanica") || content.contains("garden") || content.contains("plant"), + "Expected response to reflect the replaced identity section, but got: " + + response.getData().content()); + } finally { + session.close(); + } + } +} +``` + +**Key points:** +- `configureForTest("system_message_sections", "should_use_replaced_identity_section_in_response")` + maps to `test/snapshots/system_message_sections/should_use_replaced_identity_section_in_response.yaml` +- The prompt `"Who are you?"` exactly matches the YAML's user content +- `ctx.createClient()` uses `fake-token-for-e2e-tests` — works in CI + +--- + +## Example 2: Multi-turn with tool calls (from existing tests) + +### Snapshot YAML + +File: `test/snapshots/system_message_transform/should_invoke_transform_callbacks_with_section_content.yaml` + +```yaml +models: + - claude-sonnet-4.5 +conversations: + # First exchange: model decides to call tools + - messages: + - role: system + content: ${system} + - role: user + content: Read the contents of test.txt and tell me what it says + - role: assistant + content: I'll read the test.txt file for you. + tool_calls: + - id: toolcall_0 + type: function + function: + name: report_intent + arguments: '{"intent":"Reading test.txt file"}' + - id: toolcall_1 + type: function + function: + name: view + arguments: '{"path":"${workdir}/test.txt"}' + # Second exchange: after tool results come back, model gives final answer + - messages: + - role: system + content: ${system} + - role: user + content: Read the contents of test.txt and tell me what it says + - role: assistant + content: I'll read the test.txt file for you. + tool_calls: + - id: toolcall_0 + type: function + function: + name: report_intent + arguments: '{"intent":"Reading test.txt file"}' + - id: toolcall_1 + type: function + function: + name: view + arguments: '{"path":"${workdir}/test.txt"}' + - role: tool + tool_call_id: toolcall_0 + content: Intent logged + - role: tool + tool_call_id: toolcall_1 + content: 1. Hello transform! + - role: assistant + content: |- + The file test.txt contains: + ``` + Hello transform! + ``` +``` + +### Corresponding Java test method + +```java +@Test +void transformOnIdentitySectionReceivesNonEmptyContent() throws Exception { + ctx.configureForTest("system_message_transform", "should_invoke_transform_callbacks_with_section_content"); + + ConcurrentHashMap capturedContent = new ConcurrentHashMap<>(); + + var systemMessage = new SystemMessageConfig().setMode(SystemMessageMode.CUSTOMIZE) + .setSections(Map.of(SystemMessageSections.IDENTITY, new SectionOverride().setTransform(content -> { + capturedContent.put("identity", content); + return CompletableFuture.completedFuture(content); + }), SystemMessageSections.TONE, new SectionOverride().setTransform(content -> { + capturedContent.put("tone", content); + return CompletableFuture.completedFuture(content); + }))); + + try (CopilotClient client = ctx.createClient()) { + // Create the file the snapshot expects the CLI view tool to read + Path testFile = ctx.getWorkDir().resolve("test.txt"); + Files.writeString(testFile, "Hello transform!"); + + CopilotSession session = client.createSession(new SessionConfig().setSystemMessage(systemMessage) + .setOnPermissionRequest(PermissionHandler.APPROVE_ALL)).get(30, TimeUnit.SECONDS); + + try { + AssistantMessageEvent response = session + .sendAndWait(new MessageOptions() + .setPrompt("Read the contents of test.txt and tell me what it says"), 60_000) + .get(90, TimeUnit.SECONDS); + + assertNotNull(response, "Expected a response from the assistant"); + + String identityContent = capturedContent.get("identity"); + assertNotNull(identityContent, "Expected identity transform callback to be invoked"); + assertTrue(!identityContent.isBlank(), "Expected identity section content to be non-empty"); + } finally { + session.close(); + } + } +} +``` + +**Key points:** +- The file `test.txt` must be created in `ctx.getWorkDir()` **before** sending the prompt +- The CLI's `view` tool will actually read that file; the YAML's tool result `"1. Hello transform!"` must match what `view` returns for that file content +- Two conversation entries: first for the tool-call decision, second for the final response after tool results diff --git a/1682-java-tool-ergonomics-prompts-remove-before-merge/#20260616-prompts.md# b/1682-java-tool-ergonomics-prompts-remove-before-merge/#20260616-prompts.md# new file mode 100644 index 000000000..6fa70503c --- /dev/null +++ b/1682-java-tool-ergonomics-prompts-remove-before-merge/#20260616-prompts.md# @@ -0,0 +1,93 @@ +# DAY: 2026-06-16 + +### 19:31 Prompt + +Consider this C# usage of our API: + +```C# + [DisplayName("set_current_phase")] + [Description("Sets the current phase of the agent. Use this to report progress.")] + private void SetCurrentPhase(Phase phase) + { + Phase = phase; + updateUi(); + } + +... + + Session = await client.CreateSessionAsync(new() + { + OnPermissionRequest = PermissionHandler.ApproveAll, + SystemMessage = systemMessageConfig, + AvailableTools = new ToolSet().AddCustom("*").AddBuiltIn("web_fetch"), + Tools = [ + CopilotTool.DefineTool(SetCurrentPhase), + CopilotTool.DefineTool(ReportIntent, new() { OverridesBuiltInTool = true }), + CopilotTool.DefineTool(database.SearchProperties), + ], + }); +``` + +According to our chat yesterday, here's how you'd do it with our Java API. + +```java +var session = client.createSession(new SessionConfig() + .setOnPermissionRequest(PermissionHandler.APPROVE_ALL) + .setSystemMessage(systemMessageConfig) + .setAvailableTools(new ToolSet().addCustom("*").addBuiltIn("web_fetch")) + .setTools(List.of( + ToolDefinition.create("set_current_phase", "Sets the current phase of the agent. Use this to report progress.", + Map.of("type", "object", + "properties", Map.of("phase", Map.of("type", "string", "enum", List.of("searching", "analyzing", "done"))), + "required", List.of("phase")), + invocation -> { + Phase phase = invocation.getArgumentsAs(PhaseArgs.class).phase(); + this.phase = phase; + updateUi(); + return CompletableFuture.completedFuture("Phase set to " + phase); + }), + ToolDefinition.createOverride("report_intent", "Reports the agent's intent", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> { /* ... */ }), + ToolDefinition.create("search_properties", "Search property listings", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> database.searchProperties(invocation.getArguments())) + ))).get(30, TimeUnit.SECONDS); +``` + +Also according to our chat yesterday: + +----- + +The key differences from the C# version: + +| C# | Java | +|---|---| +| `CopilotTool.DefineTool(SetCurrentPhase)` — infers name, description, and schema from the method's attributes via reflection | `ToolDefinition.create(name, description, schema, handler)` — all explicit | +| `new() { OverridesBuiltInTool = true }` (options object) | `ToolDefinition.createOverride(...)` (dedicated factory) | +| `[DisplayName]` / `[Description]` attributes on the method | No annotation-based discovery; you provide the strings directly | +| Implicit JSON Schema generation from parameter types | You supply the `Map` schema manually | + +The Java SDK does not currently have a reflection/annotation-based `DefineTool` that can inspect a method and auto-generate the name, description, and parameter schema. Everything is explicit. + +------ + +We are working on issue https://github.com/github/copilot-sdk/issues/1682 that addresses this gap. + +But first, I want to add test coverage for the way to do it that apparently currently does work. + +Write a plan to `1682-java-tool-ergonomics-prompts-remove-before-merge/1682-low-level-tool-definition.md` + +### 19:40 Prompt + +Right now, the most important thing we did in this session was learn how to create a net new snapshot YAML, `test/snapshots/system_message_sections/should_use_replaced_identity_section_in_response.yaml` and use it in a net-new failsafe IT test. + +This is an important skill. + +I've never used skills before. + +Help me package up this knowledge as a reusable skill. + +I guess skills have names, right? Well, let's call this `new-java-e2e-test-yaml-and-test`, using whatever naming convention one uses for skills. + +Explain what you're donig step by step. I have a need for this skill right now. diff --git a/1682-java-tool-ergonomics-prompts-remove-before-merge/.#20260616-prompts.md b/1682-java-tool-ergonomics-prompts-remove-before-merge/.#20260616-prompts.md new file mode 120000 index 000000000..423ff8c63 --- /dev/null +++ b/1682-java-tool-ergonomics-prompts-remove-before-merge/.#20260616-prompts.md @@ -0,0 +1 @@ +edburns@DESKTOP-OG7FHKO.618:1781544114 \ No newline at end of file diff --git a/1682-java-tool-ergonomics-prompts-remove-before-merge/20260616-prompts.md b/1682-java-tool-ergonomics-prompts-remove-before-merge/20260616-prompts.md new file mode 100644 index 000000000..b7da33967 --- /dev/null +++ b/1682-java-tool-ergonomics-prompts-remove-before-merge/20260616-prompts.md @@ -0,0 +1,93 @@ +# DAY: 2026-06-16 + +### 19:31 Prompt + +Consider this C# usage of our API: + +```C# + [DisplayName("set_current_phase")] + [Description("Sets the current phase of the agent. Use this to report progress.")] + private void SetCurrentPhase(Phase phase) + { + Phase = phase; + updateUi(); + } + +... + + Session = await client.CreateSessionAsync(new() + { + OnPermissionRequest = PermissionHandler.ApproveAll, + SystemMessage = systemMessageConfig, + AvailableTools = new ToolSet().AddCustom("*").AddBuiltIn("web_fetch"), + Tools = [ + CopilotTool.DefineTool(SetCurrentPhase), + CopilotTool.DefineTool(ReportIntent, new() { OverridesBuiltInTool = true }), + CopilotTool.DefineTool(database.SearchProperties), + ], + }); +``` + +According to our chat yesterday, here's how you'd do it with our Java API. + +```java +var session = client.createSession(new SessionConfig() + .setOnPermissionRequest(PermissionHandler.APPROVE_ALL) + .setSystemMessage(systemMessageConfig) + .setAvailableTools(new ToolSet().addCustom("*").addBuiltIn("web_fetch")) + .setTools(List.of( + ToolDefinition.create("set_current_phase", "Sets the current phase of the agent. Use this to report progress.", + Map.of("type", "object", + "properties", Map.of("phase", Map.of("type", "string", "enum", List.of("searching", "analyzing", "done"))), + "required", List.of("phase")), + invocation -> { + Phase phase = invocation.getArgumentsAs(PhaseArgs.class).phase(); + this.phase = phase; + updateUi(); + return CompletableFuture.completedFuture("Phase set to " + phase); + }), + ToolDefinition.createOverride("report_intent", "Reports the agent's intent", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> { /* ... */ }), + ToolDefinition.create("search_properties", "Search property listings", + Map.of("type", "object", "properties", Map.of(/* ... */)), + invocation -> database.searchProperties(invocation.getArguments())) + ))).get(30, TimeUnit.SECONDS); +``` + +Also according to our chat yesterday: + +----- + +The key differences from the C# version: + +| C# | Java | +|---|---| +| `CopilotTool.DefineTool(SetCurrentPhase)` — infers name, description, and schema from the method's attributes via reflection | `ToolDefinition.create(name, description, schema, handler)` — all explicit | +| `new() { OverridesBuiltInTool = true }` (options object) | `ToolDefinition.createOverride(...)` (dedicated factory) | +| `[DisplayName]` / `[Description]` attributes on the method | No annotation-based discovery; you provide the strings directly | +| Implicit JSON Schema generation from parameter types | You supply the `Map` schema manually | + +The Java SDK does not currently have a reflection/annotation-based `DefineTool` that can inspect a method and auto-generate the name, description, and parameter schema. Everything is explicit. + +------ + +We are working on issue https://github.com/github/copilot-sdk/issues/1682 that addresses this gap. + +But first, I want to add test coverage for the way to do it that apparently currently does work. + +Write a plan to `1682-java-tool-ergonomics-prompts-remove-before-merge/1682-low-level-tool-definition.md` + +### 19:40 Prompt + +Right now, the most important thing we did in this session was learn how to create a net new snapshot YAML, `test/snapshots/system_message_sections/should_use_replaced_identity_section_in_response.yaml` and use it in a net-new failsafe IT test. + +This is an important skill. + +I've never used skills before. + +Help me package up this knowledge as a reusable skill. + +I guess skills have names, right? Well, let's call this `new-e2e-test-yaml-and-test`, using whatever naming convention one uses for skills. + +Explain what you're donig step by step. I have a need for this skill right now. From 94af6ccca2d5e0c1fbe0e7c0dfd000651f660fdf Mon Sep 17 00:00:00 2001 From: Ed Burns Date: Tue, 16 Jun 2026 20:04:13 -0400 Subject: [PATCH 03/12] WIP prompts --- .../#20260616-prompts.md# | 93 ------------- .../.#20260616-prompts.md | 1 - .../20260616-prompts.md | 122 +++++++++++++++++- 3 files changed, 120 insertions(+), 96 deletions(-) delete mode 100644 1682-java-tool-ergonomics-prompts-remove-before-merge/#20260616-prompts.md# delete mode 120000 1682-java-tool-ergonomics-prompts-remove-before-merge/.#20260616-prompts.md diff --git a/1682-java-tool-ergonomics-prompts-remove-before-merge/#20260616-prompts.md# b/1682-java-tool-ergonomics-prompts-remove-before-merge/#20260616-prompts.md# deleted file mode 100644 index 6fa70503c..000000000 --- a/1682-java-tool-ergonomics-prompts-remove-before-merge/#20260616-prompts.md# +++ /dev/null @@ -1,93 +0,0 @@ -# DAY: 2026-06-16 - -### 19:31 Prompt - -Consider this C# usage of our API: - -```C# - [DisplayName("set_current_phase")] - [Description("Sets the current phase of the agent. Use this to report progress.")] - private void SetCurrentPhase(Phase phase) - { - Phase = phase; - updateUi(); - } - -... - - Session = await client.CreateSessionAsync(new() - { - OnPermissionRequest = PermissionHandler.ApproveAll, - SystemMessage = systemMessageConfig, - AvailableTools = new ToolSet().AddCustom("*").AddBuiltIn("web_fetch"), - Tools = [ - CopilotTool.DefineTool(SetCurrentPhase), - CopilotTool.DefineTool(ReportIntent, new() { OverridesBuiltInTool = true }), - CopilotTool.DefineTool(database.SearchProperties), - ], - }); -``` - -According to our chat yesterday, here's how you'd do it with our Java API. - -```java -var session = client.createSession(new SessionConfig() - .setOnPermissionRequest(PermissionHandler.APPROVE_ALL) - .setSystemMessage(systemMessageConfig) - .setAvailableTools(new ToolSet().addCustom("*").addBuiltIn("web_fetch")) - .setTools(List.of( - ToolDefinition.create("set_current_phase", "Sets the current phase of the agent. Use this to report progress.", - Map.of("type", "object", - "properties", Map.of("phase", Map.of("type", "string", "enum", List.of("searching", "analyzing", "done"))), - "required", List.of("phase")), - invocation -> { - Phase phase = invocation.getArgumentsAs(PhaseArgs.class).phase(); - this.phase = phase; - updateUi(); - return CompletableFuture.completedFuture("Phase set to " + phase); - }), - ToolDefinition.createOverride("report_intent", "Reports the agent's intent", - Map.of("type", "object", "properties", Map.of(/* ... */)), - invocation -> { /* ... */ }), - ToolDefinition.create("search_properties", "Search property listings", - Map.of("type", "object", "properties", Map.of(/* ... */)), - invocation -> database.searchProperties(invocation.getArguments())) - ))).get(30, TimeUnit.SECONDS); -``` - -Also according to our chat yesterday: - ------ - -The key differences from the C# version: - -| C# | Java | -|---|---| -| `CopilotTool.DefineTool(SetCurrentPhase)` — infers name, description, and schema from the method's attributes via reflection | `ToolDefinition.create(name, description, schema, handler)` — all explicit | -| `new() { OverridesBuiltInTool = true }` (options object) | `ToolDefinition.createOverride(...)` (dedicated factory) | -| `[DisplayName]` / `[Description]` attributes on the method | No annotation-based discovery; you provide the strings directly | -| Implicit JSON Schema generation from parameter types | You supply the `Map` schema manually | - -The Java SDK does not currently have a reflection/annotation-based `DefineTool` that can inspect a method and auto-generate the name, description, and parameter schema. Everything is explicit. - ------- - -We are working on issue https://github.com/github/copilot-sdk/issues/1682 that addresses this gap. - -But first, I want to add test coverage for the way to do it that apparently currently does work. - -Write a plan to `1682-java-tool-ergonomics-prompts-remove-before-merge/1682-low-level-tool-definition.md` - -### 19:40 Prompt - -Right now, the most important thing we did in this session was learn how to create a net new snapshot YAML, `test/snapshots/system_message_sections/should_use_replaced_identity_section_in_response.yaml` and use it in a net-new failsafe IT test. - -This is an important skill. - -I've never used skills before. - -Help me package up this knowledge as a reusable skill. - -I guess skills have names, right? Well, let's call this `new-java-e2e-test-yaml-and-test`, using whatever naming convention one uses for skills. - -Explain what you're donig step by step. I have a need for this skill right now. diff --git a/1682-java-tool-ergonomics-prompts-remove-before-merge/.#20260616-prompts.md b/1682-java-tool-ergonomics-prompts-remove-before-merge/.#20260616-prompts.md deleted file mode 120000 index 423ff8c63..000000000 --- a/1682-java-tool-ergonomics-prompts-remove-before-merge/.#20260616-prompts.md +++ /dev/null @@ -1 +0,0 @@ -edburns@DESKTOP-OG7FHKO.618:1781544114 \ No newline at end of file diff --git a/1682-java-tool-ergonomics-prompts-remove-before-merge/20260616-prompts.md b/1682-java-tool-ergonomics-prompts-remove-before-merge/20260616-prompts.md index b7da33967..e9589f028 100644 --- a/1682-java-tool-ergonomics-prompts-remove-before-merge/20260616-prompts.md +++ b/1682-java-tool-ergonomics-prompts-remove-before-merge/20260616-prompts.md @@ -76,7 +76,7 @@ We are working on issue https://github.com/github/copilot-sdk/issues/1682 that a But first, I want to add test coverage for the way to do it that apparently currently does work. -Write a plan to `1682-java-tool-ergonomics-prompts-remove-before-merge/1682-low-level-tool-definition.md` +Write a plan to `1682-java-tool-ergonomics-prompts-remove-before-merge/1682-low-level-tool-definition.md` to instruct a `copilot --yolo` session to ### 19:40 Prompt @@ -88,6 +88,124 @@ I've never used skills before. Help me package up this knowledge as a reusable skill. -I guess skills have names, right? Well, let's call this `new-e2e-test-yaml-and-test`, using whatever naming convention one uses for skills. +I guess skills have names, right? Well, let's call this `new-java-e2e-test-yaml-and-test`, using whatever naming convention one uses for skills. Explain what you're donig step by step. I have a need for this skill right now. + +### 19:53 Prompt + +▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀ +◐ The user wants to create a reusable skill for creating new E2E test YAML snapshots and corresponding failsafe IT tests. Let me first understand how skills work in this repo - I can see from the environment that there's already a `java-coding-skill` defined. Let me look at how skills are structured. + +● Let me look at how skills are structured in this repo so I can explain and create one for you. + +● Find existing skills (shell) + │ find /home/edburns/workareas/copilot-sdk/.github/skills -type f | sort + └ 4 lines... + +● List directory .github/skills/java-coding-skill + └ 1 file found + +● List directory .github/skills/rust-coding-skill + └ 2 files found + +● Read SKILL.md + │ .github/skills/rust-coding-skill/SKILL.md + └ L1:30 (30 lines read) + +● Read SKILL.md + │ .github/skills/java-coding-skill/SKILL.md + └ L1:10 (10 lines read) + +● Good. Here's what I'm seeing and what I'll do: + + How Skills Work (Step by Step) + + 1. Skills live in .github/skills// — each skill is a directory containing at minimum a SKILL.md file + 2. SKILL.md has YAML frontmatter with name and description — the description tells Copilot when to invoke the skill + 3. The body of SKILL.md is the instruction content — the "knowledge" that gets loaded when the skill is invoked + 4. Optional extra files (like examples.md) can be included for reference material + + What I'll Create + + For your new-e2e-test-yaml-and-test skill, I'll create: + + - .github/skills/new-e2e-test-yaml-and-test/SKILL.md — the main skill with step-by-step instructions on how to create a new YAML snapshot and corresponding failsafe IT test + - .github/skills/new-e2e-test-yaml-and-test/examples.md — the actual example files we created this session as reference + + Let me create it now: + +✗ Create skill directory (shell) + │ mkdir -p /home/edburns/workareas/copilot-sdk/.github/skills/new-e2e-test-yaml-and-test + └ The user rejected this tool call. User feedback: I named it wrong. The skill should be called + `new-java-e2e-test-yaml-and-test` + +● Create skill directory (shell) + │ mkdir -p /home/edburns/workareas/copilot-sdk/.github/skills/new-java-e2e-test-yaml-and-test + └ 1 line... + +● IDE connection lost: Visual Studio Code - Insiders closed + +● Now let me create the main SKILL.md: + +● Create SKILL.md +222 + └ .github/skills/new-java-e2e-test-yaml-and-test/SKILL.md + +● Now let me add the examples file with the actual artifacts from this session: + +● Get example YAML (shell) + │ cat /home/edburns/workareas/copilot-sdk/test/snapshots/system_message_sections/should_use_replaced_identity_section_in + │ _response.yaml + └ 16 lines... + +● All permissions are now enabled. Tool, path, and URL requests will be automatically approved. + +● Get example test (shell) + │ cat /home/edburns/workareas/copilot-sdk/java/src/test/java/com/github/copilot/SystemMessageSectionsIT.java + └ 193 lines... + +● Create examples.md +178 + └ .github/skills/new-java-e2e-test-yaml-and-test/examples.md + +● Done. Here's what I created: + + .github/skills/new-java-e2e-test-yaml-and-test/ + ├── SKILL.md ← Main instructions (when/how to create new E2E tests) + └── examples.md ← Real examples from this session (simple + tool-call) + + How it works: + + - The description in the frontmatter tells Copilot to invoke this skill whenever you're creating a new Java E2E integration test with a YAML snapshot + - SKILL.md covers: YAML format, proxy matching logic, the Java test template, common pitfalls, and the key constraint that you must handcraft YAMLs (can't record via Java) + - examples.md has two working examples — the simple Botanica test and the multi-turn tool-call test + + The skill is ready to use. Next time you (or Copilot) need to add a new E2E test, the skill will be available as new-java-e2e-test-yaml-and-test. + +### 19:54 Prompt + +Ok, now, in a subsequent session, I will want to use the new skill `new-java-e2e-test-yaml-and-test`. + +How do I instruct the agent to use this skill to do its job? + +Is it something like + +`Use the skill new-java-e2e-test-yaml-and-test to create a new test and yaml that exercises the feature we just created.` + +Or something else? + +### 19:56 Response + + Based on the documentation, here's how skills work: + + You don't need to explicitly invoke skills. Skills are automatically loaded based on their description field in the frontmatter. The agent sees the skill listed in and invokes it when the task matches the description. + + However, you can be explicit if you want. Any of these would work: + + 1. Implicit (recommended): Just describe what you want — the skill's description matches "creating a new Java E2E integration test that requires a new replay proxy YAML snapshot": + Create a new failsafe IT test with a YAML snapshot that exercises the new feature. + 2. Explicit mention: Reference the skill by name to make sure it's invoked: + Use the new-java-e2e-test-yaml-and-test skill to create a test and YAML snapshot for the new feature. + 3. Check available skills: Use /skills in the CLI to see what's loaded, or /env to see the full environment including skills. + + Both approaches work. The agent sees the skill in its available tools and will call it before generating code. The explicit mention is a good safety net if you want to be sure. + From c25ef6294a4e096903b8c83b857772025a082f08 Mon Sep 17 00:00:00 2001 From: Ed Burns Date: Tue, 16 Jun 2026 20:13:11 -0400 Subject: [PATCH 04/12] Plan to generate test for non-ergonomic-tool-definition --- .../1682-low-level-tool-definition.md | 94 +++++++++++++++++++ .../20260616-prompts.md | 11 ++- 2 files changed, 103 insertions(+), 2 deletions(-) create mode 100644 1682-java-tool-ergonomics-prompts-remove-before-merge/1682-low-level-tool-definition.md diff --git a/1682-java-tool-ergonomics-prompts-remove-before-merge/1682-low-level-tool-definition.md b/1682-java-tool-ergonomics-prompts-remove-before-merge/1682-low-level-tool-definition.md new file mode 100644 index 000000000..34fb2ee91 --- /dev/null +++ b/1682-java-tool-ergonomics-prompts-remove-before-merge/1682-low-level-tool-definition.md @@ -0,0 +1,94 @@ +# Plan: Add E2E test for non-ergonomic (low-level) tool definition + +## Goal + +Add a failsafe IT test that exercises the **current explicit** `ToolDefinition.create()` / `ToolDefinition.createOverride()` API — the "non-ergonomic" approach — with multiple tools, `ToolSet` with `addCustom`/`addBuiltIn`, `getArgumentsAs()` deserialization into a record, and a tool handler that mutates application state. This establishes baseline test coverage before issue #1682 adds the annotation-driven ergonomic API. + +## Instructions + +Read `java.instructions.md` in my User level Copilot instructions. This session is about Java. + +Use the `new-java-e2e-test-yaml-and-test` skill to create a new failsafe IT test that exercises the non-ergonomic-tool-definition approach to tool definition. + +### What the test must exercise + +The test class should be `LowLevelToolDefinitionIT.java` in `java/src/test/java/com/github/copilot/`. It must demonstrate **all** of the following in a single session: + +1. **`ToolDefinition.create(name, description, schema, handler)`** — define at least two custom tools explicitly with `Map` schemas. +2. **`ToolDefinition.createOverride(name, description, schema, handler)`** — define one tool that overrides a built-in tool. +3. **`invocation.getArgumentsAs(SomeRecord.class)`** — at least one handler must deserialize arguments into a Java record (not `getArguments()` returning raw Map). +4. **`invocation.getArguments()`** — at least one handler must use the raw `Map` accessor. +5. **`ToolSet` with `addCustom("*").addBuiltIn("web_fetch")`** — pass `setAvailableTools(...)` on the `SessionConfig`. +6. **Handler mutates state** — one tool handler should mutate a field on the test class and the test should assert that the field was updated after the response. +7. **Handler returns `CompletableFuture.completedFuture(...)`** — all handlers return completed futures (as is the current pattern). + +### Concrete test design + +#### Snapshot category + +`tools` (reuse the existing category under `test/snapshots/tools/`). + +#### Snapshot file + +`test/snapshots/tools/low_level_tool_definition.yaml` + +#### Java test method name + +`lowLevelToolDefinition` (converts to `low_level_tool_definition` for snapshot lookup). + +#### Tool definitions for the test + +| Tool | Factory | Name | Description | Schema | Handler behavior | +|------|---------|------|-------------|--------|-----------------| +| Set Phase | `create` | `set_current_phase` | "Sets the current phase of the agent" | `{ type: object, properties: { phase: { type: string, enum: [searching, analyzing, done] } }, required: [phase] }` | Deserializes via `getArgumentsAs(PhaseArgs.class)` where `record PhaseArgs(String phase) {}`. Mutates a `currentPhase` field on the test. Returns `"Phase set to " + phase`. | +| Search | `create` | `search_items` | "Search for items by keyword" | `{ type: object, properties: { keyword: { type: string } }, required: [keyword] }` | Uses `getArguments()` raw Map. Returns a fixed string like `"Found: item_alpha, item_beta"`. | +| Override grep | `createOverride` | `grep` | "Custom grep override" | `{ type: object, properties: { query: { type: string } }, required: [query] }` | Uses `getArguments()`. Returns `"CUSTOM_GREP: " + query`. | + +#### Prompt + +``` +First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results. +``` + +#### YAML snapshot structure + +Two conversations (one for the tool-call turn, one for the final response turn after tool results are provided): + +- **Conversation 1** (tool call turn): system `${system}` + user prompt → assistant with `tool_calls` for `set_current_phase` and `search_items`. +- **Conversation 2** (final response turn): full history including tool results → assistant final content mentioning "analyzing", "item_alpha", "item_beta". + +Study the existing snapshot files in `test/snapshots/tools/` carefully. In particular, study the snapshot file for the `testInvokesCustomTool` test in `ToolsTest.java` (`test/snapshots/tools/invokes_custom_tool.yaml`). It shows how tool call and tool result conversations are structured. Additionally, study `test/snapshots/tools/should_execute_multiple_custom_tools_in_parallel_single_turn.yaml` which shows multiple parallel tool calls in a single turn. + +#### Assertions + +1. `response` is not null. +2. Response content contains `"analyzing"` (confirming the phase tool was called). +3. Response content contains `"item_alpha"` or `"item_beta"` (confirming search tool was called). +4. The `currentPhase` field on the test class equals `"analyzing"` (confirming handler mutated state). + +#### Session config + +```java +new SessionConfig() + .setOnPermissionRequest(PermissionHandler.APPROVE_ALL) + .setAvailableTools(new ToolSet().addCustom("*").addBuiltIn("web_fetch")) + .setTools(List.of(setPhaseTool, searchTool, grepOverrideTool)) +``` + +### Step-by-step execution + +1. Create the YAML snapshot file at `test/snapshots/tools/low_level_tool_definition.yaml`. +2. Create the Java IT file at `java/src/test/java/com/github/copilot/LowLevelToolDefinitionIT.java`. +3. Run `mvn spotless:apply` from the `java/` directory (using the background + log pattern from `java.instructions.md`). +4. Run the test in isolation: + ```sh + cd java + LOG="$(date +%Y%m%d-%H%M)-job-logs.txt" && mvn failsafe:integration-test -Dit.test="LowLevelToolDefinitionIT#lowLevelToolDefinition" -Denforcer.skip=true > "$LOG" 2>&1 & tail -f "$LOG" + ``` +5. Fix any failures. Iterate until the isolated test passes cleanly. +6. Run the full build: + ```sh + cd java + LOG="$(date +%Y%m%d-%H%M)-job-logs.txt" && mvn clean verify > "$LOG" 2>&1 & tail -f "$LOG" + ``` +7. Fix any failures from the full build. Iterate until `mvn clean verify` passes cleanly. diff --git a/1682-java-tool-ergonomics-prompts-remove-before-merge/20260616-prompts.md b/1682-java-tool-ergonomics-prompts-remove-before-merge/20260616-prompts.md index e9589f028..931f02d08 100644 --- a/1682-java-tool-ergonomics-prompts-remove-before-merge/20260616-prompts.md +++ b/1682-java-tool-ergonomics-prompts-remove-before-merge/20260616-prompts.md @@ -28,7 +28,7 @@ Consider this C# usage of our API: }); ``` -According to our chat yesterday, here's how you'd do it with our Java API. +According to our chat yesterday, here's how you'd do it with our Java API. For discussion call this the non-ergonomic-tool-definition. ```java var session = client.createSession(new SessionConfig() @@ -76,7 +76,14 @@ We are working on issue https://github.com/github/copilot-sdk/issues/1682 that a But first, I want to add test coverage for the way to do it that apparently currently does work. -Write a plan to `1682-java-tool-ergonomics-prompts-remove-before-merge/1682-low-level-tool-definition.md` to instruct a `copilot --yolo` session to +Write a plan to `1682-java-tool-ergonomics-prompts-remove-before-merge/1682-low-level-tool-definition.md` to instruct a `copilot --yolo` session to: + +- Read `java.instructions.md` in my User level Copilot instructions. This session is about Java. +- Use the `new-java-e2e-test-yaml-and-test` skill to create a new failsafe IT test that exercises the non-ergonomic-tool-definition approach to tool definition. +- Run the test in isolation until you get a clean run of just that test. +- Run all the automated tests according to the `java.instructions.md` and `mvn clean verify`. + + ### 19:40 Prompt From 4cf9335a9f14b96a1b763d3582bcba1a821807b6 Mon Sep 17 00:00:00 2001 From: Ed Burns Date: Tue, 16 Jun 2026 20:14:49 -0400 Subject: [PATCH 05/12] chore: trigger GPG passphrase cache From ac67ecfaf258615e191c4f2a7ddf23b09d37e43f Mon Sep 17 00:00:00 2001 From: Ed Burns Date: Wed, 17 Jun 2026 09:48:08 -0400 Subject: [PATCH 06/12] Add low-level tool definition E2E coverage Add a new Java failsafe integration test and replay snapshot that exercise the current explicit tool-definition APIs before ergonomic annotations are added. - add `LowLevelToolDefinitionIT` covering `create`, `createOverride`, `getArgumentsAs(record)`, `getArguments()`, and `ToolSet` available tools - add `tools/low_level_tool_definition.yaml` with multi-tool call and final response replay conversations - assert handler-driven state mutation (`currentPhase`) and expected response content Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../copilot/LowLevelToolDefinitionIT.java | 111 ++++++++++++++++++ .../tools/low_level_tool_definition.yaml | 49 ++++++++ 2 files changed, 160 insertions(+) create mode 100644 java/src/test/java/com/github/copilot/LowLevelToolDefinitionIT.java create mode 100644 test/snapshots/tools/low_level_tool_definition.yaml diff --git a/java/src/test/java/com/github/copilot/LowLevelToolDefinitionIT.java b/java/src/test/java/com/github/copilot/LowLevelToolDefinitionIT.java new file mode 100644 index 000000000..9365e7169 --- /dev/null +++ b/java/src/test/java/com/github/copilot/LowLevelToolDefinitionIT.java @@ -0,0 +1,111 @@ +/*--------------------------------------------------------------------------------------------- + * Copyright (c) Microsoft Corporation. All rights reserved. + *--------------------------------------------------------------------------------------------*/ + +package com.github.copilot; + +import static org.junit.jupiter.api.Assertions.assertNotNull; +import static org.junit.jupiter.api.Assertions.assertTrue; + +import java.util.List; +import java.util.Map; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.TimeUnit; + +import org.junit.jupiter.api.AfterAll; +import org.junit.jupiter.api.BeforeAll; +import org.junit.jupiter.api.Test; + +import com.github.copilot.generated.AssistantMessageEvent; +import com.github.copilot.rpc.MessageOptions; +import com.github.copilot.rpc.PermissionHandler; +import com.github.copilot.rpc.SessionConfig; +import com.github.copilot.rpc.ToolDefinition; +import com.github.copilot.rpc.ToolSet; + +/** + * Failsafe integration test for explicit (non-ergonomic) tool definition APIs. + * + * @see Snapshot: tools/low_level_tool_definition + */ +class LowLevelToolDefinitionIT { + + private static E2ETestContext ctx; + private String currentPhase; + + record PhaseArgs(String phase) { + } + + @BeforeAll + static void setup() throws Exception { + ctx = E2ETestContext.create(); + } + + @AfterAll + static void teardown() throws Exception { + if (ctx != null) { + ctx.close(); + } + } + + @Test + void lowLevelToolDefinition() throws Exception { + ctx.configureForTest("tools", "low_level_tool_definition"); + + Map setPhaseSchema = Map.of("type", "object", "properties", + Map.of("phase", Map.of("type", "string", "enum", List.of("searching", "analyzing", "done"))), + "required", List.of("phase")); + + ToolDefinition setPhaseTool = ToolDefinition.create("set_current_phase", "Sets the current phase of the agent", + setPhaseSchema, invocation -> { + PhaseArgs args = invocation.getArgumentsAs(PhaseArgs.class); + currentPhase = args.phase(); + return CompletableFuture.completedFuture("Phase set to " + currentPhase); + }); + + Map searchSchema = Map.of("type", "object", "properties", + Map.of("keyword", Map.of("type", "string")), "required", List.of("keyword")); + + ToolDefinition searchTool = ToolDefinition.create("search_items", "Search for items by keyword", searchSchema, + invocation -> { + Map args = invocation.getArguments(); + args.get("keyword"); + return CompletableFuture.completedFuture("Found: item_alpha, item_beta"); + }); + + Map grepSchema = Map.of("type", "object", "properties", + Map.of("query", Map.of("type", "string")), "required", List.of("query")); + + ToolDefinition grepOverrideTool = ToolDefinition.createOverride("grep", "Custom grep override", grepSchema, + invocation -> { + Map args = invocation.getArguments(); + String query = (String) args.get("query"); + return CompletableFuture.completedFuture("CUSTOM_GREP: " + query); + }); + + try (CopilotClient client = ctx.createClient()) { + CopilotSession session = client + .createSession(new SessionConfig().setOnPermissionRequest(PermissionHandler.APPROVE_ALL) + .setAvailableTools(new ToolSet().addCustom("*").addBuiltIn("web_fetch")) + .setTools(List.of(setPhaseTool, searchTool, grepOverrideTool))) + .get(30, TimeUnit.SECONDS); + + try { + AssistantMessageEvent response = session.sendAndWait(new MessageOptions().setPrompt( + "First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results."), + 60_000).get(90, TimeUnit.SECONDS); + + assertNotNull(response, "Expected a response from the assistant"); + String content = response.getData().content().toLowerCase(); + assertTrue(content.contains("analyzing"), + "Response should contain the updated phase: " + response.getData().content()); + assertTrue(content.contains("item_alpha") || content.contains("item_beta"), + "Response should contain search results: " + response.getData().content()); + assertTrue("analyzing".equals(currentPhase), + "Expected currentPhase to be analyzing but was: " + currentPhase); + } finally { + session.close(); + } + } + } +} diff --git a/test/snapshots/tools/low_level_tool_definition.yaml b/test/snapshots/tools/low_level_tool_definition.yaml new file mode 100644 index 000000000..d8fcb9022 --- /dev/null +++ b/test/snapshots/tools/low_level_tool_definition.yaml @@ -0,0 +1,49 @@ +models: + - claude-sonnet-4.5 +conversations: + - messages: + - role: system + content: ${system} + - role: user + content: First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results. + - role: assistant + content: I'll set the phase and run the search now. + tool_calls: + - id: toolcall_0 + type: function + function: + name: set_current_phase + arguments: '{"phase":"analyzing"}' + - id: toolcall_1 + type: function + function: + name: search_items + arguments: '{"keyword":"copilot"}' + - messages: + - role: system + content: ${system} + - role: user + content: First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results. + - role: assistant + content: I'll set the phase and run the search now. + tool_calls: + - id: toolcall_0 + type: function + function: + name: set_current_phase + arguments: '{"phase":"analyzing"}' + - id: toolcall_1 + type: function + function: + name: search_items + arguments: '{"keyword":"copilot"}' + - role: tool + tool_call_id: toolcall_0 + content: Phase set to analyzing + - role: tool + tool_call_id: toolcall_1 + content: "Found: item_alpha, item_beta" + - role: assistant + content: |- + Current phase: analyzing + Search results: item_alpha, item_beta From 8f67322f129e36c810979b5cbfb3803c61f432ec Mon Sep 17 00:00:00 2001 From: Ed Burns Date: Wed, 17 Jun 2026 12:12:14 -0400 Subject: [PATCH 07/12] Prompt to create net-new test in other non-Java languages --- ...that-use-low_level_tool_definition.yaml.md | 151 ++++++++++++++++++ 1 file changed, 151 insertions(+) create mode 100644 1682-java-tool-ergonomics-prompts-remove-before-merge/add-tests-that-use-low_level_tool_definition.yaml.md diff --git a/1682-java-tool-ergonomics-prompts-remove-before-merge/add-tests-that-use-low_level_tool_definition.yaml.md b/1682-java-tool-ergonomics-prompts-remove-before-merge/add-tests-that-use-low_level_tool_definition.yaml.md new file mode 100644 index 000000000..b60937cd2 --- /dev/null +++ b/1682-java-tool-ergonomics-prompts-remove-before-merge/add-tests-that-use-low_level_tool_definition.yaml.md @@ -0,0 +1,151 @@ +## Goal + +Port the Java integration test behavior from: + +- `java/src/test/java/com/github/copilot/LowLevelToolDefinitionIT.java` +- test method: `lowLevelToolDefinition` +- snapshot: `test/snapshots/tools/low_level_tool_definition.yaml` + +to the following non-Java SDKs, using each language's native E2E test infrastructure: + +1. `dotnet` +2. `go` +3. `nodejs` +4. `python` +5. `rust` + +The new/updated tests in each language must use the **same snapshot scenario** (`tools/low_level_tool_definition`) and validate the same behavior. + +--- + +## Required test behavior to port + +From a test perspective, replicate this behavior: + +1. Define a `set_current_phase` tool that accepts a `phase` argument (string, enum: `["searching", "analyzing", "done"]`) and returns `"Phase set to {phase}"`. The tool handler must also store the phase value in test-local state. +2. Define a `search_items` tool that accepts a `keyword` argument (string) and returns `"Found: item_alpha, item_beta"`. +3. Define a `grep` override tool (using whatever "override" mechanism the language provides) that accepts a `query` argument (string) and returns `"CUSTOM_GREP: {query}"`. +4. Create a session with: + - Permission handler that auto-approves all requests. + - Available tools: all custom tools (`*`) plus built-in `web_fetch`. + - The three tool definitions registered on the session. +5. Send prompt: `"First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results."` +6. Assert: + - The assistant response is non-null/non-empty. + - The response content (case-insensitive) contains `"analyzing"`. + - The response content contains `"item_alpha"` or `"item_beta"`. + - The test-local phase state equals `"analyzing"` (verifying the tool handler was actually invoked). + +Do not weaken these assertions. + +--- + +## Critical execution constraint (must follow exactly) + +Proceed through languages **one-at-a-time** in this exact order: + +1. `dotnet` +2. `go` +3. `nodejs` +4. `python` +5. `rust` + +❌❌ **Do not continue to the next language unless and until the current language gets a clean run with the new test in isolation.** ❌❌ + +Do **not** run full cross-language or full-repo test suites. Let CI/CD handle broad runs. + +--- + +## Snapshot/name mapping requirements + +Ensure each language's test naming/harness maps to: + +- snapshot folder: `tools` +- snapshot file: `low_level_tool_definition.yaml` + +Do not create alternate snapshot names for this scenario. + +--- + +## Per-language isolated run commands + +Use these commands for isolated validation while iterating. + +### 1) dotnet + +Implement in dotnet E2E tests (preferred: new `LowLevelToolDefinitionE2ETests` class or add to existing `ToolsE2ETests` class using snapshot category `tools`, test method `Low_Level_Tool_Definition`). + +Isolated run: + +```bash +cd dotnet && dotnet test test/GitHub.Copilot.SDK.Test.csproj --filter "FullyQualifiedName~Low_Level_Tool_Definition" +``` + +### 2) go + +Implement in Go E2E tests with snapshot mapping to `tools/low_level_tool_definition` (preferred: add to existing `go/internal/e2e/tools_e2e_test.go` or create new file, subtest name exactly `low_level_tool_definition`). + +Isolated run: + +```bash +cd go && go test ./internal/e2e -run 'TestToolsE2E/low_level_tool_definition$' -count=1 +``` + +### 3) nodejs + +Implement in Node E2E Vitest (preferred: add to existing `nodejs/test/e2e/tools.e2e.test.ts` or create new file, test name mapping to `low_level_tool_definition`). + +Isolated run: + +```bash +cd nodejs && npm test -- test/e2e/tools.e2e.test.ts -t "low_level_tool_definition" +``` + +### 4) python + +Implement in Python E2E pytest (preferred: add to existing `python/e2e/test_tools_e2e.py` or create new file, test function `test_low_level_tool_definition`). + +Isolated run: + +```bash +cd python && uv run pytest e2e/test_tools_e2e.py::test_low_level_tool_definition +``` + +### 5) rust + +Implement in Rust E2E tests (preferred: add to existing `rust/tests/e2e/tools.rs`; use `with_e2e_context("tools", "low_level_tool_definition", ...)`). + +Isolated run: + +```bash +cd rust && cargo test --features test-support --test e2e tools::low_level_tool_definition -- --exact +``` + +--- + +## Implementation notes + +1. Reuse existing per-language E2E harness helpers and style conventions. +2. Keep changes scoped to test code and required wiring. +3. Do not hand-edit generated code. +4. ❌❌❌ DO NOT CHANGE ANY non-test CODE.❌❌❌ +5. ✅✅Put the test in the "right place" for each language. That means put it "near" any similar existing tests. The existing tools E2E test files are: + - `dotnet/test/E2E/ToolsE2ETests.cs` + - `go/internal/e2e/tools_e2e_test.go` + - `nodejs/test/e2e/tools.e2e.test.ts` + - `python/e2e/test_tools_e2e.py` + - `rust/tests/e2e/tools.rs` + Put the new test near those. ✅✅ +6. The snapshot `test/snapshots/tools/low_level_tool_definition.yaml` involves **two conversations**: one where tool calls are made without prior tool results, and one full round-trip (tool calls → tool results → final assistant message). Each language's replay proxy handles this; just ensure the test sends the right prompt and processes tool invocations correctly. +7. The `grep` override tool uses whatever "tool override" mechanism exists in each language (e.g., `ToolDefinition.createOverride` in Java, or the equivalent in each SDK). If a language has no override concept, define it as a regular custom tool named `grep`. + +--- + +## Deliverable + +When done, provide: + +1. files changed per language, +2. isolated command used per language, +3. pass/fail result per language (must be passing before moving to next), +4. any blockers (if any language cannot be completed). From bbb90275e362a80c23a4e2ef7de21fd7186639fb Mon Sep 17 00:00:00 2001 From: Ed Burns Date: Wed, 17 Jun 2026 13:01:55 -0400 Subject: [PATCH 08/12] Add low-level tool-definition E2E coverage across SDKs Port the shared `tools/low_level_tool_definition` replay scenario to the remaining SDKs so every language validates the same low-level tool-definition behavior. - dotnet: add `Low_Level_Tool_Definition` E2E in `dotnet/test/E2E/ToolsE2ETests.cs` - go: add `low_level_tool_definition` subtest in `go/internal/e2e/tools_e2e_test.go` - nodejs: add `low_level_tool_definition` test in `nodejs/test/e2e/tools.e2e.test.ts` - python: add `test_low_level_tool_definition` in `python/e2e/test_tools_e2e.py` - rust: add `low_level_tool_definition` in `rust/tests/e2e/tools.rs` - snapshot: finalize `test/snapshots/tools/low_level_tool_definition.yaml` for consistent replay across languages Each test covers explicit tool registration, built-in tool override behavior, raw-argument access, available-tools filtering (`custom:*` + `builtin:web_fetch`), and mutable phase state (`analyzing`) asserted after tool execution. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- dotnet/test/E2E/ToolsE2ETests.cs | 68 ++++++++++ go/internal/e2e/tools_e2e_test.go | 83 +++++++++++++ nodejs/test/e2e/tools.e2e.test.ts | 57 ++++++++- python/e2e/test_tools_e2e.py | 55 ++++++++- rust/tests/e2e/tools.rs | 116 +++++++++++++++++- .../tools/low_level_tool_definition.yaml | 21 +--- 6 files changed, 377 insertions(+), 23 deletions(-) diff --git a/dotnet/test/E2E/ToolsE2ETests.cs b/dotnet/test/E2E/ToolsE2ETests.cs index 57ed6be2d..565d845d1 100644 --- a/dotnet/test/E2E/ToolsE2ETests.cs +++ b/dotnet/test/E2E/ToolsE2ETests.cs @@ -61,6 +61,74 @@ static string EncryptString([Description("String to encrypt")] string input) => input.ToUpperInvariant(); } + [Fact] + public async Task Low_Level_Tool_Definition() + { + string currentPhase = string.Empty; + + var session = await CreateSessionAsync(new SessionConfig + { + Tools = + [ + AIFunctionFactory.Create(SetCurrentPhase, new AIFunctionFactoryOptions + { + Name = "set_current_phase", + Description = "Sets the current phase of the agent", + }), + AIFunctionFactory.Create(SearchItems, new AIFunctionFactoryOptions + { + Name = "search_items", + Description = "Search for items by keyword", + }), + AIFunctionFactory.Create(CustomGrep, new AIFunctionFactoryOptions + { + Name = "grep", + Description = "Custom grep override", + AdditionalProperties = new Dictionary + { + ["is_override"] = true, + }, + }), + ], + AvailableTools = new ToolSet().AddCustom("*").AddBuiltIn("web_fetch"), + OnPermissionRequest = PermissionHandler.ApproveAll, + }); + + await session.SendAsync(new MessageOptions + { + Prompt = "First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results." + }); + + var assistantMessage = await TestHelper.GetFinalAssistantMessageAsync(session); + + Assert.NotNull(assistantMessage); + var content = assistantMessage!.Data.Content ?? string.Empty; + Assert.NotEmpty(content); + Assert.Contains("analyzing", content, StringComparison.OrdinalIgnoreCase); + Assert.True(content.Contains("item_alpha", StringComparison.OrdinalIgnoreCase) + || content.Contains("item_beta", StringComparison.OrdinalIgnoreCase), + $"Expected content to mention item_alpha or item_beta, got: {content}"); + Assert.Equal("analyzing", currentPhase); + + Task SetCurrentPhase(string phase) + { + currentPhase = phase; + return Task.FromResult($"Phase set to {phase}"); + } + + Task SearchItems(AIFunctionArguments args) + { + _ = args["keyword"]; + return Task.FromResult("Found: item_alpha, item_beta"); + } + + Task CustomGrep(AIFunctionArguments args) + { + var query = args["query"]?.ToString() ?? string.Empty; + return Task.FromResult($"CUSTOM_GREP: {query}"); + } + } + [Fact] public async Task Handles_Tool_Calling_Errors() { diff --git a/go/internal/e2e/tools_e2e_test.go b/go/internal/e2e/tools_e2e_test.go index 621f7758d..b8fe3f75f 100644 --- a/go/internal/e2e/tools_e2e_test.go +++ b/go/internal/e2e/tools_e2e_test.go @@ -84,6 +84,89 @@ func TestToolsE2E(t *testing.T) { } }) + t.Run("low_level_tool_definition", func(t *testing.T) { + ctx.ConfigureForTest(t) + + type PhaseArgs struct { + Phase string `json:"phase" jsonschema:"Current phase,enum=searching,enum=analyzing,enum=done"` + } + type SearchArgs struct { + Keyword string `json:"keyword" jsonschema:"Search keyword"` + } + type GrepArgs struct { + Query string `json:"query" jsonschema:"Search query"` + } + + currentPhase := "" + + setCurrentPhaseTool := copilot.DefineTool("set_current_phase", "Sets the current phase of the agent", + func(params PhaseArgs, inv copilot.ToolInvocation) (string, error) { + currentPhase = params.Phase + return "Phase set to " + params.Phase, nil + }) + + searchItemsTool := copilot.DefineTool("search_items", "Search for items by keyword", + func(params SearchArgs, inv copilot.ToolInvocation) (string, error) { + _, _ = inv.Arguments.(map[string]any) + return "Found: item_alpha, item_beta", nil + }) + + grepTool := copilot.DefineTool("grep", "Custom grep override", + func(params GrepArgs, inv copilot.ToolInvocation) (string, error) { + _, _ = inv.Arguments.(map[string]any) + return "CUSTOM_GREP: " + params.Query, nil + }) + grepTool.OverridesBuiltInTool = true + + session, err := client.CreateSession(t.Context(), &copilot.SessionConfig{ + OnPermissionRequest: copilot.PermissionHandler.ApproveAll, + AvailableTools: copilot.NewToolSet().AddCustom("*").AddBuiltIn("web_fetch").ToSlice(), + Tools: []copilot.Tool{ + setCurrentPhaseTool, + searchItemsTool, + grepTool, + }, + }) + if err != nil { + t.Fatalf("Failed to create session: %v", err) + } + + _, err = session.Send(t.Context(), copilot.MessageOptions{ + Prompt: "First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results.", + }) + if err != nil { + t.Fatalf("Failed to send message: %v", err) + } + + answer, err := testharness.GetFinalAssistantMessage(t.Context(), session) + if err != nil { + t.Fatalf("Failed to get assistant message: %v", err) + } + + if answer == nil { + t.Fatalf("Expected non-nil assistant message") + } + ad, ok := answer.Data.(*copilot.AssistantMessageData) + if !ok { + t.Fatalf("Expected AssistantMessageData") + } + + content := ad.Content + if content == "" { + t.Fatalf("Expected non-empty response") + } + lower := strings.ToLower(content) + if !strings.Contains(lower, "analyzing") { + t.Errorf("Expected response to contain 'analyzing', got %q", content) + } + if !strings.Contains(lower, "item_alpha") && !strings.Contains(lower, "item_beta") { + t.Errorf("Expected response to contain 'item_alpha' or 'item_beta', got %q", content) + } + if currentPhase != "analyzing" { + t.Errorf("Expected currentPhase to be 'analyzing', got %q", currentPhase) + } + }) + t.Run("handles tool calling errors", func(t *testing.T) { ctx.ConfigureForTest(t) diff --git a/nodejs/test/e2e/tools.e2e.test.ts b/nodejs/test/e2e/tools.e2e.test.ts index 29e2ccf21..49bfaeb82 100644 --- a/nodejs/test/e2e/tools.e2e.test.ts +++ b/nodejs/test/e2e/tools.e2e.test.ts @@ -6,7 +6,7 @@ import { writeFile } from "fs/promises"; import { join } from "path"; import { assert, describe, expect, it } from "vitest"; import { z } from "zod"; -import { defineTool, approveAll } from "../../src/index.js"; +import { defineTool, approveAll, ToolSet } from "../../src/index.js"; import type { PermissionRequest } from "../../src/index.js"; import { createSdkTestContext } from "./harness/sdkTestContext"; @@ -45,6 +45,61 @@ describe("Custom tools", async () => { expect(assistantMessage?.data.content).toContain("HELLO"); }); + it("low_level_tool_definition", async () => { + let currentPhase = ""; + const session = await client.createSession({ + onPermissionRequest: approveAll, + availableTools: new ToolSet().addCustom("*").addBuiltIn("web_fetch"), + tools: [ + defineTool("set_current_phase", { + description: "Sets the current phase of the agent", + parameters: z.object({ + phase: z.enum(["searching", "analyzing", "done"]), + }), + handler: ({ phase }) => { + currentPhase = phase; + return `Phase set to ${phase}`; + }, + }), + defineTool("search_items", { + description: "Search for items by keyword", + parameters: z.object({ + keyword: z.string(), + }), + handler: (_args, invocation) => { + const args = invocation.arguments as Record; + void args.keyword; + return "Found: item_alpha, item_beta"; + }, + }), + defineTool("grep", { + description: "Custom grep override", + parameters: z.object({ + query: z.string(), + }), + overridesBuiltInTool: true, + handler: (_args, invocation) => { + const args = invocation.arguments as { query?: string }; + return `CUSTOM_GREP: ${args.query ?? ""}`; + }, + }), + ], + }); + + const assistantMessage = await session.sendAndWait({ + prompt: + "First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results.", + }); + + const content = assistantMessage?.data.content ?? ""; + expect(content.length).toBeGreaterThan(0); + expect(content.toLowerCase()).toContain("analyzing"); + expect( + content.toLowerCase().includes("item_alpha") || content.toLowerCase().includes("item_beta") + ).toBe(true); + expect(currentPhase).toBe("analyzing"); + }); + it("handles tool calling errors", async () => { const session = await client.createSession({ onPermissionRequest: approveAll, diff --git a/python/e2e/test_tools_e2e.py b/python/e2e/test_tools_e2e.py index 55cbc2e84..92f183c9c 100644 --- a/python/e2e/test_tools_e2e.py +++ b/python/e2e/test_tools_e2e.py @@ -5,7 +5,7 @@ import pytest from pydantic import BaseModel, Field -from copilot import define_tool +from copilot import ToolSet, define_tool from copilot.rpc import ( PermissionDecisionApproveOnce, PermissionDecisionReject, @@ -48,6 +48,59 @@ def encrypt_string(params: EncryptParams, invocation: ToolInvocation) -> str: assistant_message = await get_final_assistant_message(session) assert "HELLO" in assistant_message.data.content + async def test_low_level_tool_definition(self, ctx: E2ETestContext): + class PhaseArgs(BaseModel): + phase: str = Field( + description="Current phase", + pattern="^(searching|analyzing|done)$", + ) + + class SearchArgs(BaseModel): + keyword: str + + class GrepArgs(BaseModel): + query: str + + current_phase = "" + + @define_tool("set_current_phase", description="Sets the current phase of the agent") + def set_current_phase(params: PhaseArgs, invocation: ToolInvocation) -> str: + nonlocal current_phase + current_phase = params.phase + return f"Phase set to {params.phase}" + + @define_tool("search_items", description="Search for items by keyword") + def search_items(params: SearchArgs, invocation: ToolInvocation) -> str: + args = invocation.arguments or {} + _ = args.get("keyword") + return "Found: item_alpha, item_beta" + + @define_tool( + "grep", + description="Custom grep override", + overrides_built_in_tool=True, + ) + def grep(params: GrepArgs, invocation: ToolInvocation) -> str: + args = invocation.arguments or {} + query = str(args.get("query", "")) + return f"CUSTOM_GREP: {query}" + + session = await ctx.client.create_session( + on_permission_request=PermissionHandler.approve_all, + available_tools=ToolSet().add_custom("*").add_builtin("web_fetch"), + tools=[set_current_phase, search_items, grep], + ) + + await session.send( + "First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results." + ) + assistant_message = await get_final_assistant_message(session) + content = assistant_message.data.content or "" + assert content != "" + assert "analyzing" in content.lower() + assert "item_alpha" in content.lower() or "item_beta" in content.lower() + assert current_phase == "analyzing" + async def test_handles_tool_calling_errors(self, ctx: E2ETestContext): @define_tool("get_user_location", description="Gets the user's location") def get_user_location() -> str: diff --git a/rust/tests/e2e/tools.rs b/rust/tests/e2e/tools.rs index 85d15b571..47141f238 100644 --- a/rust/tests/e2e/tools.rs +++ b/rust/tests/e2e/tools.rs @@ -4,10 +4,10 @@ use github_copilot_sdk::handler::{ApproveAllHandler, PermissionHandler, Permissi use github_copilot_sdk::tool::ToolHandler; use github_copilot_sdk::{ Error, PermissionRequestData, RequestId, SessionConfig, SessionId, Tool, ToolInvocation, - ToolResult, + ToolResult, ToolSet, }; use serde_json::json; -use tokio::sync::mpsc; +use tokio::sync::{Mutex, mpsc}; use super::support::{assistant_message_content, recv_with_timeout, with_e2e_context}; @@ -73,6 +73,56 @@ async fn invokes_custom_tool() { .await; } +#[tokio::test] +async fn low_level_tool_definition() { + with_e2e_context("tools", "low_level_tool_definition", |ctx| { + Box::pin(async move { + ctx.set_default_copilot_user(); + let client = ctx.start_client().await; + let __perm = Arc::new(ApproveAllHandler); + let current_phase = Arc::new(Mutex::new(String::new())); + let tools = vec![ + set_current_phase_tool(current_phase.clone()), + search_items_tool(), + custom_grep_tool(), + ]; + let available_tools = ToolSet::new() + .add_custom("*") + .expect("add custom wildcard") + .add_builtin("web_fetch") + .expect("add web_fetch") + .into_vec(); + let session = client + .create_session( + SessionConfig::default() + .with_github_token(super::support::DEFAULT_TEST_TOKEN) + .with_permission_handler(__perm) + .with_tools(tools) + .with_available_tools(available_tools), + ) + .await + .expect("create session"); + + let answer = session + .send_and_wait( + "First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results.", + ) + .await + .expect("send") + .expect("assistant message"); + let content = assistant_message_content(&answer); + assert!(!content.is_empty()); + assert!(content.to_lowercase().contains("analyzing")); + assert!(content.contains("item_alpha") || content.contains("item_beta")); + assert_eq!(current_phase.lock().await.clone(), "analyzing"); + + session.disconnect().await.expect("disconnect session"); + client.stop().await.expect("stop client"); + }) + }) + .await; +} + #[tokio::test] async fn handles_tool_calling_errors() { with_e2e_context("tools", "handles_tool_calling_errors", |ctx| { @@ -502,6 +552,68 @@ impl ToolHandler for ErrorTool { struct CustomGrepTool; +struct SetCurrentPhaseTool { + current_phase: Arc>, +} + +fn set_current_phase_tool(current_phase: Arc>) -> Tool { + Tool::new("set_current_phase") + .with_description("Sets the current phase of the agent") + .with_parameters(json!({ + "type": "object", + "properties": { + "phase": { + "type": "string", + "description": "Current phase", + "pattern": "^(searching|analyzing|done)$" + } + }, + "required": ["phase"] + })) + .with_handler(Arc::new(SetCurrentPhaseTool { current_phase })) +} + +#[async_trait::async_trait] +impl ToolHandler for SetCurrentPhaseTool { + async fn call(&self, invocation: ToolInvocation) -> Result { + let phase = invocation + .arguments + .get("phase") + .and_then(serde_json::Value::as_str) + .unwrap_or_default() + .to_string(); + *self.current_phase.lock().await = phase.clone(); + Ok(ToolResult::Text(format!("Phase set to {phase}"))) + } +} + +struct SearchItemsTool; + +fn search_items_tool() -> Tool { + Tool::new("search_items") + .with_description("Search for items by keyword") + .with_parameters(json!({ + "type": "object", + "properties": { + "keyword": { "type": "string" } + }, + "required": ["keyword"] + })) + .with_handler(Arc::new(SearchItemsTool)) +} + +#[async_trait::async_trait] +impl ToolHandler for SearchItemsTool { + async fn call(&self, invocation: ToolInvocation) -> Result { + let _keyword = invocation + .arguments + .get("keyword") + .and_then(serde_json::Value::as_str) + .unwrap_or_default(); + Ok(ToolResult::Text("Found: item_alpha, item_beta".to_string())) + } +} + fn custom_grep_tool() -> Tool { Tool::new("grep") .with_description("A custom grep implementation that overrides the built-in") diff --git a/test/snapshots/tools/low_level_tool_definition.yaml b/test/snapshots/tools/low_level_tool_definition.yaml index d8fcb9022..03cb0748a 100644 --- a/test/snapshots/tools/low_level_tool_definition.yaml +++ b/test/snapshots/tools/low_level_tool_definition.yaml @@ -5,25 +5,8 @@ conversations: - role: system content: ${system} - role: user - content: First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results. - - role: assistant - content: I'll set the phase and run the search now. - tool_calls: - - id: toolcall_0 - type: function - function: - name: set_current_phase - arguments: '{"phase":"analyzing"}' - - id: toolcall_1 - type: function - function: - name: search_items - arguments: '{"keyword":"copilot"}' - - messages: - - role: system - content: ${system} - - role: user - content: First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results. + content: First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and + search results. - role: assistant content: I'll set the phase and run the search now. tool_calls: From 4719550abe53158e31124dbfc3adf7c5f2489a02 Mon Sep 17 00:00:00 2001 From: Ed Burns Date: Wed, 17 Jun 2026 13:18:46 -0400 Subject: [PATCH 09/12] Fix python ruff lint in low-level tool e2e test Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- python/e2e/test_tools_e2e.py | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/python/e2e/test_tools_e2e.py b/python/e2e/test_tools_e2e.py index 92f183c9c..e1e719b3d 100644 --- a/python/e2e/test_tools_e2e.py +++ b/python/e2e/test_tools_e2e.py @@ -91,9 +91,11 @@ def grep(params: GrepArgs, invocation: ToolInvocation) -> str: tools=[set_current_phase, search_items, grep], ) - await session.send( - "First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results." + prompt = ( + "First, set the current phase to 'analyzing'. Then search for items with " + "keyword 'copilot'. Report the phase and search results." ) + await session.send(prompt) assistant_message = await get_final_assistant_message(session) content = assistant_message.data.content or "" assert content != "" From 7138fe4ba2074786944ab72f87d238017c50753f Mon Sep 17 00:00:00 2001 From: Ed Burns Date: Wed, 17 Jun 2026 13:23:03 -0400 Subject: [PATCH 10/12] Fix Node e2e test prettier formatting Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- nodejs/test/e2e/tools.e2e.test.ts | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/nodejs/test/e2e/tools.e2e.test.ts b/nodejs/test/e2e/tools.e2e.test.ts index 49bfaeb82..41335aafa 100644 --- a/nodejs/test/e2e/tools.e2e.test.ts +++ b/nodejs/test/e2e/tools.e2e.test.ts @@ -87,15 +87,15 @@ describe("Custom tools", async () => { }); const assistantMessage = await session.sendAndWait({ - prompt: - "First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results.", + prompt: "First, set the current phase to 'analyzing'. Then search for items with keyword 'copilot'. Report the phase and search results.", }); const content = assistantMessage?.data.content ?? ""; expect(content.length).toBeGreaterThan(0); expect(content.toLowerCase()).toContain("analyzing"); expect( - content.toLowerCase().includes("item_alpha") || content.toLowerCase().includes("item_beta") + content.toLowerCase().includes("item_alpha") || + content.toLowerCase().includes("item_beta") ).toBe(true); expect(currentPhase).toBe("analyzing"); }); From f14bbb2b1c7f7e3bc8cf0818725b30176bef418e Mon Sep 17 00:00:00 2001 From: Ed Burns Date: Wed, 17 Jun 2026 13:57:36 -0400 Subject: [PATCH 11/12] Fix abort E2E snapshot to handle cleared-history recovery request After an abort during active streaming the CLI sometimes resets the session history. Add a second conversation entry covering the 2-message recovery case [system, user] so the replay proxy can match regardless of which code path runs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../abort/should_abort_during_active_streaming.yaml | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/test/snapshots/abort/should_abort_during_active_streaming.yaml b/test/snapshots/abort/should_abort_during_active_streaming.yaml index bd18eab2f..ea70c0d53 100644 --- a/test/snapshots/abort/should_abort_during_active_streaming.yaml +++ b/test/snapshots/abort/should_abort_during_active_streaming.yaml @@ -28,3 +28,10 @@ conversations: content: Say 'abort_recovery_ok'. - role: assistant content: abort_recovery_ok + - messages: + - role: system + content: ${system} + - role: user + content: Say 'abort_recovery_ok'. + - role: assistant + content: abort_recovery_ok From 150b8417778049ebc3bf81a79963678f3a936b71 Mon Sep 17 00:00:00 2001 From: Ed Burns Date: Wed, 17 Jun 2026 19:22:38 -0400 Subject: [PATCH 12/12] Fix Windows session lifecycle replay mismatch Use the same second prompt (Say hi) as the lifecycle snapshot to avoid replay cache misses in the list-sessions test path on Windows CI. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- nodejs/test/e2e/session_lifecycle.e2e.test.ts | 2 +- .../should_list_created_sessions_after_sending_a_message.yaml | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/nodejs/test/e2e/session_lifecycle.e2e.test.ts b/nodejs/test/e2e/session_lifecycle.e2e.test.ts index fae878273..0cb570ceb 100644 --- a/nodejs/test/e2e/session_lifecycle.e2e.test.ts +++ b/nodejs/test/e2e/session_lifecycle.e2e.test.ts @@ -32,7 +32,7 @@ describe("Session Lifecycle", async () => { // Sessions must have activity to be persisted to disk await session1.sendAndWait({ prompt: "Say hello" }); - await session2.sendAndWait({ prompt: "Say world" }); + await session2.sendAndWait({ prompt: "Say hi" }); // Poll until both sessions are visible on disk instead of a hard 500ms wait. await waitFor(async () => { diff --git a/test/snapshots/session_lifecycle/should_list_created_sessions_after_sending_a_message.yaml b/test/snapshots/session_lifecycle/should_list_created_sessions_after_sending_a_message.yaml index 274ab7d2f..779cffb03 100644 --- a/test/snapshots/session_lifecycle/should_list_created_sessions_after_sending_a_message.yaml +++ b/test/snapshots/session_lifecycle/should_list_created_sessions_after_sending_a_message.yaml @@ -13,6 +13,6 @@ conversations: - role: system content: ${system} - role: user - content: Say world + content: Say hi - role: assistant - content: world + content: Hi! I'm GitHub Copilot CLI, ready to help with your software engineering tasks. What would you like to work on?