Start iterating on #1682 On branch edburns/1682-java-tool-ergonomics#1692
Start iterating on #1682 On branch edburns/1682-java-tool-ergonomics#1692edburns wants to merge 13 commits into
Conversation
6c3fe31 to
299c3e4
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
new file: 1682-java-tool-ergonomics-prompts-remove-before-merge/20260615-prompts.md Signed-off-by: Ed Burns <edburns@microsoft.com>
Packages the knowledge of creating net-new Java E2E integration tests with handcrafted replay proxy YAML snapshots into a reusable Copilot skill. New files: - .github/skills/new-java-e2e-test-yaml-and-test/SKILL.md Main skill instructions covering: YAML snapshot format, proxy matching logic, Java IT test template, verification commands, key classes/files reference table, and common pitfalls. Includes the critical constraint that Java's CapiProxy always sets GITHUB_ACTIONS=true so snapshots must be handcrafted rather than recorded. - .github/skills/new-java-e2e-test-yaml-and-test/examples.md Two working examples from the codebase: a simple single-turn test (Botanica identity REPLACE) and a multi-turn test with tool calls (system message transform with view tool). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a new Java failsafe integration test and replay snapshot that exercise the current explicit tool-definition APIs before ergonomic annotations are added. - add `LowLevelToolDefinitionIT` covering `create`, `createOverride`, `getArgumentsAs(record)`, `getArguments()`, and `ToolSet` available tools - add `tools/low_level_tool_definition.yaml` with multi-tool call and final response replay conversations - assert handler-driven state mutation (`currentPhase`) and expected response content Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
721eeb4 to
ac67ecf
Compare
This comment has been minimized.
This comment has been minimized.
Port the shared `tools/low_level_tool_definition` replay scenario to the remaining SDKs so every language validates the same low-level tool-definition behavior. - dotnet: add `Low_Level_Tool_Definition` E2E in `dotnet/test/E2E/ToolsE2ETests.cs` - go: add `low_level_tool_definition` subtest in `go/internal/e2e/tools_e2e_test.go` - nodejs: add `low_level_tool_definition` test in `nodejs/test/e2e/tools.e2e.test.ts` - python: add `test_low_level_tool_definition` in `python/e2e/test_tools_e2e.py` - rust: add `low_level_tool_definition` in `rust/tests/e2e/tools.rs` - snapshot: finalize `test/snapshots/tools/low_level_tool_definition.yaml` for consistent replay across languages Each test covers explicit tool registration, built-in tool override behavior, raw-argument access, available-tools filtering (`custom:*` + `builtin:web_fetch`), and mutable phase state (`analyzing`) asserted after tool execution. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Generated by SDK Consistency Review Agent for issue #1692 · sonnet46 1.6M
| Description = "Custom grep override", | ||
| AdditionalProperties = new Dictionary<string, object?> | ||
| { | ||
| ["is_override"] = true, |
There was a problem hiding this comment.
Cross-SDK consistency: hardcoded internal key "is_override"
This line uses the string "is_override" directly, but that key is declared as internal const string OverridesBuiltInToolKey = "is_override" in CopilotTool.cs — it isn't part of the public API surface.
Every other SDK exposes this intent through a typed, discoverable API:
- Node.js:
overridesBuiltInTool: true(option indefineTool) - Python:
overrides_built_in_tool=True(kwarg in@define_tool) - Go:
grepTool.OverridesBuiltInTool = true(struct field) - Java:
ToolDefinition.createOverride(...)(factory method) - Rust:
.with_overrides_built_in_tool(true)(builder method)
Two options to bring .NET into alignment:
Option A — Use CopilotTool.DefineTool (the idiomatic .NET helper):
CopilotTool.DefineTool(
CustomGrep,
new CopilotToolOptions { OverridesBuiltInTool = true },
new AIFunctionFactoryOptions { Name = "grep", Description = "Custom grep override" }),This keeps the AIFunctionArguments-based parameter access (the "low-level" part of the test) while using the typed override flag.
Option B — Make OverridesBuiltInToolKey public so callers who intentionally use raw AIFunctionFactory.Create can reference a typed constant instead of a magic string:
["is_override"] = true, // ← becomes: [CopilotTool.OverridesBuiltInToolKey] = trueOption A is preferred for consistency, since it still demonstrates the low-level AIFunctionArguments pattern without leaking an internal key.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Generated by SDK Consistency Review Agent for issue #1692 · sonnet46 2.2M
| Description = "Custom grep override", | ||
| AdditionalProperties = new Dictionary<string, object?> | ||
| { | ||
| ["is_override"] = true, |
There was a problem hiding this comment.
Cross-SDK consistency note: This hardcodes the raw "is_override" string, which is the internal value of CopilotTool.OverridesBuiltInToolKey (currently internal const). In all other SDKs the low-level override path uses a named, public API element:
| SDK | Approach |
|---|---|
| Go | grepTool.OverridesBuiltInTool = true (exported struct field) |
| Java | ToolDefinition.createOverride(...) (dedicated factory) |
| Node.js | overridesBuiltInTool: true (named option) |
| Python | overrides_built_in_tool=True (named parameter) |
| Rust | .with_overrides_built_in_tool(true) (builder method) |
For .NET, callers doing the raw approach must know the magic string "is_override". Consider making CopilotTool.OverridesBuiltInToolKey public so this test (and any other low-level callers) can write:
[CopilotTool.OverridesBuiltInToolKey] = true,instead of relying on the private string value. Alternatively, the test could use CopilotTool.DefineTool() with CopilotToolOptions { OverridesBuiltInTool = true } for the override tool — even in a "low-level" test, the other two tools still demonstrate the raw AIFunctionArguments path.
| 60_000).get(90, TimeUnit.SECONDS); | ||
|
|
||
| assertNotNull(response, "Expected a response from the assistant"); | ||
| String content = response.getData().content().toLowerCase(); |
There was a problem hiding this comment.
Minor null-safety concern: content() returns String and can be null if the field is absent in the JSON response. Calling .toLowerCase() directly on it would throw a NullPointerException.
All other SDK tests guard against this:
- Node.js:
const content = assistantMessage?.data.content ?? ""; - Python:
content = assistant_message.data.content or "" - Go: explicitly checks
if content == "" { t.Fatalf(...) } - Rust: uses a helper
assistant_message_content()that handles null
Suggest:
String content = response.getData().content();
assertNotNull(content, "Expected non-null content in assistant response");
assertFalse(content.isEmpty(), "Expected non-empty content");
assertTrue(content.toLowerCase().contains("analyzing"), ...);After an abort during active streaming the CLI sometimes resets the session history. Add a second conversation entry covering the 2-message recovery case [system, user] so the replay proxy can match regardless of which code path runs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Generated by SDK Consistency Review Agent for issue #1692 · sonnet46 2.6M
| Description = "Custom grep override", | ||
| AdditionalProperties = new Dictionary<string, object?> | ||
| { | ||
| ["is_override"] = true, |
There was a problem hiding this comment.
Cross-SDK consistency note — .NET "low-level" override key discoverability
The magic string "is_override" here is intentionally testing the low-level AIFunctionFactory.Create path, and that's correct. However, CopilotTool.OverridesBuiltInToolKey (which equals "is_override") is declared internal, so any .NET consumer who wants to use AIFunctionFactory.Create directly (instead of CopilotTool.DefineTool) has no discoverable public constant to reference.
The equivalent APIs in every other SDK are public and discoverable:
- Node.js:
overridesBuiltInTool: trueproperty ondefineTool - Python:
overrides_built_in_tool=Trueparameter on@define_tool - Go:
grepTool.OverridesBuiltInTool = truepublic struct field - Java:
ToolDefinition.createOverride()public factory method - Rust:
.with_overrides_built_in_tool(true)public builder method
Suggestion: consider either (a) making OverridesBuiltInToolKey a public const to allow consumers to reference it without hardcoding the string, or (b) adding a comment here pointing to CopilotTool.DefineTool with new CopilotToolOptions { OverridesBuiltInTool = true } as the idiomatic alternative when AIFunctionFactory.Create isn't required.
Use the same second prompt (Say hi) as the lifecycle snapshot to avoid replay cache misses in the list-sessions test path on Windows CI. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cross-SDK Consistency Review✅ Consistent:
|
| SDK | Public override API |
|---|---|
| Java | ToolDefinition.createOverride(...) |
| Go | grepTool.OverridesBuiltInTool = true |
| Node.js | overridesBuiltInTool: true in defineTool options |
| Python | overrides_built_in_tool=True in @define_tool |
| Rust | .with_overrides_built_in_tool(true) |
| .NET | AdditionalProperties = new Dictionary<string, object?> { ["is_override"] = true } ← uses internal key string |
The "is_override" key is declared internal const string OverridesBuiltInToolKey in CopilotTool.cs. The .NET SDK provides CopilotTool.DefineTool() with CopilotToolOptions { OverridesBuiltInTool = true } as the public equivalent. Using it for the override tool in the low-level test would align the .NET SDK's documented approach with the other SDKs. (See inline comment on dotnet/test/E2E/ToolsE2ETests.cs for a suggested alternative.)
🗑️ Cleanup needed before merge
The 1682-java-tool-ergonomics-prompts-remove-before-merge/ directory (3 planning/prompt markdown files) should be removed before merging — the folder name itself indicates this. These files appear to be working notes from the development process.
Generated by SDK Consistency Review Agent for issue #1692 · sonnet46 2.2M · ◷
There was a problem hiding this comment.
Generated by SDK Consistency Review Agent for issue #1692 · sonnet46 2.2M
| Description = "Custom grep override", | ||
| AdditionalProperties = new Dictionary<string, object?> | ||
| { | ||
| ["is_override"] = true, |
There was a problem hiding this comment.
Cross-SDK consistency note: All other SDKs expose overridesBuiltInTool as a typed, public API in their low-level tool definition (e.g., Java's ToolDefinition.createOverride(), Go's grepTool.OverridesBuiltInTool = true, Node.js's overridesBuiltInTool: true, Python's overrides_built_in_tool=True, Rust's .with_overrides_built_in_tool(true)). Here the .NET test sets "is_override" via a raw AdditionalProperties dictionary key, which is an internal SDK constant (CopilotTool.OverridesBuiltInToolKey).
Using CopilotTool.DefineTool() with CopilotToolOptions { OverridesBuiltInTool = true } would be the equivalent public SDK API and would align better with how other languages expose this feature:
CopilotTool.DefineTool(CustomGrep, new CopilotToolOptions
{
OverridesBuiltInTool = true,
}, new AIFunctionFactoryOptions
{
Name = "grep",
Description = "Custom grep override",
}),This is especially relevant since "is_override" is internal const — SDK users reading this test as documentation would copy a magic string that's not part of the documented public surface.
new file: 1682-java-tool-ergonomics-prompts-remove-before-merge/20260615-prompts.md