feat(llm): add GitHub Copilot as an LLM provider via copilot-sdk/go#12872

Open
raykao wants to merge 27 commits into dagger:main from
raykao:raykao/github-copilot-llm-provider

raykao commented Mar 28, 2026

Summary

Adds GitHub Copilot as a first-class LLM provider for Dagger, using the official github.com/github/copilot-sdk/go SDK.

Closes discussion: #12689

Motivation

GitHub Copilot subscribers already have access to several hosted models (GPT-4o, Claude Sonnet, etc.) through their existing Copilot license. This PR adds Copilot as an LLM option with no extra cost for Dagger users who already have Copilot access; the only credential required is a GitHub PAT, with no separate API key.

How it works

The SDK requires the Copilot CLI binary to be running as a local server. Rather than requiring users to install it, Dagger builds a minimal on-demand sidecar service:

  1. Fetches the @github/copilot-{platform} npm tarball via dag.HTTP (content-addressed, cached)
  2. Extracts the binary into a debian:bookworm-slim container
  3. SDK connects to the sidecar via CLIUrl TCP option - no Node.js, no image to pre-publish
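
The first step above can be sketched as follows. This is a minimal illustration, not the code from core/llm_github_copilot.go: it assumes the tarball is fetched from the public npm registry using the registry's usual `<scope>/<name>/-/<name>-<version>.tgz` layout, and the `linux-x64` platform suffix and the `tarballURL` helper name are hypothetical.

```go
package main

import "fmt"

// tarballURL builds the npm registry tarball URL for a platform-specific
// Copilot CLI package. The platform suffix (for example "linux-x64") and the
// use of registry.npmjs.org are assumptions made for this sketch.
func tarballURL(platform, version string) string {
	name := "copilot-" + platform
	return fmt.Sprintf("https://registry.npmjs.org/@github/%s/-/%s-%s.tgz", name, name, version)
}

func main() {
	// 1.0.10 is the default CLI version named in this PR description.
	fmt.Println(tarballURL("linux-x64", "1.0.10"))
}
```

Because the URL embeds an exact version, the fetched content stays stable for a given pin, which is what makes caching the download worthwhile.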

Configuration

export GITHUB_TOKEN=<fine-grained PAT with copilot: read permission>

dagger shell
> llm | with-prompt "hello" | last-reply

Optional env vars:

  • GITHUB_COPILOT_CLI_VERSION - pin the npm package version (default: 1.0.10)
  • GITHUB_COPILOT_CLI_URL - override tarball URL for airgapped/internal use
  • GITHUB_COPILOT_PROVIDER_URL - BYOK: route to Azure OpenAI, Anthropic, or OpenAI endpoint
  • GITHUB_COPILOT_LEGACY=true - fall back to the original node:24+npm container approach
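
As a rough sketch of how these variables might be resolved, the snippet below reads them through an injectable lookup function and applies the documented default version. The `copilotConfig` struct and `loadConfig` function are hypothetical names for illustration; the PR itself resolves these values in LLMRouter.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// defaultCLIVersion is the default stated in the PR description.
const defaultCLIVersion = "1.0.10"

// copilotConfig is a hypothetical container for the documented env vars.
type copilotConfig struct {
	Token       string
	CLIVersion  string
	CLIURL      string
	ProviderURL string
	Legacy      bool
}

// loadConfig resolves the configuration from a getenv-style lookup so it can
// be exercised without touching the real process environment.
func loadConfig(getenv func(string) string) (copilotConfig, error) {
	cfg := copilotConfig{
		Token:       getenv("GITHUB_TOKEN"),
		CLIVersion:  getenv("GITHUB_COPILOT_CLI_VERSION"),
		CLIURL:      getenv("GITHUB_COPILOT_CLI_URL"),
		ProviderURL: getenv("GITHUB_COPILOT_PROVIDER_URL"),
		Legacy:      getenv("GITHUB_COPILOT_LEGACY") == "true",
	}
	if cfg.Token == "" {
		return cfg, errors.New("GITHUB_TOKEN is required")
	}
	if cfg.CLIVersion == "" {
		cfg.CLIVersion = defaultCLIVersion
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("CLI version:", cfg.CLIVersion, "legacy:", cfg.Legacy)
}
```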

What's included

  • core/llm_github_copilot.go - full provider implementation
  • core/llm_github_copilot.md - user-facing docs
  • core/llm.go - GitHub routing wired into LLMRouter (GITHUB_TOKEN, GITHUB_MODEL, GITHUB_COPILOT_CLI_VERSION)
  • core/llm_test.go - Copilot config vars added to TestLlmConfig

Features

  • Multi-turn session history (server-side, cursor-tracked)
  • Streaming (AssistantMessageDelta events written to telemetry span stdio)
  • Tool calling (Dagger tools registered via SDK Tool.Handler)
  • IsRetryable with HTTP status + Copilot-specific error classification
  • BYOK provider config (Azure, Anthropic, OpenAI endpoint inference from URL)
  • Sidecar crash recovery with ping + reconnect
  • Legacy fallback (GITHUB_COPILOT_LEGACY=true)

Limitations / known issues

  • Requires a fine-grained PAT (copilot: read). OAuth and stored keyring credentials are not supported - this avoids the ToS-questionable approaches discussed in the linked discussion.
  • SDK is Technical Preview (github.com/github/copilot-sdk/go v0.2.0) - API may have breaking changes. A legacy fallback is included as an escape hatch.
  • No integration golden file yet - requires a live Copilot credential to record.

Testing

Unit tests pass (TestLlmConfig now covers Copilot config vars).

Manual validation: tested end-to-end via dagger call engine-dev playground terminal with GITHUB_TOKEN set, running llm | with-prompt "hello" | last-reply in the nested dagger shell.

Checklist

  • Changie release note added (.changes/unreleased/Added-20260328-174753.yaml)
  • dagger generate - run, no generated files changed (only .go files modified)
  • DCO sign-off - all 23 commits have Signed-off-by: Ray Kao <ray@peopleandcode.com>
  • dagger checks *:lint - will run in CI; local environment has a broken Go toolchain

Relates to: #12689

raykao and others added 23 commits March 28, 2026 13:50
Signed-off-by: Ray Kao <ray@peopleandcode.com>
…age parsing

Signed-off-by: Ray Kao <ray@peopleandcode.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
…ization

Signed-off-by: Ray Kao <ray@peopleandcode.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
…ay to preserve state.

Signed-off-by: Ray Kao <ray@peopleandcode.com>
since session history is not readable by the ghcp CLI, even with the native session.jsonl file; each ghcp CLI call appears to run without memory

Signed-off-by: Ray Kao <ray@peopleandcode.com>
…f session in SendQuery

Signed-off-by: Ray Kao <ray@peopleandcode.com>
Covers: usage, config, how it works (CLI-container model), known
limitations (no tools, no multi-turn, no streaming, no retry), and
the roadmap to migrate to direct API calls once available.

Key callout: today's 'GitHub Copilot SDKs' wrap the CLI, not the
API — a true direct-API Go client does not yet exist.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
…ecar

Replace CLI-in-container approach with github.com/github/copilot-sdk/go SDK.

Architecture: Dagger builds a minimal sidecar service on-demand from the
Copilot CLI npm tarball (dag.HTTP → debian:bookworm-slim + binary). SDK
connects via CLIUrl TCP option — no Node.js, no pre-published image.

Phase 1 delivers:
- SDK client connected via CLIUrl to Dagger sidecar service
- SendQuery using CreateSession + SendAndWait
- Token usage from SDK event data (replaces fragile stderr regex)
- IsRetryable with transient error detection
- GITHUB_CLI_VERSION env var for version pinning (default: 1.0.10)
- GITHUB_COPILOT_CLI_URL env var for custom/internal tarball source

Phase 2 (multi-turn), Phase 3 (tool calling), Phase 4 (streaming) follow.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
Track sent message cursor (sentMsgCount) to send only new messages each
call. The Copilot CLI maintains conversation history server-side within
the session, so we send only the delta since the last SendQuery.

Adds session expiry detection and reconnect: if session is gone, clear
state and retry once with a fresh session.

System prompt from history is passed to SessionConfig.SystemMessage.Content
on session creation (SDK SystemMessageConfig.Content field confirmed).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
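
The cursor tracking described in the commit above can be sketched like this. Only the `sentMsgCount` field name comes from the commit message; the `session` type, its methods, and the use of plain strings for messages are simplifications assumed for illustration.

```go
package main

import "fmt"

// session is a hypothetical stand-in for the provider's per-session state.
type session struct {
	sentMsgCount int // number of history entries already sent to the server
}

// pending returns the messages that have not been sent yet. If the history is
// shorter than the cursor (for example after a reset), everything is resent.
func (s *session) pending(history []string) []string {
	if s.sentMsgCount > len(history) {
		s.sentMsgCount = 0
	}
	return history[s.sentMsgCount:]
}

// markSent advances the cursor after a successful send.
func (s *session) markSent(total int) {
	s.sentMsgCount = total
}

func main() {
	s := &session{}
	history := []string{"system prompt", "hello"}
	fmt.Println(s.pending(history)) // both messages on the first call
	s.markSent(len(history))

	history = append(history, "follow-up")
	fmt.Println(s.pending(history)) // only the new message
}
```

Advancing the cursor only after a successful send means a failed call retries the same delta rather than silently dropping it.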
Wire LLMToolFunc into SDK Tool.Handler so tool calls execute via
Dagger's MCP execution path. The SDK auto-executes tools internally
during SendAndWait; tool call metadata is captured via session.On
ExternalToolRequested events for Dagger history/display.

Tool set fingerprinting (hash of sorted tool names) triggers session
recreation when tools change between SendQuery calls.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
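
A minimal sketch of the tool set fingerprint described above, assuming SHA-256 over the sorted names joined by a NUL separator. The commit only says "hash of sorted tool names", so the hash function, the separator, and the `toolsFingerprint` name are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// toolsFingerprint returns a stable digest for a set of tool names. Sorting a
// copy makes the result independent of registration order and leaves the
// caller's slice untouched.
func toolsFingerprint(names []string) string {
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	sum := sha256.Sum256([]byte(strings.Join(sorted, "\x00")))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := toolsFingerprint([]string{"container", "directory"})
	b := toolsFingerprint([]string{"directory", "container"})
	c := toolsFingerprint([]string{"container"})
	fmt.Println("same set, different order:", a == b)
	fmt.Println("different set:", a == c)
}
```

Comparing the stored fingerprint with the current one before each query gives a cheap signal for when the session has to be recreated.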
Replace SendAndWait with session.Send + session.On event loop.
AssistantMessageDelta tokens are written directly to telemetry
span stdio as they arrive, matching the streaming behaviour of
the Anthropic and OpenAI providers.

Full content is accumulated concurrently for LLMResponse.Content.
SessionIdle signals completion; SessionError and ctx.Done provide
cancellation paths. 120s timeout applied when ctx has no deadline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
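
The event loop described above might look roughly like the following. The `event` type and its string kinds are hypothetical stand-ins for the SDK's AssistantMessageDelta, SessionIdle, and SessionError events, and a buffered channel replaces the real session.On subscription; only the 120s default timeout is taken from the commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"os"
	"strings"
	"time"
)

// event is a hypothetical stand-in for the SDK's session events.
type event struct {
	kind string // "delta", "idle", or "error"
	text string
}

// withDefaultTimeout adds a timeout only when ctx has no deadline of its own.
func withDefaultTimeout(ctx context.Context, d time.Duration) (context.Context, context.CancelFunc) {
	if _, ok := ctx.Deadline(); ok {
		return ctx, func() {}
	}
	return context.WithTimeout(ctx, d)
}

// collect streams deltas to out as they arrive and accumulates the full reply.
func collect(ctx context.Context, events <-chan event, out io.Writer) (string, error) {
	var full strings.Builder
	for {
		select {
		case <-ctx.Done():
			return full.String(), ctx.Err()
		case ev, ok := <-events:
			if !ok {
				return full.String(), errors.New("event stream closed before idle")
			}
			switch ev.kind {
			case "delta":
				fmt.Fprint(out, ev.text)
				full.WriteString(ev.text)
			case "idle":
				return full.String(), nil
			case "error":
				return full.String(), errors.New(ev.text)
			}
		}
	}
}

// replay feeds canned deltas followed by an idle event through collect.
func replay(deltas ...string) (string, error) {
	ctx, cancel := withDefaultTimeout(context.Background(), 120*time.Second)
	defer cancel()
	events := make(chan event, len(deltas)+1)
	for _, d := range deltas {
		events <- event{kind: "delta", text: d}
	}
	events <- event{kind: "idle"}
	return collect(ctx, events, os.Stdout)
}

func main() {
	full, err := replay("Hel", "lo")
	fmt.Printf("\nfull=%q err=%v\n", full, err)
}
```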
Replace placeholder with full error classification: HTTP status codes
(429/500/503/504), Copilot-specific messages (rate limit, overloaded,
capacity), and transient network errors. Context cancellation and
deadline exceeded are explicitly non-retryable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
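
The classification order described above can be sketched as follows. The `statusError` type is a hypothetical carrier for an HTTP status code, and the substring matching is an assumption about how the Copilot-specific messages are detected; the status codes, message keywords, and the non-retryable context errors come from the commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"strings"
)

// statusError is a hypothetical error type that carries an HTTP status code.
type statusError struct {
	Code int
	Msg  string
}

func (e *statusError) Error() string { return fmt.Sprintf("http %d: %s", e.Code, e.Msg) }

// isRetryable reports whether err looks transient. Context errors are checked
// first because context.DeadlineExceeded would otherwise match as a timeout.
func isRetryable(err error) bool {
	if err == nil {
		return false
	}
	if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
		return false
	}
	var se *statusError
	if errors.As(err, &se) {
		switch se.Code {
		case 429, 500, 503, 504:
			return true
		}
	}
	msg := strings.ToLower(err.Error())
	for _, needle := range []string{"rate limit", "overloaded", "capacity"} {
		if strings.Contains(msg, needle) {
			return true
		}
	}
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

func main() {
	fmt.Println(isRetryable(&statusError{Code: 429, Msg: "too many requests"}))
	fmt.Println(isRetryable(&statusError{Code: 404, Msg: "not found"}))
	fmt.Println(isRetryable(context.Canceled))
}
```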
Wire LLMEndpoint.BaseURL and GITHUB_COPILOT_PROVIDER_URL env var into
SessionConfig.Provider. Infers provider type from URL pattern (azure,
anthropic, openai). Passes model to SessionConfig.Model when set.

Nil ProviderConfig = default GitHub Copilot backend (no change for
standard users).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
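
A sketch of URL-based provider inference under assumed rules: hosts containing "azure" map to azure, hosts containing "anthropic" map to anthropic, and anything else is treated as an OpenAI-compatible endpoint. The exact patterns in the PR are not shown in this conversation, and `inferProviderType` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// inferProviderType guesses the provider from the endpoint host. An empty
// result means "no override", which corresponds to the default Copilot backend.
func inferProviderType(baseURL string) string {
	if baseURL == "" {
		return ""
	}
	host := strings.ToLower(baseURL)
	if u, err := url.Parse(baseURL); err == nil && u.Hostname() != "" {
		host = strings.ToLower(u.Hostname())
	}
	switch {
	// Azure is checked first because Azure OpenAI hosts also contain "openai".
	case strings.Contains(host, "azure"):
		return "azure"
	case strings.Contains(host, "anthropic"):
		return "anthropic"
	default:
		return "openai"
	}
}

func main() {
	for _, u := range []string{
		"https://example.openai.azure.com/",
		"https://api.anthropic.com",
		"https://api.openai.com/v1",
	} {
		fmt.Println(u, "->", inferProviderType(u))
	}
}
```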
Extract connect() from ensureConnected() and add reconnect() for
full teardown + re-establishment. ensureConnected now pings with a
5s timeout on each call; a ping failure triggers reconnect, clearing
session state and restarting the sidecar service + SDK client.

Works alongside the Phase 2 session-expiry retry: session gone →
new session; sidecar gone → full reconnect.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
Restore original CLI-in-container (node:24 + npm) approach as
GhcpLegacyClient. Set GITHUB_COPILOT_LEGACY=true to activate it,
bypassing the SDK path entirely.

newGhcpClient() checks the flag and returns the appropriate
implementation. Restores parseCopilotTokenMetadata + parseTokenValue
for legacy stderr token parsing.

Use as an escape hatch while copilot-sdk/go is in Technical Preview.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
…x default

- env var renamed GITHUB_CLI_VERSION -> GITHUB_COPILOT_CLI_VERSION to match
  the GITHUB_COPILOT_* naming pattern and avoid confusion with the gh CLI
- remove spurious 'latest' default in LLMRouter; leave empty so
  ghcpDefaultCLIVersion in the implementation takes effect
  ('latest' is not a valid npm tarball version and would 404)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
…comment

copilotSidecar was reading GITHUB_CLI_VERSION via os.Getenv internally
as well as receiving it via parameter from LLMRouter - redundant now
that the router owns all env var resolution.

Remove the internal override; the parameter path + ghcpDefaultCLIVersion
fallback is sufficient.

Update constant comment to clarify ghcpDefaultCLIVersion is the npm
package version of @github/copilot-{platform}, not an SDK version.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
Adds GITHUB_TOKEN, GITHUB_MODEL, and GITHUB_COPILOT_CLI_VERSION to the
unit config test, matching the coverage pattern for Anthropic, OpenAI,
and Gemini providers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
Three bugs found in code review:

1. toolsHash not cleared on session expiry retry
   Session expiry path cleared session+sentMsgCount but not toolsHash,
   leaving stale fingerprint state. Now mirrors reconnect() which
   correctly clears all three fields.

2. Sidecar resource leak on connect() partial failure
   If Endpoint() or sdkClient.Start() failed after the sidecar was
   already started, the running service was abandoned. Now explicitly
   stopped on both error paths.

3. capturedCalls slice aliasing on retry
   capturedCalls[:0] reused the backing array across retry iterations.
   If the SDK unsubscribe is async, an appending goroutine from the
   previous attempt could corrupt the next attempt's slice. Changed to
   nil to force a fresh allocation each attempt.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
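
The slice aliasing issue in the third fix is a general Go behaviour and can be reproduced in isolation. The helper names below are hypothetical; the point is that `s[:0]` keeps the backing array, so a later append overwrites data still visible through the old slice, while resetting to nil forces a fresh allocation.

```go
package main

import "fmt"

// resetReuse truncates the slice but keeps its backing array.
func resetReuse(s []string) []string { return s[:0] }

// resetFresh drops the backing array so the next append allocates a new one.
func resetFresh(s []string) []string { return nil }

func main() {
	// A previous attempt still holds this slice.
	previous := []string{"attempt-1 call"}
	next := append(resetReuse(previous), "attempt-2 call")
	fmt.Println(previous[0] == next[0]) // true: the old view was overwritten

	previous = []string{"attempt-1 call"}
	next = append(resetFresh(previous), "attempt-2 call")
	fmt.Println(previous[0] == next[0]) // false: the two slices are independent
}
```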
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
raykao force-pushed the raykao/github-copilot-llm-provider branch from 9290733 to 999a2b9 on March 28, 2026 17:59
raykao and others added 4 commits March 28, 2026 15:02
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
Service.Stop returns (*Service, error) not just error.
Fix both error paths in connect() to handle both return values.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
Previous fix used `_, _ =` to silence the compile error but silently
discarded the Stop() error. On cleanup paths, if Stop() itself fails
that information is lost.

Use errors.Join() to surface both the original connection error and any
error from stopping the sidecar service.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>
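
The cleanup pattern from the commit above, sketched with a hypothetical `service` type whose Stop method mirrors the `(*Service, error)` return shape mentioned earlier. `errors.Join` is part of the Go standard library since Go 1.20; the `cleanupOnFailure` helper is an illustrative name.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errConnect = errors.New("connect failed")
	errStop    = errors.New("stop failed")
)

// service is a hypothetical stand-in for the sidecar service handle.
type service struct{ stopErr error }

// Stop returns two values, like the Service.Stop described in the PR.
func (s *service) Stop() (*service, error) { return s, s.stopErr }

// cleanupOnFailure stops the sidecar after a failed connection and reports
// both the original error and any error from stopping.
func cleanupOnFailure(svc *service, connErr error) error {
	if _, stopErr := svc.Stop(); stopErr != nil {
		return errors.Join(connErr, stopErr)
	}
	return connErr
}

func main() {
	err := cleanupOnFailure(&service{stopErr: errStop}, errConnect)
	fmt.Println(errors.Is(err, errConnect), errors.Is(err, errStop))
	fmt.Println(err)
}
```

The joined error still satisfies `errors.Is` for each wrapped error, so callers that classify the connection error keep working.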
- Remove empty if block in llm.go (SA9003)
- Revert accidental telemetry.End regressions back to EndWithCause (SA1019)
- Add nolint:nilerr on tool result error paths - errors are surfaced
  through ToolResult.Error, not as Go errors (intentional pattern)
- Remove unused ctx from newGhcpLegacyClient / ghcpLegacyContainer
- Add nolint:gocyclo on SendQuery with justification - complexity is
  inherent to handling streaming, tools, retries, and legacy fallback
- Run gofmt on all modified files

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ray Kao <ray@peopleandcode.com>

raykao commented Mar 28, 2026

CI status note for reviewers: the 4 failing checks are not caused by this PR's changes.

Locally: go build ./core/... clean, golangci-lint run clean, go mod tidy produces no diff.

raykao marked this pull request as ready for review March 28, 2026 20:37
grouville self-requested a review April 3, 2026 23:29
github-actions commented

This PR is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 7 days.
