Skip to content

Propagate Copilot BYOK provider hosts into threat-detection allowlists#39653

Merged
pelikhan merged 4 commits into
mainfrom
copilot/fix-threat-detection-job-allowlist
Jun 16, 2026
Merged

Propagate Copilot BYOK provider hosts into threat-detection allowlists#39653
pelikhan merged 4 commits into
mainfrom
copilot/fix-threat-detection-job-allowlist

Conversation

Copilot AI commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Threat-detection runs inherited Copilot BYOK env, but their AWF allowlist did not include the BYOK provider host. As a result, workflows that worked in the main agent job failed in the detection job when COPILOT_PROVIDER_BASE_URL was used.

  • Allowlist propagation

    • Add a shared Copilot allowlist target resolver.
    • Keep existing Copilot API target behavior (engine.api-target, GITHUB_COPILOT_BASE_URL).
    • Add BYOK provider host support when COPILOT_PROVIDER_BASE_URL is a literal URL.
  • Secret-backed BYOK safety

    • Do not derive hosts from ${{ ... }} expressions at compile time.
    • For threat-detection BYOK runs, reuse explicit literal hosts from network.allowed so secret-backed provider URLs can still work without emitting expressions into AWF config.
  • Threat-detection behavior

    • Preserve the minimal detection network model.
    • Only carry forward concrete user-specified host entries needed for BYOK, not ecosystem expansions or the broader agent allowlist.
  • Allowlist consistency

    • Keep GH_AW_ALLOWED_DOMAINS aligned with the runtime firewall/domain computation.
  • Regression coverage

    • Add focused tests for:
      • Copilot allowlist target resolution
      • detection-step BYOK host propagation
      • compiled workflow output for secret-backed BYOK provider URLs
      • unchanged minimal detection behavior when BYOK is not configured
  • Docs

    • Clarify how Copilot BYOK provider hosts are allowed for agent and threat-detection runs, including the secret-backed network.allowed case.
engine:
  id: copilot
  env:
    COPILOT_PROVIDER_BASE_URL: ${{ secrets.PROVIDER_BASE_URL }}

network:
  allowed:
    - defaults
    - llm.corp.example.com

With this change, the detection job can reuse llm.corp.example.com in its AWF allowlist without trying to compile ${{ secrets.PROVIDER_BASE_URL }} into the generated config.

Copilot AI and others added 2 commits June 16, 2026 19:08
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix network.allowed domains propagation to threat-detection job Propagate Copilot BYOK provider hosts into threat-detection allowlists Jun 16, 2026
Copilot AI requested a review from pelikhan June 16, 2026 19:40
@pelikhan pelikhan marked this pull request as ready for review June 16, 2026 19:56
Copilot AI review requested due to automatic review settings June 16, 2026 19:56
@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

PR Code Quality Reviewer completed the code quality review.

@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Test Quality Sentinel completed test quality analysis.

@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes Copilot BYOK workflows failing in the threat-detection job by ensuring the BYOK provider host (when determinable or explicitly provided) is present in the AWF firewall allowlist for both the main agent run and the detection run.

Changes:

  • Introduces a shared Copilot allowlist target resolver (GetCopilotAllowlistTargets) that includes both BYOK provider host (literal only) and the Copilot API target.
  • Propagates additional literal, non-ecosystem network.allowed entries into the threat-detection run when BYOK is enabled, and keeps GH_AW_ALLOWED_DOMAINS aligned with runtime allowlist computation.
  • Adds targeted regression tests and updates docs to explain literal vs secret-backed BYOK provider URL behavior; regenerates affected .lock.yml workflows.
Show a summary per file
File Description
pkg/workflow/threat_detection.go Changes detection-run WorkflowData networking to optionally propagate additional allowed domains for BYOK scenarios.
pkg/workflow/threat_detection_test.go Adds a unit test asserting detection steps include/exclude BYOK provider host as expected.
pkg/workflow/engine_api_targets.go Adds GetCopilotAllowlistTargets and literal-only engine.env host extraction helper.
pkg/workflow/domains.go Updates GH_AW_ALLOWED_DOMAINS computation to include Copilot BYOK/API targets consistently.
pkg/workflow/copilot_engine_execution.go Ensures Copilot execution allowlist includes the unified Copilot allowlist targets.
pkg/workflow/awf_helpers_test.go Adds tests for Copilot allowlist target resolution and deduplication behavior.
pkg/workflow/allowed_domains_sanitization_test.go Adds a compilation/regression test for secret-backed BYOK + explicit provider hostname propagation into detection output.
docs/src/content/docs/reference/glossary.md Documents BYOK allowlist behavior for literal and secret-backed provider URLs.
docs/src/content/docs/reference/engines.md Clarifies BYOK mode and required network.allowed setup for secret/variable expressions.
.github/workflows/smoke-copilot-aoai-entra.lock.yml Regenerated lockfile reflecting updated detection/network allowlist behavior.
.github/workflows/smoke-copilot-aoai-apikey.lock.yml Regenerated lockfile reflecting updated detection/network allowlist behavior.
.github/workflows/pr-code-quality-reviewer.lock.yml Regenerated lockfile metadata hash update.
.github/workflows/daily-byok-ollama-test.lock.yml Regenerated lockfile reflecting updated Copilot allowlist targets (e.g., host:port propagation).

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 13/13 changed files
  • Comments generated: 3

Comment on lines 699 to 702
IsDetectionRun: true, // Mark as detection run for phase tagging
NetworkPermissions: &NetworkPermissions{
Allowed: []string{}, // no user-specified additional domains; engine provides its own minimal set
Allowed: getThreatDetectionAdditionalAllowedDomains(data),
},
Comment on lines +772 to +776
func getThreatDetectionAdditionalAllowedDomains(data *WorkflowData) []string {
if !engineEnvHasKey(data, constants.CopilotProviderBaseURL) || data == nil || data.NetworkPermissions == nil {
return []string{}
}

Comment on lines +779 to +783
for _, entry := range data.NetworkPermissions.Allowed {
if entry == "" || strings.Contains(entry, "${{") {
continue
}
if len(getEcosystemDomains(entry)) > 0 {
@github-actions

Copy link
Copy Markdown
Contributor

🏗️ Design Decision Gate — ADR Required

This PR makes significant changes to core business logic (303 new lines across pkg/workflow/) but does not have a linked Architecture Decision Record (ADR).

⚠️ The draft ADR could not be auto-committed to this branch — the push step reported a detached-HEAD state in its checkout (an environment condition outside this gate's control). The full draft is embedded below; please add it to your branch manually at docs/adr/39653-resolve-copilot-byok-provider-hosts-for-allowlists.md.

🔒 This PR cannot merge until an ADR is linked in the PR body.

📄 Draft ADR (copy to `docs/adr/39653-resolve-copilot-byok-provider-hosts-for-allowlists.md`)
# ADR-39653: Resolve Copilot BYOK Provider Hosts for Firewall Allowlists

**Date**: 2026-06-16
**Status**: Draft

## Context

The Copilot engine supports BYOK ("Bring Your Own Key") mode, where requests are routed to an external LLM provider via `COPILOT_PROVIDER_BASE_URL` in `engine.env`. The AWF (Agentic Workflow Firewall) restricts outbound network access to an explicit host allow-list. Threat-detection runs inherit the main agent's Copilot BYOK environment but previously built their allow-list from a minimal, fixed domain set that did not include the BYOK provider host. As a result, a workflow that succeeded in the main agent job failed in the detection job whenever `COPILOT_PROVIDER_BASE_URL` was used. Compounding this, the three BYOK credential variables are allowed to carry `${{ secrets.* }}` expressions under strict mode, so the provider host is not always a literal value resolvable at compile time.

## Decision

We will introduce a single shared resolver, `GetCopilotAllowlistTargets`, that derives the Copilot hosts which must appear in the firewall allow-list — the BYOK provider host from `COPILOT_PROVIDER_BASE_URL` followed by the Copilot API target — and use it from both the main agent execution step and the `GH_AW_ALLOWED_DOMAINS` computation, so all paths stay in sync. Host extraction is performed only from literal URLs: any value containing a `${{ ... }}` expression is skipped so secret-backed provider URLs are never compiled into the generated AWF config. For threat-detection BYOK runs, we will reuse only the concrete, literal, non-ecosystem host entries already present in `network.allowed`, allowing a secret-backed provider URL to be paired with an explicitly listed provider hostname without re-opening the broader agent allow-list.

## Alternatives Considered

### Alternative 1: Compile the `${{ secrets.* }}` expression into the AWF config
Emit the raw expression (or its host) into the generated firewall config and let GitHub Actions interpolate it at runtime. Rejected because it leaks secret-backed expressions into compiled `.lock.yml` artifacts and the AWF allow-list, and the firewall config is consumed where expression interpolation is not guaranteed.

### Alternative 2: Propagate the full agent allow-list into detection
Hand the threat-detection job the same `network.allowed` set (including `defaults` and ecosystem expansions) the main agent uses. Rejected because it abandons the deliberately minimal detection network model; the detection job is a security control and should reach the smallest possible host set.

## Consequences

### Positive
- Threat-detection runs now succeed under Copilot BYOK because the provider host is present in their allow-list.
- Secret-backed BYOK provider URLs are never written into compiled workflow config.
- A single resolver keeps the main agent allow-list, `GH_AW_ALLOWED_DOMAINS`, and the detection step consistent.

### Negative
- Secret-backed BYOK requires the user to manually add the provider hostname to `network.allowed`.
- The detection path adds a second, narrower allow-list code path with its own filtering rules to keep in sync.

### Neutral
- Non-BYOK Copilot workflows are unchanged: detection still receives an empty additional allow-list.
- The resolver orders the BYOK provider host before the Copilot API target, reflecting the actual outbound destination in BYOK mode.
📋 What to do next
  1. Add the draft ADR above to docs/adr/39653-resolve-copilot-byok-provider-hosts-for-allowlists.md on your branch.
  2. Complete / refine the sections — confirm the decision rationale and alternatives match your intent.
  3. Reference the ADR in this PR body, e.g.:

    ADR: ADR-39653: Resolve Copilot BYOK Provider Hosts for Firewall Allowlists

Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision.

❓ Why ADRs Matter

ADRs create a searchable, permanent record of why the codebase looks the way it does. This change makes a non-obvious security trade-off (never compiling secret expressions into firewall config; keeping detection minimal) that future contributors will need the rationale for.

🔒 Blocking: link an ADR in the PR body before merge.

🏗️ ADR gate enforced by Design Decision Gate 🏗️ ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /diagnose, /tdd, and /zoom-out — requesting changes on a correctness issue, two test coverage gaps, and a fragile test.

📋 Key Themes & Highlights

Blocking Issues

  • Nil-check ordering (threat_detection.go:773): data == nil guard appears after the call that uses data. Safe today because engineEnvHasKey defends internally, but non-idiomatic and a latent risk.
  • Missing test for literal BYOK URL path through buildDetectionEngineExecutionStep: only the secret-backed path (${{ secrets.PROVIDER_BASE_URL }}) is exercised. The extractLiteralEngineEnvHost branch has no end-to-end detection-step test.
  • Fragile lock-file parsing in TestCopilotProviderBaseURLInThreatDetectionStep: hand-rolled JSON scan with dual escaped/unescaped variant handling will silently degrade if the lock format changes.

Minor / Non-Blocking

  • Ecosystem isolation ("defaults" exclusion) is not explicitly asserted in the detection-step test — one unwantedHost field would pin this safety guarantee.
  • getThreatDetectionAdditionalAllowedDomains needs a doc comment explaining its relationship to GetCopilotAllowlistTargets and the two-path design.
  • nil vs []string{} return inconsistency between GetCopilotAllowlistTargets and getThreatDetectionAdditionalAllowedDomains.

Positive Highlights

  • ✅ Clean abstraction in GetCopilotAllowlistTargets with dedup via seen map
  • ✅ Good ${{ expression guard to prevent compile-time host extraction from secret refs
  • ✅ Solid unit tests for GetCopilotAllowlistTargets covering all four logical branches
  • ✅ Well-structured PR description with clear rationale for each decision

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer

}

func getThreatDetectionAdditionalAllowedDomains(data *WorkflowData) []string {
if !engineEnvHasKey(data, constants.CopilotProviderBaseURL) || data == nil || data.NetworkPermissions == nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] Nil-check ordering: data == nil is evaluated after calling engineEnvHasKey(data, ...). engineEnvHasKey defends against nil internally today, but placing the nil guard at the end violates conventional Go defensive-coding style and is a latent risk if the helper is ever changed.

💡 Suggested reorder
if data == nil || data.NetworkPermissions == nil || !engineEnvHasKey(data, constants.CopilotProviderBaseURL) {
    return []string{}
}

This also matches the ordering used by engineEnvHasKey itself (see engine_helpers.go:94).

Allowed: []string{"defaults", "llm.corp.example.com"},
},
},
wantHost: "llm.corp.example.com",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] This test case only exercises the secret-backed BYOK path (${{ secrets.PROVIDER_BASE_URL }}). There is no test case where COPILOT_PROVIDER_BASE_URL is a literal URL (e.g., (llm.corp.example.com/redacted), which is handled by a separate code path via GetCopilotAllowlistTargetsextractLiteralEngineEnvHost`. Both paths should be covered here.

💡 Suggested additional test case
{
    name: "detection allow-domains includes BYOK host from literal provider URL",
    data: &WorkflowData{
        AI: "copilot",
        EngineConfig: &EngineConfig{
            ID: "copilot",
            Env: map[string]string{
                constants.CopilotProviderBaseURL: "(llm.corp.example.com/redacted),
            },
        },
        SafeOutputs: &SafeOutputsConfig{
            ThreatDetection: &ThreatDetectionConfig{},
        },
        NetworkPermissions: &NetworkPermissions{
            Firewall: &FirewallConfig{Enabled: true},
        },
    },
    wantHost: "llm.corp.example.com",
},

This path differs from the existing case: the host comes from parsing the literal URL (via extractLiteralEngineEnvHost), not from network.allowed.

},
NetworkPermissions: &NetworkPermissions{
Firewall: &FirewallConfig{Enabled: true},
Allowed: []string{"defaults", "llm.corp.example.com"},

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The test verifies llm.corp.example.com is present in the detection output, but does not assert that the ecosystem entry "defaults" is excluded. The isolation guarantee — that ecosystem expansions do not leak into the detection job — is one of the stated safety properties of this change and should be pinned with an unwantedHost assertion.

💡 Suggested addition
unwantedHost: "defaults",

Or, if "defaults" itself is not emitted literally, assert that no domain expanded from "defaults" (e.g., "github.com") appears in the detection allowDomains payload beyond what the minimal engine set already provides.

lockStr := string(lockContent)

requiredDomain := "llm.corp.example.com"
allowDomainsPrefix := `"allowDomains":[`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The hand-rolled allowDomains scan — handling both unescaped and JSON-escaped variants, then manually slicing to find array boundaries — is fragile. If the lock file format changes (e.g., pretty-printed JSON, double-escaped), this silently passes without checking anything meaningful.

💡 Alternative approach

Parse the AWF JSON blob from the lock file with encoding/json or use a helper already in the test suite to extract the firewall config, then assert on the parsed allowDomains slice directly. If structured parsing is overkill here, at minimum use assert.Contains(lockStr, "allowDomains":[+ ... +"llm.corp.example.com") on a known canonical form rather than the dual-format branching.

The occurrenceIdx >= 2 bound check would also benefit from assert.Equal(t, 2, occurrenceIdx) to catch regressions where the lock file grows unexpected extra allowDomains blobs.

return steps
}

func getThreatDetectionAdditionalAllowedDomains(data *WorkflowData) []string {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/zoom-out] The two-path design here is subtle: literal BYOK URLs are propagated by GetCopilotAllowlistTargets (via extractLiteralEngineEnvHost), while secret-backed URLs rely on this function to carry explicit hostnames from network.allowed. A comment at the top of the function body explaining why both paths coexist — and what each one is responsible for — would help future maintainers avoid accidental overlap or omission.

💡 Suggested doc comment
// getThreatDetectionAdditionalAllowedDomains returns the extra literal hostnames that the
// detection step needs beyond its minimal engine-provided set, gated on Copilot BYOK being
// configured. It handles the secret-backed case: when COPILOT_PROVIDER_BASE_URL is a GitHub
// expression (e.g. ${{ secrets.PROVIDER_BASE_URL }}), the host cannot be derived at compile
// time, so users are expected to list it explicitly in network.allowed. Literal BYOK URLs
// are handled separately via GetCopilotAllowlistTargets → extractLiteralEngineEnvHost.
// Ecosystem entries ("defaults", "npm", etc.) are intentionally filtered out to keep the
// detection job network footprint minimal.

//
// The BYOK provider host is added first because it is the actual outbound destination for
// Copilot CLI requests in BYOK mode. Duplicate hosts are removed.
func GetCopilotAllowlistTargets(workflowData *WorkflowData) []string {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/zoom-out] GetCopilotAllowlistTargets returns nil when no targets are found, while getThreatDetectionAdditionalAllowedDomains returns []string{}. Both are range-safe in Go, but the inconsistency across the call boundary can surprise callers who do len(targets) == 0 checks vs targets == nil comparisons. Consider returning nil consistently (document the contract) or initialising to []string{} to be explicit.

Minor nit — no blocking concern here, just worth aligning.

@github-actions

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 90/100 — Excellent

Analyzed 3 test function(s): 3 design, 0 implementation, 0 guideline violation(s).

📊 Metrics & Test Classification (3 tests analyzed)
Metric Value
New/modified tests analyzed 3
✅ Design tests (behavioral contracts) 3 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 3 (100%)
Duplicate test clusters 0
Test inflation detected YES — threat_detection_test.go +60 lines vs threat_detection.go +28 lines (ratio 2.14)
🚨 Coding-guideline violations 0
Test File Classification Issues Detected
TestCopilotProviderBaseURLInThreatDetectionStep pkg/workflow/allowed_domains_sanitization_test.go ✅ Design (integration) Minor: bare t.Fatal(err) without context message (style only)
TestGetCopilotAllowlistTargets pkg/workflow/awf_helpers_test.go ✅ Design
TestBuildDetectionEngineExecutionStepPropagatesBYOKProviderHost pkg/workflow/threat_detection_test.go ✅ Design

Go: 3 (*_test.go); JavaScript: 0. No other languages detected.

📝 Notes — Minor Style Observations (non-blocking)

TestCopilotProviderBaseURLInThreatDetectionStep (pkg/workflow/allowed_domains_sanitization_test.go) — one bare t.Fatal(err) for the os.WriteFile setup call lacks a descriptive context string. All other 6 assertions in the same function have descriptive messages. Not a violation but consider t.Fatalf("write test workflow file: %v", err) for consistency.

TestBuildDetectionEngineExecutionStepPropagatesBYOKProviderHost (pkg/workflow/threat_detection_test.go) — uses if cond { t.Errorf(...) } guards instead of assert.Contains/assert.NotContains. Functionally equivalent; the rest of the file uses testify. Alignment would improve readability.

Test inflation (threat_detection_test.go): 60 test lines added vs. 28 production lines in threat_detection.go (ratio 2.14). This is the source of the 10-point inflation deduction but reflects appropriate coverage depth — both a positive and a negative scenario are covered with table-driven rows.

Verdict

Check passed. 0% implementation tests (threshold: 30%). All 3 new tests verify observable behavioral contracts: TestGetCopilotAllowlistTargets covers 4 input scenarios for the BYOK allowlist helper (including deduplication and GitHub-expression passthrough); TestBuildDetectionEngineExecutionStepPropagatesBYOKProviderHost verifies the detection step includes/excludes the BYOK provider host correctly; and TestCopilotProviderBaseURLInThreatDetectionStep validates the full compilation pipeline end-to-end with a real lock-file assertion.

🧪 Test quality analysis by Test Quality Sentinel ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 90/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 3 tests verify observable behavioral contracts with no coding-guideline violations.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REQUEST_CHANGES — three actionable issues found: one high (dead nil guard) and two medium (discarded validation, fragile test assertion).

🔍 Findings summary

Dead nil guard in getThreatDetectionAdditionalAllowedDomains

|| data == nil on line 773 of threat_detection.go is unreachable: because engineEnvHasKey is nil-safe, when data is nil the first operand of || is already true and evaluation never reaches data == nil. The check gives a false sense of nil safety that will mislead the next person who refactors engineEnvHasKey. Nil guards must come first.

Pre-validated value discarded in extractLiteralEngineEnvHost

The ${{ guard is applied to rawValue, but extractAPITargetHost is then called with the key name and re-reads the env var independently. The validated value is never forwarded. Any layer inserted between the two reads could reintroduce a skipped expression into the extracted host.

Over-broad test assertion in TestCopilotProviderBaseURLInThreatDetectionStep

Every allowDomains occurrence in the lock file is asserted to contain the BYOK domain. Non-agent steps (e.g., safe-outputs create-issue step) likely produce their own allowDomains arrays that do not include the provider host, making this test a future source of spurious failures.

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • proxy.golang.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "proxy.golang.org"

See Network Configuration for more information.

🔎 Code quality review by PR Code Quality Reviewer

}

func getThreatDetectionAdditionalAllowedDomains(data *WorkflowData) []string {
if !engineEnvHasKey(data, constants.CopilotProviderBaseURL) || data == nil || data.NetworkPermissions == nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead nil guard: data == nil is unreachable code and provides no actual protection.

💡 Detail and fix

engineEnvHasKey(data, ...) is nil-safe — when data is nil it returns false, so !engineEnvHasKey(...) evaluates to true and the || short-circuits before data == nil is ever tested. The data == nil check is therefore dead code: it can never be true by the time evaluation reaches it.

This creates a dangerous illusion of safety: a future developer who reads || data == nil may believe nil is explicitly guarded before the engineEnvHasKey call, and confidently refactor engineEnvHasKey to drop its own nil guard — at which point calling with nil panics.

Fix: put nil guards first, then check the BYOK key:

func getThreatDetectionAdditionalAllowedDomains(data *WorkflowData) []string {
    if data == nil || data.NetworkPermissions == nil {
        return []string{}
    }
    if !engineEnvHasKey(data, constants.CopilotProviderBaseURL) {
        return []string{}
    }
    // ...
}

awfHelpersLog.Printf("Skipping %s host extraction from GitHub expression value", envVar)
return ""
}
return extractAPITargetHost(workflowData, envVar)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-validated rawValue is silently discarded; extractAPITargetHost re-reads the env var from scratch.

💡 Detail and fix

extractLiteralEngineEnvHost reads rawValue from the env map and validates it (non-empty, no ${{ expression). It then calls extractAPITargetHost(workflowData, envVar), which independently re-fetches workflowData.EngineConfig.Env[envVar] — bypassing the validation entirely.

The consequence: the ${{ guard in this function provides no semantic guarantee, because the value used for host extraction is the one re-read by extractAPITargetHost, not the validated one. If someone adds an additional env-var mutation layer between extractLiteralEngineEnvHost and extractAPITargetHost, a secret expression could slip through.

Additionally, the two lookups diverge if the env-map access pattern ever changes (e.g., a lazy resolver). The validated value should be passed directly:

func extractLiteralEngineEnvHost(workflowData *WorkflowData, envVar string) string {
    env := getEngineEnvOverrides(workflowData)
    if env == nil {
        return ""
    }
    rawValue, ok := env[envVar]
    if !ok || rawValue == "" {
        return ""
    }
    if strings.Contains(rawValue, "${{") {
        awfHelpersLog.Printf("Skipping %s host extraction: GitHub expression value", envVar)
        return ""
    }
    // Pass the already-validated URL directly instead of re-reading from env.
    return extractHostFromURL(rawValue) // or inline the stripping logic here
}

arrayEnd = len(remaining) - arrayStart
}
section := remaining[arrayStart : arrayStart+arrayEnd]
if !strings.Contains(section, `"`+requiredDomain+`"`) && !strings.Contains(section, `\"`+requiredDomain+`\"`) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test asserts BYOK domain is present in ALL allowDomains occurrences — will spuriously fail when non-agent steps have their own arrays.

💡 Detail and fix

The loop fails the test if ANY allowDomains occurrence in the compiled lock file is missing llm.corp.example.com. But the workflow includes safe-outputs: create-issue:, which generates GitHub-write steps that likely contain their own allowDomains arrays (GitHub API hosts only). Those arrays will not contain the BYOK provider domain, so the test will spuriously fail whenever the safe-outputs step generation adds an allowDomains entry.

The test should verify that the agent job and detection job include the domain, not that every single allowDomains array in the lock file does. A more robust approach:

// Look for allowDomains only inside the agent and detection job blocks.
// Or: check that at least 2 specific named occurrences contain the domain,
// rather than asserting it for every occurrence indiscriminately.
if !strings.Contains(lockStr, `"allowDomains":["llm.corp.example.com`) &&
    !strings.Contains(lockStr, `"llm.corp.example.com"`) {
    t.Error("BYOK provider domain not found in any allowDomains array")
}

Alternatively, parse the lock file as YAML, find the agent and detection steps by step ID, and check those allowDomains arrays specifically.

@pelikhan

Copy link
Copy Markdown
Collaborator

@copilot run pr-finisher skill

@pelikhan pelikhan merged commit ad606e5 into main Jun 16, 2026
71 of 83 checks passed
@pelikhan pelikhan deleted the copilot/fix-threat-detection-job-allowlist branch June 16, 2026 20:53
Copilot stopped work on behalf of pelikhan due to an error June 16, 2026 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants