Skip to content

fix(proxy): restore happy-eyeballs for TLS upstream dial#4236

Open
Gauravsingh096 wants to merge 1 commit into
feat/aerospike-parserfrom
fix/proxy-tls-ipv6-dial
Open

fix(proxy): restore happy-eyeballs for TLS upstream dial#4236
Gauravsingh096 wants to merge 1 commit into
feat/aerospike-parserfrom
fix/proxy-tls-ipv6-dial

Conversation

@Gauravsingh096
Copy link
Copy Markdown
Contributor

Describe the changes that are made

Fixed a regression in the TLS upstream dial-target selection in pkg/agent/proxy/proxy.go that caused express-mongoose record_build CI jobs to fail 5/5.

Root cause: commit 2916804 changed the guard from if dstURL != "" to if dstAddr == "" && dstURL != "" to fix Aerospike-over-stunnel (SNI aerospike.local is not DNS-resolvable, so dialing by hostname failed with "lookup ... server misbehaving"). Side effect: for plain HTTPS connections (e.g. express-mongoose → reqres.in), eBPF captures the Cloudflare IPv6 literal [2606:4700:20::681a:ad5]:443 as the original destination. With the new guard, dstAddr is non-empty so the hostname branch is skipped — the proxy dials the IPv6 literal directly. GitHub Actions runners have no IPv6 route → dial tcp [...]:443: connect: network is unreachable → logged at ERROR → node test script's grep -q "ERROR" gate fails the build.

Fix: use net.LookupHost(dstURL) to decide which target to dial:

  • Real hostnames (reqres.in) resolve → dial by hostname → Go's dual-stack happy-eyeballs → IPv4 fallback on IPv6-less runners ✅
  • Logical cert names (aerospike.local) don't resolve → LookupHost errors → keep eBPF-captured dstAddr → Aerospike-stunnel unaffected ✅

Links & References

Closes: NA

🔗 Related PRs

🐞 Related Issues

  • NA

📄 Related Documents

  • NA

What type of PR is this? (check all applicable)

  • 🐞 Bug Fix

Added e2e test pipeline?

  • 🙅 no, because they aren't needed

Added comments for hard-to-understand areas?

  • 👍 yes

Added to documentation?

  • 🙅 no documentation needed

Are there any sample code or steps to test the changes?

  • 👍 yes, mentioned below
# express-mongoose record_build was failing 5/5 before this fix.
# After this fix all three matrix variants should pass:
#   record_build_replay_build   → was FAIL, expect PASS
#   record_build_replay_latest  → was FAIL, expect PASS
#   record_latest_replay_build  → was PASS, expect PASS

Commit 2916804 changed the dial-target guard from
  if dstURL != ""
to
  if dstAddr == "" && dstURL != ""
to fix Aerospike-over-stunnel (SNI "aerospike.local" is unresolvable,
so dialing by hostname failed with "lookup ... server misbehaving").

Side-effect: for plain HTTPS connections (e.g. express-mongoose →
reqres.in) the eBPF-captured original destination is the Cloudflare
IPv6 literal the app connected to. Without the SNI-hostname fallback
the proxy dials that IPv6 literal directly; on CI runners with no IPv6
route this produces "dial tcp [...]:443: connect: network is
unreachable", which is logged at ERROR and trips the test script's
  grep -q "ERROR" record_log && exit 1
gate — failing the express-mongoose record_build jobs 5/5.

Fix: prefer the SNI hostname when net.LookupHost resolves it (real
hostnames like "reqres.in" resolve → happy-eyeballs → IPv4 fallback).
For unresolvable cert names ("aerospike.local"), LookupHost returns an
error and we keep the authoritative eBPF-captured dstAddr unchanged —
Aerospike-over-stunnel is unaffected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant