Skip to content

Make TC proxy tests use a dynamically allocated port and don't panic on failure to bind#8676

Merged
Eijebong merged 2 commits into
taskcluster:mainfrom
Eijebong:make-tc-proxy-test-less-intermittent
May 22, 2026
Merged

Make TC proxy tests use a dynamically allocated port and don't panic on failure to bind#8676
Eijebong merged 2 commits into
taskcluster:mainfrom
Eijebong:make-tc-proxy-test-less-intermittent

Conversation

@Eijebong
Copy link
Copy Markdown
Contributor

Fixes a bug where a collision on bind would panic and segfault because of a code change that didn't adapt the test for it and greatly reduces the chance of port collision by dynamically allocating the port during the test rather than hardcoding it to :34570.

See individual commit messages for more details.

Eijebong added 2 commits May 22, 2026 11:16
`TestTcProxy` registers a `defer` to cleanup the proxy *before* checking
whether or not the proxy has started. When `New()` failed, `ll` (the
proxy handle) was nil, the defer ended up calling `ll.Terminate()` on
it, and it made the computer very sad:

```
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x88f420]

goroutine 23 [running]:
[...]
panic({0x974920?, 0xd37290?})
	/home/task_177938425593536/go1.26.3/go/src/runtime/panic.go:860 +0x13a
github.com/taskcluster/taskcluster/v100/workers/generic-worker/tcproxy.(*TaskclusterProxy).Terminate(0x0)
	/home/task_177938425593536/taskcluster/workers/generic-worker/tcproxy/tcproxy.go:98 +0x40
github.com/taskcluster/taskcluster/v100/workers/generic-worker/tcproxy.TestTcProxy.func1()
	/home/task_177938425593536/taskcluster/workers/generic-worker/tcproxy/tcproxy_test.go:35 +0x38
[...]
```

The early defer used to have a purpose in
e4563c5 when it got introduced as the
original `New` could, in fact, return a handle to a proxy that was in
a bad shape and didn't kill it. That got changed in
c1786ee where `New` now kills the
process and returns nil in that case. The test was never changed to
match that change.
The test was hardcoding port 34570 for its proxy to listen on. This test
runs on a non d2g which means it's sharing its port pool with the
original system. 34570 also sits within the range of ephemeral ports
used by linux by default (32768-60999). With that, it's not impossible
to get a collision because 34570 got assigned to something else
(loopback outbound connection, other service listening on :0).

Whenever one of those connections was transiently assigned 34570 and
still open, the proxy's attempt to bind a listener on
it failed with: `Failed to listen on 127.0.0.1:34570: listen tcp
127.0.0.1:34570: bind: address already in use`, making the test fail for
no valid reason.

There are 2 ways of fixing that, the first is to support `:0` properly as a
target port in taskcluster-proxy. That requires somewhat invasive
changes to be useful because the proxy currently doesn't return the port
it's listening on anywhere (which is arguably, a bug because --port 0
would just print "I'm listening on :0"). While that does sound nice in
theory, I have *no* idea how to fix that without requiring one to
parse the output of the command when it runs and I'd rather avoid it.

The second way is what I ended up doing, we ask the OS for a port by
binding a socket to :0, we grab the port, we immediately unbind and pass the
port to the proxy, hoping that nothing uses it in the meantime. It's not
perfect but it reduces the race window from the entirety of the test to
a few ms between the moment we unbind and the moment the proxy binds again.
Copy link
Copy Markdown
Contributor

@lotas lotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@Eijebong Eijebong merged commit bf2197b into taskcluster:main May 22, 2026
76 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog / Inbox to Done in TC intake board May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants