Make TC proxy tests use a dynamically allocated port and don't panic on failure to bind#8676
Merged
Eijebong merged 2 commits intoMay 22, 2026
Conversation
`TestTcProxy` registers a `defer` to cleanup the proxy *before* checking
whether or not the proxy has started. When `New()` failed, `ll` (the
proxy handle) was nil, the defer ended up calling `ll.Terminate()` on
it, and it made the computer very sad:
```
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x88f420]
goroutine 23 [running]:
[...]
panic({0x974920?, 0xd37290?})
/home/task_177938425593536/go1.26.3/go/src/runtime/panic.go:860 +0x13a
github.com/taskcluster/taskcluster/v100/workers/generic-worker/tcproxy.(*TaskclusterProxy).Terminate(0x0)
/home/task_177938425593536/taskcluster/workers/generic-worker/tcproxy/tcproxy.go:98 +0x40
github.com/taskcluster/taskcluster/v100/workers/generic-worker/tcproxy.TestTcProxy.func1()
/home/task_177938425593536/taskcluster/workers/generic-worker/tcproxy/tcproxy_test.go:35 +0x38
[...]
```
The early defer used to have a purpose in
e4563c5 when it got introduced as the
original `New` could, in fact, return a handle to a proxy that was in
a bad shape and didn't kill it. That got changed in
c1786ee where `New` now kills the
process and returns nil in that case. The test was never changed to
match that change.
The test was hardcoding port 34570 for its proxy to listen on. This test runs on a non d2g which means it's sharing its port pool with the original system. 34570 also sits within the range of ephemeral ports used by linux by default (32768-60999). With that, it's not impossible to get a collision because 34570 got assigned to something else (loopback outbound connection, other service listening on :0). Whenever one of those connections was transiently assigned 34570 and still open, the proxy's attempt to bind a listener on it failed with: `Failed to listen on 127.0.0.1:34570: listen tcp 127.0.0.1:34570: bind: address already in use`, making the test fail for no valid reason. There are 2 ways of fixing that, the first is to support `:0` properly as a target port in taskcluster-proxy. That requires somewhat invasive changes to be useful because the proxy currently doesn't return the port it's listening on anywhere (which is arguably, a bug because --port 0 would just print "I'm listening on :0"). While that does sound nice in theory, I have *no* idea how to fix that without requiring one to parse the output of the command when it runs and I'd rather avoid it. The second way is what I ended up doing, we ask the OS for a port by binding a socket to :0, we grab the port, we immediately unbind and pass the port to the proxy, hoping that nothing uses it in the meantime. It's not perfect but it reduces the race window from the entirety of the test to a few ms between the moment we unbind and the moment the proxy binds again.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes a bug where a collision on bind would panic and segfault because of a code change that didn't adapt the test for it and greatly reduces the chance of port collision by dynamically allocating the port during the test rather than hardcoding it to :34570.
See individual commit messages for more details.