fix: use ephemeral port for postgres, except for TestPubsub_Disconnect (#7798)
Signed-off-by: Spike Curtis <spike@coder.com>
// and restart it on the same port. If we use an ephemeral port, there is a chance the OS will reallocate before we
// start back up. The downside is that if the test crashes and leaves the container up, subsequent test runs will fail
// until we manually kill the container.
const disconnectTestPort = 26892
I hope this, by default, falls outside the dynamic port range used for :0, so we won't collide with other tests/test packages. Are the rules the same on Windows?
At least on my NAS, it seems OK:
❯ cat /proc/sys/net/ipv4/ip_local_port_range
32768 60999
If these values are set differently on another machine, there could still be port collisions.
Something like port := testutil.Port(26892), where that function takes the input as the preferred port but falls back to seeking the closest free port, would help in the case mentioned in the comment. The package can store "used ports"; unfortunately, I don't think that state will be shared between test packages (could be wrong though).
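A rough sketch of what such a helper could look like. The name preferredPort, the upward-only search, and the 100-port search window are assumptions made for illustration; no testutil.Port function exists in this change. Note that the port is only probed, not reserved, so another process could still take it after the probe listener is closed.

```go
package main

import (
	"fmt"
	"net"
)

// preferredPort is a hypothetical helper: it returns the preferred port if a
// listener can be opened on it, otherwise the next free port above it.
// The search window of 100 ports is an arbitrary choice for this sketch.
func preferredPort(preferred int) (int, error) {
	for p := preferred; p < preferred+100 && p <= 65535; p++ {
		l, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", p))
		if err != nil {
			// Port is in use (or otherwise unavailable); try the next one.
			continue
		}
		// The port was free at probe time. Closing the listener releases it,
		// so this is a best-effort check rather than a reservation.
		_ = l.Close()
		return p, nil
	}
	return 0, fmt.Errorf("no free port found near %d", preferred)
}
```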
Regarding Windows, we don't run these tests there. Presumably because Docker on Windows is more effort/overhead than it's worth.
Using testutil.Port() is clever, but thinking about it a bit, I don't think we should be that clever in tests. If the test leaks the container, I'd rather we find out through failing tests than have the test quietly do something different and rare.
Fixes #7752
Our implementation of starting postgres on ephemeral ports is racy: it
However, between 2 and 3 another process can come in and snatch the port.
Docker itself accepts 0 as a port mapping, and it asks the OS to allocate a free port. This closes the race.
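The individual steps are not spelled out here, so the following is an assumed illustration of the general "probe for a free port, then hand it to someone else" pattern that produces this kind of race. The function name freePortRacy is hypothetical and is not taken from the codebase.

```go
package main

import "net"

// freePortRacy asks the OS for a free port by listening on port 0, then
// closes the listener and returns the port number. Between the Close call
// and whoever binds the port next, any other process may claim it.
func freePortRacy() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	port := l.Addr().(*net.TCPAddr).Port
	if err := l.Close(); err != nil {
		return 0, err
	}
	// Race window starts here: the port is no longer held by this process.
	return port, nil
}
```

Letting Docker take 0 as the host port avoids the window, because the allocation and the bind happen in a single step.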
A second issue is that we were using an ephemeral port for TestPubsub_Disconnect where we kill the postgres container and then restart it on the same port. It suffers from a similar race, where another process could snatch the port while postgres is down. To solve this, we use a hardcoded, arbitrary high port number, but one outside the Linux ephemeral range.
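As a sanity check on the "outside the Linux ephemeral range" claim, a small sketch that parses the contents of /proc/sys/net/ipv4/ip_local_port_range and tests a port against it. The helper names parsePortRange and outsideEphemeralRange are hypothetical, and the sample values are the ones quoted in the review thread; other machines may be configured differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePortRange parses the two whitespace-separated integers found in
// /proc/sys/net/ipv4/ip_local_port_range, e.g. "32768\t60999".
func parsePortRange(s string) (lo, hi int, err error) {
	fields := strings.Fields(s)
	if len(fields) != 2 {
		return 0, 0, fmt.Errorf("expected two fields, got %d", len(fields))
	}
	lo, err = strconv.Atoi(fields[0])
	if err != nil {
		return 0, 0, err
	}
	hi, err = strconv.Atoi(fields[1])
	if err != nil {
		return 0, 0, err
	}
	return lo, hi, nil
}

// outsideEphemeralRange reports whether port lies outside [lo, hi].
func outsideEphemeralRange(port, lo, hi int) bool {
	return port < lo || port > hi
}
```

With the range 32768 to 60999 shown earlier, port 26892 falls below the lower bound. On a real host the input string could come from os.ReadFile on the proc path above.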