Skip to content

fix: use ephemeral port for postgres, except for TestPubsub_Disconnect#7798

Merged
spikecurtis merged 1 commit into
mainfrom
spike/7752-port-binding-flake
Jun 5, 2023
Merged

fix: use ephemeral port for postgres, except for TestPubsub_Disconnect#7798
spikecurtis merged 1 commit into
mainfrom
spike/7752-port-binding-flake

Conversation

@spikecurtis

Copy link
Copy Markdown
Contributor

Fixes #7752

Our implementation of starting postgres on ephemeral ports is racy: it

  1. opens a TCP listener with port 0, which asks the OS to allocate a free port
  2. stores the port number
  3. starts a Docker container with that port as the host mapping

However, between 2 and 3 another process can come in and snatch the port.

Docker itself accepts 0 as a port mapping, and it asks the OS to allocate a free port. This closes the race.

A second issue is that we were using an ephemeral port for TestPubsub_Disconnect where we kill the postgres container and then restart it on the same port. It suffers from a similar race, where another process could snatch the port while postgres is down. To solve this, we use a hardcoded, arbitrary high port number, but one outside the Linux ephemeral range.

Signed-off-by: Spike Curtis <spike@coder.com>
@spikecurtis spikecurtis requested review from a team, coadler and kylecarbs June 2, 2023 09:49
@spikecurtis spikecurtis changed the title fix: ask OS for postgres ephemeral port; don't use ephemeral port for TestPubsub_Disconnect fix: ask OS for postgres ephemeral port; don't use for TestPubsub_Disconnect Jun 2, 2023
@spikecurtis spikecurtis changed the title fix: ask OS for postgres ephemeral port; don't use for TestPubsub_Disconnect fix: use ephemeral port for postgres except for TestPubsub_Disconnect Jun 2, 2023
@spikecurtis spikecurtis changed the title fix: use ephemeral port for postgres except for TestPubsub_Disconnect fix: use ephemeral port for postgres, except for TestPubsub_Disconnect Jun 2, 2023
// and restart it on the same port. If we use an ephemeral port, there is a chance the OS will reallocate before we
// start back up. The downside is that if the test crashes and leaves the container up, subsequent test runs will fail
// until we manually kill the container.
const disconnectTestPort = 26892

@mafredri mafredri Jun 2, 2023

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope this, by default, usually falls outside the dynamic port range of :0, so we won't collide with other tests/test packages. Are the rules same on Windows?

At least on my NAS, it seems OK:

❯ cat /proc/sys/net/ipv4/ip_local_port_range
32768	60999

Assuming these values are set differently, there could still be port collisions.

Something like port := testutil.Port(26892), where that function takes the input as preferred, but falls back to seeking the closest free port would help in the case mentioned in the comment. The package can store "used ports", unfortunately I don't think that state will be shared between test packages (could be wrong though).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding Windows, we don't run these tests there. Presumably because Docker on Windows is more effort/overhead than it's worth.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using testutil.Port() is clever, but thinking about it a bit, I don't think we should be that clever in tests. If the test leaks the container, I'd rather us know about it by failing tests, versus doing something different and rare.

@spikecurtis spikecurtis merged commit e016c30 into main Jun 5, 2023
@spikecurtis spikecurtis deleted the spike/7752-port-binding-flake branch June 5, 2023 05:24
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TestPubsub_Disconnect flake on port binding

3 participants