feat: add single tailnet support to pgcoord#9351
Conversation
20a5183 to
4a4bbdb
Compare
| db = database.New(sqlDB) | ||
|
|
||
| err = migrations.Up(sqlDB) | ||
| require.NoError(t, err) |
There was a problem hiding this comment.
if this is required, how were our tests working before?
There was a problem hiding this comment.
This is only if you specify a database URL to run tests against. I needed to run a standalone postgres to see the logs and previously it wouldn't run the migrations. I can prob remove this, I'm not sure if it has any greater effects.
| require.NoError(t, err) | ||
|
|
||
| assertEventuallyNoClientsForAgent(ctx, t, store, agent1.id) | ||
| assertEventuallyNoAgents(ctx, t, store, agent1.id) |
There was a problem hiding this comment.
Some additional test cases I'd like to see:
- MultiAgent with multiple coordinators
- MultiAgent
UpdateSelffirst, thenSubscribeAgent--- verify the agent gets the update. UnsubscribeAgent--- verify that the agent no longer gets updates about the client and vice versa- Subscribe to multiple agents
There was a problem hiding this comment.
I'll work on implementing these after the rest is solid 👍
1cd56e6 to
a7b39df
Compare
| delete(q.conns, c) | ||
|
|
||
| if _, ok := q.clientSubscriptions[c.UniqueID()]; !ok { | ||
| q.clientSubscriptions[c.UniqueID()] = map[uuid.UUID]struct{}{} |
There was a problem hiding this comment.
this is a memory leak: you add to q.clientSubscriptions but never remove.
There was a problem hiding this comment.
You still never delete anything from q.clientSubscriptions
There was a problem hiding this comment.
Whoops, I think I got this now.
| delete(q.conns, c) | ||
|
|
||
| if _, ok := q.clientSubscriptions[c.UniqueID()]; !ok { | ||
| q.clientSubscriptions[c.UniqueID()] = map[uuid.UUID]struct{}{} |
There was a problem hiding this comment.
You still never delete anything from q.clientSubscriptions
| kind: c.Kind(), | ||
| } | ||
| cm, ok := q.mappers[mk] | ||
| if ok { |
There was a problem hiding this comment.
the fact that you're checking ok here smells funny and indicates that we're not keeping the data structures in sync.
It should be an invariant in this code that a client subscription in the q.clientSubscriptions map implies that a corresponding mapper exists. They are always added together and always removed together while holding the mutex.
There was a problem hiding this comment.
How do you think we should handle a case where the mapper doesn't exist. Panic?
There was a problem hiding this comment.
yeah, panic, or at the very least, drop a Critical log
There was a problem hiding this comment.
I found a pretty easy way to trigger this invariant. It's possible for (*querier).cleanupConn to race with a call to removeClientSubscription, such that removeClientSubscription is trying to remove a subscription that no longer exists. I'm not too sure how to prevent this, except by ignoring duplicates.
There was a problem hiding this comment.
The invariant is that a client subscription exists in q.clientSubscriptions AND in the mappers, or neither. It can never be in one but not the other when the mutex is unlocked.
You're right though, that in this design, it's possible for removeClientSubscription to race with cleanupConn such that you can try to remove a subscription that was already cleaned up. What you've done is fine and follows the invariant --- if the subscription is not in q.clientSubscriptions that means it's already cleaned up from the mappers too, and removeClientSubscription can return early.
| if sub.active { | ||
| q.newClientSubscription(sub.q, sub.agentID) | ||
| } else { | ||
| q.removeClientSubscription(sub.q, sub.agentID) |
There was a problem hiding this comment.
you send a subscription with agentID unset on remove. The subscriber handles this case, but then it forwards it to the querier which I don't think handles the case.
There was a problem hiding this comment.
It's technically handled by checking the clientSubscriptions map, since there will never be an agent subscription with uuid.Nil. I had an explicit check but just replaced it with the map lookup, since it essentially does the same thing. Might be good to comment.
This adds support for the single tailnet expieriment to pgcoord. Afer this there aren't any major blockers for making it GA.