
Fix flaky TestServer_RedundantUpdateSuppression. #8839

Merged
Pranjali-2501 merged 3 commits into grpc:master from Pranjali-2501:issue_7713 on Jan 19, 2026

Conversation

Pranjali-2501 (Contributor) commented Jan 15, 2026

Fixes #7713

The TestServer_RedundantUpdateSuppression test was flaky because it began monitoring ClientConn state transitions before the connection had fully stabilized in the expected Ready state.

This created a race condition where the monitoring loop could capture initial setup transitions (e.g., Connecting) and falsely report them as unexpected failures caused by the redundant update.

Successfully ran the test on forge 1 million times without a single flake.

RELEASE NOTES: N/A

codecov bot commented Jan 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.16%. Comparing base (629ef39) to head (f288aaf).
⚠️ Report is 10 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8839      +/-   ##
==========================================
- Coverage   83.22%   83.16%   -0.07%     
==========================================
  Files         417      414       -3     
  Lines       32920    32776     -144     
==========================================
- Hits        27397    27257     -140     
+ Misses       4108     4079      -29     
- Partials     1415     1440      +25     

see 28 files with indirect coverage changes


arjan-bal (Contributor) commented Jan 16, 2026

@Pranjali-2501 have you verified that the flakiness is indeed caused by the proto comparison and that the changes in this PR fix the flakiness? If so, can you mention the verification performed in the PR description?

@arjan-bal arjan-bal assigned Pranjali-2501 and unassigned arjan-bal Jan 16, 2026
Pranjali-2501 (Contributor, Author) commented Jan 16, 2026

> @Pranjali-2501 have you verified that the flakiness is indeed caused by the proto comparison and that the changes in this PR fix the flakiness? If so, can you mention the verification performed in the PR description?

No, I wasn't able to reproduce the exact flake, either locally or on forge.
However, the test failure logs suggest the issue arises when the client receives a redundant update, which can happen because the server compares serialized resources with bytes.Equal.

Why bytes.Equal can cause the flake: proto.Marshal is not deterministic. In particular, Go randomizes map iteration order at runtime, so identical resources can marshal to different byte sequences in different runs.
When this happens, bytes.Equal returns false for semantically identical data. Switching to proto.Equal fixes this by performing a semantic comparison rather than a byte-for-byte one.

I have verified that this change maintains correctness for valid updates and doesn't introduce any regressions in the existing test suite.

Since I'm not certain the flake is caused by this, we can monitor the flaky test on GitHub for some time after this PR gets merged.

arjan-bal (Contributor) commented Jan 16, 2026

I was able to repro the failure in 2*10^5 runs on forge. I've shared the command offline. Can you use debug logs to verify that the serialized bytes are different but the comparison using proto.Equal returns true?

From the logs, it seems to me like a race in the channel state propagation.

@arjan-bal arjan-bal assigned Pranjali-2501 and unassigned arjan-bal Jan 16, 2026
Pranjali-2501 changed the title from "xds: use proto.Equal for xDS resources equality check" to "Fix flaky TestServer_RedundantUpdateSuppression" on Jan 16, 2026
Pranjali-2501 (Contributor, Author) commented:

> I was able to repro the failure in 2*10^5 runs on forge. Shared the command offline. Can you use debug logs to verify that the serialized bytes are different but the comparison using proto.Equal returns true?
>
> From the logs, it seems to me like a race in the channel state propagation.

I have updated the PR to remove the server-side proto.Equal changes, as they turned out to be unnecessary.

The root cause of the flake was a race condition in the test itself: it checked for connectivity state changes before the client connection had fully stabilized. I added testutils.AwaitState(ctx, t, cc, connectivity.Ready) so that monitoring starts only after the connection is Ready. The test now passes reliably without modifying the server logic.

arjan-bal (Contributor) left a comment


LGTM, nice find! Please address the remaining comment before merging.

Comment thread: xds/server_resource_ext_test.go (Outdated)

    testutils.AwaitState(ctx, t, cc, connectivity.Ready)
    errCh := make(chan error, 1)
    go func() {
        prev := connectivity.Ready // We know we are READY since we just did an RPC.
Contributor

We should remove this trailing comment as it is technically incorrect. Once an LB policy produces a READY picker, two serial events occur:

  1. The picker updates (unblocking queued RPCs).
  2. The channel state updates.

This test appears to hit a race condition where it inspects the state between these two steps—observing a successful RPC while the channel state is still CONNECTING.
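The two serial steps described above can be modeled in a few lines. channel and goReady below are hypothetical stand-ins, not gRPC APIs; the point is that an observer who inspects state between step 1 and step 2 sees a successful RPC alongside a stale CONNECTING state.

```go
package main

import "fmt"

// channel models the two serial publications described above: the picker
// (which gates RPCs) is updated first, the externally visible state second.
type channel struct {
	pickerReady bool
	state       string // "CONNECTING" or "READY"
}

func (c *channel) goReady(observe func(*channel)) {
	c.pickerReady = true // step 1: queued RPCs can now succeed
	observe(c)           // a test inspecting the channel at this instant...
	c.state = "READY"    // step 2: state becomes visible as READY
}

func main() {
	c := &channel{state: "CONNECTING"}
	c.goReady(func(c *channel) {
		// ...sees a successful RPC but a stale CONNECTING state.
		fmt.Printf("RPC possible: %v, observed state: %s\n", c.pickerReady, c.state)
	})
	// prints: RPC possible: true, observed state: CONNECTING
}
```

This is why "we just did an RPC, so the state is READY" does not hold, and why the test instead waits for the READY state explicitly.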

Contributor Author

Removed the trailing comment.

@arjan-bal arjan-bal assigned Pranjali-2501 and unassigned arjan-bal Jan 19, 2026
Pranjali-2501 merged commit bd4444a into grpc:master on Jan 19, 2026
14 checks passed
mbissa pushed a commit to mbissa/grpc-go that referenced this pull request Feb 16, 2026

Successfully merging this pull request may close these issues:

Flaky Test: 4/100K: Test/Server_RedundantUpdateSuppression