PubSub: Fix pubsub Streaming Pull shutdown on RetryError #7863
Merged
If a RetryError occurs, it is time to stop waiting for the underlying gRPC channel to recover from a transient failure, and a clean shutdown needs to be triggered. This commit ensures that this happens (previously, a clean shutdown was triggered on terminal channel errors only).
tseaver approved these changes May 7, 2019
Contributor Author: As discussed offline, the failing reCAPTCHA Enterprise build is not related, and we agreed to merge this.
parthea pushed a commit that referenced this pull request Mar 2, 2026
Closes #7709.
If a gRPC channel is in TRANSIENT_FAILURE state for too long, the retry timeout configured in subscriber client config kicks in, and a RetryError is raised in a background thread, but the client keeps running, and the error is not propagated to the top level code.
This PR makes sure that the following happens:
[...] future.result(), allowing the user code a chance to catch the error and react to it.
How to test
I was not able to reproduce the actual error users reported in a real setup (a sample pubsub app deployed to K8s), but figured out what is probably happening and faked the error.
Steps to reproduce:
[...] grpc dependency in your local Python environment, example:
[...] total_timeout_millis setting in subscriber client config to 10 (seconds... in order to not wait for too long)
Actual result (before the fix):
A RetryError occurs in the background after ~10 seconds, some of the threads exit, but the subscriber client keeps running, and the error is not propagated to the main thread (the future returned by the subscribe() method is not resolved).
Expected result (after the fix):
Everything gets shut down cleanly, and RetryError is propagated to and raised in the main code.
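To illustrate what the expected result means for user code, here is a minimal, self-contained sketch of the pattern. It does not use the Pub/Sub client: `fake_subscribe` and the local `RetryError` class are hypothetical stand-ins for `subscribe()` and `google.api_core.exceptions.RetryError`, and a plain `concurrent.futures.Future` plays the role of the future that `subscribe()` returns. The only point shown is that an error raised in a background thread reaches the main thread through `future.result()`.

```python
import threading
import time
from concurrent.futures import Future


class RetryError(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.RetryError."""


def fake_subscribe(fail_after: float) -> Future:
    """Hypothetical stand-in for subscribe().

    Returns a future immediately and fails it from a background thread,
    imitating the retry timeout expiring while the channel is unhealthy.
    """
    future: Future = Future()

    def background() -> None:
        # Imitates waiting for the channel to recover until the deadline.
        time.sleep(fail_after)
        # Resolving the future with the error is what lets the main thread see it.
        future.set_exception(RetryError("deadline exceeded while retrying"))

    threading.Thread(target=background, daemon=True).start()
    return future


def run(fail_after: float = 0.05) -> str:
    future = fake_subscribe(fail_after)
    try:
        # Blocks the main thread until the background work ends.
        future.result(timeout=5)
    except RetryError as exc:
        # User code can react here: log, alert, recreate the client, or exit.
        return f"subscriber stopped: {exc}"
    return "subscriber finished without error"


if __name__ == "__main__":
    print(run())
```

With the real client, the same `try`/`except` would wrap the `result()` call on the future returned by `subscribe()`. Before the fix described above, that call would simply keep blocking, because the future was never resolved.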