Skip to content

PubSub: RetryError in batch publish causes futures to never complete #7103

@relud

Description

@relud

Environment details

macOS Mojave 10.14.2 and Docker Desktop Community Version 2.0.0.0-mac81 (29211)
Python 3.7.0 from docker image python:3.7.0 digest: sha256:10608fb357a18383f792efbf7472ec6d2e166dad62efc0d7c409ef2777aaafd0
google-cloud-pubsub version 0.39.1 hash sha256:32dbaf9b8c16d4a14b7e1eca2805689f82f69f542a68bf1516b9c77a3ef54b73

Steps to reproduce

  1. Set the environment variable PUBSUB_EMULATOR_HOST to :
  2. Instantiate the client
  3. Publish a message and receive a future
  4. Join all threads to ensure the future should be completed, which takes ten minutes unless configuration is modified
  5. Attempt to get the result from the future with a timeout of 0
  6. Receive a timeout error because the future will never complete

Code example

from functools import partial
from google.api_core.retry import Retry
from google.cloud.pubsub import PublisherClient
from os import environ
import threading

# Set the environment variable "PUBSUB_EMULATOR_HOST" to ":"
environ["PUBSUB_EMULATOR_HOST"] = ":"
# Instantiate the client
client = PublisherClient()
# Configure the client to retry all errors, but run out of time on the first one
client.api.publish = partial(client.api.publish, retry=Retry(predicate=lambda _: True, deadline=0))
# Publish a message and receive a future
future = client.publish("", b"")
# Join all threads to ensure the future should be completed
for thread in set(threading.enumerate()) - {threading.current_thread()}:
    try: thread.join()
    except RetryError: pass  # expected
# Attempt to get the result from the future with a timeout of 0
future.result(0)
# Receive a timeout error because the future will never complete

Stack trace

Exception in thread Thread-MonitorBatchPublisher:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 547, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 466, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Name resolution failure"
debug_error_string = "{"created":"@1546984814.701239700","description":"Failed to create subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":2706,"referenced_errors":[{"created":"@1546984814.701230600","description":"Name resolution failure","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3035,"grpc_status":14}]}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 179, in retry_target
    return target()
  File "/usr/local/lib/python3.7/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.ServiceUnavailable: 503 Name resolution failure

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/publisher/_batch/thread.py", line 254, in monitor
    return self._commit()
  File "/usr/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/publisher/_batch/thread.py", line 201, in _commit
    response = self._client.api.publish(self._topic, self._messages)
  File "/usr/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/gapic/publisher_client.py", line 410, in publish
    request, retry=retry, timeout=timeout, metadata=metadata
  File "/usr/local/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 143, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 270, in retry_wrapped_func
    on_error=on_error,
  File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 199, in retry_target
    last_exc,
  File "<string>", line 3, in raise_from
google.api_core.exceptions.RetryError: Deadline of 0.0s exceeded while calling functools.partial(<function _wrap_unary_errors.<locals>.error_remapped_callable at 0x7f764c2b3b70>, messages {
}
, metadata=[('x-goog-api-client', 'gl-python/3.7.0 grpc/1.17.1 gax/1.7.0 gapic/0.39.1')]), last exception: 503 Name resolution failure

Traceback (most recent call last):
  File "<string>", line 15, in <module>
  File "/usr/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/futures.py", line 110, in result
    err = self.exception(timeout=timeout)
  File "/usr/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/futures.py", line 133, in exception
    raise exceptions.TimeoutError("Timed out waiting for result.")
concurrent.futures._base.TimeoutError: Timed out waiting for result.

Metadata

Metadata

Assignees

Labels

api: pubsubIssues related to the Pub/Sub API.priority: p2Moderately-important priority. Fix may not be included in next release.type: bugError or flaw in code with unintended results or allowing sub-optimal usage patterns.

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions