[Pub/Sub] Publisher stops publishing after RetryError Deadline of 600 exceeded error not being surfaced #7822

@jam182

Description

The publisher client often stops publishing without surfacing any error: the exception in the stacktrace below is raised inside the library and never reaches our code. As a result, the application hangs without failing any health check. For now we have set up an alert that fires when this log entry appears in Stackdriver, but that is a poor workaround.
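To make the missing health-check signal concrete, here is a minimal sketch of an application-side guard. It does not fix the underlying problem. It assumes that the future returned by `publish()` accepts a `timeout` argument in `result()` and raises when the timeout expires. `PublishHealth`, `wait`, `is_healthy` and `max_wait_seconds` are hypothetical names, and a plain `concurrent.futures.Future` stands in for the Pub/Sub future so the sketch runs without the library.

```python
import threading
from concurrent.futures import Future  # stand-in for the future returned by publish()


class PublishHealth(object):
    """Hypothetical helper: records whether the last publish finished in time."""

    def __init__(self, max_wait_seconds=30.0):
        self.max_wait_seconds = max_wait_seconds
        self._lock = threading.Lock()
        self._healthy = True
        self._last_error = None

    def wait(self, future):
        """Block on the future, marking the publisher unhealthy on any failure."""
        try:
            # Assumption: result() accepts a timeout and raises once it expires.
            message_id = future.result(timeout=self.max_wait_seconds)
        except Exception as exc:
            with self._lock:
                self._healthy = False
                self._last_error = exc
            raise
        with self._lock:
            self._healthy = True
            self._last_error = None
        return message_id

    def is_healthy(self):
        """Value a health-check endpoint could report."""
        with self._lock:
            return self._healthy


if __name__ == "__main__":
    health = PublishHealth(max_wait_seconds=0.1)
    stuck = Future()  # never resolved, like a publish whose batch thread died
    try:
        health.wait(stuck)
    except Exception as exc:
        print("publish did not complete: %r" % (exc,))
    print("healthy: %s" % health.is_healthy())
```

A health-check handler could return a failing status whenever `is_healthy()` is false, so the process gets restarted instead of hanging silently. Blocking on every future costs throughput, so a real service would more likely do this from a callback or only for a sample of messages.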

Stacktrace

Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/site-packages/google/cloud/pubsub_v1/publisher/_batch/thread.py", line 259, in monitor
    return self._commit()
  File "/usr/lib/python2.7/site-packages/google/cloud/pubsub_v1/publisher/_batch/thread.py", line 207, in _commit
    self._messages,
  File "/usr/lib/python2.7/site-packages/google/cloud/pubsub_v1/gapic/publisher_client.py", line 398, in publish
    request, retry=retry, timeout=timeout, metadata=metadata)
  File "/usr/lib/python2.7/site-packages/google/api_core/gapic_v1/method.py", line 143, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/google/api_core/retry.py", line 270, in retry_wrapped_func
    on_error=on_error,
  File "/usr/lib/python2.7/site-packages/google/api_core/retry.py", line 199, in retry_target
    last_exc,
  File "/usr/lib/python2.7/site-packages/six.py", line 737, in raise_from
    raise value
RetryError: Deadline of 600.0s exceeded while calling <functools.partial object at 0x7f34dc9c9838>, last exception: 503 Connect Failed

OS type and version

Alpine 3.8

Python version and virtual environment information: python --version

2.7.14

pip freeze

bernhard==0.2.6
boto==2.27.0
cachetools==2.1.0
certifi==2019.3.9
chardet==3.0.4
click-replayer==0.0.1
configobj==4.7.2
debugtrace==0.0.1
enum34==1.1.6
funcsigs==1.0.2
futures==3.2.0
google-api-core==1.9.0
google-auth==1.6.3
google-cloud-pubsub==0.38.0
googleapis-common-protos==1.5.9
grpc-google-iam-v1==0.11.4
grpcio==1.20.0
idna==2.8
meld3==1.0.2
mock==2.0.0
MySQL-python==1.2.5
pbr==5.1.3
protobuf==3.7.1
py==1.8.0
pyasn1==0.4.5
pyasn1-modules==0.2.4
pygeoip==0.2.7
pyparsing==2.2.0
pytest==3.2.2
python-dateutil==2.2
pytz==2019.1
raven==4.0.4
redirecting==0.0.1
redis==2.10.1
requests==2.21.0
rsa==4.0
six==1.12.0
skimpubsub==1.1.4
SQLAlchemy==1.1.15
supervisor==3.2.4
ua-parser==0.8.0
ujson==1.35
urllib3==1.24.2
user-agents==1.1.0
uWSGI==2.0.14
Werkzeug==0.9.6

Extra

It happens most often in asia-southeast1-b, but we also see it in all other regions (US, EU, etc.).

Labels

api: pubsub (Issues related to the Pub/Sub API.)
priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
