SubscriberClient.pull currently uses a default timeout of 12 seconds. If there are no messages, it repeatedly raises DeadlineExceeded exceptions, which callers have to catch and ignore. This opens a race condition between the server and the client:
- Client calls pull on an empty subscription.
- A message is published to the corresponding topic.
- The server decides to hand this message to the client.
- The client times out.
The message is now stuck until the ack deadline expires.
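The sequence above can be illustrated with a toy timeline model. This is only a sketch under stated assumptions, not a description of how the service actually behaves: the function name pull_outcome, the latency parameter, and the rule that the server learns of a client timeout one network latency later are all hypothetical. The 12 and 18 second figures are the ones observed in this report.

```python
def pull_outcome(handoff_at, latency, client_timeout, server_timeout):
    """Classify one pull request in a toy timeline.

    All times are seconds since the pull started. handoff_at is when the
    server decides to hand a message to this request (None if no message
    arrives). latency is an assumed one-way network delay.
    """
    if handoff_at is not None and handoff_at < server_timeout:
        # The response reaches the client before its deadline.
        if handoff_at + latency <= client_timeout:
            return "delivered"
        # The server committed the message before it could learn that the
        # client gave up, so the message waits for the ack deadline.
        if handoff_at < client_timeout + latency:
            return "stranded"
    # No message was handed off: whoever times out first ends the request.
    if server_timeout + latency <= client_timeout:
        return "empty"
    return "client_timeout"


# Observed values from this report: 12 s client default, ~18 s server timeout.
for client_timeout in (12.0, 30.0):
    for handoff_at in (5.0, 11.98, None):
        outcome = pull_outcome(handoff_at, 0.05, client_timeout, 18.0)
        print(client_timeout, handoff_at, outcome)
```

In this model a message can only be stranded while the client deadline is shorter than the server timeout plus the latency, which is the reasoning behind the fix proposed below.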
Solution
Ensure that clients set a deadline that is longer than the server's timeout. The server currently appears to time out after 18 seconds, so a deadline of 20-30 seconds would probably be sufficient. (This probably means the "bug" is in the pubsub gapic configuration, in which case it likely affects all official pubsub client libraries, not just the Python one.)
As a side effect, pull subscribers on an "idle" subscription will have "successful" pull requests recorded in stackdriver instead of cancelled pull requests.
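A minimal sketch of the workaround, assuming the positional pull(subscription, max_messages, ..., timeout=...) signature of the 0.41-era Python client (newer releases may expect a different call shape). The helper name pull_once and the constant CLIENT_PULL_TIMEOUT are made up for illustration, and 30 seconds is just the upper end of the range suggested above.

```python
# Assumption: the server ends an empty pull after roughly 18 s (observed in
# this report, not a documented guarantee), so 30 s leaves some headroom.
CLIENT_PULL_TIMEOUT = 30.0


def pull_once(client, subscription_path, max_messages=10,
              timeout=CLIENT_PULL_TIMEOUT):
    """Pull with a client deadline longer than the server's own timeout.

    client is expected to behave like pubsub_v1.SubscriberClient. With the
    longer deadline, an idle subscription should yield an empty response
    rather than a DeadlineExceeded exception.
    """
    response = client.pull(
        subscription_path, max_messages=max_messages, timeout=timeout
    )
    return list(response.received_messages)


# Example wiring (requires google-cloud-pubsub; "my-project" is a placeholder):
#   from google.cloud import pubsub_v1
#   client = pubsub_v1.SubscriberClient()
#   path = client.subscription_path("my-project", "deleteme_subscription")
#   messages = pull_once(client, path)
```

Passing the client in, rather than constructing it inside the helper, keeps the sketch independent of any particular library version.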
Environment details
OS: Linux, ContainerOS (GKE); container is Debian 9 (using distroless)
Python: 3.5.3
API: google-cloud-python 0.41.0
Steps to reproduce
- Create a subscription with the maximum ack deadline:
gcloud pubsub subscriptions create deleteme_subscription --topic=deleteme_test --ack-deadline=600
- Create ~20 subscribers calling pull on this subscription in a loop.
- Publish 1 message a second to the corresponding topic.
- Watch the stackdriver oldest_unacked_message_age metric. Eventually, you will see some messages get "stuck" and this metric begin to increase.
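The subscriber side of these steps might look roughly like the sketch below. It is not the exact script used for this report: the helper names subscriber_loop and start_subscribers are hypothetical, the pull and acknowledge calls assume the 0.41-era positional signatures, and sharing one client across threads is an assumption. The timeout exception type is passed in; with the real library it would be google.api_core.exceptions.DeadlineExceeded.

```python
import threading


def subscriber_loop(client, subscription_path, stop_event, timeout_error):
    """Pull one message at a time with the default timeout until stopped."""
    while not stop_event.is_set():
        try:
            response = client.pull(subscription_path, max_messages=1)
        except timeout_error:
            # Idle subscription: the default client deadline expired.
            continue
        ack_ids = [received.ack_id for received in response.received_messages]
        if ack_ids:
            client.acknowledge(subscription_path, ack_ids)


def start_subscribers(client, subscription_path, timeout_error, count=20):
    """Start `count` pulling threads; returns (threads, stop_event)."""
    stop_event = threading.Event()
    threads = [
        threading.Thread(
            target=subscriber_loop,
            args=(client, subscription_path, stop_event, timeout_error),
            daemon=True,
        )
        for _ in range(count)
    ]
    for thread in threads:
        thread.start()
    return threads, stop_event
```

Calling stop_event.set() and joining the threads ends the run; every message is acknowledged immediately, so any growth in oldest_unacked_message_age should come from messages that never reached a subscriber.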