Commit 2f261b7

Author: Chris Rossi
feat: retry global cache operations on transient errors (#603)
When a global cache operation hits a transient error, such as failing to connect to a Redis or Memcached server, and the operation is running in strict mode, the operation is now retried a number of times before the transient error is raised to the application layer. Implements #601
1 parent 8e4aa82 commit 2f261b7
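
As a rough illustration of the behavior the commit message describes, the sketch below retries an operation on transient errors and reraises the last error once the attempts run out. It is a synchronous, standard-library-only approximation: `TransientError`, `call_with_retries`, and `make_flaky` are hypothetical names, and the deterministic doubling delay is an assumption standing in for the sleep generator from `google.api_core` that the actual change uses inside an NDB tasklet.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a cache connection error."""


def call_with_retries(operation, transient_errors, attempts=5,
                      initial_delay=0.1, max_delay=1.0, sleep=time.sleep):
    """Call ``operation``, retrying on transient errors.

    The last transient error is reraised once ``attempts`` calls have failed.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except transient_errors:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay)  # capped exponential backoff


def make_flaky(failures):
    """Return an operation that fails ``failures`` times, then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("connection refused")
        return "value"

    return operation, state
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting; a caller that wants actual backoff can leave the default `time.sleep` in place.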

3 files changed: +165 -49 lines changed
packages/google-cloud-ndb/google/cloud/ndb/_cache.py

Lines changed: 37 additions & 6 deletions
@@ -16,6 +16,8 @@
 import itertools
 import warnings

+from google.api_core import retry as core_retry
+
 from google.cloud.ndb import _batch
 from google.cloud.ndb import context as context_module
 from google.cloud.ndb import tasklets
@@ -132,9 +134,33 @@ def _handle_transient_errors(read=False):
     Will log as warning or reraise transient errors according to `strict_read` and
     `strict_write` attributes of the global cache and whether the operation is a read or
     a write.
+
+    If in strict mode, will retry the wrapped function up to 5 times before reraising
+    the transient error.
     """

     def wrap(wrapped):
+        def retry(wrapped, transient_errors):
+            @functools.wraps(wrapped)
+            @tasklets.tasklet
+            def retry_wrapper(*args, **kwargs):
+                sleep_generator = core_retry.exponential_sleep_generator(0.1, 1)
+                attempts = 5
+                for sleep_time in sleep_generator:  # pragma: NO BRANCH
+                    # pragma is required because loop never exits normally, it only gets
+                    # raised out of.
+                    attempts -= 1
+                    try:
+                        result = yield wrapped(*args, **kwargs)
+                        raise tasklets.Return(result)
+                    except transient_errors:
+                        if not attempts:
+                            raise
+
+                    yield tasklets.sleep(sleep_time)
+
+            return retry_wrapper
+
         @functools.wraps(wrapped)
         @tasklets.tasklet
         def wrapper(*args, **kwargs):
@@ -145,17 +171,22 @@ def wrapper(*args, **kwargs):
                     cache.clear()
                     cache.clear_cache_soon = False

-                result = yield wrapped(*args, **kwargs)
+                is_read = read
+                if not is_read:
+                    is_read = kwargs.get("read", False)
+
+                strict = cache.strict_read if is_read else cache.strict_write
+                if strict:
+                    function = retry(wrapped, cache.transient_errors)
+                else:
+                    function = wrapped
+
+                result = yield function(*args, **kwargs)
                 raise tasklets.Return(result)

             except cache.transient_errors as error:
                 cache.clear_cache_soon = True

-                strict_read = read
-                if not strict_read:
-                    strict_read = kwargs.get("read", False)
-                strict = cache.strict_read if strict_read else cache.strict_write
-
                 if strict:
                     raise

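The flag selection in the hunk above decides whether an operation is retried at all: an operation counts as a read if either the decorator argument or a `read` keyword on the call says so, and the matching strictness flag is then consulted. The following is a minimal sketch of that selection in isolation. `FakeCache` and `is_strict` are hypothetical names used only for illustration, and the default flag values are taken from the `from_environment` signatures shown later in this diff.

```python
class FakeCache:
    """Hypothetical cache object exposing only the two strictness flags."""

    def __init__(self, strict_read=False, strict_write=True):
        self.strict_read = strict_read
        self.strict_write = strict_write


def is_strict(cache, decorator_read, call_kwargs):
    """Return True if the operation should be retried and reraised.

    ``decorator_read`` stands in for the ``read`` argument of the decorator;
    ``call_kwargs`` stands in for the keyword arguments of the wrapped call.
    """
    is_read = decorator_read or call_kwargs.get("read", False)
    return cache.strict_read if is_read else cache.strict_write
```

With the default flags, writes are strict and reads are not, so only writes would go through the retry path in this sketch.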
packages/google-cloud-ndb/google/cloud/ndb/global_cache.py

Lines changed: 71 additions & 27 deletions
@@ -49,11 +49,20 @@ class GlobalCache(object):
     Attributes:
         strict_read (bool): If :data:`False`, transient errors that occur as part of a
             entity lookup operation will be logged as warnings but not raised to the
-            application layer.
+            application layer. If :data:`True`, in the event of transient errors, cache
+            operations will be retried a number of times before eventually raising the
+            transient error to the application layer, if it does not resolve after
+            retrying. Setting this to :data:`True` will cause NDB operations to take
+            longer to complete if there are transient errors in the cache layer.
         strict_write (bool): If :data:`False`, transient errors that occur as part of
             a put or delete operation will be logged as warnings, but not raised to the
-            application layer. Setting this to :data:`True` somewhat increases the risk
-            that other clients might read stale data from the cache.
+            application layer. If :data:`True`, in the event of transient errors, cache
+            operations will be retried a number of times before eventually raising the
+            transient error to the application layer if it does not resolve after
+            retrying. Setting this to :data:`False` somewhat increases the risk
+            that other clients might read stale data from the cache. Setting this to
+            :data:`True` will cause NDB operations to take longer to complete if there
+            are transient errors in the cache layer.
     """

     __metaclass__ = abc.ABCMeta
@@ -243,8 +252,12 @@ class RedisCache(GlobalCache):
         strict_read (bool): If :data:`False`, connection errors during read operations
             will be logged with a warning and treated as cache misses, but will not
             raise an exception in the application, with connection errors during reads
-            being treated as cache misses. If :data:`True`, connection errors will be
-            raised as exceptions in the application. Default: :data:`False`.
+            being treated as cache misses. If :data:`True`, in the event of connection
+            errors, cache operations will be retried a number of times before eventually
+            raising the connection error to the application layer, if it does not
+            resolve after retrying. Setting this to :data:`True` will cause NDB
+            operations to take longer to complete if there are transient errors in the
+            cache layer. Default: :data:`False`.
         strict_write (bool): If :data:`False`, connection errors during write
             operations will be logged with a warning, but will not raise an exception in
             the application. If :data:`True`, connection errors during write will be
@@ -253,7 +266,12 @@ class RedisCache(GlobalCache):
             retrieve stale data from the cache. If there is a connection error, an
             internal flag will be set to clear the cache the next time any method is
             called on this object, to try and minimize the opportunity for clients to
-            read stale data from the cache. Default: :data:`True`.
+            read stale data from the cache. If :data:`True`, in the event of connection
+            errors, cache operations will be retried a number of times before eventually
+            raising the connection error to the application layer, if it does not
+            resolve after retrying. Setting this to :data:`True` will cause NDB
+            operations to take longer to complete if there are transient errors in the
+            cache layer. Default: :data:`True`.
     """

     transient_errors = (
@@ -274,9 +292,12 @@ def from_environment(cls, strict_read=False, strict_write=True):
             strict_read (bool): If :data:`False`, connection errors during read
                 operations will be logged with a warning and treated as cache misses,
                 but will not raise an exception in the application, with connection
-                errors during reads being treated as cache misses. If :data:`True`,
-                connection errors will be raised as exceptions in the application.
-                Default: :data:`False`.
+                errors during reads being treated as cache misses. If :data:`True`, in
+                the event of connection errors, cache operations will be retried a
+                number of times before eventually raising the connection error to the
+                application layer, if it does not resolve after retrying. Setting this
+                to :data:`True` will cause NDB operations to take longer to complete if
+                there are transient errors in the cache layer. Default: :data:`False`.
             strict_write (bool): If :data:`False`, connection errors during write
                 operations will be logged with a warning, but will not raise an
                 exception in the application. If :data:`True`, connection errors during
@@ -285,7 +306,12 @@ def from_environment(cls, strict_read=False, strict_write=True):
                 allow other clients to retrieve stale data from the cache. If there is
                 a connection error, an internal flag will be set to clear the cache the
                 next time any method is called on this object, to try and minimize the
-                opportunity for clients to read stale data from the cache. Default:
+                opportunity for clients to read stale data from the cache. If
+                :data:`True`, in the event of connection errors, cache operations will
+                be retried a number of times before eventually raising the connection
+                error to the application layer, if it does not resolve after retrying.
+                Setting this to :data:`True` will cause NDB operations to take longer to
+                complete if there are transient errors in the cache layer. Default:
                 :data:`True`.

         Returns:
@@ -398,20 +424,30 @@ class MemcacheCache(GlobalCache):

     Args:
         client (pymemcache.Client): Instance of Memcache client to use.
-        strict_read (bool): If :data:`False`, connection errors during read operations
-            will be logged with a warning and treated as cache misses, but will not
-            raise an exception in the application, with connection errors during reads
-            being treated as cache misses. If :data:`True`, connection errors will be
-            raised as exceptions in the application. Default: :data:`False`.
+        strict_read (bool): If :data:`False`, connection errors during read
+            operations will be logged with a warning and treated as cache misses,
+            but will not raise an exception in the application, with connection
+            errors during reads being treated as cache misses. If :data:`True`, in
+            the event of connection errors, cache operations will be retried a
+            number of times before eventually raising the connection error to the
+            application layer, if it does not resolve after retrying. Setting this
+            to :data:`True` will cause NDB operations to take longer to complete if
+            there are transient errors in the cache layer. Default: :data:`False`.
         strict_write (bool): If :data:`False`, connection errors during write
-            operations will be logged with a warning, but will not raise an exception in
-            the application. If :data:`True`, connection errors during write will be
-            raised as exceptions in the application. Because write operations involve
-            cache invalidation, setting this to :data:`False` may allow other clients to
-            retrieve stale data from the cache. If there is a connection error, an
-            internal flag will be set to clear the cache the next time any method is
-            called on this object, to try and minimize the opportunity for clients to
-            read stale data from the cache. Default: :data:`True`.
+            operations will be logged with a warning, but will not raise an
+            exception in the application. If :data:`True`, connection errors during
+            write will be raised as exceptions in the application. Because write
+            operations involve cache invalidation, setting this to :data:`False` may
+            allow other clients to retrieve stale data from the cache. If there is
+            a connection error, an internal flag will be set to clear the cache the
+            next time any method is called on this object, to try and minimize the
+            opportunity for clients to read stale data from the cache. If
+            :data:`True`, in the event of connection errors, cache operations will
+            be retried a number of times before eventually raising the connection
+            error to the application layer, if it does not resolve after retrying.
+            Setting this to :data:`True` will cause NDB operations to take longer to
+            complete if there are transient errors in the cache layer. Default:
+            :data:`True`.
     """

     transient_errors = (
@@ -458,9 +494,12 @@ def from_environment(cls, max_pool_size=4, strict_read=False, strict_write=True)
             strict_read (bool): If :data:`False`, connection errors during read
                 operations will be logged with a warning and treated as cache misses,
                 but will not raise an exception in the application, with connection
-                errors during reads being treated as cache misses. If :data:`True`,
-                connection errors will be raised as exceptions in the application.
-                Default: :data:`False`.
+                errors during reads being treated as cache misses. If :data:`True`, in
+                the event of connection errors, cache operations will be retried a
+                number of times before eventually raising the connection error to the
+                application layer, if it does not resolve after retrying. Setting this
+                to :data:`True` will cause NDB operations to take longer to complete if
+                there are transient errors in the cache layer. Default: :data:`False`.
             strict_write (bool): If :data:`False`, connection errors during write
                 operations will be logged with a warning, but will not raise an
                 exception in the application. If :data:`True`, connection errors during
@@ -469,7 +508,12 @@ def from_environment(cls, max_pool_size=4, strict_read=False, strict_write=True)
                 allow other clients to retrieve stale data from the cache. If there is
                 a connection error, an internal flag will be set to clear the cache the
                 next time any method is called on this object, to try and minimize the
-                opportunity for clients to read stale data from the cache. Default:
+                opportunity for clients to read stale data from the cache. If
+                :data:`True`, in the event of connection errors, cache operations will
+                be retried a number of times before eventually raising the connection
+                error to the application layer, if it does not resolve after retrying.
+                Setting this to :data:`True` will cause NDB operations to take longer to
+                complete if there are transient errors in the cache layer. Default:
                 :data:`True`.

         Returns:
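
The docstrings above note that strict mode makes NDB operations slower while the cache layer is failing. A back-of-the-envelope estimate of that extra time can be sketched as follows, assuming un-jittered delays that start at 0.1 s, double each time, and cap at 1 s (the two arguments passed to the sleep generator in `_cache.py`). The real generator in `google.api_core` randomizes its delays, so actual totals will differ. `backoff_delays` and `worst_case_sleep` are hypothetical helper names.

```python
import itertools


def backoff_delays(initial=0.1, maximum=1.0, multiplier=2.0):
    """Yield un-jittered, capped exponential delays (illustrative only)."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay *= multiplier


def worst_case_sleep(attempts=5):
    """Total sleep time when every attempt fails.

    Five attempts means four sleeps: the final failure is reraised
    without sleeping again.
    """
    return sum(itertools.islice(backoff_delays(), attempts - 1))
```

Under these assumptions a fully failing strict operation sleeps for roughly 1.5 seconds in total before the error reaches the application, on top of whatever time each failed cache call itself takes.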
