PYTHON-853: delay timeout#899
Conversation
| # PYTHON-853: for short timeouts, we sometimes race with our __init__ | ||
| if self._connection is None: | ||
| self._timer = self.session.cluster.connection_class.create_timer(0.01, self._on_timeout) | ||
|
|
There was a problem hiding this comment.
Should there be a return here?
There was a problem hiding this comment.
I like this patch since it's really a solution. But I think it can get into a loop sometimes, for example, against only one C* node:
import time
import logging
log = logging.getLogger()
log.setLevel('DEBUG')
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s"))
log.addHandler(handler)
from cassandra.cluster import Cluster
cluster = Cluster()
session = cluster.connect()
print(cluster.metadata.all_hosts())
assert len(session._pools.values()) == 1
tuple(session._pools.values())[0].shutdown()
# stuck here
session.execute("SELECT * from system.local", timeout=0)Shutting down the pool tuple(session._pools.values())[0].shutdown() like that is not very legit but it could happen before some execute. In this case self._connection == None and is never set, therefore self._on_timeout() is in a loop calling itself.
2070368 to
5082c6a
Compare
|
(Responding to this comment) Re: |
|
Not sure to understand the example playing with the pool manually.... but I do see the point of looping. just brainstorming, if it doesn't make sense _on_timeout() is called before things are ready, shouldn't ResponseQuery._query set a particular events when mechanisms can start? Note that I'm not totally aware of all the context so only both of you know well this issue. |
|
Yeah, the user shouldn't/won't call shutdown manually, the script calls it that way so it's deterministic and trying to simulate what would happen if it was called internally by the driver. |
|
I agree with a such change for a use case like that yeah. |
| self._timer = self.session.cluster.connection_class.create_timer( | ||
| 0.01, | ||
| partial(self._on_timeout, attempts=attempts + 1) | ||
| ) |
There was a problem hiding this comment.
Since the looping issue is only valid for a particular case, I would tend to leave the main API untouched. Especially because the attempts parameter is only taken into account when connection is None... which is not clear with the function signature. What about something hidden like:
if self._connection is None:
timer_threshold = getattr(self, '_timer_threshold', 0)
self._timer_threshold = timer_threshold + 1
if self._timer_threshold <= 3:
self._timer = self.session.cluster.connection_class.create_timer(
0.01,
self._on_timeout
)
returnThere was a problem hiding this comment.
That'd work just as well, though I prefer using the kwarg. I think the parameter makes it clear that the counter is only ever passed in. With the attribute soluion, it's not obvious that it's never set in some other method.
I'm not too worried about adding to the _on_timeout API since it's internal and since we can provide a default.
If I add a docstring/comment explaining attempts, would that address your concerns? We also could rename it to _attempts to make it clear that you should only use it in rare cases.
There was a problem hiding this comment.
Can't say I agree with this, but that's fine. I just feel that attempts doesn't make sense to be an API parameter, even if it's internal. e.g. From an API perspective, what does it mean? I'm OK with merging anyway... at least we definitively avoid the rare potential looping issue with this. 👍
There was a problem hiding this comment.
My thought was that this is a psuedo-recursive function now, so using a parameter would be a simple way to track depth. Thanks for the feedback, sorry to override it here -- I'll add a comment and merge when the tests look good.
e93ce0b to
8611fd1
Compare
8611fd1 to
3778312
Compare
Tries to solve the same issue as #857.
I'm a little wary of adding a new place where
._timeris set, but I think we're OK:Here, if
_on_timeoutoverwrites._timer, that's fine -- we know._timerisNoneanyway.I was worried about deferring to the event loop and returning from
_on_timeoutwithout the work actually being done. However, _on_speculative_execute is only ever called from the event loop (see below) so I think it's ok to replace the timer and put the timeoout processing on the event loop, since nothing explicitly waits on the completion of_on_speculative_execute.Here, we're definitely ok. If we ever replace
._timer, it's at the end of the execution of the deferred_on_timeoutor_on_speculative_executecall.I believe that we're ok here as well. Since the call ends after
_on_timeoutis called, we can't lose any timers --._timerwill only be set multiple times if_on_timeoutreschedules itself multiple times, and that's fine.I think this covers all the ways that
_on_timeoutis called -- it's always called by_send_request, sometimes via_on_speculative_execute. It also seems to cover all the ways that._timercan be set to a new value.