Alternative solution for PYTHON-836#891
Conversation
| # We reschedule this call until the main thread has succeeded | ||
| # making a query | ||
| if not self.attempted_hosts: | ||
| self.session.cluster.connection_class.create_timer(0.01, self._on_speculative_execute) |
There was a problem hiding this comment.
Not sure what delay to set here, we have some time.sleep(0.1) in other parts so I figured it wouldn't be that bad.
|
I've run this branch here a few times without failure, which gives me some confidence that the test won't fail. |
3bbc739 to
054a183
Compare
|
So it looks like this build failed with the delay set to I don't know what to think about the delay-based solution as a whole. This essentially probabilistically adds a Some ideas that are floating through my head on this:
|
|
That build failed because the test wasn't correct, the patch was being done for: ResponseFuture._on_speculative_execute = patch_and_count(ResponseFuture._on_speculative_execute)but with this patch ResponseFuture.send_request = patch_and_count(ResponseFuture.send_request)The reason I changed
Yes
What I don't like about this is that we would lock the event loop thread since it would have to wait for the main thread. Also I don't like creating another Lock object in such a hot path just for this particular case. I think something good of this implementation is that performance stays the same for the majority of the cases.
I'm not sure about how this would work, if the query_plan was a list we could save the first element for the main thread and use the rest of the speculative execution, but it's an iterator so I don't see an easy way to do this. Also there may be execution plans with only one host.
We could, this could however change the behavior depending on the order.
👍 I understand why this patch doesn't look too good. But a good thing about it is that it doesn't change the behavior for non speculative execution and adds just one comparison ( |
Got it, thanks for clearing that up.
Yeah, you've convinced me -- thanks for having a look and explaining the reasoning. It's easy to say "depending on a hardcoded rescheduling time is bad", but yeah, I agree -- given the way normal and speculative executions are setup, this is better than theoretical blocking or query-plan-altering fixes. So, +1 to this fix 👍 |
|
Awesome, thanks for the review! |
This is a different solution #873 and one that hopefully works better. The problem with the previous one was that for example if we are connected to one host cluster and the scheduled speculative execution happens before the regular attempt from the main thread (this is likely to happen since it's scheduled in
0seconds). The_query_planfromResponseFuturewill have already been exhausted by the speculative executions and the attempt from the main thread will fail withNoHostAvailable.In this PR
_on_speculative_executeschedules itself again if the attempt from the main thread hasn't yet happened