Alternative solution for PYTHON-836 by beltran · Pull Request #891 · apache/cassandra-python-driver

beltran · 2017-12-21T14:58:09Z

This is a different solution #873 and one that hopefully works better. The problem with the previous one was that for example if we are connected to one host cluster and the scheduled speculative execution happens before the regular attempt from the main thread (this is likely to happen since it's scheduled in 0 seconds). The _query_plan from ResponseFuture will have already been exhausted by the speculative executions and the attempt from the main thread will fail with NoHostAvailable.
In this PR _on_speculative_execute schedules itself again if the attempt from the main thread hasn't yet happened

beltran · 2017-12-21T14:59:32Z

+            # We reschedule this call until the main thread has succeeded
+            # making a query
+            if not self.attempted_hosts:
+                self.session.cluster.connection_class.create_timer(0.01, self._on_speculative_execute)


Not sure what delay to set here, we have some time.sleep(0.1) in other parts so I figured it wouldn't be that bad.

beltran · 2017-12-21T17:13:37Z

I've run this branch here a few times without failure, which gives me some confidence that the test won't fail.

mambocab · 2018-01-02T17:30:30Z

So it looks like this build failed with the delay set to 0.001, which implies to me that this basically depends on the .01 sleep providing a wide enough window to work. However, the code itself doesn't look like it could work that way, so maybe I'm wrong. Do you understand what's going on there?

I don't know what to think about the delay-based solution as a whole. This essentially probabilistically adds a .01 delay to speculative executions with delay < .01, yeah?

Some ideas that are floating through my head on this:

Could we try a lock or event-based solution?
Would it break request behavior to break off part of the query plan as "hosts we won't call from spec exec"? I'm guessing it's a bad idea, but I wanna get some ideas going.
Could we just allow spec exec to consume the query plan and have a fallback query plan for the main execution?
This line of thinking makes me wonder how valuable the distinction between speculative executions and non-speculative executions. I know we have to implement it as it exists now, but it's something to think about.

beltran · 2018-01-09T15:57:44Z

That build failed because the test wasn't correct, the patch was being done for:

ResponseFuture._on_speculative_execute = patch_and_count(ResponseFuture._on_speculative_execute)

but with this patch _on_speculative_execute could be called a different number of times on each time ( since it's rescheduled by itself), therefore the test now patches send_request,

ResponseFuture.send_request = patch_and_count(ResponseFuture.send_request)

The reason I changed 0.001 to 0.1 is because I realized the function was rescheduled a lot of times sometimes until the main thread finally sent the main request and the speculative query was sent after that. This is a something bad of this patch, figuring out how much to wait.

I don't know what to think about the delay-based solution as a whole. This essentially probabilistically adds a .01 delay to speculative executions with delay < .01, yeah?

Yes

Could we try a lock or event-based solution?

What I don't like about this is that we would lock the event loop thread since it would have to wait for the main thread. Also I don't like creating another Lock object in such a hot path just for this particular case. I think something good of this implementation is that performance stays the same for the majority of the cases.

Would it break request behavior to break off part of the query plan as "hosts we won't call from spec exec"? I'm guessing it's a bad idea, but I wanna get some ideas going.

I'm not sure about how this would work, if the query_plan was a list we could save the first element for the main thread and use the rest of the speculative execution, but it's an iterator so I don't see an easy way to do this. Also there may be execution plans with only one host.

Could we just allow spec exec to consume the query plan and have a fallback query plan for the main execution?

We could, this could however change the behavior depending on the order.
For example if the execution plan is only one node and a query was sent for the speculative execution to that node (exhausting that query plan), then we would send a new query from the main thread using the fallback query plan, sending two queries in total.
But if the original execution plan was exhausted by the main thread we wouldn't send any speculative execution since there would be no more hosts to query.

This line of thinking makes me wonder how valuable the distinction between speculative executions and non-speculative executions. I know we have to implement it as it exists now, but it's something to think about.

👍

I understand why this patch doesn't look too good. But a good thing about it is that it doesn't change the behavior for non speculative execution and adds just one comparison (if not self.attempted_hosts:) for almost the speculative ones. I don't like as well to reschedule something to be called later instead of waiting with a lock but I think for this particular case it's better. WDYT?

mambocab · 2018-01-09T20:36:18Z

That build failed because the test wasn't correct

Got it, thanks for clearing that up.

... I think for this particular case it's better.

Yeah, you've convinced me -- thanks for having a look and explaining the reasoning. It's easy to say "depending on a hardcoded rescheduling time is bad", but yeah, I agree -- given the way normal and speculative executions are setup, this is better than theoretical blocking or query-plan-altering fixes.

So, +1 to this fix 👍

beltran · 2018-01-10T18:06:23Z

Awesome, thanks for the review!

beltran commented Dec 21, 2017

View reviewed changes

Alternative solution for PYTHON-836

054a183

beltran force-pushed the python-836-2 branch from 3bbc739 to 054a183 Compare December 22, 2017 14:47

beltran merged commit 8ac3fbb into master Jan 10, 2018

beltran deleted the python-836-2 branch January 10, 2018 18:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Alternative solution for PYTHON-836#891

Alternative solution for PYTHON-836#891
beltran merged 1 commit into
masterfrom
python-836-2

beltran commented Dec 21, 2017

Uh oh!

beltran Dec 21, 2017

Uh oh!

beltran commented Dec 21, 2017 •

edited

Loading

Uh oh!

mambocab commented Jan 2, 2018

Uh oh!

beltran commented Jan 9, 2018

Uh oh!

mambocab commented Jan 9, 2018

Uh oh!

beltran commented Jan 10, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

beltran commented Dec 21, 2017

Uh oh!

beltran Dec 21, 2017

Choose a reason for hiding this comment

Uh oh!

beltran commented Dec 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mambocab commented Jan 2, 2018

Uh oh!

beltran commented Jan 9, 2018

Uh oh!

mambocab commented Jan 9, 2018

Uh oh!

beltran commented Jan 10, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

beltran commented Dec 21, 2017 •

edited

Loading