Skip to content

PYTHON-644 - Timer logic#690

Merged
mambocab merged 2 commits into
apache:masterfrom
mambocab:timer-logic
Feb 8, 2017
Merged

PYTHON-644 - Timer logic#690
mambocab merged 2 commits into
apache:masterfrom
mambocab:timer-logic

Conversation

@mambocab
Copy link
Copy Markdown
Contributor

First commit reverts part of the fix for PYTHON-367 (#379) that put _start_timer calls in code that could be called from secondary threads. This caused a race between timer cancellation and that _start_timer call.

The second commit adds a _cancel_timer call before defuncting connections if the ResponseFuture isn't going to retry the request. Commit message explaining this reproduced below:

This allows us to revert some of the fix to PYTHON-367 without the timer
heap growing out of control as described on that ticket. We cancel
before defuncting to ensure that the timer doesn't hang around while
running callbacks. Between this change and the delegation of callbacks
to a secondary thread (see commit 5192302, part of the original
PYTHON-367 fix), this commit should not cause a regression on
PYTHON-367.

Note that we don't cancel timers before defuncting and retrying -- we're
not done with the logical request and still want that timer to fire.

Now that timers are started on ResponseFuture initialization, we
actually use the value of timeout in the tests, so we no longer pass
that argument, allowing the test to use the default value.

This reverts commit 1a01f11 for
PYTHON-644. The "create timer on request" strategy can let timers leak.
This allows us to revert some of the fix to PYTHON-367 without the timer
heap growing out of control as described on that ticket. We cancel
before defuncting to ensure that the timer doesn't hang around while
running callbacks. Between this change and the delegation of callbacks
to a secondary thread (see commit 5192302, part of the original
PYTHON-367 fix), this commit should not cause a regression on
PYTHON-367.

Note that we don't cancel timers before defuncting and retrying -- we're
not done with the logical request and still want that timer to fire.

Now that timers are started on ResponseFuture initialization, we
actually use the value of timeout in the tests, so we no longer pass
that argument, allowing the test to use the default value.
@mambocab
Copy link
Copy Markdown
Contributor Author

Travis tests on my fork all passed, except the expected Python 2.6 errors.

@aboudreault aboudreault changed the title Timer logic PYTHON-644 - Timer logic Jan 24, 2017
@aboudreault
Copy link
Copy Markdown
Contributor

LGTM Jim! I tested and confirm it fixes the issue. (Tested with libev and asyncore)

Comment thread cassandra/cluster.py
# we got some other kind of response message
msg = "Got unexpected message: %r" % (response,)
exc = ConnectionException(msg, host)
self._cancel_timer()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if this line is really necessary, as the timer will be canceled inside inside
self._set_final_exception(exc)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to cancel as quickly as possible -- without waiting for callbacks to execute in defunct below -- to avoid a pileup of timers when clients don't back off requests on a failing connection as described in PYTHON-367. Do you think this accomplishes that goal?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense

@mambocab mambocab merged commit 115bdbd into apache:master Feb 8, 2017
@mambocab mambocab deleted the timer-logic branch February 8, 2017 20:01
@mambocab mambocab mentioned this pull request Feb 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants