Skip to content

Multiple fixes to speed up querys and remove exceptions at shutdown#45

Merged
stephenrauch merged 7 commits into
python-zeroconf:masterfrom
stephenrauch:master
Jun 20, 2016
Merged

Multiple fixes to speed up querys and remove exceptions at shutdown#45
stephenrauch merged 7 commits into
python-zeroconf:masterfrom
stephenrauch:master

Conversation

@stephenrauch
Copy link
Copy Markdown
Collaborator

Fix ability for a cache lookup to match properly

When querying for a service type, the response is processed. During the
processing, an info lookup is performed. If the info is not found in
the cache, then a query is sent. Trouble is that the info requested is
present in the same packet that triggered the lookup, and a query is not
necessary. But two problems caused the cache lookup to fail.

  1. The info was not yet in the cache. The call back was fired before
    all answers in the packet were cached.

  2. The test for a cache hit did not work, because the cache hit test
    uses a DNSEntry as the comparison object. But some of the objects in
    the cache are descendents of DNSEntry and have their own eq()
    defined which accesses fields only present on the descendent. Thus the
    test can NEVER work since the descendent's eq() will be used.

Also continuing the theme of some other recent pull requests, add three
_GLOBAL_DONE tests to avoid doing work after the attempted stop, and
thus avoid generating (harmless, but annoying) exceptions during
shutdown

When querying for a service type, the response is processed.  During the
processing, an info lookup is performed.  If the info is not found in
the cache, then a query is sent.  Trouble is that the info requested is
present in the same packet that triggered the lookup, and a query is not
necessary.  But two problems caused the cache lookup to fail.

1) The info was not yet in the cache.  The call back was fired before
all answers in the packet were cached.

2) The test for a cache hit did not work, because the cache hit test
uses a DNSEntry as the comparison object.  But some of the objects in
the cache are descendents of DNSEntry and have their own __eq__()
defined which accesses fields only present on the descendent.  Thus the
test can NEVER work since the descendent's __eq__() will be used.

Also continuing the theme of some other recent pull requests, add three
_GLOBAL_DONE tests to avoid doing work after the attempted stop, and
thus avoid generating (harmless, but annoying) exceptions during
shutdown
@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage decreased (-0.6%) to 76.031% when pulling 7403dd7 on stephenrauch:master into f33b8f9 on jstasiak:master.

@stephenrauch
Copy link
Copy Markdown
Collaborator Author

Remove unnecessary packet send in ServiceInfo.request()

Added another commit. My understanding of this code was not complete when writing up the previous description. The two fixes above are needed and improved the situation, but more is necessary if a simple query isn't going to generally take 5-10 seconds:

When performing an info query via request(), a listener is started, and a packet is formed. As the packet is formed, known answers are taken from the cache and placed into the packet. Then the packet is sent. The packet is self received (via multicast loopback, I assume). At that point the listener is fired and the answers in the packet are propagated back to the object that started the request. This is a really long way around the barn.

This PR queries the cache directly in request() and then calls update_record(). If all of the information is in the cache, then no packet is formed or sent or received. This approach was taken because, for whatever reason, the reception of the packets on windows via the loopback was proving to be unreliable. The method has the side benefit of being a whole lot faster.

This PR also incorporates the joins() from PR #30. In addition it moves the two joins() in close() to their own thread because they can take quite a while to execute.

@stephenrauch stephenrauch changed the title Fix ability for a cache lookup to match properly Multiple fixes to speed up querys and remove exceptions at shutdown Mar 22, 2016
@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage decreased (-0.3%) to 76.353% when pulling 9b5e4cb on stephenrauch:master into f33b8f9 on jstasiak:master.

@stephenrauch
Copy link
Copy Markdown
Collaborator Author

Fix locking race condition in Engine.run()

This commit fixes a race condition in which the receive engine was waiting
against its condition variable under a different lock than the one it
used to determine if it needed to wait. This was causing the code to
sometimes take 5 seconds to do anything useful.

When fixing the race condition, decided to also fix the other
correctness issues in the loop which was likely causing the errors that
led to the inclusion of the 'except Exception' catch all. This in turn
allowed the use of EBADF error due to closing the socket during exit to
be used to get out of the select in a timely manner.

Finally, this allowed reorganizing the shutdown code to shutdown from
the front to the back. That is to say, shutdown the recv socket first,
which then allows a clean join with the engine thread. After the engine
thread exits, most everything else is inert as all callbacks have been
unwound.

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage decreased (-0.2%) to 76.424% when pulling 2f9d535 on stephenrauch:master into f33b8f9 on jstasiak:master.

@stephenrauch
Copy link
Copy Markdown
Collaborator Author

Shutdown the service listeners in an organized fashion

The zc.close() was not doing a join() to these threads, and thus could leave some callbacks half done on termination, because they are daemon threads and would quit when others quit.

The check in also adds names to the various threads to make debugging easier.

This should be my last commit on this topic of thread safety. The code I am using this with is now running well. If you need any assistance in integrating this, please let me know.

@stephenrauch
Copy link
Copy Markdown
Collaborator Author

In doing a bit of review of the other open pull requests, I am pretty sure the code here covers the functionality of the PRs #20, #30 and #41. In addition it fixes Issue #47.

When performing an info query via request(), a listener is started, and
a packet is formed. As the packet is formed, known answers are taken
from the cache and placed into the packet.  Then the packet is sent.
The packet is self received (via multicast loopback, I assume).  At that
point the listener is fired and the answers in the packet are propagated
back to the object that started the request.  This is a really long way
around the barn.

The PR queries the cache directly in request() and then calls
update_record().  If all of the information is in the cache, then no
packet is formed or sent or received.  This approach was taken because,
for whatever reason, the reception of the packets on windows via the
loopback was proving to be unreliable.  The method has the side benefit
of being a whole lot faster.

This PR also incorporates the joins() from PR python-zeroconf#30.  In addition it moves
the two joins() in close() to their own thread because they can take
quite a while to execute.
This fixes a race condition in which the receive engine was waiting
against its condition variable under a different lock than the one it
used to determine if it needed to wait.  This was causing the code to
sometimes take 5 seconds to do anything useful.

When fixing the race condition, decided to also fix the other
correctness issues in the loop which was likely causing the errors that
led to the inclusion of the 'except Exception' catch all.  This in turn
allowed the use of EBADF error due to closing the socket during exit to
be used to get out of the select in a timely manner.

Finally, this allowed reorganizing the shutdown code to shutdown from
the front to the back.  That is to say, shutdown the recv socket first,
which then allows a clean join with the engine thread.  After the engine
thread exits most everything else is inert as all callbacks have been
unwound.
With the restructure of shutdown, Listener() now needs to throw EBADF on
a closed socket to allow a timely and graceful shutdown.
Also adds names to the various threads to make debugging easier.
Add more needed shutdown cleanup found via additional test coverage.

Force timeout calculation from milli to seconds to use floating point.
@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage increased (+13.5%) to 90.11% when pulling 75232cc on stephenrauch:master into f33b8f9 on jstasiak:master.

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage increased (+14.2%) to 90.818% when pulling d909942 on stephenrauch:master into f33b8f9 on jstasiak:master.

@stephenrauch
Copy link
Copy Markdown
Collaborator Author

Improve test coverage from 76% to 91%

Add more needed shutdown cleanup found via additional test coverage.

Force timeout calculation from milliseconds to seconds to use floating point.

Initialize ServiceInfo._properties

In the last commit note, I said it was likely the last one on this PR. Well this time I really mean it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants