Multiple fixes to speed up querys and remove exceptions at shutdown#45
Conversation
When querying for a service type, the response is processed. During the processing, an info lookup is performed. If the info is not found in the cache, then a query is sent. Trouble is that the info requested is present in the same packet that triggered the lookup, and a query is not necessary. But two problems caused the cache lookup to fail. 1) The info was not yet in the cache. The call back was fired before all answers in the packet were cached. 2) The test for a cache hit did not work, because the cache hit test uses a DNSEntry as the comparison object. But some of the objects in the cache are descendents of DNSEntry and have their own __eq__() defined which accesses fields only present on the descendent. Thus the test can NEVER work since the descendent's __eq__() will be used. Also continuing the theme of some other recent pull requests, add three _GLOBAL_DONE tests to avoid doing work after the attempted stop, and thus avoid generating (harmless, but annoying) exceptions during shutdown
Remove unnecessary packet send in ServiceInfo.request()Added another commit. My understanding of this code was not complete when writing up the previous description. The two fixes above are needed and improved the situation, but more is necessary if a simple query isn't going to generally take 5-10 seconds: When performing an info query via request(), a listener is started, and a packet is formed. As the packet is formed, known answers are taken from the cache and placed into the packet. Then the packet is sent. The packet is self received (via multicast loopback, I assume). At that point the listener is fired and the answers in the packet are propagated back to the object that started the request. This is a really long way around the barn. This PR queries the cache directly in request() and then calls update_record(). If all of the information is in the cache, then no packet is formed or sent or received. This approach was taken because, for whatever reason, the reception of the packets on windows via the loopback was proving to be unreliable. The method has the side benefit of being a whole lot faster. This PR also incorporates the joins() from PR #30. In addition it moves the two joins() in close() to their own thread because they can take quite a while to execute. |
Fix locking race condition in Engine.run()This commit fixes a race condition in which the receive engine was waiting When fixing the race condition, decided to also fix the other Finally, this allowed reorganizing the shutdown code to shutdown from |
Shutdown the service listeners in an organized fashionThe zc.close() was not doing a join() to these threads, and thus could leave some callbacks half done on termination, because they are daemon threads and would quit when others quit. The check in also adds names to the various threads to make debugging easier. This should be my last commit on this topic of thread safety. The code I am using this with is now running well. If you need any assistance in integrating this, please let me know. |
When performing an info query via request(), a listener is started, and a packet is formed. As the packet is formed, known answers are taken from the cache and placed into the packet. Then the packet is sent. The packet is self received (via multicast loopback, I assume). At that point the listener is fired and the answers in the packet are propagated back to the object that started the request. This is a really long way around the barn. The PR queries the cache directly in request() and then calls update_record(). If all of the information is in the cache, then no packet is formed or sent or received. This approach was taken because, for whatever reason, the reception of the packets on windows via the loopback was proving to be unreliable. The method has the side benefit of being a whole lot faster. This PR also incorporates the joins() from PR python-zeroconf#30. In addition it moves the two joins() in close() to their own thread because they can take quite a while to execute.
This fixes a race condition in which the receive engine was waiting against its condition variable under a different lock than the one it used to determine if it needed to wait. This was causing the code to sometimes take 5 seconds to do anything useful. When fixing the race condition, decided to also fix the other correctness issues in the loop which was likely causing the errors that led to the inclusion of the 'except Exception' catch all. This in turn allowed the use of EBADF error due to closing the socket during exit to be used to get out of the select in a timely manner. Finally, this allowed reorganizing the shutdown code to shutdown from the front to the back. That is to say, shutdown the recv socket first, which then allows a clean join with the engine thread. After the engine thread exits most everything else is inert as all callbacks have been unwound.
With the restructure of shutdown, Listener() now needs to throw EBADF on a closed socket to allow a timely and graceful shutdown.
Also adds names to the various threads to make debugging easier.
Add more needed shutdown cleanup found via additional test coverage. Force timeout calculation from milli to seconds to use floating point.
Improve test coverage from 76% to 91%Add more needed shutdown cleanup found via additional test coverage. Force timeout calculation from milliseconds to seconds to use floating point. Initialize ServiceInfo._properties In the last commit note, I said it was likely the last one on this PR. Well this time I really mean it. |
Fix ability for a cache lookup to match properly
When querying for a service type, the response is processed. During the
processing, an info lookup is performed. If the info is not found in
the cache, then a query is sent. Trouble is that the info requested is
present in the same packet that triggered the lookup, and a query is not
necessary. But two problems caused the cache lookup to fail.
The info was not yet in the cache. The call back was fired before
all answers in the packet were cached.
The test for a cache hit did not work, because the cache hit test
uses a DNSEntry as the comparison object. But some of the objects in
the cache are descendents of DNSEntry and have their own eq()
defined which accesses fields only present on the descendent. Thus the
test can NEVER work since the descendent's eq() will be used.
Also continuing the theme of some other recent pull requests, add three
_GLOBAL_DONE tests to avoid doing work after the attempted stop, and
thus avoid generating (harmless, but annoying) exceptions during
shutdown