Test Case and fixes for DNSHInfo#49
Conversation
When querying for a service type, the response is processed. During the processing, an info lookup is performed. If the info is not found in the cache, then a query is sent. Trouble is that the info requested is present in the same packet that triggered the lookup, and a query is not necessary. But two problems caused the cache lookup to fail. 1) The info was not yet in the cache. The call back was fired before all answers in the packet were cached. 2) The test for a cache hit did not work, because the cache hit test uses a DNSEntry as the comparison object. But some of the objects in the cache are descendents of DNSEntry and have their own __eq__() defined which accesses fields only present on the descendent. Thus the test can NEVER work since the descendent's __eq__() will be used. Also continuing the theme of some other recent pull requests, add three _GLOBAL_DONE tests to avoid doing work after the attempted stop, and thus avoid generating (harmless, but annoying) exceptions during shutdown
When performing an info query via request(), a listener is started, and a packet is formed. As the packet is formed, known answers are taken from the cache and placed into the packet. Then the packet is sent. The packet is self received (via multicast loopback, I assume). At that point the listener is fired and the answers in the packet are propagated back to the object that started the request. This is a really long way around the barn. The PR queries the cache directly in request() and then calls update_record(). If all of the information is in the cache, then no packet is formed or sent or received. This approach was taken because, for whatever reason, the reception of the packets on windows via the loopback was proving to be unreliable. The method has the side benefit of being a whole lot faster. This PR also incorporates the joins() from PR python-zeroconf#30. In addition it moves the two joins() in close() to their own thread because they can take quite a while to execute.
This fixes a race condition in which the receive engine was waiting against its condition variable under a different lock than the one it used to determine if it needed to wait. This was causing the code to sometimes take 5 seconds to do anything useful. When fixing the race condition, decided to also fix the other correctness issues in the loop which was likely causing the errors that led to the inclusion of the 'except Exception' catch all. This in turn allowed the use of EBADF error due to closing the socket during exit to be used to get out of the select in a timely manner. Finally, this allowed reorganizing the shutdown code to shutdown from the front to the back. That is to say, shutdown the recv socket first, which then allows a clean join with the engine thread. After the engine thread exits most everything else is inert as all callbacks have been unwound.
With the restructure of shutdown, Listener() now needs to throw EBADF on a closed socket to allow a timely and graceful shutdown.
Also adds names to the various threads to make debugging easier.
Add more needed shutdown cleanup found via additional test coverage. Force timeout calculation from milli to seconds to use floating point.
|
Coverage increased (+15.6%) to 92.174% when pulling 79d0c1eea8f75971361fa072fe767f6eb0ca123d on stephenrauch:Test-HInfo into f33b8f9 on jstasiak:master. |
|
Thank you @stephenrauch. There seem to be multiple commits repeated in this pull request, #48 and #45, many of which don't seem relevant to the pull requests they're part of. May I ask you to leave only the relevant changes in the pull requests, squash commits that belong together (some of the commits break tests and are followed up by commits that fix the issues) and break other commits (that contain multiple unrelated changes, like 75232cc) into a smaller ones? I'd also very much appreciate providing a more detailed description of what's being done in each commit (for example it's not clear to me what's being fixed in 79d0c1e). |
|
Oh, I forgot: inheriting |
|
The repeated changes in this PR are there only because I built them on top of the branch from the previous PR as you noted. That code had yet to be merged, and I really needed to build on top of the previous code because building on top of the code prior to that branch was causing enough errors to be annoying. If the previous pull requests are merged, then this PR only has two commits present on this pull request. Those two submissions were kept separate because one was just formatting changes. The progression is pretty easy to see here: https://github.com/jstasiak/python-zeroconf/network I tried very hard to keep the check-ins monolithic. I have already spent considerable time squashing commits to discrete chucks. I do not believe there are any commits that broke existing test cases. If you are referring to 75232cc, I believe that failure was actually latent, and the new test cases I added, simply exposed the failure, which was a race condition, and had nothing to do with my test case. So I did not squash the check in with the test case in question because it is an independent bug. As a practical matter, 3 1/2 weeks after I started this work there really is no way to modify this progression at this point, short of squashing it all together, and that would be even more confusing as you noted in your commentary on 75232cc. As to inheriting from I don't mind submitting an additional PR to change the asserts to a more specific format needed by nose. Let me know if you would like me to do that. Have you considered using pytest instead? As you noted, it has less need for these special assert methods to be informative. Finally, for the three PRs in question, I wrote about 800 words of change description, so I am really unsure how I could be more detailed. This PR is a bit thin, but I frankly thought the changes in this PR to be documented in the title. The DNSHInfo packet generation was just broken. There was no test case for that functionality, and adding a test case showed four issues. Two of which were relative to unicode, one of which was a reference to an attribute which just didn't exist, and finally the fields used the wrong encoding which is what necessitated the change from write_string() to write_character_string(). So, looking back at it, the encoding failure is not self evident from the changes, but the other three are pretty straight forward in doing a review of the change itself. If you have any specific questions I would be thrilled to answer them, but I don't feel that a blanket statement of "add more detail" is a fair assessment of the work I did here. Cheers, |
| self.server = server | ||
| else: | ||
| self.server = name | ||
| self._properties = {} |
There was a problem hiding this comment.
Race condition in which field is used before it is declared. Latent bug exposed by new test case.
The DNSHInfo packet generation code was broken. There was no test case for that functionality, and adding a test case showed four issues. Two of which were relative to PY3 string, one of which was a typoed reference to an attribute, and finally the two fields present in the HInfo record were using the wrong encoding, which is what necessitated the change from write_string() to write_character_string().
79d0c1e to
175c553
Compare
|
@jstasiak There are several Pull Requests pending. Do you have a plan to get them merged anytime soon? Can I be of help? |
No description provided.