race condition of socket usage #30

Closed
veawor wants to merge 1 commit into python-zeroconf:master from veawor:master

veawor commented Nov 3, 2015

handler(self.zc) in ServiceBrowser.run() and select.select() in Engine.run() sometimes throw socket.error when ServiceBrowser.cancel() and Zeroconf.close() are invoked from another thread. I've added join() calls to wait for the threads to terminate before the socket is closed. Hope this commit makes sense.
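
A minimal sketch of the pattern described here (signal the thread, join it, then close the socket). This is not the code from the commit: `Worker`, `done`, and `close_safely` are hypothetical names, and Python 3 is assumed.

```python
import select
import socket
import threading


class Worker(threading.Thread):
    """Hypothetical stand-in for a thread that polls a socket."""

    def __init__(self, sock):
        super().__init__(daemon=True)
        self.sock = sock
        self.done = threading.Event()
        self.errors = []

    def run(self):
        while not self.done.is_set():
            try:
                # Short timeout so the stop flag is rechecked regularly.
                readable, _, _ = select.select([self.sock], [], [], 0.05)
                for ready in readable:
                    ready.recvfrom(1024)
            except (OSError, ValueError) as exc:
                # Would fire if the socket were closed underneath us.
                self.errors.append(exc)
                return


def close_safely(worker):
    worker.done.set()    # 1. ask the thread to stop
    worker.join()        # 2. wait until it no longer touches the socket
    worker.sock.close()  # 3. only now close the socket


sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
worker = Worker(sock)
worker.start()
close_safely(worker)
```

Because the socket is closed only after `join()` returns, the polling thread never sees a closed descriptor and `worker.errors` stays empty.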

…his prevent the threads from using closed sockets.
Collaborator

jstasiak commented Dec 1, 2015

Thank you for the contribution @veawor. Could you provide some code or a stack trace so I'd understand what the issue is?

Author

veawor commented Dec 1, 2015

Stack trace is as below:
ERROR:zeroconf:Unknown error, possibly benign: error(10009, 'Bad file descriptor')
Traceback (most recent call last):
File "d:\python\zeroconf.py", line 829, in run
File "d:\python\socket.py", line 224, in meth
File "d:\python\socket.py", line 170, in _dummy
error: [Errno 10009] Bad file descriptor
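
The trace is consistent with a thread using a socket that has already been closed. A small single-threaded sketch of the same class of failure, assuming Python 3, where the exception is `OSError` or `ValueError` rather than the Python 2 `socket.error` shown above and the error number varies by platform (`use_after_close` is a hypothetical helper):

```python
import select
import socket


def use_after_close():
    """Return the names of the exceptions raised when a closed socket is used."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.close()
    raised = []
    actions = (
        lambda: sock.recv(1024),                   # read from the closed socket
        lambda: select.select([sock], [], [], 0),  # poll the closed socket
    )
    for action in actions:
        try:
            action()
        except (OSError, ValueError) as exc:
            raised.append(type(exc).__name__)
    return raised


raised = use_after_close()
```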

stephenrauch added a commit to stephenrauch/python-zeroconf that referenced this pull request Mar 21, 2016
When performing an info query via request(), a listener is started, and
a packet is formed. As the packet is formed, known answers are taken
from the cache and placed into the packet.  Then the packet is sent.
The packet is self received (via multicast loopback, I assume).  At that
point the listener is fired and the answers in the packet are propagated
back to the object that started the request.  This is a really long way
around the barn.

The PR queries the cache directly in request() and then calls
update_record().  If all of the information is in the cache, then no
packet is formed or sent or received.  This approach was taken because,
for whatever reason, the reception of the packets on Windows via the
loopback was proving to be unreliable.  The method has the side benefit
of being a whole lot faster.

This PR also incorporates the join() calls from PR python-zeroconf#30.  In
addition it moves the two join() calls in close() to their own thread
because they can take quite a while to execute.
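
The cache-first idea in the commit message above can be sketched roughly as follows. This is an illustration only, not the zeroconf implementation: `CachedResolver`, `cache`, `send_query`, and the record keys are hypothetical, and only the names request() and update_record() come from the message.

```python
class CachedResolver:
    """Hypothetical resolver that consults a cache before sending a query."""

    def __init__(self, cache, send_query):
        self.cache = cache            # mapping of record key -> record value
        self.send_query = send_query  # callable invoked only on a cache miss
        self.records = {}

    def update_record(self, key, value):
        self.records[key] = value

    def request(self, wanted_keys):
        """Fill records from the cache; query only for what is missing."""
        missing = []
        for key in wanted_keys:
            if key in self.cache:
                self.update_record(key, self.cache[key])
            else:
                missing.append(key)
        if missing:
            self.send_query(missing)
        return not missing  # True when the cache answered everything


sent = []
resolver = CachedResolver({"SRV": "host.local.:80", "TXT": "path=/"}, sent.append)
complete = resolver.request(["SRV", "TXT"])
```

When every wanted record is cached, `send_query` is never called, which is the "no packet is formed or sent or received" case.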
stephenrauch added a commit to stephenrauch/python-zeroconf that referenced this pull request Apr 2, 2016
stephenrauch added a commit that referenced this pull request Jun 26, 2016
* Fix ability for a cache lookup to match properly

When querying for a service type, the response is processed.  During the
processing, an info lookup is performed.  If the info is not found in
the cache, then a query is sent.  Trouble is that the info requested is
present in the same packet that triggered the lookup, and a query is not
necessary.  But two problems caused the cache lookup to fail.

1) The info was not yet in the cache.  The call back was fired before
all answers in the packet were cached.

2) The test for a cache hit did not work, because the cache hit test
uses a DNSEntry as the comparison object.  But some of the objects in
the cache are descendants of DNSEntry and have their own __eq__()
defined which accesses fields only present on the descendant.  Thus the
test can NEVER work since the descendant's __eq__() will be used.
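
Problem 2 follows from how Python dispatches `==`: when the right operand's type is a subclass of the left operand's type and overrides `__eq__()`, the subclass method is tried first. A minimal sketch with hypothetical classes (`Entry` and `Address` stand in for DNSEntry and one of its descendants; they are not the zeroconf classes):

```python
class Entry:
    """Hypothetical base record, compared by name only."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Entry) and self.name == other.name


class Address(Entry):
    """Hypothetical descendant with an extra field and its own __eq__()."""

    def __init__(self, name, address):
        super().__init__(name)
        self.address = address

    def __eq__(self, other):
        return (isinstance(other, Address)
                and self.name == other.name
                and self.address == other.address)


probe = Entry("host.local.")
cached = Address("host.local.", "192.0.2.1")

# Address.__eq__ takes priority even though probe is on the left.
operator_result = probe == cached          # False
# Calling the base-class comparison explicitly gives the intended match.
base_result = Entry.__eq__(probe, cached)  # True
```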

Also continuing the theme of some other recent pull requests, add three
_GLOBAL_DONE tests to avoid doing work after the attempted stop, and
thus avoid generating (harmless, but annoying) exceptions during
shutdown

* Remove unnecessary packet send in ServiceInfo.request()

When performing an info query via request(), a listener is started, and
a packet is formed. As the packet is formed, known answers are taken
from the cache and placed into the packet.  Then the packet is sent.
The packet is self received (via multicast loopback, I assume).  At that
point the listener is fired and the answers in the packet are propagated
back to the object that started the request.  This is a really long way
around the barn.

The PR queries the cache directly in request() and then calls
update_record().  If all of the information is in the cache, then no
packet is formed or sent or received.  This approach was taken because,
for whatever reason, the reception of the packets on Windows via the
loopback was proving to be unreliable.  The method has the side benefit
of being a whole lot faster.

This PR also incorporates the join() calls from PR #30.  In addition it
moves the two join() calls in close() to their own thread because they
can take quite a while to execute.

* Fix locking race condition in Engine.run()

This fixes a race condition in which the receive engine was waiting
against its condition variable under a different lock than the one it
used to determine if it needed to wait.  This was causing the code to
sometimes take 5 seconds to do anything useful.
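
A sketch of the usual remedy for this kind of race, assuming Python 3: the predicate is checked and the wait is performed under the same condition lock, so a notification cannot slip in between the two. `ReaderRegistry` and its methods are hypothetical names, not the zeroconf API.

```python
import threading


class ReaderRegistry:
    """Hypothetical registry whose waiters and notifiers share one lock."""

    def __init__(self):
        self.condition = threading.Condition()
        self.readers = []

    def add_reader(self, reader):
        with self.condition:
            self.readers.append(reader)
            self.condition.notify()

    def wait_for_readers(self, timeout):
        with self.condition:
            # The predicate is evaluated while holding the condition's lock,
            # the same lock that add_reader() holds when it notifies.
            self.condition.wait_for(lambda: len(self.readers) > 0, timeout)
            return list(self.readers)


registry = ReaderRegistry()
results = []
waiter = threading.Thread(
    target=lambda: results.append(registry.wait_for_readers(5)))
waiter.start()
registry.add_reader("socket-1")
waiter.join()
```

If the check used one lock and the wait another, `add_reader()` could run between them and the waiter would sleep for the full timeout, which matches the delay described above.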

When fixing the race condition, I decided to also fix the other
correctness issues in the loop, which were likely causing the errors that
led to the inclusion of the 'except Exception' catch-all.  This in turn
allowed the EBADF error raised by closing the socket during exit to be
used to get out of the select in a timely manner.

Finally, this allowed reorganizing the shutdown code to shut down from
the front to the back.  That is to say, shut down the recv socket first,
which then allows a clean join with the engine thread.  After the engine
thread exits, almost everything else is inert as all callbacks have been
unwound.
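
The front-to-back ordering can be sketched as below. This is an illustration under assumptions, not the zeroconf code: `engine_loop` is a hypothetical function, a short select timeout is assumed so the loop notices the closed socket promptly, and in Python 3 the error may surface as `ValueError` instead of an `OSError` carrying EBADF.

```python
import select
import socket
import threading


def engine_loop(sock, log):
    """Poll the socket until using it fails, then exit quietly."""
    while True:
        try:
            readable, _, _ = select.select([sock], [], [], 0.05)
            for ready in readable:
                ready.recvfrom(1024)
        except (OSError, ValueError):
            # A closed socket is treated as the signal to stop.
            log.append("engine exited")
            return


sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
log = []
engine = threading.Thread(target=engine_loop, args=(sock, log), name="engine")
engine.start()

sock.close()            # 1. shut down the receive socket first
engine.join(timeout=5)  # 2. then join the engine thread
```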

* Remove a now invalid test case

With the restructure of shutdown, Listener() now needs to throw EBADF on
a closed socket to allow a timely and graceful shutdown.

* Shutdown the service listeners in an organized fashion

Also adds names to the various threads to make debugging easier.

* Improve test coverage

Add more needed shutdown cleanup found via additional test coverage.

Force the timeout conversion from milliseconds to seconds to use floating point.
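
A short sketch of why this matters, with a hypothetical helper name: under Python 2, dividing two integers truncates, so `500 / 1000` is 0, while a float divisor keeps the fraction on both Python 2 and 3.

```python
def millis_to_seconds(timeout_ms):
    """Convert milliseconds to seconds without integer truncation."""
    # The float literal forces floating-point division on Python 2 as well.
    return timeout_ms / 1000.0


half_second = millis_to_seconds(500)  # 0.5
```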

* init ServiceInfo._properties

* Add query support and test case for _services._dns-sd._udp.local.

* pep8 cleanup

* Add testcase and fixes for HInfo Record Generation

The DNSHInfo packet generation code was broken. There was no test case for that
functionality, and adding a test case showed four issues. Two were related to
PY3 strings, one was a typoed reference to an attribute, and finally the two
fields present in the HInfo record were using the wrong encoding, which is what
necessitated the change from write_string() to write_character_string().
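
For context, RFC 1035 defines the two HINFO fields (CPU and OS) as character-strings: one length octet followed by that many octets of data. A minimal sketch of that encoding, assuming this is the form write_character_string() is meant to emit; the helper names below are hypothetical and not part of zeroconf.

```python
def encode_character_string(value):
    """Encode bytes as an RFC 1035 character-string (length octet + data)."""
    if not isinstance(value, bytes):
        raise TypeError("value must be bytes")
    if len(value) > 255:
        raise ValueError("character-string data is limited to 255 octets")
    return bytes([len(value)]) + value


def encode_hinfo_rdata(cpu, os_name):
    """Concatenate the CPU and OS character-strings of an HINFO record."""
    return encode_character_string(cpu) + encode_character_string(os_name)


rdata = encode_hinfo_rdata(b"ARM", b"Linux")  # b"\x03ARM\x05Linux"
```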
@stephenrauch
Collaborator

Thanks so much for the pull request.

I made several improvements to ZC() shutdown, including incorporating these changes, before I had access to this repo. As of now I believe the functionality from this PR has been incorporated, and I unfortunately have no good way to merge the PR to give you credit for the change, so I am going to close it.

If you think this is an error, or an otherwise bad idea, please comment back.

Thanks again for your help.
