global: convert to fiber-local storage to fix exit races#5314
global: convert to fiber-local storage to fix exit races#5314ethomson merged 1 commit intolibgit2:masterfrom
Conversation
| * before calling `TlsFree()`, but in fact after | ||
| * shutting down all subsystems. | ||
| */ | ||
| git__free_tls_data(); |
There was a problem hiding this comment.
Mh, in fact this isn't quite right either, as we'll now always free TLS data when calling git_libgit2_shutdown(). But if the same thread has called git_libgit2_init() multiple times, then we definitely don't want that. We could fix the issue by adding another reference count to the global state, but I dunno...
|
The best solution is probably to switch from TLS to FLS, as we can specify an FlsCallback similar to |
ac5cf42 to
dc142aa
Compare
On Windows platforms, we automatically clean up the thread-local storage
upon detaching a thread via `DllMain()`. The thing is that this happens
for every thread of applications that link against the libgit2 DLL, even
those that don't have anything to do with libgit2 itself. As a result,
we cannot assume that these unsuspecting threads make use of our
`git_libgit2_init()` and `git_libgit2_shutdow()` reference counting,
which may lead to racy situations:
Thread 1 Thread 2
git_libgit2_shutdown()
DllMain(DETACH_THREAD)
git__free_tls_data()
git_atomic_dec() == 0
git__free_tls_data()
TlsFree(_tls_index)
TlsGetValue(_tls_index)
Due to the second thread never having executed `git_libgit2_init()`, the
first thread will clean up TLS data and as a result also free the
`_tls_index` variable. When detaching the second thread, we
unconditionally access the now-free'd `_tls_index` variable, which is
obviously not going to work out well.
Fix the issue by converting the code to use fiber-local storage instead
of thread-local storage. While FLS will behave the exact same as TLS if
no fibers are in use, it does allow us to specify a destructor similar
to the one that is accepted by pthread_key_create(3P). Like this, we do
not have to manually free indices anymore, but will let the FLS handle
calling the destructor. This allows us to get rid of `DllMain()`
completely, as we only used it to keep track of when threads were
exiting and results in an overall simplification of TLS cleanup.
dc142aa to
5c6180b
Compare
|
Hum, the results look a lot better than expected. Windows' threading code got a lot cleaner due to the conversion and should now be race-free in the above described failure scenario. Yay for FLS. |
|
Look at you with your fancy Windows fibers. I didn't know that this was a thing that you could do. |
|
There are lots of weird idiosyncrasies to using fibers. But we're not really using fibers, just fiber-local storage, and Raymond Chen has basically blessed this pattern, so I'm 👍 on this. Clever solution. Thanks for digging in to this. |
On Windows platforms, we automatically clean up the thread-local storage
upon detaching a thread via
DllMain(). The thing is that this happensfor every thread of applications that link against the libgit2 DLL, even
those that don't have anything to do with libgit2 itself. As a result,
we cannot assume that these unsuspecting threads make use of our
git_libgit2_init()andgit_libgit2_shutdow()reference counting,which may lead to racy situations:
Due to the second thread never having executed
git_libgit2_init(), thefirst thread will clean up TLS data and as a result also free the
_tls_indexvariable. When detaching the second thread, weunconditionally access the now-free'd
_tls_indexvariable, which isobviously not going to work out well.
Fix the issue by completely removing the
DllMain()function. Its solepurpose was to call
git__free_tls_data()on detaching threads. But aswe require that every thread making use of libgit2 needs to call
git_libgit2_init()andgit_libgit2_shutdow()anyway, we can justmove cleanup of thread-local storage to
git_libgit2_shutdown(). Thisnicely avoids any issues of non-libgit2-related threads trying to clean
up their TLS while still making sure that every libgit2-using thread
will do so. Furthermore, it will avoid free'ing of
_tls_indexwhileany thread still uses libgit2 due to the reference-counting semantics of
the global init/shutdown procedures.
Fixes #5207, #5154.