Skip to content

global: convert to fiber-local storage to fix exit races#5314

Merged
ethomson merged 1 commit intolibgit2:masterfrom
pks-t:pks/dll-main-removal
Dec 1, 2019
Merged

global: convert to fiber-local storage to fix exit races#5314
ethomson merged 1 commit intolibgit2:masterfrom
pks-t:pks/dll-main-removal

Conversation

@pks-t
Copy link
Copy Markdown
Member

@pks-t pks-t commented Nov 29, 2019

On Windows platforms, we automatically clean up the thread-local storage
upon detaching a thread via DllMain(). The thing is that this happens
for every thread of applications that link against the libgit2 DLL, even
those that don't have anything to do with libgit2 itself. As a result,
we cannot assume that these unsuspecting threads make use of our
git_libgit2_init() and git_libgit2_shutdow() reference counting,
which may lead to racy situations:

Thread 1                    Thread 2

git_libgit2_shutdown()
                            DllMain(DETACH_THREAD)
                            git__free_tls_data()
git_atomic_dec() == 0
git__free_tls_data()
TlsFree(_tls_index)
                            TlsGetValue(_tls_index)

Due to the second thread never having executed git_libgit2_init(), the
first thread will clean up TLS data and as a result also free the
_tls_index variable. When detaching the second thread, we
unconditionally access the now-free'd _tls_index variable, which is
obviously not going to work out well.

Fix the issue by completely removing the DllMain() function. Its sole
purpose was to call git__free_tls_data() on detaching threads. But as
we require that every thread making use of libgit2 needs to call
git_libgit2_init() and git_libgit2_shutdow() anyway, we can just
move cleanup of thread-local storage to git_libgit2_shutdown(). This
nicely avoids any issues of non-libgit2-related threads trying to clean
up their TLS while still making sure that every libgit2-using thread
will do so. Furthermore, it will avoid free'ing of _tls_index while
any thread still uses libgit2 due to the reference-counting semantics of
the global init/shutdown procedures.


Fixes #5207, #5154.

Comment thread src/global.c Outdated
* before calling `TlsFree()`, but in fact after
* shutting down all subsystems.
*/
git__free_tls_data();
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mh, in fact this isn't quite right either, as we'll now always free TLS data when calling git_libgit2_shutdown(). But if the same thread has called git_libgit2_init() multiple times, then we definitely don't want that. We could fix the issue by adding another reference count to the global state, but I dunno...

@pks-t
Copy link
Copy Markdown
Member Author

pks-t commented Nov 29, 2019

The best solution is probably to switch from TLS to FLS, as we can specify an FlsCallback similar to pthread_key_create. I'll take a stab at that.

@pks-t pks-t force-pushed the pks/dll-main-removal branch from ac5cf42 to dc142aa Compare November 29, 2019 11:55
On Windows platforms, we automatically clean up the thread-local storage
upon detaching a thread via `DllMain()`. The thing is that this happens
for every thread of applications that link against the libgit2 DLL, even
those that don't have anything to do with libgit2 itself. As a result,
we cannot assume that these unsuspecting threads make use of our
`git_libgit2_init()` and `git_libgit2_shutdow()` reference counting,
which may lead to racy situations:

    Thread 1                    Thread 2

    git_libgit2_shutdown()
                                DllMain(DETACH_THREAD)
                                git__free_tls_data()
    git_atomic_dec() == 0
    git__free_tls_data()
    TlsFree(_tls_index)
                                TlsGetValue(_tls_index)

Due to the second thread never having executed `git_libgit2_init()`, the
first thread will clean up TLS data and as a result also free the
`_tls_index` variable. When detaching the second thread, we
unconditionally access the now-free'd `_tls_index` variable, which is
obviously not going to work out well.

Fix the issue by converting the code to use fiber-local storage instead
of thread-local storage. While FLS will behave the exact same as TLS if
no fibers are in use, it does allow us to specify a destructor similar
to the one that is accepted by pthread_key_create(3P). Like this, we do
not have to manually free indices anymore, but will let the FLS handle
calling the destructor. This allows us to get rid of `DllMain()`
completely, as we only used it to keep track of when threads were
exiting and results in an overall simplification of TLS cleanup.
@pks-t pks-t force-pushed the pks/dll-main-removal branch from dc142aa to 5c6180b Compare November 29, 2019 12:09
@pks-t pks-t changed the title global: fix racy use of TLS cleanup global: convert to fiber-local storage to fix exit races Nov 29, 2019
@pks-t
Copy link
Copy Markdown
Member Author

pks-t commented Nov 29, 2019

Hum, the results look a lot better than expected. Windows' threading code got a lot cleaner due to the conversion and should now be race-free in the above described failure scenario.

Yay for FLS.

@ethomson
Copy link
Copy Markdown
Member

ethomson commented Dec 1, 2019

Look at you with your fancy Windows fibers. I didn't know that this was a thing that you could do.

@ethomson
Copy link
Copy Markdown
Member

ethomson commented Dec 1, 2019

There are lots of weird idiosyncrasies to using fibers. But we're not really using fibers, just fiber-local storage, and Raymond Chen has basically blessed this pattern, so I'm 👍 on this.

Clever solution. Thanks for digging in to this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Heap is corrupted during shutdown and TLS cleanup when loaded as DLL in a .NET process.

2 participants