Fix GH-8739: OPcache restart corrupts SHM in threaded SAPIs (ZTS) by nacholibre · Pull Request #22281 · php/php-src

nacholibre · 2026-06-11T19:42:01Z

Summary

In threaded SAPIs (FrankenPHP, mod_php with a threaded MPM), any OPcache restart — out-of-memory, hash overflow, or opcache_reset() — wipes shared memory while other threads are still executing code from it, crashing the whole server with zend_mm_heap corrupted (SIGABRT) or SIGSEGV. This is the ZTS manifestation of GH-8739 (GH-18517 and GH-14471 were closed as duplicates of it; also reported as php/frankenphp#2170, php/frankenphp#1737).

Root cause

accel_is_inactive() gates restart execution on "no in-flight SHM readers". On POSIX this is implemented with fcntl() record locks: readers take F_RDLCK in accel_activate_add(), and the restarting thread probes with F_GETLK.

POSIX record locks are per-process — a process never conflicts with its own locks. When all requests are threads of one process, F_GETLK reports no conflict, accel_is_inactive() always returns true, and the restart memsets ZCSG(hash) and resets the interned-strings buffer under hundreds of live readers.

This is 100 % reproducible: warm cache + ~120 concurrent requests + one opcache_reset() call crashes FrankenPHP within a second (3–4 threads typically report zend_mm_heap corrupted simultaneously). The OOM path crashes identically once accumulated wasted memory exhausts the SHM — for us this killed a production server roughly daily, on every PHP deployment cycle.

The fix

Windows already solved this exact problem: because Windows SAPIs are threaded, accel_is_inactive() there uses an atomic SHM counter (ZCSG(mem_usage)) instead of fcntl. This PR gives POSIX ZTS builds the equivalent:

ZCSG(ts_active_requests) — atomic counter of in-flight requests (ZTS non-Windows only).
Requests register in RINIT after the restart-pending block (so a thread never blocks its own restart check) and only while ZCG(accelerator_enabled) is true — so once a restart is scheduled, new requests stop registering and the counter drains within the duration of the longest in-flight request, even under saturation.
Deregistration in accel_post_deactivate(), paired via a per-thread flag (ZCG(ts_reader_registered)), so the counter stays balanced across early-return paths.
accel_is_inactive() returns false while the counter is non-zero.
After registering, the request re-checks ZCSG(restart_in_progress) and falls back to uncached execution if it lost the race against an already-running restart.

NTS builds are completely unaffected (no code change compiles in), so php-fpm/prefork keep the fcntl mechanism and its kernel-side crash cleanup. There is no per-request lock added — readers stay lock-free; the cost is one atomic add and one atomic sub per request.

Why not the EBR approach of #21778

I tested #21778 against the reproducer below first: it still crashes identically, because its deferral branch is only reachable as the else of if (accel_is_inactive()) — and under ZTS that check always returns true, so the original unsafe path runs before the new machinery is ever consulted (with opcache.log_verbosity_level=4, Restart Scheduled! and Restarting! appear in the same second with ~120 requests in flight, and the patch's "Deferring opcache restart" message never fires). The epoch/slot machinery also isn't needed for this problem: OPcache already prevents new readers during a pending restart via accelerator_enabled = false, so a plain drain counter is sufficient.

Validation

Real-world workload: FrankenPHP v1.12.2 static (PHP 8.4.20 + this patch), classic mode, 256 threads, Symfony 6.4 production mirror, ~120 concurrent requests replaying live URLs, ~3300 cached scripts:

scenario	unpatched	with this fix
`opcache_reset()` under load	crash on 1st call (`zend_mm_heap corrupted` ×3, SIGABRT/SIGSEGV)	5/5 survived; log shows `Restart Scheduled!` → 1–8 s reader drain → `Restarting!`
wasted-memory exhaustion (mass invalidation cycles with `validate_timestamps=1`)	crash when free SHM hits 0	clean deferred restart (`oom_restarts: 1`), cache rebuilds, server healthy

make test is unaffected (the counter path requires ZTS and concurrency; single-threaded .phpt tests cannot exercise the race — the existing behavior of all opcache tests is unchanged). I'm happy to retest revisions against the FrankenPHP reproducer harness — turnaround is ~15 minutes.

Known residual

A nanoseconds-wide TOCTOU remains between a thread snapshotting accelerator_enabled and its registration becoming visible to a concurrently executing restart. The restart_in_progress re-check narrows it, and the same window exists in the battle-tested fcntl protocol used by process-based SAPIs — this patch brings ZTS to parity with that behavior. Closing it completely would require taking the shared-alloc lock in the reader path, which was already rejected in #14803 for performance reasons.

Before executing a scheduled restart (OOM, hash overflow or opcache_reset()), accel_is_inactive() checks that no request is still reading shared memory. On POSIX this check is implemented with fcntl() record locks: readers take an F_RDLCK in accel_activate_add() and the restarting thread probes for it with F_GETLK. POSIX record locks are per-process: a process never conflicts with its own locks. In threaded SAPIs (FrankenPHP, mod_php with a threaded MPM), all in-flight requests are threads of one process, so F_GETLK never reports a conflict and accel_is_inactive() always returns true. The restart then runs zend_accel_hash_clean() and accel_interned_strings_restore_state() while other threads execute op_arrays and hold interned string pointers in the wiped memory, corrupting the heap ("zend_mm_heap corrupted", SIGSEGV/SIGABRT). Fix it the way ZEND_WIN32 already solved the same problem for threaded SAPIs on Windows: track in-flight requests with an atomic counter in SHM. Requests register in RINIT (only while the accelerator is enabled, so registrations stop as soon as a restart is pending and the counter drains within the duration of the longest in-flight request) and deregister in accel_post_deactivate(). accel_is_inactive() refuses to restart while the counter is non-zero. The counter is only compiled in for ZTS non-Windows builds; NTS process-based SAPIs (php-fpm, mod_php prefork) keep using fcntl() locks, whose kernel-side cleanup on process death they rely on. Verified on a production Symfony workload under FrankenPHP (256 threads, ~120 concurrent requests): repeated opcache_reset() calls and wasted-memory-exhaustion restarts, both previously crashing within seconds, now complete safely after draining readers.

nacholibre requested a review from dstogov as a code owner June 11, 2026 19:42

github-actions Bot added the Extension: opcache label Jun 11, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix GH-8739: OPcache restart corrupts SHM in threaded SAPIs (ZTS)#22281

Fix GH-8739: OPcache restart corrupts SHM in threaded SAPIs (ZTS)#22281
nacholibre wants to merge 1 commit into
php:PHP-8.4from
nacholibre:zts-opcache-restart-safety

nacholibre commented Jun 11, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nacholibre commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

The fix

Why not the EBR approach of #21778

Validation

Known residual

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

nacholibre commented Jun 11, 2026 •

edited

Loading