ext/uri: fast-path canonical URIs in get_normalized_uri
When Uri\Rfc3986\Uri::parse() produces a URI that is already in
canonical form (the common case: http/https URLs with a lowercase host,
no percent-encoded unreserved characters, and no ".." path segments),
get_normalized_uri() no longer deep-copies the parsed struct and runs
a full normalization pass. It calls uriNormalizeSyntaxMaskRequiredExA
once to compute the dirty mask; a zero mask (URI_NORMALIZED) means we
return the raw uri directly instead of a copy. The struct caches the
dirty mask, so multiple non-raw reads on the same instance only run
the scan once.

Fallback: when the mask is nonzero, we copy and normalize as before,
but only for the flagged components (uriNormalizeSyntaxExMmA(...,
dirty_mask, ...) instead of (..., -1, ...)).
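
To make the control flow concrete, here is a minimal self-contained sketch
of the same "scan once, copy only when dirty" pattern. It is a toy model,
not the ext/uri or uriparser code: every toy_* name, the two-field URI
struct, and the scans/copies counters are hypothetical, and the only
"normalization" modelled is lowercasing the scheme and host.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the uriparser mask bits; TOY_NORMALIZED
 * plays the role of a zero (clean) mask. */
enum {
    TOY_NORMALIZED   = 0,
    TOY_DIRTY_SCHEME = 1u << 0,
    TOY_DIRTY_HOST   = 1u << 1
};

typedef struct {
    char scheme[16];
    char host[64];
} toy_uri;

typedef struct {
    toy_uri raw;
    toy_uri normalized;   /* only filled in when mask != TOY_NORMALIZED */
    unsigned int mask;    /* cached result of the scan */
    bool initialized;
    int scans;            /* demo counters, not part of the real struct */
    int copies;
} toy_uris;

/* Returns true if the string contains an uppercase ASCII letter. */
bool toy_has_upper(const char *s) {
    for (; *s != '\0'; s++) {
        if (isupper((unsigned char)*s)) {
            return true;
        }
    }
    return false;
}

void toy_lower(char *s) {
    for (; *s != '\0'; s++) {
        *s = (char)tolower((unsigned char)*s);
    }
}

/* Read-only scan: reports which components would need normalizing. */
unsigned int toy_mask_required(const toy_uri *uri) {
    unsigned int mask = TOY_NORMALIZED;
    if (toy_has_upper(uri->scheme)) mask |= TOY_DIRTY_SCHEME;
    if (toy_has_upper(uri->host))   mask |= TOY_DIRTY_HOST;
    return mask;
}

/* Scans once per instance; copies and normalizes only flagged parts. */
const toy_uri *toy_get_normalized(toy_uris *uris) {
    assert(uris != NULL);
    if (!uris->initialized) {
        uris->mask = toy_mask_required(&uris->raw);
        uris->scans++;
        if (uris->mask != TOY_NORMALIZED) {
            uris->normalized = uris->raw;   /* the copy we want to avoid */
            uris->copies++;
            if (uris->mask & TOY_DIRTY_SCHEME) toy_lower(uris->normalized.scheme);
            if (uris->mask & TOY_DIRTY_HOST)   toy_lower(uris->normalized.host);
        }
        uris->initialized = true;
    }
    /* Clean input: hand back the raw struct, no copy was ever made. */
    return uris->mask == TOY_NORMALIZED ? &uris->raw : &uris->normalized;
}
```

In this model a clean input returns a pointer to the raw struct with the
copy counter still at zero, while an input with an uppercase host returns
the separate normalized copy and leaves the raw struct untouched.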

Measurements on a 17-URL mix with a realistic parse-and-read workload
(10 runs of 1.7M parses each, CPU pinned via taskset, same-session
stash-pop A/B so both builds share machine state):

                        baseline mean       optimized mean      delta
    parse only          0.3992s (4.26M/s)   0.4083s (4.16M/s)   noise
    parse + 1 read      0.6687s (2.54M/s)   0.5464s (3.11M/s)   -18.3%
    parse + 7 reads     0.8510s (2.00M/s)   0.7305s (2.33M/s)   -14.2%

The "parse + 1 read" row isolates the first-read cost where this
change lands. The "parse + 7 reads" row shows the amortized effect
under a realistic user pattern: the first getter pays the reduced
normalization cost, and the remaining six getters hit the cached
normalized uri and cost the same as before.
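
For anyone re-deriving the table, the throughput and delta columns follow
from the mean times and the 1.7M parses per run. A small sketch of that
arithmetic (the helper names are made up for illustration and are not
part of the benchmark script):

```c
#include <assert.h>

/* Parses per second for a run of `parses` parses taking `seconds`. */
double throughput(double parses, double seconds) {
    assert(seconds > 0.0);
    return parses / seconds;
}

/* Relative change in percent; negative means the optimized build
 * took less time than the baseline. */
double delta_percent(double baseline_s, double optimized_s) {
    assert(baseline_s > 0.0);
    return (optimized_s - baseline_s) / baseline_s * 100.0;
}
```

With the means above, delta_percent(0.6687, 0.5464) is about -18.3 and
delta_percent(0.8510, 0.7305) is about -14.2, and
throughput(1.7e6, 0.5464) is about 3.11M parses per second.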

hyperfine cross-check on the whole benchmark script, 15 runs each:

    baseline   20.471 s +/- 1.052 s  [19.535 .. 22.985]
    optimized  17.240 s +/- 0.540 s  [16.556 .. 18.190]
    optimized runs 1.19 +/- 0.07 times faster.

All 309 tests in ext/uri/tests pass. I checked that URIs needing
normalization (http://EXAMPLE.com/A/%2e%2e/c resolving to /c) still
hit the full normalize path through the nonzero dirty mask.
iliaal committed Apr 13, 2026
commit 05b9aa2459c9c98d2b900d94e7c8bfee1a1826aa
16 changes: 13 additions & 3 deletions ext/uri/uri_parser_rfc3986.c
@@ -25,6 +25,7 @@
 struct php_uri_parser_rfc3986_uris {
 	UriUriA uri;
 	UriUriA normalized_uri;
+	unsigned int normalization_mask;
 	bool normalized_uri_initialized;
 };

@@ -85,12 +86,21 @@ ZEND_ATTRIBUTE_NONNULL static void copy_uri(UriUriA *new_uriparser_uri, const Ur

 ZEND_ATTRIBUTE_NONNULL static UriUriA *get_normalized_uri(php_uri_parser_rfc3986_uris *uriparser_uris) {
 	if (!uriparser_uris->normalized_uri_initialized) {
-		copy_uri(&uriparser_uris->normalized_uri, &uriparser_uris->uri);
-		int result = uriNormalizeSyntaxExMmA(&uriparser_uris->normalized_uri, (unsigned int)-1, mm);
-		ZEND_ASSERT(result == URI_SUCCESS);
+		int mask_result = uriNormalizeSyntaxMaskRequiredExA(&uriparser_uris->uri, &uriparser_uris->normalization_mask);
+		ZEND_ASSERT(mask_result == URI_SUCCESS);
+
+		if (uriparser_uris->normalization_mask != URI_NORMALIZED) {
+			copy_uri(&uriparser_uris->normalized_uri, &uriparser_uris->uri);
+			int result = uriNormalizeSyntaxExMmA(&uriparser_uris->normalized_uri, uriparser_uris->normalization_mask, mm);
+			ZEND_ASSERT(result == URI_SUCCESS);
+		}
 		uriparser_uris->normalized_uri_initialized = true;
 	}

+	if (uriparser_uris->normalization_mask == URI_NORMALIZED) {
+		return &uriparser_uris->uri;
+	}
+
 	return &uriparser_uris->normalized_uri;
 }
