Speed-up HTTP 1.1 header and line parsing #12321
Conversation
These are my early results for this PR, but feel free to point me at any other HTTP-header-parsing-intensive benchmark:
I still need to check the ASM of the new methods (including the SWAR ones); that's why this is still a DRAFT.
Force-pushed from 8d74473 to 2294f81
Re-running builds. Autobahn got stuck and was killed after 5 hours. (Which may or may not be a bad sign… I haven't seen Autobahn get stuck like that on other PRs that I recall.)
It got stuck again... @franz1981, seems like there is something wrong?
Investigating right now: I've tried hard to make the new code work exactly the same as the previous one, but I found weird things related to the size change, and I'm not familiar with the purpose of the code itself :/
The last build failed with JDK 11 and got cancelled here. Let's see if I can reproduce the failure by doing the same.
@normanmaurer running it from my branch (without leak detection) -> https://github.com/franz1981/netty/runs/6108265337?check_suite_focus=true worked fine with JDK 11, but I've found something weird in the original code, and maybe you can help me understand whether I should do the same for this PR:

```java
@Override
public boolean process(byte value) throws Exception {
    char nextByte = (char) (value & 0xFF);
    if (nextByte == HttpConstants.LF) {
        int len = seq.length();
        // Drop CR if we had a CRLF pair
        if (len >= 1 && seq.charAtUnsafe(len - 1) == HttpConstants.CR) {
            --size;
            seq.setLength(len - 1);
        }
        return false;
    }
    increaseCount();
    seq.append(nextByte);
    return true;
}

protected final void increaseCount() {
    if (++size > maxLength) {
        // TODO: Respond with Bad Request and discard the traffic
        //       or close the connection.
        //       No need to notify the upstream handlers - just log.
        //       If decoding a response, just throw an exception.
        throw newException(maxLength);
    }
}
```

Due to the check for
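To make the accounting question concrete, here is a hypothetical, standalone model of the line-length counting (illustrative names, not Netty's actual classes). This variant drops the CR from the decoded line but still counts it toward maxLength, i.e. it omits the `--size` decrement in question:

```java
// Hypothetical, standalone model of the line-length accounting discussed above.
// It decodes up to LF, strips a trailing CR from the result, but keeps the CR
// counted in 'size', so the CR still contributes to the maxLength check.
public final class LineLimitDemo {
    static String parseLine(byte[] input, int maxLength) {
        StringBuilder seq = new StringBuilder();
        int size = 0;
        for (byte b : input) {
            char c = (char) (b & 0xFF);
            if (c == '\n') {
                int len = seq.length();
                if (len >= 1 && seq.charAt(len - 1) == '\r') {
                    seq.setLength(len - 1); // drop the CR, but keep it counted in 'size'
                }
                return seq.toString();
            }
            if (++size > maxLength) {
                throw new IllegalStateException("line longer than " + maxLength);
            }
            seq.append(c);
        }
        return seq.toString();
    }

    public static void main(String[] args) {
        // "host: a" is 7 chars; the CR makes 8 counted bytes, exactly at the limit.
        System.out.println(parseLine("host: a\r\n".getBytes(), 8)); // prints "host: a"
    }
}
```

With the original `--size` behavior, the CR would effectively not count toward the limit, which is the discrepancy being asked about here.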
Summoning @trustin as well for #12321 (comment)
Hey, thanks for summoning, @franz1981. It indeed looks like a bug to me and we should count CR.
@franz1981 I agree with @trustin here... Can you do an extra PR to fix this on the existing code, for visibility?
@normanmaurer e.g. and should both work without throwing any exception, is that correct?
Force-pushed from 2294f81 to 47e0425
@trustin @normanmaurer this is the part to look at
@trustin @normanmaurer if we read maxLength bytes because readableBytes is less than or equal to maxLength, we haven't read anything relevant
This still performs as well as #12321 (comment), but I would like to run other benchmark(s) on it (end to end, possibly). @idelpivnitskiy anything in mind I can try?
Let's see @chrisvest if it's still failing on the JDK 11 build :(
Force-pushed from 47e0425 to 2aa1cac
Adding the same treatment for line parsing as well; results: slightly improved over just the header parsing improvement(s).
@normanmaurer I've decided to both fix the bug and improve performance directly in this one; given that the logic is now slightly more complex, it would have been a waste of energy to fix it twice IMHO. I'm now looking at the assembly, and e.g. why we are using an appendable sequence when we could:
Let me know wdyt; these could be addressed in follow-up PRs because the new logic enables a clear separation between scanning and copying/encoding.
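To illustrate what a separation between scanning and copying/encoding can look like, here is a hypothetical sketch (not this PR's actual code): phase one only scans for the LF delimiter, phase two decodes the whole run in one bulk step, instead of branching on CR/LF per appended byte:

```java
// Hypothetical sketch of loop fission for line parsing:
// phase 1 scans for the delimiter, phase 2 copies/decodes the bytes in bulk.
import java.nio.charset.StandardCharsets;

public final class ScanThenCopyDemo {
    // Returns the line up to (and excluding) LF, stripping a trailing CR.
    static String readLine(byte[] buf, int from) {
        // Phase 1: scanning only (this loop is what SWAR on 8-byte words can accelerate).
        int lf = from;
        while (lf < buf.length && buf[lf] != '\n') {
            lf++;
        }
        // Phase 2: one bulk decode; no per-byte CR/LF branching while copying.
        int end = (lf > from && buf[lf - 1] == '\r') ? lf - 1 : lf;
        return new String(buf, from, end - from, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] buf = "GET / HTTP/1.1\r\nHost: example.com\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readLine(buf, 0)); // prints "GET / HTTP/1.1"
    }
}
```

The design point is that a fused append-per-byte loop mixes the delimiter check with the copy, while the fissioned version keeps each loop simple enough for the JIT (or SWAR) to optimize.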
The TODO here is valid: @trustin @normanmaurer
size is used to track the length of the body AND the read bytes, in order to throw an exception when maxLength is exceeded.
In this PR I'm not increasing size by the amount of (already) skipped control chars, meaning that I give the next parse (starting from the first non-control char) more room to read body data.
I'm not sure if it's correct, but given that I've slightly changed the original behavior and I'm not getting any test errors (until now), it means this is uncovered by tests.
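To make the concern concrete, here is a hypothetical, simplified illustration (names and the whitespace test are illustrative, not Netty's exact logic). Whether the returned skip count is added to size or not is exactly the behavioral difference being discussed:

```java
// Hypothetical illustration of skipping leading control chars/whitespace.
// If the skipped count is added to 'size', those bytes consume maxLength budget;
// if not (as in this PR), the next parse gets more room for real data.
public final class SkipCountingDemo {
    // Returns how many leading control/whitespace bytes were skipped.
    static int skipControlChars(byte[] buf, int from) {
        int i = from;
        while (i < buf.length && (buf[i] & 0xFF) <= ' ') {
            i++;
        }
        return i - from;
    }

    public static void main(String[] args) {
        byte[] buf = "\r\n\r\nGET /".getBytes();
        System.out.println(skipControlChars(buf, 0)); // prints 4
    }
}
```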
@chrisvest I will likely send other changes (just a microbenchmark for the pooling part) in a follow-up PR; looking at the assembly, I see that the validation using the offset seems to work the same, hence I haven't decided whether to change it or not: feel free to merge it and/or wait until the others complete their review round (@idelpivnitskiy and/or @normanmaurer and/or @vietj)
@franz1981 is "vert.x" happy as well?
Waiting for @vietj to run the vertx CI too, thanks @normanmaurer
@franz1981 @vietj ping me once done
@normanmaurer we're ready to go, the vertx CI is happy
@franz1981 thanks a lot for all the work you put into this!
@franz1981 can you please open a PR for main as well?
Yep @normanmaurer will do it today |
This reverts commit 9993e07.
Motivation:
With #12321, the HttpHeaders object returned by the standard HTTP/1 pipeline now contains AsciiStrings, not Strings. This uncovered a bug where HttpHeaders.names() did not behave properly as a Set<String> when the backing HttpHeaders contained a CharSequence type other than String. This led to .contains incorrectly returning false when the header was present.
Modification:
Replace CharSequenceDelegatingStringSet with a class that delegates directly to the Headers object, instead of Headers.names(), for the .contains call.
Result:
- The .contains call works properly even when the backing Headers aren't Strings.
- Mutation methods were removed. This is an improvement imo; names() previously returned a copy, so changes would not be reflected in the Headers anyway.
- .contains is now case-insensitive.
Fixes #13273 (#13274)
Motivation:
#12321 introduced the assumption that whitespace/control chars are already stripped from the hex chars of the HTTP chunk length. That was a wrong assumption.
Modifications:
Correctly check for whitespace and control chars and re-introduce the (leading) whitespace trim.
Result:
Fixes #13273
Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Motivation:
We've done a number of HTTP decoding improvements in 4.1, including:
- #13216
- #12321
- #14672
These should be included in the main branch as well.
Modification:
Forward-porting the above PRs and adjusting to the Netty 5 buffer APIs.
Result:
This improves the HTTP decoding performance. Previously we got:
```
Benchmark                                                                         (step)   Mode  Cnt      Score     Error  Units
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       2  thrpt   40  11389.290 ±  22.308  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       4  thrpt   40  17422.815 ±  54.609  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       8  thrpt   40  23544.948 ± 118.927  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters      16  thrpt   40  28731.491 ± 157.794  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters      32  thrpt   40  31864.836 ± 174.970  ops/s
```
With just the changes in this PR, we now get:
```
Benchmark                                                                         (step)   Mode  Cnt      Score     Error  Units
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       2  thrpt   40  12746.310 ± 142.026  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       4  thrpt   40  18470.509 ±  25.073  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       8  thrpt   40  25261.320 ±  60.964  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters      16  thrpt   40  30133.656 ±  51.143  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters      32  thrpt   40  33303.007 ± 144.573  ops/s
```
Motivation:
HTTP content parsing makes use of bi-morphic inheritance and loop fusion to improve performance: neither is working as expected.
Modifications:
Split (CR)LF parsing into SWAR + copy loops (i.e. loop fission) and avoid having LineParser inherit from HeaderParser.
Result:
Parsing speedup
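For readers unfamiliar with the SWAR (SIMD within a register) technique referenced throughout this PR, here is a minimal self-contained sketch of the core primitive, finding the first LF in an 8-byte word read big-endian. It is illustrative of the technique, not necessarily the exact code that landed:

```java
// Minimal SWAR sketch: locate the first LF (0x0A) byte inside a 64-bit word.
public final class SwarLfDemo {
    private static final long LF_PATTERN = 0x0A0A0A0A0A0A0A0AL;
    private static final long MASK_7F = 0x7F7F7F7F7F7F7F7FL;

    // Returns a word with 0x80 in every byte position where the input byte is LF.
    static long applyLfPattern(long word) {
        long input = word ^ LF_PATTERN;        // LF bytes become 0x00
        long tmp = (input & MASK_7F) + MASK_7F; // high bit set iff low 7 bits nonzero
        return ~(tmp | input | MASK_7F);        // 0x80 iff the whole byte was zero
    }

    // Index (0-7) of the first LF in a big-endian word, or 8 if there is none.
    static int firstLfIndex(long word) {
        return Long.numberOfLeadingZeros(applyLfPattern(word)) >>> 3;
    }

    public static void main(String[] args) {
        long word = 0x48454144_0A484F53L; // bytes "HEAD\nHOS" read big-endian
        System.out.println(firstLfIndex(word));              // prints 4
        System.out.println(firstLfIndex(0x4142434445464748L)); // no LF: prints 8
    }
}
```

The payoff is that one branch per 8 bytes replaces one branch per byte, which is why the scanning loop benefits from being split out of the byte-copying loop.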