Speed-up HTTP 1.1 header and line parsing#12321

Merged
normanmaurer merged 6 commits into netty:4.1 from franz1981:fast_header_parse
Jan 27, 2023

Conversation

@franz1981
Contributor

Motivation:

HTTP content parsing makes use of bi-morphic inheritance and loop fusion to improve performance: neither is working as expected.

Modifications:

Split (CR)LF parsing into a SWAR scan loop plus a copy loop (i.e. loop fission)
and spare LineParser from inheriting HeaderParser

Result:
Parsing speedup
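
The SWAR (SIMD Within A Register) scan mentioned above can be sketched outside Netty with the classic has-zero-byte trick: load 8 bytes into a long and locate the first LF lane with a handful of ALU ops instead of testing one byte per iteration. This is a hedged, self-contained illustration — the `firstLf` name and the big-endian lane order are assumptions, not Netty's actual implementation.

```java
// Sketch of a SWAR scan for LF (0x0A). A lane that equals LF becomes 0x00
// after the XOR; the has-zero-byte trick then flags exactly the zero lanes.
public final class SwarSketch {

    private static final long LF_PATTERN = 0x0A0A0A0A0A0A0A0AL;

    /** Returns the byte index (0..7) of the first LF in {@code word}, or -1 if none. */
    public static int firstLf(long word) {
        long masked = word ^ LF_PATTERN;                 // LF lanes become 0x00
        long found = (masked - 0x0101010101010101L)      // borrow wraps the zero lanes...
                   & ~masked & 0x8080808080808080L;      // ...flagging exactly those lanes
        // With big-endian lane order, the most significant flagged lane is the
        // earliest byte in the stream.
        return found == 0 ? -1 : Long.numberOfLeadingZeros(found) >>> 3;
    }
}
```

A scan loop built on this advances 8 bytes per step and hands the delimiter-free region to a separate bulk-copy loop, which is the loop-fission shape described in the Modifications.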

@franz1981
Contributor Author

These are my early results, but feel free to point me at any other HTTP-header-parsing-intensive benchmark.
4.1:

```
Benchmark                                                                              (step)   Mode  Cnt      Score      Error  Units
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            2  thrpt   10   7417.600 ±  585.303  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       2  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            4  thrpt   10  11726.992 ±  385.804  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       4  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            8  thrpt   10  15545.752 ±  454.380  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       8  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters           16  thrpt   10  18464.836 ±  699.593  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm      16  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters           32  thrpt   10  19669.667 ± 3289.623  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm      32  thrpt             NaN               ---
```

this PR:

```
Benchmark                                                                              (step)   Mode  Cnt      Score      Error  Units
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            2  thrpt   10   8596.119 ±  340.640  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       2  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            4  thrpt   10  12566.280 ±  383.398  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       4  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters            8  thrpt   10  17125.243 ±  158.794  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm       8  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters           16  thrpt   10  19737.088 ± 1065.066  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm      16  thrpt             NaN               ---
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters           32  thrpt   10  21936.390 ±  438.841  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters:·asm      32  thrpt             NaN               ---
```

@franz1981
Contributor Author

I still need to check the ASM of the new methods (including the SWAR ones); that's why this is still a DRAFT

@chrisvest
Member

Re-running builds. Autobahn got stuck and killed after 5 hours. (Which may or may not be a bad sign… haven't seen Autobahn get stuck like that on other PRs that I recall.)

@normanmaurer
Member

Got stuck again... @franz1981 seems like there is something wrong?

@franz1981
Contributor Author

franz1981 commented Apr 21, 2022

Got stuck again... @franz1981 seems like there is something wrong?

Investigating right now: I've tried hard to make the new code work exactly the same as the previous one, but I found weird things related to size changes, and I'm not familiar with the purpose of the code itself :/
I'll try to get to the bottom of it, but the fact that no existing test fails means there's something we're not covering (explicitly)

@franz1981 franz1981 changed the title from Speed-up HT P 1.1 header parsing to Speed-up HTTP 1.1 header parsing Apr 21, 2022
@franz1981
Contributor Author

@normanmaurer

The last build failed with JDK 11 and got cancelled here:

2022-04-21T00:30:38.9418372Z 00:30:38.940 [main] INFO  i.n.c.e.EpollSocketShutdownOutputBySelfTest - Running: testShutdownOutput 1 of 2 with UnpooledByteBufAllocator
2022-04-21T00:30:38.9428094Z 00:30:38.942 [testsuite-epoll-worker-11-1] DEBUG i.n.channel.DefaultChannelPipeline - Discarded inbound message AdvancedLeakAwareByteBuf(PooledUnsafeDirectByteBuf(ridx: 0, widx: 1, cap: 64)) that reached at the tail of the pipeline. Please check your pipeline configuration.
2022-04-21T00:30:38.9429628Z 00:30:38.942 [testsuite-epoll-worker-11-1] DEBUG i.n.channel.DefaultChannelPipeline - Discarded message pipeline : [ChannelInboundHandlerAdapter#0, DefaultChannelPipeline$TailContext#0]. Channel : [id: 0x64cf054c, L:/127.0.0.1:35470 ! R:127.0.0.1/127.0.0.1:36221].
2022-04-21T00:30:40.9054161Z 00:30:40.904 [epollEventLoopGroup-34-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-34-1
2022-04-21T00:30:40.9057481Z 00:30:40.904 [epollEventLoopGroup-36-2] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-36-2
2022-04-21T00:30:40.9058821Z 00:30:40.904 [epollEventLoopGroup-36-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-36-1
2022-04-21T00:30:40.9060084Z 00:30:40.905 [epollEventLoopGroup-34-2] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-34-2
2022-04-21T00:30:40.9083363Z 00:30:40.907 [epollEventLoopGroup-38-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-38-1
2022-04-21T00:30:40.9159906Z 00:30:40.915 [epollEventLoopGroup-40-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 1 thread-local buffer(s) from thread: epollEventLoopGroup-40-1
2022-04-21T05:22:29.3824396Z ##[error]The operation was canceled.

Let's see if I can reproduce the failure by doing the same

@franz1981
Contributor Author

franz1981 commented Apr 21, 2022

@normanmaurer running it from my branch (without leak detection) -> https://github.com/franz1981/netty/runs/6108265337?check_suite_focus=true

worked fine with JDK 11, but I've found some weird things in the original code and maybe you can help me understand whether I should do the same for this PR; for HeaderParser:

```java
@Override
public boolean process(byte value) throws Exception {
    char nextByte = (char) (value & 0xFF);
    if (nextByte == HttpConstants.LF) {
        int len = seq.length();
        // Drop CR if we had a CRLF pair
        if (len >= 1 && seq.charAtUnsafe(len - 1) == HttpConstants.CR) {
            --size;
            seq.setLength(len - 1);
        }
        return false;
    }

    increaseCount();

    seq.append(nextByte);
    return true;
}

protected final void increaseCount() {
    if (++size > maxLength) {
        // TODO: Respond with Bad Request and discard the traffic
        //    or close the connection.
        //       No need to notify the upstream handlers - just log.
        //       If decoding a response, just throw an exception.
        throw newException(maxLength);
    }
}
```

Because LF is checked before the count is incremented, and the check is (++size > maxLength), if the readable bytes in a buffer are 4 and maxLength is 3 but the last byte (buffer offset 3) is LF, the parsing won't fail with TooLongHttpHeaderException :O
Meaning that we've parsed 4 bytes, exceeding the configured limit of 3: it looks like a subtle (maybe innocuous) bug, but am I supposed to do the same? :O

@franz1981
Contributor Author

Summoning @trustin as well for #12321 (comment)

@trustin
Member

trustin commented Apr 21, 2022

Hey, thanks for summoning, @franz1981. It indeed looks like a bug to me and we should count CR.

@normanmaurer
Member

@franz1981 agree with @trustin here... Can you do an extra PR to fix this on the existing code for visibility ?

@franz1981
Contributor Author

franz1981 commented Apr 21, 2022

@franz1981 agree with @trustin here... Can you do an extra PR to fix this on the existing code for visibility ?

@normanmaurer
Yep, so, let me summarize in case I got it wrong: maxLength shouldn't be evaluated against the number of read bytes (i.e. including the trailing delimiters), but against the resulting body (after removing any trailing \n or \r\n)

eg

 maxLength = 4096
 <body of 4096 bytes> \r\n = total 4098 bytes

and

 maxLength = 4096
 <body of 4096 bytes> \n = total 4097 bytes

should both work without throwing any exception

is it correct?
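
These semantics can be pinned down in a tiny, self-contained sketch (hedged: `parseLine` and the exception used here are illustrative, not Netty API) — the delimiter bytes are excluded from the maxLength check:

```java
import java.nio.charset.StandardCharsets;

// Sketch of the agreed rule: maxLength bounds the line *body* only; the
// trailing \n or \r\n delimiter is not counted against it.
public final class LineLimit {

    public static String parseLine(byte[] buf, int maxLength) {
        int lf = -1;
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == '\n') {
                lf = i;
                break;
            }
        }
        if (lf < 0) {
            throw new IllegalStateException("no line delimiter found");
        }
        int end = lf;
        if (end > 0 && buf[end - 1] == '\r') {
            end--; // drop the CR of a CRLF pair before measuring the body
        }
        if (end > maxLength) {
            throw new IllegalStateException("line body exceeds maxLength=" + maxLength);
        }
        return new String(buf, 0, end, StandardCharsets.US_ASCII);
    }
}
```

With maxLength = 4096, both a 4096-byte body followed by \r\n (4098 bytes total) and one followed by \n (4097 bytes total) parse without throwing.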

@franz1981 franz1981 marked this pull request as ready for review April 21, 2022 15:34
Contributor Author


@trustin @normanmaurer this is the part to look at

Contributor Author

@franz1981 franz1981 Apr 21, 2022


@trustin @normanmaurer if we read maxLength bytes because readableBytes is less than or equal to maxLength, we haven't read anything relevant

@franz1981
Contributor Author

This is still performing as well as #12321 (comment), but I would like to run other benchmark(s) on it (end to end, possibly). @idelpivnitskiy anything in mind I can try?

@franz1981
Contributor Author

Let's see @chrisvest if it's still failing on jdk 11 build :(

Comment thread codec-http/src/main/java/io/netty/handler/codec/http/HttpObjectDecoder.java Outdated
@franz1981 franz1981 changed the title from Speed-up HTTP 1.1 header parsing to Speed-up HTTP 1.1 header and line parsing Apr 22, 2022
@franz1981 franz1981 marked this pull request as draft April 22, 2022 08:11
@franz1981
Contributor Author

franz1981 commented Apr 22, 2022

Added the same treatment for line parsing as well; results:

```
Benchmark                                                                                (step)   Mode  Cnt      Score     Error  Units
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters              2  thrpt   20   8811.375 ± 262.989  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters              4  thrpt   20  13051.612 ±  61.193  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters              8  thrpt   20  17298.027 ±  78.159  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters             16  thrpt   20  20127.898 ± 655.783  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters             32  thrpt   20  22062.810 ± 516.576  ops/s
```

Slightly improved over just the header parsing improvement(s).
Now running other frameworks (based on Netty, e.g. Vert.x @vietj) to see if I get better/similar improvements

@franz1981 franz1981 marked this pull request as ready for review April 22, 2022 09:16
@franz1981
Contributor Author

franz1981 commented Apr 22, 2022

@normanmaurer I've decided to both fix and improve performance directly in this one; given that the logic is now slightly more complex, it would have been a waste of energy to fix it twice IMHO

Now looking at the assembly and LineParser to check if there's anything I've left that could improve further performance

i.e. why use an appendable char sequence when we could:

  1. just create a slice of the origin buffer and use it temporarily to split into parts (as String): the encoding is ASCII, so a byte-holder structure is enough -> no need for char sequences!
  2. create an ASCII appendable structure backed by a byte[] to turn the foreach byte -> char loop into a memcpy

Let me know wdyt; these could be addressed in follow-up PRs because the new logic enables a clear separation between scanning and copying/encoding
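
Idea 2 above can be sketched as follows (hedged: `AsciiLine` and its methods are illustrative names, not Netty API) — a byte[]-backed line holder that bulk-copies the delimiter-free region found by the scan loop instead of appending one char at a time:

```java
import java.nio.charset.StandardCharsets;

// ASCII line holder backed by a byte[]: the per-byte "byte -> char append"
// loop becomes a single System.arraycopy (memcpy) per scanned region.
public final class AsciiLine implements CharSequence {
    private byte[] bytes = new byte[64];
    private int length;

    /** Bulk-append bytes [from, to) of a scanned region. */
    public void append(byte[] src, int from, int to) {
        int len = to - from;
        if (length + len > bytes.length) {
            byte[] bigger = new byte[Math.max(bytes.length * 2, length + len)];
            System.arraycopy(bytes, 0, bigger, 0, length);
            bytes = bigger;
        }
        System.arraycopy(src, from, bytes, length, len);
        length += len;
    }

    public void reset() { length = 0; }
    @Override public int length() { return length; }
    @Override public char charAt(int i) { return (char) (bytes[i] & 0xFF); }
    @Override public CharSequence subSequence(int s, int e) { return toString().substring(s, e); }
    @Override public String toString() { return new String(bytes, 0, length, StandardCharsets.US_ASCII); }
}
```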

Contributor Author


TODO here is valid: @trustin @normanmaurer

size is used to track the length of the body AND the read bytes, in order to throw an exception when maxLength is exceeded.
In this PR I'm not increasing size by the amount of (already) skipped control chars, meaning that I give the next parse (starting from the first non-control char) more room to read body data.

Not sure if it's correct, but given that I've slightly changed the original behavior without getting any test error (until now), it means this is uncovered by tests

@franz1981
Contributor Author

@chrisvest I will likely send other changes (just a microbenchmark for the pooling part) in a follow-up PR; looking at the assembly I see that the validation using the offset seems to work the same, hence I haven't decided whether to change it or not: feel free to merge it and/or wait till the others complete their review round (@idelpivnitskiy and/or @normanmaurer and/or @vietj)

@normanmaurer
Member

@franz1981 is "vert.x" happy as well ?

@franz1981
Contributor Author

waiting for @vietj to run the Vert.x CI too, thanks @normanmaurer

@normanmaurer
Member

@franz1981 @vietj ping me once done

@franz1981
Contributor Author

@normanmaurer we're ready to go, the Vert.x CI is happy
I'll send a follow-up PR with an additional test, bench and improvement on the header-names pooling part, but this PR is already pretty big as it is IMO

@normanmaurer normanmaurer merged commit 0daaec3 into netty:4.1 Jan 27, 2023
@normanmaurer
Member

@franz1981 thanks a lot for all the work you put into this!

@normanmaurer
Member

@franz1981 can you please open a PR for main as well ?

@normanmaurer normanmaurer added this to the 4.1.88.Final milestone Jan 27, 2023
@franz1981
Contributor Author

Yep @normanmaurer will do it today ☺️
Thanks a lot for the patience bud

normanmaurer added a commit that referenced this pull request Feb 13, 2023
Motivation:

This commit introduced a regression while parsing headers that could lead to IndexOutOfBoundsException.

Modifications:

Reverts commit 0daaec3.

Result:

Fixes #13213
normanmaurer pushed a commit to franz1981/netty that referenced this pull request Feb 13, 2023
normanmaurer pushed a commit that referenced this pull request Feb 13, 2023
yawkat added a commit to yawkat/netty that referenced this pull request Feb 14, 2023
Motivation:

With netty#12321, the HttpHeaders object returned by the standard HTTP/1 pipeline now contains AsciiStrings, not Strings. This uncovered a bug where HttpHeaders.names() did not behave properly as a Set<String> when the backing HttpHeaders contained a CharSequence type other than String. This leads to .contains incorrectly returning false when the header is present.

Modification:

Replace CharSequenceDelegatingStringSet with a class that delegates directly to the Headers object, instead of Headers.names(), for the .contains call.

Result:

- The .contains call works properly even when the backing Headers aren't Strings.
- Mutation methods were removed. This is an improvement imo, names() previously returned a copy, so changes would not be reflected in the Headers anyway.
- .contains is now case-insensitive.
normanmaurer pushed a commit that referenced this pull request Feb 15, 2023
Motivation:

With #12321, the HttpHeaders object returned by the standard HTTP/1 pipeline now contains AsciiStrings, not Strings. This uncovered a bug where HttpHeaders.names() did not behave properly as a Set<String> when the backing HttpHeaders contained a CharSequence type other than String. This leads to .contains incorrectly returning false when the header is present.

Modification:

Replace CharSequenceDelegatingStringSet with a class that delegates directly to the Headers object, instead of Headers.names(), for the .contains call.

Result:

- The .contains call works properly even when the backing Headers aren't Strings.
- Mutation methods were removed. This is an improvement imo, names() previously returned a copy, so changes would not be reflected in the Headers anyway.
- .contains is now case-insensitive.
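
The Set<String> pitfall described in that commit can be reproduced without Netty: a HashSet<CharSequence> holding a non-String CharSequence misses a contains(String) lookup for the very same characters, because equals and hashCode don't bridge CharSequence implementations. The `Ascii` class below is a hypothetical stand-in for AsciiString, not Netty code:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

public final class MixedCharSequenceSet {

    // Hypothetical stand-in for a non-String CharSequence such as AsciiString.
    // Deliberately no equals/hashCode override: identity semantics, like any
    // CharSequence that isn't equals-compatible with String.
    static final class Ascii implements CharSequence {
        private final byte[] bytes;
        Ascii(String s) { bytes = s.getBytes(StandardCharsets.US_ASCII); }
        @Override public int length() { return bytes.length; }
        @Override public char charAt(int i) { return (char) (bytes[i] & 0xFF); }
        @Override public CharSequence subSequence(int s, int e) { return toString().substring(s, e); }
        @Override public String toString() { return new String(bytes, StandardCharsets.US_ASCII); }
    }

    public static boolean lookup() {
        Set<CharSequence> names = new HashSet<>();
        names.add(new Ascii("content-length"));
        // Same characters, different implementation: the lookup misses because
        // String's hashCode/equals never match the Ascii entry.
        return names.contains("content-length");
    }
}
```

Delegating .contains to the Headers object itself, as the commit does, sidesteps this mismatch entirely.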
franz1981 added a commit to franz1981/netty that referenced this pull request Mar 13, 2023
Fixes netty#13273

Motivation:

netty#12321 introduced the assumption that whitespace/ctrl chars are already stripped out of the hex chars of the HTTP chunk length.
It was a wrong assumption.

Modifications:

Correctly check for whitespace and ctrl chars and re-introduce whitespace (leading) trim

Result:

Fixes netty#13273
normanmaurer added a commit that referenced this pull request Mar 14, 2023
Fixes #13273 (#13274)


Motivation:

#12321 introduced the assumption that whitespace/ctrl chars are already stripped out of the hex chars of the HTTP chunk length. It was a wrong assumption.

Modifications:

Correctly check for whitespace and ctrl chars and re-introduce whitespace (leading) trim

Result:

Fixes #13273

Co-authored-by: Norman Maurer <norman_maurer@apple.com>
chrisvest added a commit to chrisvest/netty that referenced this pull request Jan 18, 2025
Motivation:
We've done a number of HTTP decoding improvements in 4.1, including:
- netty#13216
- netty#12321
- netty#14672

These should be included in the main branch as well.

Modification:
Forward-porting the above PRs and adjusting to the Netty 5 buffer APIs.

Result:
This improves the HTTP decoding performance.

Previously we got:

```
Benchmark                                                                         (step)   Mode  Cnt      Score     Error  Units
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       2  thrpt   40  11389.290 ±  22.308  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       4  thrpt   40  17422.815 ±  54.609  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       8  thrpt   40  23544.948 ± 118.927  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters      16  thrpt   40  28731.491 ± 157.794  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters      32  thrpt   40  31864.836 ± 174.970  ops/s
```

With just the changes in this PR, we now get:

```
Benchmark                                                                         (step)   Mode  Cnt      Score     Error  Units
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       2  thrpt   40  12746.310 ± 142.026  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       4  thrpt   40  18470.509 ±  25.073  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       8  thrpt   40  25261.320 ±  60.964  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters      16  thrpt   40  30133.656 ±  51.143  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters      32  thrpt   40  33303.007 ± 144.573  ops/s
```
chrisvest added a commit that referenced this pull request Jan 21, 2025
Motivation:
We've done a number of HTTP decoding improvements in 4.1, including:
- #13216
- #12321
- #14672

These should be included in the main branch as well.

Modification:
Forward-porting the above PRs and adjusting to the Netty 5 buffer APIs.

Result:
This improves the HTTP decoding performance.

Previously we got:

```
Benchmark                                                                         (step)   Mode  Cnt      Score     Error  Units
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       2  thrpt   40  11389.290 ±  22.308  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       4  thrpt   40  17422.815 ±  54.609  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       8  thrpt   40  23544.948 ± 118.927  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters      16  thrpt   40  28731.491 ± 157.794  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters      32  thrpt   40  31864.836 ± 174.970  ops/s
```

With just the changes in this PR, we now get:

```
Benchmark                                                                         (step)   Mode  Cnt      Score     Error  Units
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       2  thrpt   40  12746.310 ± 142.026  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       4  thrpt   40  18470.509 ±  25.073  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters       8  thrpt   40  25261.320 ±  60.964  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters      16  thrpt   40  30133.656 ±  51.143  ops/s
HttpRequestDecoderBenchmark.testDecodeWholeRequestInMultipleStepsMixedDelimiters      32  thrpt   40  33303.007 ± 144.573  ops/s
```