
Fix flaky test test_horrible_queries #3857

Merged
schneems merged 1 commit into puma:main from tenderlove:flaky-test
Jan 13, 2026

Conversation

@tenderlove
Contributor

`test_horrible_queries` can be flaky, but it's very hard to detect. This commit fixes the flakiness. To understand the flakiness, let me put the original test code below:

```ruby
# then that large mangled field values are caught
10.times do |c|
  get = "GET /#{rand_data(10,120)} HTTP/1.1\r\nX-Test: #{rand_data(1024, 1024+(c*1024), false)}\r\n\r\n"
  assert_raises Puma::HttpParserError do
    parser.execute({}, get, 0)
    parser.reset
  end
end
```

The way I understand `rand_data(xxx, false)` is that the function should _always_ return invalid bytes (IOW cause the parser to raise an exception).

However, it is possible for that function to _sometimes_ return valid bytes. The above test asserts that an error is raised when `parser.execute` is called, but if an exception gets raised, then `parser.reset` isn't called. If `parser.reset` isn't called, then subsequent calls to `parser.execute` will raise an exception _regardless_ of whether the input is parse-able or not. In other words, if the first iteration caused an exception, subsequent iterations are guaranteed to raise an exception.

This commit does two things:

  1. Pulls `parser.reset` outside of the `assert_raises` so that the flake is more obvious / reproducible
  2. Changes `rand_data` to always add an invalid byte so that if the random data happens to contain parsable data, we'll at least have one byte guaranteed to fail.
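The failure mode described above can be sketched with a tiny stand-in (a hypothetical `StickyParser`, not Puma's real parser) that mimics the sticky error state: once `execute` raises, every later call raises until `reset`, so a `reset` placed inside the asserted block never runs.

```ruby
# Hypothetical stand-in mimicking the sticky-error behavior described above.
# It is NOT Puma's parser; it only models "one failure poisons all later
# parses until #reset is called".
class StickyParser
  Error = Class.new(StandardError)

  def initialize
    @broken = false
  end

  def execute(data)
    raise Error, "parser is in an error state" if @broken
    if data.include?("\x00") # treat a NUL byte as the "invalid" input
      @broken = true
      raise Error, "invalid byte"
    end
    data.bytesize
  end

  def reset
    @broken = false
  end
end

parser = StickyParser.new

# Original shape: reset sits inside the block that raises, so it never runs.
begin
  parser.execute("bad\x00data")
  parser.reset # never reached: execute raised first
rescue StickyParser::Error
end

# Now even valid input raises, because the error state was never cleared.
stale_error = begin
  parser.execute("perfectly fine")
  nil
rescue StickyParser::Error => e
  e
end

# Fixed shape: reset runs unconditionally after the attempt, so valid
# input parses again on the next iteration.
parser.reset
recovered_bytes = parser.execute("perfectly fine")
```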


@github-actions github-actions Bot added the waiting-for-review label Jan 12, 2026
@github-actions github-actions Bot added the waiting-for-merge label and removed the waiting-for-review label Jan 12, 2026
@tenderlove
Contributor Author

btw, it looks like we're seeing timeouts in Truffle? This could be because of my change. This loop:

```ruby
10.times do |c|
  get = "GET /#{rand_data(10,120)} HTTP/1.1\r\nX-Test: #{rand_data(1024, 1024+(c*1024), false)}\r\n\r\n"
  # ....
end
```

is generating more and more data on each iteration (because of `rand_data(1024, 1024+(c*1024), false)`). Before my change, the first iteration would usually fail, causing subsequent iterations to just never parse that data. Now that the data is actually parsed, it's probably (definitely) slowing tests.
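A quick back-of-the-envelope makes the growth concrete (assuming `rand_data(min, max, ...)` can return up to `max` bytes, which is an assumption about the test helper, not something this PR states):

```ruby
# Worst-case size of the X-Test header value per iteration: 1024 + c * 1024
# bytes, for c in 0..9. Iteration 0 caps at 1 KiB, iteration 9 at 10 KiB.
per_iteration_max = (0...10).map { |c| 1024 + c * 1024 }

# Across the whole loop the parser may now chew through up to ~55 KiB,
# whereas before the fix it effectively stopped after the first failure.
total_max = per_iteration_max.sum
```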

@tenderlove
Contributor Author

tenderlove commented Jan 13, 2026

One question for the maintainers: is this test valuable? I mean, is the 10th iteration any more valuable than the 1st? Genuine question because I can't really tell (and I don't think it's my call). Especially since before my patch usually the first iteration made subsequent iterations do nothing (unless we're testing the behavior of not calling reset?)

@schneems
Member

First off. Thank you for contributing! You're the best.

> btw, it looks like we're seeing timeouts in Truffle? This could be because of my change. This loop:

Just merged a fix on main. I triggered a rebase of your PR; it's green now.

> One question for the maintainers: is this test valuable?

I have resisted touching the HTTP parser code personally. This almost looks like a cross between fuzzing and regression protection. I'm aware of fuzzing, but haven't actually invested in trying it on a codebase for real. Also, as I understand it, it's a lengthy, expensive process that you probably wouldn't want to run in CI. Might be some opportunity there.

Running the blame, it was added circa 2006 in 3c804d5.

> I mean, is the 10th iteration any more valuable than the 1st?

IMHO, no. If you want to clean it up, splitting it into several real and distinct tests with names instead of comments, and removing the iteration, seems good.

Since you mention it, it also doesn't look like we have a test that asserts that calling parser.reset after a bad parse allows you to parse a good header later. It couldn't hurt to add a regression test.

@schneems schneems merged commit 98ff11d into puma:main Jan 13, 2026
123 of 124 checks passed
@tenderlove tenderlove deleted the flaky-test branch January 13, 2026 16:14
@tenderlove
Contributor Author

> This almost looks like a cross between fuzzing and regression protection.

Yes, that's what it looks like to me.

> Also, as I understand it, it's a lengthy, expensive process that you probably wouldn't want to run in CI.

As long as the current test isn't causing timeouts, I feel it's probably fine to keep as-is.

> Since you mention it, it also doesn't look like we have a test that asserts that calling `parser.reset` after a bad parse allows you to parse a good header later. It couldn't hurt to add a regression test.

Ok. I'll send a regression test!

This PR inspired me to try my hand at a pure Ruby implementation (which is why I'm looking at this stuff). I want to study the "real" uses of the parser inside of Puma and try to add tests around that. For example, I suspect that internally, Puma will either always call reset on the parser, or just always throw the parser away and allocate a new one (in which case I'm not sure how the reset method would be useful). I want to be cautious not to "over specify" the parser interface which would couple us to an API that isn't actually useful.

@schneems
Member

The parser is used in the Client class. The abstractions are a little off in the internal representation, but the client roughly represents a connection, which can carry multiple requests (I wrote more at https://www.heroku.com/blog/upgrade-to-puma-7-and-unlock-the-power-of-fair-scheduled-keep-alive/).

So after a request is parsed successfully, the client is reset. This happens in a loop (but only when using keep-alive connections):

```ruby
while can_loop
  # ...
  case handle_request(client, requests + 1)
  # ...
  when :keep_alive
    requests += 1

    client.reset
```

which clears the state of the parser among other things:

```ruby
def reset
  @parser.reset
```
Not having to allocate is a benefit of doing it this way, but at the cost of having more book-keeping and logic. In general we've leaned towards performance at the cost of complexity/maintenance (with the caveat that correctness and reliability or other guarantees take precedence over raw perf).

@tenderlove
Contributor Author

> Not having to allocate is a benefit of doing it this way, but at the cost of having more book-keeping and logic. In general we've leaned towards performance at the cost of complexity/maintenance (with the caveat that correctness and reliability or other guarantees take precedence over raw perf).

That makes sense. The parser I've designed is a stateless singleton (with no side effects), so there isn't a `reset`. It takes an offset into a buffer (same as the current parser), and `execute` returns the number of bytes read, so if there are multiple incoming requests to parse, the client can track the offset and pass it to `execute`.

For example (pseudo code):

```ruby
offset = 0
loop do
  req = {}
  offset = AaronParser.execute(req, buff, offset)
  # If it's a POST, the post body can be extracted from buff via the `offset` + content-length if desired
  # do stuff with req
end
```

I added a stateful wrapper around my stateless implementation only so that I can make the tests pass. Once I get the tests passing I may send a PR, but honestly it's just for fun and possibly a ZJIT benchmark.
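As a rough illustration of that offset-return contract (a toy record format and a hypothetical `ToyParser`, nothing to do with Puma's parser or the real implementation), the caller owns all the state and the parser module keeps none:

```ruby
# Hypothetical stateless parser: reads one "VERB path\n" record starting at
# `offset`, fills `req`, and returns the offset just past the record, or nil
# if no complete record is available yet. No instance state, so no reset.
module ToyParser
  module_function

  def execute(req, buff, offset)
    nl = buff.index("\n", offset)
    return nil unless nl
    verb, path = buff[offset...nl].split(" ", 2)
    req[:verb] = verb
    req[:path] = path
    nl + 1
  end
end

# The caller tracks the offset across multiple records in one buffer.
buff = "GET /a\nPOST /b\n"
offset = 0
requests = []
loop do
  req = {}
  next_offset = ToyParser.execute(req, buff, offset)
  break unless next_offset
  requests << req
  offset = next_offset
end
```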

