
Fix flaky test test_horrible_queries #3857

Merged
schneems merged 1 commit into puma:main from tenderlove:flaky-test
Jan 13, 2026

Conversation

@tenderlove
Contributor

`test_horrible_queries` can be flaky, but it's very hard to detect. This commit fixes the flakiness. To understand the flakiness, let me put the original test code below:

```ruby
# then that large mangled field values are caught
10.times do |c|
  get = "GET /#{rand_data(10,120)} HTTP/1.1\r\nX-Test: #{rand_data(1024, 1024+(c*1024), false)}\r\n\r\n"
  assert_raises Puma::HttpParserError do
    parser.execute({}, get, 0)
    parser.reset
  end
end
```

The way I understand `rand_data(xxx, false)` is that the function should _always_ return invalid bytes (IOW cause the parser to raise an exception).

However, it is possible for that function to _sometimes_ return valid bytes. The above test asserts that an error is raised when `parser.execute` is called, but if an exception gets raised, then `parser.reset` isn't called. If `parser.reset` isn't called, then subsequent calls to `parser.execute` will raise an exception _regardless_ of whether the input is parse-able or not. In other words, if the first iteration caused an exception, subsequent iterations are guaranteed to raise an exception.

This commit does two things:

  1. Pulls `parser.reset` outside of the `assert_raises` so that the flake is more obvious / reproducible
  2. Changes `rand_data` to always add an invalid byte so that if the random data happens to contain parsable data, we'll at least have one byte guaranteed to fail.
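The failure mode described above can be sketched with a tiny stand-in (a hypothetical `StickyParser`, not Puma's real parser) that mimics the sticky error state: once `execute` raises, every later call raises until `reset`, so a `reset` placed inside the asserted block never runs.

```ruby
# Hypothetical stand-in mimicking the sticky-error behavior described above.
# It is NOT Puma's parser; it only models "one failure poisons all later
# parses until #reset is called".
class StickyParser
  Error = Class.new(StandardError)

  def initialize
    @broken = false
  end

  def execute(data)
    raise Error, "parser is in an error state" if @broken
    if data.include?("\x00") # treat a NUL byte as the "invalid" input
      @broken = true
      raise Error, "invalid byte"
    end
    data.bytesize
  end

  def reset
    @broken = false
  end
end

parser = StickyParser.new

# Original shape: reset sits inside the block that raises, so it never runs.
begin
  parser.execute("bad\x00data")
  parser.reset # never reached: execute raised first
rescue StickyParser::Error
end

# Now even valid input raises, because the error state was never cleared.
stale_error = begin
  parser.execute("perfectly fine")
  nil
rescue StickyParser::Error => e
  e
end

# Fixed shape: reset runs unconditionally after the attempt, so valid
# input parses again on the next iteration.
parser.reset
recovered_bytes = parser.execute("perfectly fine")
```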


@github-actions github-actions Bot added the waiting-for-review label Jan 12, 2026
@github-actions github-actions Bot added the waiting-for-merge label and removed the waiting-for-review label Jan 12, 2026
@tenderlove
Contributor Author

btw, it looks like we're seeing timeouts in Truffle? This could be because of my change. This loop:

```ruby
10.times do |c|
  get = "GET /#{rand_data(10,120)} HTTP/1.1\r\nX-Test: #{rand_data(1024, 1024+(c*1024), false)}\r\n\r\n"
  # ....
end
```

is generating more and more data on each iteration (because of `rand_data(1024, 1024+(c*1024), false)`). Before my change, the first iteration would usually fail, causing subsequent iterations to just never parse that data. Now that the data is actually parsed, it's probably (definitely) slowing tests.
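A quick back-of-the-envelope makes the growth concrete (assuming `rand_data(min, max, ...)` can return up to `max` bytes, which is an assumption about the test helper, not something this PR states):

```ruby
# Worst-case size of the X-Test header value per iteration: 1024 + c * 1024
# bytes, for c in 0..9. Iteration 0 caps at 1 KiB, iteration 9 at 10 KiB.
per_iteration_max = (0...10).map { |c| 1024 + c * 1024 }

# Across the whole loop the parser may now chew through up to ~55 KiB,
# whereas before the fix it effectively stopped after the first failure.
total_max = per_iteration_max.sum
```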

@tenderlove
Contributor Author

tenderlove commented Jan 13, 2026

One question for the maintainers: is this test valuable? I mean, is the 10th iteration any more valuable than the 1st? Genuine question because I can't really tell (and I don't think it's my call). Especially since before my patch usually the first iteration made subsequent iterations do nothing (unless we're testing the behavior of not calling reset?)

@schneems
Member

First off. Thank you for contributing! You're the best.

> btw, it looks like we're seeing timeouts in Truffle? This could be because of my change. This loop:

Just merged a fix on main. I triggered a rebase of your PR; it's green now.

> One question for the maintainers: is this test valuable?

I have resisted touching the HTTP parser code personally. This almost looks like a cross between fuzzing and regression protection. I'm aware of fuzzing, but haven't actually invested in trying it on a codebase for real. Also, as I understand it, it's a lengthy, expensive process that you probably wouldn't want to run in CI. Might be some opportunity there.

Running the blame, it was added circa 2006 in 3c804d5.

> I mean, is the 10th iteration any more valuable than the 1st?

IMHO, no. If you want to clean it up, splitting it into several real and distinct tests with names instead of comments, and removing the iteration, seems good.

Since you mention it, it also doesn't look like we have a test that asserts that calling parser.reset after a bad parse allows you to parse a good header later. It couldn't hurt to add a regression test.

@schneems schneems merged commit 98ff11d into puma:main Jan 13, 2026
123 of 124 checks passed
@tenderlove tenderlove deleted the flaky-test branch January 13, 2026 16:14
@tenderlove
Contributor Author

> This almost looks like a cross between fuzzing and regression protection.

Yes, that's what it looks like to me.

> Also, as I understand it, it's a lengthy, expensive process that you probably wouldn't want to run in CI.

As long as the current test isn't causing timeouts, I feel it's probably fine to keep as-is.

> Since you mention it, it also doesn't look like we have a test that asserts that calling `parser.reset` after a bad parse allows you to parse a good header later. It couldn't hurt to add a regression test.

Ok. I'll send a regression test!

This PR inspired me to try my hand at a pure Ruby implementation (which is why I'm looking at this stuff). I want to study the "real" uses of the parser inside of Puma and try to add tests around that. For example, I suspect that internally, Puma will either always call reset on the parser, or just always throw the parser away and allocate a new one (in which case I'm not sure how the reset method would be useful). I want to be cautious not to "over specify" the parser interface which would couple us to an API that isn't actually useful.

@schneems
Member

The parser is used in the Client class. The abstractions are a little off in the internal representation, but the client roughly represents a connection, which can carry multiple requests (I wrote more at https://www.heroku.com/blog/upgrade-to-puma-7-and-unlock-the-power-of-fair-scheduled-keep-alive/).

So after a request is parsed successfully, the client is reset. This happens in a loop (but only when using keep-alive connections):

```ruby
while can_loop
  # ...
  case handle_request(client, requests + 1)
  # ...
  when :keep_alive
    requests += 1

    client.reset
```

which clears the state of the parser among other things:

```ruby
def reset
  @parser.reset
```
Not having to allocate is a benefit of doing it this way, but at the cost of having more book-keeping and logic. In general we've leaned towards performance at the cost of complexity/maintenance (with the caveat that correctness and reliability or other guarantees take precedence over raw perf).

@tenderlove
Contributor Author

> Not having to allocate is a benefit of doing it this way, but at the cost of having more book-keeping and logic. In general we've leaned towards performance at the cost of complexity/maintenance (with the caveat that correctness and reliability or other guarantees take precedence over raw perf).

That makes sense. The parser I've designed is a stateless singleton (with no side effects), so there isn't a `reset`. It takes an offset into a buffer (same as the current parser), and `execute` returns the number of bytes read, so if there are multiple incoming requests to parse, the client can track the offset and pass it to `execute`.

For example (pseudo code):

```ruby
offset = 0
loop do
  req = {}
  offset = AaronParser.execute(req, buff, offset)
  # If it's a POST, the post body can be extracted from buff via the `offset` + content-length if desired
  # do stuff with req
end
```

I added a stateful wrapper around my stateless implementation only so that I can make the tests pass. Once I get the tests passing I may send a PR, but honestly it's just for fun and possibly a ZJIT benchmark.
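As a rough illustration of that offset-return contract (a toy record format and a hypothetical `ToyParser`, nothing to do with Puma's parser or the real implementation), the caller owns all the state and the parser module keeps none:

```ruby
# Hypothetical stateless parser: reads one "VERB path\n" record starting at
# `offset`, fills `req`, and returns the offset just past the record, or nil
# if no complete record is available yet. No instance state, so no reset.
module ToyParser
  module_function

  def execute(req, buff, offset)
    nl = buff.index("\n", offset)
    return nil unless nl
    verb, path = buff[offset...nl].split(" ", 2)
    req[:verb] = verb
    req[:path] = path
    nl + 1
  end
end

# The caller tracks the offset across multiple records in one buffer.
buff = "GET /a\nPOST /b\n"
offset = 0
requests = []
loop do
  req = {}
  next_offset = ToyParser.execute(req, buff, offset)
  break unless next_offset
  requests << req
  offset = next_offset
end
```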

