Use a fixed Zlib::GzipWriter#mtime value in deflater gzip payload#2372
Use a fixed Zlib::GzipWriter#mtime value in deflater gzip payload#2372jeremyevans merged 1 commit intorack:mainfrom
Zlib::GzipWriter#mtime value in deflater gzip payload#2372Conversation
|
Is the current code causing any observable problems, or is this proposed change to reduce overall complexity? |
Good question! It's not causing any observable problems. I see it as:
|
|
Aha! Just found the discussion about this from 2008. Still a bit muddled though:
|
There was a problem hiding this comment.
Pull Request Overview
This PR removes the variable mtime property from gzipped assets produced by Rack::Deflater to ensure deterministic compression output. By setting Zlib::GzipWriter#mtime to a fixed value of 0 instead of using the Last-Modified header value, the compressed resource's content becomes solely dependent on the uncompressed content, not the modification time.
- Removes dynamic
mtimeassignment based onLast-Modifiedheader inRack::Deflater - Sets
Zlib::GzipWriter#mtimeto a fixed value of0for deterministic compression - Updates test assertions to verify the fixed
mtimevalue instead of dynamic time-based checks
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| lib/rack/deflater.rb | Removes Last-Modified header parsing and passes fixed mtime value of 0 to GzipStream |
| test/spec_deflater.rb | Simplifies test to verify gzip mtime is set to 0 instead of checking dynamic time values |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
From what I could tell, out of all the "common" compression content encodings, I looked up the RFC https://www.rfc-editor.org/rfc/rfc1952: So, in short:
That being said, one could assume that we are considering All that being said, what actual problems does this cause? Incorrect cache invalidation? |
I think it's important to note that that linked RFC uses neither "must" nor "should"; my reading is that it is descriptive of the purpose of the header field and how values are to be interpreted alone ("gives"). Under compliance, the spec says:
I wouldn't typically be this legalistic about the RFC, but I'm planning to propose this same change in some other libraries, so the steelmanning is helpful 🙇🏻
Yes, the practical impact is upon cache invalidation. |
|
One thing I don't get, is if the mtime is set from the last modified date, what does this have to do with cache invalidation, surely if the last modified date is set correctly, this is not an issue? |
d99c80a to
61a3426
Compare
|
Seems like the advantage of doing this is mostly simplicity. Possibly slightly better for performance, but unlikely enough to make a difference. While I don't think this is likely to break anything, it's still possible. The behavior isn't a bug, it is deliberate. While I wouldn't advocate for changing from always using 0 mtime to using last modified time as we currently do, I also don't think this is worth changing behavior from last modified time to 0 mtime. I can understand a change in last-modified time without a change in the uncompressed data results in a change in the compressed data. For this to cause unnecessary cache invalidation, the cache would have to not invalidate on a change in last-modified alone, which seems odd. I would expect if the last-modified changed, the cache would invalidate for that reason alone. You mentioned that this causes issues for a more than a couple feed readers, but it seems like the problem should only occur if last-modified is also not set correctly. As long as last-modified is set correctly, there shouldn't be a problem (looks like @ioquatix thinks the same way). |
|
I think we're in agreement on the rationale for this, though I think maybe the details are still muddled. This is more about little-c caching than about Conditional Requests, though there is overlap. The summary being like: Adding a variable
Ok, so on the "so what?"
I concede this is deliberate, but I haven't found evidence/discussion that the resulting behavior was intended, just that there was a historical necessity for setting some value. And the value chosen has these side effects. To move this ahead, it seems like 3 options:
Regardless, thanks for the feedback and discussion. |
|
🤯 Aside: Looking at the test failure in Ruby 2.6, apparently setting (though I can't seem to identify why that would be isolated to Ruby 2.6 alone given the tests passed on earlier ruby versions) |
It fails on 2.4 and 2.5 as well, we just ignore failures on those: |
|
Just to share a couple more of my notes: Apache static const char gzip_header[10] =
{ '\037', '\213', Z_DEFLATED, 0,
0, 0, 0, 0, /* mtime */
0, 0x03 /* Unix OS_CODE */
};nginx rc = deflateInit2(&ctx->zstream, (int) conf->level, Z_DEFLATED,
ctx->wbits + 16, ctx->memlevel, Z_DEFAULT_STRATEGY);Specifically the
There's also a fun security researcher story "HTTP GZIP Compression remote date and time leak" about using gzip mtime to de-anonymize Tor hidden services. At the bottom it describes how Tor itself uses the same |
|
@bensheldon Thank you for the additional evidence. You've convinced me this is a change worth making. To handle Ruby < 2.7, easiest fix is probably using 1 instead of 0 for the mtime on those versions. |
61a3426 to
e4746a2
Compare
Zlib::GzipWriter#mtime in deflater gzip payloadZlib::GzipWriter#mtime value in deflater gzip payload
Binary, I like it. |
|
Thanks @jeremyevans. I just pushed up a commit that sets a constant |
|
@jeremyevans please merge as you see fit. |
e4746a2 to
eeb8c60
Compare
This PR removes the variable
mtimeproperty from gzipped assets produced byRack::Deflater. TheZlib::GzipWriter#mtimeproperty:Assigning
Zlib::GzipWriter#mtimeof the gzipped content based on theLast-Modifiedvalue means that a new, unequal gzipped resource will be generated whenLast-Modifiedchanges even if the content being gzipped remains unchanged. SettingZlib::GzipWriter#mtimeto a fixed value/0causes the compressed resource's content to be deterministically tied solely to the content of its uncompressed counterpart.I couldn't find a rationale for why
Zlib::GzipWriter#mtimewas initially added in d2d51ff.I don't believe that
Zlib::GzipWriter#mtimeserves any purpose in the context of http payloads. TheLast-Modifiedheader is transmitted for both original and compressed resources and I haven't found evidence that any browser/client resolvesgzip#mtime. Some clients do mishandle conditional requests when the last-modified header changes but the contents do not:...but I think it would be weird to justify only handling that with gzip compression.