Skip to content

Use a fixed Zlib::GzipWriter#mtime value in deflater gzip payload#2372

Merged
jeremyevans merged 1 commit intorack:mainfrom
bensheldon:no-gzip-mtime
Sep 3, 2025
Merged

Use a fixed Zlib::GzipWriter#mtime value in deflater gzip payload#2372
jeremyevans merged 1 commit intorack:mainfrom
bensheldon:no-gzip-mtime

Conversation

@bensheldon
Copy link
Copy Markdown
Contributor

This PR removes the variable mtime property from gzipped assets produced by Rack::Deflater. The Zlib::GzipWriter#mtime property:

Specify the modification time (mtime) in the gzip header. Using an Integer.

Setting the mtime in the gzip header does not effect the mtime of the file generated. Different utilities that expand the gzipped files may use the mtime header. For example the gunzip utility can use the ‘-N` flag which will set the resultant file’s mtime to the value in the header. By default many tools will set the mtime of the expanded file to the mtime of the gzipped file, not the mtime in the header.

If you do not set an mtime, the default value will be the time when compression started. Setting a value of 0 indicates no time stamp is available.

Assigning Zlib::GzipWriter#mtime of the gzipped content based on the Last-Modified value means that a new, unequal gzipped resource will be generated when Last-Modified changes even if the content being gzipped remains unchanged. Setting Zlib::GzipWriter#mtime to a fixed value/0 causes the compressed resource's content to be deterministically tied solely to the content of its uncompressed counterpart.

I couldn't find a rationale for why Zlib::GzipWriter#mtime was initially added in d2d51ff.

I don't believe that Zlib::GzipWriter#mtime serves any purpose in the context of http payloads. The Last-Modified header is transmitted for both original and compressed resources and I haven't found evidence that any browser/client resolves gzip#mtime. Some clients do mishandle conditional requests when the last-modified header changes but the contents do not:

More than a couple of feed readers use a library which hashes the contents of the body and then skips updating their cached values for Last-Modified and ETag. This makes them send the old values over and over, turning them into a generator of unconditional requests. This is bad.

...but I think it would be weird to justify only handling that with gzip compression.

@jeremyevans
Copy link
Copy Markdown
Contributor

Is the current code causing any observable problems, or is this proposed change to reduce overall complexity?

@bensheldon
Copy link
Copy Markdown
Contributor Author

Is the current code causing any observable problems, or is this proposed change to reduce overall complexity?

Good question! It's not causing any observable problems. I see it as:

  • Improving correctness and exemplariness. Sprockets is doing the same thing (next on my list) where it is causing me observable problems.
  • Removing complexity
  • Speculatively improving cacheability.

@bensheldon
Copy link
Copy Markdown
Contributor Author

Aha! Just found the discussion about this from 2008. Still a bit muddled though:

I can't get a full test case, because gzip needs a header for the
mtime of the data, changing the output. Is there a good way to test
the output (maybe gunzipping the output and comparing it to the
original)? I should probably also change it to use the Last-Modified
header as the mtime.

@ioquatix ioquatix requested a review from Copilot August 27, 2025 08:46
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR removes the variable mtime property from gzipped assets produced by Rack::Deflater to ensure deterministic compression output. By setting Zlib::GzipWriter#mtime to a fixed value of 0 instead of using the Last-Modified header value, the compressed resource's content becomes solely dependent on the uncompressed content, not the modification time.

  • Removes dynamic mtime assignment based on Last-Modified header in Rack::Deflater
  • Sets Zlib::GzipWriter#mtime to a fixed value of 0 for deterministic compression
  • Updates test assertions to verify the fixed mtime value instead of dynamic time-based checks

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
lib/rack/deflater.rb Removes Last-Modified header parsing and passes fixed mtime value of 0 to GzipStream
test/spec_deflater.rb Simplifies test to verify gzip mtime is set to 0 instead of checking dynamic time values

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@ioquatix
Copy link
Copy Markdown
Member

ioquatix commented Aug 27, 2025

From what I could tell, out of all the "common" compression content encodings, gzip is the only one that has an mtime field. In other words, it looks like br, zstd and deflate do not include such metadata.

I looked up the RFC https://www.rfc-editor.org/rfc/rfc1952:

         MTIME (Modification TIME)
            This gives the most recent modification time of the original
            file being compressed.  The time is in Unix format, i.e.,
            seconds since 00:00:00 GMT, Jan.  1, 1970.  (Note that this
            may cause problems for MS-DOS and other systems that use
            local rather than Universal time.)  If the compressed data
            did not come from a file, MTIME is set to the time at which
            compression started.  MTIME = 0 means no time stamp is
            available.

So, in short:

  1. If the compressed data is from a file, it should use the mtime of the file (last-modified is the best we have).
  2. If the compressed data is not from a file, it should use mtime = 0.

That being said, one could assume that we are considering content-encoding always to be a stream format, so I'd be okay with (2) for everything. In addition, making this change simplifies the implementation, and I don't think there is going to be any impact.

All that being said, what actual problems does this cause? Incorrect cache invalidation?

@bensheldon
Copy link
Copy Markdown
Contributor Author

So, in short:

  1. If the compressed data is from a file, it should use the mtime of the file (last-modified is the best we have).
  2. If the compressed data is not from a file, it should use mtime = 0.

I think it's important to note that that linked RFC uses neither "must" nor "should"; my reading is that it is descriptive of the purpose of the header field and how values are to be interpreted alone ("gives").

Under compliance, the spec says:

A compliant compressor must produce files with correct ID1,
ID2, CM, CRC32, and ISIZE, but may set all the other fields in
the fixed-length part of the header to default values (255 for
OS, 0 for all others). The compressor must set all reserved
bits to zero.

I wouldn't typically be this legalistic about the RFC, but I'm planning to propose this same change in some other libraries, so the steelmanning is helpful 🙇🏻

All that being said, what actual problems does this cause? Incorrect cache invalidation?

Yes, the practical impact is upon cache invalidation.

@ioquatix
Copy link
Copy Markdown
Member

ioquatix commented Sep 1, 2025

One thing I don't get, is if the mtime is set from the last modified date, what does this have to do with cache invalidation, surely if the last modified date is set correctly, this is not an issue?

@jeremyevans
Copy link
Copy Markdown
Contributor

Seems like the advantage of doing this is mostly simplicity. Possibly slightly better for performance, but unlikely enough to make a difference. While I don't think this is likely to break anything, it's still possible. The behavior isn't a bug, it is deliberate. While I wouldn't advocate for changing from always using 0 mtime to using last modified time as we currently do, I also don't think this is worth changing behavior from last modified time to 0 mtime.

I can understand a change in last-modified time without a change in the uncompressed data results in a change in the compressed data. For this to cause unnecessary cache invalidation, the cache would have to not invalidate on a change in last-modified alone, which seems odd. I would expect if the last-modified changed, the cache would invalidate for that reason alone. You mentioned that this causes issues for a more than a couple feed readers, but it seems like the problem should only occur if last-modified is also not set correctly. As long as last-modified is set correctly, there shouldn't be a problem (looks like @ioquatix thinks the same way).

@bensheldon
Copy link
Copy Markdown
Contributor Author

I think we're in agreement on the rationale for this, though I think maybe the details are still muddled.

This is more about little-c caching than about Conditional Requests, though there is overlap. The summary being like: Adding a variable GZip#mtime causes the compressed content (e.g. integrity hash) to have a different lifecycle--it changes more frequently--than the uncompressed content's lifecycle.

  • If there is no last-modified header, the compressed version changes (e.g. integrity hash) on every request / once per second.
  • if there is a last-modified header, the compressed version changes whenever the last-modified header changes, irrespective of the uncompressed version (maybe it also changed maybe it didn't).

Ok, so on the "so what?"

  • If doing Conditional Requests strictly/correctly, then yes, it doesn't matter from the outside. Though if the implementation of the cache is storing payload and headers separately, and deduping/compressing based on the payload, it's more inefficient. But eh, maybe that's their problem.
  • If not doing Conditional Requests strictly/correctly, or not at all (no 'last-modified') then this makes it more difficult to implement a heuristic-based cache (e.g. comparing content hashes) because the content changes more frequently (maybe no big deal) if not all the time (maybe a big deal).

The behavior isn't a bug, it is deliberate.

I concede this is deliberate, but I haven't found evidence/discussion that the resulting behavior was intended, just that there was a historical necessity for setting some value. And the value chosen has these side effects.

To move this ahead, it seems like 3 options:

  • Don't make any changes (I totally appreciate the like "we wouldn't have made the original change today, but changing it again wouldn't be worth the squeeze")
  • Change this PR to just set 0 when last-modified is blank so the payload doesn't change all the request / every second.
  • Accept the PR because I've made my case better 😄

Regardless, thanks for the feedback and discussion.

@bensheldon
Copy link
Copy Markdown
Contributor Author

🤯 Aside: Looking at the test failure in Ruby 2.6, apparently setting GZip#mtime = 0 didn't work for a time: https://bugs.ruby-lang.org/issues/16285

(though I can't seem to identify why that would be isolated to Ruby 2.6 alone given the tests passed on earlier ruby versions)

@jeremyevans
Copy link
Copy Markdown
Contributor

🤯 Aside: Looking at the test failure in Ruby 2.6, apparently setting GZip#mtime = 0 didn't work for a time: https://bugs.ruby-lang.org/issues/16285

(though I can't seem to identify why that would be isolated to Ruby 2.6 alone given the tests passed on earlier ruby versions)

It fails on 2.4 and 2.5 as well, we just ignore failures on those:

@bensheldon
Copy link
Copy Markdown
Contributor Author

bensheldon commented Sep 2, 2025

Just to share a couple more of my notes:

Apache mod_deflate sets gzip mtime to 0 by using a static file header:

static const char gzip_header[10] =
{ '\037', '\213', Z_DEFLATED, 0,
  0, 0, 0, 0, /* mtime */
  0, 0x03 /* Unix OS_CODE */
};

nginx ngx_http_gzip_module uses deflateInit2 with a setting that sets it to 0:

    rc = deflateInit2(&ctx->zstream, (int) conf->level, Z_DEFLATED,
                      ctx->wbits + 16, ctx->memlevel, Z_DEFAULT_STRATEGY);

Specifically the ctx->wbits + 16, which the deflateInit2 docs say (emphasis mine):

windowBits can also be greater than 15 for optional gzip encoding. Add 16 to windowBits to write a simple gzip header and trailer around the compressed data instead of a zlib wrapper. The gzip header will have no file name, no extra data, no comment, no modification time (set to zero), no header crc, and the operating system will be set to the appropriate value, if the operating system was determined at compile time. If a gzip stream is being written, strm->adler is a CRC-32 instead of an Adler-32.

There's also a fun security researcher story "HTTP GZIP Compression remote date and time leak" about using gzip mtime to de-anonymize Tor hidden services. At the bottom it describes how Tor itself uses the same deflateInit2 strategy as nginx to zero out the headers.

@jeremyevans
Copy link
Copy Markdown
Contributor

@bensheldon Thank you for the additional evidence. You've convinced me this is a change worth making.

To handle Ruby < 2.7, easiest fix is probably using 1 instead of 0 for the mtime on those versions.

@bensheldon bensheldon changed the title Don't include variable Zlib::GzipWriter#mtime in deflater gzip payload Use a fixed Zlib::GzipWriter#mtime value in deflater gzip payload Sep 3, 2025
@ioquatix
Copy link
Copy Markdown
Member

ioquatix commented Sep 3, 2025

To handle Ruby < 2.7, easiest fix is probably using 1 instead of 0 for the mtime on those versions.

Binary, I like it.

@bensheldon
Copy link
Copy Markdown
Contributor Author

Thanks @jeremyevans. I just pushed up a commit that sets a constant GZIP_MTIME = RUBY_VERSION >= "2.7" ? 0 : 1 and also includes a changelog entry.

@ioquatix
Copy link
Copy Markdown
Member

ioquatix commented Sep 3, 2025

@jeremyevans please merge as you see fit.

@jeremyevans jeremyevans merged commit 8116397 into rack:main Sep 3, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants