Reading gzip file with very long filename or comment takes long time #150144

@serhiy-storchaka

Description

A gzip file can contain a filename and a comment, each stored as a null-terminated sequence of bytes. GzipFile ignores the filename and comment (it only includes them in the header checksum if needed), but it searches for the terminating null byte by reading one byte at a time, which is slow. On my computer, with a fast CPU and an SSD, reading a gzip file containing a 1 GiB filename or comment takes over 5 minutes. This is not a security issue per se, because to trigger it an attacker needs to send a large file in the first place, but it is still not acceptable.
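For anyone who wants to reproduce the slowdown, here is a minimal sketch that builds a gzip member with an arbitrarily long FNAME field by hand (layout per RFC 1952) and times how long `GzipFile` takes to read it. The helper names `make_gzip_with_filename` and `time_read` are hypothetical and exist only for this illustration. Timings depend on the machine and Python version.

```python
import gzip
import io
import struct
import time
import zlib

FNAME = 0x08  # FLG bit signalling a null-terminated filename (RFC 1952)


def make_gzip_with_filename(payload: bytes, name_size: int) -> bytes:
    """Build a single gzip member whose FNAME field is name_size bytes long."""
    # Fixed 10-byte header: magic, CM=8 (deflate), FLG, MTIME=0, XFL=0, OS=255 (unknown)
    header = struct.pack("<2sBBIBB", b"\x1f\x8b", 8, FNAME, 0, 0, 255)
    name = b"a" * name_size + b"\x00"
    # Raw deflate stream (negative wbits means no zlib wrapper)
    comp = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    body = comp.compress(payload) + comp.flush()
    # Trailer: CRC32 and size of the uncompressed data, modulo 2**32
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return header + name + body + trailer


def time_read(name_size: int) -> float:
    """Return the seconds needed to read a tiny payload behind a long filename."""
    data = make_gzip_with_filename(b"hello", name_size)
    start = time.perf_counter()
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
        assert f.read() == b"hello"
    return time.perf_counter() - start
```

Calling `time_read(1 << 20)` and then `time_read(1 << 24)` should show the cost growing with the filename length on affected versions. A 1 GiB filename needs over 1 GiB of memory with this in-memory approach.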

This issue was discovered during the discussion in #149945. The solution originally proposed for that issue imposed a limit on the size of the filename and comment. While a limit on the filename is reasonable (although it may depend on the platform?), we cannot be sure that there are no use cases for large comments.

The following PR reads the header in chunks of growing size. It reads a 1 GiB header in a fraction of a second.
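To illustrate the general idea of reading in chunks of growing size, here is a rough sketch. It is not the code from the PR. The function name `skip_zero_terminated`, the initial chunk size, and the cap are all assumptions made for this example, and it assumes a seekable binary stream so that over-read bytes can be returned by seeking back.

```python
import io


def skip_zero_terminated(fp, initial_chunk=64, max_chunk=1 << 20):
    """Consume bytes up to and including the first NUL byte.

    Returns the length of the field, excluding the terminator.
    Hypothetical helper: assumes fp is a seekable binary stream.
    """
    chunk_size = initial_chunk
    total = 0
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            raise EOFError("stream ended before the terminating null byte")
        pos = chunk.find(b"\x00")
        if pos >= 0:
            # Give back the bytes read past the terminator.
            fp.seek(pos + 1 - len(chunk), io.SEEK_CUR)
            return total + pos
        total += len(chunk)
        # Double the chunk size so long fields need few reads,
        # while short fields over-read only a little.
        chunk_size = min(chunk_size * 2, max_chunk)


fp = io.BytesIO(b"a" * 1000 + b"\x00" + b"rest")
length = skip_zero_terminated(fp)
remaining = fp.read()
```

Starting small keeps the over-read negligible for typical short filenames, and the doubling means a field of n bytes is scanned in roughly log(n) reads until the cap is reached. For a non-seekable stream the leftover bytes would have to be buffered and handed back to the decompressor instead of seeking.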

Metadata
    Labels

    3.13 (bugs and security fixes), 3.14 (bugs and security fixes), 3.15 (pre-release feature fixes, bugs and security fixes), performance (Performance or resource usage), stdlib (Standard Library Python modules in the Lib/ directory), type-bug (An unexpected behavior, bug, or error)