Reading gzip file with very long filename or comment takes long time #150144
Open
Labels: 3.13, 3.14, 3.15, performance, stdlib, type-bug
A gzip file can contain a filename and a comment, each stored as a null-terminated sequence of bytes. GzipFile ignores both (it only computes their checksum if needed), but it finds the terminating null byte by reading one byte at a time, which is slow. On my computer, with a fast CPU and SSD, reading a gzip file containing a 1 GiB filename or comment takes over 5 minutes. This is not a security issue per se, because to trigger it an attacker needs to send a large file in the first place, but it is still not acceptable.
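For anyone who wants to reproduce this, here is a minimal sketch that builds a gzip member with an arbitrarily long filename field by hand. The helper name `make_gzip_with_name` is hypothetical, and the header layout follows my reading of RFC 1952 (10-byte fixed header, FNAME flag `0x08`, then the null-terminated name). Raise the name length towards 1 GiB to see the slowdown; the sizes here are kept small on purpose.

```python
import gzip
import struct
import zlib

FNAME = 0x08  # header flag: a null-terminated original filename follows


def make_gzip_with_name(payload: bytes, name: bytes) -> bytes:
    """Build a single gzip member whose header carries `name` (hypothetical helper)."""
    if b"\x00" in name:
        raise ValueError("name must not contain a null byte")
    # Fixed header: magic (2 bytes), method=8 (deflate), flags, mtime=0, xfl=0, os=255 (unknown)
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FNAME, 0, 0, 255)
    # Raw deflate stream (negative wbits means no zlib/gzip wrapper)
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)
    body = comp.compress(payload) + comp.flush()
    # Trailer: CRC32 of the uncompressed data and its size modulo 2**32
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return header + name + b"\x00" + body + trailer


# A small name for illustration; the issue is about names or comments near 1 GiB.
blob = make_gzip_with_name(b"hello", b"a" * 1000)
data = gzip.decompress(blob)
```

Timing `gzip.decompress(blob)` for growing name lengths should show the cost of the header scan separately from the (tiny) payload.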
This issue was discovered during the discussion in #149945. The solution originally proposed for that issue imposed a limit on the size of the filename and comment. While a limit on the filename is reasonable (though it may depend on the platform?), we cannot be sure that there are no use cases for large comments.
The following PR reads the header in chunks of growing size. It reads a 1 GiB header in a fraction of a second.
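To illustrate the idea of growing chunks, here is a rough sketch. It is not the code from the PR. The function name `skip_null_terminated` and the chunk sizes are made up for illustration, and it assumes a seekable stream so that bytes read past the terminator can be given back with `seek()`; the real GzipFile reader may handle over-read data differently. Checksum calculation is left out.

```python
import io


def skip_null_terminated(fp, first_chunk=64, max_chunk=1 << 20):
    """Advance `fp` just past the next null byte.

    Returns the number of bytes consumed, terminator included.
    Hypothetical helper; assumes `fp` is a seekable binary stream.
    """
    consumed = 0
    size = first_chunk
    while True:
        chunk = fp.read(size)
        if not chunk:
            raise EOFError("reached end of file before the terminating null byte")
        pos = chunk.find(b"\x00")
        if pos >= 0:
            # Give back whatever was read beyond the terminator.
            fp.seek(pos + 1 - len(chunk), io.SEEK_CUR)
            return consumed + pos + 1
        consumed += len(chunk)
        # Short fields stay cheap; long fields need few read() calls.
        size = min(size * 2, max_chunk)


fp = io.BytesIO(b"a" * 1000 + b"\x00rest")
n = skip_null_terminated(fp)
rest = fp.read()
```

Starting small keeps the common case (a short filename) from over-reading much, while doubling means a very long field is scanned with `bytes.find` over large buffers instead of one `read(1)` call per byte.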
Linked PRs