gh-151788: Speed up http.server directory listing by using os.scandir()#151789
gh-151788: Speed up http.server directory listing by using os.scandir()#151789mjbommar wants to merge 3 commits into
Conversation
…candir SimpleHTTPRequestHandler.list_directory() called os.path.isdir() and os.path.islink() for every entry, issuing two stat-family syscalls per file. This is wasted work on any filesystem and dominates listing time for large directories; on network filesystems such as NFS, where each call is a round-trip, it becomes severe. Use os.scandir(), whose DirEntry objects report the type from the directory read itself (d_type / READDIRPLUS), eliminating the per-entry stats in the common case and never doing more work than before. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| # Append / for directories or @ for symbolic links. | ||
| # Use cached os.DirEntry methods to avoid a stat() per entry, | ||
| # which is costly on network filesystems such as NFS. |
There was a problem hiding this comment.
Don't change this. The performance comment is unnecessary because it's not meant to be used in production.
| @@ -0,0 +1,4 @@ | |||
| Speed up :class:`http.server.SimpleHTTPRequestHandler` directory listings by | |||
There was a problem hiding this comment.
There is no need for a NEWS entry in this case.
| displayname = name + "/" | ||
| linkname = name + "/" | ||
| if os.path.islink(fullname): | ||
| if entry.is_symlink(): |
There was a problem hiding this comment.
Actually, those methods can raise exceptions while their os.path counter do not. So I'm not that convinced.
There was a problem hiding this comment.
See https://docs.python.org/3/library/os.html#os.DirEntry.is_symlink for instance.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
- Wrap entry.is_dir()/is_symlink() in try/except OSError, falling back to False to exactly mirror os.path.isdir()/islink() (matches os.walk()). - Reword the NEWS entry to avoid speedup-multiplier claims; describe it as improving list_directory() on systems with slow stat calls. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Remove the performance rationale from the inline comment; keep only the note explaining the OSError fallback (correctness, not performance). - Reword the NEWS entry to state the mechanism without magnitude claims. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SimpleHTTPRequestHandler.list_directory()calledos.path.isdir()(astat) andos.path.islink()(anlstat) for every entry — two stat-family syscalls per file. That is wasted work on any filesystem and dominates listing time for large directories; on network filesystems such as NFS, where each call is a round-trip, it becomes severe.This switches to
os.scandir(), whoseDirEntryobjects report the entry type from the directory read itself (POSIXd_type/ NFSREADDIRPLUS), eliminating the per-entry stats in the common case. CPython already made this exact migration foros.walk(),glob, andpathlib.Path.iterdir()(gh-117727);http.serverwas simply missed.Behavior preserved
DirEntry.is_dir()/is_symlink()matchos.path.isdir/os.path.islinksemantics — same follow-symlinks behavior and same return-False-on-error behavior — verified across real dirs/files, symlink-to-dir (still rendered with@but linked with/), symlink-to-file, and broken symlinks. The existingLib/test/test_httpservers.pysuite passes unchanged (92/92), including the undecodable/unencodable filename cases that exercise surrogate-escaped names.Benchmark
Directory with 1000 files + 1000 dirs:
strace), oldstatlatency injected)Caveat (no overselling): the large NFS figures assume the filesystem reports entry types in the directory read (
d_type; local filesystems and NFS withREADDIRPLUS, the Linux default). If a mount returnsDT_UNKNOWN,os.scandir()falls back to one cachedlstatper entry — still fewer calls than today, and never worse.This PR was prepared with AI assistance (Claude Code). I reviewed the change and benchmarks and can explain it in my own words.