-
-
Notifications
You must be signed in to change notification settings - Fork 34.5k
gh-89727: Fix os.walk RecursionError on deep trees #99803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
99f4691
3552d49
3078ea6
f37bbe8
f26a5b8
955d21b
8fbfb2e
ef46eda
e261b9f
5ec53f7
507b650
1c35610
73138a6
2814cc5
37f3cc7
d9c766c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
Use a stack to implement os.walk iteratively instead of recursively to avoid hitting recursion limits on deeply nested trees.
- Loading branch information
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -343,86 +343,96 @@ def walk(top, topdown=True, onerror=None, followlinks=False): | |
| return _walk(fspath(top), topdown, onerror, followlinks) | ||
|
|
||
| def _walk(top, topdown, onerror, followlinks): | ||
| dirs = [] | ||
| nondirs = [] | ||
| walk_dirs = [] | ||
|
|
||
| # We may not have read permission for top, in which case we can't | ||
| # get a list of the files the directory contains. os.walk | ||
| # always suppressed the exception then, rather than blow up for a | ||
| # minor reason when (say) a thousand readable directories are still | ||
| # left to visit. That logic is copied here. | ||
| try: | ||
| # Note that scandir is global in this module due | ||
| # to earlier import-*. | ||
| scandir_it = scandir(top) | ||
| except OSError as error: | ||
| if onerror is not None: | ||
| onerror(error) | ||
| return | ||
| stack = [(False, top)] | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will it make more sense to use a deque here? They are optimized for appending and popping while having the same interface.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think when append and pop occur only at the end (as in this case), performance of list and deque is basically the same. When append/pop can occur at the start, then you need deque ("double-ended queue") to avoid O(n) copy.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In theory I think a deque is a slightly better fit, since we're only popping and appending (at a certain point
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, list has to
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, good points. I just did some rough benchmarking as well and found no real difference. Also, I didn't find the exact cause, but adding I'll switch back to list |
||
| while stack: | ||
| is_result, top = stack.pop() | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't like the In case we will not find a better name, I believe we should add a comment describing what this variable signifies.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I wasn't sure what to call it. Maybe
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Something like that would make a lot more sense, yeah. I used |
||
| if is_result: | ||
| yield top | ||
| continue | ||
|
|
||
| with scandir_it: | ||
| while True: | ||
| try: | ||
| dirs = [] | ||
| nondirs = [] | ||
| walk_dirs = [] | ||
|
|
||
| # We may not have read permission for top, in which case we can't | ||
| # get a list of the files the directory contains. os.walk | ||
| # always suppressed the exception then, rather than blow up for a | ||
| # minor reason when (say) a thousand readable directories are still | ||
| # left to visit. That logic is copied here. | ||
|
jonburdo marked this conversation as resolved.
Outdated
|
||
| try: | ||
| # Note that scandir is global in this module due | ||
| # to earlier import-*. | ||
|
jonburdo marked this conversation as resolved.
Outdated
|
||
| scandir_it = scandir(top) | ||
| except OSError as error: | ||
| if onerror is not None: | ||
| onerror(error) | ||
| continue | ||
|
|
||
| cont = False | ||
| with scandir_it: | ||
| while True: | ||
| try: | ||
| entry = next(scandir_it) | ||
| except StopIteration: | ||
| try: | ||
| entry = next(scandir_it) | ||
| except StopIteration: | ||
| break | ||
| except OSError as error: | ||
| if onerror is not None: | ||
| onerror(error) | ||
| cont = True | ||
| break | ||
| except OSError as error: | ||
| if onerror is not None: | ||
| onerror(error) | ||
| return | ||
|
|
||
| try: | ||
| is_dir = entry.is_dir() | ||
| except OSError: | ||
| # If is_dir() raises an OSError, consider that the entry is not | ||
| # a directory, same behaviour than os.path.isdir(). | ||
| is_dir = False | ||
|
|
||
| if is_dir: | ||
| dirs.append(entry.name) | ||
| else: | ||
| nondirs.append(entry.name) | ||
| try: | ||
| is_dir = entry.is_dir() | ||
| except OSError: | ||
| # If is_dir() raises an OSError, consider that the entry is not | ||
| # a directory, same behaviour than os.path.isdir(). | ||
|
jonburdo marked this conversation as resolved.
Outdated
|
||
| is_dir = False | ||
|
|
||
| if not topdown and is_dir: | ||
| # Bottom-up: recurse into sub-directory, but exclude symlinks to | ||
| # directories if followlinks is False | ||
| if followlinks: | ||
| walk_into = True | ||
| if is_dir: | ||
| dirs.append(entry.name) | ||
| else: | ||
| try: | ||
| is_symlink = entry.is_symlink() | ||
| except OSError: | ||
| # If is_symlink() raises an OSError, consider that the | ||
| # entry is not a symbolic link, same behaviour than | ||
| # os.path.islink(). | ||
| is_symlink = False | ||
| walk_into = not is_symlink | ||
|
|
||
| if walk_into: | ||
| walk_dirs.append(entry.path) | ||
|
|
||
| # Yield before recursion if going top down | ||
| if topdown: | ||
| yield top, dirs, nondirs | ||
|
|
||
| # Recurse into sub-directories | ||
| islink, join = path.islink, path.join | ||
| for dirname in dirs: | ||
| new_path = join(top, dirname) | ||
| # Issue #23605: os.path.islink() is used instead of caching | ||
| # entry.is_symlink() result during the loop on os.scandir() because | ||
| # the caller can replace the directory entry during the "yield" | ||
| # above. | ||
| if followlinks or not islink(new_path): | ||
| yield from _walk(new_path, topdown, onerror, followlinks) | ||
| else: | ||
| # Recurse into sub-directories | ||
| for new_path in walk_dirs: | ||
| yield from _walk(new_path, topdown, onerror, followlinks) | ||
| # Yield after recursion if going bottom up | ||
| yield top, dirs, nondirs | ||
| nondirs.append(entry.name) | ||
|
|
||
| if not topdown and is_dir: | ||
| # Bottom-up: traverse into sub-directory, but exclude symlinks to | ||
| # directories if followlinks is False | ||
| if followlinks: | ||
| walk_into = True | ||
| else: | ||
| try: | ||
| is_symlink = entry.is_symlink() | ||
| except OSError: | ||
| # If is_symlink() raises an OSError, consider that the | ||
| # entry is not a symbolic link, same behaviour than | ||
| # os.path.islink(). | ||
| is_symlink = False | ||
| walk_into = not is_symlink | ||
|
|
||
| if walk_into: | ||
| walk_dirs.append(entry.path) | ||
| if cont: | ||
| continue | ||
|
|
||
| # Yield before sub-directory traversal if going top down | ||
| if topdown: | ||
| yield top, dirs, nondirs | ||
| # Traverse into sub-directories | ||
| islink, join = path.islink, path.join | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we're caching these functions in locals, might as well do it outside the main while loop to micro-optimize a bit more. |
||
| for dirname in reversed(dirs): | ||
| new_path = join(top, dirname) | ||
| # Issue #23605: os.path.islink() is used instead of caching | ||
|
jonburdo marked this conversation as resolved.
Outdated
|
||
| # entry.is_symlink() result during the loop on os.scandir() because | ||
| # the caller can replace the directory entry during the "yield" | ||
| # above. | ||
| if followlinks or not islink(new_path): | ||
| stack.append((False, new_path)) | ||
| else: | ||
| # Yield after sub-directory traversal if going bottom up | ||
| stack.append((True, (top, dirs, nondirs))) | ||
| # Traverse into sub-directories | ||
| for new_path in reversed(walk_dirs): | ||
| stack.append((False, new_path)) | ||
|
JelleZijlstra marked this conversation as resolved.
|
||
|
|
||
| __all__.append("walk") | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| Fix issue with :func:`os.walk` where a :exc:`RecursionError` would occur on | ||
| deep directory structures by adjusting the implementation of | ||
| :func:`os._walk` to be iterative instead of recursive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we remove the
_walkfunction and simply put everything underwalkbecause the second one was only necessary because of sys.audit not making sense for recursive calls.