Recursive Filesystem Traversal#
Note
Source: Adapted from recursion-book-python-filesystems by George K. Thiruvathukal.
File systems are trees: each directory may contain files and other directories. Recursive algorithms map naturally onto this structure. This section presents a reusable event-driven walker that separates traversal logic from what you do at each node.
The Observer Pattern#
The walker fires events — enter_dir, visit_file, leave_dir,
error — and passes them to a handler object you provide. This is
the same idea as a SAX XML parser: the traversal engine and the
application logic are completely separate. The base class defines the
interface:
class FileSystemEventWalker:
"""Base handler for filesystem tree-walk events.
Override any methods you need. Return False from enter_dir to
skip a subtree without raising an error.
"""
def enter_dir(self, path: str, depth: int) -> Optional[bool]:
pass
def leave_dir(self, path: str, depth: int) -> None:
pass
def visit_file(self, path: str, depth: int) -> None:
pass
def error(self, path: str, depth: int, exc: Exception) -> None:
pass
Every method defaults to doing nothing, so a handler only overrides the
events it cares about. Returning False from enter_dir prunes
that subtree — the walker skips it without raising an error.
DFS: Depth-First Traversal#
The recursive walker visits a directory fully before moving to the next sibling:
def walk_dfs(path: str, handler: FileSystemEventWalker, depth: int = 0) -> None:
"""Depth-first recursive traversal rooted at path."""
try:
if not os.path.isdir(path):
handler.visit_file(path, depth)
return
if handler.enter_dir(path, depth) is False:
return
for name in sorted(os.listdir(path)):
child = os.path.join(path, name)
if os.path.isdir(child) and not os.path.islink(child):
walk_dfs(child, handler, depth + 1)
else:
try:
handler.visit_file(child, depth + 1)
except Exception as exc:
handler.error(child, depth + 1, exc)
handler.leave_dir(path, depth)
except Exception as exc:
handler.error(path, depth, exc)
walk_dfs fires enter_dir when it arrives at a directory, then
recurses into each subdirectory (skipping symlinks to prevent cycles),
fires visit_file for each file, and fires leave_dir before
returning. All filesystem exceptions are caught and routed to
handler.error so a single unreadable directory does not abort the
entire walk.
A Concrete Handler#
This handler prints every entry indented by its depth in the tree:
class PrintHandler(FileSystemEventWalker):
"""Prints every entry indented by its depth in the tree."""
def enter_dir(self, path: str, depth: int) -> None:
print(" " * depth + f"[{os.path.basename(path)}/]")
def visit_file(self, path: str, depth: int) -> None:
print(" " * depth + os.path.basename(path))
Calling it on a small project directory:
walk_dfs("project", PrintHandler())
Output:
[project/]
README.md
[src/]
main.py
utils.py
[tests/]
test_main.py
BFS: Breadth-First Traversal#
Depth-first recursion finishes one branch before backtracking. Breadth-first traversal visits every entry at depth 1 before anything at depth 2. Because BFS uses a queue rather than the call stack it is naturally iterative:
def walk_bfs(root: str, handler: FileSystemEventWalker) -> None:
"""Breadth-first iterative traversal rooted at root."""
queue = deque([(root, 0)])
while queue:
path, depth = queue.popleft()
try:
if not os.path.isdir(path):
handler.visit_file(path, depth)
continue
if handler.enter_dir(path, depth) is False:
continue
for name in sorted(os.listdir(path)):
child = os.path.join(path, name)
if os.path.isdir(child) and not os.path.islink(child):
queue.append((child, depth + 1))
else:
try:
handler.visit_file(child, depth + 1)
except Exception as exc:
handler.error(child, depth + 1, exc)
handler.leave_dir(path, depth)
except Exception as exc:
handler.error(path, depth, exc)
Instead of recursive calls, walk_bfs appends child directories to a
deque and processes them in FIFO order. The same PrintHandler
works with either traversal — the handler interface does not change.
Choosing DFS or BFS#
DFS (recursive) |
BFS (iterative queue) |
|
|---|---|---|
Order |
Finishes a full subtree before moving to the next sibling |
All entries at depth N before any entry at depth N+1 |
Stack |
Call stack — depth limited by Python’s recursion limit |
Explicit |
Use when |
You need |
You want level-order output or the tree is thousands of levels deep |
For everyday filesystem tools — pretty-printing, searching, size accounting — DFS is simpler and the recursion depth is well within Python’s default limit. BFS is the better choice when trees are pathologically deep or when level-order output is required.