Recursive Filesystem Traversal#

Note

Source: Adapted from recursion-book-python-filesystems by George K. Thiruvathukal.

File systems are trees: each directory may contain files and other directories. Recursive algorithms map naturally onto this structure. This section presents a reusable event-driven walker that separates traversal logic from what you do at each node.

The Observer Pattern#

The walker fires eventsenter_dir, visit_file, leave_dir, error — and passes them to a handler object you provide. This is the same idea as a SAX XML parser: the traversal engine and the application logic are completely separate. The base class defines the interface:

class FileSystemEventWalker:
    """Base handler for filesystem tree-walk events.

    Override any methods you need.  Return False from enter_dir to
    skip a subtree without raising an error.
    """
    def enter_dir(self, path: str, depth: int) -> Optional[bool]:
        pass

    def leave_dir(self, path: str, depth: int) -> None:
        pass

    def visit_file(self, path: str, depth: int) -> None:
        pass

    def error(self, path: str, depth: int, exc: Exception) -> None:
        pass

Every method defaults to doing nothing, so a handler only overrides the events it cares about. Returning False from enter_dir prunes that subtree — the walker skips it without raising an error.

DFS: Depth-First Traversal#

The recursive walker visits a directory fully before moving to the next sibling:

def walk_dfs(path: str, handler: FileSystemEventWalker, depth: int = 0) -> None:
    """Depth-first recursive traversal rooted at path."""
    try:
        if not os.path.isdir(path):
            handler.visit_file(path, depth)
            return
        if handler.enter_dir(path, depth) is False:
            return
        for name in sorted(os.listdir(path)):
            child = os.path.join(path, name)
            if os.path.isdir(child) and not os.path.islink(child):
                walk_dfs(child, handler, depth + 1)
            else:
                try:
                    handler.visit_file(child, depth + 1)
                except Exception as exc:
                    handler.error(child, depth + 1, exc)
        handler.leave_dir(path, depth)
    except Exception as exc:
        handler.error(path, depth, exc)

walk_dfs fires enter_dir when it arrives at a directory, then recurses into each subdirectory (skipping symlinks to prevent cycles), fires visit_file for each file, and fires leave_dir before returning. All filesystem exceptions are caught and routed to handler.error so a single unreadable directory does not abort the entire walk.

A Concrete Handler#

This handler prints every entry indented by its depth in the tree:

class PrintHandler(FileSystemEventWalker):
    """Prints every entry indented by its depth in the tree."""
    def enter_dir(self, path: str, depth: int) -> None:
        print("  " * depth + f"[{os.path.basename(path)}/]")

    def visit_file(self, path: str, depth: int) -> None:
        print("  " * depth + os.path.basename(path))

Calling it on a small project directory:

walk_dfs("project", PrintHandler())

Output:

[project/]
  README.md
  [src/]
    main.py
    utils.py
  [tests/]
    test_main.py

BFS: Breadth-First Traversal#

Depth-first recursion finishes one branch before backtracking. Breadth-first traversal visits every entry at depth 1 before anything at depth 2. Because BFS uses a queue rather than the call stack it is naturally iterative:

def walk_bfs(root: str, handler: FileSystemEventWalker) -> None:
    """Breadth-first iterative traversal rooted at root."""
    queue = deque([(root, 0)])
    while queue:
        path, depth = queue.popleft()
        try:
            if not os.path.isdir(path):
                handler.visit_file(path, depth)
                continue
            if handler.enter_dir(path, depth) is False:
                continue
            for name in sorted(os.listdir(path)):
                child = os.path.join(path, name)
                if os.path.isdir(child) and not os.path.islink(child):
                    queue.append((child, depth + 1))
                else:
                    try:
                        handler.visit_file(child, depth + 1)
                    except Exception as exc:
                        handler.error(child, depth + 1, exc)
            handler.leave_dir(path, depth)
        except Exception as exc:
            handler.error(path, depth, exc)

Instead of recursive calls, walk_bfs appends child directories to a deque and processes them in FIFO order. The same PrintHandler works with either traversal — the handler interface does not change.

Choosing DFS or BFS#

DFS (recursive)

BFS (iterative queue)

Order

Finishes a full subtree before moving to the next sibling

All entries at depth N before any entry at depth N+1

Stack

Call stack — depth limited by Python’s recursion limit

Explicit deque — bounded only by available memory

Use when

You need enter/leave pairing (size rollups, XML-style output)

You want level-order output or the tree is thousands of levels deep

For everyday filesystem tools — pretty-printing, searching, size accounting — DFS is simpler and the recursion depth is well within Python’s default limit. BFS is the better choice when trees are pathologically deep or when level-order output is required.