Testing for Correctness#

This is where the chapter pays off. A pure function is the easiest thing in any program to test, because its result depends on nothing but its arguments — no files, no clock, no hidden state, no setup. Give it the same inputs and it gives the same output, every time. A dynamic language like Python gives you no compiler to catch mistakes ahead of time (Paradigms, Languages, and Types), so the job of showing your code is correct falls to tests — and a functional design makes that job easy. We build it up in three layers.

Layer 1: doctest#

You have been reading tests since the start of this book. Every Try it live block is written in doctest format: a >>> line with its expected output underneath. doctest is a standard-library module that finds those examples inside a function’s docstring and checks that the output still matches:

def average(scores):
    """Return the mean of a list of numbers.

    >>> average([92, 85, 79])
    85.33333333333333
    >>> average([])
    0.0
    """
    return sum(scores) / len(scores) if scores else 0.0

Run the file’s doctests from the terminal:

python -m doctest pure_core.py -v

doctests are documentation and tests at once: the docstring shows a reader how to call the function, and doctest guarantees the example has not gone stale. They are perfect for the small, pure functions of a functional core.

Layer 2: Example-Based Tests#

For anything beyond a one-line example, write proper tests with pytest, as in the Testing chapter. An example-based test picks specific inputs and asserts the expected output:

from pure_core import average, letter_grade

def test_average_simple():
    assert average([92, 85, 79]) == (92 + 85 + 79) / 3

def test_letter_grade_boundaries():
    assert letter_grade(90) == "A"
    assert letter_grade(89.9) == "B"
    assert letter_grade(0) == "F"

Picking good examples — ordinary cases, boundaries, and the empty case — is a skill the Writing Effective Tests section covers in depth. But notice the limitation: you only ever test the inputs you thought to write down. The bug hiding at the input you didn’t think of slips through.

Layer 3: Property-Based Testing#

Property-based testing attacks that limitation. Instead of naming specific inputs, you state a property that must hold for every input, and a library generates hundreds of cases — including awkward ones you would never pick by hand — trying to find a counterexample. The library here is hypothesis (pip install hypothesis).

The @given decorator says what kind of inputs to generate. A few property shapes cover most situations. An invariant is something that is always true of the result — the mean of a list always lies between its smallest and largest element:

@given(st.lists(st.integers(min_value=-10 ** 6, max_value=10 ** 6), min_size=1))
def test_average_within_bounds(scores):
    """The mean of a non-empty list lies between its smallest and largest."""
    assert min(scores) <= average(scores) <= max(scores)

A round-trip property checks that one operation undoes another — reversing a list twice gives back the original:

@given(st.lists(st.integers()))
def test_reverse_roundtrip(xs):
    """Reversing a list twice gives back the original."""
    assert list(reversed(list(reversed(xs)))) == xs

An oracle property checks your function against a simpler, trusted one. Newton’s method on x² − c should agree with math.sqrt:

@given(st.floats(min_value=1.0, max_value=1e6))
def test_newton_matches_sqrt(c):
    """Newton's method on x^2 - c finds the square root math.sqrt gives."""
    root = newton(c, lambda x: x * x - c, lambda x: 2 * x)
    assert math.isclose(root, math.sqrt(c), rel_tol=1e-6)

Run them with pytest like any other tests:

python -m pytest test_properties.py

The binary search from the Recursion chapter is a natural fit too — for any sorted list and any key, the result is either -1 or a valid index holding that key:

@given(st.lists(st.integers()), st.integers())
def test_binary_search_property(arr, key):
    arr.sort()
    result = binary_search(arr, key)
    if key not in arr:
        assert result == -1
    else:
        assert 0 <= result < len(arr)
        assert arr[result] == key

Property tests are good at finding edge cases you would never write down by hand. Stated as the invariant above but over floating-point numbers, the average property actually fails: hypothesis discovers that averaging several identical large floats can land a whisker outside [min, max], because floating-point division rounds. That is not a flaw in your reasoning — it is floating-point arithmetic showing its seams, surfaced by a test you barely had to write. This is the reward for a functional design: when your logic lives in pure functions, you can hand it to a tool that tries thousands of inputs and tells you the truth about your code.