Testing for Correctness#
This is where the chapter pays off. A pure function is the easiest thing in any program to test, because its result depends on nothing but its arguments — no files, no clock, no hidden state, no setup. Give it the same inputs and it gives the same output, every time. A dynamic language like Python gives you no compiler to catch mistakes ahead of time (Paradigms, Languages, and Types), so the job of showing your code is correct falls to tests — and a functional design makes that job easy. We build it up in three layers.
Layer 1: doctest#
You have been reading tests since the start of this book. Every Try it
live block is written in doctest format: a >>> line with its
expected output underneath. doctest is a standard-library module that finds
those examples inside a function’s docstring and checks that the output
still matches:
def average(scores):
"""Return the mean of a list of numbers.
>>> average([92, 85, 79])
85.33333333333333
>>> average([])
0.0
"""
return sum(scores) / len(scores) if scores else 0.0
Run the file’s doctests from the terminal:
python -m doctest pure_core.py -v
doctests are documentation and tests at once: the docstring shows a reader
how to call the function, and doctest guarantees the example has not
gone stale. They are perfect for the small, pure functions of a functional
core.
Layer 2: Example-Based Tests#
For anything beyond a one-line example, write proper tests with pytest,
as in the Testing chapter. An example-based test
picks specific inputs and asserts the expected output:
from pure_core import average, letter_grade
def test_average_simple():
assert average([92, 85, 79]) == (92 + 85 + 79) / 3
def test_letter_grade_boundaries():
assert letter_grade(90) == "A"
assert letter_grade(89.9) == "B"
assert letter_grade(0) == "F"
Picking good examples — ordinary cases, boundaries, and the empty case — is a skill the Writing Effective Tests section covers in depth. But notice the limitation: you only ever test the inputs you thought to write down. The bug hiding at the input you didn’t think of slips through.
Layer 3: Property-Based Testing#
Property-based testing attacks that limitation. Instead of naming
specific inputs, you state a property that must hold for every input,
and a library generates hundreds of cases — including awkward ones you would
never pick by hand — trying to find a counterexample. The library here is
hypothesis (pip install hypothesis).
The @given decorator says what kind of inputs to generate. A few
property shapes cover most situations. An invariant is something that is
always true of the result — the mean of a list always lies between its
smallest and largest element:
@given(st.lists(st.integers(min_value=-10 ** 6, max_value=10 ** 6), min_size=1))
def test_average_within_bounds(scores):
"""The mean of a non-empty list lies between its smallest and largest."""
assert min(scores) <= average(scores) <= max(scores)
A round-trip property checks that one operation undoes another — reversing a list twice gives back the original:
@given(st.lists(st.integers()))
def test_reverse_roundtrip(xs):
"""Reversing a list twice gives back the original."""
assert list(reversed(list(reversed(xs)))) == xs
An oracle property checks your function against a simpler, trusted one.
Newton’s method on x² − c should agree with math.sqrt:
@given(st.floats(min_value=1.0, max_value=1e6))
def test_newton_matches_sqrt(c):
"""Newton's method on x^2 - c finds the square root math.sqrt gives."""
root = newton(c, lambda x: x * x - c, lambda x: 2 * x)
assert math.isclose(root, math.sqrt(c), rel_tol=1e-6)
Run them with pytest like any other tests:
python -m pytest test_properties.py
The binary search from the Recursion chapter is
a natural fit too — for any sorted list and any key, the result is
either -1 or a valid index holding that key:
@given(st.lists(st.integers()), st.integers())
def test_binary_search_property(arr, key):
arr.sort()
result = binary_search(arr, key)
if key not in arr:
assert result == -1
else:
assert 0 <= result < len(arr)
assert arr[result] == key
Property tests are good at finding edge cases you would never write down by
hand. Stated as the invariant above but over floating-point numbers, the
average property actually fails: hypothesis discovers that averaging several
identical large floats can land a whisker outside [min, max], because
floating-point division rounds. That is not a flaw in your reasoning — it is
floating-point arithmetic showing its seams, surfaced by a test you barely
had to write. This is the reward for a functional design: when your logic
lives in pure functions, you can hand it to a tool that tries thousands of
inputs and tells you the truth about your code.