Dictionary Efficiency#
Note
Source: Adapted from the C# edition (dictionaries/dictionaryefficiency.rst).
The hash-table explanation is language-agnostic; the Python context
replaces C#’s List.IndexOf with Python’s list.index().
Why is dictionary lookup so fast? This section explains the idea behind hash tables — the data structure that makes O(1) average lookup possible.
The Naive Approach: Parallel Lists#
You could simulate a dictionary with two parallel lists — one for keys, one for values — and search for a key with a linear scan:
keys = ["one", "two", "three"]
values = ["uno", "dos", "tres"]
def slow_lookup(keys, values, key):
i = keys.index(key) # linear search: O(N)
return values[i]
list.index() scans from left to right, so on average it examines
N/2 elements — linear order O(N). For a dictionary with one million
entries, this means up to a million comparisons per lookup.
Hash Tables#
A hash table uses a hash function to convert a key directly into an array index. The idea:
Compute
h = hash(key)— a large integer derived from the key’s content.Use
h % table_sizeto pick a slot in an internal array.Store the value in that slot.
For lookup, repeat steps 1–2 to find the same slot without searching. This makes both insert and lookup O(1) on average, regardless of how many entries the dictionary holds.
print(hash("one")) # some large integer, e.g. 3713082716806266542
print(hash("two")) # a different integer
print(hash(42)) # integers hash to themselves (usually)
The Immutability Requirement#
For a hash table to work, the hash of a key must never change while it is in the dictionary. If it did, the key could no longer be found. This is why dictionary keys must be immutable: strings, numbers, and tuples are allowed; lists and other mutable objects are not.
d = {}
d[(1, 2)] = "point" # tuple key: OK
# d[[1, 2]] = "point" # list key: TypeError
Performance Summary#
Operation |
Average time |
|---|---|
|
O(1) |
|
O(1) |
|
O(1) |
|
O(N) |
|
O(N) |
The constant-time guarantee makes dictionaries the right tool whenever you need to look up data by a meaningful key rather than by position.