Python Dataclasses and Named Tuples#
Writing a data-holding class by hand requires repeating each field name
three or four times — in __init__ parameters, in self.x = x
assignments, and in __repr__. The @dataclass decorator generates
all of this automatically from a simple field declaration.
Basic Dataclass#
from dataclasses import dataclass

@dataclass
class Contact:
    name: str
    phone: str
    email: str
This single declaration is roughly equivalent to:
class Contact:
    def __init__(self, name: str, phone: str, email: str):
        self.name = name
        self.phone = phone
        self.email = email

    def __repr__(self):
        return f"Contact(name={self.name!r}, phone={self.phone!r}, email={self.email!r})"

    def __eq__(self, other):
        # Only compare against other Contact instances, as the generated method does.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.name, self.phone, self.email) == (other.name, other.phone, other.email)
Here we create a Contact instance and verify that display and equality
work without any extra code — run it to see the generated __repr__ and
__eq__ in action:
>>> from dataclasses import dataclass
>>> @dataclass
... class Contact:
...     name: str
...     phone: str
...     email: str
...
>>> c = Contact("Marie Ortiz", "773-508-7890", "mortiz2@luc.edu")
>>> print(c)
Contact(name='Marie Ortiz', phone='773-508-7890', email='mortiz2@luc.edu')
>>> print(c == Contact("Marie Ortiz", "773-508-7890", "mortiz2@luc.edu"))
True
Default Values#
Fields can have default values, just like function parameters. Run this and try omitting or supplying the coordinates:
>>> from dataclasses import dataclass
>>> @dataclass
... class Point:
...     x: float = 0.0
...     y: float = 0.0
...
>>> origin = Point()
>>> p = Point(3.0, 4.0)
>>> print(origin, p)
Point(x=0.0, y=0.0) Point(x=3.0, y=4.0)
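One caveat is worth a short sketch: a mutable default such as a list cannot be written as a bare `[]`, because @dataclass rejects it with a ValueError. The standard alternative is `field(default_factory=list)`. The `Inventory` class and its field names below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Inventory:  # hypothetical example class, not part of the text above
    owner: str = "nobody"
    # default_factory is called once per instance, so lists are never shared
    items: list = field(default_factory=list)

a = Inventory()
b = Inventory("Marie")
a.items.append("lamp")

print(a.items)  # ['lamp']
print(b.items)  # []  (b has its own, separate list)
```

Because the factory runs for every new instance, appending to `a.items` leaves `b.items` untouched.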
Adding Methods#
You can still add your own methods — @dataclass only generates the
boilerplate:
import math
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

    def distance_to(self, other):
        return math.sqrt((self.x - other.x)**2 + (self.y - other.y)**2)

    def __str__(self):
        return f"({self.x}, {self.y})"
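As a quick usage sketch, the class above can be exercised like this. The definition is repeated so the snippet runs on its own; the variable names `origin` and `p` are just for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

    def distance_to(self, other):
        return math.sqrt((self.x - other.x)**2 + (self.y - other.y)**2)

    def __str__(self):
        return f"({self.x}, {self.y})"

origin = Point(0.0, 0.0)
p = Point(3.0, 4.0)

print(origin.distance_to(p))  # 5.0
print(p)                      # (3.0, 4.0), because print() uses the hand-written __str__
print(repr(p))                # the generated __repr__ is still available
```

Defining `__str__` yourself does not switch off the generated `__repr__`; the two coexist.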
Immutable Dataclasses#
Pass frozen=True to make instances immutable (and hashable, so they
can be used as dictionary keys):
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
# p.x = 5.0 # raises FrozenInstanceError
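A small sketch of both claims, assuming nothing beyond the standard library: the assignment is caught rather than left commented out, and a frozen `Point` is then used as a dictionary key. The `labels` dictionary is a made-up example.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)

try:
    p.x = 5.0
except FrozenInstanceError as exc:
    print("assignment rejected:", exc)

# Equal field values give equal hashes, so a fresh Point finds the same entry.
labels = {p: "start"}
print(labels[Point(1.0, 2.0)])  # start
```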
Named Tuples#
A named tuple is a tuple subclass whose fields have names as well as
positions. It comes from the collections module and requires no
class body at all — you describe the fields in a single line:
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
Point is now a full type. You create instances the same way you
would call a constructor. Run this to see named-field and positional access
side by side:
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(3.0, 4.0)
>>> print(p)
Point(x=3.0, y=4.0)
>>> print(p.x, p.y)
3.0 4.0
>>> print(p[0], p[1]) # positional access still works
3.0 4.0
Because a named tuple is a tuple, unpacking, indexing, and iteration all work exactly as with a plain tuple.
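The tuple behaviours mentioned above can be sketched in a few lines; nothing here is specific to `Point` beyond its two field names.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(3.0, 4.0)

x, y = p                      # unpacking
print(x, y)                   # 3.0 4.0
print(len(p), p[-1])          # 2 4.0
print([v * 2 for v in p])     # [6.0, 8.0]
print(p == (3.0, 4.0))        # True: compares equal to a plain tuple
print(isinstance(p, tuple))   # True
```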
Immutability and Hashing#
Named tuples are immutable — you cannot reassign a field after creation:
p = Point(1.0, 2.0)
# p.x = 5.0 # raises AttributeError
Because they are immutable, they are also hashable (provided every field value is itself hashable), so they can be used as dictionary keys, just like plain tuples. Run this to look one up:
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> labels = {Point(0, 0): "origin", Point(1, 0): "east"}
>>> print(labels[Point(0, 0)])
origin
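To see the limits of both properties, here is a hedged sketch: assigning to a field raises AttributeError, and hashing fails with TypeError when a field holds an unhashable value such as a list. The `Tagged` type is hypothetical and exists only to show the second case.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1.0, 2.0)

try:
    p.x = 5.0
    assignment_failed = False
except AttributeError as exc:
    assignment_failed = True
    print("assignment rejected:", exc)

# Hashing a named tuple hashes its fields, so a list field breaks it.
Tagged = namedtuple('Tagged', ['name', 'tags'])  # hypothetical type
t = Tagged("home", ["a", "b"])

try:
    hash(t)
    hash_failed = False
except TypeError as exc:
    hash_failed = True
    print("not hashable:", exc)
```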
Creating Modified Copies with _replace#
Since you cannot mutate a named tuple in place, the _replace method
returns a new instance with chosen fields changed. Run this and note that
p is unchanged:
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(3.0, 4.0)
>>> q = p._replace(y=0.0)
>>> print(p, q)
Point(x=3.0, y=4.0) Point(x=3.0, y=0.0)
Reworking the Contact Example#
The Contact class from the Basic Dataclass section becomes a one-liner. Run it:
>>> from collections import namedtuple
>>> Contact = namedtuple('Contact', ['name', 'phone', 'email'])
>>> c = Contact("Marie Ortiz", "773-508-7890", "mortiz2@luc.edu")
>>> print(c)
Contact(name='Marie Ortiz', phone='773-508-7890', email='mortiz2@luc.edu')
>>> print(c.name)
Marie Ortiz
This is ideal when a contact record is read-only data being passed
around the program. If you need to update a contact’s email, use
_replace; if you need validation logic in __init__, use a full
class or @dataclass.
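Both options can be sketched briefly. The first part updates an email with _replace; the second shows one way a dataclass might validate its fields, using the `__post_init__` hook that @dataclass calls at the end of the generated `__init__`. The `CheckedContact` name, the replacement address, and the deliberately crude `"@"` check are all assumptions made for illustration.

```python
from collections import namedtuple
from dataclasses import dataclass

Contact = namedtuple('Contact', ['name', 'phone', 'email'])
c = Contact("Marie Ortiz", "773-508-7890", "mortiz2@luc.edu")

updated = c._replace(email="marie@example.com")  # placeholder address
print(updated.email)  # marie@example.com
print(c.email)        # mortiz2@luc.edu (the original is unchanged)

@dataclass
class CheckedContact:  # hypothetical name
    name: str
    phone: str
    email: str

    def __post_init__(self):
        # Crude check, for illustration only; real validation would be stricter.
        if "@" not in self.email:
            raise ValueError(f"not an email address: {self.email!r}")

ok = CheckedContact("Marie Ortiz", "773-508-7890", "mortiz2@luc.edu")
print(ok.email)
```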
Reworking the Point Example#
Point as a named tuple gives you named-field access, a readable
repr, equality comparison, and hashability — all for free. Run it:
>>> import math
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> def distance(p1: Point, p2: Point) -> float:
...     return math.sqrt((p1.x - p2.x)**2 + (p1.y - p2.y)**2)
...
>>> origin = Point(0, 0)
>>> p = Point(3, 4)
>>> print(distance(origin, p))
5.0
>>> print(origin == Point(0, 0)) # value equality, like tuples
True
The trade-off: the namedtuple factory call has no class body, so there
is nowhere to define methods the way you can in a class statement;
adding them means subclassing the generated type. The distance function
above is therefore a standalone function rather than a method. That is
a reasonable choice for simple geometric helpers, but awkward for
richer objects.
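If a method is wanted on a named tuple after all, the class-based `typing.NamedTuple` syntax from the standard library is one way to get it. This spelling is not used elsewhere in this text; the sketch simply turns the distance function above into a method named `distance_to`.

```python
import math
from typing import NamedTuple

class Point(NamedTuple):
    x: float
    y: float

    def distance_to(self, other: "Point") -> float:
        return math.sqrt((self.x - other.x)**2 + (self.y - other.y)**2)

origin = Point(0, 0)
p = Point(3, 4)

print(origin.distance_to(p))  # 5.0
print(p == (3, 4))            # True: still an ordinary tuple underneath
```

The result is still immutable and hashable, so it behaves like the factory-made version with a method attached.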
Choosing the Right Tool#
Python offers several ways to bundle related data, each with a different trade-off between simplicity, mutability, and power:
Plain tuple — use when the structure is anonymous and throwaway,
or when the positional meaning is universally obvious (e.g., (x, y)
in a tiny helper). No named access, and the repr shows bare values with no field names.
Named tuple — use for lightweight, immutable records where named field access matters and no behaviour is needed. Zero boilerplate, a clean repr, hashable by default. Good for coordinates, colours, simple value objects.
Color = namedtuple('Color', ['r', 'g', 'b'])
red = Color(255, 0, 0)
Dataclass — use when fields need defaults, mutability, or a small
amount of domain logic (methods). frozen=True gives you the same
immutability and hashability as a named tuple but with richer
annotation support.
@dataclass
class Point:
    x: float
    y: float

    def distance_to(self, other):
        return math.sqrt((self.x - other.x)**2 + (self.y - other.y)**2)
Full class — use when initialisation requires validation, when attributes must be kept private, when inheritance is involved, or when the object has complex mutable state that changes over its lifetime.
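For the full-class case, a minimal sketch might look like the following. The `Account` class, its fields, and its rules are hypothetical; they only illustrate validation in `__init__`, an attribute kept private by convention, and state that changes over the object's lifetime.

```python
class Account:  # hypothetical example, not from the text above
    def __init__(self, owner: str, balance: float = 0.0):
        if balance < 0:
            raise ValueError("opening balance cannot be negative")
        self.owner = owner
        self._balance = balance  # leading underscore: private by convention

    @property
    def balance(self) -> float:
        # Read-only view; callers change the balance only through deposit().
        return self._balance

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def __repr__(self):
        return f"Account(owner={self.owner!r}, balance={self._balance!r})"

acct = Account("Marie Ortiz", 100.0)
acct.deposit(25.0)
print(acct)  # Account(owner='Marie Ortiz', balance=125.0)
```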
| Feature | tuple | namedtuple | @dataclass | @dataclass(frozen=True) | class |
|---|---|---|---|---|---|
| Named fields | No | Yes | Yes | Yes | Yes |
| Mutable | No | No | Yes | No | Yes |
| Hashable | Yes | Yes | No | Yes | Yes * |
| Methods | No | Limited ** | Yes | Yes | Yes |
| Boilerplate | None | One line | Decorator | Decorator | Full body |

* By identity, by default. Defining __eq__ without also defining __hash__ makes instances unhashable.
** Methods can be added via subclassing, but it is uncommon.
When in doubt: start with a named tuple. If you find yourself
wanting defaults, mutation, or methods, switch to @dataclass. If
you need private state or validation logic, write a full class.
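The Hashable row of the table can be checked directly for the first four columns. This is a small sketch; the type names `PointNT`, `PointDC`, and `PointFrozen` and the helper `is_hashable` are made up for the comparison.

```python
from collections import namedtuple
from dataclasses import dataclass

PointNT = namedtuple('PointNT', ['x', 'y'])

@dataclass
class PointDC:
    x: float
    y: float

@dataclass(frozen=True)
class PointFrozen:
    x: float
    y: float

def is_hashable(obj) -> bool:
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

for value in [(1, 2), PointNT(1, 2), PointDC(1, 2), PointFrozen(1, 2)]:
    print(type(value).__name__, is_hashable(value))
# tuple True
# PointNT True
# PointDC False
# PointFrozen True
```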