Python collections Module: A Practical Guide

January 3, 2024 · Updated June 10, 2026

Python's collections module is a small standard-library toolbox of specialised container datatypes that go beyond the built-in list, dict, set, and tuple. Reach for it when a plain container forces you to write awkward boilerplate: counting items, providing default values for missing keys, building a fast double-ended queue, giving tuples readable field names, or layering several dictionaries into one view. In CPython the performance-critical types (deque, defaultdict, OrderedDict, and Counter's counting helper) are implemented in C, while namedtuple, ChainMap, and the User* wrappers are thin Python layers, so the module is fast where it matters and thoroughly battle-tested.

This guide is a practical, up-to-date (Python 3.12+) tour of every container the module ships, with a clear "what it is / when to use it / how" for each. The containers covered:

  • namedtuple — tuples with named fields, for lightweight readable records
  • deque — a thread-safe double-ended queue with O(1) appends/pops at both ends
  • Counter — a dict subclass for counting hashable items
  • defaultdict — a dict that auto-creates missing values
  • OrderedDict — an ordered dict (now niche; see the honest 2026 note below)
  • ChainMap — a single updatable view over multiple mappings
  • UserDict / UserList / UserString — easy-to-subclass wrappers around the built-ins

If you want a deeper, example-heavy walkthrough of Counter, OrderedDict, and defaultdict, see Working with Python Collections — Part 2. The teams at MicroPyramid have leaned on these containers across 50+ Python projects over 12+ years, so the notes below reflect what actually holds up in production code, not just textbook usage.

The collections module at a glance

| Type | Use it when you need… | Key methods | Improves on |
|---|---|---|---|
| namedtuple | A small immutable record with named, self-documenting fields | _make, _replace, _asdict, _fields | tuple (no magic indexes) |
| deque | Fast O(1) appends/pops at both ends; a queue or bounded buffer | append, appendleft, pop, popleft, rotate, extendleft | list (O(n) pop(0)/insert(0)) |
| Counter | To tally how often each hashable item appears | most_common, elements, total, + - & \| | dict (hand-written counting loops) |
| defaultdict | A missing key to auto-create a default value | default_factory, all dict methods | dict.setdefault boilerplate |
| OrderedDict | Order-sensitive equality or move_to_end / LRU behaviour | move_to_end, popitem(last=) | dict (in niche cases only) |
| ChainMap | One updatable view over several dicts (e.g. config layers) | new_child, parents, maps | merging dicts manually |
| UserDict/UserList/UserString | To subclass a built-in container reliably | wraps .data | subclassing dict/list directly |

namedtuple — readable, immutable records

A namedtuple is a tuple whose positions also have names. You get all the speed and immutability of a tuple, plus the readability of attribute access (point.x instead of point[0]). It is the lightest possible way to model a small fixed record.

When to use it: for small, immutable value objects — a coordinate, an RGB colour, a row from a query — where you want clear field names but don't need methods or mutability. Because it is still a real tuple, it unpacks, indexes, and compares exactly like one.

from collections import namedtuple

Company = namedtuple("Company", ["name", "location", "website"])

mp = Company(name="MicroPyramid", location="Hyderabad", website="micropyramid.com")
# positional construction works too:
mp_alt = Company("MicroPyramid", "Hyderabad", "micropyramid.com")

print(mp.name)        # MicroPyramid  -> access by field name
print(mp.location)    # Hyderabad
print(mp[0])          # MicroPyramid  -> still a real tuple
name, location, site = mp   # ...and it unpacks like one

Useful helpers come built in. Build an instance from any iterable with _make, copy-with-changes via _replace (it returns a new tuple — the original is immutable), inspect the schema with _fields, and convert to a plain dict with _asdict:

google = Company._make(["Google", "Hyderabad", "google.com"])
print(google)
# Company(name='Google', location='Hyderabad', website='google.com')

# _replace returns a NEW object; namedtuples are immutable
bengaluru = google._replace(location="Bengaluru")
print(bengaluru)
# Company(name='Google', location='Bengaluru', website='google.com')

print(Company._fields)   # ('name', 'location', 'website')

# In Python 3.8+ _asdict() returns a regular (insertion-ordered) dict
print(mp._asdict())
# {'name': 'MicroPyramid', 'location': 'Hyderabad', 'website': 'micropyramid.com'}

2026 note: namedtuple._asdict() has returned a plain dict since Python 3.8; before that it returned an OrderedDict. Regular dicts have preserved insertion order since 3.7, so the field order is still kept.

Modern alternatives: typing.NamedTuple and @dataclass

If you want type hints, defaults, or methods, two cleaner alternatives have largely replaced raw namedtuple in new code:

  • typing.NamedTuple — same immutable, tuple-based object, but defined with a class body, type annotations, and field defaults. Use it when you specifically want tuple behaviour (immutable, hashable, unpacks, compares positionally).
  • @dataclass — a mutable (by default) class with fields, type hints, and auto-generated __init__/__repr__/__eq__. Use @dataclass(frozen=True, slots=True) when you want an immutable record with low memory overhead but still want methods and flexibility.

Rule of thumb: pick @dataclass for most new record types; pick typing.NamedTuple when you genuinely need tuple semantics; reach for raw namedtuple mainly in quick scripts or when matching existing tuple-based code.

from typing import NamedTuple
from dataclasses import dataclass

class Point(NamedTuple):       # immutable, tuple-like, typed
    x: int
    y: int
    label: str = "origin"      # field defaults are easy here

p = Point(1, 2)
print(p.x, p.label)            # 1 origin
x, y, label = p                # still unpacks like a tuple

@dataclass(frozen=True, slots=True)   # immutable + memory-efficient
class Account:
    id: int
    balance: float = 0.0
    def credit(self, amount: float) -> "Account":
        return Account(self.id, self.balance + amount)

acct = Account(42)
print(acct.credit(100).balance)   # 100.0

deque — a fast double-ended queue

A deque ("deck") is a list-like sequence optimised for adding and removing items at both ends in O(1) time. A plain list is amortised O(1) at the right end but O(n) at the left (every list.pop(0) or list.insert(0, ...) shifts every other element). deque fixes that, and its appends and pops are thread-safe at either end.

When to use it: queues (FIFO) and stacks, breadth-first search frontiers, sliding-window buffers, "keep the last N items" logs, and any producer/consumer pattern. Pass maxlen to get a bounded buffer that automatically discards from the opposite end when full.

from collections import deque

dq = deque(["b", "c"])
dq.append("d")          # add on the right
dq.appendleft("a")      # add on the left  -> O(1), unlike list.insert(0, ...)
print(dq)               # deque(['a', 'b', 'c', 'd'])

print(dq.popleft())     # 'a'   -> O(1) FIFO dequeue
print(dq.pop())         # 'd'   -> O(1) stack pop
print(dq)               # deque(['b', 'c'])

dq.extend(["d", "e"])         # extend on the right
dq.extendleft([1, 2, 3])      # NOTE: each item is prepended, so order reverses
print(dq)                     # deque([3, 2, 1, 'b', 'c', 'd', 'e'])

dq.rotate(1)            # rotate right by 1 (use -1 for left)
print(dq)               # deque(['e', 3, 2, 1, 'b', 'c', 'd'])

Pass maxlen to build a bounded buffer. Once it is full, every push from one end silently drops an item from the other — perfect for "last N events" or fixed-size sliding windows:

# Keep only the most recent 3 readings
recent = deque(maxlen=3)
for value in [10, 20, 30, 40, 50]:
    recent.append(value)
print(recent)           # deque([30, 40, 50], maxlen=3)

# Classic BFS frontier using a deque as a FIFO queue
def bfs(graph, start):
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs(graph, "a"))  # ['a', 'b', 'c', 'd']

Use a deque for queue/stack semantics and fast end operations — but not for random access by index. Indexing into the middle of a deque is O(n), so if you mostly read by position, stick with a list.
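
The left-end cost difference described at the top of this section can be checked with a rough timing sketch. The helper names drain_list and drain_deque and the size N are arbitrary choices for this illustration, and the absolute numbers will vary by machine and Python version.

```python
from collections import deque
from timeit import timeit

def drain_list(n):
    """Dequeue n items from the front of a list; each pop(0) shifts the rest."""
    items = list(range(n))
    out = []
    while items:
        out.append(items.pop(0))
    return out

def drain_deque(n):
    """Dequeue n items from the front of a deque; each popleft() is O(1)."""
    items = deque(range(n))
    out = []
    while items:
        out.append(items.popleft())
    return out

N = 20_000  # arbitrary size chosen for illustration

# Same result either way; only the cost differs.
assert drain_list(N) == drain_deque(N)

list_time = timeit(lambda: drain_list(N), number=3)
deque_time = timeit(lambda: drain_deque(N), number=3)
print(f"list.pop(0):     {list_time:.4f}s")
print(f"deque.popleft(): {deque_time:.4f}s")
```

The gap widens as N grows, because the list version does work proportional to the remaining length on every pop.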

Counter — tally hashable items

Counter is a dict subclass built for counting. Feed it any iterable and it returns a mapping of element to count. Missing keys return 0 instead of raising KeyError, and it supports arithmetic between counters (+, -, &, |).

When to use it: word/character frequencies, tallying log levels or event types, finding the most common items, or any "how many of each?" question.

from collections import Counter

words = "red green red blue green red".split()
counts = Counter(words)
print(counts)                 # Counter({'red': 3, 'green': 2, 'blue': 1})

print(counts["red"])          # 3
print(counts["yellow"])       # 0   -> missing keys don't raise KeyError
print(counts.most_common(2))  # [('red', 3), ('green', 2)]
print(counts.total())         # 6   -> total() added in Python 3.10

# Counters support arithmetic
inventory = Counter(apple=3, pear=1)
sold = Counter(apple=1)
print(inventory - sold)       # Counter({'apple': 2, 'pear': 1})
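
The example above shows only subtraction. As a short sketch of the remaining operators, plus elements() and the in-place subtract() method, with made-up fruit counts:

```python
from collections import Counter

a = Counter(apple=3, pear=1)
b = Counter(apple=1, pear=2, fig=4)

added = a + b          # add counts key by key
subtracted = a - b     # subtract, then drop zero and negative results
intersection = a & b   # minimum of each count
union = a | b          # maximum of each count

print(dict(added))         # {'apple': 4, 'pear': 3, 'fig': 4}
print(dict(subtracted))    # {'apple': 2}
print(dict(intersection))  # {'apple': 1, 'pear': 1}
print(dict(union))         # {'apple': 3, 'pear': 2, 'fig': 4}

# elements() expands counts back into repeated items.
print(sorted(a.elements()))   # ['apple', 'apple', 'apple', 'pear']

# subtract() works in place and, unlike "-", keeps zero and negative counts.
stock = Counter(apple=3, pear=1)
stock.subtract(b)
print(dict(stock))         # {'apple': 2, 'pear': -1, 'fig': -4}
```

The operators suit "how many are left" questions where negative counts are meaningless; subtract() fits cases where a shortfall should stay visible.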

Part 2 of this series digs deeper into Counter — see Working with Python Collections — Part 2 for frequency-analysis examples.

defaultdict — dictionaries with automatic defaults

defaultdict is a dict that calls a factory function to supply a value the first time you access a missing key, instead of raising KeyError. You pass the factory (list, int, set, or any zero-argument callable) to the constructor.

When to use it: grouping items into lists, accumulating sums/counts, or building adjacency maps — anywhere you'd otherwise write repetitive if key not in d or dict.setdefault calls.

from collections import defaultdict

# Group words by their first letter
words = ["apple", "avocado", "banana", "cherry", "cranberry"]
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)   # no KeyError on first access
print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana'], 'c': ['cherry', 'cranberry']}

# Accumulate counts with int (defaults to 0)
tally = defaultdict(int)
for ch in "mississippi":
    tally[ch] += 1
print(dict(tally))   # {'m': 1, 'i': 4, 's': 4, 'p': 2}

defaultdict vs dict.setdefault

Both avoid KeyError, but they differ in cost and intent. dict.setdefault(key, default) evaluates its default argument on every call (so setdefault(k, []) builds a throwaway list each time even when the key exists), whereas defaultdict's factory only fires for genuinely missing keys.

  • Use defaultdict when one default applies to the whole dictionary and you access missing keys repeatedly in a loop — it's cleaner and faster.
  • Use dict.setdefault for a one-off insert on an ordinary dict, or when different keys need different defaults.
  • Watch out: simply reading a missing key with d[key] creates it in a defaultdict. If you don't want that, use d.get(key, default), test with "key in d", or use a plain dict.
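
The points in the list above can be seen in a few lines. This is a minimal sketch with made-up keys; the final step, setting default_factory to None, is one optional way to stop further auto-creation once a dict has been built.

```python
from collections import defaultdict

groups = defaultdict(list)
groups["a"].append("apple")

# Reading with [] calls the factory and INSERTS the key as a side effect.
_ = groups["missing"]
print("missing" in groups)     # True

# .get() and "in" never call the factory, so they leave the dict untouched.
print(groups.get("absent"))    # None
print("absent" in groups)      # False

# Any zero-argument callable works as the factory, for example a lambda.
settings = defaultdict(lambda: "unset")
print(settings["colour"])      # unset

# setdefault on a plain dict: the default can differ per key.
plain = {}
plain.setdefault("tags", []).append("python")
plain.setdefault("count", 0)
print(plain)                   # {'tags': ['python'], 'count': 0}

# With default_factory set to None, missing keys raise KeyError again.
groups.default_factory = None
try:
    groups["another"]
    raised = False
except KeyError:
    raised = True
print(raised)                  # True
```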

See Part 2 for more defaultdict patterns.

OrderedDict — ordered dictionaries (now niche)

OrderedDict is a dict subclass that remembers insertion order. Historically that was its whole point — but regular dicts have preserved insertion order since Python 3.7, so for plain iteration you no longer need it.

When it still matters in 2026:

  • Order-sensitive equality. Two OrderedDicts compare equal only if their items are in the same order; two regular dicts compare equal regardless of order. If order is part of your data's identity, OrderedDict encodes that.
  • move_to_end(key, last=True/False). Cheaply move a key to either end — the standard building block for an LRU cache. Regular dicts have no equivalent.
  • popitem(last=False). Pop from the front (FIFO). A regular dict.popitem() only pops the last item.

from collections import OrderedDict

# Order-sensitive equality
print(OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1))  # False
print(dict(a=1, b=2) == dict(b=2, a=1))                # True

# move_to_end + popitem(last=False): a tiny LRU cache
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)          # mark as recently used
        return self.store[key]

    def put(self, key, value):
        self.store[key] = value
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict the oldest entry

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2); cache.get("a"); cache.put("c", 3)
print(list(cache.store))   # ['a', 'c']  -> 'b' was evicted as least-recently-used

For most everyday "I just want to keep insertion order" cases, a plain dict is now the right answer. Reach for OrderedDict only when you need one of the three behaviours above. (For an in-memory function cache, functools.lru_cache is usually simpler than hand-rolling one.)
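
For comparison with the hand-rolled cache, here is a minimal functools.lru_cache sketch. The square function is a hypothetical stand-in for an expensive call, and the calls list exists only to make cache hits visible.

```python
from functools import lru_cache

calls = []   # records real invocations so cache hits are visible

@lru_cache(maxsize=2)
def square(n):          # hypothetical stand-in for an expensive function
    calls.append(n)
    return n * n

square(2)    # miss
square(3)    # miss: cache now holds 2 and 3
square(2)    # hit: 2 becomes the most recently used entry
square(4)    # miss: evicts 3, the least recently used entry
square(3)    # miss again, because 3 was evicted

print(calls)                 # [2, 3, 4, 3]
print(square.cache_info())   # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)
```

The decorator only caches by function arguments, which must be hashable. A custom OrderedDict cache is still the better fit when you need to inspect, invalidate, or share individual entries.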

ChainMap — one view over many dictionaries

ChainMap groups several dictionaries into a single, updatable view. Lookups search the underlying mappings in order and return the first match, so earlier maps "shadow" later ones. Crucially, it does not copy the data — it holds references, so the view stays live as the underlying dicts change.

When to use it: layered configuration (CLI args → environment → defaults), template/scope resolution, or any "look here first, then fall back" lookup. Writes, updates, and deletions affect only the first mapping.

from collections import ChainMap

defaults = {"theme": "light", "show_sidebar": True, "page_size": 20}
user_prefs = {"theme": "dark"}
cli_args = {"page_size": 50}

config = ChainMap(cli_args, user_prefs, defaults)
print(config["theme"])      # dark   -> from user_prefs (defaults shadowed)
print(config["page_size"])  # 50     -> CLI wins
print(config["show_sidebar"])  # True -> falls through to defaults

# Writes go to the FIRST mapping only
config["theme"] = "solarized"
print(cli_args)             # {'page_size': 50, 'theme': 'solarized'}

# new_child() pushes a fresh layer on top (great for nested scopes)
scoped = config.new_child({"page_size": 5})
print(scoped["page_size"])  # 5
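
The "live view" and "first mapping only" behaviour can be sketched as follows. The dictionaries are made up for illustration and are separate from the config example above.

```python
from collections import ChainMap

defaults = {"theme": "light", "page_size": 20}
overrides = {"theme": "dark"}
config = ChainMap(overrides, defaults)

# ChainMap stores references, not copies: maps holds the original dicts.
print(config.maps[0] is overrides)   # True

# The view is live: changes to an underlying dict show up immediately.
defaults["language"] = "en"
print(config["language"])            # en

# Deleting only touches the first mapping, so the shadowed value reappears.
del config["theme"]
print(config["theme"])               # light

# Keys that exist only in later mappings cannot be deleted through the view.
try:
    del config["page_size"]
    deleted = True
except KeyError:
    deleted = False
print(deleted)                       # False

# new_child() adds a layer on top; parents gives the view without it.
scoped = config.new_child({"page_size": 5})
print(scoped["page_size"], scoped.parents["page_size"])   # 5 20

# dict(config) takes a flattened snapshot that is no longer live.
snapshot = dict(config)
defaults["page_size"] = 100
print(snapshot["page_size"], config["page_size"])         # 20 100
```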

UserDict, UserList, UserString — easy subclassing

Subclassing the built-in dict, list, or str directly is surprisingly leaky: many of their methods are implemented in C and don't route through your overridden __getitem__/__setitem__, so behaviour can be inconsistent. The UserDict, UserList, and UserString classes wrap the real container in a plain .data attribute and implement everything in Python, so your overrides are honoured everywhere.

When to use them: when you want to customise container behaviour — validation on insert, key normalisation, logging — and need that customisation to apply consistently. For most cases collections.abc mixins or composition also work; the User* classes are the simplest path when you want a drop-in replacement for the built-in.

from collections import UserDict

class LowerCaseDict(UserDict):
    """A dict that normalises every key to lowercase on insert."""
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)   # honoured by update(), too

d = LowerCaseDict()
d["Name"] = "MicroPyramid"
d.update({"CITY": "Hyderabad"})
print(d["name"], d["city"])   # MicroPyramid Hyderabad
print(d.data)                 # {'name': 'MicroPyramid', 'city': 'Hyderabad'}
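
To see the leak that UserDict avoids, the sketch below applies the same override to a direct dict subclass and to a UserDict subclass. LeakyLower and SafeLower are hypothetical names used only for this comparison.

```python
from collections import UserDict

class LeakyLower(dict):
    """Subclasses dict directly: only d[key] = value uses the override."""
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

class SafeLower(UserDict):
    """Subclasses UserDict: the constructor and update() use the override too."""
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

leaky = LeakyLower({"Name": "x"})   # the dict constructor bypasses __setitem__
leaky.update({"CITY": "y"})         # dict.update() bypasses it as well
leaky["Lang"] = "z"                 # only this path is normalised
print(sorted(leaky))                # ['CITY', 'Name', 'lang']

safe = SafeLower({"Name": "x"})     # UserDict routes everything via __setitem__
safe.update({"CITY": "y"})
safe["Lang"] = "z"
print(sorted(safe))                 # ['city', 'lang', 'name']

# Trade-off: a UserDict is a MutableMapping but not a dict subclass.
print(isinstance(safe, dict))       # False
```

If a caller checks isinstance(obj, dict), pass safe.data or dict(safe) rather than the wrapper itself.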

Frequently Asked Questions

When should I use defaultdict vs dict.setdefault?

Use defaultdict when the same default applies across the whole dictionary and you access missing keys repeatedly — for example grouping items into lists inside a loop. It is cleaner and faster because the factory only runs for genuinely missing keys. Use dict.setdefault(key, default) for a one-off insert on an ordinary dict, or when different keys need different defaults. One gotcha: merely reading a missing key from a defaultdict creates it, so use d.get(key, default) when you want to avoid that side effect.

Is OrderedDict still needed in 2026?

For preserving insertion order alone, no — regular dicts have done that since Python 3.7. OrderedDict is still useful in three specific cases: when you need order-sensitive equality (two OrderedDicts are equal only if their items are in the same order), when you need move_to_end() (the basis of an LRU cache), or when you need popitem(last=False) to pop from the front. Outside those, prefer a plain dict.

namedtuple vs dataclass — which should I use?

Pick @dataclass for most new record types: it gives you type hints, defaults, methods, and a clean class syntax, and @dataclass(frozen=True, slots=True) makes it immutable and memory-efficient. Choose typing.NamedTuple when you specifically need tuple semantics — immutability, hashability, positional unpacking, and tuple comparison. Use the raw collections.namedtuple mainly in quick scripts or when matching existing tuple-based code.

What is a deque good for?

A deque gives O(1) appends and pops at both ends, where a list is O(n) at the left. Use it for FIFO queues, stacks, breadth-first-search frontiers, producer/consumer buffers, and "keep the last N items" logs (via maxlen, which auto-discards from the opposite end when full). Avoid it for random access by index — that is O(n) in a deque, so use a list if you mostly read by position.

How does Counter work?

Counter is a dict subclass that counts hashable items. Pass it an iterable and it tallies each element; missing keys return 0 instead of raising KeyError. Handy methods include most_common(n) for the top items, total() (Python 3.10+) for the sum of all counts, and elements() to expand counts back into items. Counters also support arithmetic (+, -, &, |), which makes combining tallies trivial. See Part 2 for worked frequency-analysis examples.

Are collections types faster than plain dict and list?

Generally yes for their intended job. In CPython, deque, defaultdict, and OrderedDict are implemented in C, and Counter uses a C helper for counting. A deque beats a list decisively for left-end operations; Counter and defaultdict remove Python-level branching from counting and grouping loops; namedtuple has the same memory footprint as a plain tuple while adding readability. They are not magic, though: ChainMap and the User* classes are pure Python, and for plain indexed iteration a regular list or dict is just as fast, so choose the type that matches your access pattern rather than reaching for collections reflexively.

Where to go next

The collections module rewards a quick mental index: namedtuple for readable records, deque for fast double-ended queues, Counter for tallies, defaultdict for auto-defaults, OrderedDict for the few order-sensitive cases that remain, ChainMap for layered lookups, and the User* wrappers when you need to subclass a container cleanly.

Continue with Working with Python Collections — Part 2 for a deeper dive into Counter, OrderedDict, and defaultdict. If you'd like to go further into idiomatic Python, see our guides on generators and yield, decorators, and magic (dunder) methods.

Building or modernising a Python codebase? MicroPyramid has shipped 50+ Python projects over 12+ years across Django, FastAPI, data pipelines, and AI systems — learn more about our Python development services.
