Lesson 08

Python Dictionaries

Key-value pairs — the structure that backs every JSON record, every CSV row, and most of the data a humanist will touch.

A dictionary stores key-value pairs. Unlike lists and tuples, where you access an item by its position, you access a dictionary value by a label you chose. That single difference makes dictionaries the natural fit for almost any structured humanities data: a row from a CSV, a record from an API, a manuscript with metadata, a single entry in someone’s correspondence.

Creating a dictionary

Curly braces around a sequence of key: value pairs:

person = {
    "name": "Ada Lovelace",
    "born": 1815,
    "died": 1852,
    "field": "mathematics",
}
print(person)

The trailing comma after the last pair is optional — Python doesn’t care, and including it makes diffs cleaner when you add another field later.

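Keys are usually strings, but they can be any hashable type: strings, numbers, booleans, and tuples whose elements are themselves hashable. Mutable containers such as lists and other dictionaries cannot be keys. Values can be anything at all: a string, a number, a list, another dictionary.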

counts_by_year = {1815: 12, 1820: 23, 1825: 8}
print(counts_by_year)
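As a small sketch of the key rule: a tuple works as a key, while a list is rejected with a TypeError. The dictionary name letters_sent and the numbers in it are invented for illustration, not taken from a real dataset.

```python
# Hypothetical data: letters sent, keyed by a (sender, year) tuple.
letters_sent = {
    ("Voltaire", 1750): 312,
    ("Diderot", 1750): 48,
}
print(letters_sent[("Voltaire", 1750)])

# A list can change after it is created, so Python refuses it as a key.
try:
    letters_sent[["Voltaire", 1751]] = 290
    list_key_allowed = True
except TypeError as error:
    list_key_allowed = False
    print(f"TypeError: {error}")
```

Tuple keys are handy when one label is not enough to identify a value, such as a person plus a year.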

Accessing values

To pull a value out, write the dictionary, square brackets, and the key:

person = {
    "name": "Ada Lovelace",
    "born": 1815,
    "died": 1852,
    "field": "mathematics",
}

print(person["name"])    # 'Ada Lovelace'
print(person["born"])    # 1815

If you ask for a key that isn’t there, you get a KeyError. To avoid that, use .get, which returns None (or a default you supply) when the key is missing:

person = {
    "name": "Ada Lovelace",
    "born": 1815,
    "died": 1852,
    "field": "mathematics",
}

print(person.get("father"))             # None
print(person.get("father", "unknown"))  # 'unknown'

.get is the safer choice when you’re working with data that might have missing fields — and almost all real humanities data does.
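Two other ways to cope with a missing key are worth recognizing, sketched here with a shortened version of the same record: the in operator checks whether a key exists, and try/except catches the KeyError when you want to handle the gap explicitly. The fallback text "not recorded" is just a placeholder chosen for this example.

```python
person = {"name": "Ada Lovelace", "born": 1815, "died": 1852}

# `in` tests for the key itself, without fetching a value.
has_father = "father" in person
print(has_father)

# Square brackets raise KeyError for a missing key; catch it to
# decide for yourself what should happen next.
try:
    father = person["father"]
except KeyError:
    father = "not recorded"
print(father)
```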

Adding and updating

To add a new key or change an existing one, just assign:

person = {
    "name": "Ada Lovelace",
    "born": 1815,
    "died": 1852,
    "field": "mathematics",
}

person["father"] = "Lord Byron"   # adds
person["born"] = 1815             # overwrites the existing value (same year here, so nothing visibly changes)
print(person)

To remove a key, use del:

person = {
    "name": "Ada Lovelace",
    "born": 1815,
    "died": 1852,
    "field": "mathematics",
    "father": "Lord Byron",
}

del person["father"]
print(person)
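del raises a KeyError if the key is not there. If you would rather remove a key and keep its value, or remove a key that may not exist, the built-in pop method is one option. A minimal sketch, using a shortened record:

```python
person = {"name": "Ada Lovelace", "born": 1815, "father": "Lord Byron"}

# pop removes the key and hands back the value it held.
removed = person.pop("father")
print(removed)

# With a second argument, pop returns that default instead of
# raising KeyError when the key is missing.
missing = person.pop("mother", None)
print(missing)

print(person)
```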

To merge two dictionaries, use update or, in modern Python (3.9+), the | operator:

person = {
    "name": "Ada Lovelace",
    "born": 1815,
    "died": 1852,
    "field": "mathematics",
}

extra = {"died": 1852, "buried": "Hucknall"}
merged = person | extra           # new dict, leaves person alone
print(person)                     # no "buried" key yet
print(merged)

person.update(extra)              # in place: person now has "buried" too
print(person)
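When both dictionaries contain the same key, the value from the right-hand dictionary (or the one passed to update) wins. The sketch below uses a made-up catalogue record with a deliberately wrong birth year to show that; the variable names are hypothetical.

```python
# Hypothetical record with a wrong year, plus a dictionary of fixes.
catalogue = {"name": "Ada Lovelace", "born": 1816}
corrections = {"born": 1815, "field": "mathematics"}

# The | operator needs Python 3.9 or later.
merged = catalogue | corrections
print(merged)

# The original is untouched, so the wrong year is still there.
print(catalogue["born"])
```

This makes | a convenient way to layer corrections over a source record without losing the original.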

Looping through a dictionary

Three patterns come up over and over:

person = {"name": "Ada", "born": 1815, "field": "mathematics"}

# Just the keys
for key in person:
    print(key)

# Just the values
for value in person.values():
    print(value)

# Both at once — by far the most common
for key, value in person.items():
    print(f"{key}: {value}")

.items() is the one you’ll use most. It hands you each (key, value) pair in turn.
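A dictionary loops in the order its keys were inserted. When you want a different order, one common approach is to wrap the loop in sorted. The sketch below reuses the year counts from earlier, entered out of order on purpose.

```python
counts_by_year = {1825: 8, 1815: 12, 1820: 23}

# Sorted by key: earliest year first.
for year in sorted(counts_by_year):
    print(year, counts_by_year[year])

# Sorted by value: the busiest year first.
by_count = sorted(
    counts_by_year.items(),
    key=lambda pair: pair[1],   # pair is (year, count)
    reverse=True,
)
print(by_count)
```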

Nested data — the real shape of research datasets

A single record is one dictionary. A dataset is usually a list of dictionaries, where every dictionary has the same keys:

correspondents = [
    {"name": "Voltaire",  "born": 1694, "letters": 21000},
    {"name": "Émilie",    "born": 1706, "letters":   430},
    {"name": "Diderot",   "born": 1713, "letters":  3500},
]
print(correspondents)

This shape, a list of dicts, is what you get from reading a spreadsheet with csv.DictReader (one dictionary per row), what many JSON APIs return, and what pandas accepts directly when you build a DataFrame. You’ll see it in nearly every project.
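To see the list-of-dicts shape come out of a CSV, here is a minimal sketch that reads from a string rather than a file so it runs on its own. The CSV text is a cut-down version of the table above, and io.StringIO stands in for an open file. Note that csv.DictReader returns every value as a string, so numbers have to be converted.

```python
import csv
import io

# Stand-in for a file on disk; the first line supplies the keys.
csv_text = """name,born,letters
Voltaire,1694,21000
Diderot,1713,3500
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Every value arrives as text, so convert the numeric columns.
for row in rows:
    row["born"] = int(row["born"])
    row["letters"] = int(row["letters"])

print(rows)
```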

To get the second person’s name:

correspondents = [
    {"name": "Voltaire",  "born": 1694, "letters": 21000},
    {"name": "Émilie",    "born": 1706, "letters":   430},
    {"name": "Diderot",   "born": 1713, "letters":  3500},
]

print(correspondents[1]["name"])   # 'Émilie'

To get every name:

correspondents = [
    {"name": "Voltaire",  "born": 1694, "letters": 21000},
    {"name": "Émilie",    "born": 1706, "letters":   430},
    {"name": "Diderot",   "born": 1713, "letters":  3500},
]

print([c["name"] for c in correspondents])
# ['Voltaire', 'Émilie', 'Diderot']

To filter:

correspondents = [
    {"name": "Voltaire",  "born": 1694, "letters": 21000},
    {"name": "Émilie",    "born": 1706, "letters":   430},
    {"name": "Diderot",   "born": 1713, "letters":  3500},
]

print([c for c in correspondents if c["letters"] > 1000])
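Two more moves on the same list tend to follow filtering: adding up a field and sorting by it. A short sketch using only built-in functions:

```python
correspondents = [
    {"name": "Voltaire",  "born": 1694, "letters": 21000},
    {"name": "Émilie",    "born": 1706, "letters":   430},
    {"name": "Diderot",   "born": 1713, "letters":  3500},
]

# Add up one field across every record.
total_letters = sum(c["letters"] for c in correspondents)
print(total_letters)

# Sort the records by a field, largest first, then list the names.
by_letters = sorted(correspondents, key=lambda c: c["letters"], reverse=True)
names = [c["name"] for c in by_letters]
print(names)
```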

Counting with a dictionary

A frequency count is one of the most common operations in the digital humanities (DH): how often does each word appear in a text, each letter recipient appear in a corpus, each year appear in a dataset? The classic dictionary idiom:

text = "the quick brown fox jumps over the lazy dog the fox"
counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1
print(counts)
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, ...}

The standard library has a purpose-built tool for this — collections.Counter — that handles the boilerplate for you:

from collections import Counter

text = "the quick brown fox jumps over the lazy dog the fox"
counts = Counter(text.split())
print(counts.most_common(3))   # [('the', 3), ('fox', 2), ('quick', 1)]

We’ll see Counter again in the modules lesson. For now, recognize the pattern: dictionary keys are labels; dictionary values can be running totals.
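The same counting idiom works on a list of dictionaries: pick one field and tally its values. In this sketch the letter records are invented for the example, and the final line only checks that Counter arrives at the same totals.

```python
from collections import Counter

# Hypothetical letter records, invented for this example.
letters = [
    {"recipient": "Diderot", "year": 1750},
    {"recipient": "Émilie", "year": 1740},
    {"recipient": "Diderot", "year": 1751},
]

# Keys are recipient names; values are running totals.
per_recipient = {}
for letter in letters:
    name = letter["recipient"]
    per_recipient[name] = per_recipient.get(name, 0) + 1
print(per_recipient)

# Counter builds the same tally from the recipient names.
same_counts = Counter(letter["recipient"] for letter in letters)
print(same_counts == per_recipient)
```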

Once you can read, write, loop, and count with dictionaries, you have the four moves that underlie almost every script in this course. Continue to Lesson 10: Python Conditionals.

Running the code

Save any snippet from this lesson to a file — say try.py — and run it from your project folder:

uv run try.py

uv run uses the project’s Python and dependencies automatically; no virtualenv to activate. If you haven’t set the project up yet, Lesson 01 walks through it.