Convert XML to JSON in Python with xmltodict

June 25, 2019 · Updated June 10, 2026

To convert XML to JSON in Python, install xmltodict, call xmltodict.parse(xml_string) to turn the XML into a plain Python dict, then pass that dict to json.dumps(data, indent=2) to serialize it as JSON. That is the whole conversion in two lines, and the reverse direction is just as short with xmltodict.unparse().

XML still turns up wherever a modern Python service meets older systems: SOAP and legacy enterprise APIs, RSS and Atom feeds, XML sitemaps, ISO 20022 bank statements, and government open-data dumps. Your application code, however, wants dicts and JSON. xmltodict bridges the two with one function call, so you rarely need to walk a DOM tree by hand. This guide is updated for Python 3.12 / 3.13 in 2026.

Key takeaways

  • xmltodict.parse() converts XML into an insertion-ordered Python dict; json.dumps() converts that dict into a JSON string.
  • XML attributes become dict keys prefixed with @; an element's free text is stored under #text. Both markers are configurable (attr_prefix, cdata_key).
  • xmltodict.unparse(data, pretty=True) does the reverse trip, turning a dict (for example, one loaded from JSON with json.loads()) back into XML.
  • For multi-GB files, stream with item_depth + item_callback so memory usage stays flat.
  • xmltodict is safe by default: it refuses to expand entities (disable_entities=True), blocking "billion laughs" / XML-bomb attacks.
  • Reach for stdlib xml.etree.ElementTree or lxml when you need raw speed or XPath instead of a dict.

How do you install xmltodict?

xmltodict is a small, pure-Python package with no compiled dependencies, so it installs cleanly on every supported interpreter (Python 3.9 through 3.13):

pip install xmltodict

Then import it alongside the standard-library json module — that pairing is all you need for XML-to-JSON conversion.

How do you convert XML to JSON in Python?

Call xmltodict.parse() on an XML string (or a file-like object) to get a dict, then hand that dict to json.dumps() with indent=2 for readable, pretty-printed JSON:

import json
import xmltodict

xml = """
<library>
  <book id="b1" lang="en">
    <title>Dune</title>
    <author>Frank Herbert</author>
  </book>
  <book id="b2" lang="en">
    <title>Neuromancer</title>
    <author>William Gibson</author>
  </book>
</library>
"""

data = xmltodict.parse(xml)        # XML -> dict (insertion-ordered)
print(json.dumps(data, indent=2))  # dict -> pretty JSON

The output is plain JSON you can write to a file or return from an API:

{
  "library": {
    "book": [
      {
        "@id": "b1",
        "@lang": "en",
        "title": "Dune",
        "author": "Frank Herbert"
      },
      {
        "@id": "b2",
        "@lang": "en",
        "title": "Neuromancer",
        "author": "William Gibson"
      }
    ]
  }
}

Two things to notice: repeated <book> siblings collapse into a JSON array automatically, and the id/lang attributes show up as @id and @lang keys. One caveat — if only a single <book> were present, xmltodict would give you a dict, not a one-element list. Use force_list (shown below) when you need a stable shape.
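
The single-element caveat is easy to miss, so here is a minimal sketch of it. The one-book document and the variable names are made up for illustration; force_list is the xmltodict option named above.

```python
import json
import xmltodict

# Illustrative document with only ONE <book> element.
one_book = "<library><book id='b1'><title>Dune</title></book></library>"

default = xmltodict.parse(one_book)
stable = xmltodict.parse(one_book, force_list=("book",))

print(type(default["library"]["book"]).__name__)  # dict
print(type(stable["library"]["book"]).__name__)   # list

# With force_list, downstream code can always iterate over "book".
print(json.dumps(stable, indent=2))
```

If your consumers loop over book entries, the second form spares them an isinstance check.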

How are XML attributes and text handled?

xmltodict has to flatten two XML concepts that JSON has no native slot for: attributes and an element's mixed text. It does so with two configurable markers:

  • Attributes are keyed with an @ prefix (controlled by attr_prefix, default "@").
  • An element's own text, when it also has attributes or children, is stored under #text (controlled by cdata_key, default "#text").

import xmltodict

xml = '<task priority="high" done="false">Email the client<due>2026-07-01</due></task>'

data = xmltodict.parse(xml)
print(data["task"]["@priority"])  # 'high'              -> attribute, "@" prefix
print(data["task"]["#text"])      # 'Email the client'  -> the element's own text
print(data["task"]["due"])        # '2026-07-01'

# Rename the markers when "@" / "#text" clash with your JSON schema:
data = xmltodict.parse(xml, attr_prefix="attr_", cdata_key="value")
print(data["task"]["attr_priority"])  # 'high'

# Force repeated tags to always be a list, even when only one occurs:
xmltodict.parse("<r><item>a</item></r>", force_list=("item",))
# -> {'r': {'item': ['a']}}
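
A related detail: #text only appears when an element has attributes or children. A text-only element collapses to a bare string, which can make the shape vary between documents. The sketch below shows both cases, plus the force_cdata option, which asks xmltodict to always wrap text in a dict. The sample elements are made up for illustration.

```python
import xmltodict

# Text-only element: the value is just the string.
plain = xmltodict.parse("<title>Dune</title>")
print(plain)      # {'title': 'Dune'}

# Add an attribute and the text moves under "#text".
with_attr = xmltodict.parse('<title lang="en">Dune</title>')
print(with_attr)  # {'title': {'@lang': 'en', '#text': 'Dune'}}

# force_cdata=True keeps the dict shape even without attributes.
forced = xmltodict.parse("<title>Dune</title>", force_cdata=True)
print(forced)     # {'title': {'#text': 'Dune'}}
```

Whether to use force_cdata depends on your consumers: it gives a uniform shape at the cost of noisier JSON.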

How do you handle XML namespaces?

By default xmltodict keeps namespace prefixes verbatim in the key (for example "h:title"). Pass process_namespaces=True to expand each prefix to its full URI so keys are unambiguous across documents:

import xmltodict

xml = '''<root xmlns:h="http://example.com/html">
  <h:title>Hello</h:title>
</root>'''

# Off by default — prefixes are kept as written:
xmltodict.parse(xml)
# -> {'root': {'@xmlns:h': 'http://example.com/html', 'h:title': 'Hello'}}

# Expand namespaces to full URIs:
xmltodict.parse(xml, process_namespaces=True)
# -> {'root': {'http://example.com/html:title': 'Hello'}}
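
Full URIs make for long keys. As a sketch, xmltodict also accepts a namespaces mapping alongside process_namespaces=True, which swaps each URI for a short label of your choosing. The label "html" below is an arbitrary choice for illustration.

```python
import xmltodict

xml = '''<root xmlns:h="http://example.com/html">
  <h:title>Hello</h:title>
</root>'''

# Map each namespace URI to the short label you want in the keys.
# "html" is an arbitrary label chosen for this example.
short_names = {"http://example.com/html": "html"}

data = xmltodict.parse(xml, process_namespaces=True, namespaces=short_names)
print(data["root"]["html:title"])  # Hello
```

Because the label comes from your mapping rather than from the document, the key stays the same even if a sender picks a different prefix for the same URI.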

Which Python library should you use: xmltodict vs ElementTree vs lxml?

xmltodict is the fastest path to JSON, but it is not always the right tool. If you need XPath queries, schema validation, or the absolute top speed on huge files, the stdlib xml.etree.ElementTree or third-party lxml are better fits. Here is how the four common approaches compare:

Approach | Ease | Dict / JSON output | Attributes | Namespaces | Large-file streaming | Speed
xmltodict (in-memory) | Highest: one call | Native dict, JSON-ready | Automatic @ prefix | Opt-in (process_namespaces) | No, loads whole doc | Moderate (expat)
xmltodict (streaming) | Medium: callback | dict per item via callback | Automatic @ prefix | Opt-in | Yes (item_depth + item_callback) | Moderate, flat memory
xml.etree.ElementTree (stdlib) | Medium | No: Element objects, map by hand | .attrib dict | Verbose {uri}tag keys | Yes (iterparse) | Fast, zero install
lxml | Medium | No: Element objects | .attrib dict | Full XPath / XSLT support | Yes (iterparse) | Fastest (C libxml2), needs install

Rule of thumb: use xmltodict when the goal is "give me this XML as JSON/dict"; drop to ElementTree or lxml when you need to query, validate, or transform the tree itself.
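
To make "map by hand" concrete, here is a rough sketch of building a similar dict with the stdlib xml.etree.ElementTree. The @ prefix is added manually to mimic xmltodict's convention, and the mapping only handles this flat two-level shape; it is not a general converter.

```python
import json
import xml.etree.ElementTree as ET

xml_text = """<library>
  <book id="b1"><title>Dune</title><author>Frank Herbert</author></book>
  <book id="b2"><title>Neuromancer</title><author>William Gibson</author></book>
</library>"""

root = ET.fromstring(xml_text)

books = []
for book in root.findall("book"):
    # Attributes live in .attrib; prefix them by hand to mimic xmltodict.
    entry = {f"@{name}": value for name, value in book.attrib.items()}
    # Child elements: tag -> text (assumes no nesting below this level).
    for child in book:
        entry[child.tag] = child.text
    books.append(entry)

data = {root.tag: {"book": books}}
print(json.dumps(data, indent=2))
```

The loop is short here, but every extra level of nesting or mixed content needs more hand-written mapping, which is the work xmltodict does for you.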

How do you convert JSON back to XML?

Use xmltodict.unparse() for the reverse trip. Load your JSON with json.loads() if needed, then unparse the dict with pretty=True for indented output. The dict must have exactly one root key, and the same @ / #text markers are honoured on the way out:

import xmltodict

data = {
    "library": {
        "book": {
            "@id": "b1",
            "title": "Dune",
            "author": "Frank Herbert",
        }
    }
}

xml = xmltodict.unparse(data, pretty=True)
print(xml)

Output:

<?xml version="1.0" encoding="utf-8"?>
<library>
	<book id="b1">
		<title>Dune</title>
		<author>Frank Herbert</author>
	</book>
</library>
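
When the starting point is a JSON string rather than a dict, the same call works after json.loads(). The sketch below uses a made-up JSON payload in which a list value becomes repeated <book> elements, then parses the XML again to check that the data survives the trip.

```python
import json
import xmltodict

# Illustrative JSON payload; a list value becomes repeated <book> elements.
json_text = (
    '{"library": {"book": ['
    '{"@id": "b1", "title": "Dune"}, '
    '{"@id": "b2", "title": "Neuromancer"}]}}'
)

data = json.loads(json_text)                    # JSON string -> dict
xml_out = xmltodict.unparse(data, pretty=True)  # dict -> XML string
print(xml_out)

# Parse the XML again and compare with the original dict.
round_tripped = xmltodict.parse(xml_out)
print(round_tripped == data)
```

Note that XML carries no type information: numbers and booleans in your JSON come back as strings after a round trip, so this equality check only holds for string values like the ones used here.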

If you are exposing this data through a web API rather than emitting raw XML, you will usually hand the dict to a serializer instead of json.dumps() directly — see our walkthrough of Django REST Framework serializers and the introduction to API development with DRF for that pattern.

How do you parse large XML files without running out of memory?

xmltodict.parse() builds the entire dict in memory, which is fine for a few megabytes but fatal for a multi-GB dump. For those, switch to streaming mode: pass a file object plus item_depth and an item_callback. xmltodict invokes your callback once per matching element and then frees it, so memory stays flat regardless of file size.

item_depth=2 means "call me at the second level of nesting" — here that is each <book> inside the root <library>. Return True to keep going, or False to stop early (which raises xmltodict.ParsingInterrupted):

import xmltodict

def handle_book(path, book):
    # Called once per <book>; the element is discarded after the call,
    # so memory stays flat no matter how large the file is.
    print(book["@id"], book["title"])
    return True   # return False to stop parsing early

with open("huge_catalog.xml", "rb") as f:
    xmltodict.parse(f, item_depth=2, item_callback=handle_book)
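
To try the early-stop path without a large file on disk, the sketch below streams a small in-memory catalog through io.BytesIO and stops after two books. The catalog contents and the take_two callback are made up for illustration.

```python
import io
import xmltodict

# Small in-memory stand-in for a large file.
catalog = io.BytesIO(
    b"<library>"
    b"<book id='b1'><title>Dune</title></book>"
    b"<book id='b2'><title>Neuromancer</title></book>"
    b"<book id='b3'><title>Snow Crash</title></book>"
    b"</library>"
)

seen = []

def take_two(path, book):
    # Collect titles and ask the parser to stop once we have two.
    seen.append(book["title"])
    return len(seen) < 2  # False -> xmltodict raises ParsingInterrupted

try:
    xmltodict.parse(catalog, item_depth=2, item_callback=take_two)
except xmltodict.ParsingInterrupted:
    # Expected: the callback returned False after the second book.
    pass

print(seen)  # ['Dune', 'Neuromancer']
```

Catching ParsingInterrupted at the call site keeps an intentional early exit from looking like a parse failure.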

This same dict-first approach is how we consume third-party feeds in client work — the pattern we use when integrating the GitHub API in Python/Django and integrating the LinkedIn API in Python/Django applies just as well to any paginated XML endpoint.

Is xmltodict safe for untrusted XML?

Mostly yes, by default. In modern releases, xmltodict.parse() ships with disable_entities=True, so it refuses to expand XML entities. That means a classic "billion laughs" / XML-bomb payload raises a ValueError instead of exhausting your memory:

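import xmltodict

# Placeholder inputs so the snippet runs; substitute your own documents.
untrusted_xml = "<order><id>42</id></order>"        # e.g. a request body
trusted_xml = "<config><mode>fast</mode></config>"  # e.g. a file you control

# Safe by default: entity expansion is disabled, so XML bombs raise ValueError
# instead of blowing up memory.
xmltodict.parse(untrusted_xml)              # disable_entities=True is the default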

# Only relax the guard for XML you fully trust:
xmltodict.parse(trusted_xml, disable_entities=False)

# Need DTDs or external entities from an untrusted source? Don't relax the guard —
# pre-parse with the hardened defusedxml stack instead (pip install defusedxml).

The takeaway: keep disable_entities=True for anything that crosses a trust boundary, and use defusedxml if you must process DTDs or external entities from sources you don't control (XXE protection).
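
At a trust boundary you usually want one place that turns any rejected payload into a clean failure. The helper below is a hypothetical sketch, not part of xmltodict: parse_untrusted is a name invented here, and it assumes rejections surface as ValueError (entity guard) or expat's ExpatError (malformed XML). Exact behaviour for entity payloads may differ between xmltodict releases.

```python
from xml.parsers.expat import ExpatError

import xmltodict


def parse_untrusted(xml_text):
    """Hypothetical helper: return a dict, or None if the payload is rejected."""
    try:
        # disable_entities=True is the default, so it is not passed explicitly.
        return xmltodict.parse(xml_text)
    except (ValueError, ExpatError):
        # ValueError: assumed entity-guard rejection; ExpatError: malformed XML.
        return None


print(parse_untrusted("<a>ok</a>"))  # {'a': 'ok'}
print(parse_untrusted("<a>"))        # None (unclosed tag is malformed)
```

Returning None keeps the example short; in a real service you would more likely log the rejection and respond with a 4xx error.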

Convert XML to JSON in your Python stack

xmltodict turns a fiddly XML integration into a two-line dict conversion — and that simplicity scales from a one-off feed parser to a high-volume ingestion pipeline. MicroPyramid has shipped 50+ projects since 2014, wiring legacy XML systems into modern JSON APIs for clients across the US, UK, Australia, and beyond.

If you want help designing that data layer, our Python development services team builds and maintains exactly these pipelines, and when the work lives inside Django, our Django development services cover the API and admin around it.

Frequently Asked Questions

What does xmltodict.parse() return — a dict or JSON?

It returns a native Python dict whose keys preserve the source XML order. It is not a JSON string. To get JSON, pass that dict to json.dumps(data, indent=2). The two-step split keeps you in control of how the JSON is formatted and where it is written.

Why are some of my dict keys prefixed with @?

The @ prefix marks values that were XML attributes rather than child elements. For <book id="b1">, the id attribute becomes the key @id. You can change the marker with the attr_prefix argument, for example attr_prefix="attr_" to produce attr_id instead.

How do I make a single XML element parse as a list?

Use the force_list option. By default xmltodict only creates a list when a tag repeats, so a lone element becomes a dict. Passing force_list=("book",) guarantees book is always a list, giving you a stable shape whether the document has one item or many.

How do I convert JSON back to XML with xmltodict?

Call xmltodict.unparse(data, pretty=True), where data is a dict (use json.loads() first if you have a JSON string). The dict must have a single root key, attributes go under @-prefixed keys, and an element's text goes under #text. The pretty=True flag adds indentation.

Is xmltodict safe to use on untrusted XML?

By default yes for entity attacks: disable_entities=True is on, so "billion laughs" / XML-bomb payloads raise a ValueError instead of consuming all memory. Keep that default for untrusted input, and use the defusedxml library if you must handle DTDs or external entities from sources you don't trust.

When should I use lxml or ElementTree instead of xmltodict?

Choose lxml or stdlib xml.etree.ElementTree when you need XPath queries, schema validation, XSLT transforms, or maximum speed on very large files. xmltodict is the better choice when your only goal is to convert XML into a JSON-ready dict with the least code.
