How to Convert XML to JSON and Dict in Python (xmltodict Guide)
Learn how to convert XML to JSON and Python Dictionaries using the xmltodict module. A step-by-step guide on parsing XML files, handling namespaces, and unparsing JSON back to XML.
Drake Nguyen
Founder · System Architect
In this guide, we will explore how to convert XML data into JSON and Python Dictionaries. We will use the Python xmltodict module, a lightweight tool that parses XML and turns it into a standard Dictionary, which can then be serialized to JSON. We will also cover how to handle XML namespaces and how to reverse the process by converting JSON back into XML.
Why Convert XML to Dict or JSON?
While XML (Extensible Markup Language) was once the standard for data interchange, modern web development largely favors JSON (JavaScript Object Notation) due to its lighter weight and ease of use with JavaScript-based stacks. However, many legacy systems and enterprise applications still rely on XML.
When integrating these systems with modern Python applications, converting XML to JSON or Dictionaries is often necessary for easier data manipulation. The xmltodict module simplifies this process, allowing developers to work with XML data as if it were a native Python dictionary.
Getting Started with xmltodict
Before writing code, we need to install the xmltodict module. Since it is not included in the standard Python library, we will install it from the Python Package Index (PyPI) using pip.
Installation via Pip
Run the following command in your terminal to install the module:
pip install xmltodict
Note: The installation process is typically fast as xmltodict has no external dependencies, minimizing the risk of version conflicts.
For users on Debian-based Linux systems, the module can also be installed via the apt package manager:
sudo apt install python3-xmltodict
Python XML to JSON Conversion
The primary use case for this module is parsing XML strings into JSON. Below is a simple example demonstrating how to parse an XML string and output it as formatted JSON.
import xmltodict
import json
# Sample XML Data
my_xml = """
<audience>
  <id what="attribute">123</id>
  <name>Shubham</name>
</audience>
"""
# Parse XML to Dict, then dump to JSON
parsed_data = xmltodict.parse(my_xml)
json_output = json.dumps(parsed_data, indent=4)
print(json_output)
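Output description: The code outputs a JSON object where the root tag <audience> becomes the top-level key. Because <id> carries an attribute, it becomes a nested object with the keys "@what" and "#text", while <name> maps directly to the string "Shubham".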
Converting an XML File to JSON
Hardcoding XML data is rarely practical. In real-world scenarios, you will likely read data from a file or an API response. Here is how to read an XML file from the disk and convert it directly to JSON.
import xmltodict
import json
# Open the XML file and parse it
with open('person.xml', 'r') as file_descriptor:
    doc = xmltodict.parse(file_descriptor.read())
# Convert the dictionary to a JSON string
print(json.dumps(doc, indent=4))
This method uses the standard Python open() function to read the entire file into memory, and the content is then passed to the parser. This works well for configuration files and modest data exports.
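The file-reading steps above can be wrapped in a small function that also writes the JSON to disk. This is a minimal sketch: the helper name xml_file_to_json_file, the person.json output name, and the sample <person> content are made up for illustration. It passes the open file object to xmltodict.parse() in binary mode, so the parser can apply the encoding declared in the XML itself.

```python
import json
import tempfile
from pathlib import Path

import xmltodict


def xml_file_to_json_file(xml_path, json_path):
    """Parse xml_path and write the result to json_path (hypothetical helper)."""
    # Binary mode: the parser reads bytes and handles the declared encoding.
    with open(xml_path, "rb") as xml_file:
        doc = xmltodict.parse(xml_file)
    with open(json_path, "w", encoding="utf-8") as json_file:
        json.dump(doc, json_file, indent=4)
    return doc


# Demo on a throwaway file so the sketch runs without any existing data.
with tempfile.TemporaryDirectory() as tmp:
    xml_path = Path(tmp) / "person.xml"
    json_path = Path(tmp) / "person.json"
    xml_path.write_text(
        "<person><name>Shubham</name><age>30</age></person>", encoding="utf-8"
    )
    xml_file_to_json_file(xml_path, json_path)
    round_tripped = json.loads(json_path.read_text(encoding="utf-8"))

print(round_tripped)
```

Note that every value comes back as a string ("30", not 30), because XML carries no type information.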
Python XML to Dict
Under the hood, xmltodict.parse() converts XML directly into a Python Dictionary. Older releases returned an OrderedDict, while recent releases running on Python 3.7+ return a plain dict, which already preserves insertion order. Either way, you can access data using standard key lookups without needing to convert to JSON first.
import xmltodict
my_xml = """
<audience>
  <id what="attribute">123</id>
  <name>Shubham</name>
</audience>
"""
# Convert XML to Python Dictionary
my_dict = xmltodict.parse(my_xml)
# Accessing elements directly
# <id> has an attribute, so it parses to a dict and its text sits under '#text'
print(my_dict['audience']['id']['#text'])
# <name> has no attributes, so it parses to a plain string:
# print(my_dict['audience']['name'])
# Accessing Attributes
print(my_dict['audience']['id']['@what'])
Key Concept: XML attributes are converted to dictionary keys prefixed with the @ symbol (e.g., @what). An element that has only text parses to a plain string, while an element that also has attributes parses to a dictionary whose text is stored under the #text key.
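Because the shape of a parsed element depends on whether it has attributes, lookup code often needs to handle both cases. The sketch below reuses the sample XML from this section. The helper text_of is a hypothetical name introduced here for illustration, not part of xmltodict.

```python
import xmltodict

xml_doc = """
<audience>
  <id what="attribute">123</id>
  <name>Shubham</name>
</audience>
"""


def text_of(node):
    """Return the text of a parsed element (hypothetical helper).

    Elements without attributes parse to a plain string; elements with
    attributes parse to a dict that stores the text under '#text'.
    """
    if isinstance(node, dict):
        return node.get("#text")
    return node


audience = xmltodict.parse(xml_doc)["audience"]
id_text = text_of(audience["id"])      # element with an attribute
name_text = text_of(audience["name"])  # element with text only
print(id_text, name_text)
```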
Handling XML Namespaces
Enterprise XML files often use namespaces to avoid element name collisions. To resolve namespace prefixes to their full URIs in your JSON or Dictionary output, you can use the process_namespaces parameter.
Example XML with Namespaces
<root xmlns="https://defaultns.com/"
      xmlns:a="https://a.com/">
  <audience>
    <id>123</id>
    <name>Shubham</name>
  </audience>
</root>
Python Code to Process Namespaces
import xmltodict
import json
with open('person.xml') as fd:
    # Enable namespace processing
    doc = xmltodict.parse(fd.read(), process_namespaces=True)
print(json.dumps(doc, indent=4))
When process_namespaces=True is set, each key is expanded to the namespace URI followed by a colon and the local element name (for example, https://defaultns.com/:audience), so elements from different namespaces cannot be confused with one another.
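To see what namespace processing does to the keys, the sketch below parses the sample XML from an inline string rather than from person.xml. It also uses the optional namespaces argument of xmltodict.parse(), which maps a namespace URI to a shorter prefix, or to None to drop it from the keys. The variable names are illustrative.

```python
import xmltodict

ns_xml = """
<root xmlns="https://defaultns.com/"
      xmlns:a="https://a.com/">
  <audience>
    <id>123</id>
    <name>Shubham</name>
  </audience>
</root>
"""

# Keys are expanded to "<namespace URI>:<local name>".
expanded = xmltodict.parse(ns_xml, process_namespaces=True)
root_key = next(iter(expanded))
print(root_key)

# Mapping a URI to None strips it, which keeps lookups short.
short = xmltodict.parse(
    ns_xml,
    process_namespaces=True,
    namespaces={"https://defaultns.com/": None},
)
print(short["root"]["audience"]["name"])
```

Collapsing the default namespace this way is a convenience for readable keys. It is only safe when no other namespace in the document defines elements with the same local names.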
Converting JSON Back to XML
The xmltodict module is bidirectional. It can also unparse a dictionary (or JSON object) back into XML format. This is useful when your Python application needs to send data back to a legacy system that requires XML input.
import xmltodict
student_data = {
    "student": {
        "name": "Shubham",
        "marks": {
            "math": 92,
            "english": 99
        },
        "id": "s387hs3"
    }
}
# Convert Dict to XML
xml_output = xmltodict.unparse(student_data, pretty=True)
print(xml_output)
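When the input is a JSON string rather than a Python dictionary, it can first be decoded with json.loads() and then passed to unparse(). The sketch below assumes a made-up JSON payload in which the student id is stored under an "@id" key, so that it is written as an XML attribute instead of a child element.

```python
import json

import xmltodict

# Hypothetical JSON payload; "@id" becomes an attribute of <student>.
json_text = """
{
    "student": {
        "@id": "s387hs3",
        "name": "Shubham",
        "marks": {"math": 92, "english": 99}
    }
}
"""

payload = json.loads(json_text)
xml_text = xmltodict.unparse(payload, pretty=True)
print(xml_text)

# Parsing the XML again shows that numbers come back as strings.
restored = xmltodict.parse(xml_text)
print(restored["student"]["marks"]["math"])
```

The round trip is not lossless for types: the integer 92 is written as text and is read back as the string "92".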
Critical Requirement: XML documents must have a single root element. Therefore, your dictionary must have exactly one key at the top level (e.g., "student" in the example above).
If you attempt to unparse a dictionary with multiple root keys, xmltodict raises a ValueError, because a valid XML document cannot have multiple root tags.
Example of an Invalid Structure
# This will fail conversion
student = {
    "name": "Shubham",  # Root key 1
    "id": "s387hs3"     # Root key 2
}
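A common way around this is to wrap the dictionary under a single root key before unparsing. The sketch below first shows the failure and then the wrapped version. The root name "student" is an arbitrary choice for illustration.

```python
import xmltodict

student = {
    "name": "Shubham",
    "id": "s387hs3",
}

# Two top-level keys: unparse() refuses to build the document.
try:
    xmltodict.unparse(student)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Wrapping the data under one root key makes it a valid document.
wrapped_xml = xmltodict.unparse({"student": student}, pretty=True)
print(wrapped_xml)
```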
Conclusion
The xmltodict module provides a straightforward, Pythonic way to handle XML data. Whether you need to convert XML to JSON for a web API or parse XML files into dictionaries for data analysis, it covers the common cases with a small API built on Python's standard Expat parser.