Serializing Django objects

Django’s serialization framework provides a mechanism for “translating” Django objects into other formats. Usually these other formats will be text-based and used for sending Django objects over a wire, but it’s possible for a serializer to handle any format (text-based or not).

See also

If you just want to get some data from your tables into a serialized form, you could use the dumpdata management command.

Serializing data

At the highest level, serializing data is a very simple operation:

from django.core import serializers
data = serializers.serialize("xml", SomeModel.objects.all())

The arguments to the serialize function are the format to serialize the data to (see Serialization formats) and a QuerySet to serialize. (Actually, the second argument can be any iterator that yields Django objects, but it will almost always be a QuerySet.)

You can also use a serializer object directly:

XMLSerializer = serializers.get_serializer("xml")
xml_serializer = XMLSerializer()
xml_serializer.serialize(queryset)
data = xml_serializer.getvalue()

This is useful if you want to serialize data directly to a file-like object (which includes an HttpResponse):

with open("file.xml", "w") as out:
    xml_serializer.serialize(SomeModel.objects.all(), stream=out)

Subset of fields

If you only want a subset of fields to be serialized, you can specify a fields argument to the serializer:

from django.core import serializers
data = serializers.serialize('xml', SomeModel.objects.all(), fields=('name','size'))

In this example, only the name and size attributes of each model will be serialized.

Note

Depending on your model, you may find that it is not possible to deserialize a model that only serializes a subset of its fields. If a serialized object doesn't specify all the fields that are required by a model, the deserializer will not be able to save deserialized instances.

Inherited Models

If you have a model that is defined using an abstract base class, you don't have to do anything special to serialize that model. Just call the serializer on the object (or objects) that you want to serialize, and the output will be a complete representation of the serialized object.

However, if you have a model that uses multi-table inheritance, you also need to serialize all of the base classes for the model. This is because only the fields that are locally defined on the model will be serialized. For example, consider the following models:

class Place(models.Model):
    name = models.CharField(max_length=50)

class Restaurant(Place):
    serves_hot_dogs = models.BooleanField()

If you only serialize the Restaurant model:

data = serializers.serialize('xml', Restaurant.objects.all())

the serialized output will contain only the serves_hot_dogs attribute. The name attribute of the base class will be ignored.

In order to fully serialize your Restaurant instances, you will need to serialize the Place models as well:

all_objects = list(Restaurant.objects.all()) + list(Place.objects.all())
data = serializers.serialize('xml', all_objects)

Deserializing data

Deserializing data is also a fairly simple operation:

for obj in serializers.deserialize("xml", data):
    do_something_with(obj)

As you can see, the deserialize function takes the same format argument as serialize, a string or stream of data, and returns an iterator.

However, here it gets slightly complicated. The objects returned by the deserialize iterator aren't simple Django objects. Instead, they are special DeserializedObject instances that wrap a created -- but unsaved -- object and any associated relationship data.

Calling DeserializedObject.save() saves the object to the database.

This ensures that deserializing is a non-destructive operation even if the data in your serialized representation doesn't match what's currently in the database. Usually, working with these DeserializedObject instances looks something like:

for deserialized_object in serializers.deserialize("xml", data):
    if object_should_be_saved(deserialized_object):
        deserialized_object.save()

In other words, the usual use is to examine the deserialized objects to make sure that they are "appropriate" for saving before doing so. Of course, if you trust your data source you could just save the object and move on.

The Django object itself can be inspected as deserialized_object.object.

Serialization formats

Django supports a number of serialization formats, some of which require you to install third-party Python modules:

Identifier  Information
xml         Serializes to and from a simple XML dialect.
json        Serializes to and from JSON (using a version of simplejson bundled with Django).
python      Translates to and from "simple" Python objects (lists, dicts, strings, etc.). Not really all that useful on its own, but used as a base for other serializers.
yaml        Serializes to YAML (YAML Ain't a Markup Language). This serializer is only available if PyYAML is installed.

Notes for specific serialization formats

json

If you're using UTF-8 (or any other non-ASCII encoding) data with the JSON serializer, you must pass ensure_ascii=False as a parameter to the serialize() call. Otherwise, the output won't be encoded correctly.

For example:

json_serializer = serializers.get_serializer("json")()
json_serializer.serialize(queryset, ensure_ascii=False, stream=response)
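
The effect of ensure_ascii can be seen without Django at all. The sketch below uses the standard library json module (the built-in counterpart of simplejson) rather than Django's serializer, and the sample dictionary is made up for illustration. With the default setting, non-ASCII characters are written as \uXXXX escapes; with ensure_ascii=False they are emitted as-is.

```python
import json

# Hypothetical sample data containing a non-ASCII character (e with acute accent).
record = {"name": "Caf\u00e9"}

# Default behaviour: every non-ASCII character is escaped.
escaped = json.dumps(record)

# ensure_ascii=False: characters are written out unescaped.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode back to the same data; the difference is only in how the text is written out.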

The Django source code includes the simplejson module. However, if you're using Python 2.6 or later (which includes a built-in version of the module), Django will use the built-in json module automatically. If you have a system-installed version that includes the C-based speedup extension, or your system version is more recent than the version shipped with Django (currently, 2.0.7), the system version will be used instead of the version included with Django.

Be aware that if you're serializing using that module directly, not all Django output can be passed unmodified to simplejson. In particular, lazy translation objects need a special encoder written for them. Something like this will work:

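from django.utils import simplejson
from django.utils.functional import Promise
from django.utils.encoding import force_unicode

class LazyEncoder(simplejson.JSONEncoder):
    def default(self, obj):
        # Lazy translation objects are Promise instances; force them to text.
        if isinstance(obj, Promise):
            return force_unicode(obj)
        # Anything else is left to the base encoder.
        return super(LazyEncoder, self).default(obj)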
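
The same pattern can be tried outside Django. The following is a minimal sketch that assumes the standard library json module in place of simplejson; LazyText and LazyTextEncoder are hypothetical names, with LazyText standing in for a lazy translation object. It also shows how such an encoder is passed to dumps() through the cls argument.

```python
import json


class LazyText:
    """Hypothetical stand-in for a lazy translation object."""

    def __init__(self, func):
        self._func = func

    def __str__(self):
        # The text is only computed when it is actually needed.
        return self._func()


class LazyTextEncoder(json.JSONEncoder):
    def default(self, obj):
        # default() is called only for objects json cannot encode itself.
        if isinstance(obj, LazyText):
            return str(obj)
        return super().default(obj)


payload = json.dumps({"label": LazyText(lambda: "Hello")}, cls=LazyTextEncoder)
print(payload)
```

Without cls=LazyTextEncoder, the same call raises TypeError because the encoder does not know how to handle the lazy object.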

Natural keys

New in Django 1.2: Please see the release notes.

The default serialization strategy for foreign keys and many-to-many relations is to serialize the value of the primary key(s) of the objects in the relation. This strategy works well for most types of object, but it can cause difficulty in some circumstances.

Consider the case of a list of objects that have a foreign key referring to ContentType. If you're going to serialize an object that refers to a content type, you need to have a way to refer to that content type. Content types are automatically created by Django as part of the database synchronization process, so you don't need to include content types in a fixture or other serialized data. As a result, the primary key of any given content type isn't easy to predict: it will depend on how and when syncdb was executed to create the content types.

There is also the matter of convenience. An integer id isn't always the most convenient way to refer to an object; sometimes, a more natural reference would be helpful.

It is for these reasons that Django provides natural keys. A natural key is a tuple of values that can be used to uniquely identify an object instance without using the primary key value.

Deserialization of natural keys

Consider the following two models:

from django.db import models

class Person(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)

    birthdate = models.DateField()

    class Meta:
        unique_together = (('first_name', 'last_name'),)

class Book(models.Model):
    name = models.CharField(max_length=100)
    author = models.ForeignKey(Person)

Ordinarily, serialized data for Book would use an integer to refer to the author. For example, in JSON, a Book might be serialized as:

...
{
    "pk": 1,
    "model": "store.book",
    "fields": {
        "name": "Mostly Harmless",
        "author": 42
    }
}
...

This isn't a particularly natural way to refer to an author. It requires that you know the primary key value for the author; it also requires that this primary key value is stable and predictable.

However, if we add natural key handling to Person, the fixture becomes much more humane. To add natural key handling, you define a default Manager for Person with a get_by_natural_key() method. In the case of a Person, a good natural key might be the pair of first and last name:

from django.db import models

class PersonManager(models.Manager):
    def get_by_natural_key(self, first_name, last_name):
        return self.get(first_name=first_name, last_name=last_name)

class Person(models.Model):
    objects = PersonManager()

    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)

    birthdate = models.DateField()

    class Meta:
        unique_together = (('first_name', 'last_name'),)

Now books can use that natural key to refer to Person objects:

...
{
    "pk": 1,
    "model": "store.book",
    "fields": {
        "name": "Mostly Harmless",
        "author": ["Douglas", "Adams"]
    }
}
...

When you try to load this serialized data, Django will use the get_by_natural_key() method to resolve ["Douglas", "Adams"] into the primary key of an actual Person object.

Note

Whatever fields you use for a natural key must be able to uniquely identify an object. This will usually mean that your model will have a uniqueness clause (either unique=True on a single field, or unique_together over multiple fields) for the field or fields in your natural key. However, uniqueness doesn't need to be enforced at the database level. If you are certain that a set of fields will be effectively unique, you can still use those fields as a natural key.
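
To make the lookup step concrete, here is a plain-Python sketch of the idea, not Django's implementation. PEOPLE is a made-up in-memory stand-in for the Person table, and resolve_author is a hypothetical helper that mimics what happens to the "author" value when a fixture entry is loaded.

```python
# Hypothetical stand-in for the Person table: primary key -> (first, last).
PEOPLE = {
    42: ("Douglas", "Adams"),
    43: ("Terry", "Pratchett"),
}


def get_by_natural_key(first_name, last_name):
    """Return the primary key of the one person matching the natural key."""
    matches = [pk for pk, name in PEOPLE.items()
               if name == (first_name, last_name)]
    if len(matches) != 1:
        # Zero or several matches: the key does not identify one object.
        raise LookupError("natural key must match exactly one person")
    return matches[0]


def resolve_author(fields):
    """Replace a natural-key reference in 'author' with a primary key."""
    author = fields["author"]
    if isinstance(author, (list, tuple)):
        author = get_by_natural_key(*author)
    return dict(fields, author=author)


book = resolve_author({"name": "Mostly Harmless",
                       "author": ["Douglas", "Adams"]})
print(book)
```

The LookupError branch is the reason the fields must be unique: a key that matches no row, or more than one, cannot be resolved to a single object.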

Serialization of natural keys

So how do you get Django to emit a natural key when serializing an object? Firstly, you need to add another method -- this time to the model itself:

class Person(models.Model):
    objects = PersonManager()

    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)

    birthdate = models.DateField()

    def natural_key(self):
        return (self.first_name, self.last_name)

    class Meta:
        unique_together = (('first_name', 'last_name'),)

That method should always return a natural key tuple -- in this example, (first name, last name). Then, when you call serializers.serialize(), you provide a use_natural_keys=True argument:

>>> serializers.serialize('json', [book1, book2], indent=2, use_natural_keys=True)

When use_natural_keys=True is specified, Django will use the natural_key() method to serialize any reference to objects of the type that defines the method.

If you are using dumpdata to generate serialized data, you use the --natural command line flag to generate natural keys.

Note

You don't need to define both natural_key() and get_by_natural_key(). If you don't want Django to output natural keys during serialization, but you want to retain the ability to load natural keys, then you can opt to not implement the natural_key() method.

Conversely, if (for some strange reason) you want Django to output natural keys during serialization, but not be able to load those key values, just don't define the get_by_natural_key() method.

Dependencies during serialization

Since natural keys rely on database lookups to resolve references, it is important that data exists before it is referenced. You can't make a forward reference with natural keys - the data you are referencing must exist before you include a natural key reference to that data.

To accommodate this limitation, calls to dumpdata that use the --natural option will serialize any model with a natural_key() method before it serializes normal key objects.

However, this may not always be enough. If a natural key itself refers to another object (because a foreign key, or that object's natural key, forms part of the key), then the objects on which the natural key depends must occur in the serialized data before the natural key requires them.

To control this ordering, you can define dependencies on your natural_key() methods. You do this by setting a dependencies attribute on the natural_key() method itself.

For example, consider the Permission model in contrib.auth. The following is a simplified version of the Permission model:

class Permission(models.Model):
    name = models.CharField(max_length=50)
    content_type = models.ForeignKey(ContentType)
    codename = models.CharField(max_length=100)
    # ...
    def natural_key(self):
        return (self.codename,) + self.content_type.natural_key()

The natural key for a Permission is a combination of the codename for the Permission, and the ContentType to which the Permission applies. This means that ContentType must be serialized before Permission. To define this dependency, we add one extra line:

class Permission(models.Model):
    # ...
    def natural_key(self):
        return (self.codename,) + self.content_type.natural_key()
    natural_key.dependencies = ['contenttypes.contenttype']

This definition ensures that ContentType models are serialized before Permission models. In turn, any object referencing Permission will be serialized after both ContentType and Permission.
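
The ordering described above amounts to a dependency sort. The sketch below is an illustration of that idea only and should not be read as Django's actual algorithm; serialization_order is a hypothetical function, and "store.grant" is a made-up model that refers to Permission.

```python
def serialization_order(dependencies):
    """Order model labels so each one comes after the labels it depends on.

    dependencies maps a model label to the labels that must come first.
    """
    ordered = []
    in_progress = set()

    def visit(label):
        if label in ordered:
            return
        if label in in_progress:
            raise ValueError("circular dependency involving %s" % label)
        in_progress.add(label)
        for required in dependencies.get(label, []):
            visit(required)
        in_progress.discard(label)
        ordered.append(label)

    for label in dependencies:
        visit(label)
    return ordered


dependencies = {
    "store.grant": ["auth.permission"],  # hypothetical model
    "auth.permission": ["contenttypes.contenttype"],
    "contenttypes.contenttype": [],
}
order = serialization_order(dependencies)
print(order)
```

Here ContentType comes first, then Permission, then the hypothetical model that refers to Permission. A cycle between natural keys cannot be ordered at all, which the sketch reports as an error.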