Yoong Kang Lim

Magic in Django's ORM

Object relational mappers (ORMs) let you map tables in relational databases to classes. You can then treat database rows as though they were objects, and write helper methods for domain-specific logic.

Django has an ORM that allows developers to be productive. Part of what makes it so useful is that a lot of complexity is hidden away from you, freeing you to mostly think about your domain. However, this comes with a lot of “magic”.

What do I mean by “magic”? Wikipedia has the following definition (emphasis mine):

In the context of computer programming, magic is an informal term for abstraction; it is used to describe code that handles complex tasks while hiding that complexity to present a simple interface. The term is somewhat tongue-in-cheek and carries bad connotations, implying that the true behavior of the code is not immediately apparent.

Sometimes the magic in Django’s ORM can surprise you.

This is by no means a dig at Django. ORMs in general are hard – there was even a well-known blog post calling object relational mapping the “Vietnam of computer science”.

So all things considered, I think the Django ORM is quite good.

But some things about the ORM surprised me at first, and I hope that writing them down helps someone.

Rows in a table are not objects

While it’s really convenient to treat database rows as objects – they’re not the same thing. This has some confusing consequences.

Let’s look at this piece of code:

from blog.models import Post

post = Post.objects.get(id=1)
print(post.title)  # prints "Hello world!"
post.title = "Goodbye world!"
print(post.title)  # now prints "Goodbye world!"

So now the post object has a title attribute with the value “Goodbye world!”. But here’s the kicker: the database row is unchanged. If you look at the database, the row will still have the old value.

The database will only be updated when you call post.save().

Now, your mental model is no longer “this is a plain old Python object where I can set instance variables”, but instead you now think of models as “this strange object that somehow only talks to the database in very specific ways, with some non-obvious side effects”.

If you’re new to the framework, it’s actually not obvious at all that the save() method would take all your instance variables and update the database row with them. This happens magically, buried under the framework code.
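To make the gap between object and row concrete, here is a minimal sketch that uses sqlite3 from the standard library instead of Django. ToyPost, its get() and save() methods, and title_in_db() are hypothetical names I made up for illustration; this is not how Django implements models, just the same idea in miniature.

```python
import sqlite3

# In-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO post (id, title) VALUES (1, 'Hello world!')")


class ToyPost:
    """Hypothetical stand-in for a model instance; not Django's Model class."""

    def __init__(self, post_id, title):
        self.id = post_id
        self.title = title

    @classmethod
    def get(cls, post_id):
        # Copy the row's values into a fresh Python object.
        row = conn.execute(
            "SELECT id, title FROM post WHERE id = ?", (post_id,)
        ).fetchone()
        return cls(*row)

    def save(self):
        # Only here do the instance's attributes reach the database.
        conn.execute(
            "UPDATE post SET title = ? WHERE id = ?", (self.title, self.id)
        )


def title_in_db(post_id):
    """Read the title straight from the table, bypassing any Python object."""
    return conn.execute(
        "SELECT title FROM post WHERE id = ?", (post_id,)
    ).fetchone()[0]


post = ToyPost.get(1)
post.title = "Goodbye world!"
before_save = title_in_db(1)  # the row still holds the old title
post.save()
after_save = title_in_db(1)   # now the row matches the object
```

The object is just a copy of the row taken at query time. Nothing links the two until save() runs.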

In fact, that’s not all save() does. If you look at the Django documentation, save() does some other things too:

  • Emit a pre_save signal
  • Preprocess the data
  • Prepare the data for the database
  • Insert data into the database
  • Emit a post_save signal

Did you write a post_save signal handler elsewhere but forget about it? That’s going to bite you when you create or update data in your shell, and then you accidentally charge your customers.
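Here is a rough sketch of why a forgotten handler bites. ToySignal, toy_pre_save, toy_post_save, ToyPurchase and charge_customer are all hypothetical names, and the dispatcher is a toy, not Django's signal implementation. The point is only that save() calls whatever happens to be connected.

```python
class ToySignal:
    """Hypothetical, minimal signal: a list of receivers called in order."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send(self, sender, **kwargs):
        for receiver in self._receivers:
            receiver(sender=sender, **kwargs)


toy_pre_save = ToySignal()
toy_post_save = ToySignal()
events = []  # records what happened, in order


class ToyPurchase:
    def __init__(self, amount):
        self.amount = amount

    def save(self):
        toy_pre_save.send(sender=type(self), instance=self)
        events.append("row written")  # stands in for the INSERT/UPDATE
        toy_post_save.send(sender=type(self), instance=self)


# Imagine this receiver lives in some other module you forgot about.
def charge_customer(sender, instance, **kwargs):
    events.append(f"charged {instance.amount}")


toy_post_save.connect(charge_customer)

# A "harmless" save from the shell still triggers the receiver.
ToyPurchase(amount=100).save()
```

Nothing at the call site of save() hints that charge_customer will run, which is exactly the surprise.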

You can’t tell if changes have been saved

Let’s say you do this:

post = Post.objects.get(id=1)
post.title = "goodbye"

Given model instance post, how would you check, without an additional query, if the instance is consistent with the database?

That’s right, you can’t. Not with Django’s defaults.

There is no way to check that a model object is “dirty” by only looking at the instance itself. You need to perform another query, e.g. fetching the row again and comparing, or calling refresh_from_db() (which also overwrites the in-memory values). The alternative is to implement this yourself by hacking __setattr__ or from_db(), or to use a library that does this (like django-dirtyfields or django-model-utils).
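If you did want to roll your own, one possible shape is to keep a snapshot of the values as they were loaded and compare against it later. The sketch below is plain Python. TrackedRecord, dirty_fields(), is_dirty and mark_clean() are hypothetical names, and I am not claiming this is how any of the libraries above work.

```python
class TrackedRecord:
    """Hypothetical record that remembers the values it was loaded with."""

    def __init__(self, **fields):
        for name, value in fields.items():
            setattr(self, name, value)
        # Snapshot of the field values at load time.
        self._loaded = dict(fields)

    def dirty_fields(self):
        # Fields whose current value differs from the loaded snapshot.
        return {
            name: getattr(self, name)
            for name, loaded_value in self._loaded.items()
            if getattr(self, name) != loaded_value
        }

    @property
    def is_dirty(self):
        return bool(self.dirty_fields())

    def mark_clean(self):
        # Call this after a successful write to reset the snapshot.
        self._loaded = {name: getattr(self, name) for name in self._loaded}


post = TrackedRecord(id=1, title="hello")
dirty_at_load = post.is_dirty          # nothing changed yet
post.title = "goodbye"
dirty_after_edit = post.is_dirty       # the title now differs from the snapshot
changed = post.dirty_fields()
post.mark_clean()                      # pretend we just saved
dirty_after_save = post.is_dirty
```

Note that this only tells you whether the instance changed since it was loaded. It still can't tell you whether somebody else changed the row in the meantime.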

Often, in attempts to adhere to the “skinny controller” principle, we write helper functions or classes that expect to be passed model instances. But I can’t write such a helper and be sure that the model instance matches the data in the database.

That’s problematic when your code has to make a decision based on the data that is actually in the database.

Some strange things require migrations

Django has this concept of an inner Meta class, which contains “options” for the model. For many of these options (such as permissions or verbose_name), changing them makes Django generate a migration using the AlterModelOptions operation.

The operation is meant for options that don’t change the database schema but that migrations still need to know about (for example, so that RunPython migrations see the right model state).

At first glance, that suggests AlterModelOptions leaves the database untouched, but that’s not quite true.

Adding permissions to the Meta class will create a migration with the AlterModelOptions operation. Applying that migration ends up inserting rows into the permissions table (django.contrib.auth creates them in a post_migrate handler). That’s a side effect that hits the database.

When you do a DeleteModel to remove a model completely, those permissions remain in the database.

Reverse relations

Django makes you write down the fields that map to database columns in your models, and they’ll be available as instance variables at runtime. But when you have a model relationship, via a ForeignKey, ManyToManyField, or a OneToOneField, Django magically adds an attribute in the “reverse” direction.

For example, if you have a ForeignKey in Post which points to an Author, Django will magically add an attribute to Author. If you don’t specify a related_name in the ForeignKey, the attribute name will be post_set.

That’s actually pretty cool, since you can do queries as though it were any other field:

Author.objects.filter(post__in=[1, 2, 3])

However, now there’s the problem of how you modify this relation. Do you treat it like a ForeignKey field, and allow direct assignment on an instance like this:

author.post_set = [1, 2]

Or do you disallow that, creating an inconsistency between this pseudo-field and other fields?

Django actually started out allowing direct assignment. Doing author.post_set = [1] would have persisted the data immediately in earlier versions, without save().

Direct assignment was later deprecated and then removed (in Django 2.0) in favour of the explicit methods add(), remove(), set(), and clear(). But that leaves the reverse relation behaving like a field such as ForeignKey while querying, and like something else entirely when you modify it.

Another thing is that these reverse relation attributes, like post_set on an instance, are model managers. This allows you to lazily evaluate the relation, as well as perform additional filters, just like in any manager. Be careful not to mix up delete() and remove()! remove() only detaches objects from the relation, while delete() deletes the rows themselves.

There’s also a bit more about filtering. You may have been conditioned to think of the following two lines of code as equivalent:

Post.objects.filter(title__icontains='hello').filter(author__name='Stanley')
Post.objects.filter(title__icontains='hello', author__name='Stanley')

Indeed, if you print the query for both of these querysets, you will see they are identical.

This is not true when your filters span multi-valued relationships (taken from Django’s own example):

Blog.objects.filter(
    entry__headline__contains='Lennon',
    entry__pub_date__year=2008,
)

Blog.objects.filter(
    entry__headline__contains='Lennon',
).filter(entry__pub_date__year=2008)

The first one gives you “all blogs with at least one entry that both has a headline containing ‘Lennon’ and was published in 2008”. The second one gives you “all blogs with at least one entry whose headline contains ‘Lennon’ and at least one entry, not necessarily the same one, published in 2008”. We’re filtering Blog, not Entry!
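If the wording is hard to keep straight, it may help to model the two querysets in plain Python. ToyBlog, ToyEntry and the sample data below are made up for illustration, and the list comprehensions only imitate the semantics of the two queries rather than the SQL Django generates.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEntry:
    headline: str
    pub_year: int


@dataclass
class ToyBlog:
    name: str
    entries: list = field(default_factory=list)


blogs = [
    # One entry satisfies both conditions.
    ToyBlog("A", [ToyEntry("Lennon remembered", 2008)]),
    # Two different entries each satisfy one condition.
    ToyBlog("B", [ToyEntry("Lennon remembered", 1999), ToyEntry("Other news", 2008)]),
    # No entry mentions Lennon at all.
    ToyBlog("C", [ToyEntry("Other news", 2008)]),
]

# Like the single filter(...) call: the same entry must match both conditions.
same_entry = [
    b.name
    for b in blogs
    if any("Lennon" in e.headline and e.pub_year == 2008 for e in b.entries)
]

# Like the chained .filter(...).filter(...): each condition may be satisfied
# by a different entry of the same blog.
any_entries = [
    b.name
    for b in blogs
    if any("Lennon" in e.headline for e in b.entries)
    and any(e.pub_year == 2008 for e in b.entries)
]
```

Blog B is the interesting one: it only shows up in the chained version, because its Lennon entry and its 2008 entry are different entries.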

Oh, and exclude() behaves differently to the above: the conditions in a single exclude() call do not necessarily refer to the same related object.

What’s my point in all this? There just seems to be a lot of information I need to store in my head to use reverse relations without burning myself.

Model inheritance

There are three types of model inheritance: abstract base classes, multi-table inheritance, and proxy models. Multi-table inheritance gives each model in the hierarchy its own table, with parent and child related via a OneToOneField. This can be confusing, but also cool because now you can represent hierarchical relationships.

Multi-table inheritance in Django models isn’t what it’s like in plain Python objects because it has ramifications for the database schema. Unlike plain Python, you can’t override the base model to hide fields. You can, however, do this when the base model is abstract.

Abstract base classes also need a special syntax for related_name, since a fixed name would be inherited by every child class and clash.
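The documented way around this is to put the placeholders '%(app_label)s' and '%(class)s' in the related_name, which Django fills in with the lowercased app label and child class name. The snippet below only imitates that substitution with ordinary string formatting. resolve_related_name(), the "common" app label, and the ChildA/ChildB names are hypothetical.

```python
# Pattern as it might appear on a ForeignKey in an abstract base class.
RELATED_NAME_PATTERN = "%(app_label)s_%(class)s_related"


def resolve_related_name(pattern, app_label, class_name):
    """Hypothetical helper imitating the placeholder substitution."""
    return pattern % {
        "app_label": app_label.lower(),
        "class": class_name.lower(),
    }


# Two children of the same abstract base end up with distinct reverse names.
child_a = resolve_related_name(RELATED_NAME_PATTERN, "common", "ChildA")
child_b = resolve_related_name(RELATED_NAME_PATTERN, "common", "ChildB")
```

It works, but it is one more piece of string-based magic to remember.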

Honorable mention

Generic foreign keys.

Recommendations

There are two parts to this. The first is recommendations for people using the Django ORM. The second is what I would like the Django ORM to become, which is probably impossible.

Suggestions for users

Keep database logic in the model

The best way, in my opinion, to manage mutation is to write code such that the responsibility of mutating the data resides in the model class.

That means, please don’t do this in the view, especially a long view:

def some_view(request):
    ...
    obj.attr1 = 'some_value'
    obj.attr2 = 'another_value'
    ...

Or worse, this:

def some_view(request):
    ...
    obj.attr1 = 'some_value'
    mutate_even_more(obj)
    ...

If you can, write a method in the model that does this. Ideally, the only code that calls save() should be in the model class. That gives you fewer surprises (like getting passed a dirty object) and better encapsulation.

If really needed, use a third party library that tracks model field changes.

Avoid using signals

There are a few legitimate uses for signals. For example, post_commit hooks that trigger Celery tasks are often necessary.

Outside of these, the work you might put in a post_save or pre_save handler can be done in helper functions, or domain objects (that are not database-backed). Some people tell you to put it in save() but that also has weird side effects you might not be paying attention to.

Don’t be afraid to write something like this:


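from django.http import HttpResponse, HttpResponseBadRequest
from django.views.decorators.http import require_http_methods

# Purchase, PurchaseForm and PDFGenerator are app-specific and assumed to be
# defined elsewhere in the project.


class PurchaseProcessor:
    def __init__(self, user, product):
        self.user = user
        self.product = product

    def generate_invoice(self):
        self.process()
        return PDFGenerator(self.user, self.product).get_content()

    def process(self):
        self._prepare()  # this replaces pre_save
        # create() takes keyword arguments; the field names here are assumed
        Purchase.objects.create(user=self.user, product=self.product)
        self._email_user()  # this replaces post_save

    # _prepare() and _email_user() are omitted here


@require_http_methods(['POST'])
def purchase(request):
    form = PurchaseForm(request.POST)
    if form.is_valid():
        obj = PurchaseProcessor(request.user, form.cleaned_data['product'])
        return HttpResponse(obj.generate_invoice(), content_type='application/pdf')
    return HttpResponseBadRequest()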

The PurchaseProcessor could also just be a function. There are fewer surprises here, and you avoid the weird side effects of post_save (for example, when you’re hacking around in the Django shell).

Use model inheritance sparingly

Actually, use any inheritance sparingly. Ever hear people tell you “favour composition over inheritance”? It might apply here.

Suggestions for the ORM

Include an is_dirty attribute

It might be helpful to have an indicator that a model instance has been mutated since it was loaded but not yet written to the database. There’s already heaps of metaprogramming in the models, so I’m sure it won’t hurt to add this to __setattr__.

Unfortunately, this has been proposed before, and was rejected. There are third party libraries to do this now, such as django-dirtyfields.

Immutable model objects

I think it would be a nice idea to have model objects be “frozen” after a query. That way, we don’t have to guess if the model instance matches the database – instead we know it’s a “snapshot” of a row at the time of the query.
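As a rough idea of what “frozen” could feel like, here is a sketch using a frozen dataclass from the standard library. PostSnapshot is a hypothetical class, not something Django provides, and a real implementation would have to deal with relations, lazy loading and much more.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class PostSnapshot:
    """Hypothetical read-only view of a row at query time."""

    id: int
    title: str


snapshot = PostSnapshot(id=1, title="Hello world!")

try:
    snapshot.title = "Goodbye world!"  # mutation is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False

# A change is expressed as a new object; the snapshot itself is untouched.
updated = replace(snapshot, title="Goodbye world!")
```

Any function that receives a PostSnapshot then knows it is looking at what the query returned, not at something a caller has quietly edited.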

Use the data mapper pattern

Django follows the Active Record pattern as described by Martin Fowler. Basically all that means is that the model object contains the data, as well as the mechanism to access the data.

That might seem obvious if the only ORMs you’ve ever used are Active Record ORMs, but there is an alternative called the Data Mapper pattern, where the database access logic resides in a different class.

Neither one is necessarily better than the other, and each has its pros and cons. The nice thing about the Data Mapper pattern is that your domain models are just plain Python objects. You can do inheritance like any other Python class, without side effects in the database. On the other hand, you’re now writing a lot more boilerplate.

Actually, I think SQLAlchemy already does this, which might be a place to get inspiration.
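To show the split between domain object and database access, here is a minimal Data Mapper sketch using sqlite3 from the standard library. PlainPost and PostMapper are hypothetical names, and this is a toy under those assumptions, not SQLAlchemy's API or a proposal for Django's.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlainPost:
    """Plain domain object: no database knowledge at all."""

    id: Optional[int]
    title: str


class PostMapper:
    """Hypothetical mapper that owns all SQL for PlainPost objects."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS post (id INTEGER PRIMARY KEY, title TEXT)"
        )

    def insert(self, post):
        cur = self.conn.execute(
            "INSERT INTO post (title) VALUES (?)", (post.title,)
        )
        post.id = cur.lastrowid
        return post

    def update(self, post):
        self.conn.execute(
            "UPDATE post SET title = ? WHERE id = ?", (post.title, post.id)
        )

    def get(self, post_id):
        row = self.conn.execute(
            "SELECT id, title FROM post WHERE id = ?", (post_id,)
        ).fetchone()
        return PlainPost(*row) if row else None


mapper = PostMapper(sqlite3.connect(":memory:"))
post = mapper.insert(PlainPost(id=None, title="Hello world!"))
post.title = "Goodbye world!"            # plain attribute assignment
stored_before = mapper.get(post.id).title
mapper.update(post)                      # persistence is an explicit mapper call
stored_after = mapper.get(post.id).title
```

The boilerplate cost is visible even at this size: every operation that Active Record gives you for free has to be written out on the mapper.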

Conclusion

There is a lot of information to store in your head when it comes to dealing with the Django ORM. Sometimes this can surprise you, so make sure you’re aware of some of these gotchas!
