Event sourcing in Django

Posted on Thu 05 January 2017 in Django

Django comes with "batteries included" to make CRUD (create, read, update, delete) operations easy. It's nice that the CR part (create and read) of CRUD is so easy, but have you ever paused to think about the UD part (update and delete)?

Let's look at delete. All you need to do is this:

ReallyImportantModel.objects.get(id=32).delete()  # gone from the database forever

Just one line, and your data is gone forever. It can be done accidentally. Or you can do it deliberately, only to realise later that your old data was valuable after all.

Now what about updating?

Updating is deleting in disguise.

When you update, you're deleting the old data and replacing it with something new. It's still deletion.

important = ReallyImportantModel.objects.get(id=32)
important.data = {'new_data': 'This is new data'}
important.save()  # OLD DATA GONE FOREVER

Okay, but why do we care?

Let's say we want to know the state of ReallyImportantModel 6 months ago. Oh that's right, you've deleted it, so you can't get it back.

Well, that's not exactly true -- you can recreate your data from backups (if you don't back up your database, stop reading right now and fix that immediately). But that's clumsy.

So by only storing the current state of the object, you lose all the contextual information on how the object arrived at this current state. Not only that, you make it difficult to make projections about the future.

Event sourcing [1] can help with that.

Event sourcing

The basic concept of event sourcing is this:

  • Instead of just storing the current state, we also store the events that led up to the current state
  • Events are replayable. We can travel back in time to any point by replaying every event up to that point in time
  • That also means we can recover the current state just by replaying every event, even if the current state was accidentally deleted
  • Events are append-only.

To gain an intuition, let's look at an event sourcing system you're familiar with: your bank account.

Your "state" is your account balance, while your "events" are your transactions (deposit, withdrawal, etc.).

Can you imagine a bank account that only shows you the current balance?

That is clearly unacceptable ("Why do I only have $50? Where did my money go? If only I could see the history."). So we always store the history of transfers as the source of truth.
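To make the bank-account intuition concrete, here is a minimal, framework-free sketch. The `Transaction` class and the `balance_at` helper are hypothetical names invented for this illustration; they are not part of Django. The history is a list that only ever gets appended to, and any past balance is obtained by replaying it up to a chosen moment.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass(frozen=True)  # frozen: an event is never edited once recorded
class Transaction:
    """Hypothetical event type; positive = deposit, negative = withdrawal."""
    at: datetime
    amount: Decimal


def balance_at(history, when):
    """Replay every transaction up to and including `when`."""
    return sum((t.amount for t in history if t.at <= when), Decimal("0"))


# The history only ever grows; nothing is updated or deleted.
history = []
history.append(Transaction(datetime(2017, 1, 1), Decimal("200")))
history.append(Transaction(datetime(2017, 1, 3), Decimal("-120")))
history.append(Transaction(datetime(2017, 1, 5), Decimal("-30")))

current_balance = balance_at(history, datetime.max)
balance_on_jan_2 = balance_at(history, datetime(2017, 1, 2))
```

Here the current balance works out to 50, and the history explains why: a deposit of 200 followed by withdrawals of 120 and 30.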

Implementing event sourcing in Django

Let's look at a few ways to do this in Django.

Ad-hoc models

If you have only one or two important models, you probably don't need a generalizable event sourcing solution that applies to all models.

You could do it on an ad-hoc basis like this, provided the event model has a natural relationship to the model it describes:

# in an app called 'account'
from django.db import models
from django.conf import settings


class Account(models.Model):
    """Bank account"""
    balance = models.DecimalField(max_digits=19, decimal_places=6)
    # on_delete is a required argument from Django 2.0 onwards
    owner = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.PROTECT,
                              related_name='account')


class Transfer(models.Model):
    """
    Represents a transfer in or out of an account. A positive amount indicates
    that it is a transfer into the account, whereas a negative amount indicates
    that it is a transfer out of the account.
    """
    account = models.ForeignKey('account.Account', on_delete=models.PROTECT, 
                                related_name='transfers')
    amount = models.DecimalField(max_digits=19, decimal_places=6)
    date = models.DateTimeField()

In this case your "state" is in your Account model, whereas your Transfer model contains the "events".

Because every Transfer is kept, an account's balance can be recomputed at any time by summing the amounts of its transfers.
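One practical use of this is checking that the stored state still agrees with the events. The sketch below uses plain dataclasses (`AccountRecord` and `TransferRecord`, both hypothetical stand-ins for the models above) so that it runs without a configured Django project. It recomputes the balance from the transfers and repairs a corrupted stored balance.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class TransferRecord:
    """Stand-in for a Transfer row; only the amount matters here."""
    amount: Decimal


@dataclass
class AccountRecord:
    """Stand-in for an Account row together with its related transfers."""
    balance: Decimal
    transfers: list = field(default_factory=list)


def balance_from_transfers(account):
    """Recompute the balance purely from the recorded transfers."""
    return sum((t.amount for t in account.transfers), Decimal("0"))


def is_consistent(account):
    """True if the stored state matches what the events say it should be."""
    return account.balance == balance_from_transfers(account)


account = AccountRecord(
    balance=Decimal("999"),  # stored state that has been corrupted
    transfers=[TransferRecord(Decimal("100")), TransferRecord(Decimal("-40"))],
)
was_consistent = is_consistent(account)
account.balance = balance_from_transfers(account)  # repair state from events
```

With the real models, the same total could presumably be computed in the database with Django's `Sum` aggregate over `account.transfers`; the plain-Python version is used here only to keep the sketch self-contained.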

Using an Event Store

You could also use a single Event model to store every possible event in any model. A nice way to do this is to encode the changes in a JSON field.

This example uses Postgres:

from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
# On Django 3.1+, use models.JSONField instead; this import was removed in Django 4.0
from django.contrib.postgres.fields import JSONField
from django.db import models


class Event(models.Model):
    """Event table that stores all model changes"""
    content_type = models.ForeignKey(ContentType, on_delete=models.PROTECT)
    object_id = models.PositiveIntegerField()
    time_created = models.DateTimeField()
    content_object = GenericForeignKey('content_type', 'object_id')
    body = JSONField()

You can then add methods that mutate the state to any model:

import json

from django.conf import settings
from django.db import models
from django.utils import timezone

# Event is the model defined above.


class Account(models.Model):
    balance = models.DecimalField(max_digits=19, decimal_places=6, default=0)
    owner = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.PROTECT,
                              related_name='account')

    def make_deposit(self, amount):
        """Deposit money into account"""
        Event.objects.create(
            content_object=self,
            time_created=timezone.now(),
            body=json.dumps({
                'type': 'made_deposit',
                # Decimal is not JSON serializable; float keeps the example
                # simple, but str(amount) would preserve exact precision
                'amount': float(amount),
            })
        )
        self.balance += amount
        self.save()

    def make_withdrawal(self, amount):
        """Withdraw money from account"""
        Event.objects.create(
            content_object=self,
            time_created=timezone.now(),
            body=json.dumps({
                'type': 'made_withdrawal',
                'amount': -float(amount),  # withdraw = negative amount
            })
        )
        self.balance -= amount
        self.save()

    @classmethod
    def create_account(cls, owner):
        """Create an account"""
        account = cls.objects.create(owner=owner, balance=0)
        Event.objects.create(
            content_object=account,
            time_created=timezone.now(),
            body=json.dumps({
                'type': 'created_account',
                'id': account.id,
                'owner_id': owner.id
            })
        )
        return account

So now you can do this:

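from decimal import Decimal

from django.contrib.auth import get_user_model
from django.contrib.contenttypes.models import ContentType

User = get_user_model()

account = Account.create_account(owner=User.objects.first())
account.make_deposit(Decimal('50.0'))
account.make_deposit(Decimal('125.0'))
account.make_withdrawal(Decimal('75.0'))

events = Event.objects.filter(
    content_type=ContentType.objects.get_for_model(account),
    object_id=account.id
).order_by('time_created')  # oldest first

for event in events:
    print(event.body)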

Which should give you this:

{"type": "created_account", "id": 2, "owner_id": 1}
{"type": "made_deposit", "amount": 50.0}
{"type": "made_deposit", "amount": 125.0}
{"type": "made_withdrawal", "amount": -75.0}

Again, this makes it trivial to write any utility methods to recreate any instance of Account, even if you accidentally dropped the whole accounts table.
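As an illustration of such a utility, here is a minimal sketch that folds event bodies like the ones printed above back into account state. The function name `rebuild_account` and the plain dict it returns are assumptions made for this example; in a real project you would read the bodies from the Event table, oldest first, and write the result back to an Account row.

```python
import json
from decimal import Decimal


def rebuild_account(event_bodies):
    """Fold JSON event bodies, oldest first, into a dict of account state."""
    state = None
    for raw in event_bodies:
        event = json.loads(raw)
        if event["type"] == "created_account":
            state = {
                "id": event["id"],
                "owner_id": event["owner_id"],
                "balance": Decimal("0"),
            }
        elif event["type"] in ("made_deposit", "made_withdrawal"):
            # Withdrawals are stored with a negative amount, so both
            # event types are applied by simple addition.
            state["balance"] += Decimal(str(event["amount"]))
        else:
            raise ValueError(f"unknown event type: {event['type']}")
    return state


bodies = [
    '{"type": "created_account", "id": 2, "owner_id": 1}',
    '{"type": "made_deposit", "amount": 50.0}',
    '{"type": "made_deposit", "amount": 125.0}',
    '{"type": "made_withdrawal", "amount": -75.0}',
]
account_state = rebuild_account(bodies)
```

Replaying these four events yields a balance of 100 for account 2, which is what the stored balance should be after the calls above.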

Snapshotting

There will come a time when you have too many events to replay the entire history efficiently. A good optimisation is then to take snapshots at various points in history. In our accounting example, you could save each snapshot in an AccountBalance model that records the account's balance at a point in time, and replay only the events that came after the most recent snapshot.

You could do this via a scheduled task. Celery [2] is a good option.
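The idea can be sketched without Django or Celery. In the example below, `BalanceSnapshot` is a hypothetical stand-in for an AccountBalance row, and events are simplified to (time, amount) pairs; both are assumptions made for illustration. `take_snapshot` is the expensive full replay that a scheduled task would run occasionally, and `balance_at` starts from the snapshot and replays only the newer events.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass(frozen=True)
class BalanceSnapshot:
    """Hypothetical stand-in for an AccountBalance row."""
    taken_at: datetime
    balance: Decimal


def take_snapshot(events, now):
    """Full replay, done occasionally (for example by a scheduled task)."""
    total = sum((amount for at, amount in events if at <= now), Decimal("0"))
    return BalanceSnapshot(taken_at=now, balance=total)


def balance_at(events, snapshot, when):
    """Start from the snapshot and replay only the events recorded after it."""
    if when < snapshot.taken_at:
        raise ValueError("snapshot is newer than the requested time")
    recent = (amount for at, amount in events if snapshot.taken_at < at <= when)
    return snapshot.balance + sum(recent, Decimal("0"))


events = [
    (datetime(2017, 1, 1), Decimal("50")),
    (datetime(2017, 1, 2), Decimal("125")),
    (datetime(2017, 1, 4), Decimal("-75")),
]
snapshot = take_snapshot(events, now=datetime(2017, 1, 3))
latest = balance_at(events, snapshot, when=datetime(2017, 1, 5))
```

This in-memory version still scans the whole list. Against a database table, the time filter would be a range query, so only the rows newer than the snapshot would need to be read.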

Summary

Use event sourcing to maintain an append-only list of events for your critical data. This effectively allows you to travel in time to any point in history to see the state of your data at that time.

UPDATE: If you want to see an example repo, feel free to take a look here: https://github.com/yoongkang/event_sourcing_example


  1. Martin Fowler wrote a detailed description of event sourcing on his website: http://martinfowler.com/eaaDev/EventSourcing.html

  2. Celery project: http://www.celeryproject.org/
