# Schema Migrations

Zulip uses the [standard Django system for doing schema
migrations](https://docs.djangoproject.com/en/1.10/topics/migrations/).
There is some example usage in the [new feature
tutorial](new-feature-tutorial.html).

This page documents some important issues related to writing schema
migrations.

* **Large tables**: For large tables like `Message` and `UserMessage`, you
  want to take precautions when adding columns to the table,
  performing data backfills, or building indexes. We have a
  `zerver/lib/migrate.py` library to help with adding columns and
  backfilling data. To build indexes on these tables, use raw SQL
  with PostgreSQL's `CONCURRENTLY` keyword.

* **Numbering conflicts across branches**: If you've done your schema
  change in a branch, and meanwhile another schema change has taken
  place, Django will now have two migrations with the same number. To
  fix this, you need to renumber your migration(s), fix up
  the `dependencies` entries in your migration(s), and rewrite your
  git history as needed. The [migration renumbering
  tutorial](migration-renumbering.html) walks you through that
  process.

* **Atomicity**. By default, each Django migration is run atomically
  inside a transaction. This can be problematic if one wants to do
  something in a migration that touches a lot of data and would best
  be done in batches of e.g. 1000 objects (e.g. a `Message` or
  `UserMessage` table change). [Django 1.10][migrations-non-atomic]
  added the ability to set `atomic = False` on a `Migration` class,
  so that the migration as a whole does not run inside a single
  transaction. This should make it possible
  to use the batch update tools in `zerver/lib/migrate.py` (originally
  written to work with South) for doing larger database migrations.

* **Accessing models in RunPython migrations**. When writing a
  migration that includes custom Python code (aka `RunPython`), you
  cannot just use `from zerver.models import UserProfile` to access
  models; that would import the model as it is right now. What you
  want is a version of the model "as of just before the
  currently executing migration". You can get this inside the relevant
  migration function with `apps.get_model`. We have a linter rule to
  warn about this sort of issue, since it often manifests long after
  the actual mistake.

* **Making large migrations work**. Major migrations should have a
  few properties:

  * **Unit tests**. You'll want to carefully test these, so you might
    as well write some unit tests to verify the migration works
    correctly, rather than doing everything by hand. This often saves
    a lot of time in re-testing the migration process as we make
    adjustments to the plan.
  * **Run in batches**. Updating more than 1K-10K rows (depending on
    type) in a single transaction can lock up a database. It's best
    to do lots of small batches, potentially with a brief sleep in
    between, so that we don't block other operations from finishing.
  * **Rerunnability/idempotency**. A good migration is one where, if
    operational concerns (e.g. it taking down the Zulip server for
    users) prevent it from finishing, it's easy to restart the
    migration without a bunch of manual investigation. Ideally,
    the migration can even continue where it left off, without needing
    to redo work.
  * **Multi-step migrations**. For really big migrations, one wants
    to split the transition into several commits that are each
    individually correct, and can each be deployed independently:

    1. First, do a migration to add the new column to the `Message` table
       and start writing to that column (but don't use it for anything).
    2. Second, do a migration to copy values from the old column to
       the new column, to ensure that the two data stores agree.
    3. Third, a commit that stops writing to the old field.
    4. Any cleanup work, e.g. if the old field were a column, we'd do
       a migration to remove it entirely here.

    This multi-step process is how most migrations on large database
    tables are done in large-scale systems, since it ensures that the
    system can continue running happily during the migration.

[migrations-non-atomic]: https://docs.djangoproject.com/en/1.10/howto/writing-migrations/#non-atomic-migrations
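The batching and rerunnability properties described above can be tried
out without a Zulip development environment. The following is a
minimal sketch that uses Python's built-in `sqlite3` module as a
stand-in for PostgreSQL; the `message` table, the `old_col` and
`new_col` columns, and the `backfill_in_batches` function are all
hypothetical names chosen for the example.

```python
import sqlite3
import time


def backfill_in_batches(conn, batch_size=1000, pause_seconds=0.0):
    """Copy old_col into new_col one committed batch at a time.

    Returns the number of batches processed. Only rows whose new_col
    is still NULL are touched, so an interrupted run can be restarted.
    """
    batches = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM message WHERE new_col IS NULL ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return batches
        first_id, last_id = rows[0][0], rows[-1][0]
        conn.execute(
            "UPDATE message SET new_col = old_col "
            "WHERE id BETWEEN ? AND ? AND new_col IS NULL",
            (first_id, last_id),
        )
        conn.commit()  # each batch is its own short transaction
        batches += 1
        time.sleep(pause_seconds)  # optional pause so other queries can run


# Build a small in-memory table standing in for a large one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message (id INTEGER PRIMARY KEY, old_col TEXT NOT NULL, new_col TEXT)"
)
conn.executemany(
    "INSERT INTO message (old_col) VALUES (?)",
    [(f"v{i}",) for i in range(2500)],
)
conn.commit()

first_run = backfill_in_batches(conn)   # processes every row in batches
second_run = backfill_in_batches(conn)  # finds nothing left to do
remaining = conn.execute(
    "SELECT COUNT(*) FROM message WHERE new_col IS NULL"
).fetchone()[0]
```

Running the function a second time finds no unprocessed rows and does
nothing, which is the behavior you want when a migration is
interrupted and restarted.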