==============
Schema changes
==============

If you are making a change that requires a database schema upgrade,
there are a few extra things you need to keep in mind.

Using South for migrations
--------------------------

1. Discuss the change informally with your team.
#. Edit ``zerver/models.py`` for your particular class.

   * See the notes below about ``keep_default``.

#. Run ``./manage.py schemamigration zerver --auto``.

   * This will create the ``000#_***.py`` schema migration file in
     ``zerver/migrations``.

#. Read the `Notes and Cautions`_ section below, as you may need to edit
   the migration. A common step here is setting ``keep_default`` to True.
#. Run ``git add`` on your new migration.
#. Run ``./manage.py migrate zerver``.
#. Write supporting code or otherwise validate the DB change locally.
   TODO: Advice on testing schema changes?
#. Commit your changes:

   a. The migration must be in the same commit as the ``models.py`` changes.
   #. Include [schema] in the commit message.
   #. Include [manual] in the commit message if additional steps are
      required.

#. Before deploying your change, read the notes on `Deploying to
   staging`_.


Deploying to staging
--------------------

Always follow this process.

1. Schedule the migration for after hours.
#. For long-running migrations, double-check that you use the appropriate
   library helpers in ``migrate.py`` to ensure that changes happen in small
   batches that are committed frequently.
#. Announce to your team that you are doing the migration, to avoid
   simultaneous migrations and other outcomes of poor communication.
#. Do any administrative steps, such as increasing
   ``checkpoint_segments``.
#. Apply the migration in advance from staging, using commands similar
   to the following, where ``[your commit]`` is the commit that has your
   migration::

       cd ~/zulip
       git fetch
       cd /tmp
       git clone ~/zulip
       cd zulip
       git checkout [your commit]
       ./manage.py migrate zerver
       cd /tmp
       rm -Rf zulip

#. Undo any temporary administrative changes, such as the increase to
   ``checkpoint_segments``.


Staging and prod share a database, so for most migrations nothing
special needs to be done when deploying to prod: the shared database
schema will already have been updated. In some cases, however, you may
need to run code to properly initialize data structures.

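
The small-batch advice in the steps above can be sketched as follows.
This is only an illustration, not the actual ``migrate.py`` helpers: it
uses SQLite in place of our real database so that it runs anywhere, and
the function, table, and column names are all hypothetical.

```python
import sqlite3


def backfill_in_batches(conn, table, column, value, batch_size=1000):
    """Set `column` to `value` where it is NULL, one id range at a time.

    Returns the number of batches committed. Table and column names are
    interpolated into the SQL, so they must come from trusted code.
    """
    row = conn.execute("SELECT MIN(id), MAX(id) FROM %s" % table).fetchone()
    if row[0] is None:
        return 0  # empty table, nothing to backfill
    lower, max_id = row
    batches = 0
    while lower <= max_id:
        upper = lower + batch_size
        conn.execute(
            "UPDATE %s SET %s = ? WHERE id >= ? AND id < ? AND %s IS NULL"
            % (table, column, column),
            (value, lower, upper),
        )
        conn.commit()  # commit every batch so each transaction stays short
        batches += 1
        lower = upper
    return batches


# Demo on a throwaway in-memory table with 2500 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, is_archived INTEGER)")
conn.executemany("INSERT INTO message (id) VALUES (?)",
                 [(i,) for i in range(1, 2501)])
conn.commit()

batches_run = backfill_in_batches(conn, "message", "is_archived", 0)
remaining_nulls = conn.execute(
    "SELECT COUNT(*) FROM message WHERE is_archived IS NULL").fetchone()[0]
```

Committing after every id range keeps each transaction short, which is
the point of the batching advice: a single huge ``UPDATE`` would hold
its locks until the whole table had been rewritten.
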

Migrating to a new schema
-------------------------

If you do a ``git pull`` and notice a [schema] commit, you must manually
perform a schema upgrade: ``./manage.py migrate zerver``.

``generate-fixtures`` should automatically detect whether
the schema has changed and update things accordingly.

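
If you would rather not spot [schema] commits by eye, a check along
these lines may help. It is a sketch only: the function name and the
sample commit subjects are made up, and it assumes the [schema] and
[manual] tags appear in the subject line of the commit message.

```python
def schema_commits(subjects):
    """Return (subject, needs_manual_steps) for each commit tagged [schema]."""
    flagged = []
    for subject in subjects:
        if "[schema]" in subject:
            flagged.append((subject, "[manual]" in subject))
    return flagged


# Hypothetical commit subjects, standing in for real `git log` output.
pulled = [
    "Fix typo in README.",
    "[schema] Add a hypothetical field to UserProfile.",
    "[schema] [manual] Backfill the hypothetical field.",
]
needs_migrate = schema_commits(pulled)
```

After a pull, the subjects could come from something like
``git log --format=%s ORIG_HEAD..HEAD``, split into lines.
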

Notes and Cautions
------------------

**Large tables**
For large tables like Message and UserMessage, you
want to take precautions when adding columns to the table, performing
data backfills, or building indexes. We have a ``migrate.py`` library to
help with adding columns and backfilling data. Build indexes
outside of South, using PostgreSQL's ``CONCURRENTLY`` keyword.

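
As a rough sketch of building an index outside of South, the snippet
below assembles a ``CREATE INDEX CONCURRENTLY`` statement and shows one
way to run it. The helper names, the index name, and the column names
are hypothetical; the table name assumes Django's default
``<app>_<model>`` naming, and ``run_outside_transaction`` assumes a
psycopg2 connection.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def concurrent_index_sql(index_name, table, columns):
    """Build a CREATE INDEX CONCURRENTLY statement for PostgreSQL."""
    for name in [index_name, table] + list(columns):
        if not _IDENTIFIER.match(name):
            raise ValueError("unsafe SQL identifier: %r" % (name,))
    return "CREATE INDEX CONCURRENTLY %s ON %s (%s)" % (
        index_name, table, ", ".join(columns))


def run_outside_transaction(conn, sql):
    """Run `sql` on a psycopg2 connection with autocommit enabled.

    PostgreSQL refuses to run CREATE INDEX CONCURRENTLY inside a
    transaction block, so autocommit has to be on.
    """
    conn.autocommit = True
    with conn.cursor() as cursor:
        cursor.execute(sql)


statement = concurrent_index_sql(
    "zerver_usermessage_flags_idx",  # hypothetical index name
    "zerver_usermessage",            # assumed table name for UserMessage
    ["user_profile_id", "flags"],    # hypothetical columns
)
```

A concurrent build takes longer than a plain ``CREATE INDEX``, but it
does not block writes to the table while it runs.
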

**Numbering conflicts across branches**
If you've done your schema change in a branch, and meanwhile another
schema change has taken place, South will now have two migrations with
the same number. To fix this, delete the migration file that South
generated for your change, and re-run
``./manage.py schemamigration zerver --auto``.

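
A numbering conflict is easy to miss after a merge, so a quick check
like the one below may be useful. It is a sketch that assumes migration
files are named with a four-digit prefix (as in ``000#_***.py``); the
function name and the example file names are invented.

```python
import re
from collections import defaultdict


def find_numbering_conflicts(filenames):
    """Return {number: [filenames]} for migration numbers used more than once."""
    by_number = defaultdict(list)
    for name in filenames:
        match = re.match(r"^(\d{4})_.*\.py$", name)
        if match:
            by_number[match.group(1)].append(name)
    return {number: sorted(names)
            for number, names in by_number.items() if len(names) > 1}


example_files = [
    "__init__.py",
    "0001_initial.py",
    "0002_add_realm_flag.py",    # hypothetical migration from another branch
    "0002_add_user_setting.py",  # hypothetical migration from your branch
]
conflicts = find_numbering_conflicts(example_files)
```

In practice the file list would come from
``os.listdir("zerver/migrations")``.
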

**Avoid nullables**
You generally no longer need a nullable column
to cope with staging and prod not having the same models.
See the next point about setting ``keep_default=True``.


**Use keep\_default**
When adding a new column to an existing table,
you will almost always want to set ``keep_default=True`` in the South
migration's ``db.add_column`` call. If you don't, everything will
appear to work fine in testing and on staging, but once the schema
migration is done, the pre-migration code running on prod will be
unable to save new rows for that table (so, for example, if you were
adding a new field to UserProfile, we'd be unable to create new users).
The exception to this rule is when your field default is not a constant
value. In that case, you'll need to do something special: either
set a database-level default, or use a nullable field and a multi-step
schema deploy process.

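
The failure described above can be reproduced in miniature. The sketch
below is not a South migration: it uses SQLite and two hand-written,
hypothetical tables to stand in for "column kept its database-level
default" and "column lost its default", then inserts a row the way
pre-migration code would, without mentioning the new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# New column that still has a database-level default.
conn.execute(
    "CREATE TABLE with_default ("
    "id INTEGER PRIMARY KEY, email TEXT NOT NULL, "
    "is_bot INTEGER NOT NULL DEFAULT 0)"
)
# Same column, but with no database-level default.
conn.execute(
    "CREATE TABLE without_default ("
    "id INTEGER PRIMARY KEY, email TEXT NOT NULL, "
    "is_bot INTEGER NOT NULL)"
)


def old_code_insert(table):
    """Insert a row as pre-migration code would: without the new column."""
    try:
        conn.execute("INSERT INTO %s (email) VALUES (?)" % table,
                     ("a@example.com",))
        return True
    except sqlite3.IntegrityError:
        # The database rejects the row: is_bot is NOT NULL with no default.
        return False


saved_with_default = old_code_insert("with_default")
saved_without_default = old_code_insert("without_default")
```

Only the table with the database-level default accepts the old-style
insert, which is why code that predates the migration keeps working
when the default is kept.
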

**Rebase pain**
If you ever need to rebase a schema change past
other schema changes made on other branches, renumbering your schema
change is not enough. You also need to regenerate
at least the bottom part of your migration (which shows the current
state of all the models) after rebasing. If you don't, the next
migration made after your migration is merged will incorrectly
attempt to re-apply all the schema changes made in the migrations you
skipped. This can be dangerous.


**Upstreaming**
We recommend upstreaming schema changes as soon as possible to
avoid schema numbering conflicts (see above).