==============
Schema changes
==============

If you are making a change that requires a database schema upgrade,
there are a few extra things you need to keep in mind.

Using South for migrations
--------------------------

1. Discuss the change informally with your team.
#. Edit ``zerver/models.py`` for your particular class.

   * See notes below about keep\_default.

#. Run ``./manage.py schemamigration zerver --auto``

   * This will create the ``000#_***.py`` schema migration file in
     ``zerver/migrations``.

#. Read `Notes and Cautions`_ section below, as you may need to edit
   the migration. A common step here is setting keep\_default to True.
#. Do ``git add`` with your new migration.
#. Run ``./manage.py migrate zerver``.
#. Write supporting code or otherwise validate the DB change locally.
   TODO: Advice on testing schema changes?
#. Commit your changes:

   a. The migration must be in the same commit as the models.py changes.
   #. Include [schema] in the commit message.
   #. Include [manual] in the commit message if additional steps are
      required.

#. Before deploying your code fix, read the notes on `Deploying to
   staging`_.

Deploying to staging
--------------------

Always follow this process.

1. Schedule the migration for after hours.

#. For long-running migrations, double check that you use appropriate
   library helpers in ``migrate.py`` to ensure that changes happen in small
   batches that are committed frequently.

#. Announce that you are doing the migration to your team, to avoid
   simultaneous migrations and other outcomes of poor communication.

#. Do any administrative steps, such as increasing
   checkpoint\_segments.

#. Apply the migration in advance from staging, using commands similar
   to the following, where ``[your commit]`` is the commit that has your
   migration::

     cd ~/zulip
     git fetch
     cd /tmp
     git clone ~/zulip
     cd zulip
     git checkout [your commit]
     ./manage.py migrate zerver
     cd /tmp
     rm -Rf zulip

#. Undo any temporary administrative changes, such as increasing
   checkpoint\_segments.

Because staging and prod share a database, for most migrations, nothing
special needs to be done when deploying to prod since the shared
database schema will have already been updated, but in some cases some
code to properly initialize data structures may need to be run.

Migrating to a new schema
-------------------------

When doing a git pull and noticing a [schema] commit, you must manually
perform a schema upgrade: ``./manage.py migrate zerver``.
``generate-fixtures`` should automatically detect whether
the schema has changed and update things accordingly.

Notes and Cautions
------------------

**Large tables**
  For large tables like Message and UserMessage, you
  want to take precautions when adding columns to the table, performing
  data backfills, or building indexes. We have a ``migrate.py`` library to
  help with adding columns and backfilling data. For building indexes,
  we should do this outside of South using postgres's CONCURRENTLY
  keyword.

**Numbering conflicts across branches**
  If you've done your schema change in a branch, and meanwhile another
  schema change has taken place, South will now have two migrations with
  the same number. To fix this, delete the migration file that South
  generated, and re-run ``./manage.py schemamigration zerver --auto``.

**Avoid nullables**
  You generally no longer need a Nullable column
  to avoid problems with staging and prod not having the same models.
  See the next point about setting ``keep_default=True``.

**Use keep\_default**
  When adding a new column to an existing table,
  you almost always will want to set ``keep_default=True`` in the South
  migration ``db.add_column`` call. If you don't, everything will
  appear to work fine in testing and on staging, but once the schema
  migration is done, the pre-migration code running on prod will be
  unable to save new rows for that table (so e.g. if you were adding a
  new field to UserProfile, we'd be unable to create new users). The
  exception to this rule is when your field default is not a constant
  value. In this case, you'll need to do something special to either
  set a database-level default or use a Nullable field and a multi-step
  schema deploy process.

**Rebase pain**
  If you ever need to rebase a schema change past
  other schema changes made on other branches, in addition to
  renumbering your schema change, youalso need to be sure to regenerate
  at least the bottom part of your migration (which shows the current
  state of all the models) after rebasing; if you don't, then the next
  migration made after your migration is merged will incorrectly
  attempt to re-apply all the schema changes made in the migration you
  skipped. This can be potentially dangerous.

**Upstreaming**
  We recommend upstreaming schema changes as soon as possible to
  avoid schema numbering conflicts (see above).