docs: Rename travis.md to continuous integration and add more content.

2018-12-28 13:51:03 +05:30 · 2018-12-28 13:51:03 +05:30 · cddce5bd34
parent 96bb27fa84
commit cddce5bd34
6 changed files with 212 additions and 119 deletions
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@ -1,3 +1,5 @@
+# See https://zulip.readthedocs.io/en/latest/testing/continuous-integration.html for
+#   high-level documentation on our CircleCI setup.
 # See CircleCI upstream's docs on this config format:
 #   https://circleci.com/docs/2.0/language-python/
 #
--- a/.travis.yml
+++ b/.travis.yml
@ -1,4 +1,4 @@
-# See https://zulip.readthedocs.io/en/latest/testing/travis.html for
+# See https://zulip.readthedocs.io/en/latest/testing/continuous-integration.html for
 # high-level documentation on our Travis CI setup.
 dist: trusty
 group: deprecated-2017Q4
--- a/docs/testing/continuous-integration.md
+++ b/docs/testing/continuous-integration.md
@ -0,0 +1,207 @@
+# Continuous integration (CI)
+
+The Zulip server uses [CircleCI](https://circleci.com/) and
+[Travis CI](https://travis-ci.org/) for continuous
+integration. CircleCI is the primary CI, and runs frontend and backend
+tests across a wide range of Ubuntu distributions. Travis CI is
+legacy, used only for running the end-to-end production installer
+test.  This page documents useful tools and tips to know about when
+using CircleCI and Travis CI and debugging issues with them.
+
+## Goals
+
+The overall goal of our CI is to avoid regressions and minimize the
+total time spent debugging Zulip.  We do that by trying to catch as
+many possible future bugs as possible, while minimizing both latency
+and false positives, both of which can waste a lot of developer time.
+There are a few implications of this overall goal:
+
+* If a test is failing nondeterministically in CI, we consider that to
+be an urgent problem.
+* If the tests become a lot slower, that is also an urgent problem.
+* Everything we do in CI should also have a way to run it quickly
+(under 1 minute, preferably under 3 seconds), in order to iterate fast
+in development. Except when working on the CI configuration itself, a
+developer should never have to repeatedly wait 10 minutes for a full CI
+run to iteratively debug something.
+
+## CircleCI
+
+### Useful debugging tips and tools
+
+* Zulip uses the `ts` tool to log the current time on every line of the output in
+our CI scripts.  You can use this output to determine which steps are
+actually consuming a lot of time.
+
+* You can [sign up your personal repo for CircleCI][circleci-setup] so
+that every remote branch you push will be tested, which can be helpful
+when debugging something complicated.
+
+* With your personal repo signed up, CircleCI
+[allows you to SSH][circleci-ssh] into the job container if a job
+fails. SSHing into the containers can be helpful, especially in rare
+cases where the tests are passing on your computer but failing in
+CI. Make sure that you have uploaded your SSH keys to GitHub: CircleCI
+uses those SSH keys for authentication.
+
+[docker-hub]: https://hub.docker.com/
+[circleci-setup]: ../git/cloning.html#step-3-configure-continuous-integration-for-your-fork
+[circleci-ssh]: https://circleci.com/docs/2.0/ssh-access-jobs/
+
+### Suites
+
+The main CircleCI configuration file is
+[.circleci/config.yml](https://github.com/zulip/zulip/blob/master/.circleci/config.yml).
+We currently run several jobs during a CircleCI build. They are:
+* trusty-python-3.4
+* xenial-python-3.5
+* bionic-python-3.6
+
+Each runs the Zulip backend test suites, using the indicated
+platform/OS and Python version. The `bionic-python-3.6` job, for example,
+runs the tests on Ubuntu Bionic with Python 3.6 pre-installed.
+Additionally, the `trusty` suite runs the Zulip frontend test
+suites; since those are not platform-dependent, it doesn't make sense
+to run them on all platforms.  Your build for the PR will pass only if
+all three jobs execute successfully.
+
+### Configuration
+
+The remaining details in this section are primarily relevant for doing
+development on our CI system and/or provisioning process.
+
+The first key of the job section is `docker`. The `docker` key specifies
+the image CircleCI should get from [Docker Hub][docker-hub] for running
+the job. Once CircleCI fetches the image from Docker Hub, it will spin
+up a Docker container. See the [Images](#images) section to learn more about
+the images we use in CircleCI for testing.
+
+After booting the container from the configured image, CircleCI will
+create the directory mentioned in `working_directory`, and all the
+steps are run from there.
+
+The `steps` section describes everything: fetching the Zulip
+code, provisioning, fetching cached data, running tests and uploading
+coverage reports. The steps with prefix `*` reference aliases, which
+are defined in the `aliases` section at the top of the file.
+
+### Images
+
+CircleCI tests are run in containers that are spun up from images
+maintained by the Zulip team. The Dockerfiles for the various images can be
+generated by running `./tools/circleci/generate-dockerfiles`. This command
+will generate the Dockerfiles for the three Ubuntu releases in
+`./tools/circleci/images/{release_name}` directories. Take a look at
+`./tools/circleci/images.yml` to see how the Dockerfiles for the three
+releases differ from each other. To build images from the
+Dockerfiles and upload them to Docker Hub, follow the instructions in the
+generated Dockerfiles.
+
+### Performance optimizations
+
+#### Caching
+
+An important element of making CircleCI perform effectively is
+caching the provisioning of a Zulip development environment. In
+particular, we cache the following:
+
+* Python virtualenvs
+* node_modules directories
+
+This has a huge impact on the performance of running tests in
+CircleCI; without these caches, the average test time would be several times
+longer.
+
+We have designed these caches carefully (they are also used in
+production and the Zulip development environment) to ensure that each
+is named by a hash of its dependencies and Ubuntu distribution name,
+so Zulip should always be using the same version of dependencies it
+would have used had the cache not existed.  In practice, bugs are
+always possible, so be mindful of this possibility.
+
+A consequence of this caching is that test jobs for branches which
+modify `package.json`, `requirements/`, and other key dependencies
+will be significantly slower than normal, because they won't get to
+benefit from the cache.
+
+## Travis CI
+
+### Configuration
+
+The main Travis configuration file is
+[.travis.yml](https://github.com/zulip/zulip/blob/master/.travis.yml).
+The specific test suites we have are listed in the `matrix` section,
+which has a matrix of Python versions and test suites (`$TEST_SUITE`).
+
+Currently there is only the production test suite in this section, as we
+have moved the backend and frontend suites to CircleCI. So the value of
+the variable `$TEST_SUITE` is always `production`.
+
+We've configured it to use a few helper scripts for each job:
+
+* `tools/ci/setup-$TEST_SUITE`: This script sets up the test
+  environment for the production suite. This is a complicated process
+  because of all the packages Travis installs.  See the comments in
+  `tools/ci/setup-production` for details.
+* `tools/ci/$TEST_SUITE`: The script that runs the actual
+  production test suite.
+
+The main purpose of the distinction between the two is that if the
+`setup-production` job fails, Travis CI will report it as the suite
+having "Errored" (grey in their emails), whereas if the `production` job
+fails, it'll be reported as "Failed" (red in their emails).
+Note that Travis CI's web UI seems to make no visual distinction
+between these.
+
+An important detail is that Travis CI will by default hide most phases
+other than the actual test; you can see this easily by looking at the
+line numbers in the Travis CI output.  There are actually a bunch of
+phases (e.g. the project's setup job, downloading caches near the
+beginning, uploading caches at the end, etc.), and if you're debugging
+our configuration, you'll want to look at these closely.
+
+### Useful debugging tips and tools
+
+* Zulip uses the `ts` tool to log the current time on every line of
+  the output in our Travis CI scripts.  You can use this output to
+  determine which steps are actually consuming a lot of time.
+
+* For performance issues,
+  [this statistics tool](https://scribu.github.io/travis-stats/#zulip/zulip/master)
+  can give you test runtime history data that can help with
+  determining when a performance issue was introduced and whether it
+  was fixed.  Note you need to click the "Run" button for it to do
+  anything.
+
+* You can [sign up your personal repo for Travis CI][travis-fork] so
+  that every remote branch you push will be tested, which can be
+  helpful when debugging something complicated.
+
+[travis-fork]: ../git/cloning.html#step-3-configure-continuous-integration-for-your-fork
+
+### Performance optimizations
+
+#### Caching
+
+In addition to what is described in the CircleCI caching section,
+we also cache the following:
+
+* Built/downloaded emoji sprite sheets and data.
+
+This is probably worth eventually adding to the CircleCI caches, but
+because it only saves ~5s, it hasn't been a priority yet.
+
+#### Uninstalling packages
+
+In the production suite, we run `apt-get upgrade` at some point
+(effectively, because the Zulip installer does).  This carries a huge
+performance cost in Travis CI, because (1) they don't keep their test
+systems up to date and (2) literally everything is installed in their
+build workers (e.g. several copies of Postgres, Java, MySQL, etc.).
+
+In order to make Zulip's tests perform reasonably well, we
+uninstall (or mark with `apt-mark hold`) many of these dependencies
+that are irrelevant to Zulip in
+[`tools/ci/setup-production`][setup-production].
+
+[setup-production]: https://github.com/zulip/zulip/blob/master/tools/ci/setup-production
--- a/docs/testing/index.rst
+++ b/docs/testing/index.rst
@ -11,5 +11,5 @@ Code Testing
   testing-with-node
   testing-with-casper
   mypy
-   travis
+   continuous-integration
   manual-testing
--- a/docs/testing/testing.md
+++ b/docs/testing/testing.md
@ -9,7 +9,7 @@ important components are documented in depth in their own sections:
 - [Casper](../testing/testing-with-casper.html): end-to-end UI tests
 - [Node](../testing/testing-with-node.html): unit tests for JS front end code
 - [Linters](../testing/linters.html): Our parallel linter suite
- [CI details](travis.html): How all of these run in CI
+- [CI details](continuous-integration.html): How all of these run in CI
 - [Other test suites](#other-test-suites): Our various smaller test suites.

 This document covers more general testing issues, such as how to run the
--- a/docs/testing/travis.md
+++ b/docs/testing/travis.md
@ -1,116 +0,0 @@
-# Travis CI
-
-The Zulip server uses [Travis CI](https://travis-ci.org/) for its
-continuous integration.  This page documents useful tools and tips to
-know about when using Travis CI and debugging issues with it.
-
-## Goals
-
-The overall goal of our Travis CI setup is to avoid regressions and
-minimize the total time spent debugging Zulip.  We do that by trying
-to catch as many possible future bugs as possible, while minimizing
-both latency and false positives, both of which can waste a lot of
-developer time.  There are a few implications of this overall goal:
-
-* If a test is failing nondeterministically in Travis CI, we consider
-  that to be an urgent problem.
-* If the tests become a lot slower, that is also an urgent problem.
-* Everything we do in CI should also have a way to run it quickly
-(under 1 minute, preferably under 3 seconds), in order to iterate fast
-in development. Except when working on the Travis CI configuration
-itself, a developer should never have to repeatedly wait 10 minutes
-for a full Travis run to iteratively debug something.
-
-## Configuration
-
-The main Travis configuration file is
-[.travis.yml](https://github.com/zulip/zulip/blob/master/.travis.yml).
-The specific test suites we have are listed in the `matrix` section,
-which has a matrix of Python versions and test suites (`$TEST_SUITE`).
-We've configured it to use a few helper scripts for each job:
-
-* `tools/ci/setup-$TEST_SUITE`: The script that sets up the test
-  environment for that suite (E.g., installing dependencies).
-  * For the backend and frontend suites, this is a thin wrapper around
-    `tools/provision`, aka the development environment provision script.
-  * For the production suite, this is a more complicated process
-    because of all the packages Travis installs.  See the comments in
-    `tools/ci/setup-production` for details.
-* `tools/ci/$TEST_SUITE`: The script that runs the actual test
-  suite.
-
-The main purpose of the distinction between the two is that if the
-`setup-backend` job fails, Travis CI will report it as the suite
-having "Errored" (grey in their emails), whereas if the `backend` job
-fails, it'll be reported as "Failed" failure (red in their emails).
-Note that Travis CI's web UI seems to make no visual distinction
-between these.
-
-An important detail is that Travis CI will by default hide most phases
-other than the actual test; you can see this easily by looking at the
-line numbers in the Travis CI output.  There are actually a bunch of
-phases (e.g. the project's setup job, downloading caches near the
-beginning, uploading caches at the end, etc.), and if you're debugging
-our configuration, you'll want to look at these closely.
-
-## Useful debugging tips and tools
-
-* Zulip uses the `ts` tool to log the current time on every line of
-  the output in our Travis CI scripts.  You can use this output to
-  determine which steps are actually consuming a lot of time.
-
-* For performance issues,
-  [this statistics tool](https://scribu.github.io/travis-stats/#zulip/zulip/master)
-  can give you test runtime history data that can help with
-  determining when a performance issue was introduced and whether it
-  was fixed.  Note you need to click the "Run" button for it to do
-  anything.
-
-* You can [sign up your personal repo for Travis CI][travis-fork] so
-  that every remote branch you push will be tested, which can be
-  helpful when debugging something complicated.
-
-[travis-fork]: ../git/cloning.html#step-3-configure-continuous-integration-for-your-fork
-
-## Performance optimizations
-
-### Caching
-
-An important element of making Travis CI perform effectively is
-caching the provisioning of a Zulip development environment.  In
-particular, we cache the following across jobs:
-
-* Python virtualenvs
-* node_modules directories
-* Built/downloaded emoji sprite sheets and data
-
-This has a huge impact on the performance of running tests in Travis
-CI; without these caches, the average test time would be several times
-longer.
-
-We have designed these caches carefully (they are also used in
-production and the Zulip development environment) to ensure that each
-is named by a hash of its dependencies, so Zulip should always be
-using the same version of dependencies it would have used had the
-cache not existed.  In practice, bugs are always possible, so be
-mindful of this possibility.
-
-A consequence of this caching is that test jobs for branches which
-modify `package.json`, `requirements/`, and other key dependencies
-will be significantly slower than normal, because they won't get to
-benefit from the cache.
-
-### Uninstalling packages
-
-In the production suite, we run `apt-get upgrade` at some point
-(effectively, because the Zulip installer does).  This carries a huge
-performance cost in Travis CI, because (1) they don't keep their test
-systems up to date and (2) literally everything is installed in their
-build workers (e.g. several copies of Postgres, Java, MySQL, etc.).
-
-In order to make Zulip's tests performance reasonably well, we
-uninstall (or mark with `apt-mark hold`) many of these dependencies
-that are irrelevant to Zulip in
-[`tools/ci/setup-production`][setup-production].
-
-[setup-production]: https://github.com/zulip/zulip/blob/master/tools/ci/setup-production
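
The Caching sections in this commit describe caches that are "named by a hash of [their] dependencies" (and, in the new CircleCI text, the Ubuntu distribution name). As a rough illustration of that idea only, here is a minimal Python sketch. The function name `cache_key`, its parameters, and the choice of SHA-1 are assumptions made for this example; this is not Zulip's actual provisioning code.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def cache_key(dependency_files: Iterable[Path], distro: str) -> str:
    """Hypothetical helper: derive a cache name from dependency files and distro.

    The key changes whenever the distribution name or the contents of any
    listed dependency file change, so a stale cache is never picked up.
    """
    digest = hashlib.sha1()
    digest.update(distro.encode("utf-8"))
    # Sort so the key does not depend on the order the files were listed in.
    for path in sorted(dependency_files):
        digest.update(path.name.encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()
```

Under this scheme, a branch that edits `package.json` or a file under `requirements/` gets a different key, misses the existing cache, and has to rebuild it, which matches the slower test jobs the text describes for such branches.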