From cddce5bd34f12bbec3ee62385d2c3b522b2234d5 Mon Sep 17 00:00:00 2001 From: Vishnu Ks Date: Fri, 28 Dec 2018 13:51:03 +0530 Subject: [PATCH] docs: Rename travis.md to continuous integration and add more content. --- .circleci/config.yml | 2 + .travis.yml | 2 +- docs/testing/continuous-integration.md | 207 +++++++++++++++++++++++++ docs/testing/index.rst | 2 +- docs/testing/testing.md | 2 +- docs/testing/travis.md | 116 -------------- 6 files changed, 212 insertions(+), 119 deletions(-) create mode 100644 docs/testing/continuous-integration.md delete mode 100644 docs/testing/travis.md diff --git a/.circleci/config.yml b/.circleci/config.yml index b766125e5e..8be052176c 100644 --- a/.circleci/config.yml +++ b/.circleci/config.yml @@ -1,3 +1,5 @@ +# See https://zulip.readthedocs.io/en/latest/testing/continuous-integration.html for +# high-level documentation on our CircleCI setup. # See CircleCI upstream's docs on this config format: # https://circleci.com/docs/2.0/language-python/ # diff --git a/.travis.yml b/.travis.yml index 3d5e339523..8daa1ada5e 100644 --- a/.travis.yml +++ b/.travis.yml @@ -1,4 +1,4 @@ -# See https://zulip.readthedocs.io/en/latest/testing/travis.html for +# See https://zulip.readthedocs.io/en/latest/testing/continuous-integration.html for # high-level documentation on our Travis CI setup. dist: trusty group: deprecated-2017Q4 diff --git a/docs/testing/continuous-integration.md b/docs/testing/continuous-integration.md new file mode 100644 index 0000000000..c8b53f8161 --- /dev/null +++ b/docs/testing/continuous-integration.md @@ -0,0 +1,207 @@ +# Continuous integration (CI) + +The Zulip server uses [CircleCI](https://circleci.com/) and +[Travis CI](https://travis-ci.org/) for continuous +integration. CircleCI is the primary CI, and runs frontend and backend +tests across a wide range of Ubuntu distributions. Travis CI is +legacy, used only for running the end-to-end production installer +test. 
This page documents useful tools and tips to know about when +using CircleCI and Travis CI and debugging issues with them. + +## Goals + +The overall goal of our CI is to avoid regressions and minimize the +total time spent debugging Zulip. We do that by trying to catch as +many future bugs as possible, while minimizing both latency +and false positives, both of which can waste a lot of developer time. +There are a few implications of this overall goal: + +* If a test is failing nondeterministically in CI, we consider that to +be an urgent problem. +* If the tests become a lot slower, that is also an urgent problem. +* Everything we do in CI should also have a way to run it quickly +(under 1 minute, preferably under 3 seconds), in order to iterate fast +in development. Except when working on the CI configuration itself, a +developer should never have to repeatedly wait 10 minutes for a full CI +run to iteratively debug something. + +## CircleCI + +### Useful debugging tips and tools + +* Zulip uses the `ts` tool to log the current time on every line of the output in +our CI scripts. You can use this output to determine which steps are +actually consuming a lot of time. + +* You can [sign up your personal repo for CircleCI][circleci-setup] so +that every remote branch you push will be tested, which can be helpful +when debugging something complicated. + +* With your personal repo signed up, CircleCI +[allows you to SSH][circleci-ssh] into the job container if a job +fails. SSHing into the containers can be helpful, especially in rare +cases where the tests are passing on your computer but failing in +CI. Make sure that you have uploaded your SSH keys to GitHub: CircleCI +uses those SSH keys for authentication. 
+ +[docker-hub]: https://hub.docker.com/ +[circleci-setup]: ../git/cloning.html#step-3-configure-continuous-integration-for-your-fork +[circleci-ssh]: https://circleci.com/docs/2.0/ssh-access-jobs/ + +### Suites + +The main CircleCI configuration file is +[.circleci/config.yml](https://github.com/zulip/zulip/blob/master/.circleci/config.yml). +We currently run several jobs during a CircleCI build. They are: +* trusty-python-3.4 +* xenial-python-3.5 +* bionic-python-3.6 + +Each runs the Zulip backend test suites, using the indicated +platform/OS and Python version. The `bionic-python-3.6` job, for example, +runs the tests on Ubuntu Bionic with Python 3.6 pre-installed. +Additionally, the `trusty` job also runs the Zulip frontend test +suites; since those are not platform-dependent, it doesn't make sense +to run them on all platforms. Your build for the PR will pass only if +all 3 jobs complete successfully. + +### Configuration + +The remaining details in this section are primarily relevant for doing +development on our CI system and/or provisioning process. + +The first key of the job section is `docker`. The `docker` key specifies +the image CircleCI should get from [Docker Hub][docker-hub] for running +the job. Once CircleCI fetches the image from Docker Hub, it will spin +up a Docker container. See the [images](#images) section to learn more about +the images we use in CircleCI for testing. + +After booting the container from the configured image, CircleCI will +create the directory mentioned in `working_directory`, and all the +steps are run from there. + +The `steps` section describes everything: fetching the Zulip +code, provisioning, fetching cached data, running tests and uploading +coverage reports. The steps with prefix `*` reference aliases, which +are defined in the `aliases` section at the top of the file. + +### Images + +CircleCI tests are run in containers that are spun up from the images +maintained by the Zulip team. 
The Dockerfiles for the various images can be +generated by running `./tools/circleci/generate-dockerfiles`. This command +will generate the Dockerfiles of the three Ubuntu releases in +`./tools/circleci/images/{release_name}` directories. Take a look at +`./tools/circleci/images.yml` to see how the Dockerfiles for the three +releases differ from each other. To build images from the +Dockerfiles and upload them to Docker Hub, follow the instructions in the +generated Dockerfiles. + +### Performance optimizations + +#### Caching + +An important element of making CircleCI perform effectively is +caching the provisioning of a Zulip development environment. In +particular, we cache the following: + +* Python virtualenvs +* node_modules directories + +This has a huge impact on the performance of running tests in +CircleCI; without these caches, the average test time would be several times +longer. + +We have designed these caches carefully (they are also used in +production and the Zulip development environment) to ensure that each +is named by a hash of its dependencies and Ubuntu distribution name, +so Zulip should always be using the same version of dependencies it +would have used had the cache not existed. In practice, bugs are +always possible, so be mindful of this possibility. + +A consequence of this caching is that test jobs for branches which +modify `package.json`, `requirements/`, and other key dependencies +will be significantly slower than normal, because they won't get to +benefit from the cache. + +## Travis CI + +### Configuration + +The main Travis configuration file is +[.travis.yml](https://github.com/zulip/zulip/blob/master/.travis.yml). +The specific test suites we have are listed in the `matrix` section, +which has a matrix of Python versions and test suites (`$TEST_SUITE`). + +Currently there is only the production test suite in this section, because we +have moved the backend and frontend suites to CircleCI. 
So the value of +the variable `$TEST_SUITE` is always `production`. + +We've configured it to use a few helper scripts for each job: + +* `tools/ci/setup-$TEST_SUITE`: This script sets up the test + environment for the production suite. This is a complicated process + because of all the packages Travis installs. See the comments in + `tools/ci/setup-production` for details. +* `tools/ci/$TEST_SUITE`: The script that runs the actual + production test suite. + +The main purpose of the distinction between the two is that if the +`setup-production` job fails, Travis CI will report it as the suite +having "Errored" (grey in their emails), whereas if the `production` job +fails, it'll be reported as "Failed" (red in their emails). +Note that Travis CI's web UI seems to make no visual distinction +between these. + +An important detail is that Travis CI will by default hide most phases +other than the actual test; you can see this easily by looking at the +line numbers in the Travis CI output. There are actually a bunch of +phases (e.g. the project's setup job, downloading caches near the +beginning, uploading caches at the end, etc.), and if you're debugging +our configuration, you'll want to look at these closely. + +### Useful debugging tips and tools + +* Zulip uses the `ts` tool to log the current time on every line of + the output in our Travis CI scripts. You can use this output to + determine which steps are actually consuming a lot of time. + +* For performance issues, + [this statistics tool](https://scribu.github.io/travis-stats/#zulip/zulip/master) + can give you test runtime history data that can help with + determining when a performance issue was introduced and whether it + was fixed. Note you need to click the "Run" button for it to do + anything. + +* You can [sign up your personal repo for Travis CI][travis-fork] so + that every remote branch you push will be tested, which can be + helpful when debugging something complicated. 
+ +[travis-fork]: ../git/cloning.html#step-3-configure-continuous-integration-for-your-fork + +### Performance optimizations + +#### Caching + +In addition to what is described in the CircleCI caching section, +we also cache the following: + +* Built/downloaded emoji sprite sheets and data. + +This is probably worth eventually adding to the CircleCI caches, but +because it only saves ~5s, it hasn't been a priority yet. + +#### Uninstalling packages + +In the production suite, we run `apt-get upgrade` at some point +(effectively, because the Zulip installer does). This carries a huge +performance cost in Travis CI, because (1) they don't keep their test +systems up to date and (2) literally everything is installed in their +build workers (e.g. several copies of Postgres, Java, MySQL, etc.). + +In order to make Zulip's tests perform reasonably well, we +uninstall (or mark with `apt-mark hold`) many of these dependencies +that are irrelevant to Zulip in +[`tools/ci/setup-production`][setup-production]. 
+ +[setup-production]: https://github.com/zulip/zulip/blob/master/tools/ci/setup-production diff --git a/docs/testing/index.rst b/docs/testing/index.rst index 4ffd57b6f3..daea8d907f 100644 --- a/docs/testing/index.rst +++ b/docs/testing/index.rst @@ -11,5 +11,5 @@ Code Testing testing-with-node testing-with-casper mypy - travis + continuous-integration manual-testing diff --git a/docs/testing/testing.md b/docs/testing/testing.md index 55e3a02e85..b837237557 100644 --- a/docs/testing/testing.md +++ b/docs/testing/testing.md @@ -9,7 +9,7 @@ important components are documented in depth in their own sections: - [Casper](../testing/testing-with-casper.html): end-to-end UI tests - [Node](../testing/testing-with-node.html): unit tests for JS front end code - [Linters](../testing/linters.html): Our parallel linter suite -- [CI details](travis.html): How all of these run in CI +- [CI details](continuous-integration.html): How all of these run in CI - [Other test suites](#other-test-suites): Our various smaller test suites. This document covers more general testing issues, such as how to run the diff --git a/docs/testing/travis.md b/docs/testing/travis.md deleted file mode 100644 index 7435258fc5..0000000000 --- a/docs/testing/travis.md +++ /dev/null @@ -1,116 +0,0 @@ -# Travis CI - -The Zulip server uses [Travis CI](https://travis-ci.org/) for its -continuous integration. This page documents useful tools and tips to -know about when using Travis CI and debugging issues with it. - -## Goals - -The overall goal of our Travis CI setup is to avoid regressions and -minimize the total time spent debugging Zulip. We do that by trying -to catch as many possible future bugs as possible, while minimizing -both latency and false positives, both of which can waste a lot of -developer time. There are a few implications of this overall goal: - -* If a test is failing nondeterministically in Travis CI, we consider - that to be an urgent problem. 
-* If the tests become a lot slower, that is also an urgent problem. -* Everything we do in CI should also have a way to run it quickly -(under 1 minute, preferably under 3 seconds), in order to iterate fast -in development. Except when working on the Travis CI configuration -itself, a developer should never have to repeatedly wait 10 minutes -for a full Travis run to iteratively debug something. - -## Configuration - -The main Travis configuration file is -[.travis.yml](https://github.com/zulip/zulip/blob/master/.travis.yml). -The specific test suites we have are listed in the `matrix` section, -which has a matrix of Python versions and test suites (`$TEST_SUITE`). -We've configured it to use a few helper scripts for each job: - -* `tools/ci/setup-$TEST_SUITE`: The script that sets up the test - environment for that suite (E.g., installing dependencies). - * For the backend and frontend suites, this is a thin wrapper around - `tools/provision`, aka the development environment provision script. - * For the production suite, this is a more complicated process - because of all the packages Travis installs. See the comments in - `tools/ci/setup-production` for details. -* `tools/ci/$TEST_SUITE`: The script that runs the actual test - suite. - -The main purpose of the distinction between the two is that if the -`setup-backend` job fails, Travis CI will report it as the suite -having "Errored" (grey in their emails), whereas if the `backend` job -fails, it'll be reported as "Failed" failure (red in their emails). -Note that Travis CI's web UI seems to make no visual distinction -between these. - -An important detail is that Travis CI will by default hide most phases -other than the actual test; you can see this easily by looking at the -line numbers in the Travis CI output. There are actually a bunch of -phases (e.g. 
the project's setup job, downloading caches near the -beginning, uploading caches at the end, etc.), and if you're debugging -our configuration, you'll want to look at these closely. - -## Useful debugging tips and tools - -* Zulip uses the `ts` tool to log the current time on every line of - the output in our Travis CI scripts. You can use this output to - determine which steps are actually consuming a lot of time. - -* For performance issues, - [this statistics tool](https://scribu.github.io/travis-stats/#zulip/zulip/master) - can give you test runtime history data that can help with - determining when a performance issue was introduced and whether it - was fixed. Note you need to click the "Run" button for it to do - anything. - -* You can [sign up your personal repo for Travis CI][travis-fork] so - that every remote branch you push will be tested, which can be - helpful when debugging something complicated. - -[travis-fork]: ../git/cloning.html#step-3-configure-continuous-integration-for-your-fork - -## Performance optimizations - -### Caching - -An important element of making Travis CI perform effectively is -caching the provisioning of a Zulip development environment. In -particular, we cache the following across jobs: - -* Python virtualenvs -* node_modules directories -* Built/downloaded emoji sprite sheets and data - -This has a huge impact on the performance of running tests in Travis -CI; without these caches, the average test time would be several times -longer. - -We have designed these caches carefully (they are also used in -production and the Zulip development environment) to ensure that each -is named by a hash of its dependencies, so Zulip should always be -using the same version of dependencies it would have used had the -cache not existed. In practice, bugs are always possible, so be -mindful of this possibility. 
- -A consequence of this caching is that test jobs for branches which -modify `package.json`, `requirements/`, and other key dependencies -will be significantly slower than normal, because they won't get to -benefit from the cache. - -### Uninstalling packages - -In the production suite, we run `apt-get upgrade` at some point -(effectively, because the Zulip installer does). This carries a huge -performance cost in Travis CI, because (1) they don't keep their test -systems up to date and (2) literally everything is installed in their -build workers (e.g. several copies of Postgres, Java, MySQL, etc.). - -In order to make Zulip's tests performance reasonably well, we -uninstall (or mark with `apt-mark hold`) many of these dependencies -that are irrelevant to Zulip in -[`tools/ci/setup-production`][setup-production]. - -[setup-production]: https://github.com/zulip/zulip/blob/master/tools/ci/setup-production