# Continuous integration (CI)

The Zulip server uses [GitHub Actions](https://docs.github.com/en/actions) for continuous
integration. GitHub Actions runs frontend, backend and end-to-end production
installer tests. This page documents useful tools and tips when using
GitHub Actions and debugging issues with it.

## Goals

The overall goal of our CI is to avoid regressions and minimize the
total time spent debugging Zulip. We do that by trying to catch as
many future bugs as possible, while minimizing both latency
and false positives, both of which can waste a lot of developer time.
There are a few implications of this overall goal:

- If a test is failing nondeterministically in CI, we consider that to
  be an urgent problem.
- If the tests become a lot slower, that is also an urgent problem.
- Everything we do in CI should also have a way to run it quickly
  (under 1 minute, preferably under 3 seconds), in order to iterate fast
  in development. Except when working on the CI configuration itself, a
  developer should never have to repeatedly wait 10 minutes for a full CI
  run to iteratively debug something.

## GitHub Actions

### Useful debugging tips and tools

- GitHub Actions stores timestamps for every line in the logs. They
  are hidden by default; you can see them by toggling the
  `Show timestamps` option in the menu on any job's log page. (You can
  get this sort of timestamp in a development environment by piping
  output to `ts`.)

- GitHub Actions runs on every branch you push on your Zulip fork.
  This is helpful when debugging something complicated.

- You can also SSH into a container to debug failures. This
  can be helpful, especially in rare cases where the
  tests are passing on your computer but failing in CI. There are
  various
  [Actions](https://github.com/marketplace?type=actions&query=debug+ssh)
  available on GitHub Marketplace to help you SSH into a container. Use
  whichever you find easiest to set up.
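
If `ts` is not installed, the per-line timestamps mentioned above can be approximated with a few lines of Python. This is a minimal sketch, not part of Zulip's tooling: the function name `timestamp_lines` and the timestamp format are assumptions chosen for illustration.

```python
from datetime import datetime
from typing import Callable, Iterable, Iterator


def timestamp_lines(
    lines: Iterable[str],
    now: Callable[[], datetime] = datetime.now,
) -> Iterator[str]:
    """Yield each input line prefixed with the time it was read.

    `now` is injectable so the behavior can be checked with a fixed clock.
    """
    for line in lines:
        # Assumed format, e.g. "Mar 15 18:39:44"; adjust to taste.
        stamp = now().strftime("%b %d %H:%M:%S")
        yield f"{stamp} {line.rstrip(chr(10))}"


if __name__ == "__main__":
    # Demo on in-memory lines; pass sys.stdin instead to stamp piped output.
    for stamped in timestamp_lines(["provisioning...\n", "running tests...\n"]):
        print(stamped)
```

Because the function is a generator, each line is stamped when it is consumed, which is what makes the gaps between timestamps useful for spotting slow steps.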

### Suites

We run multiple jobs during a GitHub Actions build to efficiently run
Zulip's various test suites, some of them multiple times because we
support multiple versions of the base OS. See the [Actions
tab](https://github.com/zulip/zulip/actions) for the full list of Actions
that we run.

Files which define GitHub workflows live in the `.github/workflows` directory.
`zulip-ci.yml` is the main file where most of the tests are run.
`production-suite.yml` builds a Zulip release tarball, which is
then installed in a fresh container. Various Nagios and other
checks are run to confirm the installation worked.

`zulip-ci.yml` is designed to run our main test suites on all of our
supported platforms. Only one of those platforms runs the frontend
tests, since `puppeteer` is slow and unlikely to catch issues that
depend on the version of the base OS and/or Python.

Our code for running the tests in CI lives under `tools/ci`, but that
logic is mostly thin wrappers around [Zulip's test
suites](../testing/testing.md) or the production installer.

The `Legacy OS` tests are designed to ensure we give good error
messages when trying to upgrade Zulip servers running on very old base
OS versions with EOL Python versions that Zulip no longer supports.

### Configuration

The remaining details in this section are primarily relevant for doing
development on our CI system and/or provisioning process.

The first key of the job section is `docker`. The `docker` key specifies
the image GitHub Actions should get from [Docker Hub][docker-hub] for running
the job. Once GitHub Actions fetches the image from Docker Hub, it will spin
up a Docker container. See the [images](#images) section to learn more about
the images we use in GitHub Actions for testing.

After booting the container from the configured image, GitHub Actions will
create the directory mentioned in `working_directory`, and all the
steps are run from there.

The `steps` section describes everything: fetching the Zulip
code, provisioning, fetching cached data, running tests and uploading
coverage reports. The steps with prefix `*` reference aliases, which
are defined in the `aliases` section at the top of the file.

### Images

GitHub Actions tests are run in containers that are spun up from the
images maintained by the Zulip team. The Docker images can be generated by
running `tools/ci/build-docker-images`; see instructions at the top of
`tools/ci/Dockerfile` for more information.

### Performance optimizations

#### Caching

An important element of making GitHub Actions perform effectively is
persisting, between jobs, the various caches that live under `/srv/` in a Zulip
development or production environment. In particular, we cache the
following:

- Python virtualenvs
- `node_modules` directories

This has a huge impact on the performance of running tests in GitHub Actions
CI; without these caches, the average test time would be several times
longer.

We have designed these caches carefully (they are also used in
production and the Zulip development environment) to ensure that each
is named by a hash of its dependencies and Ubuntu distribution name,
so Zulip should always be using the same version of dependencies it
would have used had the cache not existed. In practice, bugs are
always possible, so be mindful of this possibility.
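
To make the naming scheme concrete, here is a minimal sketch of deriving a cache name from dependency files plus a distribution name. It is not Zulip's actual implementation: the function `cache_key`, the choice of SHA-1, and the file and distribution names in the demo are all assumptions for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable


def cache_key(dependency_files: Iterable[Path], distro: str) -> str:
    """Return a hex digest that changes whenever any input changes."""
    digest = hashlib.sha1()
    digest.update(distro.encode())
    # Sort so the key does not depend on the order the files are listed in.
    for path in sorted(dependency_files):
        # Include the file name so renaming a file also changes the key.
        digest.update(b"\0" + path.name.encode() + b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical dependency file, created only for this demo.
        deps = Path(tmp) / "deps.txt"
        deps.write_text("example-package==1.0\n")
        before = cache_key([deps], "focal")
        deps.write_text("example-package==2.0\n")
        after = cache_key([deps], "focal")
        print("key changed after dependency bump:", before != after)
```

With a key like this, a cache directory can only be reused when both the dependency contents and the distribution match, which is the property the paragraph above relies on.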

A consequence of this caching is that test jobs for branches which
modify `package.json`, `requirements/`, and other key dependencies
will be significantly slower than normal, because they won't get to
benefit from the cache.
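
As a rough way to predict whether a branch will miss the cache, one can check its changed paths against the dependency locations named above. This is a hypothetical helper, not part of Zulip's tooling; the sets below cover only `package.json` and `requirements/`, since the other key dependencies are not listed here.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Illustrative only: limited to the two locations the text names explicitly.
DEPENDENCY_FILES = {"package.json"}
DEPENDENCY_DIRS = {"requirements"}


def touches_dependencies(changed_paths: Iterable[str]) -> bool:
    """Return True if any repository-relative path is a dependency manifest."""
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if str(path) in DEPENDENCY_FILES:
            return True
        # Match anything under a top-level dependency directory.
        if path.parts and path.parts[0] in DEPENDENCY_DIRS:
            return True
    return False


if __name__ == "__main__":
    # Example paths are made up for the demo.
    print(touches_dependencies(["requirements/example.txt"]))
    print(touches_dependencies(["docs/example.md"]))
```

The input could come from something like `git diff --name-only` against the upstream branch.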