
# Continuous integration (CI)

The Zulip server uses [GitHub Actions](https://docs.github.com/en/actions) for continuous
integration. GitHub Actions runs frontend, backend and end-to-end production
installer tests. This page documents useful tools and tips when using
GitHub Actions and debugging issues with it.

## Goals

The overall goal of our CI is to avoid regressions and minimize the
total time spent debugging Zulip.  We do that by trying to catch as
many future bugs as possible, while minimizing both latency and false
positives, each of which can waste a lot of developer time.
There are a few implications of this overall goal:

* If a test is failing nondeterministically in CI, we consider that to
be an urgent problem.
* If the tests become a lot slower, that is also an urgent problem.
* Everything we do in CI should also have a way to run it quickly
(under 1 minute, preferably under 3 seconds), in order to iterate fast
in development. Except when working on the CI configuration itself, a
developer should never have to repeatedly wait 10 minutes for a full CI
run to iteratively debug something.

## GitHub Actions

### Useful debugging tips and tools

* GitHub Actions stores timestamps for every line in the logs. They
are hidden by default; you can see them by toggling the `Show
timestamps` option in the menu on any job's log page.  (You can get
this sort of timestamp in a development environment by piping output
to `ts`.)

* GitHub Actions runs on every branch you push to your Zulip fork.
This is helpful when debugging something complicated.

* You can also SSH into a container to debug failures.  This can be
helpful, especially in rare cases where the tests pass on your
computer but fail in CI. There are various
[Actions](https://github.com/marketplace?type=actions&query=debug+ssh)
available on GitHub Marketplace to help you SSH into a container. Use
whichever you find easiest to set up.
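
The `ts` tip above amounts to prefixing each output line with the time
at which it was seen. As a minimal sketch of that idea (this is not the
`ts` tool itself; the `stamp_lines` helper and its `HH:MM:SS` UTC format
are assumptions made for illustration), the same effect can be had in
Python for any iterable of lines:

```python
import time
from typing import Callable, Iterable, Iterator


def stamp_lines(
    lines: Iterable[str],
    clock: Callable[[], float] = time.time,
) -> Iterator[str]:
    """Yield each line prefixed with the time at which it was read."""
    for line in lines:
        # Format the current clock reading as UTC HH:MM:SS.
        stamp = time.strftime("%H:%M:%S", time.gmtime(clock()))
        yield f"{stamp} {line.rstrip()}"


# Hypothetical usage: the log lines here are made up.
for stamped in stamp_lines(["provision started", "provision finished"]):
    print(stamped)
```

Passing the clock in as a parameter keeps the helper easy to test with a
fixed time; in normal use the default `time.time` is enough.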

### Suites

We run multiple jobs during a GitHub Actions build to efficiently run
Zulip's various test suites, some of them multiple times because we
support multiple versions of the base OS. See the [Actions
tab](https://github.com/zulip/zulip/actions) for the full list of Actions
that we run.

Files which define GitHub workflows live in the `.github/workflows`
directory. `zulip-ci.yml` is the main file where most of the tests are run.
`production-suite.yml` builds a Zulip release tarball, which is
then installed in a fresh container. Various Nagios and other
checks are run to confirm the installation worked.

`zulip-ci.yml` is designed to run our main test suites on all of our
supported platforms. Only one of those platforms runs the frontend
tests, since `puppeteer` is slow and unlikely to catch issues that
depend on the version of the base OS and/or Python.

Our code for running the tests in CI lives under `tools/ci`, but that
logic is mostly thin wrappers around [Zulip's test
suites](../testing/testing.md) or the production installer.

The `Legacy OS` tests are designed to ensure we give good error
messages when trying to upgrade Zulip servers running on very old base
OS versions with EOL Python versions that Zulip no longer supports.

### Configuration

The remaining details in this section are primarily relevant for doing
development on our CI system and/or provisioning process.

The first key of the job section is `docker`. The `docker` key specifies
the image GitHub Actions should get from [Docker Hub][docker-hub] for running
the job. Once GitHub Actions fetches the image from Docker Hub, it will spin
up a Docker container. See the [images](#images) section to learn more about
the images we use in GitHub Actions for testing.

After booting the container from the configured image, GitHub Actions will
create the directory mentioned in `working_directory`, and all the
steps are run from there.

The `steps` section describes everything: fetching the Zulip
code, provisioning, fetching cached data, running tests and uploading
coverage reports. The steps with prefix `*` reference aliases, which
are defined in the `aliases` section at the top of the file.

### Images

GitHub Actions tests are run in containers that are spun up from images
maintained by the Zulip team. The Dockerfiles for the various images can be
generated by running `./tools/ci/generate-dockerfiles`. This command
will generate the Dockerfiles of the three Ubuntu releases in
`./tools/ci/images/{release_name}` directories. Take a look at
`./tools/ci/images.yml` to see how the Dockerfiles for the three
releases differ from each other. To build images from the
Dockerfiles and upload them to Docker Hub, follow the instructions in the
generated Dockerfiles.
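
The generator's actual template and the layout of `images.yml` are not
reproduced here. As a rough illustration of the general pattern only
(one shared template rendered once per release), a sketch might look
like the following. The release names, fields, template text, and the
`render_dockerfiles` helper are all hypothetical.

```python
from string import Template

# Hypothetical template; not the content of Zulip's real Dockerfiles.
DOCKERFILE_TEMPLATE = Template(
    "FROM ubuntu:$version\n"
    "RUN apt-get update && apt-get install -y $packages\n"
)

# Hypothetical per-release settings, standing in for a file like images.yml.
RELEASES = {
    "release-a": {"version": "20.04", "packages": "python3 git"},
    "release-b": {"version": "22.04", "packages": "python3 git curl"},
}


def render_dockerfiles(releases):
    """Return a mapping of output path -> Dockerfile text, one per release."""
    return {
        f"tools/ci/images/{name}/Dockerfile": DOCKERFILE_TEMPLATE.substitute(settings)
        for name, settings in releases.items()
    }


for path, text in render_dockerfiles(RELEASES).items():
    print(path)
    print(text)
```

Keeping the per-release differences in one data structure makes it easy
to see at a glance how the generated files diverge.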

### Performance optimizations

#### Caching

An important element of making GitHub Actions perform effectively is
preserving between jobs the various caches that live under `/srv/` in a
Zulip development or production environment.  In particular, we cache the
following:

* Python virtualenvs
* node_modules directories

This has a huge impact on the performance of running tests in GitHub Actions
CI; without these caches, the average test time would be several times
longer.

We have designed these caches carefully (they are also used in
production and the Zulip development environment) to ensure that each
is named by a hash of its dependencies and the Ubuntu distribution name,
so Zulip should always be using the same version of dependencies it
would have used had the cache not existed.  In practice, bugs are
always possible, so be mindful of this possibility.
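
To make the naming scheme concrete, here is a minimal sketch of deriving
a cache name from dependency contents plus the distribution name. It is
not Zulip's actual implementation: the `cache_key` helper, the choice of
SHA-256, and the example file names and distribution names are all
assumptions.

```python
import hashlib
from typing import Mapping


def cache_key(dependency_files: Mapping[str, bytes], distro: str) -> str:
    """Return a cache name derived from dependency contents and the distro name."""
    digest = hashlib.sha256()
    # Sort by filename so the key does not depend on iteration order.
    for name in sorted(dependency_files):
        digest.update(name.encode())
        digest.update(b"\0")
        digest.update(dependency_files[name])
        digest.update(b"\0")
    digest.update(distro.encode())
    return digest.hexdigest()


# Hypothetical inputs: file contents and distro names are made up.
old = cache_key({"package.json": b'{"dependencies": {}}'}, "focal")
same = cache_key({"package.json": b'{"dependencies": {}}'}, "focal")
changed = cache_key({"package.json": b'{"dependencies": {"a": "1"}}'}, "focal")
other_distro = cache_key({"package.json": b'{"dependencies": {}}'}, "jammy")

print(old == same)          # True: unchanged inputs map to the same cache
print(old == changed)       # False: edited dependencies get a new cache
print(old == other_distro)  # False: each distribution gets its own cache
```

Because the name changes whenever any input changes, a stale cache is
never reused for a different set of dependencies; it is simply rebuilt
under a new name.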

A consequence of this caching is that test jobs for branches which
modify `package.json`, `requirements/`, or other key dependency files
will be significantly slower than normal, because they won't get to
benefit from the cache.