The job name is just the constant `production_build`. Renaming it to
have the OS in the key ensures that it is not shared across OSes (for
instance between `4.x` and `main`, which are now bionic and buster,
respectively), and also allows it to share caches with the install
step, which uses the OS name in that place.
As a consequence:
• Bump minimum supported Python version to 3.7.
• Move Vagrant environment to Debian 10, which has Python 3.7.
• Move CI frontend tests to Debian 10.
• Move production build test to Debian 10.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
It should not use the configured zulip username, but should instead
pull from the login user (likely `nagios`), or an explicitly provided
alternate PostgreSQL username. Failure to do so results in Nagios
failures because the `nagios` login does not have permissions to
authenticate as the `zulip` PostgreSQL user.
This requires CI changes, as the install tests install as the `zulip`
login username, which allowed Nagios tests to pass previously; with
the custom database and username, however, they must be passed to
process_fts_updates explicitly when validating the install.
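A minimal sketch of the fallback described above, assuming a hypothetical
helper named `resolve_pg_user`; this is not the actual
process_fts_updates code, only an illustration of preferring an explicit
username and otherwise using the login user:

```python
import getpass
from typing import Optional


def resolve_pg_user(explicit_user: Optional[str] = None) -> str:
    """Pick the PostgreSQL username (hypothetical helper, for illustration).

    Prefer an explicitly provided username; otherwise fall back to the
    user running the process (e.g. `nagios`) instead of a hardcoded
    `zulip`.
    """
    if explicit_user:
        return explicit_user
    # getpass.getuser() consults LOGNAME/USER and similar environment
    # variables first, then falls back to the password database.
    return getpass.getuser()
```

With this shape, an install that uses a custom database username passes it
explicitly, while a Nagios check with no override connects as `nagios`.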
This tool helps catch common typos in code and documentation, which is
particularly useful for our many contributors who are not native
English speakers.
The config is based on the codespell that I ran in
https://github.com/zulip/zulip/pull/18535.
Production installs do not use the zilencer application, but the tests
do include it; as such, changes to any files which reference zilencer
are more likely to pass tests but fail production installs.
Run production tests when those files are changed.
We make a few adjustments:
* We now run full CI whenever pushing to master. It's cheap enough
that it's worth getting accurate signal.
* We no longer run production tests on PRs for changes to JavaScript/CSS
in static/ that don't also affect the webpack configuration.
* We sort the list of paths that trigger tests.
When GitHub Actions runs in Docker, the default PID 1 entrypoint is
`tail -f /dev/null`. PID 1 is responsible for propagating signals to
its children, and calling `waitpid()` on defunct processes; `tail`
does not do these things. This results in zombie processes piling up
inside the container, which is not an issue in most contexts.
However, it affects `start-stop-daemon`, which hangs when stopping
daemon processes, as they are never reaped. This appears in CI as
`/etc/init.d/supervisor restart` never being able to succeed.
Run the docker container with `--init`, which spawns a
`/sbin/docker-init` PID 1 to handle the job of an init process.
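To illustrate what "reaping" means here, the following is a simplified
sketch of the non-blocking `waitpid()` loop an init process runs when
children exit. It is not the implementation of `docker-init`, and
`reap_zombies` is a hypothetical name:

```python
import os


def reap_zombies() -> int:
    """Reap every exited child without blocking; return how many were reaped."""
    reaped = 0
    while True:
        try:
            # -1 means "any child"; WNOHANG returns immediately if no
            # child has exited yet.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # This process has no children at all.
            break
        if pid == 0:
            # Children exist, but none of them has exited yet.
            break
        reaped += 1
    return reaped
```

`tail -f /dev/null` never makes this call, so exited processes stay in the
process table as zombies, which is what `start-stop-daemon` ends up
waiting on.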
We convert the `clean-unused-caches` script to a Python module so
that provision can import it instead of running it as a separate
script, which saves some time.
This ensures that we exercise the fact that the Zulip installer may be
unpacked to a directory that may not be world-readable.
bc45525369 fixed a recent regression in
this behavior that would have been caught by this commit.
Thumbor and tc-aws have been dragging their feet on Python 3 support
for years, and even the alphas and unofficial forks we’ve been running
don’t seem to be maintained anymore. Depending on these projects is
no longer viable for us.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
I have made `tools/setup/optimize-svg` do the SVG optimization
automatically, rather than just printing the command to run when SVGs
need optimizing. This included adding a `--check` parameter for use in
CI, which only checks, as we previously did, rather than actually
running the optimization.
I have also made `tools/setup/optimize-svg` execute
`tools/setup/generate_integration_bots_avatars.py` once it has run the
optimization to ensure it is always run.
This means one less command to run when creating an integration. It
also means that we catch cases where a PNG has just been copied into
the `static/images/integrations/bot_avatars` folder: the only way the
avatar generation is skipped is if `optimize-svg` has not been run,
which would be caught in CI.
Fixes #18183. Fixes #18184.
We had used 2>&1 to redirect stderr to stdout so it could be piped
into ts, but commit dd3cdd6ec5 (#17611)
removed ts, so we no longer need the redirection.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This reduces the time spent updating dependencies on every CI
build; with the previous containers, that step took about 1 minute.
`sudo` had a bug due to which we were not able to create directories.
See https://github.com/sudo-project/sudo/issues/42.
We used these directories to restore caches.
Upgrading the focal dependencies via this commit naturally fixes that
bug.
Fixes #17854.
We support Debian as an OS for setting up the Zulip server, but CI
does not test setting up the server on Debian for pull requests.
Hence, add that check to CI.
GitHub Actions gives us 2 CPUs (probably shared) to run the
jobs. Specifying 6 processes here doesn't make a difference
since both jobs run in around 5 minutes right now.
We basically move all the tests from the backend and frontend test
files to the zulip-ci workflow. This results in GitHub Actions
nicely displaying all the tests separately.
Timestamps are logged automatically by GitHub Actions and can be
made visible using log settings easily. Hence we remove the
unnecessary timestamps here to make the logs look much cleaner.
This prevents Zulip CI from eventually consuming large amounts of
storage on one's GitHub account.
I picked a longer retention period for the Puppeteer artifacts because
humans look at those; the production tarballs are unlikely to be used
more than 10 minutes after the run completes, as they are just for the
next stage of the build; certainly 14 days seems ample for any debugging.
It only cancels previous runs on forks or for pull request builds,
and does not run for pushes to the zulip/zulip repo. It finishes pretty
quickly (under 1 minute), so it is fine to have it as a workflow
rather than adding a new step under every single job to cancel its
previous runs.
The only downside to this is that GitHub creates a notification for
the cancelled job, just like it does for failed jobs!
The cache keys were missing the hashes for package.json and yarn.lock
because those files were not present: we don't do a full checkout
in this job. We fix this by sending over those files and generating
hashes from them.
I usually verify these cache keys by clicking the Restore <cache>
step dropdown menu and then clicking the Run ... dropdown menu again
to see the generated hash.
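As a rough illustration of why the files must be present, here is a sketch
of deriving a cache key from file contents. It is not the algorithm behind
GitHub Actions' own hashing; `cache_key` and the `node-modules` prefix are
hypothetical names used only for this example:

```python
import hashlib
from pathlib import Path
from typing import Iterable


def cache_key(prefix: str, files: Iterable[Path]) -> str:
    """Derive a cache key from the contents of the given files (illustrative)."""
    digest = hashlib.sha256()
    # Sort so the key does not depend on the order the files are listed in.
    for path in sorted(files):
        # A missing file raises FileNotFoundError here, instead of
        # silently yielding a key with no hash for that file.
        digest.update(path.read_bytes())
    return f"{prefix}-{digest.hexdigest()[:16]}"
```

For example, `cache_key("node-modules", [Path("package.json"), Path("yarn.lock")])`
changes whenever either file changes, which is the property the cache keys
were missing while the files were absent.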
The "event log" in question was never useful in our test systems (and
hasn't been used for anything real since 2014). I'm not sure how we
ended up with it in the CI configuration.
All the steps are the same as in CircleCI except two:
1. The 'Add permissions ...' step is Actions-specific, as explained
in comments.
2. The step that uses upload-artifacts is the Actions version of
persist_to_workspace.
Finally, I should note the duplication between this and the zulip-ci
workflow. There are three reasons this is not a problem:
1. It would be messy to mush this into the zulip-ci workflow only for
the benefit of de-duplicating the env and cache restore steps.
2. We need this in its own workflow if we want to only run it
when production-related dependencies are updated.
3. I don't see us updating the steps duplicated between both
workflows. The Circle CI config is a perfect example of this; nothing
has changed except for adding or updating steps which are not
duplicated.
This change makes it so that if the focal backend job fails, the bionic
backend and frontend jobs keep running. Previously, a failure in one
job failed both. That default is expected, since a matrix is typically
used to run the same tests on multiple versions and such, but our use
case is a bit more than that.
Since we already run this on every push we don't need to run it as a
cron job every week for no reason. While we are touching this code
block, we convert it to on: [push, pull_request] since the previous
format felt weird. It was only written that way because we had the
cron job declared there.
This is a fine short-term solution until GitHub implements
YAML anchor support. The limitation of this method is that we
cannot re-use most of the steps again for production install test
builds.
Thanks, Anders, for this solution.
Verifying everything is migrated correctly is a pain. This script
ensures everything is done correctly (the previous commit message
contains explanations for the steps being ignored; in the case
of github-actions steps, they are ignored because they are
Actions-specific):
"""
This script prints out the ignore steps first. Then
prints out each step of both circle and actions side-by-side.
One step is out of order for bionic but verfying correction is
still easier. Format:
Actions: Install dependencies
Circle CI: install dependencies
....
"""
import yaml
with open('.circleci/config.yml') as f:
circleci_config = yaml.safe_load(f)
with open('.github/workflows/zulip-ci.yml') as f:
actions_config = yaml.safe_load(f)
circle_bionic_steps = []
circle_focal_steps = []
actions_bionic_steps = []
actions_focal_steps = []
"""
We ignore casper artifact upload, save_cache, and
store_tests_reports steps.
"""
def get_circleci_steps(job, arr):
for step in circleci_config['jobs'][job]['steps']:
if isinstance(step, str):
arr.append(step)
continue
step_name = step.get('run', {}).get('name', False)
if not step_name:
if step.get('restore_cache'):
key = step['restore_cache']['keys'][0].split('.')[0]
step_name = f'<restore-cache> {key}'
elif step.get('store_artifacts', False):
destination = step['store_artifacts']['destination']
step_name = f'<store-artificats> {destination}'
if destination == 'casper':
\# This is no longer needed
print('Ignoring step:')
print(step)
print()
continue
else:
"""
We don't care about save_cache; github-actions
does this automatically, and store_tests_reports
is circelci timing specific.
"""
print('Ignoring step:')
print(step)
print()
continue
if step_name != 'On fail':
arr.append(step_name)
get_circleci_steps('bionic-backend-frontend', circle_bionic_steps)
get_circleci_steps('focal-backend', circle_focal_steps)
""" We ignore there steps specific to github-actions"""
for step in actions_config['jobs']['focal_bionic']['steps']:
BOTH_OS = 'BOTH_OS'
if_check = step.get('if', BOTH_OS)
step_name = step.get('name')
if step_name is None:
step_name = step['uses']
if (
step_name == 'Upgrade git for bionic' or
step_name == 'Add required permissions' or
step_name == 'Move test reports to var'
):
print('Ignoring step:')
print(step)
print()
"""These are github-actions specific; see comments"""
continue
if if_check == BOTH_OS:
actions_bionic_steps.append(step_name)
actions_focal_steps.append(step_name)
elif 'is_bionic' in if_check:
actions_bionic_steps.append(step_name)
else:
actions_focal_steps.append(step_name)
bionic = zip(circle_bionic_steps, actions_bionic_steps)
focal = zip(circle_focal_steps, actions_focal_steps)
print('Bionic steps:')
for (circle_step, actions_step) in bionic:
print(f'CircleCI: {circle_step}')
print(f'Actions: {actions_step}')
print()
print('Focal steps:')
for (circle_step, actions_step) in focal:
print(f'CircleCI: {circle_step}')
print(f'Actions: {actions_step}')
print()
Some notable differences from CircleCI:
- We upgrade git to a newer version (reason explained in comments)
- We set HOME to /home/github (also explained in comments)
- Adjust permissions (... comments)
- Minor changes to step names and cache keys.
- We don't need to port the save_cache steps; they are done
automatically in Actions. And we did not port the
store_test_results step, which is CircleCI-specific.
- We didn't port the notify_failure step yet (see the TODO).
This file was generated by GitHub's code analysis tutorial; we were
just approved from their waitlist.
I deleted the part to run compilers as it is not relevant for us.