When Github Actions run in Docker, the default pid 1 entrypoint is
`tail -f /dev/null`. PID 1 is responsible for propagating signals to
its children, and calling `waitpid()` on defunct processes; `tail`
does not do these things. This results in zombie processes piling up
inside the container, which is not an issue in most contexts.
However, it affects `start-stop-daemon`, which hangs when stopping
daemon processes, as they are never reaped. This appears in CI as
`/etc/init.d/supervisor restart` never being able to succeed.
Run the docker container with `--init`, which spawns a
`/sbin/docker-init` PID 1 to handle the job of an init process.
We convert the `clean-unused-caches` script to a
python file so we can run it in provision by importing it
instead of running the script, hence saving some time.
This ensures that we exercise the fact that the Zulip installer may be
unpacked to a directory that may not be world-readable.
bc45525369 fixed a recent regression in
this behavior that would have been caught by this commit.
Thumbor and tc-aws have been dragging their feet on Python 3 support
for years, and even the alphas and unofficial forks we’ve been running
don’t seem to be maintained anymore. Depending on these projects is
no longer viable for us.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
I have made `tools/setup/optimize-svg` do the SVG optimization
automatically rather than just telling you the command to run if they
need optimizing. This included adding a `--check` parameter to use in
CI to only check as we previously did rather than actually running the
optimization.
I have also made `tools/setup/optimize-svg` execute
`tools/setup/generate_integration_bots_avatars.py` once it has run the
optimization to ensure it is always ran.
This makes it one less command to run when creating an integration,
but also means that we catch instances where a PNG has just been
copied into the `static/images/integrations/bot_avatars` folder as the
only instance where this won't be run is if `optimize-svg` has not
been run which would be caught in CI.
Fixes#18183. Fixes#18184.
We had used 2>&1 to redirect stderr to stdout so it could be piped
into ts, but commit dd3cdd6ec5 (#17611)
removed ts, so we no longer need the redirection.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This helps us reduce time to update dependencies on every CI
build since the previous containers used to take about 1 minute.
`sudo` had a bug due to which we were not able to create directories.
See https://github.com/sudo-project/sudo/issues/42.
We used these directories to restore caches.
Upgrading the focal dependencies via this commit naturally fixes that
bug.
Fixes#17854
We support Debian as an OS for setting up the Zulip server. But the CI
does not run on pull request to test the setting up of the server on
Debian. Hence, add the check to CI.
GitHub Actions gives us 2 cpus (probably shared) to run the
jobs. Specifying 6 processes here doesn't make a difference
since both jobs run in around 5 minutes right now.
We basically move all the tests from backend and frontend test
files to zulip-ci workflow. This results in GitHub Actions
nicely displaying all the tests separately.
Timestamps are logged automatically by GitHub Actions and can be
made visible using log settings easily. Hence we remove the
unnecessary timestamps here to make the logs look much cleaner.
This prevents Zulip CI from eventually consuming large amounts of
storage on one's GitHub account.
I picked a longer retention period for the Puppeteer artifacts because
humans look at those; the production tarballs are unlikely to be used
10 minutes after the run completes as they are just for the next stage
fo the build; certainly 14 days seems ample for any debugging.
It only cancel previous runs on forks or for pull request builds
and does not run for zulip/zulip repo pushes. It finishes pretty
quickly (under 1 minute) so it fine to have it as a workflow
rather than to add a new step under every single job to cancel it's
previous runs.
The only downside to this is that GitHub creates a notification for
the cancelled job just like it does for failed jobs!!!
The hash keys were missing hash for package.json and yarn.lock
because they were not present since we don't do a full checkout
in this job. We fix this by sending over those files and generating
hashes from them.
I usally verify these cache keys by clicking the Restore <cache>
step dropdown menu and then clicking the Run ... dropdown menu again
to see the generated hash.
The "event log" in question was never useful in our test systems (and
hasn't been used for anything real since 2014). I'm not sure how we
ended up with in the CI configuration.
All the steps are same from circleci except two steps:
1. The 'Add permissions ...' step is Actions specific as explained
in comments.
2. The step that used upload-artifacts is Actions verison of
presist_to_workspace.
Finally, I should note the duplication in this and zulip-ci
workflow. There are three reason this is not a problem:
1. It will be messy to mush this into zulip-ci workflow only for
benefit of un-duplicating the env and cache restore steps.
2. We needs this on its own workflow if we want to only run it
when production related dependencies are updated.
3. I don't see us updating the duplicated steps between both
workflow. Circle CI config is prefect example for this; nothing
is changed except for adding or updating steps which are not
duplicated.
This change makes it so if focal backend job fails the bionic
backend and frontend jobs keeps running. Previously, it failed both
of the jobs if one failed. This is expected since typically matrix
is used to run sames tests on multiple versions and such but our use
case is bit more than that.
Since we already run this on every push we don't need to run it as a
cron job every week for no reason. While we are touching this code
block, we convert it to on: [push, pull_request] since the previous
format felt weird. It was only written that way because we had the
cron job declared there.