Commit Graph

60 Commits

Author SHA1 Message Date
Zixuan James Li a081428ad2 user_groups: Make locks required for updating user group memberships.
**Background**

User groups are expected to comply with the DAG constraint for the
many-to-many inter-group membership. The check for this constraint has
to be performed recursively so that we can find all direct and indirect
subgroups of the user group to be added.

This kind of check is vulnerable to phantom reads which is possible at
the default read committed isolation level because we cannot guarantee
that the check is still valid when we are adding the subgroups to the
user group.

**Solution**

To avoid having another transaction concurrently update one of the
to-be-subgroup after the recursive check is done, and before the subgroup
is added, we use SELECT FOR UPDATE to lock the user group rows.

The lock needs to be acquired before a group membership change is about
to occur before any check has been conducted.

Suppose that we are adding subgroup B to supergroup A, the locking protocol
is specified as follows:

1. Acquire a lock for B and all its direct and indirect subgroups.
2. Acquire a lock for A.

For the removal of user groups, we acquire a lock for the user group to
be removed with all its direct and indirect subgroups. This is the special
case A=B, which is still complaint with the protocol.

**Error handling**

We currently rely on Postgres' deadlock detection to abort transactions
and show an error for the users. In the future, we might need some
recovery mechanism or at least better error handling.

**Notes**

An important note is that we need to reuse the recursive CTE query that
finds the direct and indirect subgroups when applying the lock on the
rows. And the lock needs to be acquired the same way for the addition and
removal of direct subgroups.

User membership change (as opposed to user group membership) is not
affected. Read-only queries aren't either. The locks only protect
critical regions where the user group dependency graph might violate
the DAG constraint, where users are not participating.

**Testing**

We implement a transaction test case targeting some typical scenarios
when an internal server error is expected to happen (this means that the
user group view makes the correct decision to abort the transaction when
something goes wrong with locks).

To achieve this, we add a development view intended only for unit tests.
It has a global BARRIER that can be shared across threads, so that we
can synchronize them to consistently reproduce certain potential race
conditions prevented by the database locks.

The transaction test case lanuches pairs of threads initiating possibly
conflicting requests at the same time. The tests are set up such that exactly N
of them are expected to succeed with a certain error message (while we don't
know each one).

**Security notes**

get_recursive_subgroups_for_groups will no longer fetch user groups from
other realms. As a result, trying to add/remove a subgroup from another
realm results in a UserGroup not found error response.

We also implement subgroup-specific checks in has_user_group_access to
keep permission managing in a single place. Do note that the API
currently don't have a way to violate that check because we are only
checking the realm ID now.
2023-08-24 17:21:08 -07:00
Anders Kaseorg 124c5d02e5 ci: Restore commented clean_unused_caches.py invocation.
The comment logic doesn’t make sense.  Every build gets to write to
the caches; some builds do in fact add new items, and without
clean_unused_caches.py there’s no way for them to remove items.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-08-23 16:20:01 -07:00
Tim Abbott 396cedd0e8 ci: Reorder tests to run unique tests first.
As discussed in the comment, it doesn't really make sense for our 4
jobs that we run in parallel for different platforms to all start with
running the backend tests. While it's true that puppeteer will likely
fail if the backend doesn't run, and thus there's a mild prerequisite
relationship there, what is far more common is the node tests fail and
the user doesn't get that input for 10 minutes unnecessarily while all
the backend jobs run, and this change lets us avoid that.
2023-08-09 17:15:51 -07:00
Anders Kaseorg d926144e13 ci: Fix pnpm store path for GitHub Actions.
This would ordinarily be determined by running ‘pnpm store path’, but
pnpm is not installed yet at that point.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-05-31 13:23:06 -07:00
Anders Kaseorg 12310189ed install: Support Debian 12.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-05-18 11:52:22 -07:00
Anders Kaseorg 16dedb08fd ci: Fix matrix definition for tests job.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-05-18 11:52:22 -07:00
Anders Kaseorg 033f561d94 ci: Run pnpm dedupe --check.
New in pnpm 8.3.0, this replaces the yarn-deduplicate check that was
removed in commit 3a27b12a7d (#24731).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-04-25 22:26:04 -07:00
Anders Kaseorg 341f6173aa ci: Enable XML coverage report to fix Codecov uploads.
This was broken by commit 534754442a
(#22039).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-03-31 15:51:43 -07:00
Anders Kaseorg 3a27b12a7d dependencies: Switch to pnpm.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-03-20 15:48:29 -07:00
Anders Kaseorg 5a79ca251b check-database-compatibility: Drop .py from script name.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-03-03 18:02:37 -08:00
Anders Kaseorg c1675913a2 web: Move web app to ‘web’ directory.
Ever since we started bundling the app with webpack, there’s been less
and less overlap between our ‘static’ directory (files belonging to
the frontend app) and Django’s interpretation of the ‘static’
directory (files served directly to the web).

Split the app out to its own ‘web’ directory outside of ‘static’, and
remove all the custom collectstatic --ignore rules.  This makes it
much clearer what’s actually being served to the web, and what’s being
bundled by webpack.  It also shrinks the release tarball by 3%.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-23 16:04:17 -08:00
Anders Kaseorg 872f4b41c1 ci: Check that non-scripts aren’t marked executable.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-12-07 09:54:01 -08:00
Josh Klar a67ecc0f36 ci: Only report failures to CZO on branch pushes.
Targeted fix for regression introduced in #23719 wherein failure reports
were attempted for all CI failures, including those from forked pull
requests, which don't have access to Actions Secrets. Since undefined
Secrets are empty strings at interpolation time [^1], the underlying
`send-message` Action was being called with no API Key, causing a
failure in the failure handler.

This fix is, per discussion in both a comment on #23719 and later on CZO
[^2], prefered to restoring the prior guard against ZULIP_BOT_KEY being
an empty string that had been in the shell script as it is more explicit
in its intent.

[^1]: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-using-secrets

[^2]: https://chat.zulip.org/#narrow/stream/43-automated-testing/topic/all.20branches.20failing/near/1475246
2022-12-06 17:43:54 -05:00
Josh Klar 3a0620a40c tools: Reimplement CI failure script without using CircleCI endpoint.
Using curl to POST to the CircleCI workflow endpoint on CZO:

- Doesn't work on zulip/zulip@main (CZO runs a revert)
- Sets a bad example for other orgs
- Robs us of an opportunity to dogfood our own zulip/github-actions-zulip

Refactor the Actions workflows in this repo to report failure states
using the Zulip Action, and reimplement the related helper scripts in
Python, since they'd previously mostly shelled out to Python anyway.
2022-12-05 14:33:15 -05:00
Varun Sharma 6cdf2853ff
ci: Limit GitHub token permissions for workflows.
This limits the ability for an Action to do mischief with this token.

Fixes #22786.

Signed-off-by: Varun Sharma <varunsh@stepsecurity.io>
2022-08-29 17:12:55 -07:00
Vipul 35d56ea528
CI: Remove multiple hashFiles instances in a single step.
hashFiles supports passing multiple filenames, and using this feature results in 
much cleaner keys.

Fixes: #22796
2022-08-29 10:37:10 -07:00
Anders Kaseorg acff0879e7 ci: Avoid duplicate GitHub Actions runs for push, pull_request.
We’ve always been running CI on both push events and pull_request
events, which means it runs twice for commits that are pushed to a
pull request.

Filter the push events by branch name.  Add the workflow_dispatch
event in case developers want to manually run CI on some other branch
that isn’t a pull request.

https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-07-06 17:31:07 -07:00
Anders Kaseorg 27fa91066c ci: Update GitHub Actions dependencies.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-07-05 15:54:46 -07:00
Anders Kaseorg 4a11642cee ci: Replace cancel-previous-runs job with concurrency configuration.
Using ‘github.head_ref || github.run_id’ makes this only cancel
in-progress jobs for pull_request events.

https://docs.github.com/en/actions/using-jobs/using-concurrency

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-07-05 13:08:06 -07:00
Anders Kaseorg e952641013 install: Resupport Ubuntu 22.04.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-05-03 09:41:08 -07:00
Anders Kaseorg e8e0b045fc Revert "ci: Remove actions/cache@v2 steps from run due to failures."
This reverts commit ae24fe69ed.

The problem was fixed by GitHub.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-29 14:03:12 -07:00
Lauryn Menard ae24fe69ed
ci: Remove actions/cache@v2 steps from run due to failures.
Comments out the steps in 'Create cache directories' that use
`actions/cache@2` so that the CI and production build can pass
while Github support issue is processed.

See https://github.com/actions/cache/issues/794 for an upstream report.
2022-04-29 10:14:51 -07:00
Tim Abbott 0c385fe01b ci: Only run documentation/link tests on a single job.
As noted in ReadTheDocs, it's very unlikely that these documentation
tests will pass or fail depending on the server's OS.
2022-04-26 17:26:41 -07:00
Anders Kaseorg a543dcc8e3 Remove Debian 10 support.
As a consequence:

• Bump minimum supported Python version to 3.8.
• Move Vagrant environment to Ubuntu 20.04, which has Python 3.8.
• Move CI frontend tests to Ubuntu 20.04.
• Move production build test to Ubuntu 20.04.
• Move 3.4 upgrade test to Ubuntu 20.04.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-26 16:32:02 -07:00
Anders Kaseorg 3848050456 ci: Temporarily disable Ubuntu 22.04.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-03-02 16:00:35 -08:00
Anders Kaseorg 894a50b5c9 install: Support Ubuntu 22.04.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-25 14:49:07 -08:00
Anders Kaseorg 170f4745dc ci: Ban check-database-compatibility.py from using static/generated.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-24 14:31:24 -08:00
Anders Kaseorg b3260bd610 docs: Use Debian and Ubuntu version numbers over development codenames.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-23 12:04:24 -08:00
Anders Kaseorg b0ce4f1bce docs: Fix many spelling mistakes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-07 18:51:06 -08:00
Anders Kaseorg a58a71ef43 Remove Ubuntu 18.04 support.
As a consequence:

• Bump minimum supported Python version to 3.7.
• Move Vagrant environment to Debian 10, which has Python 3.7.
• Move CI frontend tests to Debian 10.
• Move production build test to Debian 10.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-01-21 17:26:14 -08:00
rht a3a71487b0 CI: Add Codespell linter.
This tool helps catch common typos in code and documentation, which is
particularly useful for our many contributors who are not native
English speakers.

The config is based on the codespell that I ran in
https://github.com/zulip/zulip/pull/18535.
2021-10-27 16:49:30 -07:00
Anders Kaseorg 9cf5a03f2a ci: Migrate to new Codecov uploader.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-10-14 17:56:52 -07:00
Anders Kaseorg a88ace988a ci: Remove /__w permission twiddling.
Commit 9f2ac49fb3 (#19963) should make
this unnecessary.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-10-14 13:24:30 -07:00
Ganesh Pawar dee20f0dbf node_tests: Run node tests in parallel.
Fixes #9471.
2021-09-10 14:18:31 -07:00
Alex Vandiver d78723b6e8 ci: Update outdated comments, documentation and gitignore.
Use of `Dockerfile.template` and generated `tools/ci/images/` was
removed in 16067bc4fc.
2021-07-22 14:09:01 -07:00
Riken Shah 4f54e15993 refactor: Convert `clean-unused-caches` to`clean_unused_caches.py`.
We convert the `clean-unused-caches` script to a
python file so we can run it in provision by importing it
instead of running the script, hence saving some time.
2021-06-12 07:28:16 -07:00
Anders Kaseorg 405bc8dabf requirements: Remove Thumbor.
Thumbor and tc-aws have been dragging their feet on Python 3 support
for years, and even the alphas and unofficial forks we’ve been running
don’t seem to be maintained anymore.  Depending on these projects is
no longer viable for us.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-05-06 20:07:32 -07:00
Adam Birds e27268837b tools: Have `optimize-svg` do the optimization automatiically.
I have made `tools/setup/optimize-svg` do the SVG optimization
automatically rather than just telling you the command to run if they
need optimizing. This included adding a `--check` parameter to use in
CI to only check as we previously did rather than actually running the
optimization.

I have also made `tools/setup/optimize-svg` execute
`tools/setup/generate_integration_bots_avatars.py` once it has run the
optimization to ensure it is always ran.

This makes it one less command to run when creating an integration,
but also means that we catch instances where a PNG has just been
copied into the `static/images/integrations/bot_avatars` folder as the
only instance where this won't be run is if `optimize-svg` has not
been run which would be caught in CI.

Fixes #18183. Fixes #18184.
2021-04-19 10:16:54 -07:00
Gaurav Pandey 1bdcb11543 ci: Run zulip backend test suite for Debian bullseye.
This also verifies the Zulip codebase's Python 3.9 support.
2021-04-15 21:38:31 -07:00
Alex Vandiver 0023d561dd ci: Switch to hosting the CI images under Zulip on Dockerhub. 2021-03-31 16:54:34 -07:00
Anders Kaseorg 056b715765 ci: Remove 2>&1 redirection.
We had used 2>&1 to redirect stderr to stdout so it could be piped
into ts, but commit dd3cdd6ec5 (#17611)
removed ts, so we no longer need the redirection.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-03-29 14:52:06 -07:00
Aman Agrawal 9c4d9dbaf1 ci: Regenerate bionic and focal containers.
This helps us reduce time to update dependencies on every CI
build since the previous containers used to take about 1 minute.

`sudo` had a bug due to which we were not able to create directories.
See https://github.com/sudo-project/sudo/issues/42.
We used these directories to restore caches.
Upgrading the focal dependencies via this commit naturally fixes that
bug.

Fixes #17854
2021-03-29 12:26:56 +05:30
Aman Agrawal e0ed9cc605 ci: Remove CircleCI workaround for buggy CPU count allocation.
GitHub Actions gives us 2 cpus (probably shared) to run the
jobs. Specifying 6 processes here doesn't make a difference
since both jobs run in around 5 minutes right now.
2021-03-16 15:11:26 -07:00
Aman Agrawal f2a137f863 github_actions: Remove Codecov workaround.
Codecov has released the new version which fixes the find error.
Followup from 6a357ea114
2021-03-16 15:11:21 -07:00
Aman Agrawal 76c69b943c github_actions: Explode backend and frontend tests.
We basically move all the tests from backend and frontend test
files to zulip-ci workflow. This results in GitHub Actions
nicely displaying all the tests separately.
2021-03-16 15:11:21 -07:00
Aman Agrawal dd3cdd6ec5 github_actions: Stop logging timestamp.
Timestamps are logged automatically by GitHub Actions and can be
made visible using log settings easily. Hence we remove the
unnecessary timestamps here to make the logs look much cleaner.
2021-03-16 15:11:21 -07:00
Alex Vandiver 2dc0662a50 ci: Upload puppeteer artifacts on failure.
Storing the puppeteer artifacts is useful for debugging failures in
CI.

Confusingly, `if: ${{ something }}` does not work out to be true like
`if: ${{ always() && something }}` does; the former has a silent
`success()` built into it[1]:

> If your if expression does not contain any of the status functions
> it will automatically result with success().

[1] https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#job-status-check-functions
2021-03-10 12:00:47 -08:00
Aman Agrawal 80268c52df ci: Notify in zulip when a build fails in GitHub Actions.
We use the circleci integration which already has a nice setup
for sending messages when triggered to send the build failure
notification.
2021-02-26 08:29:56 -08:00
Tim Abbott b6ec66e972 github: Enable retention periods for uploaded artifacts.
This prevents Zulip CI from eventually consuming large amounts of
storage on one's GitHub account.

I picked a longer retention period for the Puppeteer artifacts because
humans look at those; the production tarballs are unlikely to be used
10 minutes after the run completes as they are just for the next stage
fo the build; certainly 14 days seems ample for any debugging.
2020-11-03 16:36:26 -08:00
Alex Vandiver 2b0bbbb882 tools: Rename postgres to postgresql in tool names. 2020-10-28 11:57:02 -07:00