
120 Commits

Author SHA1 Message Date
Zixuan James Li a081428ad2 user_groups: Make locks required for updating user group memberships.
**Background**

User groups are expected to comply with the DAG constraint for the
many-to-many inter-group membership. The check for this constraint has
to be performed recursively so that we can find all direct and indirect
subgroups of the user group to be added.
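
A minimal sketch of that recursive check in Python. It assumes a hypothetical in-memory representation (a dict mapping each group to its direct subgroups) rather than the database tables the real check runs against. Adding B as a subgroup of A keeps the graph acyclic exactly when A is neither B itself nor one of B's direct or indirect subgroups.

```python
from collections import deque


def recursive_subgroups(graph: dict[str, set[str]], root: str) -> set[str]:
    """Return root together with all of its direct and indirect subgroups."""
    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first walk over supergroup -> subgroup edges
        group = queue.popleft()
        for child in graph.get(group, set()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def can_add_subgroup(graph: dict[str, set[str]], supergroup: str, subgroup: str) -> bool:
    """Adding `subgroup` under `supergroup` creates a cycle iff `supergroup`
    is already `subgroup` itself or one of its recursive subgroups."""
    return supergroup not in recursive_subgroups(graph, subgroup)


# Hypothetical data: A contains B, and B contains C.
graph = {"A": {"B"}, "B": {"C"}}
print(can_add_subgroup(graph, "C", "D"))  # True: D is unrelated
print(can_add_subgroup(graph, "C", "A"))  # False: A -> B -> C -> A would be a cycle
```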

This kind of check is vulnerable to phantom reads, which are possible at
the default read committed isolation level, because we cannot guarantee
that the check is still valid when we are adding the subgroups to the
user group.

**Solution**

To avoid having another transaction concurrently update one of the
to-be-subgroups after the recursive check is done but before the subgroup
is added, we use SELECT FOR UPDATE to lock the user group rows.

The lock needs to be acquired when a group membership change is about
to occur, before any check has been conducted.

Suppose that we are adding subgroup B to supergroup A. The locking protocol
is specified as follows:

1. Acquire a lock for B and all its direct and indirect subgroups.
2. Acquire a lock for A.

For the removal of user groups, we acquire a lock for the user group to
be removed along with all its direct and indirect subgroups. This is the
special case A=B, which still complies with the protocol.
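
The protocol above can be sketched as a function that returns the groups to lock, in order. It uses the same hypothetical dict-of-subgroups representation and the illustrative name `groups_to_lock`. It models only the ordering, not the SELECT FOR UPDATE statements that actually take the row locks.

```python
def groups_to_lock(graph: dict[str, set[str]], supergroup: str, subgroup: str) -> list[str]:
    """Return group IDs in the order the protocol locks them when adding
    `subgroup` (B) to `supergroup` (A): step 1 locks B and all of its direct
    and indirect subgroups, step 2 locks A. Removal is the case A == B."""
    order: list[str] = []
    seen: set[str] = set()
    stack = [subgroup]
    while stack:  # step 1: depth-first walk over B and its subgroups
        group = stack.pop()
        if group in seen:
            continue
        seen.add(group)
        order.append(group)
        stack.extend(sorted(graph.get(group, set())))
    if supergroup not in seen:  # step 2: lock A (already locked when A == B)
        order.append(supergroup)
    return order


# Hypothetical data: A contains B, and B contains C and D.
graph = {"A": {"B"}, "B": {"C", "D"}}
print(groups_to_lock(graph, "A", "B"))  # B and its subgroups first, A last
print(groups_to_lock(graph, "B", "B"))  # removal: only B and its subgroups
```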

**Error handling**

We currently rely on Postgres' deadlock detection to abort transactions
and show an error for the users. In the future, we might need some
recovery mechanism or at least better error handling.

**Notes**

An important note is that we need to reuse the recursive CTE query that
finds the direct and indirect subgroups when applying the lock on the
rows. The lock also needs to be acquired the same way for the addition
and removal of direct subgroups.

User membership changes (as opposed to user group membership changes) are
not affected, and neither are read-only queries. The locks only protect
critical regions where the user group dependency graph might violate
the DAG constraint, and users do not participate in that graph.

**Testing**

We implement a transaction test case targeting some typical scenarios
in which an internal server error is expected to happen (this means that the
user group view makes the correct decision to abort the transaction when
something goes wrong with locks).

To achieve this, we add a development view intended only for unit tests.
It has a global BARRIER that can be shared across threads, so that we
can synchronize them to consistently reproduce certain potential race
conditions prevented by the database locks.
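
The BARRIER idea can be sketched with Python's `threading.Barrier`. The handler below is a hypothetical stand-in for a request that pauses at the barrier. It only records events, which shows that neither thread proceeds past the barrier until both have arrived.

```python
import threading

# Hypothetical global barrier shared by the two request threads.
BARRIER = threading.Barrier(2, timeout=5)
events: list[tuple[str, str]] = []
events_lock = threading.Lock()


def request_handler(name: str) -> None:
    with events_lock:
        events.append(("arrived", name))  # e.g. the recursive check has finished
    BARRIER.wait()  # block until both threads reach this point
    with events_lock:
        events.append(("continued", name))  # both now enter the racy section together


threads = [threading.Thread(target=request_handler, args=(n,)) for n in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([phase for phase, _ in events])  # ['arrived', 'arrived', 'continued', 'continued']
```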

The transaction test case launches pairs of threads that initiate possibly
conflicting requests at the same time. The tests are set up such that exactly N
of them are expected to succeed with a certain error message (though we don't
know in advance which ones).
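
A self-contained sketch of one such pair: two threads start together and try to add conflicting memberships (A gets B as a subgroup, and B gets A). A `threading.Lock` stands in for the database row locks. The graph, the `add_subgroup` helper and the results dict are hypothetical. Because the check and the write happen under the same lock, exactly one request of the pair succeeds, though which one depends on scheduling.

```python
import threading

graph: dict[str, set[str]] = {"A": set(), "B": set()}
graph_lock = threading.Lock()  # stands in for the database row locks
start = threading.Barrier(2, timeout=5)
results: dict[str, bool] = {}


def reachable(src: str, dst: str) -> bool:
    """True if dst is src itself or one of its recursive subgroups."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return False


def add_subgroup(supergroup: str, subgroup: str) -> None:
    start.wait()  # launch both conflicting requests at the same time
    with graph_lock:  # the DAG check and the write happen atomically
        if reachable(subgroup, supergroup):
            results[f"{supergroup}->{subgroup}"] = False  # would create a cycle
        else:
            graph[supergroup].add(subgroup)
            results[f"{supergroup}->{subgroup}"] = True


pair = [threading.Thread(target=add_subgroup, args=args) for args in (("A", "B"), ("B", "A"))]
for t in pair:
    t.start()
for t in pair:
    t.join()
print(sum(results.values()))  # 1: exactly one of the conflicting requests succeeds
```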

**Security notes**

get_recursive_subgroups_for_groups will no longer fetch user groups from
other realms. As a result, trying to add/remove a subgroup from another
realm results in a UserGroup not found error response.

We also implement subgroup-specific checks in has_user_group_access to
keep permission management in a single place. Do note that the API
currently doesn't have a way to violate that check because we are only
checking the realm ID now.
2023-08-24 17:21:08 -07:00
Anders Kaseorg 124c5d02e5 ci: Restore commented clean_unused_caches.py invocation.
The comment logic doesn’t make sense.  Every build gets to write to
the caches; some builds do in fact add new items, and without
clean_unused_caches.py there’s no way for them to remove items.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-08-23 16:20:01 -07:00
Tim Abbott 396cedd0e8 ci: Reorder tests to run unique tests first.
As discussed in the comment, it doesn't really make sense for all 4 of
the jobs that we run in parallel for different platforms to start by
running the backend tests. While it's true that puppeteer will likely
fail if the backend doesn't run, and thus there's a mild prerequisite
relationship there, what is far more common is that the node tests fail
and the user unnecessarily waits 10 minutes for that feedback while all
the backend jobs run. This change lets us avoid that.
2023-08-09 17:15:51 -07:00
Anders Kaseorg d926144e13 ci: Fix pnpm store path for GitHub Actions.
This would ordinarily be determined by running ‘pnpm store path’, but
pnpm is not installed yet at that point.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-05-31 13:23:06 -07:00
Anders Kaseorg a4d897c42b ci: Remove unused pnpm cache from production install test.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-05-31 13:23:06 -07:00
Anders Kaseorg 12310189ed install: Support Debian 12.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-05-18 11:52:22 -07:00
Anders Kaseorg 16dedb08fd ci: Fix matrix definition for tests job.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-05-18 11:52:22 -07:00
Alex Vandiver 65c552e55a ci: Rename focal job to describe all it does. 2023-05-05 13:35:32 -07:00
Alex Vandiver 4a9424b207 ci: Stop trying to pull out the default extra-args.
This was preventing the 20.04 install from actually happening, as
GitHub was folding the two into one configuration.
2023-05-05 13:35:32 -07:00
Anders Kaseorg 033f561d94 ci: Run pnpm dedupe --check.
New in pnpm 8.3.0, this replaces the yarn-deduplicate check that was
removed in commit 3a27b12a7d (#24731).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-04-25 22:26:04 -07:00
Anders Kaseorg 341f6173aa ci: Enable XML coverage report to fix Codecov uploads.
This was broken by commit 534754442a
(#22039).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-03-31 15:51:43 -07:00
Anders Kaseorg 3a27b12a7d dependencies: Switch to pnpm.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-03-20 15:48:29 -07:00
Anders Kaseorg 8102556578 ci: Run generate-failure-message from the right path.
This was broken by commit 3a0620a40c
(#23719).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-03-09 13:20:24 -08:00
Anders Kaseorg 5a79ca251b check-database-compatibility: Drop .py from script name.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-03-03 18:02:37 -08:00
Anders Kaseorg 0ef8e88b17 webpack: Move webpack configuration to web.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-24 06:35:58 -08:00
Anders Kaseorg c1675913a2 web: Move web app to ‘web’ directory.
Ever since we started bundling the app with webpack, there’s been less
and less overlap between our ‘static’ directory (files belonging to
the frontend app) and Django’s interpretation of the ‘static’
directory (files served directly to the web).

Split the app out to its own ‘web’ directory outside of ‘static’, and
remove all the custom collectstatic --ignore rules.  This makes it
much clearer what’s actually being served to the web, and what’s being
bundled by webpack.  It also shrinks the release tarball by 3%.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-23 16:04:17 -08:00
Anders Kaseorg 7cafbefdef ci: Reduce production suite tarball retention from 14 days to 1.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-16 10:15:11 -05:00
Mateusz Mandera 9aacf76f0d do: Install pynacl in the oneclick job.
This is now a required dependency.
2023-01-24 10:33:41 -08:00
Anders Kaseorg 872f4b41c1 ci: Check that non-scripts aren’t marked executable.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-12-07 09:54:01 -08:00
Josh Klar a67ecc0f36 ci: Only report failures to CZO on branch pushes.
Targeted fix for regression introduced in #23719 wherein failure reports
were attempted for all CI failures, including those from forked pull
requests, which don't have access to Actions Secrets. Since undefined
Secrets are empty strings at interpolation time [^1], the underlying
`send-message` Action was being called with no API Key, causing a
failure in the failure handler.

This fix is, per discussion in both a comment on #23719 and later on CZO
[^2], preferred to restoring the prior guard against ZULIP_BOT_KEY being
an empty string that had been in the shell script, as it is more explicit
in its intent.

[^1]: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-using-secrets

[^2]: https://chat.zulip.org/#narrow/stream/43-automated-testing/topic/all.20branches.20failing/near/1475246
2022-12-06 17:43:54 -05:00
Josh Klar 3a0620a40c tools: Reimplement CI failure script without using CircleCI endpoint.
Using curl to POST to the CircleCI workflow endpoint on CZO:

- Doesn't work on zulip/zulip@main (CZO runs a revert)
- Sets a bad example for other orgs
- Robs us of an opportunity to dogfood our own zulip/github-actions-zulip

Refactor the Actions workflows in this repo to report failure states
using the Zulip Action, and reimplement the related helper scripts in
Python, since they'd previously mostly shelled out to Python anyway.
2022-12-05 14:33:15 -05:00
Alex Vandiver 1ad26a2a9a ci: Test upgrades from Zulip Server 6.0. 2022-11-28 20:21:28 -08:00
Alex Vandiver 63d2565467 ci: Do not pre-install rabbitmq-server in Docker images.
Before Zulip 4.9, the Zulip install process left any already-installed
rabbitmq with whatever nodename it had previously configured.  Since
this encodes the name of the host when it was installed, this does not
function well with containers.

Leave rabbitmq-server uninstalled, which lets the Zulip installation
process set the nodename to `localhost`, which ensures that it is
usable across container restarts.
2022-10-25 14:53:32 -04:00
Anders Kaseorg a8d72115eb ci: Fix custom database name test.
Caught by actionlint.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-08-30 17:33:37 -07:00
Varun Sharma 6cdf2853ff ci: Limit GitHub token permissions for workflows.
This limits the ability for an Action to do mischief with this token.

Fixes #22786.

Signed-off-by: Varun Sharma <varunsh@stepsecurity.io>
2022-08-29 17:12:55 -07:00
Vipul 35d56ea528 CI: Remove multiple hashFiles instances in a single step.
hashFiles supports passing multiple filenames, and using this feature results in
much cleaner keys.

Fixes: #22796
2022-08-29 10:37:10 -07:00
Alex Vandiver f8e2d652e1 ci: Test upgrades from the minimum of each major version, not the max. 2022-07-16 10:43:40 -07:00
Anders Kaseorg e8283b37b4 ci: Limit CodeQL analysis with the same branches for push, pull_request.
Silences “Warning: 1 issue was detected with this workflow: Please
make sure that every branch in on.pull_request is also in on.push so
that Code Scanning can compare pull requests against the state of the
base branch.”

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-07-07 14:51:51 -07:00
Anders Kaseorg acff0879e7 ci: Avoid duplicate GitHub Actions runs for push, pull_request.
We’ve always been running CI on both push events and pull_request
events, which means it runs twice for commits that are pushed to a
pull request.

Filter the push events by branch name.  Add the workflow_dispatch
event in case developers want to manually run CI on some other branch
that isn’t a pull request.

https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-07-06 17:31:07 -07:00
Anders Kaseorg 27fa91066c ci: Update GitHub Actions dependencies.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-07-05 15:54:46 -07:00
Anders Kaseorg 4a11642cee ci: Replace cancel-previous-runs job with concurrency configuration.
Using ‘github.head_ref || github.run_id’ makes this only cancel
in-progress jobs for pull_request events.

https://docs.github.com/en/actions/using-jobs/using-concurrency
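
A small Python sketch of why that expression only cancels pull_request runs. It assumes a group key of the form `${{ github.workflow }}-${{ github.head_ref || github.run_id }}`; the workflow-name prefix is an assumption. `head_ref` is only set for pull_request events and is empty otherwise, and the expression `||` returns its first truthy operand, which Python's `or` mirrors here.

```python
def concurrency_group(workflow: str, head_ref: str, run_id: int) -> str:
    """Mimic a group key of the (assumed) form
    "${{ github.workflow }}-${{ github.head_ref || github.run_id }}".

    head_ref is an empty string outside pull_request events, so those runs
    fall back to their unique run ID and never share a group.
    """
    return f"{workflow}-{head_ref or run_id}"


# Two pull_request runs for the same branch share a group, so the older one is cancelled.
print(concurrency_group("ci", "my-feature", 101) == concurrency_group("ci", "my-feature", 102))  # True
# Two push runs have no head_ref, so each gets its own group and neither is cancelled.
print(concurrency_group("ci", "", 103) == concurrency_group("ci", "", 104))  # False
```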

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-07-05 13:08:06 -07:00
Alex Vandiver 91379fd67e ci: Update upgrade test to 5.3, from 5.2. 2022-06-21 17:40:33 -07:00
Alex Vandiver bf562f8fff ci: Update upgrade test to 5.2, from 5.1. 2022-05-04 11:37:15 -07:00
Anders Kaseorg e952641013 install: Resupport Ubuntu 22.04.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-05-03 09:41:08 -07:00
Anders Kaseorg e8e0b045fc Revert "ci: Remove actions/cache@v2 steps from run due to failures."
This reverts commit ae24fe69ed.

The problem was fixed by GitHub.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-29 14:03:12 -07:00
Lauryn Menard ae24fe69ed ci: Remove actions/cache@v2 steps from run due to failures.
Comments out the steps in 'Create cache directories' that use
`actions/cache@v2` so that the CI and production build can pass
while the GitHub support issue is processed.

See https://github.com/actions/cache/issues/794 for an upstream report.
2022-04-29 10:14:51 -07:00
Tim Abbott 0c385fe01b ci: Only run documentation/link tests on a single job.
As noted in ReadTheDocs, it's very unlikely that these documentation
tests will pass or fail depending on the server's OS.
2022-04-26 17:26:41 -07:00
Anders Kaseorg a543dcc8e3 Remove Debian 10 support.
As a consequence:

• Bump minimum supported Python version to 3.8.
• Move Vagrant environment to Ubuntu 20.04, which has Python 3.8.
• Move CI frontend tests to Ubuntu 20.04.
• Move production build test to Ubuntu 20.04.
• Move 3.4 upgrade test to Ubuntu 20.04.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-26 16:32:02 -07:00
Alex Vandiver e2a3fe0930 ci: Test upgrades from 3.x, 4.x and 5.x. 2022-04-08 17:10:03 -07:00
Alex Vandiver d150236217 ci: Test upgrades from 4.11. 2022-03-15 16:00:02 -07:00
Anders Kaseorg 3848050456 ci: Temporarily disable Ubuntu 22.04.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-03-02 16:00:35 -08:00
Alex Vandiver 62f4f3435f ci: Test upgrades from 4.10. 2022-02-25 16:28:33 -08:00
Anders Kaseorg 894a50b5c9 install: Support Ubuntu 22.04.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-25 14:49:07 -08:00
Anders Kaseorg 170f4745dc ci: Ban check-database-compatibility.py from using static/generated.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-24 14:31:24 -08:00
Anders Kaseorg b3260bd610 docs: Use Debian and Ubuntu version numbers over development codenames.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-23 12:04:24 -08:00
Anders Kaseorg b0ce4f1bce docs: Fix many spelling mistakes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-07 18:51:06 -08:00
Alex Vandiver 2fc156e556 ci: Cache with the OS name, not the job name.
The job name is just the constant `production_build`.  Using the OS
name in the key instead ensures that the cache is not shared across OSes
(for instance between `4.x` and `main`, which are now bionic and buster,
respectively), and also allows it to share caches with the install
step, which uses the OS name in that place.
2022-01-24 14:29:49 -08:00
Anders Kaseorg a58a71ef43 Remove Ubuntu 18.04 support.
As a consequence:

• Bump minimum supported Python version to 3.7.
• Move Vagrant environment to Debian 10, which has Python 3.7.
• Move CI frontend tests to Debian 10.
• Move production build test to Debian 10.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-01-21 17:26:14 -08:00
Anders Kaseorg d035efd467 ci: Test upgrade-postgresql on Ubuntu 20.04.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-01-21 17:26:14 -08:00
Alex Vandiver 71b56f7c1c puppet: process_fts_updates connects as nagios (or provided username).
It should not use the configured zulip username, but should instead
pull from the login user (likely `nagios`), or an explicit alternate
provided PostgreSQL username.  Failure to do so results in Nagios
failures because the `nagios` login does not have permissions to
authenticate as the `zulip` PostgreSQL user.

This requires CI changes, as the install tests install as the `zulip`
login username, which allowed Nagios tests to pass previously; with
the custom database and username, however, they must be passed to
process_fts_updates explicitly when validating the install.
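
A minimal sketch of the username selection described above, using the hypothetical helper name `postgres_username`. Here `getpass.getuser()` stands in for looking up the login user, and the optional argument stands in for the explicitly provided alternate PostgreSQL username.

```python
import getpass
from typing import Optional


def postgres_username(explicit: Optional[str] = None) -> str:
    """Pick the PostgreSQL user to connect as (hypothetical helper).

    An explicitly provided username wins; otherwise fall back to the login
    user running the script (likely `nagios`), never the configured `zulip`
    user.
    """
    return explicit if explicit else getpass.getuser()


print(postgres_username("custom_user"))  # custom_user
```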
2021-12-14 14:48:53 -08:00