Commit Graph

83 Commits

Author SHA1 Message Date
Mateusz Mandera f8616fa013 analytics: Send ZULIP_MERGE_BASE to the bouncer. 2024-06-23 07:44:11 -07:00
Alex Vandiver 50c3dd88e6 models: Migrate ids of all non-Message-related tables to bigint.
Migrate all `ids` of anything which does not have a foreign key from
the Message or UserMessage table (and would thus require walking
those) to be `bigint`.  This is done by removing explicit
`BigAutoField`s, trading them for explicit `AutoField`s on the tables
to not be migrated, while updating `DEFAULT_AUTO_FIELD` to the new
default.

In general, the tables adjusted in this commit are small tables -- at
least compared to Messages and UserMessages.

Many-to-many tables without their own model class are adjusted by a
custom Operation, since they do not automatically pick up migrations
when `DEFAULT_AUTO_FIELD` changes[^1].

Note that this does multiple scans over tables to update foreign
keys[^2].  Large installs may wish to hand-optimize this using the
output of `./manage.py sqlmigrate` to join multiple `ALTER TABLE`
statements into one, to speed up the migration.  This is unfortunately
not possible to do generically, as constraint names may differ between
installations.

This leaves the following primary keys as non-`bigint`:
- `auth_group.id`
- `auth_group_permissions.id`
- `auth_permission.id`
- `django_content_type.id`
- `django_migrations.id`
- `otp_static_staticdevice.id`
- `otp_static_statictoken.id`
- `otp_totp_totpdevice.id`
- `two_factor_phonedevice.id`
- `zerver_archivedmessage.id`
- `zerver_client.id`
- `zerver_message.id`
- `zerver_realm.id`
- `zerver_recipient.id`
- `zerver_userprofile.id`

[^1]: https://code.djangoproject.com/ticket/32674
[^2]: https://code.djangoproject.com/ticket/24203
2024-06-05 11:48:27 -07:00
Alex Vandiver 4f4725f810 analytics: Migrate models' id columns to bigint.
This helps prevent wraparound on exceedingly large and old installs,
particularly Zulip Cloud.  These are relatively simple migrations
since they are not referenced by any other tables; however, they are
quite large, and are actively used from Django by running servers,
making this not a migration which is possible to run without stopping
the server.

Use the escape hatch in the previous commit to temporarily pause
analytics writes while the migration happens.  This should make the
migration transparent to users, at the small cost of an artificial dip
in statistics (specifically, to push notification counts, and unread
message counts) while the migration runs.
2024-06-05 11:48:27 -07:00
Alex Vandiver b557297dd2 zilencer: Drop data which is no longer sent by remote servers. 2024-06-03 12:35:35 -07:00
Alex Vandiver 7607969f27 zilencer: Add a unique index on RemoteRealm counts with RemoteRealm objects. 2024-05-06 16:34:01 -07:00
Anders Kaseorg 570f3dd447 python: Reformat with Ruff formatter.
https://docs.astral.sh/ruff/formatter/

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-02-29 17:07:16 -08:00
Lauryn Menard e5c92a1ee6 remote-billing: Add index to RemoteRealmAuditLog for billing events.
When profiling the database queries for the remote support view,
getting the user counts for remote realms was identifed as an
expensive query. Adds an index on RemoteRealmAuditLog to improve
this relatively common query for remote billing information.
2024-02-22 16:30:06 -08:00
Lauryn Menard 47a5459637 zilencer: Add index on RemoteInstallationCount for remote activity.
When profiling the database query in `remote_activity.py`,
push_forwarded_count was identified as an expensive part of
the overall work. Adds an index on RemoteInstallationCount
so this is more efficient.
2024-02-01 12:01:16 -08:00
Mateusz Mandera cbfbdd7337 zilencer: Add last_request_datetime to RemoteRealm + RemoteZulipServer.
For the RemoteRealm case, we can only set this in endpoints where the
remote server sends us the realm_uuid. So we're missing that for the
endpoints:

- remotes/push/unregister and remotes/push/unregister/all
- remotes/push/test_notification

This should be added in a follow-up commit.
2024-01-05 13:09:09 -08:00
Mateusz Mandera fb5137f8b5 zilencer: Handle deleted realms nicely at server/analytics. 2023-12-15 09:18:26 -08:00
Tim Abbott 1757b88760 billing: Offer release announcement subscriptions.
Also avoid prompting for full name time more than once.
Adds TOS version field to Remote server user.

Co-authored-by: Karl Stolley <karl@zulip.com>
Co-authored-by: Aman Agrawal <amanagr@zulip.com>
2023-12-14 10:51:16 -08:00
Mateusz Mandera 651590c49a remote_billing: Store acting users in remote user audit logs. 2023-12-14 08:11:04 -08:00
Tim Abbott 6308e07e53 billing: Standardize remote server plan type IDs.
This will likely save us at least one headache.
2023-12-13 16:40:44 -08:00
Aman Agrawal b4e4ca14d5 models: Store `is_system_bot_realm` information for `RemoteRealm`.
This will help us filter out system bot realm and control
feature access to it.
2023-12-11 13:23:49 -08:00
Mateusz Mandera c800951966 remote_billing: Add some useful fields to Remote...User models.
These are useful for auditing and follow what we have for UserProfile.
And is_active will be used in the future when we add user deactivation.
2023-12-11 09:39:24 -08:00
Mateusz Mandera 423aebf98e remote_billing: Implement confirmation flow for RemoteRealm auth.
The way the flow goes now is this:
1. The user initiaties login via "Billing" in the gear menu.
2. That takes them to `/self-hosted-billing/` (possibly with a
   `next_page` param if we use that for some gear menu options).
3. The server queries the bouncer to give the user a link with a signed
   access token.
4. The user is redirected to that link (on `selfhosting.zulipchat.com`).
Now we have two cases, either the user is logging in for the first time
and already did in the past.
If this is the first time, we have:
5. The user is asked to fill in their email in a form that's shown,
   pre-filled with the value provided inside the signed access token.
   They POST this to the next endpoint.
6. The next endpoint sends a confirmation email to that address and asks
   the user to go check their email.
7. The user clicks the link in their email is taken to the
   from_confirmation endpoint.
8. Their initial RemoteBillingUser is created, a new signed link like in
   (3) is generated and they're transparently taken back to (4),
   where now that they have a RemoteBillingUser, they're handled
   just like a user who already logged in before:
If the user already logged in before, they go straight here:
9. "Confirm login" page - they're shown their information (email and
   full_name), can update
   their full name in the form if they want. They also accept ToS here
   if necessary. They POST this form back to
   the endpoint and finally have a logged in session.
10. They're redirected to billing (or `next_page`) now that they have
    access.
2023-12-10 16:15:28 -08:00
Anders Kaseorg f86becfc94 remote_server: Send API feature level along with Zulip version.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-09 12:01:22 -08:00
Mateusz Mandera abdfdeffe4 remote_billing: Implement confirmation flow for legacy servers.
For the last form (with Full Name and ToS consent field), this pretty
shamelessly re-uses and directly renders the
corporate/remote_realm_billing_finalize_login_confirmation.html
template. That's probably good in terms of re-use, but calls for a
clean-up commit that will generalize the name of this template and the
classes/ids in the HTML.
2023-12-08 23:49:10 -08:00
Prakhar Pratyush ed9b0d330d stripe: Raise 'MissingDataError' while fetching license count.
If the RemoteRealmAuditLog has stale data, it means the server
stopped or never uploaded data. We raise MissingDataError in such
cases when a user action led to calculating licenses count from
stale data.
2023-12-08 12:58:21 -08:00
Tim Abbott 5e721f4605 migrations: Add recently added indexes concurrently. 2023-12-05 18:22:23 -08:00
Mateusz Mandera d631c76747 zilencer: Add some indexes on Remote* models.
These are for making fix_remote_realm_foreign_keys more efficient.
2023-12-05 16:49:00 -08:00
Mateusz Mandera 250b52e3dc remote_billing: Add a "confirm login" page in RemoteRealm auth flow. 2023-12-05 11:34:57 -08:00
Mateusz Mandera 7f33d6f0ea zilencer: Tie RemotePushDeviceToken to RemoteRealm at registration.
This consists of the following pieces:
1. Makes servers using the bouncer send realm_uuid in requests for token
   registration. (Sidenote: realm_uuid is already sent in the "send
   notification" codepath as of
   48db4bf854)
2. This allows the bouncer to tie RemotePushDeviceToken to the
   RemoteRealm with matching realm_uuid at registration time.
3. Introduce handling of some potential weird edge cases around the
   realm_uuid and RemoteRealm objects in get_remote_realm_helper.
2023-12-03 09:51:45 -08:00
Aman Agrawal 4d60c3a96c models: Allow realm_id to be blank.
We cannot provide realm_id for some remote session logs.
2023-11-30 11:22:19 -08:00
Aman Agrawal 2795f11e3f models: Add org_type to RemoteZulipServer.
This is required to save sponsorship data for remote servers.
2023-11-29 19:04:32 -08:00
Mateusz Mandera 9b1a495e2c zilencer: Sync name and authentication_methods on RemoteRealm. 2023-11-29 15:54:38 -08:00
Alex Vandiver 98b68d7034 zilencer: Remove duplicates before adding unique indexes.
The recent #27818 naïvely added unique indexes, despite there being a
large number of existing violations.  This makes the migration
impossible to deploy.

Update the migration to de-duplicate rows, dropping all but the
first-by-id of each unique set.  This is equivalent to what
dd954749be does with `ignore_conflicts`.  We update the migration,
rather than making a new one, as any server which has somehow
successfully applied the migration apparently did not need to
de-duplicate anything.
2023-11-28 15:01:10 -08:00
Mateusz Mandera 02d5740f0f remote_realm: Add syncing of org_type. 2023-11-28 14:41:16 -08:00
Alex Vandiver 150c64ddd0 zilencer: Enforce uniqueness of server_id + remote_id.
This was previously just an index (not a unique one).  Enforce this
data constraint.
2023-11-28 09:46:48 -08:00
Alex Vandiver 49263ba69f migrations: Keep the existing constraints until the new ones are made.
This removes a window where more violations could enter, and also a
period where indexes which may be useful are lacking.
2023-11-21 21:02:37 -05:00
Alex Vandiver ae836ae007 zilencer: Apply partial unique constraints for null subgroups.
This applies f299f31340 but for the push bouncer receiving side.
This is particularly important as we start relying on the unique
constraints, via `ON CONFLICT ... IGNORE`, in subsequent commits.

Fixes: #12362.
2023-11-21 11:44:55 -08:00
Alex Vandiver 9bc41ca040 zilencer: Store the last-reported server version when storing analytics.
Servers since 216d2ec1bf (version 2.0.0)
have submitted this, but we have never stored it.
2023-11-20 14:36:27 -08:00
Mateusz Mandera 48db4bf854 counts: Add new mobile_pushes RemoteRealmCount stats.
This requires a bit of complexity to avoid a name collision in
COUNT_STATS with the RemoteInstallationCount stats with the same name.
2023-11-10 16:09:11 -08:00
Mateusz Mandera 1312c7ccd7 zilencer: Add mechanism to update RemoteRealm when Realm is changed.
This requires a migration to allow RemoteRealmAuditLog.remote_id to be
NULL, and to add a RemoteRealmAuditLog.remote_realm.
2023-11-08 15:54:22 -08:00
Mateusz Mandera 76e0511481 zilencer: Add new model RemoteRealm and send the data to the bouncer.
Add the new model for recording basic information about Realms on remote
server, to go with the other analytics data. Also adds necessary changes
to the bouncer endpoint and the send_analytics_to_push_bouncer()
function to submit such Realm information.
2023-11-08 15:54:22 -08:00
Greg Price f109e3b598 push_notifs: Backfill ios_app_id on bouncer. 2023-11-07 16:19:42 -08:00
Mateusz Mandera 2ecd7abc0d zilencer: Make BaseRemoteCount.remote_id field nullable. 2023-11-01 17:26:10 -07:00
Mateusz Mandera 3cbb651942 zilencer: Remove index on RemoteInstallationCount.remote_id.
As in 902498ec4f, we shouldn't need an
index on remote_id alone - only (server_id, remote_id) together.
2023-10-20 10:07:06 -07:00
Alex Vandiver 902498ec4f zilencer: Update remoterealm indexes.
There is no reason to have an index on just `realm_id` or `remote_id`,
as those values mean nothing outside of the scope of a specific
`server_id`.  Remove those never-used single-column indexes from the
two tables that have them.

By contrast, the pair of `server_id` and `remote_id` is quite useful
and specific -- it is a unique pair, and every POST of statistics from
a remote host requires looking up the highest `remote_id` for a given
`server_id`, which (without this index) is otherwise a quite large
scan.

Add a unique constraint, which (in PostgreSQL) is implemented as a
unique index.
2023-09-14 09:30:16 -07:00
Anders Kaseorg 0ce6dcb905 mypy: Upgrade mypy from 1.4.1 to 1.5.1.
_default_manager is the same as objects on most of our models. But
when a model class is stored in a variable, the type system doesn’t
know which model the variable is referring to, so it can’t know that
objects even exists (Django doesn’t add it if the user added a custom
manager of a different name). django-stubs used to incorrectly assume
it exists unconditionally, but it no longer does.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-09-07 17:51:42 -07:00
Zixuan James Li 30495cec58 migration: Rename extra_data_json to extra_data in audit log models.
This migration applies under the assumption that extra_data_json has
been populated for all existing and coming audit log entries.

- This removes the manual conversions back and forth for extra_data
throughout the codebase including the orjson.loads(), orjson.dumps(),
and str() calls.

- The custom handler used for converting Decimal is removed since
DjangoJSONEncoder handles that for extra_data.

- We remove None-checks for extra_data because it is now no longer
nullable.

- Meanwhile, we want the bouncer to support processing RealmAuditLog entries for
remote servers before and after the JSONField migration on extra_data.

- Since now extra_data should always be a dict for the newer remote
server, which is now migrated, the test cases are updated to create
RealmAuditLog objects by passing a dict for extra_data before
sending over the analytics data. Note that while JSONField allows for
non-dict values, a proper remote server always passes a dict for
extra_data.

- We still test out the legacy extra_data format because not all
remote servers have migrated to use JSONField extra_data.
This verifies that support for extra_data being a string or None has not
been dropped.

Co-authored-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2023-08-16 17:18:14 -07:00
Anders Kaseorg 2ae285af7c ruff: Fix PLR1714 Consider merging multiple comparisons.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-07-23 15:21:33 -07:00
Zixuan Li a0cf624eaa
migrations: Backfill extra_data_json for audit log entries.
This migration is reasonably complex because of various anomalies in existing
data.

Note that there are cases when extra_data does not contain data that is
proper json with possibly single quotes. Thus we need to use
"ast.literal_eval" to cover that.

There is also a special case for "event_type == USER_FULL_NAME_CHANGED",
where extra_data is a plain str. This event_type is only used for
RealmAuditLog, so the zilencer migration script does not need to handle
it.

The migration does not handle "event_type == REALM_DISCOUNT_CHANGED"
because ast.literal_eval only allow Python literals. We expect the admin
to populate the jsonified extra_data for extra_data_json manually
beforehand.

This chunks the backfilling migration to reduce potential block time.

The migration for zilencer is mostly similar to the one for zerver; except that
the backfill helper is added in a wrapper and unrelated events are
removed.

**Logging and error recovery**

We print out a warning when the extra_data_json field of an entry
would have been overwritten by a value inconsistent with what we derived
from extra_data. Usually this only happens when the extra_data was
corrupted before this migration. This prevents data loss by backing up
possibly corrupted data in extra_data_json with the keys
"inconsistent_old_extra_data" and "inconsistent_old_extra_data_json".
More roundtrips to the database are needed for inconsistent data, which are
expected to be infrequent.

This also outputs messages when there are audit log entries with decimals,
indicating that such entries are not backfilled. Do note that audit log
entries with decimals are not populated with "inconsistent_old_extra_data_*"
in the JSONField, because they are not overwritten.

For such audit log entries with "extra_data_json" marked as inconsistent,
we skip them in the migration.  Because when we have discovered anomalies in a
previous run, there is no need to overwrite them again nesting the extra keys
we added to it.

**Testing**

We create a migration test case utilizing the property of bulk_create
that it doesn't call our modified save method.

We extend ZulipTestCase to support verifying console output at the test
case level. The implementation is crude but the use case should be rare
enough that we don't need it to be too elaborate.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2023-07-15 09:43:23 -07:00
Anders Kaseorg 7e707270f0 models: Convert deprecated index_together option to indexes.
index_together is slated for removal in Django 5.1:
https://docs.djangoproject.com/en/4.2/internals/deprecation/#deprecation-removed-in-5-1

We set the optional index names to match the previously generated
index names to avoid adding new migrations.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-07-12 07:12:43 -07:00
Zixuan Li e39e04c3ce
migration: Add `extra_data_json` for audit log models.
Note that we use the DjangoJSONEncoder so that we have builtin support
for parsing Decimal and datetime.

During this intermediate state, the migration that creates
extra_data_json field has been run. We prepare for running the backfilling
migration that populates extra_data_json from extra_data.

This change implements double-write, which is important to keep the
state of extra data consistent. For most extra_data usage, this is
handled by the overriden `save` method on `AbstractRealmAuditLog`, where
we either generates extra_data_json using orjson.loads or
ast.literal_eval.

While backfilling ensures that old realm audit log entries have
extra_data_json populated, double-write ensures that any new entries
generated will also have extra_data_json set. So that we can then safely
rename extra_data_json to extra_data while ensuring the non-nullable
invariant.

For completeness, we additionally set RealmAuditLog.NEW_VALUE for
the USER_FULL_NAME_CHANGED event. This cannot be handled with the
overridden `save`.

This addresses: https://github.com/zulip/zulip/pull/23116#discussion_r1040277795

Note that extra_data_json at this point is not used yet. So the test
cases do not need to switch to testing extra_data_json. This is later
done after we rename extra_data_json to extra_data.

Double-write for the remote server audit logs is special, because we only
get the dumped bytes from an external source. Luckily, none of the
payload carries extra_data that is not generated using orjson.dumps for
audit logs of event types in SYNC_BILLING_EVENTS. This can be verified
by looking at:

`git grep -A 6 -E "event_type=.*(USER_CREATED|USER_ACTIVATED|USER_DEACTIVATED|USER_REACTIVATED|USER_ROLE_CHANGED|REALM_DEACTIVATED|REALM_REACTIVATED)"`

Therefore, we just need to populate extra_data_json doing an
orjson.loads call after a None-check.

Co-authored-by: Zixuan James Li <p359101898@gmail.com>
2023-06-07 12:14:43 -07:00
Anders Kaseorg 0628c3cac8 migrations: Import BaseDatabaseSchemaEditor from its canonical module.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-03-05 14:46:28 -08:00
Anders Kaseorg df001db1a9 black: Reformat with Black 23.
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.

(This does not actually upgrade our Python environment to Black 23
yet.)

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-02 10:40:13 -08:00
Zixuan James Li d5517932cd typing: Use BaseDatabaseSchemaEditor in place of DatabaseSchemaEditor.
This is a part of #18777.

Signed-off-by: Zixuan James Li <359101898@qq.com>
2022-05-30 14:18:53 -07:00
Mateusz Mandera f90beae616 zilencer: Drop the index from RemotePushDeviceToken.user_id.
The index isn't used, because our unique_index entries provide better
indexes for the queries.
2022-03-14 17:47:30 -07:00
Mateusz Mandera 0677c90170 zilencer: Change push bouncer API to accept uuids as user identifier.
This is the first step to making the full switch to self-hosted servers
use user uuids, per issue #18017. The old id format is still supported
of course, for backward compatibility.

This commit is separate in order to allow deploying *just* the bouncer
API change to production first.
2022-03-14 17:47:30 -07:00