When profiling the database query in `remote_activity.py`,
push_forwarded_count was identified as an expensive part of
the overall work. Adds an index on RemoteInstallationCount
so this is more efficient.
For the RemoteRealm case, we can only set this in endpoints where the
remote server sends us the realm_uuid. So we're missing that for the
endpoints:
- remotes/push/unregister and remotes/push/unregister/all
- remotes/push/test_notification
This should be added in a follow-up commit.
Also avoid prompting for full name time more than once.
Adds TOS version field to Remote server user.
Co-authored-by: Karl Stolley <karl@zulip.com>
Co-authored-by: Aman Agrawal <amanagr@zulip.com>
The way the flow goes now is this:
1. The user initiaties login via "Billing" in the gear menu.
2. That takes them to `/self-hosted-billing/` (possibly with a
`next_page` param if we use that for some gear menu options).
3. The server queries the bouncer to give the user a link with a signed
access token.
4. The user is redirected to that link (on `selfhosting.zulipchat.com`).
Now we have two cases, either the user is logging in for the first time
and already did in the past.
If this is the first time, we have:
5. The user is asked to fill in their email in a form that's shown,
pre-filled with the value provided inside the signed access token.
They POST this to the next endpoint.
6. The next endpoint sends a confirmation email to that address and asks
the user to go check their email.
7. The user clicks the link in their email is taken to the
from_confirmation endpoint.
8. Their initial RemoteBillingUser is created, a new signed link like in
(3) is generated and they're transparently taken back to (4),
where now that they have a RemoteBillingUser, they're handled
just like a user who already logged in before:
If the user already logged in before, they go straight here:
9. "Confirm login" page - they're shown their information (email and
full_name), can update
their full name in the form if they want. They also accept ToS here
if necessary. They POST this form back to
the endpoint and finally have a logged in session.
10. They're redirected to billing (or `next_page`) now that they have
access.
For the last form (with Full Name and ToS consent field), this pretty
shamelessly re-uses and directly renders the
corporate/remote_realm_billing_finalize_login_confirmation.html
template. That's probably good in terms of re-use, but calls for a
clean-up commit that will generalize the name of this template and the
classes/ids in the HTML.
If the RemoteRealmAuditLog has stale data, it means the server
stopped or never uploaded data. We raise MissingDataError in such
cases when a user action led to calculating licenses count from
stale data.
This consists of the following pieces:
1. Makes servers using the bouncer send realm_uuid in requests for token
registration. (Sidenote: realm_uuid is already sent in the "send
notification" codepath as of
48db4bf854)
2. This allows the bouncer to tie RemotePushDeviceToken to the
RemoteRealm with matching realm_uuid at registration time.
3. Introduce handling of some potential weird edge cases around the
realm_uuid and RemoteRealm objects in get_remote_realm_helper.
The recent #27818 naïvely added unique indexes, despite there being a
large number of existing violations. This makes the migration
impossible to deploy.
Update the migration to de-duplicate rows, dropping all but the
first-by-id of each unique set. This is equivalent to what
dd954749be does with `ignore_conflicts`. We update the migration,
rather than making a new one, as any server which has somehow
successfully applied the migration apparently did not need to
de-duplicate anything.
This applies f299f31340 but for the push bouncer receiving side.
This is particularly important as we start relying on the unique
constraints, via `ON CONFLICT ... IGNORE`, in subsequent commits.
Fixes: #12362.
Add the new model for recording basic information about Realms on remote
server, to go with the other analytics data. Also adds necessary changes
to the bouncer endpoint and the send_analytics_to_push_bouncer()
function to submit such Realm information.
There is no reason to have an index on just `realm_id` or `remote_id`,
as those values mean nothing outside of the scope of a specific
`server_id`. Remove those never-used single-column indexes from the
two tables that have them.
By contrast, the pair of `server_id` and `remote_id` is quite useful
and specific -- it is a unique pair, and every POST of statistics from
a remote host requires looking up the highest `remote_id` for a given
`server_id`, which (without this index) is otherwise a quite large
scan.
Add a unique constraint, which (in PostgreSQL) is implemented as a
unique index.
_default_manager is the same as objects on most of our models. But
when a model class is stored in a variable, the type system doesn’t
know which model the variable is referring to, so it can’t know that
objects even exists (Django doesn’t add it if the user added a custom
manager of a different name). django-stubs used to incorrectly assume
it exists unconditionally, but it no longer does.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This migration applies under the assumption that extra_data_json has
been populated for all existing and coming audit log entries.
- This removes the manual conversions back and forth for extra_data
throughout the codebase including the orjson.loads(), orjson.dumps(),
and str() calls.
- The custom handler used for converting Decimal is removed since
DjangoJSONEncoder handles that for extra_data.
- We remove None-checks for extra_data because it is now no longer
nullable.
- Meanwhile, we want the bouncer to support processing RealmAuditLog entries for
remote servers before and after the JSONField migration on extra_data.
- Since now extra_data should always be a dict for the newer remote
server, which is now migrated, the test cases are updated to create
RealmAuditLog objects by passing a dict for extra_data before
sending over the analytics data. Note that while JSONField allows for
non-dict values, a proper remote server always passes a dict for
extra_data.
- We still test out the legacy extra_data format because not all
remote servers have migrated to use JSONField extra_data.
This verifies that support for extra_data being a string or None has not
been dropped.
Co-authored-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
This migration is reasonably complex because of various anomalies in existing
data.
Note that there are cases when extra_data does not contain data that is
proper json with possibly single quotes. Thus we need to use
"ast.literal_eval" to cover that.
There is also a special case for "event_type == USER_FULL_NAME_CHANGED",
where extra_data is a plain str. This event_type is only used for
RealmAuditLog, so the zilencer migration script does not need to handle
it.
The migration does not handle "event_type == REALM_DISCOUNT_CHANGED"
because ast.literal_eval only allow Python literals. We expect the admin
to populate the jsonified extra_data for extra_data_json manually
beforehand.
This chunks the backfilling migration to reduce potential block time.
The migration for zilencer is mostly similar to the one for zerver; except that
the backfill helper is added in a wrapper and unrelated events are
removed.
**Logging and error recovery**
We print out a warning when the extra_data_json field of an entry
would have been overwritten by a value inconsistent with what we derived
from extra_data. Usually this only happens when the extra_data was
corrupted before this migration. This prevents data loss by backing up
possibly corrupted data in extra_data_json with the keys
"inconsistent_old_extra_data" and "inconsistent_old_extra_data_json".
More roundtrips to the database are needed for inconsistent data, which are
expected to be infrequent.
This also outputs messages when there are audit log entries with decimals,
indicating that such entries are not backfilled. Do note that audit log
entries with decimals are not populated with "inconsistent_old_extra_data_*"
in the JSONField, because they are not overwritten.
For such audit log entries with "extra_data_json" marked as inconsistent,
we skip them in the migration. Because when we have discovered anomalies in a
previous run, there is no need to overwrite them again nesting the extra keys
we added to it.
**Testing**
We create a migration test case utilizing the property of bulk_create
that it doesn't call our modified save method.
We extend ZulipTestCase to support verifying console output at the test
case level. The implementation is crude but the use case should be rare
enough that we don't need it to be too elaborate.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
Note that we use the DjangoJSONEncoder so that we have builtin support
for parsing Decimal and datetime.
During this intermediate state, the migration that creates
extra_data_json field has been run. We prepare for running the backfilling
migration that populates extra_data_json from extra_data.
This change implements double-write, which is important to keep the
state of extra data consistent. For most extra_data usage, this is
handled by the overriden `save` method on `AbstractRealmAuditLog`, where
we either generates extra_data_json using orjson.loads or
ast.literal_eval.
While backfilling ensures that old realm audit log entries have
extra_data_json populated, double-write ensures that any new entries
generated will also have extra_data_json set. So that we can then safely
rename extra_data_json to extra_data while ensuring the non-nullable
invariant.
For completeness, we additionally set RealmAuditLog.NEW_VALUE for
the USER_FULL_NAME_CHANGED event. This cannot be handled with the
overridden `save`.
This addresses: https://github.com/zulip/zulip/pull/23116#discussion_r1040277795
Note that extra_data_json at this point is not used yet. So the test
cases do not need to switch to testing extra_data_json. This is later
done after we rename extra_data_json to extra_data.
Double-write for the remote server audit logs is special, because we only
get the dumped bytes from an external source. Luckily, none of the
payload carries extra_data that is not generated using orjson.dumps for
audit logs of event types in SYNC_BILLING_EVENTS. This can be verified
by looking at:
`git grep -A 6 -E "event_type=.*(USER_CREATED|USER_ACTIVATED|USER_DEACTIVATED|USER_REACTIVATED|USER_ROLE_CHANGED|REALM_DEACTIVATED|REALM_REACTIVATED)"`
Therefore, we just need to populate extra_data_json doing an
orjson.loads call after a None-check.
Co-authored-by: Zixuan James Li <p359101898@gmail.com>
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.
(This does not actually upgrade our Python environment to Black 23
yet.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This is the first step to making the full switch to self-hosted servers
use user uuids, per issue #18017. The old id format is still supported
of course, for backward compatibility.
This commit is separate in order to allow deploying *just* the bouncer
API change to production first.
Given that these values are uuids, it's better to use UUIDField which is
meant for exactly that, rather than an arbitrary CharField.
This requires modifying some tests to use valid uuids.
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>