Commit Graph

341 Commits

Author SHA1 Message Date
Mateusz Mandera 0abf60fd01 scheduled_message: Make export/import work.
Closes #25130 by addressing the import/export part of it.
2023-05-08 15:55:06 -07:00
Mateusz Mandera 780ef71891 export: Fix typo in variable name. 2023-05-08 15:55:06 -07:00
Mateusz Mandera 414658fc8e scheduled_message: Handle attachments properly.
Fixes #25414.

We add Attachment.scheduled_messages relation to track ScheduledMessages
which reference the attachment.

The import bits can be done after merging this, by updating #25345.
2023-05-08 09:56:02 -07:00
Tim Abbott 027b67be80 presence: Rewrite the backend data model.
This implements the core of the rewrite described in:

For the backend data model for UserPresence to one that supports much
more efficient queries and is more correct around handling of multiple
clients.  The main loss of functionality is that we no longer track
which Client sent presence data (so we will no longer be able to say
using UserPresence "the user was last online on their desktop 15
minutes ago, but was online with their phone 3 minutes ago").  If we
consider that information important for the occasional investigation
query, we have can construct that answer data via UserActivity
already.  It's not worth making Presence much more expensive/complex
to support it.

For slim_presence clients, this sends the same data format we sent
before, albeit with less complexity involved in constructing it.  Note
that we at present will always send both last_active_time and
last_connected_time; we may revisit that in the future.

This commit doesn't include the finalizing migration, which drops the
UserPresenceOld table.
The way to deploy is to start the backfill migration with the server
down and then start the server *without* the user_presence queue worker,
to let the migration finish without having new data interfering with it.
Once the migration is done, the queue worker can be started, leading to
the presence data catching up to the current state as the queue worker
goes over the queued up events and updating the UserPresence table.

Co-authored-by: Mateusz Mandera <mateusz.mandera@zulip.com>
2023-04-26 14:26:47 -07:00
Alex Vandiver e0eb074b23 export: Skip PreregistrationRealm data.
Much like PreregistrationUser rows, these do not make sense to export.
2023-04-24 09:48:25 -07:00
Mateusz Mandera ffa3aa8487 auth: Rewrite data model for tracking enabled auth backends.
So far, we've used the BitField .authentication_methods on Realm
for tracking which backends are enabled for an organization. This
however made it a pain to add new backends (requiring altering the
column and a migration - particularly troublesome if someone wanted to
create their own custom auth backend for their server).

Instead this will be tracked through the existence of the appropriate
rows in the RealmAuthenticationMethods table.
2023-04-18 09:22:56 -07:00
Alex Vandiver 113a8c4782 export: Make --deactivate-realm exports be imported as active. 2023-04-03 16:08:43 -07:00
Sahil Batra 7f1bf9d6ab models: Add PreregistrationRealm class.
This commit adds PreregistrationRealm class which will be
similar to PreregistrationUser and will store initial
information of the realm before its creation as we are
changing the organization creation flow as per #24307.

Fixes part of #24307.
2023-03-27 15:44:42 -07:00
Anders Kaseorg 6992d3297a ruff: Fix PIE810 Call `startswith` once with a `tuple`.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-08 16:40:35 -08:00
Anders Kaseorg 5b7c4206d7 ruff: Fix SIM300 Yoda conditions are discouraged.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-03 16:36:54 -08:00
Anders Kaseorg df001db1a9 black: Reformat with Black 23.
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.

(This does not actually upgrade our Python environment to Black 23
yet.)

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-02 10:40:13 -08:00
Anders Kaseorg ff1971f5ad ruff: Fix SIM105 Use `contextlib.suppress` instead of try-except-pass.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-23 11:18:36 -08:00
Anders Kaseorg b0e569f07c ruff: Fix SIM102 nested `if` statements.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-23 11:18:36 -08:00
Alex Vandiver 7ad06473b6 uploads: Add LOCAL_AVATARS_DIR / LOCAL_FILES_DIR computed settings.
This avoids strewing "avatars" and "files" constants throughout.
2023-01-09 18:23:58 -05:00
Alex Vandiver 8e68d68f32 uploads: Be consistent about first arguments to write_local_file.
Enforcing a consistent `type` helps us double-check that we're not
playing fast-and-loose with any file paths for local files.  As noted
in the comment, this is purely for defense-in-depth.

Passing `write_local_file` a consistent `type` requires removing the
"avatars" out of `realm_avatar_and_logo_path` -- which makes it
consistent across upload backends.

This, in turn, requires a compensatory change to zerver.lib.export, to
be explicit that the realm icons are exported from the avatars
directory.  This clarity is likely an improvement.
2023-01-09 18:23:58 -05:00
Alex Vandiver 7c0d414aff uploads: Split out S3 and local file backends into separate files.
The uploads file is large, and conceptually the S3 and local-file
backends are separable.
2023-01-09 18:23:58 -05:00
Anders Kaseorg 7216ba4813 ruff: Fix DTZ001 `datetime.datetime()` without `tzinfo` argument.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-04 16:25:07 -08:00
Anders Kaseorg bd884c88ed Fix typos caught by typos.
https://github.com/crate-ci/typos

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-03 11:09:50 -08:00
Anders Kaseorg 6e32684d09 export: Replace broken naive datetime warning with assertion.
‘logging.warning("Naive datetime:", item)’ is an invalid call that
crashes with “TypeError: not all arguments converted during string
formatting”.  I take that to mean this check has not been tripped in
the six years it’s been there, and can safely be replaced with an
error.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-12-27 10:33:47 -08:00
Anders Kaseorg 77c15547e6 ruff: Fix C414 Unnecessary `list` call within `sorted()`.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-11-03 12:10:15 -07:00
Mateusz Mandera e360b5e29b export: Remove unnecessary if in export with consent code.
This might be a bit cleaner.
2022-09-27 11:56:27 -07:00
Mateusz Mandera 318d7fd4cd export: Only export messages that a consenting user can access.
As mentioned in the TODO this commit deletes, the export with member
consent system was failing to account for the fact that if consenting
users only have access to a subset of messages of a stream with
protected history, only that subset should be exported - rather than all
the stream's messages.
2022-09-27 11:56:27 -07:00
Anders Kaseorg 9198fe4fac scim: Downgrade SCIMClient from a model to an ephemeral dataclass.
SCIMClient is a type-unsafe workaround for django-scim2’s conflation
of SCIM users with Django users.  Given that a SCIMClient is not a
UserProfile, it might as well not be a model at all, since it’s only
used to satisfy django-scim2’s request.user.is_authenticated queries.

This doesn’t solve the type safety issue with assigning a SCIMClient
to request.user, nor the performance issue with running the SCIM
middleware on non-SCIM requests.  But it reduces the risk of potential
consequences worse than crashing, since there’s no longer a
request.user.id for Django to confuse with the ID of an actual
UserProfile.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-09-26 11:36:48 -07:00
Sahil Batra 1e55e7641e export: Do not export direct_members and direct_subgroups field.
We do not need direct_members and direct_subgroups field of
UserGroup objects in the export data since we already have
UserGroupMembership and GroupGroupMembership object data.

While importing we keep these fields empty when creating
UserGroup objects and direct_members and direct_subgroups
fields will get set when UserGroupMembership and
GroupGroupMembership objects are created.

This change will also help us in further changes when we
will change the order of importing to import UserGroup
objects just after Realm objects.
2022-09-13 11:07:09 -07:00
Zixuan James Li 2382f1925d export: Add an isinstance check for orig_dt.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-08-12 17:08:04 -07:00
Mateusz Mandera cf74d7d140 realm_reactivation: Prevent realm reactivation link reuse.
This uses the approach analogical to EmailChangeStatus for email change
confirmation links.
2022-07-26 17:14:26 -07:00
Anders Kaseorg b35268e6bb CVE-2022-31134: Exclude private attachments from realm exports.
Zulip Server 2.1.0 and above have a UI tool, accessible only to server
owners and server administrators, which provides a way to download a
“public data” export. While this export tool is only accessible to
administrators, in many configurations server administrators are not
expected to have access to private messages and private
streams. However, the “public data” export which administrators could
generate contained the attachment contents for all attachments, even
those from private messages and streams.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-07-12 06:08:05 +00:00
Zixuan James Li 6c7b2d621e typing: Avoid redefinition of incompatible QuerySets.
The pattern of using the same variable to apply filters
or alter the `QuerySet` in other ways might produce `QuerySet`s
with incompatible types. This behavior is not allowed by mypy.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-07-07 11:27:43 -07:00
Zixuan James Li ab1bbdda65 typing: Broaden type annotations for QuerySet compatibility.
To explain the rationale of this change, for example, there is
`get_user_activity_summary` which accepts either a `Collection[UserActivity]`,
where `QuerySet[T]` is not strictly `Sequence[T]` because its slicing behavior
is different from the `Protocol`, making `Collection` necessary.

Similarily, we should have `Iterable[T]` instead of `List[T]` so that
`QuerySet[T]` will also be an acceptable subtype, or `Sequence[T]` when we
also expect it to be indexed.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-07-07 11:27:42 -07:00
Anders Kaseorg 2439914a50 settings: Add two_factor.plugins.phonenumber to INSTALLED_APPS.
I missed this in commit feff1d0411
(#22383) for upgrading to django-two-factor-auth 1.14.0.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-07-06 17:23:53 -07:00
Zixuan James Li fd9a0f4274 typing: Apply trivial none-checks with assertions as necessary.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-06-23 19:25:48 -07:00
Anders Kaseorg a2825e5984 python: Use Python 3.8 typing.{Protocol,TypedDict}.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-27 12:57:49 -07:00
Zixuan James Li 9d448e73d2 decorator: Remove cachify in favor of lru_cache.
`cachify` is essentially caching the return value of a function using only
the non-keyword-only arguments as the key.

The use case of the function in the backend can be sufficiently covered by
`functools.lru_cache` as an unbound cache. There is no signficant difference
apart from `cachify` overlooking keyword-only arguments, and
`functools.lru_cache` being conveniently typed.

Signed-off-by: Zixuan James Li <359101898@qq.com>
2022-04-14 12:44:35 -07:00
Mateusz Mandera d800ac33a0 push_notifications: Send user_uuid to the push bouncer.
Fixes #18017.

In previous commits, the change to the bouncer API was introduced to
support this and then a series of migrations added .uuid to
UserProfiles.

Now the code for self-hosted servers that makes requests
to the bouncer is changed to make use of it.
2022-03-14 17:47:30 -07:00
Anders Kaseorg b0ce4f1bce docs: Fix many spelling mistakes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-07 18:51:06 -08:00
Anders Kaseorg 27977eddeb export: Use tar -C to switch directories.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-12-17 08:01:53 -08:00
Tim Abbott 31842e1377 export: Fix empty realm_icons directory in single-user exports. 2021-12-10 12:05:34 -08:00
Steve Howell 9a39ca217f user export: Show less info for recipients.
For PM and huddles, show full names but no
emails or other crufty fields.
2021-12-09 17:20:01 -08:00
Steve Howell 6a5c407b05 user export: Be more selective about exported messages. 2021-12-09 17:20:01 -08:00
Steve Howell fa654fd7a0 user export: Ignore realm icon and logo.
These are not considered to be "personal"
info, even if you upload them, so we
don't export them.

Generally the only folks who upload
these are admins, who can easily get
them in other ways. In fact, anybody
can get these via the app.
2021-12-09 17:20:01 -08:00
Steve Howell 8f991f8eb1 export: Make sure messages are sorted **across** files.
We now ensure that all message ids are sorted BEFORE
we split them into batches.

We now do a few extra "slim" queries to get message
ids up front.

But, now, when we divide them into batches, we no
longer run 2 or 3 different complicated queries in
a loop. We just basically hydrate our message ids,
so `write_message_partials` should be easy to reason
about.

This change also means that for tiny realms with
< 1000 messages you will always have just one
json file, since we aggregate the ids from the
queries before batching.
2021-12-09 12:22:34 -08:00
Steve Howell cef0e11816 export: Add get_id_list_gently_from_database.
This is slightly overkill for the single-user use
case, but for small queries it's barely any overhead,
and it's a nice abstraction.
2021-12-09 12:22:34 -08:00
Steve Howell 8ea320812f user exports: Chunkify messages in sorted order.
This accomplishes a few things:

    * It extracts `chunkify` rather than having us
      clumsily track chunking-related stuff in a
      big loop that is doing other stuff.

    * It makes it so that all message ids
      in message-000001.json < message-000002.json.

    * It makes it easier for us to customize
      the messages we send to a single user
      (coming soon).

BTW we probably have a slicker version of chunkify
somewhere in our codebase, but I couldn't remember
where.
2021-12-09 12:22:34 -08:00
Steve Howell 2a73964e16 user export: Add reactions.
We may eventually try to attach these to the messages
in the message-NNNNNN.json files, but for now they're
fine in user.json.
2021-12-09 12:22:34 -08:00
Steve Howell f810833df5 export: Improve export_usermessages_batch.
We no longer jankily read our input file into
an "output" variable.  Instead, we do things
in a type-safe way.
2021-12-09 08:36:40 -08:00
Steve Howell 5c1e8cb8dc mypy: Add MessagePartial TypedDict. 2021-12-09 08:36:40 -08:00
Steve Howell 09c57a3f9f export: Log more consistently and sort ids.
Now all file writes go through our three
helper functions, and we consistently
write a single log message after the file
gets written.

I killed off write_message_exports, since
all but one of its callers can call
write_table_data, which automatically
sorts data.  In particular, our Message
and UserMessage data will now be sorted
by ids.
2021-12-09 08:36:40 -08:00
Steve Howell 6ec49951c6 minor: Avoid creating intermediate list for message_ids.
This probably just postpones the list creation until
Django builds the "IN" query, but semantically it's
good to work in sets where we don't have any
meaningful ordering of the list that gets used.
2021-12-08 16:12:54 -08:00
Steve Howell f8ed099d3c export: Sort table data for most tables.
This affects most of our tables, but it excludes
table(s) like Message that go through kind of
unique codepaths.
2021-12-08 16:12:54 -08:00
Steve Howell a1d3f12e53 refactor: Extract write_table_data().
The immediate benefit of this is stronger mypy
checks (avoiding the ugly union caused by message
files).

The subsequent commit will add sorting.

We have test coverage on all these lines insofar
as if you comment out the lines, tests will
explode (i.e. more than superficial line
coverage).
2021-12-08 16:12:54 -08:00