Commit Graph

315 Commits

Author SHA1 Message Date
Anders Kaseorg b35268e6bb CVE-2022-31134: Exclude private attachments from realm exports.
Zulip Server 2.1.0 and above have a UI tool, accessible only to server
owners and server administrators, which provides a way to download a
“public data” export. While this export tool is only accessible to
administrators, in many configurations server administrators are not
expected to have access to private messages and private
streams. However, the “public data” export which administrators could
generate contained the attachment contents for all attachments, even
those from private messages and streams.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-07-12 06:08:05 +00:00
Zixuan James Li 6c7b2d621e typing: Avoid redefinition of incompatible QuerySets.
The pattern of using the same variable to apply filters
or alter the `QuerySet` in other ways might produce `QuerySet`s
with incompatible types. This behavior is not allowed by mypy.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-07-07 11:27:43 -07:00
Zixuan James Li ab1bbdda65 typing: Broaden type annotations for QuerySet compatibility.
To explain the rationale of this change, for example, there is
`get_user_activity_summary` which accepts either a `Collection[UserActivity]`,
where `QuerySet[T]` is not strictly `Sequence[T]` because its slicing behavior
is different from the `Protocol`, making `Collection` necessary.

Similarily, we should have `Iterable[T]` instead of `List[T]` so that
`QuerySet[T]` will also be an acceptable subtype, or `Sequence[T]` when we
also expect it to be indexed.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-07-07 11:27:42 -07:00
Anders Kaseorg 2439914a50 settings: Add two_factor.plugins.phonenumber to INSTALLED_APPS.
I missed this in commit feff1d0411
(#22383) for upgrading to django-two-factor-auth 1.14.0.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-07-06 17:23:53 -07:00
Zixuan James Li fd9a0f4274 typing: Apply trivial none-checks with assertions as necessary.
Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-06-23 19:25:48 -07:00
Anders Kaseorg a2825e5984 python: Use Python 3.8 typing.{Protocol,TypedDict}.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-27 12:57:49 -07:00
Zixuan James Li 9d448e73d2 decorator: Remove cachify in favor of lru_cache.
`cachify` is essentially caching the return value of a function using only
the non-keyword-only arguments as the key.

The use case of the function in the backend can be sufficiently covered by
`functools.lru_cache` as an unbound cache. There is no signficant difference
apart from `cachify` overlooking keyword-only arguments, and
`functools.lru_cache` being conveniently typed.

Signed-off-by: Zixuan James Li <359101898@qq.com>
2022-04-14 12:44:35 -07:00
Mateusz Mandera d800ac33a0 push_notifications: Send user_uuid to the push bouncer.
Fixes #18017.

In previous commits, the change to the bouncer API was introduced to
support this and then a series of migrations added .uuid to
UserProfiles.

Now the code for self-hosted servers that makes requests
to the bouncer is changed to make use of it.
2022-03-14 17:47:30 -07:00
Anders Kaseorg b0ce4f1bce docs: Fix many spelling mistakes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-07 18:51:06 -08:00
Anders Kaseorg 27977eddeb export: Use tar -C to switch directories.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-12-17 08:01:53 -08:00
Tim Abbott 31842e1377 export: Fix empty realm_icons directory in single-user exports. 2021-12-10 12:05:34 -08:00
Steve Howell 9a39ca217f user export: Show less info for recipients.
For PM and huddles, show full names but no
emails or other crufty fields.
2021-12-09 17:20:01 -08:00
Steve Howell 6a5c407b05 user export: Be more selective about exported messages. 2021-12-09 17:20:01 -08:00
Steve Howell fa654fd7a0 user export: Ignore realm icon and logo.
These are not considered to be "personal"
info, even if you upload them, so we
don't export them.

Generally the only folks who upload
these are admins, who can easily get
them in other ways. In fact, anybody
can get these via the app.
2021-12-09 17:20:01 -08:00
Steve Howell 8f991f8eb1 export: Make sure messages are sorted **across** files.
We now ensure that all message ids are sorted BEFORE
we split them into batches.

We now do a few extra "slim" queries to get message
ids up front.

But, now, when we divide them into batches, we no
longer run 2 or 3 different complicated queries in
a loop. We just basically hydrate our message ids,
so `write_message_partials` should be easy to reason
about.

This change also means that for tiny realms with
< 1000 messages you will always have just one
json file, since we aggregate the ids from the
queries before batching.
2021-12-09 12:22:34 -08:00
Steve Howell cef0e11816 export: Add get_id_list_gently_from_database.
This is slightly overkill for the single-user use
case, but for small queries it's barely any overhead,
and it's a nice abstraction.
2021-12-09 12:22:34 -08:00
Steve Howell 8ea320812f user exports: Chunkify messages in sorted order.
This accomplishes a few things:

    * It extracts `chunkify` rather than having us
      clumsily track chunking-related stuff in a
      big loop that is doing other stuff.

    * It makes it so that all message ids
      in message-000001.json < message-000002.json.

    * It makes it easier for us to customize
      the messages we send to a single user
      (coming soon).

BTW we probably have a slicker version of chunkify
somewhere in our codebase, but I couldn't remember
where.
2021-12-09 12:22:34 -08:00
Steve Howell 2a73964e16 user export: Add reactions.
We may eventually try to attach these to the messages
in the message-NNNNNN.json files, but for now they're
fine in user.json.
2021-12-09 12:22:34 -08:00
Steve Howell f810833df5 export: Improve export_usermessages_batch.
We no longer jankily read our input file into
an "output" variable.  Instead, we do things
in a type-safe way.
2021-12-09 08:36:40 -08:00
Steve Howell 5c1e8cb8dc mypy: Add MessagePartial TypedDict. 2021-12-09 08:36:40 -08:00
Steve Howell 09c57a3f9f export: Log more consistently and sort ids.
Now all file writes go through our three
helper functions, and we consistently
write a single log message after the file
gets written.

I killed off write_message_exports, since
all but one of its callers can call
write_table_data, which automatically
sorts data.  In particular, our Message
and UserMessage data will now be sorted
by ids.
2021-12-09 08:36:40 -08:00
Steve Howell 6ec49951c6 minor: Avoid creating intermediate list for message_ids.
This probably just postpones the list creation until
Django builds the "IN" query, but semantically it's
good to work in sets where we don't have any
meaningful ordering of the list that gets used.
2021-12-08 16:12:54 -08:00
Steve Howell f8ed099d3c export: Sort table data for most tables.
This affects most of our tables, but it excludes
table(s) like Message that go through kind of
unique codepaths.
2021-12-08 16:12:54 -08:00
Steve Howell a1d3f12e53 refactor: Extract write_table_data().
The immediate benefit of this is stronger mypy
checks (avoiding the ugly union caused by message
files).

The subsequent commit will add sorting.

We have test coverage on all these lines insofar
as if you comment out the lines, tests will
explode (i.e. more than superficial line
coverage).
2021-12-08 16:12:54 -08:00
Steve Howell c76ca2d0df export: Sort records.json files by path. 2021-12-08 16:12:54 -08:00
Steve Howell 2ef38e3d48 refactor: Extract write_records_json_file. 2021-12-08 16:12:54 -08:00
Steve Howell b79cfc19ab user export: Broaden query for RealmAuditLog.
We now check acting_user as well as modified_user
to see if a row pertains to our exported user.
2021-12-08 16:01:38 -08:00
Steve Howell 927b04368e minor: Use virtual_parent for custom fetchers.
The distinction here wasn't super meaningful
due to the way we order our "elif" statements,
but we want to reserver "normal_parent" for the
majority of use cases, where you simply tell
the Config what the "foreign_key" is.
2021-12-08 15:58:07 -08:00
Steve Howell 50120a9387 export: Remove config parameter for custom fetchers. 2021-12-08 15:58:07 -08:00
Steve Howell 54a3a423e5 mypy: Fix CustomFetch=Any hack. 2021-12-08 15:58:07 -08:00
Steve Howell 4128b52ac5 export: Rename custom fetchers. 2021-12-08 15:58:07 -08:00
Steve Howell a2c4931316 exports: Use realm for RealmAuditLog in realm exports.
For realm-wide exports, there is no reason to query
inefficiently against a list of modified users.

We move the Config out of the common child configs.
2021-12-08 15:58:07 -08:00
Steve Howell 8dd3c1038f exports: Rename parent_key to include_rows.
Even though Django usually treats foo__in
and foo_id__in identically for filters where
foo is a ForeignKey type, we want to insist
on somewhat more consistent syntax, because
we have the odd combo of type and type_id
in Recipient, where type_id is kinda like a
foreign key, but not a ForeignKey.

So we assert for now that all our include_rows
values end in "_id__in".
2021-12-08 15:58:07 -08:00
Steve Howell 02207f47d5 minor: Move code blocks to be alphabetical. 2021-12-08 15:58:07 -08:00
Steve Howell aae9f1b6f5 export: Make Config errors more clear. 2021-12-08 15:58:07 -08:00
Steve Howell 6d09eab285 export: Export file images for single users.
We don't have automated test coverage on this yet,
but below are the results from manual testing.

Note that we include the realm icon and logo even
though they were not created by Cordelia.

    ./manage.py export_single_user cordelia@zulip.com

    $ (cd /tmp/zulip-export-4v3mo802/ && find .)
    .
    ./emoji
    ./emoji/2
    ./emoji/2/emoji
    ./emoji/2/emoji/images
    ./emoji/2/emoji/images/3.jpg
    ./emoji/records.json
    ./messages-000001.json
    ./realm_icons
    ./realm_icons/2
    ./realm_icons/2/night_logo.original
    ./realm_icons/2/night_logo.png
    ./realm_icons/2/icon.png
    ./realm_icons/2/icon.original
    ./realm_icons/records.json
    ./avatars
    ./avatars/2
    ./avatars/2/c5125af0447f4d66ce34c1b32eac75ac27ebe0e7.original
    ./avatars/2/c5125af0447f4d66ce34c1b32eac75ac27ebe0e7.png
    ./avatars/records.json
    ./uploads
    ./uploads/2
    ./uploads/2/68
    ./uploads/2/68/xyEkC5dTIp8m42_6HJ3kBfdt
    ./uploads/2/68/xyEkC5dTIp8m42_6HJ3kBfdt/denver.jpg
    ./uploads/2/96
    ./uploads/2/96/ol5WE6RTUntvuPDSpJUrYTim
    ./uploads/2/96/ol5WE6RTUntvuPDSpJUrYTim/denver.jpg
    ./uploads/records.json
    ./user.json
2021-12-07 11:16:52 -08:00
Steve Howell b8d9143318 export: Validate emoji paths.
(We lift the RealmEmoji query to be used by
both local and S3 storage helpers.)
2021-12-07 11:16:52 -08:00
Steve Howell ef6d9b10d2 refactor: Extract get_emoji_path. 2021-12-07 11:16:52 -08:00
Steve Howell 5a41904201 export: Add handle_system_bots flag.
We will set this to False for single-user exports.
2021-12-07 11:16:52 -08:00
Steve Howell 0e19deb558 exports: Limit s3 upload exports with path_id checks. 2021-12-07 11:16:52 -08:00
Steve Howell f6cbf931ae refactor: Pass attachments to export_uploads_from_local.
The next commit will use attachments in the s3 path.
2021-12-07 11:16:52 -08:00
Steve Howell 03f40a64d4 refactor: Pass valid_hashes to export_files_from_s3. 2021-12-07 11:16:52 -08:00
Steve Howell 15bc677f35 export: Pass users to export_avatars_from_local. 2021-12-07 11:16:52 -08:00
Steve Howell 42ecabe967 export: Add check_metadata flag. 2021-12-06 15:09:37 -08:00
Steve Howell 0166f13d83 s3 exports: Validate user metadata for all assets.
This preps us to download assets for just a single
user.
2021-12-06 15:09:37 -08:00
Steve Howell 946ab22bba refactor: Lift users query to caller.
This preps us to reuse this code for single users
(after a few more subsequent changes).
2021-12-06 15:09:37 -08:00
Steve Howell b0e5c1d3b9 export: Remove paranoid assertion.
There are tactical reasons to remove this assertion.

Basically, the reason it's safe to remove is that it's
been around a long time and we would have seen this
operationally. Also, the check to make sure that the
S3 filename thingy matches the avatar hash is a much
stronger check.

We will soon restore a stronger version of this check
that applies to all of our asset types (emojis/avatars/etc.).
2021-12-06 15:09:37 -08:00
Steve Howell 59951ae52b refactor: Move metadata checks for s3 export.
This technically broadens the check for user_profile_id,
but we write that metadata on every record.
2021-12-06 15:09:37 -08:00
Steve Howell a27a6a4548 refactor: Inline _check_key_metadata.
This was only called in one place.
2021-12-06 15:09:37 -08:00
Steve Howell db39948be5 refactor: Pass in flavor to export_files_from_s3. 2021-12-06 15:09:37 -08:00