zulip

Commit Graph

Author	SHA1	Message	Date
Mateusz Mandera	e360b5e29b	export: Remove unnecessary if in export with consent code. This might be a bit cleaner.	2022-09-27 11:56:27 -07:00
Mateusz Mandera	318d7fd4cd	export: Only export messages that a consenting user can access. As mentioned in the TODO this commit deletes, the export with member consent system was failing to account for the fact that if consenting users only have access to a subset of messages of a stream with protected history, only that subset should be exported - rather than all the stream's messages.	2022-09-27 11:56:27 -07:00
Anders Kaseorg	9198fe4fac	scim: Downgrade SCIMClient from a model to an ephemeral dataclass. SCIMClient is a type-unsafe workaround for django-scim2’s conflation of SCIM users with Django users. Given that a SCIMClient is not a UserProfile, it might as well not be a model at all, since it’s only used to satisfy django-scim2’s request.user.is_authenticated queries. This doesn’t solve the type safety issue with assigning a SCIMClient to request.user, nor the performance issue with running the SCIM middleware on non-SCIM requests. But it reduces the risk of potential consequences worse than crashing, since there’s no longer a request.user.id for Django to confuse with the ID of an actual UserProfile. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-09-26 11:36:48 -07:00
Sahil Batra	1e55e7641e	export: Do not export direct_members and direct_subgroups field. We do not need direct_members and direct_subgroups field of UserGroup objects in the export data since we already have UserGroupMembership and GroupGroupMembership object data. While importing we keep these fields empty when creating UserGroup objects and direct_members and direct_subgroups fields will get set when UserGroupMembership and GroupGroupMembership objects are created. This change will also help us in further changes when we will change the order of importing to import UserGroup objects just after Realm objects.	2022-09-13 11:07:09 -07:00
Zixuan James Li	2382f1925d	export: Add an isinstance check for orig_dt. Signed-off-by: Zixuan James Li <p359101898@gmail.com>	2022-08-12 17:08:04 -07:00
Mateusz Mandera	cf74d7d140	realm_reactivation: Prevent realm reactivation link reuse. This uses an approach analogous to EmailChangeStatus for email change confirmation links.	2022-07-26 17:14:26 -07:00
Anders Kaseorg	b35268e6bb	CVE-2022-31134: Exclude private attachments from realm exports. Zulip Server 2.1.0 and above have a UI tool, accessible only to server owners and server administrators, which provides a way to download a “public data” export. While this export tool is only accessible to administrators, in many configurations server administrators are not expected to have access to private messages and private streams. However, the “public data” export which administrators could generate contained the attachment contents for all attachments, even those from private messages and streams. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-07-12 06:08:05 +00:00
Zixuan James Li	6c7b2d621e	typing: Avoid redefinition of incompatible QuerySets. The pattern of using the same variable to apply filters or alter the `QuerySet` in other ways might produce `QuerySet`s with incompatible types. This behavior is not allowed by mypy. Signed-off-by: Zixuan James Li <p359101898@gmail.com>	2022-07-07 11:27:43 -07:00
Zixuan James Li	ab1bbdda65	typing: Broaden type annotations for QuerySet compatibility. To explain the rationale of this change, for example, there is `get_user_activity_summary` which accepts either a `Collection[UserActivity]`, where `QuerySet[T]` is not strictly `Sequence[T]` because its slicing behavior is different from the `Protocol`, making `Collection` necessary. Similarly, we should have `Iterable[T]` instead of `List[T]` so that `QuerySet[T]` will also be an acceptable subtype, or `Sequence[T]` when we also expect it to be indexed. Signed-off-by: Zixuan James Li <p359101898@gmail.com>	2022-07-07 11:27:42 -07:00
Anders Kaseorg	2439914a50	settings: Add two_factor.plugins.phonenumber to INSTALLED_APPS. I missed this in commit `feff1d0411` (#22383) for upgrading to django-two-factor-auth 1.14.0. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-07-06 17:23:53 -07:00
Zixuan James Li	fd9a0f4274	typing: Apply trivial none-checks with assertions as necessary. Signed-off-by: Zixuan James Li <p359101898@gmail.com>	2022-06-23 19:25:48 -07:00
Anders Kaseorg	a2825e5984	python: Use Python 3.8 typing.{Protocol,TypedDict}. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-04-27 12:57:49 -07:00
Zixuan James Li	9d448e73d2	decorator: Remove cachify in favor of lru_cache. `cachify` is essentially caching the return value of a function using only the non-keyword-only arguments as the key. The use case of the function in the backend can be sufficiently covered by `functools.lru_cache` as an unbound cache. There is no significant difference apart from `cachify` overlooking keyword-only arguments, and `functools.lru_cache` being conveniently typed. Signed-off-by: Zixuan James Li <359101898@qq.com>	2022-04-14 12:44:35 -07:00
Mateusz Mandera	d800ac33a0	push_notifications: Send user_uuid to the push bouncer. Fixes #18017. In previous commits, the change to the bouncer API was introduced to support this and then a series of migrations added .uuid to UserProfiles. Now the code for self-hosted servers that makes requests to the bouncer is changed to make use of it.	2022-03-14 17:47:30 -07:00
Anders Kaseorg	b0ce4f1bce	docs: Fix many spelling mistakes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-02-07 18:51:06 -08:00
Anders Kaseorg	27977eddeb	export: Use tar -C to switch directories. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-12-17 08:01:53 -08:00
Tim Abbott	31842e1377	export: Fix empty realm_icons directory in single-user exports.	2021-12-10 12:05:34 -08:00
Steve Howell	9a39ca217f	user export: Show less info for recipients. For PM and huddles, show full names but no emails or other crufty fields.	2021-12-09 17:20:01 -08:00
Steve Howell	6a5c407b05	user export: Be more selective about exported messages.	2021-12-09 17:20:01 -08:00
Steve Howell	fa654fd7a0	user export: Ignore realm icon and logo. These are not considered to be "personal" info, even if you upload them, so we don't export them. Generally the only folks who upload these are admins, who can easily get them in other ways. In fact, anybody can get these via the app.	2021-12-09 17:20:01 -08:00
Steve Howell	8f991f8eb1	export: Make sure messages are sorted across files. We now ensure that all message ids are sorted BEFORE we split them into batches. We now do a few extra "slim" queries to get message ids up front. But, now, when we divide them into batches, we no longer run 2 or 3 different complicated queries in a loop. We just basically hydrate our message ids, so `write_message_partials` should be easy to reason about. This change also means that for tiny realms with < 1000 messages you will always have just one json file, since we aggregate the ids from the queries before batching.	2021-12-09 12:22:34 -08:00
Steve Howell	cef0e11816	export: Add get_id_list_gently_from_database. This is slightly overkill for the single-user use case, but for small queries it's barely any overhead, and it's a nice abstraction.	2021-12-09 12:22:34 -08:00
Steve Howell	8ea320812f	user exports: Chunkify messages in sorted order. This accomplishes a few things: * It extracts `chunkify` rather than having us clumsily track chunking-related stuff in a big loop that is doing other stuff. * It makes it so that all message ids in message-000001.json < message-000002.json. * It makes it easier for us to customize the messages we send to a single user (coming soon). BTW we probably have a slicker version of chunkify somewhere in our codebase, but I couldn't remember where.	2021-12-09 12:22:34 -08:00
Steve Howell	2a73964e16	user export: Add reactions. We may eventually try to attach these to the messages in the message-NNNNNN.json files, but for now they're fine in user.json.	2021-12-09 12:22:34 -08:00
Steve Howell	f810833df5	export: Improve export_usermessages_batch. We no longer jankily read our input file into an "output" variable. Instead, we do things in a type-safe way.	2021-12-09 08:36:40 -08:00
Steve Howell	5c1e8cb8dc	mypy: Add MessagePartial TypedDict.	2021-12-09 08:36:40 -08:00
Steve Howell	09c57a3f9f	export: Log more consistently and sort ids. Now all file writes go through our three helper functions, and we consistently write a single log message after the file gets written. I killed off write_message_exports, since all but one of its callers can call write_table_data, which automatically sorts data. In particular, our Message and UserMessage data will now be sorted by ids.	2021-12-09 08:36:40 -08:00
Steve Howell	6ec49951c6	minor: Avoid creating intermediate list for message_ids. This probably just postpones the list creation until Django builds the "IN" query, but semantically it's good to work in sets where we don't have any meaningful ordering of the list that gets used.	2021-12-08 16:12:54 -08:00
Steve Howell	f8ed099d3c	export: Sort table data for most tables. This affects most of our tables, but it excludes table(s) like Message that go through kind of unique codepaths.	2021-12-08 16:12:54 -08:00
Steve Howell	a1d3f12e53	refactor: Extract write_table_data(). The immediate benefit of this is stronger mypy checks (avoiding the ugly union caused by message files). The subsequent commit will add sorting. We have test coverage on all these lines insofar as if you comment out the lines, tests will explode (i.e. more than superficial line coverage).	2021-12-08 16:12:54 -08:00
Steve Howell	c76ca2d0df	export: Sort records.json files by path.	2021-12-08 16:12:54 -08:00
Steve Howell	2ef38e3d48	refactor: Extract write_records_json_file.	2021-12-08 16:12:54 -08:00
Steve Howell	b79cfc19ab	user export: Broaden query for RealmAuditLog. We now check acting_user as well as modified_user to see if a row pertains to our exported user.	2021-12-08 16:01:38 -08:00
Steve Howell	927b04368e	minor: Use virtual_parent for custom fetchers. The distinction here wasn't super meaningful due to the way we order our "elif" statements, but we want to reserve "normal_parent" for the majority of use cases, where you simply tell the Config what the "foreign_key" is.	2021-12-08 15:58:07 -08:00
Steve Howell	50120a9387	export: Remove config parameter for custom fetchers.	2021-12-08 15:58:07 -08:00
Steve Howell	54a3a423e5	mypy: Fix CustomFetch=Any hack.	2021-12-08 15:58:07 -08:00
Steve Howell	4128b52ac5	export: Rename custom fetchers.	2021-12-08 15:58:07 -08:00
Steve Howell	a2c4931316	exports: Use realm for RealmAuditLog in realm exports. For realm-wide exports, there is no reason to query inefficiently against a list of modified users. We move the Config out of the common child configs.	2021-12-08 15:58:07 -08:00
Steve Howell	8dd3c1038f	exports: Rename parent_key to include_rows. Even though Django usually treats foo__in and foo_id__in identically for filters where foo is a ForeignKey type, we want to insist on somewhat more consistent syntax, because we have the odd combo of type and type_id in Recipient, where type_id is kinda like a foreign key, but not a ForeignKey. So we assert for now that all our include_rows values end in "_id__in".	2021-12-08 15:58:07 -08:00
Steve Howell	02207f47d5	minor: Move code blocks to be alphabetical.	2021-12-08 15:58:07 -08:00
Steve Howell	aae9f1b6f5	export: Make Config errors more clear.	2021-12-08 15:58:07 -08:00
Steve Howell	6d09eab285	export: Export file images for single users. We don't have automated test coverage on this yet, but below are the results from manual testing. Note that we include the realm icon and logo even though they were not created by Cordelia. ./manage.py export_single_user cordelia@zulip.com $ (cd /tmp/zulip-export-4v3mo802/ && find .) . ./emoji ./emoji/2 ./emoji/2/emoji ./emoji/2/emoji/images ./emoji/2/emoji/images/3.jpg ./emoji/records.json ./messages-000001.json ./realm_icons ./realm_icons/2 ./realm_icons/2/night_logo.original ./realm_icons/2/night_logo.png ./realm_icons/2/icon.png ./realm_icons/2/icon.original ./realm_icons/records.json ./avatars ./avatars/2 ./avatars/2/c5125af0447f4d66ce34c1b32eac75ac27ebe0e7.original ./avatars/2/c5125af0447f4d66ce34c1b32eac75ac27ebe0e7.png ./avatars/records.json ./uploads ./uploads/2 ./uploads/2/68 ./uploads/2/68/xyEkC5dTIp8m42_6HJ3kBfdt ./uploads/2/68/xyEkC5dTIp8m42_6HJ3kBfdt/denver.jpg ./uploads/2/96 ./uploads/2/96/ol5WE6RTUntvuPDSpJUrYTim ./uploads/2/96/ol5WE6RTUntvuPDSpJUrYTim/denver.jpg ./uploads/records.json ./user.json	2021-12-07 11:16:52 -08:00
Steve Howell	b8d9143318	export: Validate emoji paths. (We lift the RealmEmoji query to be used by both local and S3 storage helpers.)	2021-12-07 11:16:52 -08:00
Steve Howell	ef6d9b10d2	refactor: Extract get_emoji_path.	2021-12-07 11:16:52 -08:00
Steve Howell	5a41904201	export: Add handle_system_bots flag. We will set this to False for single-user exports.	2021-12-07 11:16:52 -08:00
Steve Howell	0e19deb558	exports: Limit s3 upload exports with path_id checks.	2021-12-07 11:16:52 -08:00
Steve Howell	f6cbf931ae	refactor: Pass attachments to export_uploads_from_local. The next commit will use attachments in the s3 path.	2021-12-07 11:16:52 -08:00
Steve Howell	03f40a64d4	refactor: Pass valid_hashes to export_files_from_s3.	2021-12-07 11:16:52 -08:00
Steve Howell	15bc677f35	export: Pass users to export_avatars_from_local.	2021-12-07 11:16:52 -08:00
Steve Howell	42ecabe967	export: Add check_metadata flag.	2021-12-06 15:09:37 -08:00
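
Commits 8ea320812f and 8f991f8eb1 above describe collecting message ids up front, sorting them, and only then splitting them into batches, so that every id in one message file is lower than every id in the next. The following is a minimal sketch of that idea, not Zulip's actual code: the `chunkify` signature, the `batch_sorted_ids` helper, and the sample ids are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List


def chunkify(ids: List[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most chunk_size ids (hypothetical signature)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(ids), chunk_size):
        yield ids[start : start + chunk_size]


def batch_sorted_ids(id_sources: Iterable[Iterable[int]], chunk_size: int) -> List[List[int]]:
    """Merge ids from several queries, dedupe and sort them, then batch.

    Hypothetical helper: sorting happens BEFORE batching, so batch N
    only contains ids smaller than those in batch N + 1.
    """
    all_ids = sorted(set().union(*id_sources))
    return list(chunkify(all_ids, chunk_size))


# Two overlapping id lists standing in for the "slim" id-only queries.
batches = batch_sorted_ids([[5, 1, 9], [3, 9, 2]], chunk_size=2)
print(batches)
```

Because the ids are aggregated before batching, an input smaller than the chunk size always produces a single batch, which matches the note in 8f991f8eb1 about tiny realms ending up with one JSON file.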

321 Commits