For realm-wide exports, there is no reason to query
inefficiently against a list of modified users.
We move the Config out of the common child configs.
Even though Django usually treats foo__in
and foo_id__in identically for filters where
foo is a ForeignKey type, we want to insist
on somewhat more consistent syntax, because
we have the odd combo of type and type_id
in Recipient, where type_id is kinda like a
foreign key, but not a ForeignKey.
So we assert for now that all our include_rows
values end in "_id__in".
We don't have automated test coverage on this yet,
but below are the results from manual testing.
Note that we include the realm icon and logo even
though they were not created by Cordelia.
./manage.py export_single_user cordelia@zulip.com
$ (cd /tmp/zulip-export-4v3mo802/ && find .)
.
./emoji
./emoji/2
./emoji/2/emoji
./emoji/2/emoji/images
./emoji/2/emoji/images/3.jpg
./emoji/records.json
./messages-000001.json
./realm_icons
./realm_icons/2
./realm_icons/2/night_logo.original
./realm_icons/2/night_logo.png
./realm_icons/2/icon.png
./realm_icons/2/icon.original
./realm_icons/records.json
./avatars
./avatars/2
./avatars/2/c5125af0447f4d66ce34c1b32eac75ac27ebe0e7.original
./avatars/2/c5125af0447f4d66ce34c1b32eac75ac27ebe0e7.png
./avatars/records.json
./uploads
./uploads/2
./uploads/2/68
./uploads/2/68/xyEkC5dTIp8m42_6HJ3kBfdt
./uploads/2/68/xyEkC5dTIp8m42_6HJ3kBfdt/denver.jpg
./uploads/2/96
./uploads/2/96/ol5WE6RTUntvuPDSpJUrYTim
./uploads/2/96/ol5WE6RTUntvuPDSpJUrYTim/denver.jpg
./uploads/records.json
./user.json
There are tactical reasons to remove this assertion.
Basically, the reason it's safe to remove is that it's
been around a long time and we would have seen this
operationally. Also, the check to make sure that the
S3 filename thingy matches the avatar hash is a much
stronger check.
We will soon restore a stronger version of this check
that applies to all of our asset types (emojis/avatars/etc.).
This makes it easier to read the calling code and see
the big picture of how the four asset types are
organized.
I also handle uploads first, to be similar to the local
code.
This code is well tested--you can modify any of the callers
to pass in a wrong value of `object_key` and get a failing
test.
We add the following tables to the user export:
AlertWord
CustomProfileFieldValue
RealmAuditLog
Service
UserActivity
UserActivityInterval
UserCount
UserGroup
UserHotspot
UserPresence
UserTopic
Except for UserCount, we achieve this by sharing
code with the realm export via
add_user_profile_child_configs.
UserCount is handled slightly differently than realm
exports due to which key we trigger off.
It's possible that RealmAuditLog is incomplete for
single users, since we may also want rows where they
are the acting_user. This commit finds rows where
they are the modified_user. For non-admins I believe
it's rarely the case that they are the actor, and
they will tend to be the modified user if the two
fields are different at all. For admins it's
arguable we want to see both changes they enacted
as well as changes that affected them.
For export realm following changes have been made:
- `./manage.py export --upload` would delete `.tar.gz` and unpacked dir
- `./manage.py export` would only delete `unpacked dir`
Besides, we have removed `--delete-after-upload` as we have set it as
the default.
Fixes#20081
This is a follow-up to #19388.
We will in the future allow patch requests to change the visibility
of an existing topic, so `last_updated` is better name for this field.
This commit does not affect the API or events in any way, but only the
database.
Because we create all realms with do_create_user (including in the
test suite), we just need to change that function, add a migration for
existing realms, and ensure the data import code path correctly
creates these objects.
Note that the import code path will create a RealmUserDefault row with
default values if it is not present in the import data, which is
important for importing data from other tools like Slack.
These changes are all independent of each other; I just didn’t feel
like making dozens of commits for them.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This will be used to store the missedmessage events received
during the waiting time for email notifications (which is currently
2 minutes, hardcoded).
The change in `test_retention` is because we've set `on_delete=CASCADE`
for the message field this table.
The new query is like so:
```
DELETE FROM "zerver_missedmessageemailentry"
WHERE "zerver_missedmessageemailentry"."message_id" IN (
1545, 1546, 1547, 1548, 1549, 1550, 1551, 1552, 1553
)
```
model__id syntax implies needing a JOIN on the model table to fetch the
id. That's usually redundant, because the first table in the query
simply has a 'model_id' column, so the id can be fetched directly.
Django is actually smart enough to not do those redundant joins, but we
should still avoid this misguided syntax.
The exceptions are ManytoMany fields and queries doing a backward
relationship lookup. If "streams" is a many-to-many relationship, then
streams_id is invalid - streams__id syntax is needed. If "y" is a
foreign fields from X to Y:
class X:
y = models.ForeignKey(Y)
then object x of class X has the field x.y_id, but y of class Y doesn't
have y.x_id. Thus Y queries need to be done like
Y.objects.filter(x__id__in=some_list)
Tweaked exports.py to add the config object there so that our export
tool can include the table when exporting. Also includes all the
changes required to import the new table from the exported data.
Helper function `get_realm_playgrounds` added to fetch all
playgrounds in a realm.
Tests amended.
Adds backend code for the mute users feature.
This is just infrastructure work (database
interactions, helpers, tests, events, API docs
etc) and does not involve any behavioral/semantic
aspects of muted users.
Adds POST and DELETE endpoints, to keep the
URL scheme mostly consistent in terms of `users/me`.
TODOs:
1. Add tests for exporting `zulip_muteduser` database table.
2. Add dedicated methods to python-zulip-api to be used
in place of the current `client.call_endpoint` implementation.
While exporting analytics data we were using wrong table name
'zerver_analytics' in analytics config. Renamed it with
correct table name 'zerver_realm'.
datetime objects are not ordinarily JSON serializable. While both
ujson and orjson have special cases to serialize datetime objects,
they do it in different ways. So we want to fix the post-processing
code to do its job.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The S3 data export tool's upload code path uses this nice boto
callback feature for showing a progress bar, which is nice for the
management command. It's spammy/broken in production and the backend
tests, so we change percent_callback to be a parameter passed in so
that it can only be used in the contexts where it makes sense.
Also add a Draft object-to-dictionary conversion method.
The following commits will provide an API around this
model using which our clients can sync drafts across each
other (if they so wish too). As of making this commit, we
haven't finalized exactly how our clients will use this.
See https://chat.zulip.org/#narrow/stream/2-general/topic/drafts
For some of the discussion around this model and in general,
around this feature.
Signed-off-by: Hemanth V. Alluri <hdrive1999@gmail.com>
We also remove the post_process_data option.
The sanity check is just overkill at this point,
since the mechanism to find streams is very
direct due to a recent commit.
Before this change we would only export streams
that had actual subscribers, which is usually
harmless, but it was mostly a relic of a one
time migration that we did when we were cleaning
up some dirty data in some of our very early
databases (circa 2016).
Now we work down the table hierarchy in a
more natural way:
- get Streams in Realm
- get Recipients matching above Streams
- get Subscriptions matching above Recipients
Note that for per-user exports, I kept the
same logic (users -> subscriptions -> recipients ->
streams) we had before.
One subtle detail here is that we make our
final Config blocks--which build the final
version of Recipient/Subscription--now hang
off of realm_config.
Fixes#15146.
Generated by pyupgrade --py36-plus --keep-percent-format.
Now including %d, %i, %u, and multi-line strings.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Fixes#2665.
Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.
Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start. I expect this change will increase pressure for us to split
those files, which isn't a bad thing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Automatically generated by the following script, based on the output
of lint with flake8-comma:
import re
import sys
last_filename = None
last_row = None
lines = []
for msg in sys.stdin:
m = re.match(
r"\x1b\[35mflake8 \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
)
if m:
filename, row_str, col_str, err = m.groups()
row, col = int(row_str), int(col_str)
if filename == last_filename:
assert last_row != row
else:
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
with open(filename) as f:
lines = f.readlines()
last_filename = filename
last_row = row
line = lines[row - 1]
if err in ["C812", "C815"]:
lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
elif err in ["C819"]:
assert line[col - 2] == ","
lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")
if last_filename is not None:
with open(last_filename, "w") as f:
f.writelines(lines)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This commit adds three `.pysa` model files: `false_positives.pysa`
for ruling out false positive flows with `Sanitize` annotations,
`req_lib.pysa` for educating pysa about Zulip's `REQ()` pattern for
extracting user input, and `redirects.pysa` for capturing the risk
of open redirects within Zulip code. Additionally, this commit
introduces `mark_sanitized`, an identity function which can be used
to selectively clear taint in cases where `Sanitize` models will not
work. This commit also puts `mark_sanitized` to work removing known
false postive flows.
Prior to this change, there were reports of 500s in
production due to `export.extra_data` being a
Nonetype. This was reproducible using the s3
backend in development when a row was created in
the `RealmAuditLog` table, but the export failed in
the `DeferredWorker`. This left an entry lying
about that was never updated with an `extra_data`
field.
To fix this, we catch any exceptions in the
`DeferredWorker`, and then update `extra_data` to
encode the failure. We also fix the fact that we
never updated the export UI table with pending exports.
These changes also negated the use for the somewhat
hacky `clear_success_banner` logic.
Previously, alert words were a JSON list of strings stored in a
TextField on user_profile. That hacky model reflected the fact that
they were an early prototype feature.
This commit migrates from that to a separate table, 'AlertWord'. The
new AlertWord has user_profile, word, id and realm(denormalization so
we can provide a nice index for fetching all the alert words in a
realm).
This transition requires moving the logic for flushing the Alert Words
caches to their own independent feature.
Note that this commit should not be cherry-picked without the
following commit, which fixes case-sensitivity issues with Alert Words.
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This improves the approach of creating multiple parallel processes by
using subprocess.Popen() instead of run_parallel() and
subprocess.call() while exporting an organization's message
history. This prevents forking twice for individual subprocess.
While this has some performance benefit, the main reason to fix this
is that it fixes an issue with the data export web UI introduced in
run_parallel forks exited).
Fixes#12904.
Preparatory commit for making the email mirror use the database instead
of redis for missed message addresses.
This model will represent missed message email addresses, which
currently have their data stored in redis.
The redis data will be converted and migrated into these models and
the email mirror will start using them in the main commit.
Fixes#1727.
With the server down, apply migrations 0245 and 0246. 0246 will remove
the pub_date column, so it's essential that the previous migrations
ran correctly to copy data before running this.
Apparently, the Zulip notifications (and resulting emails) were
correct, but the download links inside the Zulip UI were incorrectly
not including S3 prefix on the URL, making them not work.
While we're at this, we rewrite the somewhat convoluted previous
system for formatting the data export output.
This feature is intended to cover all of our ways of exporting a
realm, not just the initial "public export" feature, so we should name
things appropriately for that goal.
Additionally, we don't want to include data exports in page_params;
the original implementation was actually buggy and would have.
The conditional block containing the tarball upload logic for both S3
and local uploads was deconstructed and moved to the more appropriate
location within `zerver/lib/upload.py`.
This change is preliminary refactoring in order to improve the test
mocking strategy related to `test_realm_export.py`.
What this allows is the ability to simply mock a return value from
`do_export_realm`. We can then use that value as a dummy url to
ensure a file has been served and can be retrieved.
We add a new model, ArchiveTransaction, to tie archived objects together
in a coherent way, according to the batches in which they are archived.
This enables making a better system for restoring from archive, and it
seems just more sensible to tie the archived objects in this way, rather
the somewhat vague setting of archive_timestamp to each object using
timezone_now().