We add general code that will archive models that are tied to a specific
Message (such as Reactions and SubMessages). Certain details of the
model are grabbed from a list models_with_message_key, and then used to
create queries that will archive these database tables.
We put Reaction in that list in this commit, and add appropriate tests.
To have archiving of other analogical models (for example SubMessage),
one only needs to make an appropriate entry in the
models_with_message_key list.
Sometimes it's useful to run two copies of test-backend at the same
time. The problem with doing so is that we need to make sure no two
threads are using the same test database ID.
Previously, this worked only if at most one of those copies was
running in the single-threaded mode, because we used a random database
ID for the single-threaded code path, but the same IDs counting from 0
for the parallel code path.
Fix this, mostly, by generating a random start for the range of IDs
used by the process, and then counting off database IDs starting from
there (both in the parallel and non-paralllel modes).
There's still a very low probability race, see the TODO.
Additionally, there appear to be some other races with running two
copies of test-backend at the same time not related to the database.
See https://github.com/zulip/zulip/issues/12426 for a follow-up issue
that's sorta created by this.
The test-backend parallel test runner system doesn't actually use the
zulip_test database; instead, it creates its own databases off the
zulip_test_template database.
We were accidentally running `tools/generate_fixtures` even when there
are no changes, because this function is shared with the
tools/lib/test_server.py codebase, which needs us to do the work of
creating a test database for it off the zulip_test_template database.
Fixing this saves about 1.5s / 4s of the runtime of a single test.
Previously, if you exported a Zulip organization and then re-imported
it, we'd end up renumbering the user IDs and all direct foreign key
references to them in the database, but not the data-user-id
references in mentions. Fix this by parsing the message content and
doing that renumbering.
(Because we import raw markdown, not HTML, from third-party tools,
these changes won't affect data import from slack etc.)
Fixes the high-priority part of #11293.
Modifies the dict with the user info to include the key `bot_owner_id`
so it can be displayed in the user info popover.
Tests concerned with changing bot owner have been modified to have
number of events=2 because while updating the bot info, two events
are fired -- updating the `realm_bot` and `realm_user` since the
key `bot_owner_id` is a part of realm user info.
A unique path was created using the `LOCAL_UPLOADS_DIR` backend, similar
to the code used in `LocalUploadBackend`. The exported tarball was
copied to the directory, and an nginx url was created to serve the file
publicly.
Tweaked by tabbott to output an actual URL.
This cleans up the pattern for how we check which user is logged in
during Zulip's backend unit tests to be much more readable (replacing
the arcane session code that does this check).
We split archive_messages code into two functions: moving to archive and
cleanup. This allows cleaning up the tests - they can call
these functions directly instead of copying several lines of
archive_messages here and there in multiple tests.
This is probably a good idea for the production use case, since then
there's some consistency of behavior, and if we extend logging, one
knows exactly which realms were or were not executed before a logged
failure.
This fixes the nondeterministic test failures we've been seeing in CI:
if you use `-id` in that order_by, it happens consistently.
The upload option will no longer be limited to strictly S3 uploads. This
commit serves as a preliminary step for supporting LOCAL_UPLOADS_DIR as
part of the public only export feature.
log_and_report and its helper functions were mostly old code no longer
well adapted to how email mirror works currently, as well as having no
test coverage. We rewrite this part of the email to report errors in a
similar manner, and add tests for it. We're able to get rid of the
clunky and now useless debug_info dictionary in process message, as
log_and_report only needs the recipient email in its third argument.
The only place in which process_stream_message used debug_info was to
set the 'stream' key, which would only be used if ZulipEmailForwardError
was raised after this line in the code - which is impossible, because after
that line only send_zulip (which doesnt raise this exception) and
logger.info get called, then process_stream_message successfully returns
and then process_message succesfully returns as well. So this debug_info
code wasn't doing anything. We remove it.
Clients won't have access to user email addresses, and thus won't be
able to compute gravatars.
The tests for this are a bit messy, in large part because our tests
for get_events call subsections of it, rather than the main function.
This is a very old commit for #106, which has been on hiatus for a few
years. It was significantly modified by tabbott to:
* Improve coding style and variable names
* Update mypy annotations style
* Clean up the testing logic
* Update for API changes elsewhere in our system
But the actual runtime code is essentially unmodified from the
original work by Kirill.
It contains basic support for archiving Messages, UserMessages, and
Attachments with a nice test suite. It's still not usable in
production (e.g. it will probably break Reactions, SubMessages, etc.),
but upcoming commits will address that.
This is handy for code that needs to do something with the sent
message. We need it for a retention policy code path, but it seems
likely we'll use it a lot down the line.
This lets us handle directly in our tooling the user experience that
we document for exporting a realm with member consent (before, it
required unpleasant manual work).
We may be successfully able to get the page once, to get the content type, but
the server or network may go down and cause problems when fetching the page for
parsing its meta tags.
Currently, we only show previews for URLs which are HTML pages, which could
contain other media. We don't show previews for links to non-HTML pages, like
pdf documents or audio/video files. To verify that the URL posted is an HTML
page, we verify the content-type of the page, either using server headers or by
sniffing the content.
Closes#8358
`youtube.com/playlist?list=<list-id>` incorrectly matches the regex since the
change in 8afda1c1bb. The regex was modified to
match URLs of the form `youtu.be/<id>` and this playlist URL incorrectly matches
with the `<id>` set to `playlist`.
This commit avoids this match by verifying that the ID is not playlist.
This renames Subscription.in_home_view field to is_muted, for greater
clarity as to what it does just from seeing the setting name, without
having to look it up.
Also disabled an obsolete test_migrations test.
Fixes#10042.
Digest emails were disabled for soft deactivated users, since UserMessage
objects are created for such users lazily when they return.
We now compute the message list for gathering hot conversations by looking at
all the messages sent to the streams where the user is subscribed, while they
were subscribed.
Fixes#6297
If the text part of an email message didn't specify the charset in the
Content-Type header, the text content wouldn't be found. We fix this, by
assuming us-ascii charset in those cases, as specified by RFC6657:
https://tools.ietf.org/html/rfc6657
This commit migrates the Subscription's notification fields from a
BooleanField to a NullBooleanField where a value of None means to
inherit the value from user's profile.
Also includes a migrations to set the corresponding settings to None
if they match the user profile's values. This migration helps us in
getting rid of the weird "Apply to all" widget that we offered on
subscription settings page.
The mobile apps can't handle None appearing as the stream-level
notification settings, so for backwards-compatibility we arrange to
only send True/False to the mobile apps by applying those defaults
server-side. We introduce a notification_settings_null value within a
client_capabilities structure that newer versions of the mobile apps
can use to request the new model.
This mobile compatibility code is pretty effectively tested by the
existing test_events tests for the subscriptions subsystem.
Currently there's no way to tell the difference between "a server admin
deactivated a realm due to it being spammy" vs "a realm admin deactivated
the realm".
Due to my misreading the code and a sloppy search, I thought in
8218bf101c that
all_stream_subscription_logs didn't filter for streams.
While changing this, we'll switch to using `.modified_stream_id` for
potentially better performance.
This makes the implementation of `get_realm` consistent with its
declared return type of `Realm` rather than `Optional[Realm]`.
Fixes#12263.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This commit replaces the `create_stream_by_admins_only` setting with a
new `create_stream_policy` setting, which mirroring the structure of
the existing `invite_to_stream_policy`.
This is important preparation for migrating the waiting period feature
to be its own independent setting.
Fixes#12236.
Running the backend tests with a high number of processes can cause
unexpected errors with language changes. When certain tests that change
the default language, (without explicitly overriding the teardown method
to reset the default language), interleave with other tests that are
expecting the language to be in English, discrepancies arise.
This fixes a common nondeterministic test failure with high levels of
parallelization.
If a soft deactivated user had a subscription double-toggled without
any new messages being sent in between, add_missing_messages might
incorrectly process those two subscription changes in the wrong order.
Fortunately, the failure mode was usually to throw this exception:
django.db.utils.IntegrityError: duplicate key value violates unique
constraint "zerver_usermessage_user_profile_id_message_id_4936d0df_uniq"
DETAIL: Key (user_profile_id, message_id)=(4, 57) already exists.
Our unit tests actually had this precise setup some fraction of the
time, because a bit of the test setup code subscribed+unsubscribed the
target user without sending any messages in between, resulting in a
test failure something like 50% of the time.
The original exception was hard to reproduce reliably originally
(resulting in an extremely annoying nondetermnistic test failure), but
is easily reproducible by changing the "id" to "-id" in this change to
always mis-order the processing of those RealmAuditLog events.
Previously, our soft-deactivation logic incorrectly did not filter the
set of stream subscription changes to look at to only include the
target stream.
This could result in unspecified buggy behavior.
Break will do the same thing as continue here, as each iteration will
have the same result, and it's also worth explaining why this isn't
one layer up in the loop setup.
We should definitely be starting each test case with an empty copy of
the per-request caches, since their intended duration is even shorter
than a request.
This was masked by the fact that these caches are automatically
flushed when one makes an actual request to the Zulip API; so the
problems were only manifesting in tests like test_events, where we
call lower-level functions that access a per-request cache without
using the Zulip API.
The make_import_output_dir helper function used a path determined
primarily by the filename of the fixture being used, and expected to
have complete control over that path for the duration of the test.
This resulted in nondeterministic errors if our two test classes that
ran Mattermost import code ran at the same time.
This module is used to render the HTML of pages like our user documentation
into text for use in open graph previews of those articles. It provided somewhat
confusing output in the case that there were paragraph breaks in the original message,
because text with multiple paragraphs and list items does't read very well. This commit
adds `|` as a delimiter between paragraphs, and prefixes list items with a `*`.
Closes#12228
When an emoji is nested inside another inline tag - like em or strong -
it was getting double processed because of the way the inlinePattern
TreeProcessor runs (it runs recursively). With this fix, we set the
inner text of the emoji span as an AtomicString, preventing us from
double processing the emoji's text.
Fixes#11621
Test Plan:
* Add test case for **😄**, verify it passes.
* Go into local dev server and send "**😄**" to self and verify the DOM
does not have double <span> tags for the emoji.
* Run zerver.tests.test_push_notifications and verify the markdown test case matches
the text_content field properly
We create rate_limit_entity as a general rate-limiting function for
RateLimitedObjects, from code that was possible to abstract away from
rate_limit_user and that will be used for other kinds of rate limiting.
We make rate_limit_user use this new general framework from now.
This commit creates a new organization setting that determines whether
a user can invite other users to streams. Previously this was linked
to the waiting period threshold, but this was both not documented and
overly limiting.
With significant tweaks by tabbott to change the database model to not
involve two threshhold fields, edit the tests, etc.
This requires follow-up work to make the create stream policy setting
work how this code implies it should.
Fixes#12042.
Allow realms to specify the day of the week when the digest should be sent out.
When enqueue-ing digests, pick only the realms that chose the current weekday as
the day to send out digests.
The github-services model for how GitHub would send requests to this
legacy integration is no longer available since earlier in 2019.
Removing this integration also allows us to finally remove
authenticated_api_view, the legacy authentication model from 2013 that
had been used for this integration (and other features long since
upgraded).
A few functions that were used by the Beanstalk webhook are moved into
that webhook's implementation directly.
An endpoint was created in zerver/views. Basic rate-limiting was
implemented using RealmAuditLog. The idea here is to simply log each
export event as a realm_exported event. The number of events
occurring in the time delta is checked to ensure that the weekly
limit is not exceeded.
The event is published to the 'deferred_work' queue processor to
prevent the export process from being killed after 60s.
Upon completion of the export the realm admin(s) are notified.
This reverts commit fd9dd51d16 (#1815).
The issue described does not exist in Python 3, where urllib.parse now
_only_ accepts (Unicode) str and does the right thing with it. The
workaround was not being triggered and would have failed if it were.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This contains email of the user to whom notification is being
send. This has not been used in any past mobile releases, so it is
safe to remove it.
As user_id will be stable for the user, but not email. So it's better to
start consuming `user_id` instead of email on mobile.
Calls to `render_markdown_path` weren't getting cached since the context
argument is unhashable, and the `ignore_unhashable_lru_cache` decorator ignores
such calls. This commit adds a couple of more decorators - one which converts
dict arguments to the function to a dict items tuple, and another which converts
dict items tuple arguments back to dicts. These two decorators used along with
the `ignore_unhashable_lru_cache` decorator ensure that the calls to
`render_markdown_path` with the context dict argument are also cached.
The time to run zerver.tests.test_urls.PublicURLTest.test_public_urls drops by
about 50% from 8.4s to 4.1s with this commit. The time to run
zerver.tests.test_docs.DocPageTest.test_doc_endpoints drops by about 20% from
3.2s to 2.5s.
This fixes an issue where the hanging unordered list was not
rendering in blockquote; the problem was that we were not
adding an empty line(to satisfy the markdown) for hanging
unordered list if it is in blockquote. Both blockquote
and code block is fenced but we want to avoid rendering
the list if it's in the code block but not in blockquote.
Fixes: #11916.
This commit serves as the first step in supporting "public export" as a
webapp feature. The refactoring was done as a means to allow calling
the export logic from elsewhere in the codebase.
This is important because upcoming features will include slightly more
complex logic in post_process_state that we'd ideally like to be
included in what this suite tests.
This requires a few related changes:
* A small change to post_process_state to sort the realm_users objects
by user_id to ensure those data structures are stable.
* Improvements to the logic for checking if the initial state has
changed to use match_states for better output.
Extend the list of users that have to be notified when a message is
changed, so that in addition to users who have a UserMessage row, any
users who subscribed later to a stream with history public to
subscribers will also get the update.
Fixes: #8750.
This adds experimental support in /register for sending key
statistical data on the last 1000 private messages that the user is a
participant in. Because it's experimental, we require developers to
request it explicitly in production (we don't use these data yet in
the webapp, and it likely carries some perf cost).
We expect this to be extremely helpful in initializing the mobile app
user experience for showing recent private message conversations.
See the code comments, but this has been heavily optimized to be very
efficient and do all the filtering work at the database layer so that
we minimize network transit with the database.
Fixes#11944.
This is in response to a support ticket where the user had a closed left
sidebar, had added an organization, and then couldn't figure out how to
switch organizations. They had googled and found "The desktop app makes it
easy to switch between different organizations" in our help docs, which was
not sufficiently helpful.
Previously, we could 500 if an organization administrator scanned
possible PreregistrationUser IDs looking for a valid invitation they
can interact with.
They couldn't do anything, so no security issue, but this fixes that
case to just be a 400 error as it should be.
With the previous commit, fixes#1836.
As specified in the issue above, we make
get_email_gateway_message_string_from_address raise an exception if
it doesn't recognise the email gateway address pattern. Then, we make
appropriate adjustments in the codepaths which call this function.
These functions don't really belong in actions.py, so we move them out,
into email_mirror_helpers.py. They can't go directly into
email_mirror.py or we'd get circular imports resulting in ImportError.
The hope is that by having a shorter list of initial streams, it'll
avoid some potential confusion confusion about the value of topics.
At the very least, having 5 streams each with 1 topic was not a good
way to introduce Zulip.
This commit minimizes changes to the message content in
`send_initial_realm_messages` to keep the diff readable. Future commits will
reshape the content.
There were several problems with the old format:
* The sender was not necessarily the sender; it was the person who did
the deletion (which could be an organization administrator)
* It didn't include the ID of the sender, just the email address.
* It didn't include the recipient ID, instead having a semi-malformed
recipient_type_id under the weird name recipient_user_ids.
Since nothing was relying on the old behavior, we can just fix the
event structure.
Closes#2420
We add rate limiting (max X emails withing Y seconds per realm) to the
email mirror. By creating RateLimitedRealmMirror class, inheriting from
RateLimitedObject, and rate_limit_mirror_by_realm function, following a
mechanism used by rate_limit_user, we're able to have this
implementation mostly rely on the already existing, and proven over
time, rate_limiter.py code. The rules are configurable in settings.py in
RATE_LIMITING_MIRROR_REALM_RULES, analogically to RATE_LIMITING_RULES.
Rate limit verification happens in the MirrorWorker in
queue_processors.py. We don't rate limit missed message emails, as due
to using one time addresses, they're not a spam threat.
test_mirror_worker is adapted to the altered MirrorWorker code and a new
test - test_mirror_worker_rate_limiting is added in test_queue_worker.py
to provide coverage for these changes.
Fixes#9840.
Old addresses caused bugs in some cases with non-latin characters in
stream names (see issue number above). We switch to using django's
slugify helper function to convert stream names to full ascii, while
also getting rid of problematic non-alphanumeric characters, in a
reasonable way. See Django's documentation for slugify to see more about
how this function works.
Tests extended by tabbott to cover cases where we do end up with ascii.
To prepare for changing how the stream name gets encoded into mirror
email addresses while making sure old addresses keep working, we ignore
the stream_name part when receiving emails into the mirror and we only
look at the email_token to identify into which stream to mirror the
email.
Previously, these cache keys looked like:
:1:9c26164d3a393e316e0f8210efe270e08710d45astream_by_realm_and_name:...
Now, they look like this:
:1:9c26164d3a393e316e0f8210efe270e08710d45a:stream_by_realm_and_name:...
See the comment, but this is a significant performance optimization
for all of our pages using common_context, because this code path is
called more than a dozen times (recursively) by common_context.
Follow up on 92dc363. This modifies the ScheduledEmail model
and send_future_email to properly support multiple recipients.
Tweaked by tabbott to add some useful explanatory comments and fix
issues with the migration.
When soft deactivation is run for in "auto" mode (no emails are
specified and all users inactive for specified number of days are
deactivated), catch-up is also run in the "auto" mode if
AUTO_CATCH_UP_SOFT_DEACTIVATED_USERS is True.
Automatically catching up soft-deactivated users periodically would
ensure a good user experience for returning users, but on some servers
we may want to turn off this option to save on some disk space.
Fixes#8858, at least for the default configuration, by eliminating
the situation where there are a very large number of messages to recover.
A user who has been soft deactivated for a long time might have 10Ks of message
history that was "soft deactivated". It might take a minute or more to add
UserMessage rows for all of these messages, causing timeouts. So, we paginate
the creation of these UserMessage rows.
This field is primarily intended to support avoiding displaying the
"more topics" feature in new organizations and streams, where we might
know that all messages in the stream are already available in the
browser.
Based on original work by Roman Godov, and significantly modified by
tabbott.
The second migration involved here could be expensive on Zulip Cloud,
but is unlikely to be an issue on other servers.
Apparently, our new validator for stream color having a valid format
incorrectly handled colors that had duplicate characters in them.
(This is caused in part by the spectrum.js logic automatically
converting #ffff00 to #ff0, which our validator rejected). Given that
we had old stream colors in the #ff0 format in our database anyway for
legacy, there's no benefit to banning these colors.
In the future, we could imagine standardizing the format, but doing so
will require also changing the frontend to submit colors only in the
6-character format.
Fixes an issue reported in
https://github.com/zulip/zulip/issues/11845#issuecomment-471417073
Addresses point 2 of #10612. We use a regex to detect if a form
of FWD indicator is present at the beginning of the subject, which
means the message has been forwarded.
remove_quotations argument is added to a couple of functions where
it's necessary.
In filter_footer, the criteria for a line to be a possible beginning
of a footer is changed to line.strip() == "--", instead of
line.strip().startswith("--"), because the former would remove
quotations from plaintext emails. This change makes sense, because
RFC 3676 specifies ""-- " as the separator line between the body
and the signature of a message":
https://tools.ietf.org/html/rfc3676
We remove the 'subject' argument of process_stream_message and make
subject processing happen inside the function, as it's a more
appropriate place than the general process_message function and is
needed to have a good way of disabling removing quotations in forwarded
emails sent into the mirror.
Some urls which end with image file extensions (eg .jpg) may link to
html pages. This adds handling for linx.li, wikipedia.org and
pasteboard.co. If it is possible, we redirect to the actual image url
otherwise we do not attempt to render it as an image.
Fixes#10438.
When a Zephyr user deactivates their account, they should be
automatically turned into a mirror dummy user (so that other users can
continue to interact with them as normal for a Zephyr user who isn't
using Zulip).
Fixes part 3 of #10612. When sending an email to the email mirror to a
stream address, if "+show-sender" is added in the address, the stream
message will now include "From: <sender>" at the top.
This reverts commit ff90c0101c but keeps
the test cases added for reference.
This was reverted because it was both not a clean solution and created
other realm filters bugs involving dashes (etc.).
In commit de65a04 we can see that if the need ever arises to modify
how stream descriptions are rendered, we would need to make changes
at 5 different call points which can be quite cumbersome. So this
functionality has been extracted to a new method called
'render_stream_descriptions'.
This fixes an issue where invalid emoji name prevents following
emojis from rendering.
This reverts the code change in
8842349629, while still passing the
tests added in that commit (it seems the original commit had
misdiagnosed an ordering bug and thus introduced this issue).
Fixes: #11770.
The night logo synchronization on the settings page was perfect, but
the actual display logic had a few problems:
* We were including the realm_logo in context_processors, even though
it is only used in home.py.
* We used different variable names for the templating in navbar.html
than anywhere else the codebase.
* The behavior that the night logo would default to the day logo if
only one was uploaded was not correctly implemented for the navbar
position, either in the synchronization for updates code or the
logic in the navbar.html templates.
This commit leverages the ahocorasick algorithm to build a set of user_ids
that have their alert_words present in the message. It runs in linear time
of the order of length of the input message as opposed to number of
alert_words. This is after building a ahocorasick Automaton which runs
in O(number of alert_words in entire realm) which is usually cached.
This fixes an issue where blank lines between blocks were causing
auto-numbering of list to stop before the blank line resulting
in two separate numbered list instead of one.
Edited significantly by tabbott to explain the tricky details in the
comments.
Fixes: #11651.
Add `max_int_size` parameter to `to_non_negative_int()` in
decorator.py so it will be able to validate that the integer doesn't
exceed the integer maximum limit.
Fixes#11451
This is important for situations such as with our Zapier app,
where the requesting user may be a bot that would like to access
its owner's subscriptions.
Tweaked by tabbott to eliminate the 2^N growth of cases in
do_get_streams.
We want to use the baseline features of bugdown, but not fancy things
like inline URL previews, since the whole structure of stream
descriptions is to have a single-line thing supporting some
formatting.
The migration part of this change fixes a bug encountered by some
organizations upgrading from older versions of Zulip.
This allows us to have some features using bugdown rendering where
inline image previews will not be rendered (which would be problematic
for e.g. stream descriptions).
Guest users will just get an empty list of default streams; we also
hide the "Default streams" organization view from the guest users UI.
This is for consistency with not providing guest users the full list
of streams in an organization.
Fixing this involves fixing the backend to handle unchanged field
submissions of the Zoom credentials without trying to re-validate the
credentials (for performance) as well as to fetch the already-sent
secret.