This was a pretty nasty error, where we were accidentally accessing
the parent list in this inner loop function.
This appears to have been introduced as a refactoring bug in
7822ef38c2.
The various vars here that had recipient_subject
in the name now have either bucket or bucket_tup
there.
The shorter names are a bit easier to read, and the
original names were misleading for the PM case.
This was basically two search/replaces, and we have
good test coverage here, so it's pretty low risk
despite the messy diff.
We now attach zulip_db_data to the markdown engines
for classes that need it. This was the last remaining
global we had, so we remove `arguments.py` here.
The Markdown processor makes it fairly simple for
the helper classes to access the `md` engine. We
now write `_md_engine.zulip_message` to avoid having
the current message in the global namespace.
Note that we do reuse engines for multiple messages,
but each engine is specific to a realm. And we therefore
avoid even the theoretical possibility of leaking message
data between realms.
This makes us consistent with how we import codehilite.
Using Python's normal import mechanism avoids some overhead
with Markdown having to parse dotted notation.
These modules are tiny, so they shouldn't impact startup
too much. Also, by explicitly importing them, we avoid
the pitfall of having a sucessful startup and a broken
renderer.
We were building the same link regex every time
we build a Markdown engine, which happens twice
per realm. It's an expensive operation due to
the complexity of the regex and us reading a file.
Nested classes are kind of expensive in Python,
particularly when you throw in mypy annotations.
Also, flatter is arguably better, although it is
kind of a pain here not to have closures.
This change avoids hitting the Django ORM when
we don't find any possible group mentions in
the message content.
Django doesn't necessarily actually hit the database,
but it's still slow and shows up in profiles.
This commit speeds up the import by avoiding
sender lookups and instead using the data
for users that we already have in memory.
This avoids a few DB hops, many hops to memcached,
plus some object construction.
We now call do_render_markdown() directly. This
also makes it more explicit that the import has
never rendered alert words.
For the import-data codepath, we will call
the extracted function directly in a
subsequent commit.
The do_render_markdown() function has more
required parameters, which allows for more
explicit code and also allows us to flatten
out some logic related to alert words. (We
just pass in empty sets/dicts as needed).
We can rely on `message_realm` being the same
as `message.sender.realm`, which allows us to
skip two queries to the database for the rare
Zephyr mirroring case.
This function requires a message object, whereas
we want to work with JSON data to avoid necessary
queries when we import data. Inlining the function
sets us up for a subsequent refactoring.
We change the way we deal with theoretical return
values of `None` to use an assertion; otherwise,
we would have to loosen up a bunch of mypy types
from `str` to `Optional[str]`. It's not clear `None`
is even possible--we've moved toward throwing exceptions
there instead of silently failing.
This is somewhat hairy logic, so it's nice
to extract it and not worry about variable leaks.
Also, this moves some legacy "subject" references out
of actions.py.
We start by including functions that do custom
queries for topic history.
The goal of this library is partly to quarantine
the legacy "subject" column on Message.
A recent change to check_send_webhook_message allows webhooks to
unescape stream names before sending a message. This commit adds
a test for the edge case where the webhook URL is escaped twice by
a third-party.
Recently, one of our users reported that a JIRA webhook was not
able to send messages to a stream with a space character in its
name. Turns out that JIRA does something weird with webhook URLs,
such that escaped space characters (%20) are escaped again, so
that when the request gets to Zulip, the double escaped %20 is
evaluated as the literal characters `%20`, and not as a space.
We fix this by unescaping the stream name on our end before
sending the message forward!
The previous logic was incorrect, in that if `content_type` was set to
None (which happens with Slack/HipChat export, among other things),
then we wouldn't run the `guess_type` logic to auto-detect the
Content-Type to send to S3.
The last_modified field is intended to support setting the
orig-last-modified field in the S3 backend when importing, basically
to keep track of this bit of pre-export data for debugging. In the
event that it isn't available, the correct thing to do is not write
out an invalid `last_modified` field; we should just not write it out
at all.
This fixes the fact that these emoji were sometimes not displaying
properly (because of changes in the emoji names used in the codebase),
while also making this integration more standard (since it was the
only one with such an aggressive use of emoji).
The UserMessage table can be huge, so creating a
bunch of entries in `ID_MAP` can overflow memory.
We don't have any tables that depend on `UserMessage`,
and we don't send the 'id' fields from `zerver_usermessage`
to the database, so re-mapping them was just busy-work.
This is a preparator refactor for supporting hosting different Tornado
processes on different servers; to look up which Tornado server we
should be sending the event to, we'll need the realm object.
This should make it possible for there to safely be multiple Tornado
processes running on different ports on the same system.
It may also fix a rare race bug in development, where previously, it
was possible for the Tornados processes for Casper and the main
development server to interfere; I haven't investigated whether this
was a real bug or not, but now those two services will use independent
Tornado files.
We still need to add something to direct traffic between the different
Tornado processes.
This reverts commit 3645bb9225.
This change was incorrect, because the `is_zephyr_mirror_realm`
property on Realm is a property and thus isn't available in the
migration codebase.
Since this migration is only run for very old servers, this should
have no impact.
Apparently, the QUERY_STRING property of the report object wasn't
actually a string; since we only care about its string representation,
we should just stringify it.
Apparently, we weren't resetting the query counters inside the
websockets codebase, resulting in broken log results like this:
SOCKET 403 2ms (db: 1ms/2q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 5ms (db: 2ms/3q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 2ms (db: 3ms/4q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 2ms (db: 3ms/5q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 2ms (db: 4ms/6q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 2ms (db: 5ms/7q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 2ms (db: 5ms/8q) /socket/auth [transport=websocket] (unknown via ?)
SOCKET 403 3ms (db: 6ms/9q) /socket/auth [transport=websocket] (unknown via ?)
The correct fix for this is to call reset_queries at the start of each
endpoint within the websockets system. As it turns out, we're already
calling record_request_start_data there, and in fact should be calling
`reset_queries` in all code paths that use that function (the other
code paths, in zerver/middleware.py, do it manually with
connection.connection.queries = []).
So we can clean up the code in a way that reduces risk for similar
future issues and fix this logging bug with this simple refactor.
Fixes#10745.
Use get_display_recipient to get stream names, and remove the
references to message.stream_name in push_notifications.py which were
added in 97571a203, as the actual stream names were being retrived
only for Message objects associated with public streams.
This is mostly an extraction, but it does change the
way we calculate `content`. We append the markdown
links from ALL files to any content that came in the
message itself.
Separating this out also allows us to add more
test coverage for the extracted code.
We now use subscriber_map for building UserMessage
rows in Slack/Gitter conversions.
This is mostly designed to simplify the code, rather
than having to scan the entire subscribers for each
message.
I am guessing this will improve performance for most
conversions. We sort small lists on every message,
in order to be deterministic, but the sorting cost
is probably more than offset by avoiding the O(N)
scans across all subscriptions. Also, it's probably
negligible in the grand scheme of things, compared
to JSON parsing, file I/O, etc.
This commits also fixes some typos with mentioned_users_id ->
mentioned_user_ids and cleans up a test a bit as well.
We now have all three third party
conversions (Gitter/Slack/Hipchat)
go through build_user_message().
Hipchat was already using this helper.
We also avoid callers having to pass in
an id to build_user_message().
When you send a message to a bot that wants
to talk via an outgoing webhook, and there's
an error (e.g. server is down), we send a
message to the bot's owner that links to the
message that triggered the error.
The code to produce those links was out of
date.
Now we move the important code to the
`url_encoding.py` library and fix the PM
links to use the more modern style (user_ids
instead of emails). We also replace "subject"
with "topic" in the stream urls.
This supports guest user in the user-info-form-modal as well as in the
role section of the admin-user-table.
With some fixes by Tim Abbott and Shubham Dhama.
The purpose of this commit is to pass information
to the frontend whether the message response recieved
has been limited due to plan restrictions or not.
To implement this, the backend for limiting the message
history had to be rewritten as we used to fetch
only the message rows whose id was greater than
first_visible_message_id. The filtered rows gives us
no information on whether the message history was
limited or not. So the backend was rewritten to not
do any restriction of limiting the message rows while
making the query. The limiting of rows is now done in
post_process_limited_query which will also return back
the value of history_limited flag.
Tweaked by tabbott to note a few cases where the results are
incorrect. I'm merging this despite those, because those cases don't
impact the correctness of the feature, and it may have tricky
performance implications to fix correctly.
Apparently, we weren't actually checking that found_oldest had the
correct value; fortunately, this didn't actually result in a problem,
because the values were always correct. But this will be important as
we start extending this test.
This is a preparatory commit which will help us with removing camo.
In the upcoming commits we introduce a new endpoint which is based
out on the setting CAMO_URI. Since camo could have been hosted on
a different server as well from the main Zulip server, this change
will help us realise in tests how that scenerio might be dealt with.
This will help us eliminate camo from our production installs.
Basically it helps us de duplicate some code from upcoming code
which will help us check validity of a camo url.
Also, rename get_alert_from_message to get_gcm_alert.
With the implementation of the and get_apns_alert_title and
get_apns_alert_subtitle, the logic within get_alert_from_message
is only relevant to the GCM payload, so we adjust the name
accordingly.
Progresses #9949.
Resolves https://github.com/zulip/zulip-mobile/issues/1316.
The string that is returned from get_alert_from_message is
dependent upon the same message that is passed into get_apns_payload
and get_gcm_payload. The contents of those payloads that are tested via
TestGetAPNsPayload and TestGetGCMPayload, which makes the tests for
get_alert_from_message redundant.
Also, simplify the logic by removing the last elif conditional.
If we use an outgoing webhook and the web server
responds with `widget_content` in the payload, we
include that in what we send through the send-message
codepath.
This makes outgoing webhook bots more consistent with
generic bots.
The test named `test_archiving_messages_with_attachment`
started flaking recently. We use sets for comparison
instead of lists to avoid arbitrary sorting differences.
Masking content can be useful for testing
out conversions where you're dealing
with data from customers and want to avoid
inadvertently reading their content (while
still having semi-realistic messages).
Having two smaller functions should make it
easier to customize the behavior for each specific
use case. The only reason they were ever coupled
was to keep ids in sequence, but the recent NEXT_ID
changes make that a non-issue now.
We now instantiate NEXT_ID in sequencer.py, which avoids
having multiple modules make multiple copies of a sequencer
and possibly causing id collisions.
Bots are not allowed to use the same name as
other users in the realm (either bot or human).
This is kind of a big commit, but I wanted to
combine the post/patch (aka add/edit) checks
into one commit, since it's a change in policy
that affects both codepaths.
A lot of the noise is in tests. We had good
coverage on the previous code, including some places
like event testing where we were expediently
not bothering to use different names for
different bots in some longer tests. And then
of course I test some new scenarios that are relevant
with the new policy.
There are two new functions:
check_bot_name_available:
very simple Django query
check_change_bot_full_name:
this diverges from the 3-line
check_change_full_name, where the latter
is still used for the "humans" use case
And then we just call those in appropriate places.
Note that there is still a loophole here
where you can get two bots with the same
name if you reactivate a bot named Fred
that was inactive when the second bot named
Fred was created. Also, we don't attempt
to fix historical data. So this commit
shouldn't be considered any kind of lockdown,
it's just meant to help people from
inadvertently creating two bots of the same
name where they don't intend to. For more
context, we are continuing to allow two
human users in the same realm to have the
same full name, and our code should generally
be tolerant of that possibility. (A good
example is our new mention syntax, which disambiguates
same-named people using ids.)
It's also worth noting that our web app client
doesn't try to scrub full_name from its payload in
situations where the user has actually only modified other
fields in the "Edit bot" UI. Starting here
we just handle this on the server, since it's
easy to fix there, and even if we fixed it in the web
app, there's no guarantee that other clients won't be
just as brute force. It wasn't exactly broken before,
but we'd needlessly write rows to audit tables.
Fixes#10509
This bug was introduced very recently and is an
aliasing bug. It caused extra UserMessage rows to
be created as we inadvertently updated the underlying
subscriber_map sets for multiple messages.
This probably mostly affected PMs.
It's doubtful the bug ever got out into the field.
Previously, MissedMessageWorker used a batching strategy of just
grabbing all the events from the last 2 minutes, and then sending them
off as emails. This suffered from the problem that you had a random
time, between 0s and 120s, to edit your message before it would be
sent out via an email.
Additionally, this made the queue had to monitor, because it was
expected to pile up large numbers of events, even if everything was
fine.
We fix this by batching together the events using a timer; the queue
processor itself just tracks the items, and then a timer-handler
process takes care of ensuring that the emails get sent at least 120s
(and at most 130s) after the first triggering message was sent in Zulip.
This introduces a new unpleasant bug, namely that when we restart a
Zulip server, we can now lose some missed_message email events;
further work is required on this point.
Fixes#6839.
These test are for the handling of HipChat
sender info. The data formats are somewhat
inconsistent and sometimes require us to
generate "mirror" users, so this is potentially
fragile code if we don't cover it well.
We extract this function and put it in the shared
library `import_util.py`.
Also, we make it one time higher up in the call
stack, rather than re-building it for every batch
of messages. I doubt this was super expensive, but
there's no reason to repeatedly execute this.
Before this fix, we were creating two copies of every
PM Message in zerver_message with only corresponding
UserMessage row.
Now we only create one PM Message per message, which
we accomplish by making sure we only use imported
messages from the sender's history.json file. And
then we write UserMessage rows for both participants
by making sure to include sender_id in the set of
user_ids that feeds into making UserMessage. For
the case where you PM yourself, there's just one
UserMessage row.
It does not appear that we need to support huddles
yet.
When we create new ids for message rows, we
now sort the new ids by their corresponding
pub_date values in the rows.
This takes a sizable chunk of memory.
This feature only gets turned on if you
set sort_by_date to True in realm.json.
We could migrate all the current PREMIUM_FREE organizations to have more
invites, but this setting mainly affects orgs right as they are starting, so
it's probably fine.
We recently received a bug report that implied that for certain
payloads, the `requested_reviewers` key was empty whereas a
singular `requested_reviewer` key containing one reviewer's
information was present in its stead. Naturally, this raised
some not so pretty IndexError exceptions.
After some investigation and generating a few similar payloads,
I discovered that in every case both the `requested_reviewers`
and the `requested_reviewer` keys were correctly populated, so I
had to manually edit the payload to reproduce the error on my end.
My guess is that this anomaly goes back to when GitHub's reviewer
request feature was new and didn't support requesting multiple
reviewers, and that the singular `requested_reviewer` key could
possibly just be there for backwards compatibility or might just
be mere oversight. Either way, the solution here is to look for the
plural `requested_reviewers` key, and if that is empty, fall back
to the singular `requested_reviewer` key.
We seemed to have been doing too much of sharpening on the thumbnails.
The purpose of sharpening here was to just counter the softening
effects of a resize on an image but overdoing it is bad.
Value sharpen(0.5,0.2,true) seems to look good for achieving the
best results here on different displays as revealed in the manual
hit and trial based testing.
Thanks to @borisyankov for pointing out the issue and suggesting
the values.
For some webhook endpoints where the third-party API requires us to do
this, the user's API key might appear in error emails through
appearing in the `QUERY_STRING` parameter. Fix that by filtering any
actual content from those; what we usually need for debugging is just
what set of parameters were provided.
Currently, if there is only one admin in realm and admin tries
to updates any non-adminuser's full name it throws error,
"Cannot remove only realm admin". Because in `/json/users/<user_id>`
api check_if_last_admin_is_changed is checked even if property
is_admin is not changed.
This commit fix this issue and add tests for it.
These lazy imports save a significant amount of time on Zulip's core
import process, because mock imports pbr, which in turn import
pkgresources, which is in turn incredibly slow to import.
Fixes part of #9953.
The APNS client libraries (especially the hyper.http20 one) were
determined via profiling to take significant time during the import
process, so we move them to be lazily imported in order to optimize
the overall Zulip import process. This save up to about 100ms in
import time.
These libraries are only used in certain Django processes inside
zulipchat.com, and so are unnecessary both in development as well as
for self-hosted Zulip servers.
This is a prepartory commit for the upcoming changes. It was meaningful
to extract this one out because this function is essentially a condition
check on whether a given url is one of the user_uploads or an external
one. Based on its value we decide whether a url must be thumbnailed or
not and thus this function will also be used in an upcoming commit
patching lib/thumbnail.py to do the same check before thumbnail url
generation.
We are basically adding a check for url's to be external (belonging
to some 3rd party web site hosting the image) or be one of the
user uploaded files. User uploaded files are served by a separate
endpoint which is /user_uploads/. Any other local url such as
/user_avatars/ or /static/ should never be sent to thumbor for
thumbnailing.
Not sending /user_avatars/ to thumbor for thumbnailing makes sense
because they are already properly thumbnailed and stored properly.
/static/ urls host very few images we use for demo and can be safely
be excluded from thumbnailing.