It's theoretically possible to have configured a Zulip server where
the system bots live in the same realm as normal users (and may have
in fact been the default in early Zulip releases? Unclear.). We
should handle these without the migration intended to clean up naming
for the system bot realm crashing.
Fixes#13660.
Zulip has had a small use of WebSockets (specifically, for the code
path of sending messages, via the webapp only) since ~2013. We
originally added this use of WebSockets in the hope that the latency
benefits of doing so would allow us to avoid implementing a markdown
local echo; they were not. Further, HTTP/2 may have eliminated the
latency difference we hoped to exploit by using WebSockets in any
case.
While we’d originally imagined using WebSockets for other endpoints,
there was never a good justification for moving more components to the
WebSockets system.
This WebSockets code path had a lot of downsides/complexity,
including:
* The messy hack involving constructing an emulated request object to
hook into doing Django requests.
* The `message_senders` queue processor system, which increases RAM
needs and must be provisioned independently from the rest of the
server).
* A duplicate check_send_receive_time Nagios test specific to
WebSockets.
* The requirement for users to have their firewalls/NATs allow
WebSocket connections, and a setting to disable them for networks
where WebSockets don’t work.
* Dependencies on the SockJS family of libraries, which has at times
been poorly maintained, and periodically throws random JavaScript
exceptions in our production environments without a deep enough
traceback to effectively investigate.
* A total of about 1600 lines of our code related to the feature.
* Increased load on the Tornado system, especially around a Zulip
server restart, and especially for large installations like
zulipchat.com, resulting in extra delay before messages can be sent
again.
As detailed in
https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it
appears that removing WebSockets moderately increases the time it
takes for the `send_message` API query to return from the server, but
does not significantly change the time between when a message is sent
and when it is received by clients. We don’t understand the reason
for that change (suggesting the possibility of a measurement error),
and even if it is a real change, we consider that potential small
latency regression to be acceptable.
If we later want WebSockets, we’ll likely want to just use Django
Channels.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Fixes#13416
We used to search only one level in depth through the MIME structure,
and thus would miss attachments that were nested deeper (which can
happen with some email clients). We can take advantage of message.walk()
to iterate through each MIME part.
return in that loop was a bug, which would lead to the To: header not
being set even though data['recipient'] = str(message['To']) is being
run next, thus requiring the header. We can remove the return
statement and now the loop will overwrite all the potentially
troublesome headers.
This is a preparatory commit for using isort for sorting all of our
imports, merging changes to files where we can easily review the
changes as something we're happy with.
These are also files with relatively little active development, which
means we don't expect much merge conflict risk from these changes.
Previously, if you tried to invite a user whose account had been
deactivated, we didn't provide a clear path forward for reactivating
the users, which was confusing.
We fix this by plumbing through to the frontend the information that
there is an existing user account with that email address in this
organization, but that it's deactivated. For administrators, we
provide a link for how to reactivate the user.
Fixes#8144.
Our recent fixes to using the system's configured memcached settings
broke populate_db, because its hacky clear_database helper is called
with a hacked-up settings module.
We fix this by first moving this out-of-place code from models.py into
populate_db, and then saving the settings required to access memcached
so that we can use them in clear_database.
We also fix a mypy erorr in flush-memcached that matches the same
issue fixed in clear_database.
This experimental setting disables sending private messages in Zulip
in a crude way (i.e. users get an error when they try to send one).
It makes no effort to adjust the UI to avoid advertising the idea of
sending private messages.
Fixes#6617.
We should take adventage of the recipient field being denormalized into
the Stream model. We don't need to make queries to figure out a stream's
recipient id, so we take advantage of that to eliminate some of
those redundant queries and simplify StreamRecipientMap.
This removes zerver/webhooks/trello/view/exceptions.py, which
contained legacy Trello webhook exception related classes. We replace
them with UnexpectedWebhookEventType, which results in our standard
exception handling for unknown event types running (avoiding too-high
priority error logging).
Fixes#13467.
This improves the approach of creating multiple parallel processes by
using subprocess.Popen() instead of run_parallel() and
subprocess.call() while exporting an organization's message
history. This prevents forking twice for individual subprocess.
While this has some performance benefit, the main reason to fix this
is that it fixes an issue with the data export web UI introduced in
run_parallel forks exited).
Fixes#12904.
process_missed_message did nothing other than calling
send_to_missed_message_address with the same arguments, so there's no
reason to have these as separate functions.
Addresses point 1 of #13533.
MissedMessageEmailAddress objects get tied to the specific that was
missed by the user. A useful benefit of that is that email message sent
to that address will handle topic changes - if the message that was
missed gets its topic changed, the email response will get posted under
the new topic, while in the old model it would get posted under the
old topic, which could potentially be confusing.
Migrating redis data to this new model is a bit tricky, so the migration
code has comments explaining some of the compromises made there, and
test_migrations.py tests handling of the various possible cases that
could arise.
Preparatory commit for making the email mirror use the database instead
of redis for missed message addresses.
This model will represent missed message email addresses, which
currently have their data stored in redis.
The redis data will be converted and migrated into these models and
the email mirror will start using them in the main commit.
For cross realm bots, explicitly set bot_owner_id
to None. This makes it clear that the cross realm
bots have no owner, whereas before it could be
misdiagnosed as the server forgetting to set the
field.
Model classes fetched through apps.get_model don't get methods or class
attributes. It's not feasible to add them to all these objects in
use_db_models, but Recipient.PERSONAL etc. are worth setting, since
doing that increases the range of functions that can successfully be
imported and called in test_migrations.py.
These tests had a lot of very repetetive, identical mocking, in some
tests without even doing anything with the mocks. It's cleaner to put
the mock in the one relevant, common place for all the tests that need
it, and remove it from tests who had no use for the mocking.
Fixes#13504.
This commit is purely an improvement in error handling.
We used to not do any validation on keys before passing them to
memcached, which meant for invalid keys, memcached's own key
validation would throw an exception. Unfortunately, the resulting
error messages are super hard to read; the traceback structure doesn't
even show where the call into memcached happened.
In this commit we add validation to all the basic cache_* functions, and
appropriate handling in their callers.
We also add a lot of tests for the new behavior, which has the nice
effect of giving us decent coverage of all these core caching
functions which previously had been primarily tested manually.
If ldap sync is run while ldap is misconfigured, it can end up causing
troublesome deactivations due to not finding users in ldap -
deactivating all users, or deactivating all administrators of a realm,
which then will require manual intervention to reactivate at least one
admin in django shell.
This change prevents such potential troublesome situations which are
overwhelmingly likely to be unintentional. If intentional, --force
option can be used to remove the protection.
This change should prevent test flakes, plus
it's more deterministic behavior for clients,
who will generally comma-join the ids into
a key for their internal data structures.
I was able to verify test coverage on this
by making the sort reversed, which would
cause test_huddle_send_message_events to
fail.
This should be about 4 times faster, saving something like half a
millisecond on each stream of 10000 subscribers.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
QueueProcessingWorker and LoopQueueProcessingWorker are abstract classes
meant to be subclassed by a class that will define its own consume()
or consume_batch() method. ABCs are suited for that and we can tag
consume/consume_batch with the @abstractmethod wrapper which will
prevent subclasses that don't define these methods properly to be
impossible to even instantiate (as opposed to only crashing once
consume() is called). It's also nicely detected by mypy, which will
throw errors such as this on invalid use:
error: Only concrete class can be given where "Type[TestWorker]" is
expected
error: Cannot instantiate abstract class 'TestWorker' with abstract
attribute 'consume'
Due to it being detected by mypy, we can remove the test
test_worker_noconsume which just tested the old version of this -
raising an exception when the unimplemented consume() gets called. Now
it can be handled already on the linter level.
LoopQueueProcessingWorker can handle exceptions inside consume_batch in
a similar manner to how QueueProcessingWorker handles exceptions inside
consume.
Our ldap integration is quite sensitive to misconfigurations, so more
logging is better than less to help debug those issues.
Despite the following docstring on ZulipLDAPException:
"Since this inherits from _LDAPUser.AuthenticationFailed, these will
be caught and logged at debug level inside django-auth-ldap's
authenticate()"
We weren't actually logging anything, because debug level messages were
ignored due to our general logging settings. It is however desirable to
log these errors, as they can prove useful in debugging configuration
problems. The django_auth_ldap logger can get fairly spammy on debug
level, so we delegate ldap logging to a separate file
/var/log/zulip/ldap.log to avoid spamming server.log too much.
A block of LDAP integration code related to data synchronization did
not correctly handle EMAIL_ADDRESS_VISIBILITY_ADMINS, as it was
accessing .email, not .delivery_email, both for logging and doing the
mapping between email addresses and LDAP users.
Fixes#13539.
This simplifies the RDS installation process to avoid awkwardly
requiring running the installer twice, and also is significantly more
robust in handling issues around rerunning the installer.
Finally, the answer for whether dictionaries are missing is available
to Django for future use in warnings/etc. around full-text search not
being great with this configuration, should they be required.
Fixes#13528.
The email_auth_enabled check caused all enabled backends to get
initialized, and thus if LDAP was enabled the check_ldap_config()
check would cause an error if LDAP was misconfigured
(for example missing the new settings).
In 3892a8afd8, we restructured the
system for managing uploaded files to a much cleaner model where we
just do parsing inside bugdown.
That new model had potentially buggy handling of cases around both
relative URLs and URLS starting with `realm.host`.
We address this by further rewriting the handling of attachments to
avoid regular expressions entirely, instead relying on urllib for
parsing, and having bugdown output `path_id` values, so that there's
no need for any conversions between formats outside bugdowm.
The check_attachment_reference_change function for processing message
updates is significantly simplified in the process.
The new check on the hostname has the side effect of requiring us to
fix some previously weird/buggy test data.
Co-Author-By: Anders Kaseorg <anders@zulipchat.com>
Co-Author-By: Rohitt Vashishtha <aero31aero@gmail.com>
This closes an open redirect vulnerability, one case of which was
found by Graham Bleaney and Ibrahim Mohamed using Pysa.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This avoids risk of OOM issues on servers with relatively limited RAM
and millions of messages of history; apparently, fetching all messages
ordered by ID could be quite memory-intensive even with an iterator
usage model.
Fortunately, we have other migrations that already follow this pattern
of iterating over messages, so it's easy to borrow existing code to
make this migration run reasonably.
Our open graph parser logic sloppily mixed data obtained by parsing
open graph properties with trusted data set by our oembed parser.
We fix this by consistenly using our explicit whitelist of generic
properties (image, title, and description) in both places where we
interact with open graph properties. The fixes are redundant with
each other, but doing both helps in making the intent of the code
clearer.
This issue fixed here was originally reported as an XSS vulnerability
in the upcoming Inline URL Previews feature found by Graham Bleaney
and Ibrahim Mohamed using Pysa. The recent Oembed changes close that
vulnerability, but this change is still worth doing to make the
implementation do what it looks like it does.
This fixes a few minor issues with the migration:
* Skips messages with empty rendered_content, fixing an exception that
affected 4 messages on chat.zulip.org.
* Accesses messages in order.
* Provides some basic output on the progress made.
This should make life substantially better for any organizations that
run into trouble with this migration, either due to it taking a long
time to run or due to any new exceptions.
For new user onboarding, it's important for it to be easy to verify
that Zulip's mobile push notifications work without jumping through
hoops or potentially making mistakes. For that reason, it makes sense
to toggle the notification defaults for new users to the more
aggressive mode (ignoring whether the user is currently actively
online); they can set the more subtle mode if they find that the
notifications are annoying.
Previously, these accesses used e.g. .select_related("realm"), which
was the only foreign key on the Stream model. Since the intent in
these code paths is to attach the related models for efficient access,
we should just do that for all related models, including Recipient.