Commit Graph

4728 Commits

Author SHA1 Message Date
Rohitt Vashishtha 630c564fc7 bugdown: Rewrite List Preprocessor logic to properly parse fences.
Previously, we didn't track opening and closing fences separately,
with led to bugs like not parsing a list that was immediately after
a quoted fence; we treated each ``` as a new fence.

This commit rewrites the function to maintain a stack of currently
open fences. If any of the parent fences is a code fence, we do not
insert a new line before a list.

We also add some test cases specifically to test this behavior with
complexly nested lists.

Fixes #13745.
2020-01-27 17:14:27 -08:00
Mateusz Mandera 92c16996fc redis_utils: Require key_format argument in get_dict_from_redis. 2020-01-26 21:40:15 -08:00
Mateusz Mandera ad460e6ccb redis_utils: Validate requested key length in helper functions. 2020-01-26 21:40:15 -08:00
Mateusz Mandera 8d987ba5ae auth: Use tokens, with data stored in redis, for log_into_subdomain.
The desktop otp flow (to be added in next commits) will want to generate
one-time tokens for the app that will allow it to obtain an
authenticated session. log_into_subdomain will be the endpoint to pass
the one-time token to. Currently it uses signed data as its input
"tokens", which is not compatible with the otp flow, which requires
simpler (and fixed-length) token. Thus the correct scheme to use is to
store the authenticated data in redis and return a token tied to the
data, which should be passed to the log_into_subdomain endpoint.

In this commit, we replace the "pass signed data around" scheme with the
redis scheme, because there's no point having both.
2020-01-26 21:32:44 -08:00
Abhishek-Balaji 434e8d3104 home: Extract compute_show_invites_and_add_streams.
This extracts a function for computing show_invites and
show_add_streams, for better readability and testability.

This commit was substantially cleaned up by tabbott.
2020-01-25 23:41:08 -08:00
Tim Abbott d70e799466 bots: Remove FEEDBACK_BOT implementation.
This legacy cross-realm bot hasn't been used in several years, as far
as I know.  If we wanted to re-introduce it, I'd want to implement it
as an embedded bot using those common APIs, rather than the totally
custom hacky code used for it that involves unnecessary queue workers
and similar details.

Fixes #13533.
2020-01-25 22:41:39 -08:00
Mateusz Mandera af2c4a9735 redis: Extract put_dict_in_redis and get_dict_from_redis helpers. 2020-01-23 16:24:07 -08:00
Jonathan Cobb c7433c83ff integrations: Add errbit integration.
Fixes #13685.
2020-01-16 15:33:51 -08:00
Mateusz Mandera d37e6ef921 email_mirror: Use plaintext if html body empty with prefer-html option.
If an email is sent with the .prefer-html option, but it has no html
body, it's better to fall back to plaintext content instead of treating
it as a user error.
2020-01-16 15:25:27 -08:00
Mateusz Mandera 0c9c218e91 email_mirror: Add prefer-html and prefer-text address options.
Closes #13484.

These options tell zulip whether to prefer the plaintext or html version
of the email message. prefer-text is the default behavior, so including
the option doesn't change anything as of now, but we're adding it to
prepare to potentially change the default behavior in the future.
2020-01-16 15:25:19 -08:00
Mateusz Mandera 170e0ac2dd email_mirror: More abstract option system.
As we add more address options, which will have different behavior than
simply setting option_name=True, we need to migrate this subsystem to
something that better supports more complex logic and will allow
encapsulating it, instead of needing to be put all over the
decode_email_address function.
2020-01-16 15:16:04 -08:00
Tim Abbott eb8b3539ad test_classes: Remove DEFAULT_REALM variable.
This essentially unused legacy variable was causing Zulip to query the
database at import time, which is generally not something we aim to
do.

Combined with the issue fixed in the previous commit, this variable
resulted in test-backend providing an unhelpful crash when provision
hadn't updated the unit testing database.
2020-01-16 13:13:46 -08:00
Tim Abbott 8ff5d8ca89 test_classes: Clean up API_KEYS cache.
Since the intent of our testing code was clearly to clear this cache
for every test, there's no reason for it to be a module-level global.

This allows us to remove an unnecessary import from test_runner.py,
which in combination with DEFAULT_REALM's definition was causing us to
run models code before running migrations inside test-backend.

(That bug, in turn, caused test-backend's check for whether migrations
needs to be run to happen sadly after trying to access a Realm,
trigger a test-backend crash if the Realm model had changed since the
last provision).
2020-01-16 13:07:26 -08:00
Anders Kaseorg 319e2231b8 thumbnail: Tighten fix for CVE-2019-19775 open redirect.
Due to a known but unfixed bug in the Python standard library’s
urllib.parse module (CVE-2015-2104), a crafted URL could bypass the
validation in the previous patch and still achieve an open redirect.

https://bugs.python.org/issue23505

Switch to using django.utils.http.is_safe_url, which already contains
a workaround for this bug.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-01-16 12:36:24 -08:00
Anders Kaseorg ea6934c26d dependencies: Remove WebSockets system for sending messages.
Zulip has had a small use of WebSockets (specifically, for the code
path of sending messages, via the webapp only) since ~2013.  We
originally added this use of WebSockets in the hope that the latency
benefits of doing so would allow us to avoid implementing a markdown
local echo; they were not.  Further, HTTP/2 may have eliminated the
latency difference we hoped to exploit by using WebSockets in any
case.

While we’d originally imagined using WebSockets for other endpoints,
there was never a good justification for moving more components to the
WebSockets system.

This WebSockets code path had a lot of downsides/complexity,
including:

* The messy hack involving constructing an emulated request object to
  hook into doing Django requests.
* The `message_senders` queue processor system, which increases RAM
  needs and must be provisioned independently from the rest of the
  server).
* A duplicate check_send_receive_time Nagios test specific to
  WebSockets.
* The requirement for users to have their firewalls/NATs allow
  WebSocket connections, and a setting to disable them for networks
  where WebSockets don’t work.
* Dependencies on the SockJS family of libraries, which has at times
  been poorly maintained, and periodically throws random JavaScript
  exceptions in our production environments without a deep enough
  traceback to effectively investigate.
* A total of about 1600 lines of our code related to the feature.
* Increased load on the Tornado system, especially around a Zulip
  server restart, and especially for large installations like
  zulipchat.com, resulting in extra delay before messages can be sent
  again.

As detailed in
https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it
appears that removing WebSockets moderately increases the time it
takes for the `send_message` API query to return from the server, but
does not significantly change the time between when a message is sent
and when it is received by clients.  We don’t understand the reason
for that change (suggesting the possibility of a measurement error),
and even if it is a real change, we consider that potential small
latency regression to be acceptable.

If we later want WebSockets, we’ll likely want to just use Django
Channels.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-01-14 22:34:00 -08:00
Mateusz Mandera 0beae44081 email_mirror: Use .walk() to search all MIME parts for attachments.
Fixes #13416

We used to search only one level in depth through the MIME structure,
and thus would miss attachments that were nested deeper (which can
happen with some email clients). We can take advantage of message.walk()
to iterate through each MIME part.
2020-01-14 15:37:39 -08:00
Mateusz Mandera 1561d144e0 email_mirror: Insert a new line before attachment links. 2020-01-14 15:37:39 -08:00
Tlazypanda 30ee0c2a49 invitations: Improve experience around reactivating users.
Previously, if you tried to invite a user whose account had been
deactivated, we didn't provide a clear path forward for reactivating
the users, which was confusing.

We fix this by plumbing through to the frontend the information that
there is an existing user account with that email address in this
organization, but that it's deactivated.  For administrators, we
provide a link for how to reactivate the user.

Fixes #8144.
2020-01-13 18:30:51 -08:00
Tim Abbott 79f18138f5 realm: Add private_message_policy setting.
This experimental setting disables sending private messages in Zulip
in a crude way (i.e. users get an error when they try to send one).
It makes no effort to adjust the UI to avoid advertising the idea of
sending private messages.

Fixes #6617.
2020-01-13 12:20:42 -08:00
Mateusz Mandera d5ac1afce8 email_mirror: Check address usability in get_missed_message_address. 2020-01-12 20:43:51 -08:00
Mateusz Mandera 89046ea1a9 email_mirror: Give extract_and_validate a more descriptive name. 2020-01-12 11:30:18 -08:00
Mateusz Mandera 90a69ab24f email_mirror: Reuse exception messages in mirror_email_message. 2020-01-12 11:30:18 -08:00
Mateusz Mandera 9f2b0c769f stream_recipient: Eliminate unnecessary queries.
We should take adventage of the recipient field being denormalized into
the Stream model. We don't need to make queries to figure out a stream's
recipient id, so we take advantage of that to eliminate some of
those redundant queries and simplify StreamRecipientMap.
2020-01-08 14:34:43 -08:00
Mateusz Mandera 786c235023 stream_recipient: Optimize query in populate_for_recipient_ids.
There's no reason to join with the Stream table, as Recipient.type_id is
the stream id.
2020-01-08 14:34:43 -08:00
Hashir Sarwar 0cabacb8ab export: Fix data export parallelization.
This improves the approach of creating multiple parallel processes by
using subprocess.Popen() instead of run_parallel() and
subprocess.call() while exporting an organization's message
history.  This prevents forking twice for individual subprocess.

While this has some performance benefit, the main reason to fix this
is that it fixes an issue with the data export web UI introduced in
run_parallel forks exited).

Fixes #12904.
2020-01-07 13:23:18 -08:00
Mateusz Mandera b87cf22b33 email_mirror: Move send_to_mm_address code to process_missed_message.
process_missed_message did nothing other than calling
send_to_missed_message_address with the same arguments, so there's no
reason to have these as separate functions.
2020-01-07 13:03:32 -08:00
Mateusz Mandera c011d2c6d3 email_mirror: Migrate missed message addresses from redis to database.
Addresses point 1 of #13533.

MissedMessageEmailAddress objects get tied to the specific that was
missed by the user. A useful benefit of that is that email message sent
to that address will handle topic changes - if the message that was
missed gets its topic changed, the email response will get posted under
the new topic, while in the old model it would get posted under the
old topic, which could potentially be confusing.

Migrating redis data to this new model is a bit tricky, so the migration
code has comments explaining some of the compromises made there, and
test_migrations.py tests handling of the various possible cases that
could arise.
2020-01-07 13:03:22 -08:00
Mateusz Mandera 9077bbfefd models: Add MissedMessageEmailAddress class.
Preparatory commit for making the email mirror use the database instead
of redis for missed message addresses.

This model will represent missed message email addresses, which
currently have their data stored in redis.
The redis data will be converted and migrated into these models and
the email mirror will start using them in the main commit.
2020-01-07 12:46:55 -08:00
Steve Howell 630aadb7e0 bot_owner_id: Explicitly set bot_owner_id to None.
For cross realm bots, explicitly set bot_owner_id
to None.  This makes it clear that the cross realm
bots have no owner, whereas before it could be
misdiagnosed as the server forgetting to set the
field.
2020-01-07 12:33:14 -08:00
Mateusz Mandera 510bc60663 test_helpers: Set Recipient class attrs in use_db_models.
Model classes fetched through apps.get_model don't get methods or class
attributes. It's not feasible to add them to all these objects in
use_db_models, but Recipient.PERSONAL etc. are worth setting, since
doing that increases the range of functions that can successfully be
imported and called in test_migrations.py.
2020-01-03 16:56:58 -08:00
Mateusz Mandera d691c249db api: Return a JsonableError if API key of invalid format is given. 2020-01-03 16:56:42 -08:00
Mateusz Mandera 72401b229f utils: Add a function to check if string can be an API key. 2020-01-03 16:56:42 -08:00
Mateusz Mandera 4f2897fafc cache: Validate keys before passing them to memcached.
Fixes #13504.

This commit is purely an improvement in error handling.

We used to not do any validation on keys before passing them to
memcached, which meant for invalid keys, memcached's own key
validation would throw an exception.  Unfortunately, the resulting
error messages are super hard to read; the traceback structure doesn't
even show where the call into memcached happened.

In this commit we add validation to all the basic cache_* functions, and
appropriate handling in their callers.

We also add a lot of tests for the new behavior, which has the nice
effect of giving us decent coverage of all these core caching
functions which previously had been primarily tested manually.
2020-01-03 16:56:42 -08:00
Steve Howell 405a529340 server: Sort user_ids in recent PM conversations.
This change should prevent test flakes, plus
it's more deterministic behavior for clients,
who will generally comma-join the ids into
a key for their internal data structures.

I was able to verify test coverage on this
by making the sort reversed, which would
cause test_huddle_send_message_events to
fail.
2020-01-02 11:59:58 -08:00
Anders Kaseorg 8f281c4fc9 apply_event: Replace list comprehension with list.remove.
This should be about 4 times faster, saving something like half a
millisecond on each stream of 10000 subscribers.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-12-31 10:06:09 -08:00
Tim Abbott 851eb1a6ee generate_test_data: Remove some useless type annotations.
One of these caused a parser error trying to run pyre on Zulip; the
other is just useless as the type can be inferred.
2019-12-13 11:52:23 -08:00
Tim Abbott 7ccc8373e2 bugdown: Fix logic for extracting attachment path_id.
In 3892a8afd8, we restructured the
system for managing uploaded files to a much cleaner model where we
just do parsing inside bugdown.

That new model had potentially buggy handling of cases around both
relative URLs and URLS starting with `realm.host`.

We address this by further rewriting the handling of attachments to
avoid regular expressions entirely, instead relying on urllib for
parsing, and having bugdown output `path_id` values, so that there's
no need for any conversions between formats outside bugdowm.

The check_attachment_reference_change function for processing message
updates is significantly simplified in the process.

The new check on the hostname has the side effect of requiring us to
fix some previously weird/buggy test data.

Co-Author-By: Anders Kaseorg <anders@zulipchat.com>
Co-Author-By: Rohitt Vashishtha <aero31aero@gmail.com>
2019-12-12 20:30:26 -08:00
Anders Kaseorg 8e37862b69 CVE-2019-19775: Close open redirect in thumbnail view.
This closes an open redirect vulnerability, one case of which was
found by Graham Bleaney and Ibrahim Mohamed using Pysa.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-12-12 17:29:20 -08:00
Tim Abbott 4901dc3795 url_preview: Fix parsing of open graph tags.
Our open graph parser logic sloppily mixed data obtained by parsing
open graph properties with trusted data set by our oembed parser.

We fix this by consistenly using our explicit whitelist of generic
properties (image, title, and description) in both places where we
interact with open graph properties.  The fixes are redundant with
each other, but doing both helps in making the intent of the code
clearer.

This issue fixed here was originally reported as an XSS vulnerability
in the upcoming Inline URL Previews feature found by Graham Bleaney
and Ibrahim Mohamed using Pysa.  The recent Oembed changes close that
vulnerability, but this change is still worth doing to make the
implementation do what it looks like it does.
2019-12-12 15:24:38 -08:00
Anders Kaseorg faa3ea0b8e oembed: Remove unsound HTML filtering.
The frontend now takes care of confining the HTML.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-12-12 15:24:38 -08:00
Tim Abbott 9f223bb7c2 url_preview: Simplify path to oembed code. 2019-12-12 13:34:49 -08:00
Tim Abbott e7cf1112c8 notifications: Enable online push notifications by default.
For new user onboarding, it's important for it to be easy to verify
that Zulip's mobile push notifications work without jumping through
hoops or potentially making mistakes.  For that reason, it makes sense
to toggle the notification defaults for new users to the more
aggressive mode (ignoring whether the user is currently actively
online); they can set the more subtle mode if they find that the
notifications are annoying.
2019-12-12 13:04:10 -08:00
Tim Abbott f3c224058f models: Use unlimited .select_related() for Stream and DefaultStream.
Previously, these accesses used e.g. .select_related("realm"), which
was the only foreign key on the Stream model.  Since the intent in
these code paths is to attach the related models for efficient access,
we should just do that for all related models, including Recipient.
2019-12-12 12:13:07 -08:00
Mateusz Mandera 9a42a83e15 streams: Remove get_stream_recipients function and its uses.
With the recipient field being denormalized into the UserProfile and
Streams models, all current uses of get_stream_recipients can be done
more efficiently, by simply checking the .recipient_id attribute on the
appropriate objects.
2019-12-12 12:05:42 -08:00
Mateusz Mandera 01288ede9e recipients: Remove bulk_get_recipients function and its uses.
With the recipient field being denormalized into the UserProfile and
Streams models, all current uses of bulk_get_recipients can be done more
efficient, by simply checking the .recipient_id attribute on the
appropriate objects.
2019-12-12 12:00:13 -08:00
Tim Abbott 63fd7bdf57 actions: Simplify logic of get_recipient_from_user_profiles.
This just uses the early return pattern and a local variable to
produce somewhat more readable code.
2019-12-12 11:59:27 -08:00
Mateusz Mandera 9995dab095 messages: Save a database query in check_message code path.
The flow in recipient_for_user_profiles previously worked by doing
validation on UserProfile objects (returning a list of IDs), and then
using that data to look up the appropriate Recipient objects.

For the case of sending a private message to another user, the new
UserProfile.recipient column lets us avoid the query to the Recipient
table if we move the step of reducing down to user IDs to only occur
in the Huddle code path.
2019-12-12 11:49:01 -08:00
Mateusz Mandera 690dc7313d actions: Restore a misplaced comment to its correct position. 2019-12-11 18:46:33 -08:00
Tim Abbott 299896b6ce notifications: Ignore mobile presence when sending notifications.
Previously, if the user had interacted with the Zulip mobile app in
the last ~140 seconds, it's likely the mobile app had sent presence
data to the Zulip server, which in turns means that the Zulip server
might not send that user mobile push notifications (or email
notifications) about new messages for the next few minutes.

The email notifications behavior is potentially desirable, but the
push notifications behavior is definitely not -- a private message
reply to something you sent 2 minutes ago is definitely something you
want a push notification for.

This commit partially addresses that issue, by ignoring presence data
from the ZulipMobile client when determining whether the user is
currently engaging with a Zulip client (essentially, we're only
considering desktop activity as something that predicts the user is
likely to see a desktop notification or is otherwise "online").
2019-12-11 16:05:35 -08:00
Tim Abbott 958f39a551 message_edit: Call check_attachment_reference_change unconditionally.
This removes the last of the messy use of regular expressions outside
bugdown to make decisions on whether a message contains an attachment
or not.  Centralizing questions about links to be decided entirely
within bugdown (rather than doing ad-hoc secondary parsing elsewhere)
makes the system cleaner and more robust.
2019-12-11 11:10:46 -08:00