Previously, incorrectly passing an existing directory to the
`manage.py export --output` option would remove its contents without
warning. Abort instead.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
* Whitelist a small number of image/ types to be served as
non-attachments.
* Serve the file using the type that we validated rather than relying
on an independent guess to match.
This issue can lead to a stored XSS security vulnerability for older
browsers that don't support Content-Security-Policy.
It primarily affects servers using Zulip's local file uploads backend
for servers running Ubuntu 16.04 Xenial or newer; the legacy local
file upload backend for (now EOL) Ubuntu 14.04 Trusty was not affected
and it has limited impact for the S3 upload backend (which uses an
unprivileged S3 bucket domain to serve files).
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This brings us in line, and also allows us to style these more like
unordered lists, which is visually more appealing.
On the backend, we now use the default list blockprocessor + sane list
extension of python-markdown to get proper list markup; on the
frontend, we mostly return to upstream's code as they have followed
CommonMark on this issue.
Using <ol> here necessarily removes the behaviour of not renumbering
on lists written like 3, 4, 7; hopefully users will be OK with the
change.
Fixes#12822.
Also cleans up the interface between the management command and the
LDAP backends code to not guess/recompute under what circumstances
what should be logged.
Co-authored-by: mateuszmandera <mateusz.mandera@protonmail.com>
The order of operations for our LDAP synchronization code wasn't
correct: We would run the code to sync avatars (etc.) even for
deactivated users.
Thanks to niels for the report.
Co-authored-by: mateuszmandera <mateusz.mandera@protonmail.com>
Fixes#13130.
django_auth_ldap doesn't give any other way of detecting that LDAPError
happened other than catching the signal it emits - so we have to
register a receiver. In the receiver we just raise our own Exception
which will properly propagate without being silenced by
django_auth_ldap. This will stop execution before the user gets
deactivated.
So the reason 38f8cf612c seems
to be flaking is because the value of harry id switches between
1 and 2 in Xenial while in Bionic it would be fixed at 2. The
reason behind this is that Bionic ships with Python3.6 which
preserves dict insert order while Python3.5 that ships with Xenial
dont preserve the order. In initialize_stream_membership_dicts
we iterate user_data_map dict and the order in which the iteration
happens affects the ID of the users.
Papertrail sends requests with the content type
`application/x-www-form-urlencoded`, with the payload parameter holding the
JSON body. This commit fixes the papertrail integration to use the payload
parameter in the request's POST data instead of trying to parse the
request's entire body as JSON.
Papertrail documentation here:
https://help.papertrailapp.com/kb/how-it-works/web-hooks#encoding
We have a very useful piece of code, _RateLimitFilter, which is
designed to avoid sending us a billion error emails in the event that
a Zulip production server is down in a way that throws the same
exception a lot. The code uses memcached to ensure we send each
traceback roughly once per Zulip server per 10 minutes (or if
memcached is unavailable, at most 1/process/10 minutes, since we use
memcached to coordinate between processes)
However, if memcached is down, there is a logging.error call internal
to the Django/memcached setup that happens inside the cache.set() call,
and those aren't caught by the `except Exception` block around it.
This ends up resulting in infinite recursion, eventually leading to
Fatal Python error: Cannot recover from stack overflow., since this
handler is configured to run for logging.error in addition to
logging.exception.
We fix this using a thread-local variable to detect whether we are
being called recursively.
This change should prevent some nasty failure modes we've had in the
past where memcached being down resulted in infinite recursion
(resulting in extra resources being consumed by our error
notifications code, and most importantly, the error notifications not
being sent).
Fixes#12595.
There's no reason for this to be a category of error that emails the
server administrator, since there's a good chance that fixing it will
need to be done in the Zulip codebase, not administrator action.
Fixes#9401.
This adds a FAKE_EMAIL_DOMAIN setting, which should be used if
EXTERNAL_HOST is not a valid domain, and something else is needed to
form bot and dummy user emails (if email visibility is turned off).
It defaults to EXTERNAL_HOST.
get_fake_email_domain() should be used to get this value. It validates
that it's correctly set - that it can be used to form valid emails.
If it's not set correctly, an exception is raised. This is the right
approach, because it's undesirable to have the server seemingly
peacefully operating with that setting misconfigured, as that could
mask some hidden sneaky bugs due to UserProfiles with invalid emails,
which would blow up the moment some code that does validate the emails
is called.
Apparently, due to poor naming of the outer capture group we use to
separate the actual match from the surrounding whitespace (etc.) we
use to determine if the syntax is a possible linkifier start/end, if
you created a linkifier using "name" as the capture group, we'd try to
compile a pattern with two capture groups called "name", which would
500, preventing anyone from accessing the organization.
We’re about to start using PostgreSQL-specific syntax that can’t be
stringified without a specified dialect.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This requires part 1 (which can take hours to run but generally
doesn't require downtime) to be completed first.
This portion of the migration will require the server to be completely
down for a brief period; for chat.zulip.org with 250M UserMessage
rows, it took about 60s to run; that time will vary depending on
hardware details like whether the server has an SSD, but fundamentally
shouldn't be long.
Our upgrade-zulip and upgrade-zulip-from-git tools can apply this
migration correctly; nothing special needs to be done.
Fixes#13040.
As part of adding support for more than 2B UserMessage rows in a Zulip
server, we need to change UserMessage.id (a field we don't access but
is needed by Django) from an int to a bigint. This commit is a series
of migrations which create a `bigint_id` column and populates it correctly.
This migration will take a long time to run; on chat.zulip.org (a
server with a lot of history), it took about 4 hours to complete.
How to migrate with minimal downtime:
1. Run `upgrade-zulip-from-git` through this commit. It will install
migration 0238 and then more or less hang while applying migration
0239. Once migration 0238 is completed, however, your server should
be able to be started back up safely while migration 0239 is running.
2. Run `/home/zulip/deployments/next/scripts/restart-server` in a
separate terminal to get Zulip running again.
3. When the `upgrade-zulip-from-git` command finishes, it will
automatically re-restart the Zulip server, leaving you in a consistent
state and ready to do part 2 of the migration.
A useful `manage.py shell` query for checking the state after this
commit is consistent is this:
assert UserMessage.objects.exclude(bigint_id=F("id")).count() == 0
Part of #13040.
Previously, several of our URL patterns accidentally did not end with
`$`, and thus ended up controlling just the stated URL, but actually a
much broader set of URLs starting with it.
I did an audit and fixed what I believe are all instances of this URL
pattern behavior. In the process, I fixed a few tests that were
unintentionally relying on the behavior.
Fixes#13082.
Historically, Zulip's implementation of wildcard mentions never
triggered either email or push notifications, instead being limited to
desktop notifications and the "mentions" counter.
We fix this just by plumbing the "wildcard_mentioned" flag through our
system.
Implements much of
https://github.com/zulip/zulip/issues/6040#issuecomment-510157264.
We're also now ready to seriously work on #3750.
We added default ToS for the development environment a few months
back; as a side effect, we now need to accept ToS when going through
the development environment registration flow, including for our
one-click account creation buttons.
After a new user joins an active organization, it isn't obvious what
to do next; this change causes there to be recent unread messages in
the stream sidebar for the user to click on to get a feel for what's
happening in the organization and experiment with Zulip.
Fixes#6512.
This commit wraps up the major work that we held back when upgrading
py-markdown 2.6.11 to 3.0.1. Since we were making our custom changes
to the link syntax, at the time we stuck to using the old method of
parsing links. This lays the groundwork for further changes to our
link and image link handling, and brings us on par with upstream.
Also, we now better document the ways in which our link handling is
different from upstream.
Previously, the unread_msgs data structure accounting (used for both
the web and mobile apps to determine the "Unread mentions" count
displayed in the UI) did not include wildcard mentions at all.
We fix this by adding the logic required to include properly that
data, with tests. As discussed in #6040, it makes sense to include
muted streams and topics for the purpose of this calculation.
Fixes part of #6040.
Apparently, get_active_presence_idle_user_ids, which is carefully
optimized to only fetch data for users who might actually need
notification processing, was only considering PMs and direct mentions,
not wildcard mentions or alert words.
This caused some pretty weird failure modes when working on adding
support for broader mention notifications, because users who had one
of these types of notifications would be treated as never
presence-idle, which was just confusing.
This is part of adding support for notifications for wildcard mentions
and alert words; it's worth merging this as an early commit because
the consequence of not doing this are very difficult to debug.
Rather than continually resetting the contents of an existing event
queue, we allocate a new one for each subtest.
We also fix a rather confusing bundle of comments.
We add an ensure_users function and use it in tests which have
hard-coded user ids, to make it clear to which users the ids refer to
and have it verified.
This makes it easier to see what's happening
in these tests and to keep track of any renumberings of user ids due to
changes in how we populate the database.
Hopefully this does a better job of spurring people to action, and also
suggests a self-service fix if they don't (i.e. contacting the person that
invited them).
Add ability to search entire message history of all public streams at
once. It includes all subscibed, non subscribed public streams messages
and even historical public stream messages sent before user had joined
an organization or stream.
Fixes#8859.
Instead of having a hard-coded url, it seems better to replace it with
get_gravatar_url - which returns the correct url, without breaking if
the email/id of the example user changes.
One occasionally finds that a 1580 character string of SQL queries
might not most readably be presented on a single line.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Send the config_options for each supported incoming webhook bot along
with the initial state (not present in apply_events since this is
mostly just static data).
Without disturbing the flow of the existing code for configuring
embedded bots too much, we now use the config_options feature to
allow incoming webhook type bot to be configured via. the "/bots"
endpoint of the API.
This is a prep commit to allow us to validate user provided bot
config data using the same function for incoming webhook type
bots alongside embedded bots (as opposed to creating a new
function just for incoming webhook bots).
In integrations.py we have a class called Integration which we then usually
subclass and then use to define the meta-data for all of our integrations.
Now, we want to allow all of our bots, specifically incoming webhook bots,
to be configured (i.e. we should let the user provide BotConfigData).
For this we create a new instance member of the Integration class called
config_options which will be a list of tuples containing the displayable
integration name, the configuration key form of the integration name and
the validator that it's value is supposed to adhere to.
This was used as a helper to construct the final display_recipient when
fetching messages. With the new mechanism of constructing
display_recipient by fetching appropriate users/streams from the
database and cache, this shouldn't be needed anymore.
There is no need to fetch the entire Stream or UserProfile objects, as
only several fields are needed. We use Django's .values() method to only
get what's needed.
For UserProfiles, it means that we get from the queries are dictionaries
already in the display_recipient form (UserDisplayRecipient type) - so
we can remove the user_profile_to_display_recipient_dict function, as
there's no need for this UserProfile -> UserDisplayRecipient conversion
anymore.
Instead of having the rather unclear type Union[str,
List[UserDisplayRecipient]] where display_recipient of message dicts was
involved, we use DisplayRecipientT (renamed from DisplayRecipientCacheT
- since there wasn't much reason to have the word Cache in there), which
makes it clearer what is the actual nature of the objects and gets rid
of this pretty big type declaration.
Since the display_recipients dictionaries corresponding to users are
always dictionaries with keys email, full_name, short_name, id,
is_mirror_dummy - instead of using the overly general Dict[str, Any]
type, we can define a UserDisplayRecipient type,
using an appropriate TypedDict.
The type definitions are moved from display_recipient.py to types.py, so
that they can be imported in models.py.
Appropriate type adjustments are made in various places in the code
where we operate on display_recipients.
The user information in display_recipient in cached message_dicts
becomes outdated if the information is changed in any way.
In particular, since we don't have a way to find all the message
objects that might contain PMs after an organization toggles the
setting to hide user email addresses from other users, we had a
situation where client might see inaccurate cached data from before
the transition for a period of up to hours.
We address this by using our generic_bulk_cached_fetch toolchain to
ensure we always are fetching display_recipient data from the database
(and/or a special recipient_id -> display_recipient cache, which we
can flush easily).
Fixes#12818.
It's not clear to me how this is intended to work in Mattermost's
system in that they don't document this behavior, but some users have
`null` as their list of teams, and presumably are not meant to be
included in any team at all.
This restructures the API endpoints that we currently have implemented
more or less for exclusive use by the mobile and desktop apps (things
like checking what authentication methods are supported) to use a
system that can be effectively parsed by our test_openapi
documentation.
This brings us close to being able to eliminate
`buggy_documentation_endpoints` as a persistently nonempty list.
This add some regular expression manipulation hacks to make it
possible for us to validate the documentation for the presence
endpoint with a slightly more complex regular expression capture
group.
Previously, our OpenAPI documentation validation was failing for some
endpoints because it didn't account for the `in: path` type of
parameter, resulting in a mismatch between what was declared via REQ
and what was declared in the OpenAPI docs.
We fix this by excluding the path type parameters in both places from
what's considered by documentation using the `path_only` flag.
I doubt this is the correct long-term fix; in particular, I don't
think we're actually running the validators for these path-only
parameters. The examples that exist today are all IDs with validators
for being non-negative numbers, but longer-term I think we'll want to
do something different (possibly at the REQ layer, see the TODO).
Atlassian announced that it will no longer provide information about
comments along with their "issue" type event payloads for Jira. So
we must now update the Jira integration to appropriately respond to
"comment" type events (the reason why we didn't do this before was
that initially, the "comment" type event payloads didn't contain
sufficient information about their issues, but this payload has
since been improved).
Note: This commit does *not* remove support for the older "issue"
type event payloads where information about comments was included.
This way we can maintain compatibility with old self-hosted versions
of self hosted Jira (2016 and before).
Source:
https://developer.atlassian.com/cloud/jira/platform/change-notice-
removal-of-comments-from-issue-webhooks/
Fixes#13012
Instead of just mocking some fake events, we use the code
path that generates slow query events and publishes them
to SlowQueryWorker.
This test improvement would have got a recent potential regression
caught in code review.
This new test runs each generated curl example against the Zulip API,
checking whether it returns successfully without errors.
Significantly modified by tabbott for simplicity.
Our new curl example generation logic was broken, in that it hardcoded
localhost:9991 (without an HTTP method or anything) as the API URL.
It requires a bit of plumbing to make this possible.
This refactor extracts the code logic of checking if user can access
stream history into it's own function: can_access_stream_history
that takes in user_profile and stream. Then we make seperate function
can_access_stream_by_name that takes in stream_name and retrives stream
and pass it to can_access_stream_history. This will make it easily to later
add a function that does the same thing with stream ID.
This function was used only once in exclude_muting_conditions where
it returned stream name so the function to fetch stream id.
Since we exepect the narrow to also include stream id we refactor it to
return a stream object and since we use get_stream_by_narrow_operand_access_unchecked
we don't need to worry about handling cases where stream id is passed since
the function handles it.
This let's us clean up the linter that excludes the use of get_stream
and by adding the access_unchecked in the name we make it clear that
it should be used with caution.
Refactoring idea by Tim Abbott.
Makes it obvious and readable and as an added bonus take up less space.
Note, some types used Iterable instead of List that were change
to used List since the narrow_paramter converter return a List.
zerver/openapi/python_examples.py:105: error: Argument 1 to "get_user_presence" of "Client" has incompatible type "str"; expected "Dict[str, Any]"
zerver/openapi/python_examples.py:563: error: Argument 1 to "add_reaction" of "Client" has incompatible type "Dict[str, object]"; expected "Dict[str, str]"
zerver/openapi/python_examples.py:576: error: Argument 1 to "remove_reaction" of "Client" has incompatible type "Dict[str, object]"; expected "Dict[str, str]"
zerver/worker/queue_processors.py:587: error: Argument "client" to "extract_query_without_mention" has incompatible type "EmbeddedBotHandler"; expected "ExternalBotHandler"
These were only missed because mypy daemon mode requires us to set
`follow_imports = skip` for the `zulip` package.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
The typing for generic_bulk_cached_fetch is complicated, and was
recorded incorrectly previously for the case where a cache_transformer
function is required. We fix this by adding the new CacheItemT, and
additionally add comments explaining what's going on with these types
for future reference.
Thanks to Mateusz Mandera for raising this issue.
Apparently, the filters written for the send_password_reset_email (and
some other management commands) didn't correctly consider the case of
deactivated users.
While some commands, like syncing LDAP data (which can include whether
a user should be deactivated) want to process all users, other
commands generally only want to interact with active users. We fix
this and add some tests.
It seems possible that attempting to export large organizations could
result in high resource consumption that justifies having a technician
manage the exports manually.
The original seems to be unmaintained
(johnsensible/django-sendfile#65). Notably, this fixes a bug in the
filename parameter, which perviously showed the Python 3 repr of a
byte string (johnsensible/django-sendfile#49).
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
There’s an apparent contradiction between RFC 7230 §3.3.2
Content-Length:
“A server MUST NOT send a Content-Length header field in any response
with a status code of 1xx (Informational) or 204 (No Content).”
and RFC 7231 §4.3.7 OPTIONS:
“A server MUST generate a Content-Length field with a value of "0" if
no payload body is to be sent in the response.”
The only resolution within the existing language would be to disallow
all 204 responses to OPTIONS requests. However, I don’t think that
was the intention, so I submitted this erratum report:
https://www.rfc-editor.org/errata/eid5806
and updated the code accordingly.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Apparently, our edit-message events did not guarantee that the outer
wrapper dictionary, which is intended to be unique for each client,
was unique for every client (instead only ensuring it was unique for
each user).
This led to clients unexpectedly getting last_event_id validation
errors in this code path when a user had multiple connected clients,
because the linear ordering of event IDs within a given queue was
corrupted.
In fd2a63b049, we accidentally fixed
this issue with a different set of userdata events, without fixing the
edit-message event bug. This commit fixes the remaining issue.
The `users/me/subscriptions` endpoint accidentally started returning
subscriber information for each stream. This is convenient, but
unnecessarily costly for those clients which either don't need it
(most API apps) or already acquire this information via /register
(including Zulip's apps).
This change removes that data set from the default response. Clients
which had come to rely on it, or would like to rely on it in future,
may still access it via an additional documented API parameter.
Fixes#12917.
Use a decorator called openapi_test_function instead of the hard-coded
TEST_FUNCTIONS list for increased readability and maintainablity (at
the cost of performance).
I changed the class of the title in order to use the same styling as the
other similar pages (like `/accounts/go` or `/login`).
Changed the related test.
For the emails that are associated to an existing account in an
organisation, the avatars will be displayed in the email selection
page. This includes avatar data in what is passed to the page.
Added `avatar_urls` to the context in `test_templates.py`.
Apparently GitHub changed the email address for these; we need to
update our code accordingly.
One cannot receive emails on the username@users.noreply.github.com, so
if someone tries creating an account with this email address, that
person would not be able to verify the account.
It was allowing us to get away with wrong types on a few functions:
`check_send_typing_notification` and `send_notification_backend` can be
(and are) called with a list of `int` as `notification_to`, not just a
list of `str`.
The problem it was working around already had a better solution using
the dummy `type` argument. Use that.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
The previous iteration still had the failure mode of not actually
testing anything, because it didn't trigger the data export code path
(and in fact was getting an HTTP 401 authentication denied error).
This test was broken due to using an empty `RealmAuditLog`
table. We fix this by mocking the creation of an export,
thus creating an entry, similar to what we do in our other
tests.
Delete trailing newlines from all files, except
tools/ci/success-http-headers.txt and tools/setup/dev-motd, where they
are significant, and static/third, where we want to stay close to
upstream.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Previous cleanups (mostly the removals of Python __future__ imports)
were done in a way that introduced leading newlines. Delete leading
newlines from all files, except static/assets/zulip-emoji/NOTICE,
which is a verbatim copy of the Apache 2.0 license.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Apparently, our edit-message events did not guarantee that the outer
wrapper dictionary, which is intended to be unique for each client,
was unique for every client (instead only ensuring it was unique for
each user).
This led to clients unexpectedly getting last_event_id validation
errors in this code path when a user had multiple connected clients,
because the linear ordering of event IDs within a given queue was
corrupted.
Now that we can create cURL examples based on the OpenAPI
documentation. We can begin using simple one line tags in
the documentation instead of manually creating cURL examples.
Fixes part of #12878.
Now we can also include extra keyword arguments to specify
modifications in how the example code should be generated
in the generate_code_example template tag.
E.g. generate_code_example(curl, exclude=["param1", "param2"])
This commit extends api_code_examples.py to support automatically
generating cURL examples from the OpenAPI documentation. This way
work won't have to be repeated and we can also drastically reduce
the chance of introducing faulty cURL examples (via. an automated
test which can now be easily created).
This commit progress our efforts to reduce pending_endpoints
as well as to migrate away from templates/zerver/api/fixtures
and towards our OpenAPI documentation.
Similar to commit d62b75fc.
The current code looks like it's trying to redirect /integrations/doc/email
to /integrations when EMAIL_GATEWAY_PATTERN is not set.
I think it doesn't currently do this. The test for that pathway has a bug:
self.get_doc('integrations/doc-html/email', subdomain='zulip') needs a
leading slash, and putting the slash back in results in the test failing.
This redirection is not really desired behavior -- better is to
unconditionally show that the email integration exists, and just point the
user to https://zulip.readthedocs.io/en/latest/production/email-gateway.html
(this is done in a child commit).
This gives us access to typing_extensions.Deque, which was not added
to typing until 3.5.4.
(PROVISION_VERSION is not bumped because the transitive dependency set
in dev.txt hasn’t changed.)
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
This verifies that the client passed a last_event_id that actually
came from the queue instead of making up an ID from the future. It
turns out one of our tests was making up such an ID, but legitimate
clients are expected not to do so.
The previous version of this commit (commit
e00d4be6d5, #12888) had to be reverted
(commit b86c5cc490) because it was
missing the `to_dict`/`from_dict` migration code.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
Our implementation requires at least 1 space after the
'#' not not break existing linkifiers like '#123', etc.
that generally follow the convention we show in linkifier
examples.
- [valid] : # Hello
- [valid] : # Hello
- [invalid]: #Hello
For the frontend, we have taken the code from v0.7.0 of
upstream marked and made minor changes to avoid having
to refactor a significant part of our marked code.
For the backend, we merely have to change the regex to
force require spaces after #, and add hashheader to our
list of blockparsers.
Fixes#11418.
This fixes two issues:
* The syntax check logic we had for zerver.tornado.autoreload would
end up clearing _reload_hooks if one of the files that had changed
was zerver.tornado.autoreload itself (because we'd had re-imported
the current module), which could be incredibly confusing when trying
to test the autoreload logic. It seems better to just not run the
syntax check for syntax errors in this file.
Similarly, because reloading event_queue.py would destroy the state
in the queues, we avoid that as well.
* We make sure to flush stdout after running and reload hooks, to make
sure their output reaches the user.
We were apparently not running our own forked Tornado autoreload
library when adding reload hooks, which meant that our autoreload
hooks didn't run at all.
This fixes an issue that made dump_event_queues never run and thus the
local development environment difficult to use for testing event queues.
We simply state that certain options are `Optional`.
The following files are affected:
add_users_to_mailing_list
send_to_email_mirror
fill_memcached_caches
client_activity
When typing `**options` as an `Optional[str]` we will see errors
in the from of `None type has no attribute 'split'`. This change
allows mypy to effectively handle the `None` case.