RabbitMQ clients have a setting called prefetch[1], which controls how
many un-acknowledged events the server forwards to the local queue in
the client. The default is 0; this means that when clients first
connect, the server must send them every message in the queue.
This itself may cause unbounded memory usage in the client, but also
has other detrimental effects. While the client is attempting to
process the head of the queue, it may be unable to read from the TCP
socket at the rate that the server is sending to it -- filling the TCP
buffers, and causing the server's writes to block. If the server
blocks for more than 30 seconds, it times out the send, and closes the
connection with:
```
closing AMQP connection <0.30902.126> (127.0.0.1:53870 -> 127.0.0.1:5672):
{writer,send_failed,{error,timeout}}
```
This is https://github.com/pika/pika/issues/753#issuecomment-318119222.
Set a prefetch limit of 100 messages, or the batch size, to better
handle queues which start with large numbers of outstanding events.
Setting prefetch=1 causes significant performance degradation in the
no-op queue worker, to 30% of the prefetch=0 performance. Setting
prefetch=100 achieves 90% of the prefetch=0 performance, and higher
values offer only minor gains above that. For batch workers, their
performance is not notably degraded by prefetch equal to their batch
size, and they cannot function on smaller prefetches than their batch
size.
We also set a 100-count prefetch on Tornado workers, as they are
potentially susceptible to the same effect.
[1] https://www.rabbitmq.com/confirms.html#channel-qos-prefetch
The `current_queue_size` key in the queue monitoring stats file was
the local queue size, not the global queue size -- d5a6b0f99a
renamed the function, but did not adjust the queue monitoring JSON,
despite the last use of it having been removed in cd9b194d88.
The function is still used to mark "we emptied our queue," and it
remains a reasonable metric for that.
TOR users are legitimate users of the system; however, that system can
also be used for abuse -- specifically, by evading IP-based
rate-limiting.
For the purposes of IP-based rate-limiting, add a
RATE_LIMIT_TOR_TOGETHER flag, defaulting to false, which lumps all
requests from TOR exit nodes into the same bucket. This may allow a
TOR user to deny other TOR users access to the find-my-account and
new-realm endpoints, but this is a low cost for cutting off a
significant potential abuse vector.
If enabled, the list of TOR exit nodes is fetched from their public
endpoint once per hour, via a cron job, and cached on disk. Django
processes load this data from disk, and cache it in memcached.
Requests are spared from the burden of checking disk on failure via a
circuitbreaker, which trips of there are two failures in a row, and
only begins trying again after 10 minutes.
Unhandled exceptions propagating to process_queue were not caught there,
causing improper logging - errors didn't land in errors.log as expected.
Exceptions should be caught and explicitly logged by the process_queue
logger. Exceptions occurring during consuming events are caught and
handled inside the worker's logic - however those that happen while
setting up the worker were not addressed at all, and that's the core bug
we mean to address here.
Furthermore, in multi-threaded mode we want the autoreload mechanism to
be working - which it doesn't without catching the exceptions. The
correct approach is to - again - catch the exception, log it and then
send SIGUSR1 signal to trigger exit and autoreload.
I believe that this migration with the default of atomic=True will
fail when trying to convert the field to PositiveIntegerField if there
were any 0 values present in the database when the migration began.
The fix is to have each of the steps be their own transaction.
A SIGTERM can show up at any point in the ioloop, even in places which
are not prepared to handle it. This results in the process ignoring
the `sys.exit` which the SIGTERM handler calls, with an uncaught
SystemExit exception:
```
2021-11-09 15:37:49.368 ERR [tornado.application:9803] Uncaught exception
Traceback (most recent call last):
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/http1connection.py", line 238, in _read_message
delegate.finish()
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/httpserver.py", line 314, in finish
self.delegate.finish()
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/routing.py", line 251, in finish
self.delegate.finish()
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/web.py", line 2097, in finish
self.execute()
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/web.py", line 2130, in execute
**self.path_kwargs)
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/gen.py", line 307, in wrapper
yielded = next(result)
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/web.py", line 1510, in _execute
result = method(*self.path_args, **self.path_kwargs)
File "/home/zulip/deployments/2021-11-08-05-10-23/zerver/tornado/handlers.py", line 150, in get
request = self.convert_tornado_request_to_django_request()
File "/home/zulip/deployments/2021-11-08-05-10-23/zerver/tornado/handlers.py", line 113, in convert_tornado_request_to_django_request
request = WSGIRequest(environ)
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/django/core/handlers/wsgi.py", line 66, in __init__
script_name = get_script_name(environ)
File "/home/zulip/deployments/2021-11-08-05-10-23/zerver/tornado/event_queue.py", line 611, in <lambda>
signal.signal(signal.SIGTERM, lambda signum, stack: sys.exit(1))
SystemExit: 1
```
Supervisor then terminates the process with a SIGKILL, which results
in dropping data held in the tornado process, as it does not dump its
queue.
The only command which is safe to run in the signal handler is
`ioloop.add_callback_from_signal`, which schedules the callback to run
during the course of the normal ioloop. This callbacks does an
orderly shutdown of the server and the ioloop before exiting.
This commit ensures that all markdown fixtures have unique
test names by rewriting the names of some of them and adding
a test in `test_markdown.py`.
Earlier this was over-writing the value for same keys in
`load_markdown_tests` in `test_markdown.py`.
Race conditions in stream unsubscription may lead to multiple
back-to-back SUBSCRIPTION_DEACTIVATED RealmAuditLog entries for the
same stream. The current logic constructs duplicate UserMessage
entries for such, which then later fail to insert.
Keep a set of message-ids that have been prep'd to be inserted, so
that we don't duplicate them if there is a duplicated
SUBSCRIPTION_DEACTIVATED row. This also renames the `message` local
variable, which otherwise overrode the `message` argument of a
different type.
Previously, our codebase contained links to various versions of the
Django docs, eg https://docs.djangoproject.com/en/1.8/ref/
request-response/#django.http.HttpRequest and https://
docs.djangoproject.com/en/2.2/ref/settings/#std:setting-SERVER_EMAIL
opening a link to a doc with an outdated Django version would show a
warning "This document is for an insecure version of Django that is no
longer supported. Please upgrade to a newer release!".
Most of these links are inside comments.
Following the replacement of these links in our docs, this commit uses
a search with the regex "docs.djangoproject.com/en/([0-9].[0-9]*)/"
and replaces all matches with "docs.djangoproject.com/en/3.2/".
All the new links in this commit have been generated by the above
replace and each link has then been manually checked to ensure that
(1) the page still exists and has not been moved to a new location
(and it has been found that no page has been moved like this), (2)
that the anchor that we're linking to has not been changed (and it has
been found that no anchor has been changed like this).
One comment where we mentioned a Django version in text before linking
to a page for that version has also been changed, the comment
mentioned the specific version when a change happened, and the history
is no longer relevant to us.
This resolves the issues reported in #20108, major chunk of which were
due to the incomplete support for importing the livechat streams/messages
in the tool. So, it's best not to import any livechat streams/messages for
now until a complete support for importing the same is developed.
The decorator form is clearer by being more explicit; additionally,
the api_by_user rate-limit only currently used in one place, and makes
it difficult to test per-user rate-limits that are more specific.
Both `create_realm_by_ip` and `find_account_by_ip` send emails to
arbitrary email addresses, and as such can be used to spam users.
Lump their IP rate limits into the same bucket; most legitimate users
will likely not be using both of these endpoints at similar times.
The rate is set at 5 in 30 minutes, the more quickly-restrictive of
the two previous rates.
The existing test did no verify that the rate limit only applied to
127.0.0.1, and that other IPs were unaffected. For safety, add an
explicit test of this.
The only use case of rate_limit_rule which does not clear the
RateLimitedIPAddr history is test_hit_ratelimits_as_remote_server,
which is not made any worse by clearing out the IP history for a
non-existent `api_by_remote_server` domain.
Since `prefers_web_public_view` key in session is only
relevant to users without an account, this key should no longer
be present in the user's session object.
Fixes#19907
For export realm following changes have been made:
- `./manage.py export --upload` would delete `.tar.gz` and unpacked dir
- `./manage.py export` would only delete `unpacked dir`
Besides, we have removed `--delete-after-upload` as we have set it as
the default.
Fixes#20081
If realm is web_public, spectators can now view avatar of other
users.
There is a special exception we had to introduce in rest model to
allow `/avatar` type of urls for `anonymous` access, because they
don't have the /api/v1 prefix.
Fixes#19838.
This commit adds functionality to import messages from the
Discussions having direct channels as their parent. As we don't
have topics in the PMs, the messages are imported in interleaved
form in the imported direct channels/PMs.
This was completely unsupported earlier and would have resulted in
an error.
This commit updates the error message returned when the maximum
invite limit for the day. We update the error returned by API to
only mention that the limit is reached and add the suggestion
to use multi-use link or contact support in the message shown
in webapp.
Add `escape_navigates_to_default_view` as a bool setting in
UserBaseSettings model and implement it as a checkbox that toggles
the hotkey implementation of escape to the default view in the
advanced user display settings.
With /help/ documentation edits from Alya Abbott.
Fixes#20043.
We always use delivery_email to generate gravatar_url, but in
test_admin_api_hide_emails we were passing email to get_gravatar_url
and matched with the avatar_url field of the fetched user object.
The tests were passing because the email_address_is_realm_public
was using old realm object and thus email field was incorrectly
set to delivery_email even when email_address_visibility was set
to EMAIL_ADDRESS_VISIBILITY_ADMINS.
This commit fixes the test to pass delivery_email to get_gravatar_url.
We create RealmUserDefault object for internal realm just
for consistency. The code in migration does so but it
was missed to add the code when creating new internal realm.
Not proxying these requests through camo is a security concern.
Furthermore, on the desktop client, any embed image which is hosted on
a server with an expired or otherwise invalid certificate will trigger
a blocking modal window with no clear source and a confusing error
message; see zulip/zulip-desktop#1119.
Rewrite all `message_embed_image` URLs through camo, if it is enabled.
We add discussion id and url in the comments and highlighted title to
the body of disscussion message to make it more meaningful and accessible.
Fixes#19938.
We aim to use Zulip topics thoughtfully in displaying messages from
discussions, as well as linking to the discussion in every message so
that it's easy to view them.
Fixes#19938.
Supporting URL percent-encoded bytes is possible using `%%20`, but this
is not necessarily very understandable to end-users, even those that
understand percent encoding.
Allow `%20` in linkifier URL format strings, and transform them into
`%%20` in the pattern just before they are applied in markdown
translation. Care must be taken here, such that already-escaped `%`s
are not escaped an extra time.
We do this before rendering, and not before storage, as
a simplification; the JS-side linkifier at present only understands
`%(foo)s` and thus needs no changes, and to avoid an un-escaping pass
before showing in the admin UI.
User-supplied custom realm filter has had some sort of regex-based
validation of the format URL since their introduction in
d7e1e4a2c0 -- and this has always been
in addition to the URLValidator. The URLValidator is the one which
does the security-relevant work of validating that the schema is
reasonable, and that the overall shape of the URL is well-formed. The
regex has served primarily to arbitrary limit the characters that can
appear in the URL, in the mistaken name of safety.
Adjust the regex, such that its only purpose is to verify that the
usages of `%` characters in the URL are reasonable, and leave the URL
validation to the URLValidator, which can do a far better job. This
includes broadening the support to include `%%` as an escape
character; this is likely such a niche case as to be unnecessary, but
costs little.
Fixes#16013.
og:image is supposed to be an absolute URL, but some sites incorrectly
provide a relative URL. In this case, it makes more sense to
interpret it relative to the full page URL after redirects, rather
than relative to just the domain part of the page URL before
redirects.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
We don't yet have a do_reactivate_stream function, but we reserve a
number since:
1. It'll likely be added in the future.
2. For now, we can restore archived stream with some manual intervention
in the Django shell, and for that we'll want to create an appropriate
RealmAuditLog entry.
Removes the `/day` and `/night` options from the typeahead menu while
still allowing the commands to be used. Typing `/day` and `/night`
will now suggest `/light` and `/dark`, respectively. Also changes the
`Dark mode` and `Light mode` popups that appear after using the
corresponding command.
Fixes#18318.
Created a schema for the ignored_parameters_unsupported that is
returned by the /settings and /realm/user_settings_defaults endpoints
and removed the duplicated text in the api documentation.
Also cleaned up some small errors in the /realm/user_settings_default
definition and sidebar link /api/update-realm-user-settings-defaults.
Fixes#19674
Since 3853285241, PushNotificationsWorker uses the aioapns library
to send Apple push notifications. This introduces an asyncio event
loop into this worker process, which, if unlucky, can respond poorly
when a SIGALRM is introduced to it:
```
[asyncio] Task exception was never retrieved
future: <Task finished coro=<send_apple_push_notification.<locals>.attempt_send() done, defined at /path/to/zerver/lib/push_notifications.py:166> exception=WorkerTimeoutException(30, 1)>
Traceback (most recent call last):
File "/path/to/zerver/lib/push_notifications.py", line 169, in attempt_send
result = await apns_context.apns.send_notification(request)
File "/path/to/zulip-py3-venv/lib/python3.6/site-packages/aioapns/client.py", line 57, in send_notification
response = await self.pool.send_notification(request)
File "/path/to/zulip-py3-venv/lib/python3.6/site-packages/aioapns/connection.py", line 407, in send_notification
response = await connection.send_notification(request)
File "/path/to/zulip-py3-venv/lib/python3.6/site-packages/aioapns/connection.py", line 189, in send_notification
data = json.dumps(request.message, ensure_ascii=False).encode()
File "/usr/lib/python3.6/json/__init__.py", line 238, in dumps
**kw).encode(obj)
File "/usr/lib/python3.6/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/path/to/zerver/worker/queue_processors.py", line 353, in timer_expired
raise WorkerTimeoutException(limit, len(events))
zerver.worker.queue_processors.WorkerTimeoutException: Timed out after 30 seconds processing 1 events
```
...which subsequently leads to the worker failing to make any progress
on the queue.
Remove the timeout on the worker. This may result in failing to make
forward progress if Apple/Google take overly long handling requests,
but is likely preferable to failing to make forward progress if _one_
request takes too long and gets unlucky with when the signal comes
through.
This makes logging more consistent between FCM and APNs codepaths, and
makes clear which user-ids are for local users, and which are opaque
integers namespaced from some remote zulip server.
Being able to determine how many distinct users are getting push
notifications per remote host is useful, as is the distribution of
device counts. This parallels the log line in
handle_push_notification for push notifications from local realms,
handled via the event queue.
We should use more selective query for UserGroupMembership
objects in tests for checking adding and removing members.
This is done by checking the membership counts for the
particular user group only.
This will help in keeping the tests more understandable
after we add members to the role-based system groups,
since that would create a lot of membership objects.
We make the UserGroup queries in user group creation and
deletion tests more selective by fitering the user groups
which belong to the realm and not the one included in
lear realm, etc.
This will help us to keep the tests more understandable
when the counts of UserGroup increases due to addition of
system groups. There is no need to consider system groups
of other realms in these tests.
It is confusing to have the plan type constants not be namespaced
by the thing they represent. We already have a namespacing
convention in place for constants, so we should use it for
Realm.plan_type as well.
* Remove unnecessary json_validator for full_name parameter.
* Update frontend to pass the right parameter.
* Update documentation and note the change.
Fixes#18409.
`rendered_content` in historical messages may be empty; examining the
history of them may thus require diff'ing two empty strings, which
itself produces an empty string.
Use `lxml.html.fragment_fromstring` to be able to successfully parse
these, rather than 500.
Part of #19559.
As detailed in the comments, the default behavior is undesirable for us
because we can't really predict all possibilities of exceptions that may
be raised - and thus putting str(e) in the http response is potentially
insecure as it may leak some unexpected sensitive information that was
in the exception.
As a hypothetical example - KeyError resulting from some buggy
some_dict[secret_string] call would leak information. Though of course
we aim to never write code like that.
We pass allow_realm_admin as True to access_stream_by_id for
`GET users/{user_id}/subscriptions{stream_id}` endpoint
because we want to allow non-subscribed admins to get
subscription status in private streams.
Fixes#19077.
This commit adds related_name parameter to UserGroup.direct_members
such that we can use direct_groups instead of the default
usergroupmembership_set for getting all the groups of which the
user is direct member.
This commit also sets related_name of UserGroupMembership.user_group
and UserGroupMembership.user_profile to "+" which means that we will
not be having backward relations for these. This change is correct
since we would need to use the recursive queries to get all the
groups of a user and all the members of a group after we add the
subgroups concept in next commit. This leads to us using direct_members
field of UserGroup instead of usergroupmembership_set in mention code,
but this will soon be replaced with the recursive query function to
include subgroup's members as well.
Extracted this commit from #19866.
Authored-by : Anders Kaseorg <anders@zulip.com>
This commit makes the query in get_user_group_direct_members
efficient by directly fetching user-profile ids instead of
first fetching user profile object and then id.
The previous commit apparently didn't have its migration attached properly.
Since this is just a reference to a ManyToManyField, this migration
doesn't actually modify the database, but it is needed for CI to pass.
This commit renames members field of UserGroup to direct_members
for better readability because in the new permissions model, a
user group can be a sub-group of another group and thus technically
members of sub-group will also be members of that group.
This is a prep commit for new permissions model.
Extracted this commit from #19866.
Co-authored-by: Anders Kaseorg <anders@zulip.com>
This is a prep commit for new permissions model in
which a user group would be able to have a subgroup.
This commit renames get_memberships_of_users to
get_direct_memberships_of_users to specify that
the function is used only to fetch the direct
memberships and not memberships of subgroups of
the direct group.
Extracted this commit from #19866.
Co-authored-by: Anders Kaseorg <anders@zulip.com>