RabbitMQ clients have a setting called prefetch[1], which controls how
many un-acknowledged events the server forwards to the local queue in
the client. The default is 0; this means that when clients first
connect, the server must send them every message in the queue.
This itself may cause unbounded memory usage in the client, but also
has other detrimental effects. While the client is attempting to
process the head of the queue, it may be unable to read from the TCP
socket at the rate that the server is sending to it -- filling the TCP
buffers, and causing the server's writes to block. If the server
blocks for more than 30 seconds, it times out the send, and closes the
connection with:
```
closing AMQP connection <0.30902.126> (127.0.0.1:53870 -> 127.0.0.1:5672):
{writer,send_failed,{error,timeout}}
```
This is the failure described in
https://github.com/pika/pika/issues/753#issuecomment-318119222.
Set a prefetch limit of 100 messages, or the batch size, to better
handle queues which start with large numbers of outstanding events.
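As a rough sketch (not the actual worker code), the pika call that
applies such a limit looks something like this:
```
# A minimal sketch, assuming a pika BlockingConnection; the connection
# parameters and call site are illustrative, not the actual worker code.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
# Cap the number of unacknowledged messages the server will push to this
# consumer: 100, or the worker's batch size if that is larger.
channel.basic_qos(prefetch_count=100)
```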
Setting prefetch=1 causes significant performance degradation in the
no-op queue worker, to 30% of the prefetch=0 performance. Setting
prefetch=100 achieves 90% of the prefetch=0 performance, and higher
values offer only minor gains above that. Batch workers' performance
is not notably degraded by a prefetch equal to their batch size, and
they cannot function with a prefetch smaller than their batch size.
We also set a 100-count prefetch on Tornado workers, as they are
potentially susceptible to the same effect.
[1] https://www.rabbitmq.com/confirms.html#channel-qos-prefetch
The `current_queue_size` key in the queue monitoring stats file was
the local queue size, not the global queue size -- d5a6b0f99a
renamed the function, but did not adjust the queue monitoring JSON,
despite the last use of it having been removed in cd9b194d88.
The function is still used to mark "we emptied our queue," and it
remains a reasonable metric for that.
TOR users are legitimate users of the system; however, TOR can also
be used for abuse -- specifically, to evade IP-based rate-limiting.
For the purposes of IP-based rate-limiting, add a
RATE_LIMIT_TOR_TOGETHER flag, defaulting to false, which lumps all
requests from TOR exit nodes into the same bucket. This may allow a
TOR user to deny other TOR users access to the find-my-account and
new-realm endpoints, but this is a low cost for cutting off a
significant potential abuse vector.
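As a hypothetical sketch of the bucket selection (RATE_LIMIT_TOR_TOGETHER
is the real flag; the helper and its names are made up for illustration):
```
from django.conf import settings

def rate_limit_key_for(ip: str, tor_exit_nodes: set) -> str:
    # Collapse all TOR exit nodes into one rate-limiting bucket when
    # RATE_LIMIT_TOR_TOGETHER is enabled; otherwise use the normal
    # per-IP bucket.
    if settings.RATE_LIMIT_TOR_TOGETHER and ip in tor_exit_nodes:
        return "tor-exit-nodes"
    return ip
```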
If enabled, the list of TOR exit nodes is fetched from their public
endpoint once per hour, via a cron job, and cached on disk. Django
processes load this data from disk, and cache it in memcached.
A circuit breaker spares requests the burden of checking the disk when
reads are failing: it trips if there are two failures in a row, and
only begins trying again after 10 minutes.
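A minimal sketch of that circuit-breaker behavior (the class and its
names are illustrative, not the actual implementation):
```
import time

class ExitNodeFileCircuitBreaker:
    # Illustrative only: two consecutive failures stop further disk
    # reads, and attempts resume after 10 minutes.
    FAILURE_THRESHOLD = 2
    RECOVERY_SECONDS = 10 * 60

    def __init__(self) -> None:
        self.consecutive_failures = 0
        self.opened_at = 0.0

    def allow_attempt(self) -> bool:
        if self.consecutive_failures < self.FAILURE_THRESHOLD:
            return True
        return time.time() - self.opened_at >= self.RECOVERY_SECONDS

    def record_success(self) -> None:
        self.consecutive_failures = 0

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.FAILURE_THRESHOLD:
            self.opened_at = time.time()
```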
It's better for this to catch all exit(...) calls with non-zero exit
code, given the purpose is to catch all exits with failure, as opposed
to only exit(1).
Unhandled exceptions propagating up to process_queue were not caught
there, causing improper logging: errors didn't land in errors.log as
expected. Exceptions should be caught and explicitly logged by the
process_queue logger. Exceptions that occur while consuming events are
caught and handled inside the worker's logic; however, those raised
while setting up the worker were not handled at all, and that's the
core bug we mean to address here.
Furthermore, in multi-threaded mode we want the autoreload mechanism
to keep working, which it doesn't unless these exceptions are caught.
The correct approach is, again, to catch the exception, log it, and
then send a SIGUSR1 signal to trigger exit and autoreload.
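A minimal sketch of that pattern (the worker setup/start calls are
illustrative stand-ins for the real startup path):
```
import logging
import os
import signal

logger = logging.getLogger("process_queue")

def run_worker(worker) -> None:
    try:
        worker.setup()
        worker.start()
    except Exception:
        # Log explicitly to the process_queue logger, then trigger exit
        # and autoreload via SIGUSR1.
        logger.exception("Unhandled exception from queue worker")
        os.kill(os.getpid(), signal.SIGUSR1)
```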
I believe that this migration with the default of atomic=True will
fail when trying to convert the field to PositiveIntegerField if there
were any 0 values present in the database when the migration began.
The fix is to have each of the steps run in its own transaction.
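A sketch of the shape of that fix (the app, model, and field names are
placeholders, not the actual migration):
```
from django.db import migrations, models

def fix_zero_values(apps, schema_editor):
    # Placeholder fixup for the 0 values mentioned above; the real
    # rewrite depends on the field's semantics.
    SomeModel = apps.get_model("zerver", "SomeModel")
    SomeModel.objects.filter(some_field=0).update(some_field=1)

class Migration(migrations.Migration):
    # With atomic = False, the operations below no longer share one big
    # transaction, so the fixup commits before the field conversion runs.
    atomic = False
    dependencies = [("zerver", "0001_previous_migration")]
    operations = [
        migrations.RunPython(fix_zero_values, elidable=True),
        migrations.AlterField(
            model_name="somemodel",
            name="some_field",
            field=models.PositiveIntegerField(),
        ),
    ]
```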
Note from tabbott: It's not clear we want fundamentally dynamic
corporate website elements like this jobs page to be in zulip/zulip,
but there's even less reason for there to be an out-of-date copy there.
A SIGTERM can show up at any point in the ioloop, even in places which
are not prepared to handle it. This results in the process ignoring
the `sys.exit` called by the SIGTERM handler, and instead logging an
uncaught SystemExit exception:
```
2021-11-09 15:37:49.368 ERR [tornado.application:9803] Uncaught exception
Traceback (most recent call last):
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/http1connection.py", line 238, in _read_message
delegate.finish()
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/httpserver.py", line 314, in finish
self.delegate.finish()
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/routing.py", line 251, in finish
self.delegate.finish()
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/web.py", line 2097, in finish
self.execute()
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/web.py", line 2130, in execute
**self.path_kwargs)
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/gen.py", line 307, in wrapper
yielded = next(result)
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/tornado/web.py", line 1510, in _execute
result = method(*self.path_args, **self.path_kwargs)
File "/home/zulip/deployments/2021-11-08-05-10-23/zerver/tornado/handlers.py", line 150, in get
request = self.convert_tornado_request_to_django_request()
File "/home/zulip/deployments/2021-11-08-05-10-23/zerver/tornado/handlers.py", line 113, in convert_tornado_request_to_django_request
request = WSGIRequest(environ)
File "/home/zulip/deployments/2021-11-08-05-10-23/zulip-py3-venv/lib/python3.6/site-packages/django/core/handlers/wsgi.py", line 66, in __init__
script_name = get_script_name(environ)
File "/home/zulip/deployments/2021-11-08-05-10-23/zerver/tornado/event_queue.py", line 611, in <lambda>
signal.signal(signal.SIGTERM, lambda signum, stack: sys.exit(1))
SystemExit: 1
```
Supervisor then terminates the process with a SIGKILL, which results
in dropping data held in the tornado process, as it does not dump its
queue.
The only call which is safe to make in the signal handler is
`ioloop.add_callback_from_signal`, which schedules the callback to run
during the course of the normal ioloop. That callback does an orderly
shutdown of the server and the ioloop before exiting.
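A minimal sketch of that shutdown pattern (assuming `http_server` is
the running Tornado HTTPServer; the function names are illustrative):
```
import signal
from tornado.ioloop import IOLoop

def install_sigterm_handler(http_server) -> None:
    def shutdown() -> None:
        # Runs inside the ioloop, where it is safe to do real work.
        http_server.stop()       # stop accepting new connections
        IOLoop.current().stop()  # let the ioloop exit cleanly

    def handle_sigterm(signum, frame) -> None:
        # The handler itself only schedules `shutdown`; this is the one
        # call that is safe to make from a signal handler.
        IOLoop.current().add_callback_from_signal(shutdown)

    signal.signal(signal.SIGTERM, handle_sigterm)
```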
Changes `update_page` to only update the modified user setting instead
of updating all of the user settings on the page. This is modeled on
the behavior for updates to the realm user default settings.
This is a follow-up to feedback in #20070.
Now that it's further away from the composebox, we probably want it to
be visible for longer. Doubling it from 1.5 seconds to 3 seconds seems
reasonable to start with, although we should tune it based on feedback.
Now that this is in the left sidebar, we can remove the now-redundant
compose area button for it. This also changes where the "Saved as
draft" tooltip appears.
This currently shows the drafts as a popup. Eventually, we'll want to
migrate it to be a view in the center pane, as we did with Recent
Topics.
This uses the same style as starred messages in order to show the number
of drafts.
See CZO for more context:
https://chat.zulip.org/#narrow/stream/101-design/topic/drafts.20in.20sidebar
This commit ensures that all markdown fixtures have unique
test names by rewriting the names of some of them and adding
a test in `test_markdown.py`.
Previously, duplicate names overwrote the value for the same key in
`load_markdown_tests` in `test_markdown.py`.
Since the position of a topic in recent topics can change, we focus
the last selected topic using its `topic_key` instead of relying on
the `row_focus` value, which can be incorrect.
When the user is scrolling, we simply keep the center element in
focus.
When the user is using hotkeys, we keep the focused element in the
center.
When the user is navigating with the keyboard, we need to always keep
the "focused" topic in the visible scrolling area. We determine
whether the topic row is above or below the visible area and scroll by
`half_height_of_visible_area` so that the selected topic is visible.
This gives a nice navigation experience in both views.
Reduced the height of the recent topics table to account for the
compose box, so that the focused element is not hidden below it.
Race conditions in stream unsubscription may lead to multiple
back-to-back SUBSCRIPTION_DEACTIVATED RealmAuditLog entries for the
same stream. The current logic constructs duplicate UserMessage
entries for such streams, which then fail to insert.
Keep a set of message ids that have been prepared for insertion, so
that we don't duplicate them if there is a duplicated
SUBSCRIPTION_DEACTIVATED row. This also renames the `message` local
variable, which otherwise shadowed the `message` argument, which has a
different type.
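A minimal sketch of the dedupe (assuming Zulip's UserMessage model;
the function name and loop structure are illustrative):
```
from zerver.models import UserMessage

def build_user_messages(user_profile, stream_messages) -> list:
    message_ids_added = set()
    user_messages = []
    for stream_message in stream_messages:
        if stream_message.id in message_ids_added:
            # A duplicated SUBSCRIPTION_DEACTIVATED row already produced
            # this id; skip it so the later bulk insert doesn't fail.
            continue
        message_ids_added.add(stream_message.id)
        user_messages.append(
            UserMessage(user_profile=user_profile, message=stream_message)
        )
    return user_messages
```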
Using these tuples is clearly uglier than using classes for storing
these encoded streams. This can be built on further to implement the
various fiddly bits of logic for handling these objects inside
appropriate class methods.
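Purely for illustration, the kind of class this points toward (the
fields here are made up, since the tuple's contents aren't shown in
this commit):
```
from dataclasses import dataclass

@dataclass
class EncodedStream:
    stream_id: int
    stream_name: str

    def display_label(self) -> str:
        # The fiddly handling logic can live in methods like this,
        # instead of in free functions that unpack tuples.
        return f"#{self.stream_name} ({self.stream_id})"
```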
Previously, opening multiple message_edits and then drag-dropping a
file into any one of them would upload the file in all of them, i.e.,
you'd get one uploaded file in each message_edit.
This bug was caused by upload.get_item("drag_drop_container", config)
returning multiple elements when config.mode = "edit".
This commit changes the selector to use the row provided (config.row),
which ensures that the above bug doesn't happen.
This is a prep commit for adding extended descriptions to
message_view_header; it ensures hover effects work even if we add
additional elements to the message_view_header.