It's rare that there's value in having the log files get this big, and
these changes mean these log files should never consume more than a
few gigabytes.
And in particular, the server.log is far more important than the other
log files, and grows much faster, so we might as well spend most of
the space we are spending on that.
I estimate that the total size of log files from this is going to be
under 1-2GB, since 75MB (compressed size) * 10 (compressed logs) +
500MB (uncompressed size) = 1.25GB from server.log, and the rest is
negligible.
Fixes part of #5724.
Most of these log files are useless except a few minutes after an
event happens, and the aggregate effect of the originals size limits
meant that Zulip's logs could consume many gigabytes of disk.
The new logging strategy should limit our usage from supervisor logs
to at most 3 Gigabytes:
* 20 * 3 = 60MB per queue worker => <1GB.
* 100 * 10 = 1GB for Django and Tornado logs.
Fixes part of #5724.
While running queue processors multithreaded will limit the
performance available to very small systems, it's easy to fix that by
adding more RAM, and previously, Zulip didn't work on such systems at
all, so this is unambiguously an improvement there.
Fixes#32.
Fixes#34.
(Commit message expanded significantly by tabbott.)
Also puts them into a processing queue, though the queue processor
does nothing.
Rewritten by tabbott to avoid unnecessary database queries in
do_send_messages.
This fixes a performance problem where we were previously starting up
a full Django process (~0.7s even on a fast machine) every time a new
email came in, potentially allowing users to accidentally DoS a Zulip
server. Now, we just post over HTTPS, allowing the existing thread
pool support to do its job.
- Add script wrapper to communicate postfix pipe with django web server
over HTTP(S). It uses shared_secret authentication mode.
- Add django view to process messages from email mirror server.
- Clean management command `email-mirror`. Left just functional
for cron email processing.
- Add routes for new tornado view.
- Change pipe script in master process postfix config template
based on updated script.
- Add tests.
Tweaked by tabbott to adjust the directory and set better defaults.
Fixes#2421.
- Enable `master` parameter for `uswgi` configuration.
It allows cleaning leaked processes if the parent
process is closed unexpectedly or with SIGKILL command.
Child processes follow to the master and kill themselves
after the main process.
Fixes#3855
- Add new 'missedmessage_email_senders' queue for sending missed messages emails.
- Add the new worker to process 'missedmessage_email_senders' queue.
- Split aggregation missed messages and sending missed messages email
to separate queue workers.
- Adapt tests for sending missed emails to the new logic.
Fixes#2607
Using `supervisorctl restart all` carried longer downtime (since it
just restarts everything at the same moment) and was less under our
control; I'm not sure it had any advantages.
Since browser clients send messages via websockets and not the API,
this is an important element in making sure mission-critical Zulip
functionality is working.
I'm not altogether happy with this (a better solution would be
database-level locking), but I think it solves the immediate problem
of folks with 2 servers being very likely to run analytics on both of
them.
This results in a brief service interruption (not a graceful restart),
but fixes a bug where on a `supervisorctl restart zulip-django`, we'd
end up leaking a bunch of uwsgi processes.
The mechanism was that sending SIGHUP to uwsgi was a command for it to
gracefully restart, so it'd start doing that (whereas supervisor
expected it to be dying)... and then supervisor would start up the new
uwsgi process group, resulting in 2 uwsgi process groups running.
This, in turn, led to a memory leak that could eventually result in
OOM kills.
The old zulip_ops Nagios configuration depended on Nagios having the
ability to login as the zulip user (with essentially full write
access); this configuration is helpful for limiting nagios to special
"nagios" user with more limited credentials.
Previously, the CRITICAL state would never fire (because x > 6 =>
x > 3). Additionally, 6s is not so unusually high as to deserve being
immediately pageable.
- Add websocket client to create connection with SockJS websocket server.
It contains callback method to launch after connection setup.
- Add '--websocket' parameter to 'check_send_receive_time' script to
check websocket connection.
- Add testing websocket connection to production installation checking.
- Add cronjob to launch websocket connection nagios test.
This makes it possible for Zulip Nagios monitoring to check for
problems impacting the websockets sending code path, which is what all
web users use.
This change adds support for displaying inline open graph previews for
links posted into Zulip.
It is designed to interact correctly with message editing.
This adds the new settings.INLINE_URL_EMBED_PREVIEW setting to control
whether this feature is enabled.
By default, this setting is currently disabled, so that we can burn it
in for a bit before it impacts users more broadly.
Eventually, we may want to make this manageable via a (set of?)
per-realm settings. E.g. I can imagine a realm wanting to be able to
enable/disable it for certain URLs.
(Why is -u needed at all? I’m not sure, but test-run-dev spins forever
“Polling run-dev...” without it.)
Signed-off-by: Anders Kaseorg <andersk@mit.edu>