This should make it possible to use the zulip_ops base rules
successfully on chat.zulip.org. Many of the changes in this commit
are hacks and probably can be cleaned up later, but given that we plan
to drop trusty support soon, it's likely that most of them will simply
be deleted then.
This doesn't yet pass all Nagios checks correctly, and still has a few
flaws:
* The ideal setup code for the `nagios` user in the database isn't included.
* Some of the other details are a bit off; we need to split some host roles.
But it's better than nothing, and we can iterate from here.
This commit just copies all the code from MissedMessageSendingWorker
class to a new EmailSendingWorker class. All the logic to send an email
through a queue was already there. This commit only makes the logic
generic. It does so by creating a special purpose queue called
'email_senders' to send any type of email. To make
MissedMessageSendingWorker still work we derive it from
EmailSendingWorker. All the tests that were testing
MissedMessageSendingWorker now run against EmailSendingWorker.
This fixes a bug where, when a user is unsubscribed from a stream,
they might have unread messages on that stream leak. While it might
seem to be a minor problem, it can cause significant problems for
computing the `unread_msgs` data structures, since it means we need to
add an extra filter for whether the user is still subscribed, either
in the backend or in the UI.
Fixes#7095.
Sparkle was the auto-update system used by the legacy desktop app. We
haven't been capable of using it for auto-update in years, so there's
no reason to keep around the configuration.
The new Electron app uses a different system anyway.
Whatever dist/ functionality this had in 2014 is now served by
zulip.org, and since this serves as a sample, it should be as simple
as possible.
Previously, this was more cluttered than it needed to be.
This allows the Nagios user to access redis without having full access
to the redis system. Ideally, this would eventually use a password
that only has statistics read access, but I'm not sure redis supports
that.
This old puppet configuration was never really used, and regardless
hardcoded an ancient zulip.net hostname. We fix this to use the
zulipconf system to get the host domain (though not, at present, the
hostname).
If a machine is configured with no swap intentationally, that
shouldn't be a Nagios problem. This alert is intended to flag
machines which are swapping.
Arguably, we should make this a symlink, but it's probably a good idea
to have every change in the production Nagios configuration go through
the zulip-puppet-apply diff experience.
This code empirically doesn't work. It's not entirely clear why, even
having done quite a bit of debugging; partly because the code is quite
convoluted, and because it shows the symptoms of people making changes
over time without really understanding how it was supposed to work.
Moreover, this code targets an old version of the APNs provider API.
Apple deprecated that in 2015, in favor of a shiny new one which uses
HTTP/2 to meet the same needs for concurrency and scale that the old
one had to do a bunch of ad-hoc protocol design for.
So, rip this code out. We'll build a pathway to the new API from
scratch; it's not that complicated.
This causes `upgrade-zulip-from-git`, as well as a no-option run of
`tools/build-release-tarball`, to produce a Zulip install running
Python 3, rather than Python 2. In particular this means that the
virtualenv we create, in which all application code runs, is Python 3.
One shebang line, on `zulip-ec2-configure-interfaces`, explicitly
keeps Python 2, and at least one external ops script, `wal-e`, also
still runs on Python 2. See discussion on the respective previous
commits that made those explicit. There may also be some other
third-party scripts we use, outside of this source tree and running
outside our virtualenv, that still run on Python 2.
On `trusty` there is no package for `boto` or `gevent` on Python 3, both
of which are dependencies of `wal-e` (at the version we've pinned.) This
is something used only on database servers and only in a replication
scenario, and it doesn't involve any of our code outside the wal-e repo,
so the Python version it uses is quite independent of the Zulip
application server itself and the rest of our code. For now, keep it
explicitly on Python 2 while we move forward for most everything else.
This script in `zulip_ops` is handy for managing EC2 instances. It uses
`boto`, which isn't available in `trusty` for Python 3. The use of
`boto` here isn't particularly deep, so we could replace it with some
more manual HTTP calls if it comes to that. For now, just mark it to
stay on Python 2 while we move the app and all the rest of the ops code
(except this and another straggler or two) to Python 3.
Also make a comment on this package in the Puppet manifest clearer
about what it specifically refers to.
This consists of the `zulip_ops::stats` Puppet class, which has apparently
not been used since 2014, and a number of files that I believe were
only used for that. Also a couple of tiny loose ends in other files.
This is only actually used in our `wal-e` setup, which is in
zulip_ops::postgres_common. (In fact the only mentions of `gevent` in
our whole Git history are for `wal-e`.) So remove where we mention it
on the broader zulip::postgres_common module, and move it where it's
needed.
This follows up on 98cef0ab4 by eliminating the only dependency
outside of the `zulip_ops` Puppet tree on a system Python-library
package which isn't available in `trusty` for Python 3.
In some of these contexts, we may still be *using* the Python 2
version, but at least this should eliminate running into
`ImportError`s one by one in scripts that run outside a virtualenv,
as we update their shebangs to refer to Python 3.
Several Python libraries we use don't come in Python 3 versions on
trusty: gevent, boto, twisted, django, django-tagging, whisper.
The latter two don't come in Python 3 versions even on xenial.
So some work required before we can actually switch the code that
relies on those libraries to run as Python 3 -- probably the best
solution will be to backport them all in our apt repo. (All but
`whisper` are packaged in zesty; `whisper` upstream just grew Python 3
support this year.)
These are no longer useful, with our spiffy new analytics framework,
and we haven't in fact been using them for some time, while the
`active-user-stats` cron job does cause regular mail from cron.
Just delete them.
Also puts them into a processing queue, though the queue processor
does nothing.
Rewritten by tabbott to avoid unnecessary database queries in
do_send_messages.
- Add new 'missedmessage_email_senders' queue for sending missed messages emails.
- Add the new worker to process 'missedmessage_email_senders' queue.
- Split aggregation missed messages and sending missed messages email
to separate queue workers.
- Adapt tests for sending missed emails to the new logic.
Fixes#2607
Since browser clients send messages via websockets and not the API,
this is an important element in making sure mission-critical Zulip
functionality is working.
I'm not altogether happy with this (a better solution would be
database-level locking), but I think it solves the immediate problem
of folks with 2 servers being very likely to run analytics on both of
them.
The old zulip_ops Nagios configuration depended on Nagios having the
ability to login as the zulip user (with essentially full write
access); this configuration is helpful for limiting nagios to special
"nagios" user with more limited credentials.
- Add websocket client to create connection with SockJS websocket server.
It contains callback method to launch after connection setup.
- Add '--websocket' parameter to 'check_send_receive_time' script to
check websocket connection.
- Add testing websocket connection to production installation checking.
- Add cronjob to launch websocket connection nagios test.
This makes it possible for Zulip Nagios monitoring to check for
problems impacting the websockets sending code path, which is what all
web users use.
This allows the actual nagios work involved with
check_send_receive_time nagios checks to be done by an unprivileged
"nagios" user rather than the "zulip" user.
There's no longer a reason to have copies of forked postgres
configuration files in our repository, since some time ago we merged
the features of these configuration files into the main
postgres_appdb_tuned.pp.
The old "zulip_internal" name was from back when Zulip, Inc. had two
distributions of Zulip, the enterprise distribution in puppet/zulip/
and the "internal" SAAS distribution in puppet/zulip_internal. I
think the name is a bit confusing in the new fully open-source Zulip
work, so we're replacing it with "zulip_ops". I don't think the new
name is perfect, but it's better.
In the following commits, we'll delete a bunch of pieces of Zulip,
Inc.'s infrastructure that don't exist anymore and thus are no longer
useful (e.g. the old Trac configuration), with the goal of cleaning
the repository of as much unnecessary content as possible.