While this is a different system than I'd written up in #8004, I think
this is a better solution to the general problem of cron jobs to run
on just one server.
Fixes#8004.
This should make it possible to use the zulip_ops base rules
successfully on chat.zulip.org. Many of the changes in this commit
are hacks and probably can be cleaned up later, but given that we plan
to drop trusty support soon, it's likely that most of them will simply
be deleted then.
This doesn't yet pass all Nagios checks correctly, and still has a few
flaws:
* The ideal setup code for the `nagios` user in the database isn't included.
* Some of the other details are a bit off; we need to split some host roles.
But it's better than nothing, and we can iterate from here.
Sparkle was the auto-update system used by the legacy desktop app. We
haven't been capable of using it for auto-update in years, so there's
no reason to keep around the configuration.
The new Electron app uses a different system anyway.
This allows the Nagios user to access redis without having full access
to the redis system. Ideally, this would eventually use a password
that only has statistics read access, but I'm not sure redis supports
that.
This old puppet configuration was never really used, and regardless
hardcoded an ancient zulip.net hostname. We fix this to use the
zulipconf system to get the host domain (though not, at present, the
hostname).
Arguably, we should make this a symlink, but it's probably a good idea
to have every change in the production Nagios configuration go through
the zulip-puppet-apply diff experience.
This code empirically doesn't work. It's not entirely clear why, even
having done quite a bit of debugging; partly because the code is quite
convoluted, and because it shows the symptoms of people making changes
over time without really understanding how it was supposed to work.
Moreover, this code targets an old version of the APNs provider API.
Apple deprecated that in 2015, in favor of a shiny new one which uses
HTTP/2 to meet the same needs for concurrency and scale that the old
one had to do a bunch of ad-hoc protocol design for.
So, rip this code out. We'll build a pathway to the new API from
scratch; it's not that complicated.
On `trusty` there is no package for `boto` or `gevent` on Python 3, both
of which are dependencies of `wal-e` (at the version we've pinned.) This
is something used only on database servers and only in a replication
scenario, and it doesn't involve any of our code outside the wal-e repo,
so the Python version it uses is quite independent of the Zulip
application server itself and the rest of our code. For now, keep it
explicitly on Python 2 while we move forward for most everything else.
This script in `zulip_ops` is handy for managing EC2 instances. It uses
`boto`, which isn't available in `trusty` for Python 3. The use of
`boto` here isn't particularly deep, so we could replace it with some
more manual HTTP calls if it comes to that. For now, just mark it to
stay on Python 2 while we move the app and all the rest of the ops code
(except this and another straggler or two) to Python 3.
Also make a comment on this package in the Puppet manifest clearer
about what it specifically refers to.
This consists of the `zulip_ops::stats` Puppet class, which has apparently
not been used since 2014, and a number of files that I believe were
only used for that. Also a couple of tiny loose ends in other files.
This is only actually used in our `wal-e` setup, which is in
zulip_ops::postgres_common. (In fact the only mentions of `gevent` in
our whole Git history are for `wal-e`.) So remove where we mention it
on the broader zulip::postgres_common module, and move it where it's
needed.
This follows up on 98cef0ab4 by eliminating the only dependency
outside of the `zulip_ops` Puppet tree on a system Python-library
package which isn't available in `trusty` for Python 3.
In some of these contexts, we may still be *using* the Python 2
version, but at least this should eliminate running into
`ImportError`s one by one in scripts that run outside a virtualenv,
as we update their shebangs to refer to Python 3.
Several Python libraries we use don't come in Python 3 versions on
trusty: gevent, boto, twisted, django, django-tagging, whisper.
The latter two don't come in Python 3 versions even on xenial.
So some work required before we can actually switch the code that
relies on those libraries to run as Python 3 -- probably the best
solution will be to backport them all in our apt repo. (All but
`whisper` are packaged in zesty; `whisper` upstream just grew Python 3
support this year.)
These are no longer useful, with our spiffy new analytics framework,
and we haven't in fact been using them for some time, while the
`active-user-stats` cron job does cause regular mail from cron.
Just delete them.
I'm not altogether happy with this (a better solution would be
database-level locking), but I think it solves the immediate problem
of folks with 2 servers being very likely to run analytics on both of
them.
The old zulip_ops Nagios configuration depended on Nagios having the
ability to login as the zulip user (with essentially full write
access); this configuration is helpful for limiting nagios to special
"nagios" user with more limited credentials.