zulip/puppet/zulip_ops/files
Alex Vandiver 9b1bdfefcd nagios: Use a better index on UserActivity for zephyr alerting.
Limiting only by client_name and query leads to a very poorly indexed
lookup on `query`, which throws out nearly all of its rows:

```
Nested Loop  (cost=50885.64..60522.96 rows=821 width=8)
  ->  Index Scan using zerver_client_name_key on zerver_client  (cost=0.28..2.49 rows=1 width=4)
        Index Cond: ((name)::text = 'zephyr_mirror'::text)
  ->  Bitmap Heap Scan on zerver_useractivity  (cost=50885.37..60429.95 rows=9052 width=12)
        Recheck Cond: ((client_id = zerver_client.id) AND ((query)::text = ANY ('{get_events,/api/v1/events}'::text[])))
        ->  BitmapAnd  (cost=50885.37..50885.37 rows=9052 width=0)
              ->  Bitmap Index Scan on zerver_useractivity_2bfe9d72  (cost=0.00..16631.82 rows=..large.. width=0)
                    Index Cond: (client_id = zerver_client.id)
              ->  Bitmap Index Scan on zerver_useractivity_1b1cc7f0  (cost=0.00..34103.95 rows=..large.. width=0)
                    Index Cond: ((query)::text = ANY ('{get_events,/api/v1/events}'::text[]))
```

A partial index on the client and query list is extremely effective
here in reducing PostgreSQL's workload; however, we cannot easily
write it as a migration, since it depends on the value of the ID of
the `zephyr_mirror` client.

Since this is only relevant for Zulip Cloud, we manually create the
index:

```sql
CREATE INDEX CONCURRENTLY zerver_useractivity_zehpyr_liveness
    ON zerver_useractivity(last_visit)
 WHERE client_id = 1005
   AND query IN ('get_events', '/api/v1/events');
```
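
Because the client ID differs between installations, one way to avoid
hard-coding it is to look it up first and generate the statement from
the result. The following is a minimal sketch of that idea, not part of
this commit: it uses Python's built-in `sqlite3` as a stand-in for
PostgreSQL (so `CONCURRENTLY` is dropped), the table definitions are
trimmed to hypothetical minimal columns, and `partial_index_sql` is an
illustrative helper name.

```python
import sqlite3

# Hypothetical, minimal stand-ins for the real tables; SQLite replaces
# PostgreSQL here purely so that the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE zerver_client (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE zerver_useractivity (
        id INTEGER PRIMARY KEY,
        user_profile_id INTEGER,
        client_id INTEGER,
        query TEXT,
        last_visit TEXT
    );
    INSERT INTO zerver_client (id, name) VALUES (1005, 'zephyr_mirror');
    """
)


def partial_index_sql(conn):
    """Build the CREATE INDEX statement for the local zephyr_mirror ID."""
    row = conn.execute(
        "SELECT id FROM zerver_client WHERE name = ?", ("zephyr_mirror",)
    ).fetchone()
    if row is None:
        raise LookupError("no zephyr_mirror client in this database")
    # An index predicate cannot take bound parameters, so the ID is
    # interpolated; int() ensures only an integer ever reaches the SQL.
    client_id = int(row[0])
    return (
        "CREATE INDEX zerver_useractivity_zehpyr_liveness "
        "ON zerver_useractivity(last_visit) "
        f"WHERE client_id = {client_id} "
        "AND query IN ('get_events', '/api/v1/events')"
    )


index_sql = partial_index_sql(conn)
conn.execute(index_sql)
```

On a real PostgreSQL server the generated statement would be run with
`CONCURRENTLY`, as in the statement above.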

We rewrite the query to apply the time limit, the distinct, and the
count in SQL instead of in Python, so that it can make use of this
index.  This turns one 20-second query into two 10ms queries.
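
As an illustration of pushing the filtering into SQL, here is a minimal
sketch rather than the actual Nagios plugin code. It assumes the two
queries are a client ID lookup followed by a distinct count; the
`user_profile_id` column, the 15-minute window, and the sample rows are
all assumptions made for the example. SQLite stands in for PostgreSQL,
and the partial index is left out because it affects only speed, not
the result.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical, minimal stand-ins for the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE zerver_client (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE zerver_useractivity (
        id INTEGER PRIMARY KEY,
        user_profile_id INTEGER,
        client_id INTEGER,
        query TEXT,
        last_visit TEXT
    );
    INSERT INTO zerver_client (id, name) VALUES (1005, 'zephyr_mirror');
    """
)

now = datetime(2023, 11, 30, 16, 0, 0)  # fixed "now" keeps the sketch deterministic
sample_rows = [
    # (user_profile_id, client_id, query, last_visit)
    (1, 1005, "get_events", now - timedelta(minutes=1)),
    (1, 1005, "/api/v1/events", now - timedelta(minutes=2)),  # same user again
    (2, 1005, "get_events", now - timedelta(minutes=3)),
    (3, 1005, "get_events", now - timedelta(hours=2)),  # too old
    (4, 7, "get_events", now - timedelta(minutes=1)),  # different client
    (5, 1005, "send_message", now - timedelta(minutes=1)),  # different query
]
conn.executemany(
    "INSERT INTO zerver_useractivity"
    " (user_profile_id, client_id, query, last_visit) VALUES (?, ?, ?, ?)",
    [(u, c, q, t.isoformat()) for (u, c, q, t) in sample_rows],
)

# Query 1: resolve the client ID once.
(client_id,) = conn.execute(
    "SELECT id FROM zerver_client WHERE name = ?", ("zephyr_mirror",)
).fetchone()

# Query 2: the time limit, distinct, and count all happen in the database,
# so no rows have to be fetched and filtered in Python.
cutoff = (now - timedelta(minutes=15)).isoformat()  # assumed window
(active_users,) = conn.execute(
    """
    SELECT COUNT(DISTINCT user_profile_id)
      FROM zerver_useractivity
     WHERE client_id = ?
       AND query IN ('get_events', '/api/v1/events')
       AND last_visit > ?
    """,
    (client_id, cutoff),
).fetchone()
print(active_users)
```

Timestamps are stored as ISO 8601 strings of one fixed format here, so
plain string comparison orders them correctly; PostgreSQL would compare
real timestamp values instead.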
2023-11-30 16:01:55 -08:00
apache puppet: Move nagios to behind teleport. 2021-06-02 18:38:38 -07:00
apt/apt.conf.d puppet: Prevent unattended upgrades of erlang-base. 2023-05-16 14:02:06 -07:00
certs
cron.d cron: Remove unused STATE_FILE environment variable. 2022-06-22 12:07:38 -07:00
grafana grafana: Enable auto-sign-up. 2022-07-19 17:52:17 -07:00
iptables iptables: Stop logging on dropped packets. 2023-08-30 15:29:01 -07:00
munin puppet: Configure munin and nagios under apache with puppet. 2020-07-13 13:23:11 -07:00
munin-plugins munin: Update to use NAGIOS_BOT_HOST. 2021-01-27 12:07:09 -08:00
nagios4 nagios: Remove load monitoring. 2023-09-14 09:29:29 -07:00
nagios_plugins/zulip_zephyr_mirror nagios: Use a better index on UserActivity for zephyr alerting. 2023-11-30 16:01:55 -08:00
needrestart puppet: Tell needrestart to not default to restarting core services. 2022-07-19 17:51:18 -07:00
nginx puppet: Serve /etc/zulip/well-known/ in nginx as /.well-known/. 2023-10-04 15:56:42 -07:00
postgresql puppet: Add a database teleport server. 2021-06-08 22:21:21 -07:00
prometheus puppet: Only fetch from running hosts in Grafana ec2 discovery. 2021-12-09 08:12:03 -08:00
supervisor/conf.d puppet: Switch teleport to running under systemd, not supervisord. 2023-03-15 17:23:42 -04:00
chrony.conf puppet: Configure chrony to use AWS-local NTP sources. 2022-03-25 17:07:53 -07:00
common-session
dot_emacs.el
krb5.conf puppet: Replace debathena krb5 package with equivalent puppet file. 2022-01-18 14:13:28 -08:00
nagios_ssh_config puppet: Use existing autossh tunnels as OpenSSH "master" sockets. 2022-11-01 22:24:40 -07:00
process_exporter.yaml puppet: Rename and generalize Tornado process exporter. 2023-08-06 13:41:10 -07:00
sshd_config
statuspage-pusher zulip_ops: Configure stats to be pushed to status.zulip.com. 2023-11-16 16:21:12 -05:00
teleport_app.yaml puppet: Only include "app_service" section if there are apps. 2022-04-26 16:36:13 -07:00
teleport_node.yaml puppet: Only include "app_service" section if there are apps. 2022-04-26 16:36:13 -07:00
teleport_server.yaml teleport: Add explicit WebAuthn config, not just U2F. 2022-07-18 11:41:00 -07:00
zephyr-clients puppet: Replace debathena zephyr package with equivalent puppet file. 2022-01-18 14:13:28 -08:00