Commit Graph

41 Commits

Author SHA1 Message Date
Alex Vandiver 50c3dd88e6 models: Migrate ids of all non-Message-related tables to bigint.
Migrate all `ids` of anything which does not have a foreign key from
the Message or UserMessage table (and would thus require walking
those) to be `bigint`.  This is done by removing explicit
`BigAutoField`s, trading them for explicit `AutoField`s on the tables
to not be migrated, while updating `DEFAULT_AUTO_FIELD` to the new
default.

In general, the tables adjusted in this commit are small tables -- at
least compared to Messages and UserMessages.

Many-to-many tables without their own model class are adjusted by a
custom Operation, since they do not automatically pick up migrations
when `DEFAULT_AUTO_FIELD` changes[^1].

Note that this does multiple scans over tables to update foreign
keys[^2].  Large installs may wish to hand-optimize this using the
output of `./manage.py sqlmigrate` to join multiple `ALTER TABLE`
statements into one, to speed up the migration.  This is unfortunately
not possible to do generically, as constraint names may differ between
installations.

This leaves the following primary keys as non-`bigint`:
- `auth_group.id`
- `auth_group_permissions.id`
- `auth_permission.id`
- `django_content_type.id`
- `django_migrations.id`
- `otp_static_staticdevice.id`
- `otp_static_statictoken.id`
- `otp_totp_totpdevice.id`
- `two_factor_phonedevice.id`
- `zerver_archivedmessage.id`
- `zerver_client.id`
- `zerver_message.id`
- `zerver_realm.id`
- `zerver_recipient.id`
- `zerver_userprofile.id`

[^1]: https://code.djangoproject.com/ticket/32674
[^2]: https://code.djangoproject.com/ticket/24203
2024-06-05 11:48:27 -07:00
Alex Vandiver 4f4725f810 analytics: Migrate models' id columns to bigint.
This helps prevent wraparound on exceedingly large and old installs,
particularly Zulip Cloud.  These are relatively simple migrations
since they are not referenced by any other tables; however, they are
quite large, and are actively used from Django by running servers,
making this not a migration which is possible to run without stopping
the server.

Use the escape hatch in the previous commit to temporarily pause
analytics writes while the migration happens.  This should make the
migration transparent to users, at the small cost of an artificial dip
in statistics (specifically, to push notification counts, and unread
message counts) while the migration runs.
2024-06-05 11:48:27 -07:00
Alex Vandiver 09e9c75ec6 analytics: Remove `active_users` and `active_users_log` metrics.
Both of these are inaccurate, not currently used anywhere, and have
been superseded by the `active_users_audit` metric.
2024-06-03 12:35:35 -07:00
Alex Vandiver 0100440a86 analytics: Make active_users_audit into a RealmCount.
With `realm_active_humans` no longer dependent on the per-user rows,
there is no reason to preserve them -- any measure of "was a user
active" should look directly at the much richer RealmAuditLog.  This
removes the bulk of the UserCount table, since the remaining rows all
require user interaction of some sort to produce rows.
2024-06-03 12:35:35 -07:00
Alex Vandiver a782aae78e analytics: Regenerate partial indexes due to Django bug.
Due to a bug[^1] in Django 4.2, fixed in 4.2.6, queries using
`__isnull` added an unnecessary cast.  This cast was _also_ used in
`WHERE` clauses for partial indexes.  This means that partial indexes
created before Zulip was using Django 4.2 (i.e. before Zulip Server
7.0 or 2c20028aa4) will not be used when the server is using Django
4.2.0 through 4.2.5 -- and, conversely, that indexes created while
Zulip had those versions of Django (i.e. Zulip Server 7.0 through 7.4
or 7807bff526) will not be used later.

We re-create the indexes, to ensure that users that installed Zulip
after Zulip Server 7.0 / 2c20028aa4 and before Zulip Server 7.5 /
7807bff526 have indexes which can be used by current Django.  This
is useless work for some installations, but most analytics tables are
not large enough for this to take significant time.

[^1]: https://code.djangoproject.com/ticket/34840
2023-11-16 13:53:04 -08:00
Anders Kaseorg 7e707270f0 models: Convert deprecated index_together option to indexes.
index_together is slated for removal in Django 5.1:
https://docs.djangoproject.com/en/4.2/internals/deprecation/#deprecation-removed-in-5-1

We set the optional index names to match the previously generated
index names to avoid adding new migrations.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-07-12 07:12:43 -07:00
Anders Kaseorg 0628c3cac8 migrations: Import BaseDatabaseSchemaEditor from its canonical module.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-03-05 14:46:28 -08:00
Anders Kaseorg df001db1a9 black: Reformat with Black 23.
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.

(This does not actually upgrade our Python environment to Black 23
yet.)

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-02 10:40:13 -08:00
Zixuan James Li d5517932cd typing: Use BaseDatabaseSchemaEditor in place of DatabaseSchemaEditor.
This is a part of #18777.

Signed-off-by: Zixuan James Li <359101898@qq.com>
2022-05-30 14:18:53 -07:00
Anders Kaseorg 6e4c3e41dc python: Normalize quotes with Black.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg 11741543da python: Reformat with Black, except quotes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
arpit551 a68d38cc52 migrations: Upgrade migrations to remove duplicates in all Count tables.
This commit upgrades 0015_clear_duplicate_counts migration to remove
duplicate count in StreamCount, UserCount, InstallationCount as well.

Fixes https://github.com/zulip/docker-zulip/issues/266
2020-07-30 15:18:00 -07:00
Anders Kaseorg 365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Tim Abbott 69ae4931c3 migrations: Use django.db.backends.postgresql.schema.
This replaces django.db.backends.postgresql_psycopg2, which has been
an alias to django.db.backends.postgresql since Django 1.9.
2020-04-26 22:20:24 -07:00
Anders Kaseorg c734bbd95d python: Modernize legacy Python 2 syntax with pyupgrade.
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-09 16:43:22 -07:00
Mateusz Mandera 64b85415f5 migrations: Fix unused import error. 2020-03-06 12:17:19 -08:00
arpit551 f299f31340 analytics: Fix missing unique constraint when subgroup is null.
Replaced unique_together with UniqueConstraint in models that
covered nullable fields as in unique_together database indexes
don't work where subgroup=None. So added conditional unique
index handling invalid duplicate Count data.

Added 0015_clear_duplicate_counts migration to handle existing
data that violates the constraints.

Also corrected a test case in test_counts.py which didn't clear its
state properly and thus was accidentally taking advantage of this
database schema bug.
2020-03-06 11:10:04 -08:00
Tim Abbott 9ac3e1099c analytics: Remove last_modified field from FillState.
This field wasn't used for anything, and I think it has very limited
use for debugging, since fundamentally, it'll almost always have a
value within the hour of the actual timestamp in FillState, and any
more fine-grained logging we might want would be available in the
analytics job's own logs.

The proximal reason to remove it is that apparently Django's
model_to_dict doesn't support auto_now fields, and that caused some
trouble when working on adding more complete import/export support for
analytics data.
2020-01-26 20:38:26 -08:00
Tim Abbott 8e7ce7cc79 python: Sort migrations/management command imports with isort.
This is a preparatory commit for using isort for sorting all of our
imports, merging changes to files where we can easily review the
changes as something we're happy with.

These are also files with relatively little active development, which
means we don't expect much merge conflict risk from these changes.
2020-01-14 13:07:47 -08:00
Anders Kaseorg 4bd28f7ae6 migrations: Remove unused imports.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2019-02-02 17:01:04 -08:00
Rishi Gupta 85f7ac8172 analytics: Remove Anomaly model. 2019-02-01 18:48:18 -08:00
Tim Abbott c679920c01 python: Fix unnecessary uses of str_utils library. 2018-11-27 11:44:09 -08:00
Tim Abbott f0ef335412 models: Remove unused ModelReprMixin class.
It appeared to be used as a base class in various Django migrations,
but because it didn't define any model fields, it wasn't actually.
2018-05-15 19:11:22 -07:00
rht 8106a25e61 django-2.0: Add on_delete on ForeignKeys.
In Django 2.0, one must specify the on_delete behavior for all
ForeignKeys explicitly.
2018-01-30 10:53:54 -08:00
rht d1689b5884 analytics: Use python 3 syntax for typing. 2017-11-17 13:16:49 -08:00
Tim Abbott 2b43a0302a python: Sort imports in smaller apps. 2017-11-15 15:55:49 -08:00
rht b2ad8fd747 py3: Remove all `from __future__ import unicode_literals`.
This was mostly used in migrations, so it's a pretty safe change.
2017-10-17 23:07:42 -07:00
Umair Khan c74f125b7c analytics: Add on_delete in foreign keys.
on_delete will be a required arg for ForeignKey in Django 2.0. Set it
to models.CASCADE on models and in existing migrations if you want to
maintain the current default behavior.
See https://docs.djangoproject.com/en/1.11/ref/models/fields/#django.db.models.ForeignKey.on_delete
2017-06-13 15:13:49 -07:00
Rishi Gupta dfbeab73b5 analytics: Change update_analytics_counts to only use hour boundaries.
Fixes a recent regression where analytics were not being run on hour
boundaries.

Includes a migration that dumps all the analytics data.
2017-04-28 16:15:07 -07:00
hollywoodno dd067c761a analytics: Separate private messages from group private messages.
This makes it possible for our graphs to show the group private
message counts as separate from 1:1 private messages.

Fixes #4102.
2017-03-20 11:46:29 -07:00
Rishi Gupta 87981a2bf1 analytics: Fix direct import of models in migrations. 2017-03-14 16:59:54 -07:00
Rishi Gupta 20255e48a4 analytics: Change messages_sent_to_stream to a daily stat.
Analytics database tables are getting big, and so we're likely moving to a
model where ~all stats are day stats, and we keep hourly stats only for the
last N days.

Also changed the name because:
* messages_sent_* suggests the counts (summed over subgroup) should be the
  same as the other messages_sent stats, but they are different (these don't
  include PMs).
* messages_sent_by_stream:is_bot:day is longer than 32 characters, the max
  allowable length for a BaseCount.property.

Includes a database migration to remove the old stat from the analytics
tables.
2017-03-03 16:11:28 -08:00
Tim Abbott b7df84d5a8 analytics: Add indexes to optimize performance of aggregation.
These indexes fix some slow queries used in updating the analytics
tables, resulting in the analytics system consuming far less total
resources.
2017-02-01 15:47:49 -08:00
Rishi Gupta 68fcb4152f analytics: Remove interval field from *Count tables.
Includes a database migration. The interval field was originally there to
facilitate time aggregation (e.g. aggregate_hour_to_day), but we now do such
aggregations in views code or in the frontend.
2017-01-17 15:54:57 -08:00
umkay a94599fca7 analytics/models.py: Add subgroup column to unique_together constraints. 2016-11-01 16:53:56 -07:00
umkay e92604ab78 analytics: Alter field length for property and interval in BaseCount. 2016-10-27 16:33:58 -07:00
umkay 610e92b94e analytics: Add subgroup column to analytics tables.
This is a major change to the analytics schema, and is the first step in a
number of refactorings and performance improvements. For instance, it allows

* Grouping sets of similar CountStats in the *Count tables. For instance,
  active{_humans,_bots} will now have the same property, but have different
  subgroup values.

* Combining queries that differ only in their value on 1 filter clause, so
  that we make fewer passes through the zerver tables. For instance, instead
  of running a query for each of messages_sent_to_public_streams and
  messages_sent_to_private_streams, we can now run a single query with a
  group by on Stream.invite_only, and store the group by value in the
  subgroup column.
2016-10-27 16:33:58 -07:00
Rishi Gupta 655ee51e35 analytics: Add table to keep track of fill state.
Adds two simplifying assumptions to how we process analytics stats:
* Sets the atomic unit of work to: a stat processed at an hour boundary.
* For any given stat, only allows these atomic units of work to be processed
  in chronological order.

Adds a table FillState that, for each stat, keeps track of the last unit of
work that was processed.
2016-10-14 10:18:37 -07:00
umkay 721529b782 analytics: Remove HuddleCount for now.
Planned changes to the underlying analytics model will require potentially
complicated changes to huddle queries.
2016-10-14 10:18:37 -07:00
umkay 78477ea071 Reorder the columns in analytics tables inherited from BaseCount.
This is primarily implemented through altering the migration file in
order to move the columns, but also we try to make the defaults a
little better for future tables inherited from BaseCount.
2016-10-06 17:51:01 -07:00
umkay d260a22637 Add a new statistics/analytics framework.
This is a first pass at building a framework for collecting various
stats about realms, users, streams, etc. Includes:
* New analytics tables for storing counts data
* Raw SQL queries for pulling data from zerver/models.py tables
* Aggregation functions for aggregating hourly stats into daily stats, and
  aggregating user/stream level stats into realm level stats
* A management command for pulling the data

Note that counts.py was added to the linter exclude list due to errors
around %%s.
2016-10-04 17:18:54 -07:00