zulip

Commit Graph

Author	SHA1	Message	Date
Alex Vandiver	50c3dd88e6	models: Migrate ids of all non-Message-related tables to bigint. Migrate all `ids` of anything which does not have a foreign key from the Message or UserMessage table (and would thus require walking those) to be `bigint`. This is done by removing explicit `BigAutoField`s, trading them for explicit `AutoField`s on the tables to not be migrated, while updating `DEFAULT_AUTO_FIELD` to the new default. In general, the tables adjusted in this commit are small tables -- at least compared to Messages and UserMessages. Many-to-many tables without their own model class are adjusted by a custom Operation, since they do not automatically pick up migrations when `DEFAULT_AUTO_FIELD` changes[^1]. Note that this does multiple scans over tables to update foreign keys[^2]. Large installs may wish to hand-optimize this using the output of `./manage.py sqlmigrate` to join multiple `ALTER TABLE` statements into one, to speed up the migration. This is unfortunately not possible to do generically, as constraint names may differ between installations. This leaves the following primary keys as non-`bigint`: - `auth_group.id` - `auth_group_permissions.id` - `auth_permission.id` - `django_content_type.id` - `django_migrations.id` - `otp_static_staticdevice.id` - `otp_static_statictoken.id` - `otp_totp_totpdevice.id` - `two_factor_phonedevice.id` - `zerver_archivedmessage.id` - `zerver_client.id` - `zerver_message.id` - `zerver_realm.id` - `zerver_recipient.id` - `zerver_userprofile.id` [^1]: https://code.djangoproject.com/ticket/32674 [^2]: https://code.djangoproject.com/ticket/24203	2024-06-05 11:48:27 -07:00
Alex Vandiver	4f4725f810	analytics: Migrate models' id columns to bigint. This helps prevent wraparound on exceedingly large and old installs, particularly Zulip Cloud. These are relatively simple migrations since they are not referenced by any other tables; however, they are quite large, and are actively used from Django by running servers, making this not a migration which is possible to run without stopping the server. Use the escape hatch in the previous commit to temporarily pause analytics writes while the migration happens. This should make the migration transparent to users, at the small cost of an artificial dip in statistics (specifically, to push notification counts, and unread message counts) while the migration runs.	2024-06-05 11:48:27 -07:00
Alex Vandiver	09e9c75ec6	analytics: Remove `active_users` and `active_users_log` metrics. Both of these are inaccurate, not currently used anywhere, and have been superseded by the `active_users_audit` metric.	2024-06-03 12:35:35 -07:00
Alex Vandiver	0100440a86	analytics: Make active_users_audit into a RealmCount. With `realm_active_humans` no longer dependent on the per-user rows, there is no reason to preserve them -- any measure of "was a user active" should look directly at the much richer RealmAuditLog. This removes the bulk of the UserCount table, since the remaining rows all require user interaction of some sort to produce rows.	2024-06-03 12:35:35 -07:00
Alex Vandiver	a782aae78e	analytics: Regenerate partial indexes due to Django bug. Due to a bug[^1] in Django 4.2, fixed in 4.2.6, queries using `__isnull` added an unnecessary cast. This cast was _also_ used in `WHERE` clauses for partial indexes. This means that partial indexes created before Zulip was using Django 4.2 (i.e. before Zulip Server 7.0 or `2c20028aa4`) will not be used when the server is using Django 4.2.0 through 4.2.5 -- and, conversely, that indexes created while Zulip had those versions of Django (i.e. Zulip Server 7.0 through 7.4 or `7807bff526`) will not be used later. We re-create the indexes, to ensure that users that installed Zulip after Zulip Server 7.0 / `2c20028aa4` and before Zulip Server 7.5 / `7807bff526` have indexes which can be used by current Django. This is useless work for some installations, but most analytics tables are not large enough for this to take significant time. [^1]: https://code.djangoproject.com/ticket/34840	2023-11-16 13:53:04 -08:00
Anders Kaseorg	7e707270f0	models: Convert deprecated index_together option to indexes. index_together is slated for removal in Django 5.1: https://docs.djangoproject.com/en/4.2/internals/deprecation/#deprecation-removed-in-5-1 We set the optional index names to match the previously generated index names to avoid adding new migrations. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2023-07-12 07:12:43 -07:00
Anders Kaseorg	0628c3cac8	migrations: Import BaseDatabaseSchemaEditor from its canonical module. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2023-03-05 14:46:28 -08:00
Anders Kaseorg	df001db1a9	black: Reformat with Black 23. Black 23 enforces some slightly more specific rules about empty line counts and redundant parenthesis removal, but the result is still compatible with Black 22. (This does not actually upgrade our Python environment to Black 23 yet.) Signed-off-by: Anders Kaseorg <anders@zulip.com>	2023-02-02 10:40:13 -08:00
Zixuan James Li	d5517932cd	typing: Use BaseDatabaseSchemaEditor in place of DatabaseSchemaEditor. This is a part of #18777. Signed-off-by: Zixuan James Li <359101898@qq.com>	2022-05-30 14:18:53 -07:00
Anders Kaseorg	6e4c3e41dc	python: Normalize quotes with Black. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
Anders Kaseorg	11741543da	python: Reformat with Black, except quotes. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-02-12 13:11:19 -08:00
arpit551	a68d38cc52	migrations: Upgrade migrations to remove duplicates in all Count tables. This commit upgrades 0015_clear_duplicate_counts migration to remove duplicate count in StreamCount, UserCount, InstallationCount as well. Fixes https://github.com/zulip/docker-zulip/issues/266	2020-07-30 15:18:00 -07:00
Anders Kaseorg	365fe0b3d5	python: Sort imports with isort. Fixes #2665. Regenerated by tabbott with `lint --fix` after a rebase and change in parameters. Note from tabbott: In a few cases, this converts technical debt in the form of unsorted imports into different technical debt in the form of our largest files having very long, ugly import sequences at the start. I expect this change will increase pressure for us to split those files, which isn't a bad thing. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-06-11 16:45:32 -07:00
Tim Abbott	69ae4931c3	migrations: Use django.db.backends.postgresql.schema. This replaces django.db.backends.postgresql_psycopg2, which has been an alias to django.db.backends.postgresql since Django 1.9.	2020-04-26 22:20:24 -07:00
Anders Kaseorg	c734bbd95d	python: Modernize legacy Python 2 syntax with pyupgrade. Generated by `pyupgrade --py3-plus --keep-percent-format` on all our Python code except `zthumbor` and `zulip-ec2-configure-interfaces`, followed by manual indentation fixes. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-04-09 16:43:22 -07:00
Mateusz Mandera	64b85415f5	migrations: Fix unused import error.	2020-03-06 12:17:19 -08:00
arpit551	f299f31340	analytics: Fix missing unique constraint when subgroup is null. Replaced unique_together with UniqueConstraint in models that covered nullable fields, since the database indexes generated by unique_together don't enforce uniqueness where subgroup=None. Added conditional unique indexes to prevent invalid duplicate Count data. Added 0015_clear_duplicate_counts migration to handle existing data that violates the constraints. Also corrected a test case in test_counts.py which didn't clear its state properly and thus was accidentally taking advantage of this database schema bug.	2020-03-06 11:10:04 -08:00
Tim Abbott	9ac3e1099c	analytics: Remove last_modified field from FillState. This field wasn't used for anything, and I think it has very limited use for debugging, since fundamentally, it'll almost always have a value within the hour of the actual timestamp in FillState, and any more fine-grained logging we might want would be available in the analytics job's own logs. The proximal reason to remove it is that apparently Django's model_to_dict doesn't support auto_now fields, and that caused some trouble when working on adding more complete import/export support for analytics data.	2020-01-26 20:38:26 -08:00
Tim Abbott	8e7ce7cc79	python: Sort migrations/management command imports with isort. This is a preparatory commit for using isort for sorting all of our imports, merging changes to files where we can easily review the changes as something we're happy with. These are also files with relatively little active development, which means we don't expect much merge conflict risk from these changes.	2020-01-14 13:07:47 -08:00
Anders Kaseorg	4bd28f7ae6	migrations: Remove unused imports. Signed-off-by: Anders Kaseorg <andersk@mit.edu>	2019-02-02 17:01:04 -08:00
Rishi Gupta	85f7ac8172	analytics: Remove Anomaly model.	2019-02-01 18:48:18 -08:00
Tim Abbott	c679920c01	python: Fix unnecessary uses of str_utils library.	2018-11-27 11:44:09 -08:00
Tim Abbott	f0ef335412	models: Remove unused ModelReprMixin class. It appeared to be used as a base class in various Django migrations, but because it didn't define any model fields, it wasn't actually.	2018-05-15 19:11:22 -07:00
rht	8106a25e61	django-2.0: Add on_delete on ForeignKeys. In Django 2.0, one must specify the on_delete behavior for all ForeignKeys explicitly.	2018-01-30 10:53:54 -08:00
rht	d1689b5884	analytics: Use python 3 syntax for typing.	2017-11-17 13:16:49 -08:00
Tim Abbott	2b43a0302a	python: Sort imports in smaller apps.	2017-11-15 15:55:49 -08:00
rht	b2ad8fd747	py3: Remove all `from __future__ import unicode_literals`. This was mostly used in migrations, so it's a pretty safe change.	2017-10-17 23:07:42 -07:00
Umair Khan	c74f125b7c	analytics: Add on_delete in foreign keys. on_delete will be a required arg for ForeignKey in Django 2.0. Set it to models.CASCADE on models and in existing migrations if you want to maintain the current default behavior. See https://docs.djangoproject.com/en/1.11/ref/models/fields/#django.db.models.ForeignKey.on_delete	2017-06-13 15:13:49 -07:00
Rishi Gupta	dfbeab73b5	analytics: Change update_analytics_counts to only use hour boundaries. Fixes a recent regression where analytics were not being run on hour boundaries. Includes a migration that dumps all the analytics data.	2017-04-28 16:15:07 -07:00
hollywoodno	dd067c761a	analytics: Separate private messages from group private messages. This makes it possible for our graphs to show the group private message counts as separate from 1:1 private messages. Fixes #4102.	2017-03-20 11:46:29 -07:00
Rishi Gupta	87981a2bf1	analytics: Fix direct import of models in migrations.	2017-03-14 16:59:54 -07:00
Rishi Gupta	20255e48a4	analytics: Change messages_sent_to_stream to a daily stat. Analytics database tables are getting big, and so we're likely moving to a model where ~all stats are day stats, and we keep hourly stats only for the last N days. Also changed the name because: * messages_sent_* suggests the counts (summed over subgroup) should be the same as the other messages_sent stats, but they are different (these don't include PMs). * messages_sent_by_stream:is_bot:day is longer than 32 characters, the max allowable length for a BaseCount.property. Includes a database migration to remove the old stat from the analytics tables.	2017-03-03 16:11:28 -08:00
Tim Abbott	b7df84d5a8	analytics: Add indexes to optimize performance of aggregation. These indexes fix some slow queries used in updating the analytics tables, resulting in the analytics system consuming far less total resources.	2017-02-01 15:47:49 -08:00
Rishi Gupta	68fcb4152f	analytics: Remove interval field from *Count tables. Includes a database migration. The interval field was originally there to facilitate time aggregation (e.g. aggregate_hour_to_day), but we now do such aggregations in views code or in the frontend.	2017-01-17 15:54:57 -08:00
umkay	a94599fca7	analytics/models.py: Add subgroup column to unique_together constraints.	2016-11-01 16:53:56 -07:00
umkay	e92604ab78	analytics: Alter field length for property and interval in BaseCount.	2016-10-27 16:33:58 -07:00
umkay	610e92b94e	analytics: Add subgroup column to analytics tables. This is a major change to the analytics schema, and is the first step in a number of refactorings and performance improvements. For instance, it allows * Grouping sets of similar CountStats in the Count tables. For instance, active{_humans,_bots} will now have the same property, but have different subgroup values. * Combining queries that differ only in their value on 1 filter clause, so that we make fewer passes through the zerver tables. For instance, instead of running a query for each of messages_sent_to_public_streams and messages_sent_to_private_streams, we can now run a single query with a group by on Stream.invite_only, and store the group by value in the subgroup column.	2016-10-27 16:33:58 -07:00
Rishi Gupta	655ee51e35	analytics: Add table to keep track of fill state. Adds two simplifying assumptions to how we process analytics stats: * Sets the atomic unit of work to: a stat processed at an hour boundary. * For any given stat, only allows these atomic units of work to be processed in chronological order. Adds a table FillState that, for each stat, keeps track of the last unit of work that was processed.	2016-10-14 10:18:37 -07:00
umkay	721529b782	analytics: Remove HuddleCount for now. Planned changes to the underlying analytics model will require potentially complicated changes to huddle queries.	2016-10-14 10:18:37 -07:00
umkay	78477ea071	Reorder the columns in analytics tables inherited from BaseCount. This is primarily implemented through altering the migration file in order to move the columns, but also we try to make the defaults a little better for future tables inherited from BaseCount.	2016-10-06 17:51:01 -07:00
umkay	d260a22637	Add a new statistics/analytics framework. This is a first pass at building a framework for collecting various stats about realms, users, streams, etc. Includes: * New analytics tables for storing counts data * Raw SQL queries for pulling data from zerver/models.py tables * Aggregation functions for aggregating hourly stats into daily stats, and aggregating user/stream level stats into realm level stats * A management command for pulling the data Note that counts.py was added to the linter exclude list due to errors around %%s.	2016-10-04 17:18:54 -07:00

41 Commits