zulip


Author	SHA1	Message	Date
arpit551	b23a5431cd	analytics: Add realm argument to analytics. This changeset is preparatory work for doing something reasonable with analytics data during the zulip -> zulip data import process (and potentially e.g. slack -> Zulip as well). To support that, we need to make it possible to do our analytics calculations for a single realm. We do this while maintaining backwards compatibility and avoiding massive duplicated code by adding an optional `realm` argument to the entrypoints to the analytics system, especially process_count_stat. More work involving restructuring FillState will be required for this to be actually usable for its intended purpose, but this commit is a nice checkpoint along the way. Tweaked by tabbott to adjust comments and disable InstallationCount updates when a realm argument is specified.	2020-01-23 17:36:13 -08:00
Rishi Gupta	4256ee61cf	billing: Change RealmAuditLog.event_type from str to int. This is a more robust long-term model for storing these data.	2019-10-06 15:55:56 -07:00
Mateusz Mandera	dbe508bb91	models: Migration of Message.pub_date to date_sent, part 2. Fixes #1727. With the server down, apply migrations 0245 and 0246. 0246 will remove the pub_date column, so it's essential that the previous migrations ran correctly to copy data before running this.	2019-10-05 19:01:34 -07:00
Anders Kaseorg	f5197518a9	analytics/zilencer/zproject: Remove unused imports. Signed-off-by: Anders Kaseorg <andersk@mit.edu>	2019-02-02 17:31:45 -08:00
Rishi Gupta	85f7ac8172	analytics: Remove Anomaly model.	2019-02-01 18:48:18 -08:00
Vishnu Ks	4d1a68430a	analytics: Remove unused RealmAuditLog import.	2018-07-10 15:42:26 +05:30
Nikhil Kumar Mishra	26decb4c48	stats: Add 1day_actives::day CountStat to analytics tables.	2018-05-20 10:56:16 -07:00
Aditya Bansal	5adf983c3c	analytics: Change use of typing.Text to str.	2018-05-10 14:19:49 -07:00
Greg Price	b830b446f1	logging: Reduce `create_logger` to new `log_to_file`. The name `create_logger` suggests something much bigger than what this function actually does -- the logger doesn't any more or less exist after the function is called than before. Its one real function is to send logs to a specific file. So, pull out that logic to an appropriately-named function just for it. We already use `logging.getLogger` in a number of places to simply get a logger by name, and the old `create_logger` callsites can do the same.	2017-12-12 17:17:08 -08:00
Greg Price	ebcf0b4876	logging: Stop having `create_logger` force loglevels to INFO. This is already the loglevel we set on the root logger, so this has no effect -- except in tests, where `test_settings.py` attempts to set some of these same loggers to higher loglevels. Because the `create_logger` call generally runs after we've configured settings, it clobbers that effect. The code in `test_settings.py` that tries to suppress logs only works because it also sets `propagate=False`, which has nothing to do with loglevels but does cause logs at this logger (and descendants) to be dropped completely unless we've configured handlers for this logger (or one of its relevant descendants.)	2017-12-12 17:17:07 -08:00
Rishi Gupta	fbd8dde1f8	invitations: Add LoggingCountStat to keep track of sent invitations.	2017-12-06 20:35:50 -08:00
rht	01885cdedc	analytics: Use Python 3 syntax for typing (final).	2017-11-22 12:16:59 -08:00
rht	6c286b5eb6	analytics: Use Python 3 syntax for typing (part 2).	2017-11-22 12:16:58 -08:00
Tim Abbott	a0cfe45150	analytics: Wrap some longer lines.	2017-11-17 13:19:48 -08:00
rht	d1689b5884	analytics: Use python 3 syntax for typing.	2017-11-17 13:16:49 -08:00
Tim Abbott	2b43a0302a	python: Sort imports in smaller apps.	2017-11-15 15:55:49 -08:00
rht	51c1a6dfc9	analytics: Text-wrap long lines exceeding 110. License: Apache-2.0 Signed-off-by: rht <rhtbot@protonmail.com>	2017-11-10 16:22:00 -08:00
rht	b557b02f2f	analytics/lib: Remove unused imports (F401).	2017-11-07 16:37:07 -08:00
rht	ec5120e807	refactor: Remove six.moves.zip import.	2017-11-07 10:46:42 -08:00
rht	5cfffb0e51	analytics: Remove inheritance from object.	2017-11-06 08:53:48 -08:00
rht	dcc831f767	refactor: Replace all __unicode__ method with __str__. Close #6627.	2017-11-02 11:01:47 -07:00
rht	691598a88b	py3: Remove "from six.moves import range". This is no longer required, since in Python 3, this is what the range built-in does.	2017-10-17 23:28:14 -07:00
rht	2f3ae84e5a	py3: Remove all `__future__ import division`.	2017-10-17 23:09:12 -07:00
rht	a603a4f9f5	Remove `from __future__ import absolute_import`. Except in: - docs/writing-bots-guide.md, because bots are supposed to be Python 2 compatible - puppet/zulip_ops/files/zulip-ec2-configure-interfaces, because this script is still on python2.7 - tools/lint - tools/linter_lib - tools/lister.py For the latter two, because they might be yanked away to a separate repo for general use with other FLOSS projects.	2017-10-17 22:59:42 -07:00
Rishi Gupta	c7bdabbda8	analytics: Disallow non-UTC fill times in process_count_stat. No change in behavior, but we aren't supporting non-UTC times in analytics as a whole any more, so might as well change this check as well.	2017-10-05 11:22:06 -07:00
Rishi Gupta	0596c4a810	analytics: Enforce various datetime arguments are in UTC. Sort of a hacky hammer, but * The original design of the analytics system mistakenly attempted to play nicely with non-UTC datetimes. * Timezone errors are really hard to find and debug, and don't jump out that easily when reading code. I don't know of any outstanding errors, but putting a few "assert this timezone is in UTC" around will hopefully reduce the chance that there are any current or future timezone errors. Note that none of these functions are called outside of the analytics code (and tests). This commit also doesn't change any current behavior, assuming a database where all datetimes have been being stored in UTC.	2017-10-05 11:22:06 -07:00
Rishi Gupta	0f31cddf49	analytics: Add management command to clear single stat.	2017-10-05 11:22:06 -07:00
Aditya Bansal	d9c9bfe7f6	logger: Add new create_logger abstraction to simplify logging. This deduplicates a ton of Python logger-creation code to use a single standard implementation, so we can avoid copy-paste problems.	2017-08-27 18:31:53 -07:00
umkay	d9b23b39d3	mypy: Fix strict-optional in analytics.	2017-05-26 15:39:39 -07:00
Aditya Bansal	27b87943af	pep8: Add compliance with rule E261 to counts.py.	2017-05-07 23:21:50 -07:00
Rishi Gupta	61bf445da4	analytics: Restrict fill_to_time to hour boundaries in process_count_stat.	2017-04-28 16:15:07 -07:00
Rishi Gupta	5e49da9285	analytics: Only update daily stats on day boundaries. Previously we would update FillState for daily stats on hourly boundaries as well. This would create two extra queries on the FillState table every hour (for each CountStat), which adds roughly 50ms of extra processing for each CountStat each day, as well as two extra lines each hour in the analytics log. This can be a minor annoyance when backfilling stats.	2017-04-18 11:02:51 -07:00
Rishi Gupta	c5f1398052	analytics: Add section comments in counts.count_stats_. Also reorders the stats a bit.	2017-04-18 11:02:51 -07:00
Rishi Gupta	b335ad2794	models: Add MIN_INTERVAL_LENGTH to UserActivityInterval. Was previously a floating magic number appearing in both zerver/lib/actions.py and analytics/lib/counts.py.	2017-04-18 11:02:51 -07:00
hackerkid	5c8f011d66	Remove unused timezone import.	2017-04-16 12:28:56 -07:00
Rishi Gupta	49bd330304	analytics: Add class DependentCountStat and stat realm_active_humans::day.	2017-04-14 11:41:07 -07:00
Rishi Gupta	1e8d2b984d	counts.py: Rename DataCollector-level operations to be more generic. We're about to use these for DependentCountStats that will run SQL queries on the analytics tables instead of the zerver tables.	2017-04-14 11:41:07 -07:00
Rishi Gupta	47cf1d15ba	counts.py: Move performance logging call out of pull_functions. Makes it less likely someone will write a pull function in the future and forget.	2017-04-14 11:41:07 -07:00
Rishi Gupta	6dff22cbaf	counts.py: Change check for LoggingCountStat to use isinstance. I think this is more pythonic? We could also get rid of LoggingCountStats altogether, since it's now just a special case of CountStat (is_logging == data_collector.pull_function is None). But I think it's nice to keep the distinction since they behave so differently.	2017-04-14 11:41:07 -07:00
Rishi Gupta	b45185562a	counts.py: Fix out of date comments.	2017-04-14 11:41:07 -07:00
Rishi Gupta	ac2cc9e2da	counts.py: Reorganize file into logical sections. No changes to code or behavior.	2017-04-14 11:41:07 -07:00
Rishi Gupta	50868b98a9	counts.py: Change pull_function to take a property instead of a full stat. Removes the circular dependency of CountStat containing a DataCollector, and DataCollector containing a function that takes a CountStat as an argument.	2017-04-14 11:41:07 -07:00
Rishi Gupta	eadfc743c8	counts.py: Remove CustomPullCountStat.	2017-04-14 11:41:07 -07:00
Rishi Gupta	118b44d4f0	counts.py: Change DataCollector to take a pull_function argument. This will allow us to appropriately generalize CountStat to include LoggingCountStat and CustomPullCountStat. It'll also make life easier when we introduce DependentCountStat.	2017-04-14 11:41:07 -07:00
Rishi Gupta	f9e56ad25d	counts.py: Move DataCollector declarations into CountStat declarations. The previous zerver_* names were unwieldy and not very readable. This also puts more of the useful information in one place; in particular, makes it easier to skim a CountStat declaration and see if we're collecting it at a user/stream granularity or a realm granularity.	2017-04-14 11:41:07 -07:00
Rishi Gupta	c20e79ab1f	counts.py: Rename DataCollector.analytics_table to output_table.	2017-04-14 11:41:07 -07:00
Rishi Gupta	6369d23633	counts.py: Rename ZerverCountQuery to DataCollector. Not the final form of DataCollector, but the name change causes a big diff so separating it out.	2017-04-14 11:41:07 -07:00
Rishi Gupta	b3991e2557	counts.py: Move CountStat.group_by into ZerverCountQuery. Part of a larger refactoring to reduce cyclic dependencies between CountStat and DataCollector (coming soon).	2017-04-14 11:41:07 -07:00
Rishi Gupta	341e1b54fc	counts.py: Remove zerver_table from ZerverCountQuery. Was only needed for filter_args, which are now gone.	2017-04-14 11:41:07 -07:00
Rishi Gupta	661de6bf25	counts.py: Remove filter_args argument from CountStat definition. It turned out to not be that useful once we added subgroup. The previous design of the CountStat object also assumed more reusability of the _query strings than what ended up happening. The filter_args also had some carrying costs: * It's hard to be confident that filter_args other than the ones explicitly in our tests would have had expected behavior. * The filter_args/join_args system is the most complex part of the CountStat object, and makes understanding the *_query strings unnecessarily difficult for a new contributor.	2017-04-14 11:41:07 -07:00
Rishi Gupta	4dfadba244	counts.py: Hardcode is_active=true in count_user_by_realm_query. A step towards removing filter_args from the CountStat object.	2017-04-14 11:41:07 -07:00
Rishi Gupta	6bb97db136	analytics: Add active_users_audit:is_bot:day.	2017-04-14 11:41:07 -07:00
Rishi Gupta	cc75d83b74	counts.py: Reorder count_stats_ to put similar stats together.	2017-04-14 11:41:07 -07:00
Rishi Gupta	2f74ccabf9	analytics: Add 15day_actives CountStat.	2017-04-14 11:41:07 -07:00
Rishi Gupta	9b661ca91f	analytics: Replace CountStat.is_gauge with interval. Groundwork for allowing stats like "Monthly Active Users". CountStat.interval is no longer as clean a value as before, so removed it from views.get_chart_data. It wasn't being used by the frontend anyway. Removing interval from logger calls in counts.py is not a big loss since we now include the frequency (which is typically also the interval) in CountStat.property.	2017-04-14 11:41:07 -07:00
Rishi Gupta	d6c5c672d3	analytics: Add minutes_active CountStat.	2017-04-14 11:41:07 -07:00
hollywoodno	dd067c761a	analytics: Separate private messages from group private messages. This makes it possible for our graphs to show the group private message counts as separate from 1:1 private messages. Fixes #4102.	2017-03-20 11:46:29 -07:00
Rishi Gupta	7c6f0033ed	analytics: Add test for do_drop_all_analytics_tables.	2017-03-14 16:59:54 -07:00
Rishi Gupta	87981a2bf1	analytics: Fix direct import of models in migrations.	2017-03-14 16:59:54 -07:00
Rishi Gupta	ebebd04587	analytics: Fix ValueErrors affecting test coverage. Pathways that only catch internal code errors should use AssertionError so that they are not included when computing test coverage.	2017-03-14 16:59:54 -07:00
Rishi Gupta	b18bfe6771	analytics: Standardize format of zerver count queries. count_message_type_by_user_query is in a different format (no WHERE clause) from the rest since I'm having a hard time reasoning about how that would interact with the LEFT JOIN, especially given that there are %(join_args)s.	2017-03-14 16:59:54 -07:00
Rishi Gupta	8feea6c598	analytics: Add LoggingCountStat for number of users.	2017-03-04 16:46:09 -08:00
Raghav Jajodia	a3a03bd6a5	mypy: Added Dict, List and Set imports. Fixed mypy errors associated with the upgrade.	2017-03-04 14:33:44 -08:00
Rishi Gupta	8bea47d6b5	analytics: Do a stylistic cleanup of TestProcessCountStat.	2017-03-03 16:12:12 -08:00
Rishi Gupta	6c784d6321	analytics: Refactor COUNT_STATS declaration to not repeat itself.	2017-03-03 16:11:28 -08:00
Rishi Gupta	20255e48a4	analytics: Change messages_sent_to_stream to a daily stat. Analytics database tables are getting big, and so we're likely moving to a model where ~all stats are day stats, and we keep hourly stats only for the last N days. Also changed the name because: * messages_sent_* suggests the counts (summed over subgroup) should be the same as the other messages_sent stats, but they are different (these don't include PMs). * messages_sent_by_stream:is_bot:day is longer than 32 characters, the max allowable length for a BaseCount.property. Includes a database migration to remove the old stat from the analytics tables.	2017-03-03 16:11:28 -08:00
Rishi Gupta	5eb5fa3f31	analytics: Change time_range to not include current day/hour. Current day/hour will always be 0, since we haven't computed it yet for the CountStat tables.	2017-02-02 10:59:52 -08:00
Tim Abbott	d6e38e2a5c	lint: Clean up E123 PEP-8 rule.	2017-01-23 21:34:26 -08:00
Rishi Gupta	734ca4644c	analytics: Add random_seed argument to generate_time_series_data.	2017-01-17 15:54:57 -08:00
Rishi Gupta	37bdc7c010	analytics: Remove COUNT_STATS['messages_sent:hour']. Having both messages_sent:hour and messages_sent:is_bot:day is confusing, since a single messages_sent:is_bot:hour would have a superset of the information and take less total space. This commit and its parent together replace the two stats with a single messages_sent:is_bot:hour.	2017-01-17 15:54:57 -08:00
Rishi Gupta	b593ac9d7c	analytics: Change messages_sent:is_bot to hourly frequency. In preparation for replacing messages_sent.	2017-01-17 15:54:57 -08:00
Rishi Gupta	68fcb4152f	analytics: Remove interval field from *Count tables. Includes a database migration. The interval field was originally there to facilitate time aggregation (e.g. aggregate_hour_to_day), but we now do such aggregations in views code or in the frontend.	2017-01-17 15:54:57 -08:00
Rishi Gupta	a8f2ebb443	analytics: Include interval in COUNT_STATS property names.	2017-01-17 15:54:57 -08:00
Rishi Gupta	12d277d4f4	analytics: Change messages_sent:client stat to daily frequency. A few reasons: * Our two other subgroup'd message stats in UserCount are at CountStat.DAY frequency (messages_sent:is_bot and messages_sent:message_type). * Keeping this stat at hourly frequency would likely double the size of our analytics table, given the current stats. (Counterpoint: if there are roughly as many active streams as active users, and we keep messages_sent_to_stream:is_bot at hourly frequency, then maybe this stat is only a 30% or 50% increase). * We're currently only showing this on the frontend as a pie chart anyway.	2017-01-17 15:54:57 -08:00
Rishi Gupta	2710a944e8	analytics: Refactor fixture creation to make it more general. Also less verbose, in preparation for adding a bunch more fixtures.	2017-01-17 15:54:57 -08:00
Rishi Gupta	680e7f75e1	analytics: Change generate_time_series_data argument from length to days. Previously, this function seemed ambivalent about whether it was generating a series of abstract data points or a series of data points that would correspond to times. Switch firmly to the latter, so e.g. if the frequency changes, so will the length of the output sequence.	2017-01-17 15:54:57 -08:00
Rishi Gupta	3712fda30d	analytics: Ensure fixture data points are non-negative.	2017-01-17 15:54:57 -08:00
Rishi Gupta	3f2a002c6e	analytics/lib/counts.py: Fix one of the COUNT_STATS definitions. Fixes an error in the definition of COUNT_STATS['messages_sent_to_stream:is_bot']. The CountStat needs a group_by argument since it is supposed to group by UserProfile.is_bot.	2017-01-10 20:41:07 -08:00
Rishi Gupta	977f5b9178	analytics/lib/counts.py: Fix error in count_message_type_by_user_query. This query counts the number of messages each user has sent, subgroup'd by whether the message was a private_message (PM or sent to a huddle), sent to a 'private_stream', or sent to a 'public_stream'. We need to join on zerver_stream to find out whether stream messages were sent to public streams or private streams, but it needs to be a LEFT JOIN rather than a JOIN so that we preserve the messages sent to non-streams.	2017-01-10 20:41:07 -08:00
Rishi Gupta	6374596a77	analytics: Add initial fixture for testing views.	2017-01-10 17:48:07 -08:00
Rishi Gupta	552d626ef2	analytics: Fix FillState.last_modified not being updated. We were updating FillState with FillState.objects.filter(..).update(..), which does not update the last_modified field (which has auto_now=True). The correct incantation is the save() method of the actual FillState object.	2017-01-08 23:36:34 -08:00
Rishi Gupta	190d320afa	analytics: Change CountStat.property from Text to str.	2017-01-08 17:24:51 -08:00
Rishi Gupta	f8962d521d	analytics: Fix uses of 'interval' in arguments and variable names. interval refers to a time interval, and frequency refers to something that semantically means something closer to 'hourly' or 'daily'. Currently, interval can have values 'hour', 'day', or 'gauge', and frequency can only have values 'hour' and 'day'.	2017-01-08 17:24:51 -08:00
Rishi Gupta	f5899dd14b	analytics: Add lib/ function to drop all analytics tables.	2017-01-08 17:24:51 -08:00
Rishi Gupta	73dc904e9c	analytics: Move time_range from views.py to lib/time_utils.py	2017-01-08 17:24:51 -08:00
Rishi Gupta	2211b8b102	analytics: Change count_message_by_stream to join on UserProfile. It seems unlikely we will need count_message_by_stream without the UserProfile table in the future, so write count_message_by_stream_and_is_bot in the usual query form and replace count_message_by_stream with it. This also has the benefit of shortening our list of "special case" queries from two to one. The pathways of the removed test will be covered more thoroughly in the new TestCountStats tests.	2016-12-20 12:03:23 -08:00
Rishi Gupta	6992f9784c	analytics: Update TestCountStat prototype.	2016-12-20 12:03:23 -08:00
Rishi Gupta	93a10a475a	counts.py: Fix count_message_type_by_user_query.	2016-12-15 16:02:12 -08:00
Rishi Gupta	4f3e1b2ece	analytics/lib/counts.py: Fix messages_sent_to_stream:is_bot. Adds a new query.	2016-12-15 16:02:12 -08:00
Rishi Gupta	87b47ec283	analytics: Add __unicode__ method to the CountStat object.	2016-12-15 16:02:12 -08:00
anirudhjain75	beaa62cafa	mypy: Convert several directories to use typing.Text. Specifically, these directories are converted: [analytics/, scripts/, tools/, zerver/management/, zilencer/, zproject/]	2016-12-07 20:51:05 -08:00
nikolay	abc2ff4a06	pep8: Fix many rule E128 violations. [Tweaked by tabbott to adjust some approaches used in wrapping]	2016-12-03 13:33:31 -08:00
bulat22101	adebc75740	pep8: Fix E502 violations	2016-12-03 10:56:36 -08:00
AZtheAsian	1ba150fa85	pep8: Fix E203 violations	2016-12-01 20:37:57 -08:00
Rafid Aslam	c5316b4002	lint: Fix E127 pep8 violations. Fix pep8: E127 continuation line over-indented for visual indent style issue.	2016-12-01 10:23:55 -08:00
umkay	dc8463e09c	analytics: Remove incorrect filter args for stat. The filter args dictionary applies to the X table in a count X by Y query, which in this case is the zerver_message table. This stat had an incorrect set of arguments meant for the zerver_userprofile table.	2016-11-10 12:25:21 -08:00
umkay	e6ac8c3543	analytics: Add extra count stats. Fill in remaining countstats in counts.py for our intended use cases.	2016-11-03 16:50:39 -07:00
umkay	298890d125	analytics: Rename count stats and associated properties. Our current naming convention is getting unwieldy. The subgroup now goes on the right side of the colon.	2016-11-03 16:50:39 -07:00
umkay	5490442580	analytics: Replace all joins in raw SQL with natural joins. We alter the behavior of our queries to no longer write rows with 0 counts to the db, and pad with 0s in the related views code. As a result we are also able to combine the where and join clause conditions in the sql queries. This new behavior is also updated in our tests.	2016-11-03 16:50:39 -07:00
umkay	5e5a0d4db9	analytics: Add user-level count query for messages sent to {PMs, streams}. Adds a count_X_by_Y_query to counts.py, similar in spirit to a count_recipient_by_user query, where we would join on the Message, Recipient, and UserProfile table. Here, we also join on the Stream table in order to distinguish private and public streams, and we merge the counts for PM and Huddle type messages into a single subgroup.	2016-11-01 17:00:43 -07:00
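
Commit 977f5b9178 explains why count_message_type_by_user_query must LEFT JOIN on zerver_stream rather than use a plain JOIN: messages that were not sent to a stream have no matching stream row, so an inner JOIN silently drops them from the counts. The following is a minimal sketch of that difference using Python's built-in sqlite3 module. The two-table schema (`message`, `stream`, and their columns) is a hypothetical, heavily simplified assumption for illustration, not Zulip's real schema; here a NULL `stream_id` stands in for a private message.

```python
import sqlite3

# Hypothetical, simplified schema: a message either targets a stream
# (stream_id set) or is a private message (stream_id NULL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stream (id INTEGER PRIMARY KEY, invite_only BOOLEAN);
CREATE TABLE message (id INTEGER PRIMARY KEY, sender_id INTEGER, stream_id INTEGER);
INSERT INTO stream VALUES (1, 0), (2, 1);
INSERT INTO message VALUES
    (1, 10, 1),    -- public stream
    (2, 10, 2),    -- private stream
    (3, 10, NULL); -- private message, no stream
""")

# Subgroup each message; stream.id IS NULL only happens for unmatched
# rows that a LEFT JOIN keeps.
SUBGROUP = """
    CASE
        WHEN stream.id IS NULL THEN 'private_message'
        WHEN stream.invite_only THEN 'private_stream'
        ELSE 'public_stream'
    END
"""


def count_by_subgroup(join: str) -> dict:
    # join is either "JOIN" (inner) or "LEFT JOIN".
    query = f"""
        SELECT {SUBGROUP} AS subgroup, COUNT(*)
        FROM message {join} stream ON message.stream_id = stream.id
        GROUP BY subgroup
    """
    return dict(conn.execute(query).fetchall())


inner = count_by_subgroup("JOIN")       # loses the private message
left = count_by_subgroup("LEFT JOIN")   # keeps it, under 'private_message'
print(inner)
print(left)
```

Only the LEFT JOIN result includes the `private_message` subgroup, which is the behavior the commit message says the query needs.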

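Commits c7bdabbda8, 0596c4a810 and 61bf445da4 describe asserting that analytics datetimes are in UTC and that fill_to_time falls on an hour boundary, and ebebd04587 prefers AssertionError for pathways that only catch internal code errors. Below is a minimal sketch of such guards using only the standard library. The helper names `verify_utc`, `floor_to_hour` and `check_fill_to_time` are hypothetical; they illustrate the idea and are not claimed to match Zulip's own helpers.

```python
from datetime import datetime, timedelta, timezone


def verify_utc(dt: datetime) -> None:
    # Hypothetical helper: reject naive datetimes and any nonzero UTC offset.
    # Note: a zone that merely happens to sit at +00:00 would also pass.
    if dt.tzinfo is None or dt.utcoffset() != timedelta(0):
        # AssertionError signals an internal programming error, not bad input.
        raise AssertionError(f"Datetime {dt!r} is not in UTC")


def floor_to_hour(dt: datetime) -> datetime:
    # Hypothetical helper: truncate to the start of the containing hour.
    verify_utc(dt)
    return dt.replace(minute=0, second=0, microsecond=0)


def check_fill_to_time(fill_to_time: datetime) -> None:
    # Hypothetical guard: fill times must be UTC and exactly on the hour.
    verify_utc(fill_to_time)
    if floor_to_hour(fill_to_time) != fill_to_time:
        raise AssertionError(f"fill_to_time {fill_to_time} is not on an hour boundary")


# Passes: timezone-aware, UTC, on the hour.
check_fill_to_time(datetime(2017, 10, 5, 18, 0, tzinfo=timezone.utc))
```

Raising AssertionError explicitly (rather than using an `assert` statement) keeps the checks active even when Python runs with `-O`.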

162 Commits