zulip

Commit Graph

Author	SHA1	Message	Date
Rishi Gupta	3712fda30d	analytics: Ensure fixture data points are non-negative.	2017-01-17 15:54:57 -08:00
Rishi Gupta	3f2a002c6e	analytics/lib/counts.py: Fix one of the COUNT_STATS definitions. Fixes an error in the definition of COUNT_STATS['messages_sent_to_stream:is_bot']. The CountStat needs a group_by argument since it is supposed to group by UserProfile.is_bot.	2017-01-10 20:41:07 -08:00
Rishi Gupta	977f5b9178	analytics/lib/counts.py: Fix error in count_message_type_by_user_query. This query counts the number of messages each user has sent, subgroup'd by whether the message was a private_message (PM or sent to a huddle), sent to a 'private_stream', or sent to a 'public_stream'. We need to join on zerver_stream to find out whether stream messages were sent to public streams or private streams, but it needs to be a LEFT JOIN rather than a JOIN so that we preserve the messages sent to non-streams.	2017-01-10 20:41:07 -08:00
Rishi Gupta	6374596a77	analytics: Add initial fixture for testing views.	2017-01-10 17:48:07 -08:00
Rishi Gupta	552d626ef2	analytics: Fix FillState.last_modified not being updated. We were updating FillState with FillState.objects.filter(..).update(..), which does not update the last_modified field (which has auto_now=True). The correct incantation is the save() method of the actual FillState object.	2017-01-08 23:36:34 -08:00
Rishi Gupta	190d320afa	analytics: Change CountStat.property from Text to str.	2017-01-08 17:24:51 -08:00
Rishi Gupta	f8962d521d	analytics: Fix uses of 'interval' in arguments and variable names. interval refers to a time interval, and frequency refers to something that semantically means something closer to 'hourly' or 'daily'. Currently, interval can have values 'hour', 'day', or 'gauge', and frequency can only have values 'hour' and 'day'.	2017-01-08 17:24:51 -08:00
Rishi Gupta	f5899dd14b	analytics: Add lib/ function to drop all analytics tables.	2017-01-08 17:24:51 -08:00
Rishi Gupta	73dc904e9c	analytics: Move time_range from views.py to lib/time_utils.py	2017-01-08 17:24:51 -08:00
Rishi Gupta	2211b8b102	analytics: Change count_message_by_stream to join on UserProfile. It seems unlikely we will need count_message_by_stream without the UserProfile table in the future, so write count_message_by_stream_and_is_bot in the usual query form and replace count_message_by_stream with it. This also has the benefit of shortening our list of "special case" queries from two to one. The pathways of the removed test will be covered more thoroughly in the new TestCountStats tests.	2016-12-20 12:03:23 -08:00
Rishi Gupta	6992f9784c	analytics: Update TestCountStat prototype.	2016-12-20 12:03:23 -08:00
Rishi Gupta	93a10a475a	counts.py: Fix count_message_type_by_user_query.	2016-12-15 16:02:12 -08:00
Rishi Gupta	4f3e1b2ece	analytics/lib/counts.py: Fix messages_sent_to_stream:is_bot. Adds a new query.	2016-12-15 16:02:12 -08:00
Rishi Gupta	87b47ec283	analytics: Add __unicode__ method to the CountStat object.	2016-12-15 16:02:12 -08:00
anirudhjain75	beaa62cafa	mypy: Convert several directories to use typing.Text. Specifically, these directories are converted: [analytics/, scripts/, tools/, zerver/management/, zilencer/, zproject/]	2016-12-07 20:51:05 -08:00
nikolay	abc2ff4a06	pep8: Fix many rule E128 violations. [Tweaked by tabbott to adjust some approaches used in wrapping]	2016-12-03 13:33:31 -08:00
bulat22101	adebc75740	pep8: Fix E502 violations	2016-12-03 10:56:36 -08:00
AZtheAsian	1ba150fa85	pep8: Fix E203 violations	2016-12-01 20:37:57 -08:00
Rafid Aslam	c5316b4002	lint: Fix E127 pep8 violations. Fix pep8: E127 continuation line over-indented for visual indent style issue.	2016-12-01 10:23:55 -08:00
umkay	dc8463e09c	analytics: Remove incorrect filter args for stat. The filter args dictionary applies to the X table in a count X by Y query, which in this case is the zerver_message table. This stat had an incorrect set of arguments meant for the zerver_userprofile table.	2016-11-10 12:25:21 -08:00
umkay	e6ac8c3543	analytics: Add extra count stats. Fill in remaining countstats in counts.py for our intended use cases.	2016-11-03 16:50:39 -07:00
umkay	298890d125	analytics: Rename count stats and associated properties. Our current naming convention is getting unwieldy. The subgroup now goes on the right side of the colon.	2016-11-03 16:50:39 -07:00
umkay	5490442580	analytics: Replace all joins in raw SQL with natural joins. We alter the behavior of our queries to no longer write rows with 0 counts to the db, and pad with 0s in the related views code. As a result we are also able to combine the where and join clause conditions in the sql queries. This new behavior is also updated in our tests.	2016-11-03 16:50:39 -07:00
umkay	5e5a0d4db9	analytics: Add user-level count query for messages sent to {PMs, streams}. Adds a count_X_by_Y_query to counts.py, similar in spirit to a count_recipient_by_user query, where we would join on the Message, Recipient, and UserProfile table. Here, we also join on the Stream table in order to distinguish private and public streams, and we merge the counts for PM and Huddle type messages into a single subgroup.	2016-11-01 17:00:43 -07:00
umkay	610e92b94e	analytics: Add subgroup column to analytics tables. This is a major change to the analytics schema, and is the first step in a number of refactorings and performance improvements. For instance, it allows * Grouping sets of similar CountStats in the Count tables. For instance, active{_humans,_bots} will now have the same property, but have different subgroup values. * Combining queries that differ only in their value on 1 filter clause, so that we make fewer passes through the zerver tables. For instance, instead of running a query for each of messages_sent_to_public_streams and messages_sent_to_private_streams, we can now run a single query with a group by on Stream.invite_only, and store the group by value in the subgroup column.	2016-10-27 16:33:58 -07:00
Rishi Gupta	54016e1096	analytics: Remove outdated comment in counts.py.	2016-10-25 13:42:55 -07:00
umkay	87d22c9e4d	analytics: Fix count_stream_by_realm. Add a join clause on zerver_message in count_stream_by_realm, otherwise we only output the final total streamcount for a realm for every time entry.	2016-10-22 19:10:36 -07:00
umkay	906a4e3b26	analytics: Add performance and transaction logging to counts.py. For each database query made by an analytics function, log time spent and the number of rows changed to var/logs/analytics.log. In the spirit of write ahead logging, for each (stat, end_time) update, log the start and end of the "transaction", as well as time spent.	2016-10-17 16:10:03 -07:00
Rishi Gupta	82b814a1cd	analytics: Simplify frequency and measurement interval options. Change the CountStat object to take an is_gauge variable instead of a smallest_interval variable. Previously, (smallest_interval, frequency) could be any of (hour, hour), (hour, day), (hour, gauge), (day, hour), (day, day), or (day, gauge). The current change is equivalent to excluding (hour, day) and (day, hour) from the list above. This change, along with other recent changes, allows us to simplify how we handle time intervals. This commit also removes the TimeInterval object.	2016-10-14 10:18:37 -07:00
Rishi Gupta	807520411b	analytics: Simplify logic in do_fill_count_stat_at_hour. Adding FillState, removing do_aggregate_hour_to_day, and disallowing unused (interval, frequency) pairs removes the need for the nested for loops in do_fill_count_stat_at_hour. This commit replaces that control flow with a simpler equivalent.	2016-10-14 10:18:37 -07:00
Rishi Gupta	27d1360e1d	analytics: Remove do_aggregate_hour_to_day. The functionality provided is more naturally done in the views code. It also allows us to aggregate using day boundaries from the local timezone, rather than UTC.	2016-10-14 10:18:37 -07:00
Rishi Gupta	655ee51e35	analytics: Add table to keep track of fill state. Adds two simplifying assumptions to how we process analytics stats: * Sets the atomic unit of work to: a stat processed at an hour boundary. * For any given stat, only allows these atomic units of work to be processed in chronological order. Adds a table FillState that, for each stat, keeps track of the last unit of work that was processed.	2016-10-14 10:18:37 -07:00
umkay	721529b782	analytics: Remove HuddleCount for now. Planned changes to the underlying analytics model will require potentially complicated changes to huddle queries.	2016-10-14 10:18:37 -07:00
umkay	7e2340155d	analytics: Fix aggregation to RealmCount for realms with no users. Previously, if a Realm had no users (or no streams), do_aggregate_to_summary_table would fail to add a row with value 0. This commit fixes the issue and also simplifies the do_aggregate_to_summary_table logic.	2016-10-11 18:20:58 -07:00
umkay	01324f2afe	Fix aggregation to analytics summary tables. There are a number of different stats that need to be propagated from UserCount and StreamCount to RealmCount, and from RealmCount to InstallationCount. Stats with hour intervals also need to have their day values propagated. This commit fixes a bug in the summary table aggregation logic so that for a given interval on a CountStat object we pull the correct counts for the interval as well as do the day aggregation if required. We also ensure that any aggregation then done from the realmcount table to the installationcount table follows the same aggregation logic for intervals.	2016-10-06 08:46:33 -07:00
umkay	d260a22637	Add a new statistics/analytics framework. This is a first pass at building a framework for collecting various stats about realms, users, streams, etc. Includes: * New analytics tables for storing counts data * Raw SQL queries for pulling data from zerver/models.py tables * Aggregation functions for aggregating hourly stats into daily stats, and aggregating user/stream level stats into realm level stats * A management command for pulling the data Note that counts.py was added to the linter exclude list due to errors around %%s.	2016-10-04 17:18:54 -07:00
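
The message for commit 977f5b9178 explains why the join on zerver_stream has to be a LEFT JOIN: a plain JOIN drops every message that was not sent to a stream. The following is a minimal sketch of that effect, not the actual query from analytics/lib/counts.py. The two tables, their columns, and the helper `count_by_type` are hypothetical simplifications (the real schema routes messages through a recipient table), and SQLite stands in for the production database.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for zerver_stream and
# zerver_message. A NULL stream_id marks a message not sent to a stream.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stream (id INTEGER PRIMARY KEY, invite_only INTEGER);
    CREATE TABLE message (id INTEGER PRIMARY KEY, sender_id INTEGER, stream_id INTEGER);
    INSERT INTO stream VALUES (1, 0), (2, 1);
    INSERT INTO message VALUES (1, 10, 1), (2, 10, 2), (3, 10, NULL), (4, 10, NULL);
""")

QUERY = """
    SELECT CASE
               WHEN s.id IS NULL THEN 'private_message'
               WHEN s.invite_only = 1 THEN 'private_stream'
               ELSE 'public_stream'
           END AS message_type,
           COUNT(*) AS n
    FROM message m
    {join} stream s ON m.stream_id = s.id
    GROUP BY message_type
    ORDER BY message_type
"""


def count_by_type(join):
    """Run the count query with the given join keyword and return {type: count}."""
    return dict(conn.execute(QUERY.format(join=join)).fetchall())


# The inner join silently loses the two non-stream messages.
inner_counts = count_by_type("JOIN")
# The left join keeps them, with NULL stream columns, so they can be labelled.
left_counts = count_by_type("LEFT JOIN")

print(inner_counts)
print(left_counts)
```

With the inner join, the 'private_message' subgroup never appears in the result; with the left join it is counted alongside the two stream subgroups.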

136 Commits