zulip

Commit Graph

Author	SHA1	Message	Date
Rishi Gupta	b3991e2557	counts.py: Move CountStat.group_by into ZerverCountQuery. Part of a larger refactoring to reduce cyclic dependencies between CountStat and DataCollector (coming soon).	2017-04-14 11:41:07 -07:00
Rishi Gupta	341e1b54fc	counts.py: Remove zerver_table from ZerverCountQuery. Was only needed for filter_args, which are now gone.	2017-04-14 11:41:07 -07:00
Rishi Gupta	661de6bf25	counts.py: Remove filter_args argument from CountStat definition. It turned out to not be that useful once we added subgroup. The previous design of the CountStat object also assumed more reusability of the _query strings than what ended up happening. The filter_args also had some carrying costs: * It's hard to be confident that filter_args other than the ones explicitly in our tests would have had expected behavior. * The filter_args/join_args system is the most complex part of the CountStat object, and makes understanding the *_query strings unnecessarily difficult for a new contributor.	2017-04-14 11:41:07 -07:00
Rishi Gupta	6bb97db136	analytics: Add active_users_audit:is_bot:day.	2017-04-14 11:41:07 -07:00
Rishi Gupta	3d514c3e8d	analytics: Add a default for the value column in assertTableState. A default value of 1 is reasonable in this framework, especially for testing things like LoggingCountStats.	2017-04-14 11:41:07 -07:00
Rishi Gupta	2f74ccabf9	analytics: Add 15day_actives CountStat.	2017-04-14 11:41:07 -07:00
Rishi Gupta	9b661ca91f	analytics: Replace CountStat.is_gauge with interval. Groundwork for allowing stats like "Monthly Active Users". CountStat.interval is no longer as clean a value as before, so removed it from views.get_chart_data. It wasn't being used by the frontend anyway. Removing interval from logger calls in counts.py is not a big loss since we now include the frequency (which is typically also the interval) in CountStat.property.	2017-04-14 11:41:07 -07:00
Rishi Gupta	d6c5c672d3	analytics: Add minutes_active CountStat.	2017-04-14 11:41:07 -07:00
Rishi Gupta	30024d0a8f	models: Remove Realm.domain.	2017-03-25 19:55:48 -07:00
hollywoodno	dd067c761a	analytics: Separate private messages from group private messages. This makes it possible for our graphs to show the group private message counts as separate from 1:1 private messages. Fixes #4102.	2017-03-20 11:46:29 -07:00
Rishi Gupta	ceac6d9c59	analytics: Remove stray comment from test_counts.py. The "actual test that would be nice to do" was indeed done!	2017-03-17 21:58:51 -07:00
Rishi Gupta	7c6f0033ed	analytics: Add test for do_drop_all_analytics_tables.	2017-03-14 16:59:54 -07:00
Rishi Gupta	35f854a2fd	analytics: Add test for do_aggregate_to_summary_table.	2017-03-04 16:46:09 -08:00
Rishi Gupta	8feea6c598	analytics: Add LoggingCountStat for number of users.	2017-03-04 16:46:09 -08:00
Raghav Jajodia	a3a03bd6a5	mypy: Added Dict, List and Set imports. Fixed mypy errors associated with the upgrade.	2017-03-04 14:33:44 -08:00
Rishi Gupta	8bea47d6b5	analytics: Do a stylistic cleanup of TestProcessCountStat.	2017-03-03 16:12:12 -08:00
Rishi Gupta	20255e48a4	analytics: Change messages_sent_to_stream to a daily stat. Analytics database tables are getting big, and so we're likely moving to a model where ~all stats are day stats, and we keep hourly stats only for the last N days. Also changed the name because: * messages_sent_* suggests the counts (summed over subgroup) should be the same as the other messages_sent stats, but they are different (these don't include PMs). * messages_sent_by_stream:is_bot:day is longer than 32 characters, the max allowable length for a BaseCount.property. Includes a database migration to remove the old stat from the analytics tables.	2017-03-03 16:11:28 -08:00
Rishi Gupta	37bdc7c010	analytics: Remove COUNT_STATS['messages_sent:hour']. Having both messages_sent:hour and messages_sent:is_bot:day is confusing, since a single messages_sent:is_bot:hour would have a superset of the information and take less total space. This commit and its parent together replace the two stats with a single messages_sent:is_bot:hour.	2017-01-17 15:54:57 -08:00
Rishi Gupta	b593ac9d7c	analytics: Change messages_sent:is_bot to hourly frequency. In preparation for replacing messages_sent.	2017-01-17 15:54:57 -08:00
Rishi Gupta	68fcb4152f	analytics: Remove interval field from *Count tables. Includes a database migration. The interval field was originally there to facilitate time aggregation (e.g. aggregate_hour_to_day), but we now do such aggregations in views code or in the frontend.	2017-01-17 15:54:57 -08:00
Rishi Gupta	a8f2ebb443	analytics: Include interval in COUNT_STATS property names.	2017-01-17 15:54:57 -08:00
Rishi Gupta	c466036c80	analytics: Remove unneeded references to interval from test_counts.py.	2017-01-17 15:54:57 -08:00
Rishi Gupta	12d277d4f4	analytics: Change messages_sent:client stat to daily frequency. A few reasons: * Our two other subgroup'd message stats in UserCount are at CountStat.DAY frequency (messages_sent:is_bot and messages_sent:message_type). * Keeping this stat at hourly frequency would likely double the size of our analytics table, given the current stats. (Counterpoint: if there are roughly as many active streams as active users, and we keep messages_sent_to_stream:is_bot at hourly frequency, then maybe this stat is only a 30% or 50% increase). * We're currently only showing this on the frontend as a pie chart anyway.	2017-01-17 15:54:57 -08:00
Rishi Gupta	cdb1c96169	analytics tests: Refactor assertCountEquals calls to be more readable.	2017-01-17 15:54:57 -08:00
Rishi Gupta	59d50c3a47	analytics tests: Make it easy to refer to users in test realm.	2017-01-17 15:54:57 -08:00
Rishi Gupta	54e66e6079	analytics: Add remaining backend tests in TestCountStats.	2017-01-17 15:54:57 -08:00
aakash-cr7	b373f2ef0f	analytics: Add backend test for messages_sent_to_stream:is_bot.	2017-01-17 15:54:57 -08:00
Amy Liu	10c0c2b16d	analytics: Add backend tests for messages_sent:message_type.	2017-01-17 15:54:57 -08:00
Rishi Gupta	f30b174199	analytics: Set property and interval defaults in assertCountEquals.	2017-01-17 15:54:57 -08:00
Rishi Gupta	a563a15f88	analytics: Make TestCountStats tests more robust. Adds two things to TestCountStats.setUp(): * A realm with no messages, that generally should not show up in Count tables. * Users/streams/messages created at 0, 1, 61, and 1441 (just over a day) minutes ago (previously was 0, 60), to better test the start_time/end_time in the queries, and the frequency/interval setting in the CountStats.	2017-01-17 15:54:57 -08:00
Rishi Gupta	e94bc8f142	analytics tests: Autogenerate names for create* functions.	2017-01-17 15:54:57 -08:00
Amy Liu	f7ce76fb63	analytics: Add create_stream_with_recipient and create_huddle_with_recipient. This commit replaces AnalyticsTestCase.create_stream with create_stream_with_recipient and adds the method create_huddle_with_recipient.	2017-01-17 15:54:57 -08:00
Rishi Gupta	552d626ef2	analytics: Fix FillState.last_modified not being updated. We were updating FillState with FillState.objects.filter(..).update(..), which does not update the last_modified field (which has auto_now=True). The correct incantation is the save() method of the actual FillState object.	2017-01-08 23:36:34 -08:00
Rishi Gupta	c7c0e36508	analytics: Add InstallationCount checks to prototype TestCountStat. Was enabled by commit `41e8ee3` where we moved TIME_ZERO to before the realms created by populate_db.py. Also removes the stub for TestAggregates, since the remaining thing to be tested was the aggregation from RealmCount to InstallationCount, and the end to end checks provided by the TestCountStat tests should be sufficient.	2016-12-20 12:03:23 -08:00
Rishi Gupta	dbc94d0fc0	analytics: Remove test for no longer supported behavior. In a previous design, there was no FillState table, and one could run any CountStat at any time. This is no longer supported. This test was making sure that if one ran a CountStat at a certain hour, and then ran it at a previous hour, the old rows would still be there.	2016-12-20 12:03:23 -08:00
Rishi Gupta	e09aaf1020	analytics: Remove tests that will be subsumed by TestCountStats.	2016-12-20 12:03:23 -08:00
Rishi Gupta	6748b72ccc	analytics: Remove tests now covered by test_active_users_by_is_bot.	2016-12-20 12:03:23 -08:00
Rishi Gupta	2211b8b102	analytics: Change count_message_by_stream to join on UserProfile. It seems unlikely we will need count_message_by_stream without the UserProfile table in the future, so write count_message_by_stream_and_is_bot in the usual query form and replace count_message_by_stream with it. This also has the benefit of shortening our list of "special case" queries from two to one. The pathways of the removed test will be covered more thoroughly in the new TestCountStats tests.	2016-12-20 12:03:23 -08:00
Rishi Gupta	6992f9784c	analytics: Update TestCountStat prototype.	2016-12-20 12:03:23 -08:00
Rishi Gupta	c6a6c871ee	analytics: Change TIME_ZERO in tests to be in the past.	2016-12-20 12:03:23 -08:00
Rishi Gupta	f34af0896d	analytics: Add subgroup argument to assertCountEquals.	2016-12-20 12:03:23 -08:00
Rishi Gupta	31cf8db28c	analytics: Allow assertCountEquals to work on InstallationCount.	2016-12-20 12:03:23 -08:00
anirudhjain75	beaa62cafa	mypy: Convert several directories to use typing.Text. Specifically, these directories are converted: [analytics/, scripts/, tools/, zerver/management/, zilencer/, zproject/]	2016-12-07 20:51:05 -08:00
bulat22101	adebc75740	pep8: Fix E502 violations	2016-12-03 10:56:36 -08:00
Umair Khan	7d51efe9a1	Django 1.10: Fix dummy data for count stat. Django 1.10 checks the foreign key constraints as part of the testing suite so we need to create test data which passes validation tests.	2016-11-14 16:09:12 -08:00
umkay	5490442580	analytics: Replace all joins in raw SQL with natural joins. We alter the behavior of our queries to no longer write rows with 0 counts to the db, and pad with 0s in the related views code. As a result we are also able to combine the where and join clause conditions in the sql queries. This new behavior is also updated in our tests.	2016-11-03 16:50:39 -07:00
Rishi Gupta	db0e509422	do_create_realm: Replace domain argument with string_id. Turns string_id into a required argument, and domain into an optional argument.	2016-11-02 22:46:34 -07:00
Rishi Gupta	9ef8536cc6	models.Realm: Require Realm.string_id to be non-NULL. Adds a database migration, adds a new string_id argument to the management realm creation command, and adds a short name field to the web realm creation form when REALMS_HAVE_SUBDOMAINS is False.	2016-11-02 22:46:34 -07:00
umkay	610e92b94e	analytics: Add subgroup column to analytics tables. This is a major change to the analytics schema, and is the first step in a number of refactorings and performance improvements. For instance, it allows * Grouping sets of similar CountStats in the Count tables. For instance, active{_humans,_bots} will now have the same property, but have different subgroup values. * Combining queries that differ only in their value on 1 filter clause, so that we make fewer passes through the zerver tables. For instance, instead of running a query for each of messages_sent_to_public_streams and messages_sent_to_private_streams, we can now run a single query with a group by on Stream.invite_only, and store the group by value in the subgroup column.	2016-10-27 16:33:58 -07:00
Rishi Gupta	82b814a1cd	analytics: Simplify frequency and measurement interval options. Change the CountStat object to take an is_gauge variable instead of a smallest_interval variable. Previously, (smallest_interval, frequency) could be any of (hour, hour), (hour, day), (hour, gauge), (day, hour), (day, day), or (day, gauge). The current change is equivalent to excluding (hour, day) and (day, hour) from the list above. This change, along with other recent changes, allows us to simplify how we handle time intervals. This commit also removes the TimeInterval object.	2016-10-14 10:18:37 -07:00
Rishi Gupta	655ee51e35	analytics: Add table to keep track of fill state. Adds two simplifying assumptions to how we process analytics stats: * Sets the atomic unit of work to: a stat processed at an hour boundary. * For any given stat, only allows these atomic units of work to be processed in chronological order. Adds a table FillState that, for each stat, keeps track of the last unit of work that was processed.	2016-10-14 10:18:37 -07:00
umkay	721529b782	analytics: Remove HuddleCount for now. Planned changes to the underlying analytics model will require potentially complicated changes to huddle queries.	2016-10-14 10:18:37 -07:00
umkay	7e2340155d	analytics: Fix aggregation to RealmCount for realms with no users. Previously, if a Realm had no users (or no streams), do_aggregate_to_summary_table would fail to add a row with value 0. This commit fixes the issue and also simplifies the do_aggregate_to_summary_table logic.	2016-10-11 18:20:58 -07:00
Rishi Gupta	52b56cca65	analytics: Reorder arguments to assertCountEquals. Require a table argument and change argument order around for clarity.	2016-10-11 18:20:58 -07:00
Rishi Gupta	c6b611c8b9	analytics: Re-organize tests into higher level TestClasses. Refactor the current analytics tests into the following classes: * TestUpdateAnalyticsCounts, which will eventually test the management command, backfilling, what happens when new tests are added, etc. * TestProcessCountStat, which tests the ins and outs of propagating the value of a single stat up through the various Count tables. * TestAggregates, which tests the do_aggregate_* methods. * TestXByYQueries, which tests the count_X_by_Y_query SQL snippets. * TestCountStats, which has tests for individual CountStats. This commit does not change the name or contents of any individual test.	2016-10-09 16:09:04 -07:00
Rishi Gupta	795d10b9ad	analytics: Refactor tests to simplify asserts of count values. Many tests are structured to run some process, and then check a count in a BaseCount record using default values for realm, property, interval, and end_time. This commit adds a new assertCountEquals method to AnalyticsTestCase, and simplifies other assert calls as appropriate.	2016-10-09 16:09:04 -07:00
Rishi Gupta	35cf9092d5	analytics: Canonicalize creation of model objects in tests. Add a default_realm object to AnalyticsTestCase, created 2 days before AnalyticsTestCase.TIME_ZERO. Add lightweight create_user, create_stream, and create_message methods to AnalyticsTestCase, with sensible defaults. In particular, all objects are by default created at AnalyticsTestCase.TIME_LAST_HOUR, so that they are included when running AnalyticsTestCase.process_last_hour.	2016-10-09 16:09:04 -07:00
Rishi Gupta	6996b21480	analytics: Change tests to use a fixed TIME_ZERO. Previously, analytics tests used timezone.now or custom datetime objects when creating new realms, users, and streams. This commit adds a fixed TIME_ZERO and a process_last_hour helper function in a new AnalyticsTestCase class, and modifies the existing tests to use them.	2016-10-09 16:09:04 -07:00
umkay	d260a22637	Add a new statistics/analytics framework. This is a first pass at building a framework for collecting various stats about realms, users, streams, etc. Includes: * New analytics tables for storing counts data * Raw SQL queries for pulling data from zerver/models.py tables * Aggregation functions for aggregating hourly stats into daily stats, and aggregating user/stream level stats into realm level stats * A management command for pulling the data. Note that counts.py was added to the linter exclude list due to errors around %%s.	2016-10-04 17:18:54 -07:00


159 Commits