Commit Graph

341 Commits

Author SHA1 Message Date
rht dc5136ed96 analytics: Remove unused optparse import. 2017-09-30 09:22:08 -07:00
rht 4494112862 analytics: Remove absolute_import. 2017-09-27 20:20:07 -07:00
rht 74f8a527e4 analytics: Remove print_function. 2017-09-27 18:05:45 -07:00
Aditya Bansal d9c9bfe7f6 logger: Add new create_logger abstraction to simplify logging.
This deduplicates a ton of Python logger-creation code to use a single
standard implementation, so we can avoid copy-paste problems.
2017-08-27 18:31:53 -07:00
Umair Khan 6d4ba50ceb result.json: Upgrade test_views. 2017-08-17 09:03:35 -07:00
Greg Price 61666a9262 zulip_ops: Delete the long-disused `stats1.zulip.net` config and its dependencies.
This consists of the `zulip_ops::stats` Puppet class, which has apparently
not been used since 2014, and a number of files that I believe were
only used for that.  Also a couple of tiny loose ends in other files.
2017-08-15 17:30:31 -07:00
Tim Abbott e3e2d9093c analytics: Remove references to UserProfile.objects.get. 2017-08-15 12:48:33 -07:00
neiljp (Neil Pilgrim) b782db48e1 mypy: Remove superfluous older 'type: ignore' annotations. 2017-08-08 11:27:51 -07:00
Greg Price c127630dcf Delete some obsolete usage-stats tools.
These are no longer useful, with our spiffy new analytics framework,
and we haven't in fact been using them for some time, while the
`active-user-stats` cron job does cause regular mail from cron.
Just delete them.
2017-07-31 17:06:15 -07:00
Vishnu Ks a99c60ce07 analytics: Add translation tags to stats.html. 2017-07-16 16:16:43 -07:00
Vishnu Ks b0e4cfd480 analytics: Replace get_user_profile_by email in client_activity. 2017-07-14 13:35:43 -07:00
Vishnu Ks e5c5960faa analytics: Remove unused get_user_profile_by_email import. 2017-07-14 13:35:43 -07:00
Vishnu Ks 0a67e00702 analytics: Use example_user in test_views.py. 2017-07-14 13:35:43 -07:00
Tim Abbott 0b60f11cf8 analytics: Display Zulip Electron app as the main desktop app.
This should make the /stats data significantly clearer.
2017-07-07 18:31:47 -07:00
Tim Abbott 31caab4229 analytics: Rename 'new iOS app' to 'Mobile app'. 2017-07-07 18:31:13 -07:00
Umair Khan c74f125b7c analytics: Add on_delete in foreign keys.
on_delete will be a required arg for ForeignKey in Django 2.0. Set it
to models.CASCADE on models and in existing migrations if you want to
maintain the current default behavior.
See https://docs.djangoproject.com/en/1.11/ref/models/fields/#django.db.models.ForeignKey.on_delete
2017-06-13 15:13:49 -07:00
Aditya Bansal 42b0680ab2 pep8: Add compliance with rule E261 populate_analytics_db.py. 2017-05-31 17:07:15 -07:00
Aditya Bansal 30dfb54c65 pep8: Add compliance with rule E261 to analytics/views.py. 2017-05-31 17:07:15 -07:00
Aditya Bansal 061cb9ae44 pep8: Add compliance with rule E261 to analytics/tests/test_views.py. 2017-05-31 17:07:15 -07:00
umkay d9b23b39d3 mypy: Fix strict-optional in analytics. 2017-05-26 15:39:39 -07:00
Christian Hudon c80e6edb4e mypy: Declare models with null=True Optional. 2017-05-23 14:36:40 -07:00
Rishi Gupta ee16abaf3b analytics: Fix sort function in views.sort_client_labels.
sort_client_labels sorts first by total, and then to ensure deterministic
outcomes, sorts (reverse) alphabetically by label.

Fixes regression introduced in 0c0e539.
2017-05-09 11:32:35 -07:00
Aditya Bansal 3f0b22ce31 pep8: Add compliance with rule E261 to test_counts.py. 2017-05-07 23:21:50 -07:00
Aditya Bansal 2ca1f60ac5 pep8: Add compliance with rule E261 to analytics/models.py. 2017-05-07 23:21:50 -07:00
Aditya Bansal 13d9b98c39 pep8: Add compliance with rule E261 to analyze_mit.py. 2017-05-07 23:21:50 -07:00
Aditya Bansal 9e11185fe2 pep8: Add compliance with rule E261 to active_user_stats.py. 2017-05-07 23:21:50 -07:00
Aditya Bansal 27b87943af pep8: Add compliance with rule E261 to counts.py. 2017-05-07 23:21:50 -07:00
Rishi Gupta 92978d6fb2 analytics: Fix --utc argument in update_analytics_counts.py. 2017-04-28 16:15:07 -07:00
Rishi Gupta 73ae2abd4e analytics: Add --verbose option to update_analytics_counts.py. 2017-04-28 16:15:07 -07:00
Rishi Gupta 61bf445da4 analytics: Restrict fill_to_time to hour boundaries in process_count_stat. 2017-04-28 16:15:07 -07:00
Rishi Gupta dfbeab73b5 analytics: Change update_analytics_counts to only use hour boundaries.
Fixes a recent regression where analytics were not being run on hour
boundaries.

Includes a migration that dumps all the analytics data.
2017-04-28 16:15:07 -07:00
Rishi Gupta f595f4f7f2 analytics: Change Number of Users chart to use realm_active_humans::day.
Previously we showed the total number of users with an active account. This
changes it to show only the number of users that have logged in in the past
two weeks.
2017-04-25 18:35:13 -07:00
Maxim Averin 73a1dd63d5 analytics: Refactor legacy 'zulip_internal' decorator.
Rename 'zulip_internal' decorator to 'require_server_admin', add
documentation for 'server_admin', explaining how to give permission
for ./activity page.

Fixes: #1463.
2017-04-22 11:42:02 -07:00
Rishi Gupta 5e49da9285 analytics: Only update daily stats on day boundaries.
Previously we would update FillState for daily stats on hourly boundaries as
well. This would create two extra queries on the FillState table every hour
(for each CountStat), which adds roughly 50ms of extra processing for each
CountStat each day, as well as two extra lines each hour in the analytics
log. This can be a minor annoyance when backfilling stats.
2017-04-18 11:02:51 -07:00
Rishi Gupta c5f1398052 analytics: Add section comments in counts.count_stats_.
Also reorders the stats a bit.
2017-04-18 11:02:51 -07:00
Rishi Gupta b335ad2794 models: Add MIN_INTERVAL_LENGTH to UserActivityInterval.
Was previously a floating magic number appearing in both
zerver/lib/actions.py and analytics/lib/counts.py.
2017-04-18 11:02:51 -07:00
hackerkid 5c8f011d66 Remove unused timezone import. 2017-04-16 12:28:56 -07:00
hackerkid b2504084ab Replace timezone.now with timezone_now. 2017-04-16 12:28:56 -07:00
hackerkid 55c3d12078 Replace timezone.utc with timezone_utc. 2017-04-16 12:28:56 -07:00
Rishi Gupta 49bd330304 analytics: Add class DependentCountStat and stat realm_active_humans::day. 2017-04-14 11:41:07 -07:00
Rishi Gupta 62de1cf898 test_counts: Modernize TestProcessCountStat tests. 2017-04-14 11:41:07 -07:00
Rishi Gupta 1e8d2b984d counts.py: Rename DataCollector-level operations to be more generic.
We're about to use these for DependentCountStats that will run SQL queries
on the analytics tables instead of the zerver tables.
2017-04-14 11:41:07 -07:00
Rishi Gupta 47cf1d15ba counts.py: Move performance logging call out of pull_functions.
Makes it less likely someone will write a pull function in the future and
forget.
2017-04-14 11:41:07 -07:00
Rishi Gupta 6dff22cbaf counts.py: Change check for LoggingCountStat to use isinstance.
I think this is more pythonic?

We could also get rid of LoggingCountStats altogether, since it's now just a
special case of CountStat (is_logging == data_collector.pull_function is None).
But I think it's nice to keep the distinction since they behave so differently.
2017-04-14 11:41:07 -07:00
Rishi Gupta b45185562a counts.py: Fix out of date comments. 2017-04-14 11:41:07 -07:00
Rishi Gupta ac2cc9e2da counts.py: Reorganize file into logical sections.
No changes to code or behavior.
2017-04-14 11:41:07 -07:00
Rishi Gupta 50868b98a9 counts.py: Change pull_function to take a property instead of a full stat.
Removes the circular dependency of CountStat containing a DataCollector, and
DataCollector containing a function that takes a CountStat as an argument.
2017-04-14 11:41:07 -07:00
Rishi Gupta eadfc743c8 counts.py: Remove CustomPullCountStat. 2017-04-14 11:41:07 -07:00
Rishi Gupta 118b44d4f0 counts.py: Change DataCollector to take a pull_function argument.
This will allow us to appropriately generalize CountStat to include
LoggingCountStat and CustomPullCountStat. It'll also make life easier when
we introduce DependentCountStat.
2017-04-14 11:41:07 -07:00
Rishi Gupta f9e56ad25d counts.py: Move DataCollector declarations into CountStat declarations.
The previous zerver_* names were unwieldy and not very readable. This also
puts more of the useful information in one place; in particular, makes it
easier to skim a CountStat declaration and see if we're collecting it at a
user/stream granularity or a realm granularity.
2017-04-14 11:41:07 -07:00
Rishi Gupta c20e79ab1f counts.py: Rename DataCollector.analytics_table to output_table. 2017-04-14 11:41:07 -07:00
Rishi Gupta 6369d23633 counts.py: Rename ZerverCountQuery to DataCollector.
Not the final form of DataCollector, but the name change causes a big diff
so separating it out.
2017-04-14 11:41:07 -07:00
Rishi Gupta b3991e2557 counts.py: Move CountStat.group_by into ZerverCountQuery.
Part of a larger refactoring to reduce cyclic dependencies between CountStat
and DataCollector (coming soon).
2017-04-14 11:41:07 -07:00
Rishi Gupta 341e1b54fc counts.py: Remove zerver_table from ZerverCountQuery.
Was only needed for filter_args, which are now gone.
2017-04-14 11:41:07 -07:00
Rishi Gupta 661de6bf25 counts.py: Remove filter_args argument from CountStat definition.
It turned out to not be that useful once we added subgroup. The previous
design of the CountStat object also assumed more reuseability of the *_query
strings than what ended up happening.

The filter_args also had some carrying costs:

* It's hard to be confident that filter_args other than the ones explicitly
  in our tests would have had expected behavior.
* The filter_args/join_args system is the most complex part of the CountStat
  object, and makes understanding the *_query strings unnecessarily
  difficult for a new contributor.
2017-04-14 11:41:07 -07:00
Rishi Gupta 4dfadba244 counts.py: Hardcode is_active=true in count_user_by_realm_query.
A step towards removing filter_args from the CountStat object.
2017-04-14 11:41:07 -07:00
Rishi Gupta 6bb97db136 analytics: Add active_users_audit:is_bot:day. 2017-04-14 11:41:07 -07:00
Rishi Gupta cc75d83b74 counts.py: Reorder count_stats_ to put similar stats together. 2017-04-14 11:41:07 -07:00
Rishi Gupta 3d514c3e8d analytics: Add a default for the value column in assertTableState.
A default value of 1 is reasonable in this framework, especially for testing
things like LoggingCountStats.
2017-04-14 11:41:07 -07:00
Rishi Gupta 2f74ccabf9 analytics: Add 15day_actives CountStat. 2017-04-14 11:41:07 -07:00
Rishi Gupta 9b661ca91f analytics: Replace CountStat.is_gauge with interval.
Groundwork for allowing stats like "Monthly Active Users".

CountStat.interval is no longer as clean a value as before, so removed it
from views.get_chart_data. It wasn't being used by the frontend anyway.

Removing interval from logger calls in counts.py is not a big loss since we
now include the frequency (which is typically also the interval) in
CountStat.property.
2017-04-14 11:41:07 -07:00
Rishi Gupta d6c5c672d3 analytics: Add minutes_active CountStat. 2017-04-14 11:41:07 -07:00
Rishi Gupta 6e425814bf analytics: Add a few tests for fixtures.py.
The code in fixtures.py is only called from populate_analytics_db, and is
only used for generating pretty fixture data for manual testing. This commit
adds tests for a few things that were easy to add tests for, and provides
some minimal coverage of the file, but is not meant to be comprehensive.
2017-04-13 12:37:47 -07:00
Rishi Gupta b555191ee7 analytics: Change subgroup and labels lists to dictionary in views.py.
A dict is more semantically appropriate than two lists with an assertion
that they be the same length.
2017-04-13 12:37:47 -07:00
Rishi Gupta f3fc9721f4 analytics: Match client names in populate_analytics_db to populate_db.
Originally, all the client names in populate_analytics_db started with
underscores to make it easy to selectively delete and regenerate them when
re-running populate_analytics_db.

We eventually want to merge populate_analytics_db into populate_db though,
in which case it makes more sense for them to share client names, and not
worry about the case where we run (or re-run) populate_analytics_db
independently of populate_db.
2017-04-12 11:45:15 -07:00
Rishi Gupta 30024d0a8f models: Remove Realm.domain. 2017-03-25 19:55:48 -07:00
Rishi Gupta 9f60dd8387 analytics: Send zeros for data.user.bot in Messages Sent Over Time.
It will simplify the logic needed to process the "Sent by Me" view in
Messages Sent Over Time in stats.js.

Also, we gzip the data sent from our server, so there is little additional
network usage by doing this.
2017-03-25 14:18:23 -07:00
Tim Abbott a474f4359d tests: Set maxDiff to None unconditionally. 2017-03-21 07:34:16 -07:00
Tim Abbott 8041ebf579 mypy: Annotate maxDiff variable. 2017-03-21 07:31:37 -07:00
Tim Abbott 20a7609018 analytics: Rename message count types to use standard Zulip casing. 2017-03-21 00:09:54 -07:00
hollywoodno dd067c761a analytics: Separate private messages from group private messages.
This makes it possible for our graphs to show the group private
message counts as separate from 1:1 private messages.

Fixes #4102.
2017-03-20 11:46:29 -07:00
Tim Abbott 0c0e5397c4 analytics: Fix nondeterministic ordering of labels. 2017-03-20 11:39:08 -07:00
Rishi Gupta ceac6d9c59 analytics: Remove stray comment from test_counts.py.
The "actual test that would be nice to do" was indeed done!
2017-03-17 21:58:51 -07:00
Umair Khan 4442703011 jinja2: No need for custom render_to_response.
Django 1.10 has changed the implementation of this function to
match our custom implementation; in addition to this, we prefer
render().

Fixes #1914 via #4093.
2017-03-17 13:57:34 -07:00
Umair Khan 6511f929cc analytics: Change render_to_response to render.
Related to #4093
2017-03-17 13:52:59 -07:00
Rishi Gupta 7c6f0033ed analytics: Add test for do_drop_all_analytics_tables. 2017-03-14 16:59:54 -07:00
Rishi Gupta 87981a2bf1 analytics: Fix direct import of models in migrations. 2017-03-14 16:59:54 -07:00
Rishi Gupta ebebd04587 analytics: Fix ValueErrors affecting test coverage.
Pathways that only catch internal code errors should use AssertionError so
that they are not included when computing test coverage.
2017-03-14 16:59:54 -07:00
Rishi Gupta b18bfe6771 analytics: Standardize format of zerver count queries.
count_message_type_by_user_query is in a different format (no WHERE clause)
from the rest since I'm having a hard time reasoning about how that would
interact with the LEFT JOIN, especially given that there are %(join_args)s.
2017-03-14 16:59:54 -07:00
Rishi Gupta e33ef1c788 analytics/models: Remove extended_id and key_model.
They are unused / were part of a previous design.
2017-03-14 16:59:54 -07:00
Rishi Gupta 35f854a2fd analytics: Add test for do_aggregate_to_summary_table. 2017-03-04 16:46:09 -08:00
Rishi Gupta 8feea6c598 analytics: Add LoggingCountStat for number of users. 2017-03-04 16:46:09 -08:00
Rishi Gupta 51b7677db7 Add RealmAuditLog table and record user activation/deactivation events.
The RealmAuditLog will make it easier for server admins to replay history.
2017-03-04 16:45:44 -08:00
Raghav Jajodia a3a03bd6a5 mypy: Added Dict, List and Set imports.
Fixed mypy errors associated with the upgrade.
2017-03-04 14:33:44 -08:00
Rishi Gupta 1453a5bfda Change string_id of test zephyr realm from mit to zephyr.
Also changes Realm.is_zephyr_mirror_realm to use string_id=zephyr instead of
domain=mit.edu.
Part of a larger migration away from Realm.domain.
2017-03-04 12:18:01 -08:00
Rishi Gupta 8bea47d6b5 analytics: Do a stylistic cleanup of TestProcessCountStat. 2017-03-03 16:12:12 -08:00
Rishi Gupta 6c784d6321 analytics: Refactor COUNT_STATS declaration to not repeat itself. 2017-03-03 16:11:28 -08:00
Rishi Gupta 20255e48a4 analytics: Change messages_sent_to_stream to a daily stat.
Analytics database tables are getting big, and so we're likely moving to a
model where ~all stats are day stats, and we keep hourly stats only for the
last N days.

Also changed the name because:
* messages_sent_* suggests the counts (summed over subgroup) should be the
  same as the other messages_sent stats, but they are different (these don't
  include PMs).
* messages_sent_by_stream:is_bot:day is longer than 32 characters, the max
  allowable length for a BaseCount.property.

Includes a database migration to remove the old stat from the analytics
tables.
2017-03-03 16:11:28 -08:00
Rishi Gupta 4dc791f393 Clean up timestamps.py and add a test. 2017-03-01 23:03:56 -08:00
Rishi Gupta 562bc6429c Replace datetime.now() with timezone.now() in Django ORM queries.
When you pass a naive datetime to the Django ORM, it uses settings.TIME_ZONE
for the time zone. In the development environment, both settings.TIME_ZONE
and datetime.now() use 'America/New_York', so there is no change in behavior
there. (fromtimestamp with no tz argument uses the same timezone as
datetime.now)

We are soon going to change settings.TIME_ZONE to UTC, so need to remove
naive datetimes from queries to the ORM.
2017-03-01 22:54:28 -08:00
Rishi Gupta 01a4615f6e Change datetime.now to timezone.now in active_user_stats_by_day.
This actually fixes previously broken behavior, since 'date' here gets
turned into the 'day' argument of seconds_active_during_day(day), where
tzinfo is set to UTC.
2017-03-01 22:54:28 -08:00
Rishi Gupta 2b2be8120f Change datetime.now(tz=X) to timezone.now().
datetime.now with a timezone set is equivalent to timezone.now() if it's
never being printed out, but the latter is cleaner and more idiomatic.
2017-03-01 22:54:28 -08:00
Rishi Gupta eee5cb5197 analytics: Add tests for views code. 2017-02-11 14:51:01 -08:00
Rishi Gupta d6ce017a58 analytics: Minor cleanup of views.py. 2017-02-11 14:51:01 -08:00
Rishi Gupta 480fc0874b analytics: Break ties deterministically in sort_client_labels. 2017-02-11 14:51:01 -08:00
Tim Abbott f944ac8902 mypy: Fix incorrect annotation for by_used_time. 2017-02-10 23:53:44 -08:00
Rishi Gupta 19d1fc6223 stats: Pass user data to the frontend for messages sent over time. 2017-02-10 14:41:18 -08:00
Rishi Gupta 68a7f91022 stats: Add a fixed display order to summary charts.
API: Adds a "display_order" to the response, which is a suggested order of
importance for the clients or recipient types respectively.

frontend: Changes messages_sent_by_{client,recipient_type} to use a fixed
order for any given user.
2017-02-10 14:41:18 -08:00
Rishi Gupta cf3ae2eafe stats: Turn messages_sent_by_client into a bar chart.
Also includes a number of changes to messages_sent_by_recipient_type that
were convenient to do at the same time, since the two charts share a lot of
code.
2017-02-10 14:41:18 -08:00
Rishi Gupta ce89c64f43 stats.js: Move name_map computation to the backend. 2017-02-10 14:41:18 -08:00