Commit Graph

352 Commits

Author SHA1 Message Date
Rishi Gupta 562bc6429c Replace datetime.now() with timezone.now() in Django ORM queries.
When you pass a naive datetime to the Django ORM, it uses settings.TIME_ZONE
for the time zone. In the development environment, both settings.TIME_ZONE
and datetime.now() use 'America/New_York', so there is no change in behavior
there. (fromtimestamp with no tz argument uses the same timezone as
datetime.now)

We are soon going to change settings.TIME_ZONE to UTC, so need to remove
naive datetimes from queries to the ORM.
2017-03-01 22:54:28 -08:00
Rishi Gupta 01a4615f6e Change datetime.now to timezone.now in active_user_stats_by_day.
This actually fixes previously broken behavior, since 'date' here gets
turned into the 'day' argument of seconds_active_during_day(day), where
tzinfo is set to UTC.
2017-03-01 22:54:28 -08:00
Rishi Gupta 2b2be8120f Change datetime.now(tz=X) to timezone.now().
datetime.now with a timezone set is equivalent to timezone.now() if it's
never being printed out, but the latter is cleaner and more idiomatic.
2017-03-01 22:54:28 -08:00
Rishi Gupta eee5cb5197 analytics: Add tests for views code. 2017-02-11 14:51:01 -08:00
Rishi Gupta d6ce017a58 analytics: Minor cleanup of views.py. 2017-02-11 14:51:01 -08:00
Rishi Gupta 480fc0874b analytics: Break ties deterministically in sort_client_labels. 2017-02-11 14:51:01 -08:00
Tim Abbott f944ac8902 mypy: Fix incorrect annotation for by_used_time. 2017-02-10 23:53:44 -08:00
Rishi Gupta 19d1fc6223 stats: Pass user data to the frontend for messages sent over time. 2017-02-10 14:41:18 -08:00
Rishi Gupta 68a7f91022 stats: Add a fixed display order to summary charts.
API: Adds a "display_order" to the response, which is a suggested order of
importance for the clients or recipient types respectively.

frontend: Changes messages_sent_by_{client,recipient_type} to use a fixed
order for any given user.
2017-02-10 14:41:18 -08:00
Rishi Gupta cf3ae2eafe stats: Turn messages_sent_by_client into a bar chart.
Also includes a number of changes to messages_sent_by_recipient_type that
were convenient to do at the same time, since the two charts share a lot of
code.
2017-02-10 14:41:18 -08:00
Rishi Gupta ce89c64f43 stats.js: Move name_map computation to the backend. 2017-02-10 14:41:18 -08:00
Rishi Gupta a1b1ffe1e4 analytics: Base default views end_time on FillState, not current time. 2017-02-10 14:41:07 -08:00
Rishi Gupta 6ab31d1bac analytics: Move time computation to later in get_chart_data. 2017-02-10 14:40:14 -08:00
Tim Abbott ec52322ae1 stats: Include Zulip and realm name in heading. 2017-02-07 11:22:57 -08:00
Tim Abbott 6c4eaf3d14 analytics: Map client names to user-facing versions.
This makes the pie charts on /stats more readable.
2017-02-05 22:19:10 -08:00
Tim Abbott 161522e04c analytics: Add comment explaining server admin routes. 2017-02-02 16:23:10 -08:00
Rishi Gupta 5eb5fa3f31 analytics: Change time_range to not include current day/hour.
Current day/hour will always be 0, since we haven't computed it yet for the
CountStat tables.
2017-02-02 10:59:52 -08:00
Tim Abbott e8b0880320 analytics: Log updates to analytics counts. 2017-02-01 17:02:46 -08:00
umkay 76f3d02590 analytics: Add cron job to run analytics jobs.
This adds a cron job to update the Zulip analytics counts, complete
with locking etc.

Substantially tweaked by tabbott.
2017-02-01 17:02:46 -08:00
Tim Abbott b7df84d5a8 analytics: Add indexes to optimize performance of aggregation.
These indexes fix some slow queries used in updating the analytics
tables, resulting in the analytics system consuming far less total
resources.
2017-02-01 15:47:49 -08:00
Amy Liu 0a39e354dc analytics: Add graphs of usage statistics on /stats.
This adds a frontend for the analytics system we've had for a few
months, showing several graphs of the data in Zulip.

There's a ton more that we can do with this tooling, but this initial
version is enough to provide users with a pretty good experience.

Fixes #2052.
2017-01-31 22:18:54 -08:00
Tim Abbott 4e171ce787 lint: Clean up E126 PEP-8 rule. 2017-01-23 22:06:13 -08:00
Tim Abbott d6e38e2a5c lint: Clean up E123 PEP-8 rule. 2017-01-23 21:34:26 -08:00
Tim Abbott 9cc83f87fc lint: Clean up E241 PEP-8 rule. 2017-01-23 21:21:14 -08:00
Tim Abbott 9640a9e864 lint: Clean up E712 PEP-8 rule. 2017-01-23 21:11:18 -08:00
Rishi Gupta 29799d93c6 analytics/views.py: Always return time series data for stats.
Makes a number of simplications to the analytics views code. The main one is
that we now return the entire data series, even if the data is eventually
going to go into a pie chart. This was prompted by us wanting several
different pie charts for each stat (one for last 30 days, one for all time,
etc), but I think it is also a more natural API. The total amount of data
being sent for the pie charts now is maybe half of what is being sent for
our single 'hourly' stat, or maybe up to 10,000 ints per year the
organization has been around.

The other big change is that the data being sent back is now always explicit
about whether it is data about the realm (stored in data['realm'], or data
about the user (stored in data['user']).
2017-01-19 17:44:17 -08:00
Rishi Gupta 734ca4644c analytics: Add random_seed argument to generate_time_series_data. 2017-01-17 15:54:57 -08:00
Rishi Gupta 37bdc7c010 analytics: Remove COUNT_STATS['messages_sent:hour'].
Having both messages_sent:hour and messages_sent:is_bot:day is confusing,
since a single messages_sent:is_bot:hour would have a superset of the
information and take less total space. This commit and its parent together
replace the two stats with a single messages_sent:is_bot:hour.
2017-01-17 15:54:57 -08:00
Rishi Gupta b593ac9d7c analytics: Change messages_sent:is_bot to hourly frequency.
In preparation for replacing messages_sent.
2017-01-17 15:54:57 -08:00
Rishi Gupta 68fcb4152f analytics: Remove interval field from *Count tables.
Includes a database migration. The interval field was originally there to
facilitate time aggregation (e.g. aggregate_hour_to_day), but we now do such
aggregations in views code or in the frontend.
2017-01-17 15:54:57 -08:00
Rishi Gupta a8f2ebb443 analytics: Include interval in COUNT_STATS property names. 2017-01-17 15:54:57 -08:00
Rishi Gupta c466036c80 analytics: Remove unneeded references to interval from test_counts.py. 2017-01-17 15:54:57 -08:00
Rishi Gupta 12d277d4f4 analytics: Change messages_sent:client stat to daily frequency.
A few reasons:
* Our two other subgroup'd message stats in UserCount are at CountStat.DAY
  frequency (messages_sent:is_bot and messages_sent:message_type).
* Keeping this stat at hourly frequency would likely double the size of our
  analytics table, given the current stats. (Counterpoint: if there are
  roughly as many active streams as active users, and we keep
  messages_sent_to_stream:is_bot at hourly frequency, then maybe this stat
  is only a 30% or 50% increase).
* We're currently only showing this on the frontend as a pie chart anyway.
2017-01-17 15:54:57 -08:00
Rishi Gupta 690002aef8 analytics: Add fixtures for several CountStats. 2017-01-17 15:54:57 -08:00
Rishi Gupta 2710a944e8 analytics: Refactor fixture creation to make it more general.
Also less verbose, in preparation for adding a bunch more fixtures.
2017-01-17 15:54:57 -08:00
Rishi Gupta 1f4a4e5e26 analytics: Force --clear-existing-data option in populate_analytics_db.
Makes more sense for a fixture generating script to just clear the existing
data every time.
2017-01-17 15:54:57 -08:00
Rishi Gupta 680e7f75e1 analytics: Change generate_time_series_data argument from length to days.
Previously, this function seemed ambivalent about whether it was generating
a series of abstract data points or a series of data points that would
correspond to times. Switch firmly to the latter, so e.g. if the frequency
changes, so will the length of the output sequence.
2017-01-17 15:54:57 -08:00
Rishi Gupta 3712fda30d analytics: Ensure fixture data points are non-negative. 2017-01-17 15:54:57 -08:00
Rishi Gupta ecfc336a15 analytics: Add views for remaining /stats graphs. 2017-01-17 15:54:57 -08:00
Rishi Gupta 73c0c4c52e analytics/views.py: Increase efficiency of get_time_series_by_subgroup.
Not sure if this would actually be a performance problem in practice, but
this was originally making a database query for each subgroup (instead of
just a single query getting data for all the subgroups).

Also removed the filter against the interval column, which will soon not be
needed (interval will be uniquely determined by the property).
2017-01-17 15:54:57 -08:00
Rishi Gupta d873902755 analytics/views.py: Refactor get_messages_sent_by_humans_and_bots.
Refactor out the reusable parts, since we're about to add several more
views.
2017-01-17 15:54:57 -08:00
Rishi Gupta 3a72b5cda9 analytics: Rename messages_sent_to_realm.
Several additional stats in the pipeline that also relate to messages sent
to the realm.
2017-01-17 15:54:57 -08:00
Rishi Gupta cdb1c96169 analytics tests: Refactor assertCountEquals calls to be more readable. 2017-01-17 15:54:57 -08:00
Rishi Gupta 59d50c3a47 analytics tests: Make it easy to refer to users in test realm. 2017-01-17 15:54:57 -08:00
Rishi Gupta 54e66e6079 analytics: Add remaining backend tests in TestCountStats. 2017-01-17 15:54:57 -08:00
aakash-cr7 b373f2ef0f analytics: Add backend test for messages_sent_to_stream:is_bot. 2017-01-17 15:54:57 -08:00
Amy Liu 10c0c2b16d analytics: Add backend tests for messages_sent:message_type. 2017-01-17 15:54:57 -08:00
Rishi Gupta f30b174199 analytics: Set property and interval defaults in assertCountEquals. 2017-01-17 15:54:57 -08:00
Rishi Gupta a563a15f88 analytics: Make TestCountStats tests more robust.
Adds two things to TestCountStats.setUp():
* A realm with no messages, that generally should not show up in *Count
  tables,
* Users/streams/messages created at 0, 1, 61, and 1441 (just over a day)
  minutes ago (previously was 0, 60), to better test the start_time/end_time
  in the queries, and the frequency/interval setting in the CountStats.
2017-01-17 15:54:57 -08:00
Rishi Gupta e94bc8f142 analytics tests: Autogenerate names for create* functions. 2017-01-17 15:54:57 -08:00
Amy Liu f7ce76fb63 analytics: Add create_stream_with_recipient and create_huddle_with_recipient.
This commit replaces AnalyticsTestCase.create_stream with create_stream_with_recipient and adds the method create_huddle_with_recipient.
2017-01-17 15:54:57 -08:00
Rishi Gupta f375caed46 /activity: Fix URL route for analytics.views.get_realm_activity.
analytics.views.get_realm_activity was taking a 'realm_str', but the URL
route was expecting a 'realm'. Changed the URL route to take a 'realm_str'.
2017-01-12 15:21:06 -08:00
Rishi Gupta 3f2a002c6e analytics/lib/counts.py: Fix one of the COUNT_STATS definitions.
Fixes an error in the definition of
COUNT_STATS['messages_sent_to_stream:is_bot']. The CountStat needs a
group_by argument since it is supposed to group by UserProfile.is_bot.
2017-01-10 20:41:07 -08:00
Rishi Gupta 977f5b9178 analytics/lib/counts.py: Fix error in count_message_type_by_user_query.
This query counts the number of messages each user has sent, subgroup'd by
whether the message was a private_message (PM or sent to a huddle), sent to
a 'private_stream', or sent to a 'public_stream'.

We need to join on zerver_stream to find out whether stream messages were
sent to public streams or private streams, but it needs to be a LEFT JOIN
rather than a JOIN so that we preserve the messages sent to non-streams.
2017-01-10 20:41:07 -08:00
Rishi Gupta 6374596a77 analytics: Add initial fixture for testing views. 2017-01-10 17:48:07 -08:00
Tim Abbott 3f8d4193da lint: Fix % comprehensions being used without a tuple. 2017-01-09 11:45:11 -08:00
Rishi Gupta ac29928d91 Remove domain from analytics management commands. 2017-01-09 11:26:08 -08:00
Rishi Gupta e14f575979 Remove domain from analytics/views.py. 2017-01-09 11:26:08 -08:00
Rishi Gupta 552d626ef2 analytics: Fix FillState.last_modified not being updated.
We were updating FillState with FillState.objects.filter(..).update(..),
which does not update the last_modified field (which has auto_now=True).
The correct incantation is the save() method of the actual FillState
object.
2017-01-08 23:36:34 -08:00
Rishi Gupta 190d320afa analytics: Change CountStat.property from Text to str. 2017-01-08 17:24:51 -08:00
Rishi Gupta a07757c127 analytics/views: Fix query in get_messages_sent_to_realm. 2017-01-08 17:24:51 -08:00
Rishi Gupta f8962d521d analytics: Fix uses of 'interval' in arguments and variable names.
interval refers to a time interval, and frequency refers to something that
semantically means something closer to 'hourly' or 'daily'.

Currently, interval can have values 'hour', 'day', or 'gauge', and frequency
can only have values 'hour' and 'day'.
2017-01-08 17:24:51 -08:00
Rishi Gupta f5899dd14b analytics: Add lib/ function to drop all analytics tables. 2017-01-08 17:24:51 -08:00
Rishi Gupta 73dc904e9c analytics: Move time_range from views.py to lib/time_utils.py 2017-01-08 17:24:51 -08:00
Tommy Ip 28abfca565 analytics: Fix bare except clause. 2017-01-08 16:25:22 -08:00
Rishi Gupta 2b0a7fd0ba Rename models.get_realm_by_string_id to get_realm.
Finishes the refactoring started in c1bbd8d. The goal of the refactoring is
to change the argument to get_realm from a Realm.domain to a
Realm.string_id. The steps were

* Add a new function, get_realm_by_string_id.

* Change all calls to get_realm to use get_realm_by_string_id instead.

* Remove get_realm.

* (This commit) Rename get_realm_by_string_id to get_realm.

Part of a larger migration to remove the Realm.domain field entirely.
2017-01-04 17:12:23 -08:00
Rishi Gupta 605361ec86 makemessages: Fix string with unnamed arguments in analytics/views.py. 2016-12-30 16:52:24 -08:00
Rishi Gupta 9e5325a164 Add /stats page with basic stats graph.
Adds a new url route and a new json endpoint.
2016-12-29 14:20:13 -08:00
Rishi Gupta 31efe858ef Clean up imports in analytics/views.py. 2016-12-29 14:20:13 -08:00
Rishi Gupta 717afcb408 Remove calls to get_realm in preparation for its deprecation.
Also removes two calls to email_to_domain.
2016-12-26 17:53:32 -08:00
Rishi Gupta c7c0e36508 analytics: Add InstallationCount checks to prototype TestCountStat.
Was enabled by commit 41e8ee3 where we moved TIME_ZERO to before the realms
created by populate_db.py.

Also removes the stub for TestAggregates, since the remaining thing to be
tested was the aggregation from RealmCount to InstallationCount, and the end
to end checks provided by the TestCountStat tests should be sufficient.
2016-12-20 12:03:23 -08:00
Rishi Gupta dbc94d0fc0 analytics: Remove test for no longer supported behavior.
In a previous design, there was no FillState table, and one could run any
CountStat at any time. This is no longer supported.

This test was making sure that if one ran a CountStat at a certain hour, and
then ran it at a previous hour, the old rows would still be there.
2016-12-20 12:03:23 -08:00
Rishi Gupta e09aaf1020 analytics: Remove tests that will be subsumed by TestCountStats. 2016-12-20 12:03:23 -08:00
Rishi Gupta 6748b72ccc analytics: Remove tests now covered by test_active_users_by_is_bot. 2016-12-20 12:03:23 -08:00
Rishi Gupta 2211b8b102 analytics: Change count_message_by_stream to join on UserProfile.
It seems unlikely we will need count_message_by_stream without the
UserProfile table in the future, so write count_message_by_stream_and_is_bot
in the usual query form and replace count_message_by_stream with it.
This also has the benefit of shortening our list of "special case" queries
from two to one.

The pathways of the removed test will be covered more thoroughly in the new
TestCountStats tests.
2016-12-20 12:03:23 -08:00
Rishi Gupta 6992f9784c analytics: Update TestCountStat prototype. 2016-12-20 12:03:23 -08:00
Rishi Gupta c6a6c871ee analytics: Change TIME_ZERO in tests to be in the past. 2016-12-20 12:03:23 -08:00
Rishi Gupta d95fb33d8d analytics: Add subgroups to unicode representations in models.py. 2016-12-20 12:03:23 -08:00
Rishi Gupta f34af0896d analytics: Add subgroup argument to assertCountEquals. 2016-12-20 12:03:23 -08:00
Rishi Gupta 31cf8db28c analytics: Allow assertCountEquals to work on InstallationCount. 2016-12-20 12:03:23 -08:00
Rishi Gupta 93a10a475a counts.py: Fix count_message_type_by_user_query. 2016-12-15 16:02:12 -08:00
Rishi Gupta 4f3e1b2ece analytics/lib/counts.py: Fix messages_sent_to_stream:is_bot.
Adds a new query.
2016-12-15 16:02:12 -08:00
Rishi Gupta 87b47ec283 analytics: Add __unicode__ method to the CountStat object. 2016-12-15 16:02:12 -08:00
reyha 82e32ad255 Access realm by `string_id` in management commands.
`Realm.string_id` replaces 'Realm.domain'
in the management commands.

Fixes #2325.
2016-12-14 10:38:03 -08:00
anirudhjain75 beaa62cafa mypy: Convert several directories to use typing.Text.
Specifically, these directories are converted: [analytics/, scripts/,
tools/, zerver/management/, zilencer/, zproject/]
2016-12-07 20:51:05 -08:00
nikolay abc2ff4a06 pep8: Fix many rule E128 violations.
[Tweaked by tabbott to adjust some approaches used in wrapping]
2016-12-03 13:33:31 -08:00
bulat22101 adebc75740 pep8: Fix E502 violations 2016-12-03 10:56:36 -08:00
Sidhant Bhavnani 8c0c12c1d9 pep8: Fix E303 violations. 2016-12-02 15:34:11 -08:00
AZtheAsian 1ba150fa85 pep8: Fix E203 violations 2016-12-01 20:37:57 -08:00
Rafid Aslam c5316b4002 lint: Fix E127 pep8 violations.
Fix pep8: E127 continuation line over-indented for visual indent
style issue.
2016-12-01 10:23:55 -08:00
Rafid Aslam 41bd88d5ed pep8: Fix E301 pep8 violations.
Fix "E301: expected (1 or 2) blank line" pep8 violations.
2016-11-29 08:51:44 -08:00
Rishi Gupta 4b183cd526 domain migration: Remove several instances of get_realm.
Remove the easy to remove instances of get_realm.
2016-11-26 15:19:56 -08:00
Anders Kaseorg 207cf6302b Always start python via shebang lines.
This is preparation for supporting using Python 3 in production.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2016-11-26 14:46:37 -08:00
Umair Khan 7d51efe9a1 Django 1.10: Fix dummy data for count stat.
Django 1.10 checks the foreign key constraints as part of the testing
suite so we need to create test data which passes validation tests.
2016-11-14 16:09:12 -08:00
umkay dc8463e09c analytics: Remove incorrect filter args for stat.
The filter args dictionary applies to the X table in a count X by Y query,
which in this case is the zerver_message table. This stat had an incorrect set
of arguments meant for the zerver_userprofile table.
2016-11-10 12:25:21 -08:00
Umair Khan d837753d4b Django 1.10: Update analytics urls. 2016-11-10 16:20:03 +05:00
Umair Khan 682aa1f298 Django 1.10: Use add_argument for options in BaseCommand. 2016-11-04 10:20:23 -07:00
Umair Khan b140236fcf Django 1.10: Do not use patterns function. 2016-11-04 10:06:00 -07:00
umkay e6ac8c3543 analytics: Add extra count stats.
Fill in remaining countstats in counts.py for our intended use cases.
2016-11-03 16:50:39 -07:00
umkay 298890d125 analytics: Rename count stats and associated properties.
Our current naming convention is getting unwieldy. The subgroup now goes
on the right side of the colon.
2016-11-03 16:50:39 -07:00
umkay 5490442580 analytics: Replace all joins in raw SQL with natural joins.
We alter the behavior of our queries to no longer write rows with 0 counts
to the db, and pad with 0s in the related views code. As a result we are
also able to combine the where and join clause conditions in the sql
queries. This new behavior is also updated in our tests.
2016-11-03 16:50:39 -07:00
Rishi Gupta db0e509422 do_create_realm: Replace domain argument with string_id.
Turns string_id into a required argument, and domain into an optional
argument.
2016-11-02 22:46:34 -07:00
Rishi Gupta 9ef8536cc6 models.Realm: Require Realm.string_id to be non-NULL.
Adds a database migration, adds a new string_id argument to the management
realm creation command, and adds a short name field to the web realm
creation form when REALMS_HAVE_SUBDOMAINS is False.
2016-11-02 22:46:34 -07:00
umkay 5e5a0d4db9 analytics: Add user-level count query for messages sent to {PMs, streams}.
Adds a count_X_by_Y_query to counts.py, similar in spirit to a
count_recipient_by_user query, where we would join on the Message,
Recipient, and UserProfile table. Here, we also join on the Stream table in
order to distinguish private and public streams, and we merge the counts for
PM and Huddle type messages into a single subgroup.
2016-11-01 17:00:43 -07:00
umkay a94599fca7 analytics/models.py: Add subgroup column to unique_together constraints. 2016-11-01 16:53:56 -07:00
umkay e92604ab78 analytics: Alter field length for property and interval in BaseCount. 2016-10-27 16:33:58 -07:00
umkay 610e92b94e analytics: Add subgroup column to analytics tables.
This is a major change to the analytics schema, and is the first step in a
number of refactorings and performance improvements. For instance, it allows

* Grouping sets of similar CountStats in the *Count tables. For instance,
  active{_humans,_bots} will now have the same property, but have different
  subgroup values.

* Combining queries that differ only in their value on 1 filter clause, so
  that we make fewer passes through the zerver tables. For instance, instead
  of running a query for each of messages_sent_to_public_streams and
  messages_sent_to_private_streams, we can now run a single query with a
  group by on Stream.invite_only, and store the group by value in the
  subgroup column.
2016-10-27 16:33:58 -07:00
Rishi Gupta 54016e1096 analytics: Remove outdated comment in counts.py. 2016-10-25 13:42:55 -07:00
umkay 87d22c9e4d analytics: Fix count_stream_by_realm.
Add a join clause on zerver_message in count_stream_by_realm,
otherwise we only output the final total streamcount for a realm
for every time entry.
2016-10-22 19:10:36 -07:00
umkay 906a4e3b26 analytics: Add performance and transaction logging to counts.py.
For each database query made by an analytics function, log time spent and
the number of rows changed to var/logs/analytics.log.
In the spirit of write ahead logging, for each (stat, end_time)
update, log the start and end of the "transaction", as well as time
spent.
2016-10-17 16:10:03 -07:00
Tim Abbott 4a4664d268 mypy: Remove a bunch of now-unnecessary type: ignore annotations.
Since mypy and typeshed have advanced a lot over the last several
months, we no longer need these `type: ignore` annotations.
2016-10-17 11:48:34 -07:00
Rishi Gupta 82b814a1cd analytics: Simplify frequency and measurement interval options.
Change the CountStat object to take an is_gauge variable instead of a
smallest_interval variable. Previously, (smallest_interval, frequency)
could be any of (hour, hour), (hour, day), (hour, gauge), (day, hour),
(day, day), or (day, gauge).
The current change is equivalent to excluding (hour, day) and (day, hour)
from the list above.

This change, along with other recent changes, allows us to simplify how we
handle time intervals. This commit also removes the TimeInterval object.
2016-10-14 10:18:37 -07:00
Rishi Gupta 807520411b analytics: Simplify logic in do_fill_count_stat_at_hour.
Adding FillState, removing do_aggregate_hour_to_day, and disallowing unused
(interval, frequency) pairs removes the need for the nested for loops in
do_fill_count_stat_at_hour. This commit replaces that control flow with a
simpler equivalent.
2016-10-14 10:18:37 -07:00
Rishi Gupta 27d1360e1d analytics: Remove do_aggregate_hour_to_day.
The functionality provided is more naturally done in the views code. It also
allows us to aggregate using day boundaries from the local timezone, rather
than UTC.
2016-10-14 10:18:37 -07:00
Rishi Gupta 655ee51e35 analytics: Add table to keep track of fill state.
Adds two simplifying assumptions to how we process analytics stats:
* Sets the atomic unit of work to: a stat processed at an hour boundary.
* For any given stat, only allows these atomic units of work to be processed
  in chronological order.

Adds a table FillState that, for each stat, keeps track of the last unit of
work that was processed.
2016-10-14 10:18:37 -07:00
umkay 721529b782 analytics: Remove HuddleCount for now.
Planned changes to the underlying analytics model will require potentially
complicated changes to huddle queries.
2016-10-14 10:18:37 -07:00
umkay 7e2340155d analytics: Fix aggregation to RealmCount for realms with no users.
Previously, if a Realm had no users (or no streams),
do_aggregate_to_summary_table would fail to add a row with value 0. This
commit fixes the issue and also simplifies the do_aggregate_to_summary_table
logic.
2016-10-11 18:20:58 -07:00
Rishi Gupta 52b56cca65 analytics: Reorder arguments to assertCountEquals.
Require a table argument and change argument order around for clarity.
2016-10-11 18:20:58 -07:00
Rishi Gupta 929b69397b analytics: Change string representation of BaseCount models.
Previously we showed both the value and the id of the BaseCount record,
which is confusing in a typical case where you only care about the value,
and both the value and id are smallish ints.
2016-10-09 16:09:04 -07:00
Rishi Gupta c6b611c8b9 analytics: Re-organize tests into higher level TestClasses.
Refactor the current analytics tests into the following classes:
* TestUpdateAnalyticsCounts, which will eventually test the management
  command, backfilling, what happens when new tests are added, etc.
* TestProcessCountStat, which tests the ins and outs of propagating the
  value of a single stat up through the various *Count tables.
* TestAggregates, which tests the do_aggregate_* methods.
* TestXByYQueries, which tests the count_X_by_Y_query SQL snippets.
* TestCountStats, which has tests for individual CountStats.

This commit does not change the name or contents of any individual test.
2016-10-09 16:09:04 -07:00
Rishi Gupta 795d10b9ad analytics: Refactor tests to simplify asserts of count values.
Many tests are structured to run some process, and then check a count in a
BaseCount record using default values for realm, property, interval, and
end_time. This commit adds a new assertCountEquals method to
AnalyticsTestCase, and simplifies other assert calls as appropriate.
2016-10-09 16:09:04 -07:00
Rishi Gupta 35cf9092d5 analytics: Canonicalize creation of model objects in tests.
Add a default_realm object to AnalyticsTestCase, created 2 days before
AnalyticsTestCase.TIME_ZERO.

Add lightweight create_user, create_stream, and create_message methods to
AnalyticsTestCase, with sensible defaults. In particular, all objects are by
default created at AnalyticsTestCase.TIME_LAST_HOUR, so that they are
included when running AnalyticsTestCase.process_last_hour.
2016-10-09 16:09:04 -07:00
Rishi Gupta 6996b21480 analytics: Change tests to use a fixed TIME_ZERO.
Previously, analytics tests used timezone.now or custom datetime objects
when creating new realms, users, and streams.

This commit adds a fixed TIME_ZERO and a process_last_hour helper function
in a new AnalyticsTestCase class, and modifies the existing tests to use
them.
2016-10-09 16:09:04 -07:00
umkay 78477ea071 Reorder the columns in analytics tables inherited from BaseCount.
This is primarily implemented through altering the migration file in
order to move the columns, but also we try to make the defaults a
little better for future tables inherited from BaseCount.
2016-10-06 17:51:01 -07:00
umkay 01324f2afe Fix aggregation to analytics summary tables.
There are a number of different stats that need to be propagated from
UserCount and StreamCount to RealmCount, and from RealmCount to
InstallationCount. Stats with hour intervals also need to have their day
values propagated. This commit fixes a bug in the summary table aggregation
logic so that for a given interval on a CountStat object we pull the correct
counts for the interval as well as do the day aggregation if required. We Also
ensure that any aggregation then done from the realmcount
table to the installationcount table follows the same aggregation logic
for intervals.
2016-10-06 08:46:33 -07:00
Tim Abbott 273c17a072 update_analytics_counts: Add missing future imports. 2016-10-05 17:13:46 -07:00
umkay 5d0bed8673 Add script to clear analytics tables. 2016-10-05 17:11:13 -07:00
Tim Abbott 3973ae5dbb update_analytics_counts: Fix buggy argument parsing. 2016-10-04 20:43:19 -07:00
umkay d260a22637 Add a new statistics/analytics framework.
This is a first pass at building a framework for collecting various
stats about realms, users, streams, etc. Includes:
* New analytics tables for storing counts data
* Raw SQL queries for pulling data from zerver/models.py tables
* Aggregation functions for aggregating hourly stats into daily stats, and
  aggregating user/stream level stats into realm level stats
* A management command for pulling the data

Note that counts.py was added to the linter exclude list due to errors
around %%s.
2016-10-04 17:18:54 -07:00
Taranjeet a137bf15ed Wrap some lines with length greater than 120.
With some tweaks by tabbott.
2016-07-06 14:35:16 -07:00
Eklavya Sharma 71e613424b Fix annotations clashing with UserProfile's model fields. 2016-06-13 20:01:01 +05:30
Hyunchel Kim f226456675 Add type annotations for analytics/views.py.
Type of parameter for function `is_recent`(line no.812) is `datetime`.
MyPy errors out, however, when the parameter is defined as `datetime`.
To get around, type `Any` is used.
2016-06-05 15:04:24 -07:00
Tim Abbott a1a27b1789 Annotate most Zulip management commands. 2016-06-04 10:12:06 -07:00
Eklavya Sharma 94e4b39112 Replace python2.7 by python everywhere. 2016-05-29 05:03:08 -07:00
Umair Khan f9bbc5d6ff Enable i18n support in URL configuration.
This supports i18n using all of the following:
- I18N urls
- Session
- Cookie
- HTTP header
2016-05-19 08:33:30 -07:00
Tim Abbott efd24b374e analytics: Fix cnts variable reuse with different type.
Found using mypy.
2016-05-12 14:07:32 -07:00
Tim Abbott b869be9301 style: Use 'not in' consistently rather than `not foo in`. 2016-05-09 17:00:10 -07:00
Umair Khan 5359e6b0d4 Convert Zulip to use Jinja2 templates.
This results in a substantial performance improvement for all of
Zulip's backend templates.

Changes in templates:
- Change `block.super` to `super()`.
- Remove `load` tag because Jinja2 doesn't support it.
- Use `minified_js()|safe` instead of `{% minified_js %}`.
- Use `compressed_css()|safe` instead of `{% compressed_css %}`.
- `forloop.first` -> `loop.first`.
- Use `{{ csrf_input }}` instead of `{% csrf_token %}`.
- Use `{# ... #}` instead of `{% comment %}`.
- Use `url()` instead of `{% url %}`.
- Use `_()` instead of `{% trans %}` because in Jinja `trans` is a block tag.
- Use `{% trans %}` instead of `{% blocktrans %}`.
- Use `{% raw %}` instead of `{% verbatim %}`.

Changes in tools:
- Check for `trans` block in `check-templates` instead of `blocktrans`

Changes in backend:
- Create custom `render_to_response` function which takes `request` objects
  instead of `RequestContext` object. There are two reasons to do this:
    1. `RequestContext` is not compatible with Jinja2
    2. `RequestContext` in `render_to_response` is deprecated.
- Add Jinja2 related support files in zproject/jinja2 directory. It
  includes a custom backend and a template renderer, compressors for js
  and css and Jinja2 environment handler.
- Enable `slugify` and `pluralize` filters in Jinja2 environment.

Fixes #620.
2016-05-09 09:55:18 -07:00
Umair Khan 6a0c7fec72 analytics: Add `at_risk_count` to Totals row in realm summary.
This fixes reading from an unset value in realm_summary_table, which
is fine with the Django template engine but will be problematic with
jinja2.
2016-05-07 17:30:06 -07:00
Tim Abbott 191201bd10 Fix unnecessary whitespace between % and (. 2016-05-04 14:22:52 -07:00
Tim Abbott 54022ac204 Fix unnecessary whitespace between , and ). 2016-05-04 14:16:53 -07:00
Ashish 6356584f84 Replace /json/update_pointer with REST style route. 2016-04-11 21:38:23 -07:00
Ashish 41993ef2f5 Replace /json/update_message_flags with REST style route. 2016-04-11 21:38:22 -07:00
Tim Abbott a1b306f9ce Finish purging 'fromt typing import *' from Zulip codebase. 2016-04-07 14:11:21 -07:00
Tim Abbott b8c82d5b43 Add PEP-484 type annotations to analytics/. 2016-04-03 15:40:23 -07:00
Tim Abbott 2436ad19ba analytics: Cleanup confusingly type-variable all_records. 2016-02-03 19:29:07 -08:00
Tim Abbott df1670ef59 Fix various float initialization to use 0.0 instead of 0.
This is needed to type-check these values.
2016-02-03 19:29:07 -08:00
Tim Abbott 1f44417fc1 Switch to using Python 3 style division everywhere.
Also add testing for this to our Python 3 compatibility test suite.
2016-01-26 21:09:43 -08:00
Tim Abbott a79e89b28f Cleanup remaining usage of % comprehensions without explicit tuples. 2015-12-05 15:29:42 -08:00
Tim Abbott 607eedfc25 Apply Python 3 futurize transform libmodernize.fixes.fix_zip. 2015-11-01 09:35:06 -08:00
Tim Abbott f7878a61e1 Apply Python 3 futurize transform libmodernize.fixes.fix_xrange_six. 2015-11-01 09:35:06 -08:00
Tim Abbott cd6f8e9191 Apply Python 3 futurize transform libmodernize.fixes.fix_map. 2015-11-01 09:35:05 -08:00
Tim Abbott b3ac668779 Apply Python 3 futurize transform libmodernize.fixes.fix_filter. 2015-11-01 09:26:16 -08:00
Tim Abbott f3783fb4a1 Apply Python 3 futurize transform libfuturize.fixes.fix_print_with_import. 2015-11-01 09:26:16 -08:00
Tim Abbott 8c34c40924 Apply Python 3 futurize transform lib2to3.fixes.fix_except. 2015-11-01 08:08:33 -08:00
Steven Oud d5435fad1d Consistently use /usr/bin/env python2.7 in shebangs and commands. 2015-10-21 22:58:21 +00:00
Tim Abbott 71a06d58de Convert uses of Realm.objects.get() to get_realm().
get_realm is better in two key ways:
* It uses memcached to fetch the data from the cache and thus is faster.
* It does a case-insensitive query and thus is more safe.
2015-10-15 09:16:58 -04:00
Tim Abbott e29c473077 Simplify analytics code to not filter certain low-interest users/realms.
(imported from commit 2dcf2e50b65c8b96d893cbe7dcdbbe652e6a90ff)
2015-09-19 23:42:28 -07:00
Reid Barton ae0ae3dde8 Django 1.8: declare positional arguments in management commands
(imported from commit d9efca1376de92c8187d25f546c79fece8d2d8c6)
2015-08-20 23:35:40 -07:00
Steve Howell 9224d4b521 Fix comment in meets_goal() function in activity reports.
(imported from commit b07a6059b4fc9ecb609881e7b619ac10cafb9e22)
2014-02-03 13:30:49 -05:00
Tim Abbott 85381a62b9 analytics: Show zulip.com and mit.edu counts on /activity.
To make this give accurate numbers, we need to filter out the
automated traffic from Zephyr mirroring and our internal monitoring.

(imported from commit 83642bc9a9d8d01dd9dc5dc7b3e3dee6c9705162)
2014-02-03 12:38:28 -05:00
Steve Howell 13d015f81e Show /api/v1/send_message traffic in /activity reports.
(imported from commit 5230f92c1376a914099e4c4c883f1bee5d776cad)
2014-01-21 11:23:45 -05:00
Steve Howell b880b9426f Handle Realm.DoesNotExist error in /realm_activity.
(imported from commit 8455b840563fe48ff33b3bcc33f442a7cdd6a2ff)
2014-01-10 21:39:04 -05:00
Steve Howell 8ccf3ad36a Count all queries for mobile apps in /activity.
(imported from commit 1ea200fc8d2432af0c66d501c0fc8a226eb11194)
2014-01-10 21:38:57 -05:00
Steve Howell 7f668463ac Add an ZulipiOS tab to /activity.
(imported from commit 09f609e340a8fb7838e74d558eb3e397ea5f1f7b)
2014-01-10 21:38:57 -05:00
Steve Howell 0b9eb74f93 Remove Pure API tab from /activity.
(imported from commit 32ec22de7746637a18c98f93903661851e367bc8)
2014-01-10 21:38:57 -05:00
Steve Howell 26548da779 Use calendar days for /activity message history.
(imported from commit f4d331e8384ff2663fb6aab7170482f057d7403f)
2014-01-06 11:50:15 -05:00
Steve Howell 0c5f543433 Exclude totals row from # of active sites.
(imported from commit 80f38d8f24337244d7a8e1eb409aaed08dd772be)
2013-12-23 09:57:51 -05:00
Steve Howell fcbc7e1516 Do not count wdaher.com in report.
(imported from commit 61a17c3addc0893282c55135261a0c2ffe797273)
2013-12-23 09:57:50 -05:00
Steve Howell ea1fefa5b5 Show message history in reverse chronological order.
(imported from commit 6237369793044cec7ae8ffec42aaac5d443e2ce1)
2013-12-19 14:03:55 -05:00
Steve Howell 986c46d417 Color code "Messages sent"
(imported from commit d648bfbcdd054a57fb6b39d14d0579819c963c40)
2013-12-18 17:30:47 -05:00
Steve Howell ef089692ed Right-justify numbers on /activity.
(imported from commit 11f167626fc3f1338143fc1cf68a1bb1663a93b9)
2013-12-18 16:38:50 -05:00
Steve Howell 5d85bfbf6b Improve "Messages sent" formatting on /activity.
(imported from commit 891760c116d5a3290b49fb2d75eae0f0b227540d)
2013-12-18 16:34:03 -05:00
Steve Howell ef959a748e Exclude mit.edu from history counts.
(imported from commit eb27f4bb0f6eb07c2d72d4f32fa21995f215254e)
2013-12-18 16:34:03 -05:00
Steve Howell 7cded2a956 Show realm history on the main activity page.
This shows the number of messages sent by humans for the last
eight 24-hour periods, for each realm.  "Messages sent" isn't a
perfect metric of activity, but it's easier to query with our
current data model than certain other statistics.

(imported from commit 9de3c479640a0b9dbc017b245dda21d951f4efa4)
2013-12-18 16:01:44 -05:00
Steve Howell 532fa9fae8 Remove dead code in /activity reports.
(imported from commit ea9ddae01fd5527c6a4ceefaaa5fac091618f544)
2013-12-18 13:42:30 -05:00
Steve Howell 0c1f57815c Add grahite to link to /realm_activity.
(imported from commit 6d2a23ffcca23a65e5aadb2ed8499645e42131e0)
2013-12-18 13:40:07 -05:00
Tim Abbott 3b7bf691e7 Add tool to query our usage stats as of a given date.
This contains the various fixes that needed to be made in order to get
accurate statistics.

Most notably, the active_users_between function in the previous
version of zerver/lib/statistics.py was broken for end dates in the
past, because it used the UserActivity table to get its data -- so in
fact it really was querying "users last active between".

This commit isn't super clean, but I figure we're probably better off
having our latest code for historical usage data in git so it doesn't
bitrot and anyone can improve on it.

(imported from commit 24ff2f24a22e5bdc004ea8043d8da12deb97ff2f)
2013-12-17 15:34:44 -05:00
Leo Franchi 1e7a22f14e Replace other non-zerver uses of iPhone client
(imported from commit 0988e2c9bd0499a0711daed97f89aa672776f085)
2013-12-03 14:35:24 -05:00
Steve Howell 14ae79b311 Count /api/*/external/* messages as "sent" in /realm_activity.
This makes it easier to see how many messages are being sent
by webhook bots.  This assumes a 1:1 relationship between
hitting webhook endpoints and sending messages, which is probably
valid enough in the near future.

(imported from commit eb272cd38b9cabd54d317ce2dfdf12099d302fce)
2013-11-25 15:37:19 -05:00
Steve Howell b5707b9c7b Add is_active check to /realm_activity report.
(imported from commit e531446bc73caf7173e4aa1d22b35f3942c9c1f9)
2013-11-24 16:35:19 -05:00
Steve Howell 07d4dd66c4 Show admins on /realm_activity page.
(imported from commit 7d6beb4145b86d95b872b175ea8a567b7ce56d23)
2013-11-18 14:41:37 -05:00
Steve Howell 91a463def0 Show webhook clients in activity pages.
(imported from commit 423df7def53583ab8122a26271068423f03a687c)
2013-11-18 12:58:39 -05:00
Steve Howell 76a1754357 Use from/import style for datetime imports in activity views.
(imported from commit 72772f63f00d1b5e9d5d0ccefb56b6714d1a2370)
2013-11-18 12:36:25 -05:00
Steve Howell 38a7aa2255 Highlight recently active users in /realm_activity.
(imported from commit 307ed9c17ea1055f40cd983d5c2b2f308203f3f7)
2013-11-18 12:36:24 -05:00
Steve Howell ae77b98155 DRY up East Coast time formatting.
(imported from commit b5b5787816c2dc641bbe4cfb351f4ff765e44969)
2013-11-18 12:36:24 -05:00
Steve Howell dd53b92efc Add row_class support to /activity tables.
(imported from commit 69dcbeccc3b048053b2dcca82c628b0cfccddfbb)
2013-11-18 12:36:24 -05:00
Steve Howell 6e19f8b86f Split humans/bots in History tab for realm reports.
(imported from commit 02e398590f4f844ce30e76703006262296301b97)
2013-11-14 15:46:10 -05:00
Steve Howell 6ddfa35a49 Remove detailed "At risk users" tab
(imported from commit d5630fa3ae5a35d238e3c947a01e1a648668817f)
2013-11-14 14:07:24 -05:00
Steve Howell d4c0334b3b Show counts of at-risk users on Activity page.
(imported from commit 4ce1ecf47259f3a95ddf9cbbc356fa0247044e0f)
2013-11-14 14:07:24 -05:00
Steve Howell f60fa1b274 Don't linkify "Total" on the main Activity tab.
(imported from commit 2098f488f1276a022b4c585b99c25c3a5a66ca06)
2013-11-14 13:28:41 -05:00
Steve Howell 0c74d55d6b Clean up time display on the Activity page.
(imported from commit fc1d0cad1ed8a70ddf34c29633569ed473430fc8)
2013-11-14 13:28:41 -05:00
Steve Howell c214691a1e Linkify more realms on the Activity page.
(imported from commit f9a5ab46ce77b13ff9cf5dcc80231bfb1d11bbee)
2013-11-14 13:28:40 -05:00
Steve Howell b690918480 Extract user_activity_link() for activity reports.
(imported from commit 0ae3770a8dcba2bb9fad6a22133635316f74b0d7)
2013-11-14 13:28:40 -05:00
Kevin Mehall a593a798f8 Move send_stats management command back to zerver/
It's not analytics, and it's needed for restart-server.

(imported from commit 979fa15715ea437cbbc5d986c859ee4d6c668da8)
2013-11-12 15:50:08 -05:00
Waseem Daher 76fa95a5e9 activity: Display times in US/Eastern.
(imported from commit 730d37e0b8da046c1fe9e0c0cfc9b420da560865)
2013-11-08 11:39:59 -05:00
Waseem Daher 8b89a56922 Exclude wdaher.com from our analytics dashboard.
It really shouldn't count.

(imported from commit ad91ecfacbd421bc1e10ef3a15ef28ca9e87a0ae)
2013-11-08 10:47:50 -05:00
Steve Howell 59ec080a8d Simplify analytics/activity.html.
For legacy reasons, this template wanted each tab's content as
a one-key dictionary, instead of a string.  Each tab already has
a tuple to allow for fields like title, so this wasn't really giving us
any long term flexibility; it was just crufting up the calling
code.

(imported from commit 2a316107ec223a83efa8735f4810a6fa43107541)
2013-11-08 10:44:20 -05:00
Steve Howell 38e479d4a6 Extract make_table() in analytics/views.py.
(imported from commit f1457763ac491d32de19221fb9a2d11e43d58106)
2013-11-08 10:44:20 -05:00
Steve Howell 64fb17f9c2 Move management commands to the analytics app.
Move commands related to stats collection and reporting from
zilencer to analytics.  To do this, we had to make "analytics"
officially an app.

(imported from commit 63ef6c68d1b1ebb5043ee4aca999aa209e7f494d)
2013-11-06 16:51:08 -05:00