Previously we showed the total number of users with an active account. This
changes it to show only the number of users that have logged in in the past
two weeks.
Rename 'zulip_internal' decorator to 'require_server_admin', add
documentation for 'server_admin', explaining how to give permission
for ./activity page.
Fixes: #1463.
Previously we would update FillState for daily stats on hourly boundaries as
well. This would create two extra queries on the FillState table every hour
(for each CountStat), which adds roughly 50ms of extra processing for each
CountStat each day, as well as two extra lines each hour in the analytics
log. This can be a minor annoyance when backfilling stats.
I think this is more pythonic?
We could also get rid of LoggingCountStats altogether, since it's now just a
special case of CountStat (is_logging == data_collector.pull_function is None).
But I think it's nice to keep the distinction since they behave so differently.
Removes the circular dependency of CountStat containing a DataCollector, and
DataCollector containing a function that takes a CountStat as an argument.
This will allow us to appropriately generalize CountStat to include
LoggingCountStat and CustomPullCountStat. It'll also make life easier when
we introduce DependentCountStat.
The previous zerver_* names were unwieldy and not very readable. This also
puts more of the useful information in one place; in particular, makes it
easier to skim a CountStat declaration and see if we're collecting it at a
user/stream granularity or a realm granularity.
It turned out to not be that useful once we added subgroup. The previous
design of the CountStat object also assumed more reuseability of the *_query
strings than what ended up happening.
The filter_args also had some carrying costs:
* It's hard to be confident that filter_args other than the ones explicitly
in our tests would have had expected behavior.
* The filter_args/join_args system is the most complex part of the CountStat
object, and makes understanding the *_query strings unnecessarily
difficult for a new contributor.
Groundwork for allowing stats like "Monthly Active Users".
CountStat.interval is no longer as clean a value as before, so removed it
from views.get_chart_data. It wasn't being used by the frontend anyway.
Removing interval from logger calls in counts.py is not a big loss since we
now include the frequency (which is typically also the interval) in
CountStat.property.
The code in fixtures.py is only called from populate_analytics_db, and is
only used for generating pretty fixture data for manual testing. This commit
adds tests for a few things that were easy to add tests for, and provides
some minimal coverage of the file, but is not meant to be comprehensive.
Originally, all the client names in populate_analytics_db started with
underscores to make it easy to selectively delete and regenerate them when
re-running populate_analytics_db.
We eventually want to merge populate_analytics_db into populate_db though,
in which case it makes more sense for them to share client names, and not
worry about the case where we run (or re-run) populate_analytics_db
independently of populate_db.
It will simplify the logic needed to process the "Sent by Me" view in
Messages Sent Over Time in stats.js.
Also, we gzip the data sent from our server, so there is little additional
network usage by doing this.
Django 1.10 has changed the implementation of this function to
match our custom implementation; in addition to this, we prefer
render().
Fixes#1914 via #4093.
count_message_type_by_user_query is in a different format (no WHERE clause)
from the rest since I'm having a hard time reasoning about how that would
interact with the LEFT JOIN, especially given that there are %(join_args)s.
Analytics database tables are getting big, and so we're likely moving to a
model where ~all stats are day stats, and we keep hourly stats only for the
last N days.
Also changed the name because:
* messages_sent_* suggests the counts (summed over subgroup) should be the
same as the other messages_sent stats, but they are different (these don't
include PMs).
* messages_sent_by_stream:is_bot:day is longer than 32 characters, the max
allowable length for a BaseCount.property.
Includes a database migration to remove the old stat from the analytics
tables.
When you pass a naive datetime to the Django ORM, it uses settings.TIME_ZONE
for the time zone. In the development environment, both settings.TIME_ZONE
and datetime.now() use 'America/New_York', so there is no change in behavior
there. (fromtimestamp with no tz argument uses the same timezone as
datetime.now)
We are soon going to change settings.TIME_ZONE to UTC, so need to remove
naive datetimes from queries to the ORM.
This actually fixes previously broken behavior, since 'date' here gets
turned into the 'day' argument of seconds_active_during_day(day), where
tzinfo is set to UTC.
API: Adds a "display_order" to the response, which is a suggested order of
importance for the clients or recipient types respectively.
frontend: Changes messages_sent_by_{client,recipient_type} to use a fixed
order for any given user.
Also includes a number of changes to messages_sent_by_recipient_type that
were convenient to do at the same time, since the two charts share a lot of
code.
This adds a frontend for the analytics system we've had for a few
months, showing several graphs of the data in Zulip.
There's a ton more that we can do with this tooling, but this initial
version is enough to provide users with a pretty good experience.
Fixes#2052.
Makes a number of simplications to the analytics views code. The main one is
that we now return the entire data series, even if the data is eventually
going to go into a pie chart. This was prompted by us wanting several
different pie charts for each stat (one for last 30 days, one for all time,
etc), but I think it is also a more natural API. The total amount of data
being sent for the pie charts now is maybe half of what is being sent for
our single 'hourly' stat, or maybe up to 10,000 ints per year the
organization has been around.
The other big change is that the data being sent back is now always explicit
about whether it is data about the realm (stored in data['realm'], or data
about the user (stored in data['user']).
Having both messages_sent:hour and messages_sent:is_bot:day is confusing,
since a single messages_sent:is_bot:hour would have a superset of the
information and take less total space. This commit and its parent together
replace the two stats with a single messages_sent:is_bot:hour.
Includes a database migration. The interval field was originally there to
facilitate time aggregation (e.g. aggregate_hour_to_day), but we now do such
aggregations in views code or in the frontend.
A few reasons:
* Our two other subgroup'd message stats in UserCount are at CountStat.DAY
frequency (messages_sent:is_bot and messages_sent:message_type).
* Keeping this stat at hourly frequency would likely double the size of our
analytics table, given the current stats. (Counterpoint: if there are
roughly as many active streams as active users, and we keep
messages_sent_to_stream:is_bot at hourly frequency, then maybe this stat
is only a 30% or 50% increase).
* We're currently only showing this on the frontend as a pie chart anyway.