mirror of https://github.com/zulip/zulip.git
docs: Use `code` syntax in analytics subsystem doc for readability.
This doc was using a lot of references to class names etc. without putting them in `<code>`, making it harder to read.
parent
34ceafadd5
commit
cda25e8a4a
|
@ -20,64 +20,63 @@ effectively modify the system.
|
||||||
|
|
||||||
There are three main components:
|
There are three main components:
|
||||||
|
|
||||||
- models: The UserCount, StreamCount, RealmCount, and InstallationCount
|
- models: The `UserCount`, `StreamCount`, `RealmCount`, and `InstallationCount`
|
||||||
tables (analytics/models.py) collect and store time series data.
|
tables (`analytics/models.py`) collect and store time series data.
|
||||||
- stat definitions: The CountStat objects in the COUNT_STATS dictionary
|
- stat definitions: The `CountStat` objects in the `COUNT_STATS` dictionary
|
||||||
(analytics/lib/counts.py) define the set of stats Zulip collects.
|
(`analytics/lib/counts.py`) define the set of stats Zulip collects.
|
||||||
- accounting: The FillState table (analytics/models.py) keeps track of what
|
- accounting: The `FillState` table (`analytics/models.py`) keeps track of what
|
||||||
has been collected for which CountStats.
|
has been collected for which `CountStat`.
|
||||||
|
|
||||||
The next several sections will dive into the details of these components.
|
The next several sections will dive into the details of these components.
|
||||||
|
|
||||||
## The \*Count database tables
|
## The `*Count` database tables
|
||||||
|
|
||||||
The Zulip analytics system is built around collecting time series data in a
|
The Zulip analytics system is built around collecting time series data in a
|
||||||
set of database tables. Each of these tables has the following fields:
|
set of database tables. Each of these tables has the following fields:
|
||||||
|
|
||||||
- property: A human readable string uniquely identifying a CountStat
|
- property: A human readable string uniquely identifying a `CountStat`
|
||||||
object. Example: "active_users:is_bot:hour" or "messages_sent:client:day".
|
object. Example: `"active_users:is_bot:hour"` or `"messages_sent:client:day"`.
|
||||||
- subgroup: Almost all CountStats are further sliced by subgroup. For
|
- subgroup: Almost all `CountStat` objects are further sliced by subgroup. For
|
||||||
"active_users:is_bot:day", this column will be False for measurements of
|
`"active_users:is_bot:day"`, this column will be `False` for measurements of
|
||||||
humans, and True for measurements of bots. For "messages_sent:client:day",
|
humans, and `True` for measurements of bots. For `"messages_sent:client:day"`,
|
||||||
this column is the client_id of the client under consideration.
|
this column is the client_id of the client under consideration.
|
||||||
- end_time: A datetime indicating the end of a time interval. It will be on
|
- end_time: A datetime indicating the end of a time interval. It will be on
|
||||||
an hour (or UTC day) boundary for stats collected at hourly (or daily)
|
an hour (or UTC day) boundary for stats collected at hourly (or daily)
|
||||||
frequency. The time interval is determined by the CountStat.
|
frequency. The time interval is determined by the `CountStat`.
|
||||||
- various "id" fields: Foreign keys into Realm, UserProfile, Stream, or
|
- various "id" fields: Foreign keys into `Realm`, `UserProfile`, `Stream`, or
|
||||||
nothing. E.g. the RealmCount table has a foreign key into Realm.
|
nothing. E.g. the `RealmCount` table has a foreign key into `Realm`.
|
||||||
- value: The integer counts. For "active_users:is_bot:hour" in the
|
- value: The integer counts. For `"active_users:is_bot:hour"` in the
|
||||||
RealmCount table, this is the number of active humans or bots (depending
|
`RealmCount` table, this is the number of active humans or bots (depending
|
||||||
on subgroup) in a particular realm at a particular end_time. For
|
on subgroup) in a particular realm at a particular `end_time`. For
|
||||||
"messages_sent:client:day" in the UserCount table, this is the number of
|
`"messages_sent:client:day"` in the `UserCount` table, this is the number of
|
||||||
messages sent by a particular user, from a particular client, on the day
|
messages sent by a particular user, from a particular client, on the day
|
||||||
ending at end_time.
|
ending at `end_time`.
|
||||||
|
|
||||||
There are four tables: UserCount, StreamCount, RealmCount, and
|
There are four tables: `UserCount`, `StreamCount`, `RealmCount`, and
|
||||||
InstallationCount. Every CountStat is initially collected into UserCount,
|
`InstallationCount`. Every `CountStat` is initially collected into `UserCount`,
|
||||||
StreamCount, or RealmCount. Every stat in UserCount and StreamCount is
|
`StreamCount`, or `RealmCount`. Every stat in `UserCount` and `StreamCount` is
|
||||||
aggregated into RealmCount, and then all stats are aggregated from
|
aggregated into `RealmCount`, and then all stats are aggregated from
|
||||||
RealmCount into InstallationCount. So for example,
|
`RealmCount` into `InstallationCount`. So for example,
|
||||||
"messages_sent:client:day" has rows in UserCount corresponding to (user,
|
`"messages_sent:client:day"` has rows in `UserCount` corresponding to
|
||||||
end_time, client) triples. These are summed to rows in RealmCount
|
`(user, end_time, client)` triples. These are summed to rows in `RealmCount`
|
||||||
corresponding to triples of (realm, end_time, client). And then these are
|
corresponding to triples of `(realm, end_time, client)`. And then these are
|
||||||
summed to rows in InstallationCount with totals for pairs of (end_time,
|
summed to rows in `InstallationCount` with totals for pairs of `(end_time, client)`.
|
||||||
client).
|
|
||||||
|
|
||||||
Note: In most cases, we do not store rows with value 0. See
|
Note: In most cases, we do not store rows with value 0. See
|
||||||
[Performance strategy](#performance-strategy) below.
|
[Performance strategy](#performance-strategy) below.
|
||||||
|
|
||||||
## CountStats
|
## CountStats
|
||||||
|
|
||||||
CountStats declare what analytics data should be generated and stored. The
|
`CountStat` objects declare what analytics data should be generated and stored. The
|
||||||
CountStat class definition and instances live in `analytics/lib/counts.py`.
|
`CountStat` class definition and instances live in `analytics/lib/counts.py`.
|
||||||
These declarations specify at a high level which tables should be populated
|
These declarations specify at a high level which tables should be populated
|
||||||
by the system and with what data.
|
by the system and with what data.
|
||||||
|
|
||||||
## The FillState table
|
## The FillState table
|
||||||
|
|
||||||
The default Zulip production configuration runs a cron job once an hour that
|
The default Zulip production configuration runs a cron job once an hour that
|
||||||
updates the \*Count tables for each of the CountStats in the COUNT_STATS
|
updates the `*Count` tables for each of the `CountStat` objects in the `COUNT_STATS`
|
||||||
dictionary. The FillState table simply keeps track of the last end_time that
|
dictionary. The `FillState` table simply keeps track of the last `end_time` that
|
||||||
we successfully updated each stat. It also enables the analytics system to
|
we successfully updated each stat. It also enables the analytics system to
|
||||||
recover from errors (by retrying) and to monitor that the cron job is
|
recover from errors (by retrying) and to monitor that the cron job is
|
||||||
running and running to completion.
|
running and running to completion.
|
||||||
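To make the aggregation chain described in this hunk concrete, here is a minimal sketch in plain Python. It is not Zulip's implementation: the `roll_up` helper, the `USER_REALM` mapping, and the sample counts are all hypothetical. Only the table names and the key shapes `(user, end_time, client)`, `(realm, end_time, client)`, and `(end_time, client)` come from the doc.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sample data: one UTC day boundary and a user -> realm mapping.
END_TIME = datetime(2024, 1, 2, tzinfo=timezone.utc)
USER_REALM = {"alice": "realm_a", "bob": "realm_a", "carol": "realm_b"}

# Stand-in for UserCount rows of "messages_sent:client:day",
# keyed by (user, end_time, client). Zero-valued rows are simply absent.
user_count = {
    ("alice", END_TIME, "web"): 5,
    ("bob", END_TIME, "web"): 3,
    ("bob", END_TIME, "mobile"): 2,
    ("carol", END_TIME, "web"): 7,
}


def roll_up(rows, key_fn):
    """Sum values of rows that map to the same coarser key (hypothetical helper)."""
    totals = defaultdict(int)
    for key, value in rows.items():
        totals[key_fn(key)] += value
    return dict(totals)


# UserCount -> RealmCount: replace the user with the user's realm.
realm_count = roll_up(user_count, lambda k: (USER_REALM[k[0]], k[1], k[2]))

# RealmCount -> InstallationCount: drop the realm entirely.
installation_count = roll_up(realm_count, lambda k: (k[1], k[2]))
```

Each level is computed only from the level below it, which is why the expensive source tables need to be read just once per stat.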
|
@ -94,27 +93,27 @@ designed set of tables in PostgreSQL.
|
||||||
This requires some care to avoid making the analytics tables larger than the
|
This requires some care to avoid making the analytics tables larger than the
|
||||||
rest of the Zulip database or adding a ton of computational load, but with
|
rest of the Zulip database or adding a ton of computational load, but with
|
||||||
careful design, we can make the analytics system very low cost to operate.
|
careful design, we can make the analytics system very low cost to operate.
|
||||||
Also, note that a Zulip application database has 2 huge tables: Message and
|
Also, note that a Zulip application database has 2 huge tables: `Message` and
|
||||||
UserMessage, and everything else is small and thus not performance or
|
`UserMessage`, and everything else is small and thus not performance or
|
||||||
space-sensitive, so it's important to optimize how many expensive queries we
|
space-sensitive, so it's important to optimize how many expensive queries we
|
||||||
do against those 2 tables.
|
do against those 2 tables.
|
||||||
|
|
||||||
There are a few important principles that we use to make the system
|
There are a few important principles that we use to make the system
|
||||||
efficient:
|
efficient:
|
||||||
|
|
||||||
- Not repeating work to keep things up to date (via FillState)
|
- Not repeating work to keep things up to date (via `FillState`)
|
||||||
- Storing data in the \*Count tables to avoid our endpoints hitting the core
|
- Storing data in the `*Count` tables to avoid our endpoints hitting the core
|
||||||
Message/UserMessage tables is key, because some queries could take minutes
|
`Message`/`UserMessage` tables is key, because some queries could take minutes
|
||||||
to calculate. This allows any expensive operations to run offline, and
|
to calculate. This allows any expensive operations to run offline, and
|
||||||
then the endpoints to serve data to users can be fast.
|
then the endpoints to serve data to users can be fast.
|
||||||
- Doing expensive operations inside the database, rather than fetching data
|
- Doing expensive operations inside the database, rather than fetching data
|
||||||
to Python and then sending it back to the database (which can be far
|
to Python and then sending it back to the database (which can be far
|
||||||
slower if there's a lot of data involved). The Django ORM currently
|
slower if there's a lot of data involved). The Django ORM currently
|
||||||
doesn't support the "insert into .. select" type SQL query that's needed
|
doesn't support the `"insert into .. select"` type SQL query that's needed
|
||||||
for this, which is why we use raw database queries (which we usually avoid
|
for this, which is why we use raw database queries (which we usually avoid
|
||||||
in Zulip) rather than the ORM.
|
in Zulip) rather than the ORM.
|
||||||
- Aggregating where possible to avoid unnecessary queries against the
|
- Aggregating where possible to avoid unnecessary queries against the
|
||||||
Message and UserMessage tables. E.g. rather than querying the Message
|
`Message` and `UserMessage` tables. E.g. rather than querying the `Message`
|
||||||
table both to generate sent message counts for each realm and again for
|
table both to generate sent message counts for each realm and again for
|
||||||
each user, we just query for each user, and then add up the numbers for
|
each user, we just query for each user, and then add up the numbers for
|
||||||
the users to get the totals for the realm.
|
the users to get the totals for the realm.
|
||||||
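The "insert into .. select" pattern mentioned in this hunk can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module rather than PostgreSQL, and the `message`, `user_count`, and `realm_count` tables and their columns are simplified, hypothetical stand-ins, not Zulip's real schema or queries. The point is only that the counting happens inside the database, and that realm totals are derived from the per-user rows instead of from a second scan of the message table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified schema for illustration only.
conn.executescript(
    """
    CREATE TABLE message (id INTEGER PRIMARY KEY, sender_id INTEGER, realm_id INTEGER);
    CREATE TABLE user_count (user_id INTEGER, realm_id INTEGER, value INTEGER);
    CREATE TABLE realm_count (realm_id INTEGER, value INTEGER);
    """
)
conn.executemany(
    "INSERT INTO message (sender_id, realm_id) VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 10), (3, 20)],
)

# One pass over the large table: count per user without moving rows into Python.
conn.execute(
    """
    INSERT INTO user_count (user_id, realm_id, value)
    SELECT sender_id, realm_id, COUNT(*)
    FROM message
    GROUP BY sender_id, realm_id
    """
)

# Realm totals come from the small user_count table, not from message again.
conn.execute(
    """
    INSERT INTO realm_count (realm_id, value)
    SELECT realm_id, SUM(value)
    FROM user_count
    GROUP BY realm_id
    """
)

user_rows = conn.execute(
    "SELECT user_id, realm_id, value FROM user_count ORDER BY user_id"
).fetchall()
realm_rows = conn.execute(
    "SELECT realm_id, value FROM realm_count ORDER BY realm_id"
).fetchall()
conn.close()
```

Because `GROUP BY` only emits groups that exist, users with no messages get no row at all, which matches the doc's note that rows with value 0 are usually not stored.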
|
@ -147,18 +146,18 @@ analytics tests, to make sure it stays that way as we refactor.
|
||||||
|
|
||||||
The system discussed above is designed primarily around the technical
|
The system discussed above is designed primarily around the technical
|
||||||
problem of showing useful analytics about things where the raw data is
|
problem of showing useful analytics about things where the raw data is
|
||||||
already stored in the database (e.g. Message, UserMessage). This is great
|
already stored in the database (e.g. `Message`, `UserMessage`). This is great
|
||||||
because we can always backfill that data to the beginning of time, but of
|
because we can always backfill that data to the beginning of time, but of
|
||||||
course sometimes one wants to do analytics on things that aren't worth
|
course sometimes one wants to do analytics on things that aren't worth
|
||||||
storing every data point for (e.g. activity data, request performance
|
storing every data point for (e.g. activity data, request performance
|
||||||
statistics, etc.). There is currently a reference implementation of a
|
statistics, etc.). There is currently a reference implementation of a
|
||||||
"LoggingCountStat" that shows how to handle such a situation.
|
`LoggingCountStat` that shows how to handle such a situation.
|
||||||
|
|
||||||
## Analytics UI development and testing
|
## Analytics UI development and testing
|
||||||
|
|
||||||
### Setup and testing
|
### Setup and testing
|
||||||
|
|
||||||
The main testing approach for the /stats page UI is manual testing.
|
The main testing approach for the `/stats` page UI is manual testing.
|
||||||
For most UI testing, you can visit `/stats/realm/analytics` while
|
For most UI testing, you can visit `/stats/realm/analytics` while
|
||||||
logged in as Iago (this is the server administrator view of stats for
|
logged in as Iago (this is the server administrator view of stats for
|
||||||
a given realm). The only piece that you can't test here is the "Me"
|
a given realm). The only piece that you can't test here is the "Me"
|
||||||
|
@ -178,24 +177,24 @@ the updated graphs.
|
||||||
|
|
||||||
The relevant files are:
|
The relevant files are:
|
||||||
|
|
||||||
- analytics/views/stats.py: All chart data requests from the /stats page call
|
- `analytics/views/stats.py`: All chart data requests from the /stats page call
|
||||||
get_chart_data in this file.
|
get_chart_data in this file.
|
||||||
- web/src/stats/stats.js: The JavaScript and Plotly code.
|
- `web/src/stats/stats.js`: The JavaScript and Plotly code.
|
||||||
- templates/analytics/stats.html
|
- `templates/analytics/stats.html`
|
||||||
- web/styles/stats.css and web/styles/portico.css: We are in the
|
- `web/styles/stats.css` and `web/styles/portico.css`: We are in the
|
||||||
process of re-styling this page to use in-app css instead of portico css,
|
process of re-styling this page to use in-app css instead of portico css,
|
||||||
but there is currently still a lot of portico influence.
|
but there is currently still a lot of portico influence.
|
||||||
- analytics/urls.py: Has the URL routes; it's unlikely you will have to
|
- `analytics/urls.py`: Has the URL routes; it's unlikely you will have to
|
||||||
modify this, including for adding a new graph.
|
modify this, including for adding a new graph.
|
||||||
|
|
||||||
Most of the code is self-explanatory, and for adding say a new graph, the
|
Most of the code is self-explanatory, and for adding say a new graph, the
|
||||||
answer to most questions is to copy what the other graphs do. It is easy
|
answer to most questions is to copy what the other graphs do. It is easy
|
||||||
when writing this sort of code to have a lot of semi-repeated code blocks
|
when writing this sort of code to have a lot of semi-repeated code blocks
|
||||||
(especially in stats.js); it's good to do what you can to reduce this.
|
(especially in `stats.js`); it's good to do what you can to reduce this.
|
||||||
|
|
||||||
Tips and tricks:
|
Tips and tricks:
|
||||||
|
|
||||||
- Use `$.get` to fetch data from the backend. You can grep through stats.js
|
- Use `$.get` to fetch data from the backend. You can grep through `stats.js`
|
||||||
to find examples of this.
|
to find examples of this.
|
||||||
- The Plotly documentation is at
|
- The Plotly documentation is at
|
||||||
<https://plot.ly/javascript/> (check out the full reference, event
|
<https://plot.ly/javascript/> (check out the full reference, event
|
||||||
|
@ -205,11 +204,11 @@ Tips and tricks:
|
||||||
- Unless a graph has a ton of data, it is typically better to just redraw it
|
- Unless a graph has a ton of data, it is typically better to just redraw it
|
||||||
when something changes (e.g. in the various aggregation click handlers)
|
when something changes (e.g. in the various aggregation click handlers)
|
||||||
rather than to use retrace or relayout or do other complicated
|
rather than to use retrace or relayout or do other complicated
|
||||||
things. Performance on the /stats page is nice but not critical, and we've
|
things. Performance on the `/stats` page is nice but not critical, and we've
|
||||||
run into a lot of small bugs when trying to use Plotly's retrace/relayout.
|
run into a lot of small bugs when trying to use Plotly's retrace/relayout.
|
||||||
- There is a way to access raw d3 functionality through Plotly, though it
|
- There is a way to access raw d3 functionality through Plotly, though it
|
||||||
isn't documented well.
|
isn't documented well.
|
||||||
- 'paper' as a Plotly option refers to the bounding box of the graph (or
|
- `'paper'` as a Plotly option refers to the bounding box of the graph (or
|
||||||
something related to that).
|
something related to that).
|
||||||
- You can't right click and inspect the elements of a Plotly graph (e.g. the
|
- You can't right click and inspect the elements of a Plotly graph (e.g. the
|
||||||
bars in a bar graph) in your browser, since there is an interaction layer
|
bars in a bar graph) in your browser, since there is an interaction layer
|
||||||
|
@ -218,10 +217,10 @@ Tips and tricks:
|
||||||
|
|
||||||
### /activity page
|
### /activity page
|
||||||
|
|
||||||
- There's a somewhat less developed /activity page, for server
|
- There's a somewhat less developed `/activity` page, for server
|
||||||
administrators, showing data on all the realms on a server. To
|
administrators, showing data on all the realms on a server. To
|
||||||
access it, you need to have the `is_staff` bit set on your
|
access it, you need to have the `is_staff` bit set on your
|
||||||
UserProfile object. You can set it using `manage.py shell` and
|
`UserProfile` object. You can set it using `manage.py shell` and
|
||||||
editing the UserProfile object directly. A great future project is
|
editing the `UserProfile` object directly. A great future project is
|
||||||
to clean up that page's data sources, and make this a documented
|
to clean up that page's data sources, and make this a documented
|
||||||
interface.
|
interface.
|
||||||
|
|