docs: Use `code` syntax in analytics subsystem doc for readability.

This doc had a lot of references to class names etc. that were not wrapped in `<code>`, making it harder to read.
2023-10-15 02:16:02 +02:00 · 2023-10-15 02:16:02 +02:00 · cda25e8a4a
parent 34ceafadd5
commit cda25e8a4a
1 changed file with 55 additions and 56 deletions
--- a/docs/subsystems/analytics.md
+++ b/docs/subsystems/analytics.md
@@ -20,64 +20,63 @@ effectively modify the system.

 There are three main components:

- models: The UserCount, StreamCount, RealmCount, and InstallationCount
-  tables (analytics/models.py) collect and store time series data.
- stat definitions: The CountStat objects in the COUNT_STATS dictionary
-  (analytics/lib/counts.py) define the set of stats Zulip collects.
- accounting: The FillState table (analytics/models.py) keeps track of what
-  has been collected for which CountStats.
+- models: The `UserCount`, `StreamCount`, `RealmCount`, and `InstallationCount`
+  tables (`analytics/models.py`) collect and store time series data.
+- stat definitions: The `CountStat` objects in the `COUNT_STATS` dictionary
+  (`analytics/lib/counts.py`) define the set of stats Zulip collects.
+- accounting: The `FillState` table (`analytics/models.py`) keeps track of what
+  has been collected for which `CountStat`.

 The next several sections will dive into the details of these components.

-## The \*Count database tables
+## The `*Count` database tables

 The Zulip analytics system is built around collecting time series data in a
 set of database tables. Each of these tables has the following fields:

- property: A human readable string uniquely identifying a CountStat
-  object. Example: "active_users:is_bot:hour" or "messages_sent:client:day".
- subgroup: Almost all CountStats are further sliced by subgroup. For
-  "active_users:is_bot:day", this column will be False for measurements of
-  humans, and True for measurements of bots. For "messages_sent:client:day",
+- property: A human readable string uniquely identifying a `CountStat`
+  object. Example: `"active_users:is_bot:hour"` or `"messages_sent:client:day"`.
+- subgroup: Almost all `CountStat` objects are further sliced by subgroup. For
+  `"active_users:is_bot:day"`, this column will be `False` for measurements of
+  humans, and `True` for measurements of bots. For `"messages_sent:client:day"`,
  this column is the client_id of the client under consideration.
 - end_time: A datetime indicating the end of a time interval. It will be on
  an hour (or UTC day) boundary for stats collected at hourly (or daily)
-  frequency. The time interval is determined by the CountStat.
- various "id" fields: Foreign keys into Realm, UserProfile, Stream, or
-  nothing. E.g. the RealmCount table has a foreign key into Realm.
- value: The integer counts. For "active_users:is_bot:hour" in the
-  RealmCount table, this is the number of active humans or bots (depending
-  on subgroup) in a particular realm at a particular end_time. For
-  "messages_sent:client:day" in the UserCount table, this is the number of
+  frequency. The time interval is determined by the `CountStat`.
+- various "id" fields: Foreign keys into `Realm`, `UserProfile`, `Stream`, or
+  nothing. E.g. the `RealmCount` table has a foreign key into `Realm`.
+- value: The integer counts. For `"active_users:is_bot:hour"` in the
+  `RealmCount` table, this is the number of active humans or bots (depending
+  on subgroup) in a particular realm at a particular `end_time`. For
+  `"messages_sent:client:day"` in the `UserCount` table, this is the number of
  messages sent by a particular user, from a particular client, on the day
-  ending at end_time.
+  ending at `end_time`.

-There are four tables: UserCount, StreamCount, RealmCount, and
-InstallationCount. Every CountStat is initially collected into UserCount,
-StreamCount, or RealmCount. Every stat in UserCount and StreamCount is
-aggregated into RealmCount, and then all stats are aggregated from
-RealmCount into InstallationCount. So for example,
-"messages_sent:client:day" has rows in UserCount corresponding to (user,
-end_time, client) triples. These are summed to rows in RealmCount
-corresponding to triples of (realm, end_time, client). And then these are
-summed to rows in InstallationCount with totals for pairs of (end_time,
-client).
+There are four tables: `UserCount`, `StreamCount`, `RealmCount`, and
+`InstallationCount`. Every `CountStat` is initially collected into `UserCount`,
+`StreamCount`, or `RealmCount`. Every stat in `UserCount` and `StreamCount` is
+aggregated into `RealmCount`, and then all stats are aggregated from
+`RealmCount` into `InstallationCount`. So for example,
+`"messages_sent:client:day"` has rows in `UserCount` corresponding to
+`(user, end_time, client)` triples. These are summed to rows in `RealmCount`
+corresponding to triples of `(realm, end_time, client)`. And then these are
+summed to rows in `InstallationCount` with totals for pairs of `(end_time, client)`.

 Note: In most cases, we do not store rows with value 0. See
 [Performance strategy](#performance-strategy) below.

 ## CountStats

-CountStats declare what analytics data should be generated and stored. The
-CountStat class definition and instances live in `analytics/lib/counts.py`.
+`CountStat` objects declare what analytics data should be generated and stored. The
+`CountStat` class definition and instances live in `analytics/lib/counts.py`.
 These declarations specify at a high level which tables should be populated
 by the system and with what data.

 ## The FillState table

 The default Zulip production configuration runs a cron job once an hour that
-updates the \*Count tables for each of the CountStats in the COUNT_STATS
-dictionary. The FillState table simply keeps track of the last end_time that
+updates the `*Count` tables for each of the `CountStat` objects in the `COUNT_STATS`
+dictionary. The `FillState` table simply keeps track of the last `end_time` that
 we successfully updated each stat. It also enables the analytics system to
 recover from errors (by retrying) and to monitor that the cron job is
 running and running to completion.
@@ -94,27 +93,27 @@ designed set of tables in PostgreSQL.
 This requires some care to avoid making the analytics tables larger than the
 rest of the Zulip database or adding a ton of computational load, but with
 careful design, we can make the analytics system very low cost to operate.
-Also, note that a Zulip application database has 2 huge tables: Message and
-UserMessage, and everything else is small and thus not performance or
+Also, note that a Zulip application database has 2 huge tables: `Message` and
+`UserMessage`, and everything else is small and thus not performance or
 space-sensitive, so it's important to optimize how many expensive queries we
 do against those 2 tables.

 There are a few important principles that we use to make the system
 efficient:

- Not repeating work to keep things up to date (via FillState)
- Storing data in the \*Count tables to avoid our endpoints hitting the core
-  Message/UserMessage tables is key, because some queries could take minutes
+- Not repeating work to keep things up to date (via `FillState`)
+- Storing data in the `*Count` tables to avoid our endpoints hitting the core
+  `Message`/`UserMessage` tables is key, because some queries could take minutes
  to calculate. This allows any expensive operations to run offline, and
  then the endpoints to server data to users can be fast.
 - Doing expensive operations inside the database, rather than fetching data
  to Python and then sending it back to the database (which can be far
  slower if there's a lot of data involved). The Django ORM currently
-  doesn't support the "insert into .. select" type SQL query that's needed
+  doesn't support the `"insert into .. select"` type SQL query that's needed
  for this, which is why we use raw database queries (which we usually avoid
  in Zulip) rather than the ORM.
 - Aggregating where possible to avoid unnecessary queries against the
-  Message and UserMessage tables. E.g. rather than querying the Message
+  `Message` and `UserMessage` tables. E.g. rather than querying the `Message`
  table both to generate sent message counts for each realm and again for
  each user, we just query for each user, and then add up the numbers for
  the users to get the totals for the realm.
@@ -147,18 +146,18 @@ analytics tests, to make sure it stays that way as we refactor.

 The system discussed above is designed primarily around the technical
 problem of showing useful analytics about things where the raw data is
-already stored in the database (e.g. Message, UserMessage). This is great
+already stored in the database (e.g. `Message`, `UserMessage`). This is great
 because we can always backfill that data to the beginning of time, but of
 course sometimes one wants to do analytics on things that aren't worth
 storing every data point for (e.g. activity data, request performance
 statistics, etc.). There is currently a reference implementation of a
-"LoggingCountStat" that shows how to handle such a situation.
+`LoggingCountStat` that shows how to handle such a situation.

 ## Analytics UI development and testing

 ### Setup and testing

-The main testing approach for the /stats page UI is manual testing.
+The main testing approach for the `/stats` page UI is manual testing.
 For most UI testing, you can visit `/stats/realm/analytics` while
 logged in as Iago (this is the server administrator view of stats for
 a given realm). The only piece that you can't test here is the "Me"
@@ -178,24 +177,24 @@ the updated graphs.

 The relevant files are:

- analytics/views/stats.py: All chart data requests from the /stats page call
+- `analytics/views/stats.py`: All chart data requests from the /stats page call
  get_chart_data in this file.
- web/src/stats/stats.js: The JavaScript and Plotly code.
- templates/analytics/stats.html
- web/styles/stats.css and web/styles/portico.css: We are in the
+- `web/src/stats/stats.js`: The JavaScript and Plotly code.
+- `templates/analytics/stats.html`
+- `web/styles/stats.css` and `web/styles/portico.css`: We are in the
  process of re-styling this page to use in-app css instead of portico css,
  but there is currently still a lot of portico influence.
- analytics/urls.py: Has the URL routes; it's unlikely you will have to
+- `analytics/urls.py`: Has the URL routes; it's unlikely you will have to
  modify this, including for adding a new graph.

 Most of the code is self-explanatory, and for adding say a new graph, the
 answer to most questions is to copy what the other graphs do. It is easy
 when writing this sort of code to have a lot of semi-repeated code blocks
-(especially in stats.js); it's good to do what you can to reduce this.
+(especially in `stats.js`); it's good to do what you can to reduce this.

 Tips and tricks:

- Use `$.get` to fetch data from the backend. You can grep through stats.js
+- Use `$.get` to fetch data from the backend. You can grep through `stats.js`
  to find examples of this.
 - The Plotly documentation is at
  <https://plot.ly/javascript/> (check out the full reference, event
@@ -205,11 +204,11 @@ Tips and tricks:
 - Unless a graph has a ton of data, it is typically better to just redraw it
  when something changes (e.g. in the various aggregation click handlers)
  rather than to use retrace or relayout or do other complicated
-  things. Performance on the /stats page is nice but not critical, and we've
+  things. Performance on the `/stats` page is nice but not critical, and we've
  run into a lot of small bugs when trying to use Plotly's retrace/relayout.
 - There is a way to access raw d3 functionality through Plotly, though it
  isn't documented well.
- 'paper' as a Plotly option refers to the bounding box of the graph (or
+- `'paper'` as a Plotly option refers to the bounding box of the graph (or
  something related to that).
 - You can't right click and inspect the elements of a Plotly graph (e.g. the
  bars in a bar graph) in your browser, since there is an interaction layer
@@ -218,10 +217,10 @@ Tips and tricks:

 ### /activity page

- There's a somewhat less developed /activity page, for server
+- There's a somewhat less developed `/activity` page, for server
  administrators, showing data on all the realms on a server. To
  access it, you need to have the `is_staff` bit set on your
-  UserProfile object. You can set it using `manage.py shell` and
-  editing the UserProfile object directly. A great future project is
+  `UserProfile` object. You can set it using `manage.py shell` and
+  editing the `UserProfile` object directly. A great future project is
  to clean up that page's data sources, and make this a documented
  interface.
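
For readers of this diff, the aggregation described in the `*Count` tables hunk (`UserCount` rows summed into `RealmCount`, then into `InstallationCount`) can be illustrated with a minimal in-memory sketch. It is not how Zulip implements it: the doc states that the real system runs raw SQL inside the database. The dictionaries, the function names `aggregate_to_realm` and `aggregate_to_installation`, and the sample users, realms, and clients below are all hypothetical, and the sketch assumes each user-level row also records the user's realm.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical stand-in for UserCount rows of "messages_sent:client:day":
# (user, realm, end_time, client) -> value. Assumes each row carries its realm.
end_time = datetime(2023, 10, 15, tzinfo=timezone.utc)
user_count = {
    ("alice", "realm_a", end_time, "web"): 3,
    ("bob", "realm_a", end_time, "web"): 2,
    ("bob", "realm_a", end_time, "mobile"): 4,
    ("carol", "realm_b", end_time, "web"): 5,
}


def aggregate_to_realm(user_rows):
    """Sum user-level rows into (realm, end_time, client) totals."""
    realm_rows = Counter()
    for (_user, realm, end, client), value in user_rows.items():
        realm_rows[(realm, end, client)] += value
    return dict(realm_rows)


def aggregate_to_installation(realm_rows):
    """Sum realm-level rows into (end_time, client) totals."""
    totals = Counter()
    for (_realm, end, client), value in realm_rows.items():
        totals[(end, client)] += value
    return dict(totals)


realm_count = aggregate_to_realm(user_count)
installation_count = aggregate_to_installation(realm_count)
```

Note that no zero-valued rows appear in either result, which mirrors the doc's remark that rows with value 0 are usually not stored.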