Add all the stop words to page_params, reading from the
`zulip_english.stop` database, with caching to avoid loading the file
on every page load.
Part of #10592.
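A minimal sketch of the caching idea described above, assuming the stop
word file is plain text with one word per line. The helper name
`read_stop_words` and its `path` parameter are hypothetical and not taken
from the Zulip code.

```python
import functools
from typing import Tuple


@functools.lru_cache(maxsize=None)
def read_stop_words(path: str) -> Tuple[str, ...]:
    # Hypothetical helper: the file is parsed once per path; later calls
    # return the cached tuple instead of touching the disk again.
    with open(path, encoding="utf-8") as f:
        # Skip blank lines and strip trailing newlines.
        return tuple(line.strip() for line in f if line.strip())
```

A tuple is returned rather than a list so that callers cannot mutate the
cached value by accident.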
This causes changing the email_address_visibility field to actually
modify what user_profile.email values are generated for users, both on
user creation and afterwards as email addresses are edited.
The overall feature isn't yet complete, but this brings us pretty close.
This helps keep the realm.json small and easy to process; previously,
almost the entire size of that file was the analytics data.
We implement this by refactoring the analytics Config objects into a
separate subroutine that writes to a separate file, plus the
corresponding import code.
Manual testing was performed by exporting the 'analytics' realm, and
importing back to a newly created 'test' realm. The 'test' realm was
then exported and the json files were inspected. The data appeared
consistent with no abnormalities.
Fixes: #11220.
We eliminated use of this function in outgoing_webhook.py in
bdc95b5d72.
Tweaked by tabbott to also eliminate code only used for that mock.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
This commit does the following three things:
1. Update stream model to accommodate rendered description.
2. Render and save the stream rendered description on update.
3. Render and save stream descriptions on creation.
Further, the stream's rendered description is also sent whenever the
stream's description is being sent.
This is preparatory work for eliminating the use of the
non-authoritative marked.js markdown parser for stream descriptions.
This adds a new API for sending basic analytics data (number of users,
number of messages sent) from a Zulip server to the Zulip Cloud
central analytics database, which will make it possible for servers to
elect to have their usage numbers counted in published stats on the
size of the Zulip ecosystem.
Since we implemented support for stream IDs in Addressee,
Addressee.stream_name() can return None. This commit ensures
that _internal_prep_message only calls ensure_stream when
Addressee.stream_name() is not None.
This commit also contains the following auxiliary changes:
* Adds a custom exception, StreamWithIDDoesNotExist, for when
a stream with a given ID does not exist, because the error
message returned by StreamDoesNotExist only makes sense with stream
names, not IDs.
* Adds a new helper, get_stream_by_id_in_realm, which is similar
to get_user_profile_by_id_in_realm (introduced in #10391).
* Adds a helper, validate_stream_id_with_pm_notification, which
returns the Stream object associated with a given ID and also
handles PM notifications to the bot owner if the message was
sent by a bot and if the stream does not exist or has no
subscribers.
* Modifies the message sent by send_pm_if_empty_stream to
accommodate stream IDs.
Note that all of the above changes are required before check_message
can be modified to support stream IDs.
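The relationship between the new exception and the ID-based lookup helper
can be sketched roughly as follows. This is an illustration only: the
in-memory `STREAMS` dict stands in for the database, the `Stream`
dataclass is a simplified stand-in for the real model, and the error
message wording is an assumption.

```python
from dataclasses import dataclass
from typing import Dict


class StreamWithIDDoesNotExist(Exception):
    """Raised when no stream with the given ID exists in the realm."""

    def __init__(self, stream_id: int) -> None:
        self.stream_id = stream_id
        # Illustrative message; the real wording may differ.
        super().__init__(f"Stream with ID {stream_id} does not exist")


@dataclass
class Stream:
    # Simplified stand-in for the real Stream model.
    id: int
    name: str
    realm_id: int


# In-memory stand-in for the database table of streams.
STREAMS: Dict[int, Stream] = {
    1: Stream(id=1, name="general", realm_id=10),
}


def get_stream_by_id_in_realm(stream_id: int, realm_id: int) -> Stream:
    # Look the stream up by ID, but only accept it if it belongs to the
    # requested realm; otherwise report it as nonexistent.
    stream = STREAMS.get(stream_id)
    if stream is None or stream.realm_id != realm_id:
        raise StreamWithIDDoesNotExist(stream_id)
    return stream
```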
This additional logic to prevent resizing in certain circumstances
(file size, dimensions) is necessary because the Pillow gif handling
code seems to be rather flaky with regard to handling gif color
palettes, causing broken gifs after resizing. The workaround is to
only resize when absolutely necessary (e.g. because the file is larger
than 128x128 or 128KB).
Fixes #10351.
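The "only resize when necessary" rule above amounts to a simple check
like the following sketch. The function and constant names are
hypothetical, and treating 128KB as 128 * 1024 bytes is an assumption.

```python
MAX_DIMENSION = 128  # pixels, per side
MAX_FILE_SIZE = 128 * 1024  # bytes; assumes 1KB = 1024 bytes


def should_resize_gif(width: int, height: int, size_in_bytes: int) -> bool:
    # Resize only when the image exceeds the dimension limit on either
    # side or the file exceeds the size limit; otherwise leave the
    # original gif untouched to avoid palette breakage.
    return (
        width > MAX_DIMENSION
        or height > MAX_DIMENSION
        or size_in_bytes > MAX_FILE_SIZE
    )
```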
We had initially designed the poll widget like a blog
post with comments beneath it but it makes more sense
to think of it as just a simple poll with options.
We add a new syntax which converts the messages like the following:
```
/poll Who do you support?
Nadal
- Djokovic
```
to a poll with the two names as options. The list syntax is optional
since anyone making a poll is likely to want to create a list anyway.
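As a rough illustration of the syntax (written in Python for brevity, not
the actual widget code), a parser that treats the list marker as optional
might look like this. The function name `parse_poll` and the accepted
markers (`-` and `*`) are assumptions.

```python
import re
from typing import List, Optional, Tuple


def parse_poll(content: str) -> Optional[Tuple[str, List[str]]]:
    lines = content.strip().split("\n")
    first = lines[0]
    # Only messages that start with the /poll command are polls.
    if first != "/poll" and not first.startswith("/poll "):
        return None
    question = first[len("/poll"):].strip()
    options = []
    for line in lines[1:]:
        # Drop an optional leading list marker such as "- " or "* ".
        option = re.sub(r"^\s*[-*]\s+", "", line).strip()
        if option:
            options.append(option)
    return question, options
```

With the message shown above, this sketch yields the question
"Who do you support?" and the options "Nadal" and "Djokovic", whether or
not each option line carries a list marker.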
Refactor the potentially expensive work done by Beautiful Soup into a
function that is called by the alter_content function, so that we can
cache the result. This saves a significant portion of the runtime of
loading all of our /help/ and /api/ documentation pages (e.g. 12ms
for /api).
Fixes #11088.
Tweaked by tabbott to use the URL path as the cache key, clean up
argument structure, and use a clearer name for the function.
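A minimal sketch of caching an expensive computation keyed by URL path.
The names `render_with_cache` and `_render_cache` are hypothetical, a
plain dict stands in for whatever cache backend the real code uses, and
the `render` callable stands in for the Beautiful Soup work.

```python
from typing import Callable, Dict

# Hypothetical in-process cache: URL path -> computed result.
_render_cache: Dict[str, str] = {}


def render_with_cache(path: str, render: Callable[[], str]) -> str:
    # Run the expensive work only the first time a path is seen; later
    # requests for the same path reuse the stored result.
    if path not in _render_cache:
        _render_cache[path] = render()
    return _render_cache[path]
```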
Earlier, our realm filters didn't render for languages that do not
use spaces (e.g. Japanese) since we used to check for the presence
of an actual space character. This commit replaces that logic with
a complex scheme to detect word boundaries.
Also, we convert the RealmFilterPattern to subclass InlineProcessor
and make use of the new no-op feature in py-markdown 3.0.1 where we
can tell py-markdown that our pattern didn't find a match despite
the initial regex getting matched.
Fixes #9883.
Since we are building our parser from scratch now:
1. We have control over which processor goes at what priority number.
Thus, we have also shifted the deprecated `.add()` calls to use the
new `.register()` calls with explicit priorities, but maintaining
the original order that the old method generated.
2. We do not have to remove the processors added by py-markdown that
we do not use in Zulip; we explicitly add only the processors we
do require.
3. We can cluster the building of each type of parser in one place,
and in the order they need to be so that when we register them,
there is no need to sort the list. This also makes for a huge
improvement in the readability of the code, as all the components
of each type are registered in the same function.
These are significant performance improvements, because we save on
calls to `str.startswith` in `.add()`, all the resources taken to
generate the default to-be-removed processors and the time taken to
sort the list of processors.
Following are the profiling results for the changes made. Here, we
build 10 engines one after the other and note the time taken to build
each of them. 1st pass represents the state after this commit and 2nd
pass represents the state after some regex modifications in the commits
that follow by Steve Howell. All times are in microseconds.
| nth Engine | Old Time | 1st Pass | 2nd Pass |
| ---------- | -------- | -------- | -------- |
| 1 | 92117.0 | 81775.0 | 76710.0 |
| 2 | 1254.0 | 558.0 | 341.0 |
| 3 | 1170.0 | 472.0 | 305.0 |
| 4 | 1155.0 | 519.0 | 301.0 |
| 5 | 1170.0 | 546.0 | 326.0 |
| 6 | 1271.0 | 609.0 | 416.0 |
| 7 | 1125.0 | 459.0 | 299.0 |
| 8 | 1146.0 | 476.0 | 390.0 |
| 9 | 1274.0 | 446.0 | 301.0 |
| 10 | 1135.0 | 451.0 | 297.0 |
We avoid re-computing the regex string here, and we
also avoid re-compiling the regex itself.
I decided to put the "one_time" decorator in the
bugdown file itself, just to reduce friction in
folks reading the "buyer beware" comments.
Unfortunately, we can't use this for the
get_web_link_regex() function due to testing concerns,
so that continues to do an inelegant cache-with-global-var
scheme.
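For readers unfamiliar with the pattern, a compute-once decorator for
zero-argument functions can be sketched as below. This is not necessarily
how the "one_time" decorator in the bugdown file is written, and
`get_number_regex` is a made-up example function.

```python
import functools
import re
from typing import Callable, List, TypeVar

T = TypeVar("T")


def one_time(f: Callable[[], T]) -> Callable[[], T]:
    # Holds at most one element: the value from the first call.
    result: List[T] = []

    @functools.wraps(f)
    def wrapper() -> T:
        if not result:
            result.append(f())
        return result[0]

    return wrapper


@one_time
def get_number_regex() -> "re.Pattern[str]":
    # Hypothetical example: the regex is compiled on the first call only.
    return re.compile(r"\d+")
```

The cached value lives for the life of the process and cannot be reset
from outside, which is one way such a decorator can get in the way of
tests that need a fresh value.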
We use early-exit to flatten the code.
I also tweaked the comments a bit based on some recent
profile findings. (e.g. reading the file isn't actually
a big bottleneck, it's more the regex itself)