Now we only tokenize the file once, and we pass
**validated** tokens to the pretty printer.
There are a few reasons for this:
* It obviously saves a lot of extra computation
just in terms of tokenization.
* It allows our validator to add fields
to the Token objects that help the pretty
printer.
I also removed/tweaked a lot of legacy tests for
pretty_print.py that were exercising bizarrely
formatted HTML that we now simply ban during the
validation phase.
This accomplishes a few things:
* lighten the load for the main validation loop
* defer indentation checks until we are sure the author
even knows how to match up tags
* add some info to the Token objects that we may soon
consume in our pretty-printer
We now complain about programmers who don't use
4-space indents in template files, rather than
letting the pretty printer fix them.
This is partly just to simplify the pretty printer
code (in future commits), but it also makes the
symptom more obvious to newbie developers. They
are probably just as able to react to the direct
error messages as they are able to figure out how
to read diffs from the pretty printer and grok
the --fix syntax. And once they learn the convention
and configure their editor, it should then be a
one time problem.
We now create tokens for whitespace and text, such that you
could rebuild the template file with "".join(token.s for
token in tokens).
I also fixed a few bugs related to not parsing
whitespace-control tokens.
We no longer ignore template variables, although we could do
a lot better at validating them.
The most immediate use case for the more thorough parser is
to simplify the pretty printer, but it should also make it
less likely for us to skip over new template constructs
(i.e. the tool will fail hard rather than acting strange).
Note that this speeds up the tool by almost 3x, which may be
slightly surprising considering we are building more tokens.
The reason is that we are now munching efficiently through
big chunks of whitespace and text at a time, rather than
checking each individual character to see if it starts one
of the N other token types.
The changes to the pretty_print module here are a bit ugly,
but they should mostly be made irrelevant in subsequent
commits.
This tool has been unmaintained since our initial code
sweep to fix templates, and it has possibly bit-rotted
during unrelated code sweeps like introducing mypy, etc.
It's not documented anywhere.
The preferred method now is to run:
./tools/check-templates --fix
String 'Here are a few messages I understand:'(next commit) was failing
./tools/check-capitalization check because of the capital I. I added
'I understand' to the IGNORED_PHRASES list in tools/lib/capitalization.py.
Adding "I" was working as well but didn't seem to me as a very great fix.
Strangely enough, adding " I " to the list made the test fail again
(With a lot of failed strings this time) as mentioned in the following
CZO thread.
Relevent CZO chat -
https://chat.zulip.org/#narrow/stream/49-development-help/topic/capitalization.20confusion.2E
In https://github.com/jorisroovers/gitlint/pull/246 I split the
gitlint package into gitlint and gitlint-core, where the latter avoids
pinning exact versions of its requirements so we can use it again.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
The main focus is on improving the instructions around claiming issues
to try to create less issue claiming spam.
Additionally, we haven't done a detailed update in a few years, and
some of the content is stale/irrelevant.
I rewrote most of tools/lib/pretty-printer.py, which
was fairly easy due to being able to crib some
important details from the previous implementation.
The main motivation for the rewrite was that we weren't
handling else/elif blocks correctly, and it was difficult
to modify the previous code. The else/elif shortcomings
were somewhat historical in nature--the original parser
didn't recognize them (since they weren't in any Zulip
templates at the time), and then the pretty printer was
mostly able to hack around that due to the "nudge"
strategy. Eventually the nudge strategy became too
brittle.
The "nudge" strategy was that we would mostly trust
the existing templates, and we would just nudge over
some lines in cases of obviously faulty indentation.
Now we are bit more opinionated and rigorous, and
we basically set the indentation explicitly for any
line that is not in a code/script block. This leads
to this diff touching several templates for mostly
minor fix-ups.
We aren't completely opinionated, as we respect the
author's line wrapping decisions in many cases, and
we also allow authors not to indent blocks within
the template language's block constructs.
In cases where an opening tag is so long that we stretch
it to 2+ lines of code, we should try to use block-style
formatting in the template code.
Unfortunately, we have lots of legacy code that violates
this concept, so this is a timid fix.
There are also legit use cases like textarea where we
probably need to keep the ugly template syntax for things
to render properly.
We disallow this HTML:
junk-text-before-open-tag<p>
This is a paragraph.
</p>
We rarely see the above mistake, but we want to eliminate
the possibility to be somewhat rigorous, and so that we
can eliminate a pretty-printer mis-feature.
Previously, running `./tools/run-dev.py` when provision was required
would lead to a warning along the lines of:
```
Before we run tests, we make sure your provisioning version
is correct by looking at var/provision_version, which is at
version 165.1, and we compare it to the version in source
control (version.py), which is 165.2.
It looks like you checked out a branch that has added
dependencies beyond what you last provisioned. Your command
is likely to fail until you add dependencies by provisioning.
Do this: `./tools/provision`
If you really know what you are doing, use --skip-provision-check to
run anyway.
```
The assumption that we're trying to run tests might cause some
confusion, especially if its the first time you're seeing the
provision warning. Hence, we reword the first paragraph to avoid
making that assumption.
The second paragraph has also been slightly altered, since (1) it's
possible that we didn't checkout a different branch, but eg just
rebased with upstream and (2) we might not be on a VM.
The warning you'd get after this commit would be along the lines of:
```
Provisioning state check failed! This check compares
`var/provision_version` (currently 165.2) to the version in
source control (`version.py`), which is 164.6, to see if you
likely need to provision before this command can run
properly.
The branch you are currently on expects an older version of
dependencies than the version you provisioned last. This may
be ok, but it's likely that you either want to rebase your
branch on top of upstream/main or re-provision your machine.
Do this: `./tools/provision`
If you really know what you are doing, use --skip-provision-check to
run anyway.
```
or along the lines of:
```
Provisioning state check failed! This check compares
`var/provision_version` (currently 165.2) to the version in
source control (`version.py`), which is 167.2, to see if you
likely need to provision before this command can run
properly.
The branch you are currently on has added dependencies beyond
what you last provisioned. Your command is likely to fail
until you add dependencies by provisioning.
Do this: `./tools/provision`
If you really know what you are doing, use --skip-provision-check to
run anyway.
```
The `current_queue_size` key in the queue monitoring stats file was
the local queue size, not the global queue size -- d5a6b0f99a
renamed the function, but did not adjust the queue monitoring JSON,
despite the last use of it having been removed in cd9b194d88.
The function is still used to mark "we emptied our queue," and it
remains a reasonable metric for that.
It's better for this to catch all exit(...) calls with non-zero exit
code, given the purpose is to catch all exits with failure, as opposed
to only exit(1).
Unhandled exceptions propagating to process_queue were not caught there,
causing improper logging - errors didn't land in errors.log as expected.
Exceptions should be caught and explicitly logged by the process_queue
logger. Exceptions occurring during consuming events are caught and
handled inside the worker's logic - however those that happen while
setting up the worker were not addressed at all, and that's the core bug
we mean to address here.
Furthermore, in multi-threaded mode we want the autoreload mechanism to
be working - which it doesn't without catching the exceptions. The
correct approach is to - again - catch the exception, log it and then
send SIGUSR1 signal to trigger exit and autoreload.
Unless this was working around a specific bug that has somehow
persisted for eight years, this was just pointlessly defeating
Python’s bytecode cache every time you started the dev server.
This reverts commits deffda072f,
59228f7458, and
ae45217671.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
This tool helps catch common typos in code and documentation, which is
particularly useful for our many contributors who are not native
English speakers.
The config is based on the codespell that I ran in
https://github.com/zulip/zulip/pull/18535.
CircleCI expected 3434; GitHub Actions expects 1001. This is the
reason zulip-ci.yml currently needs to mess with the permissions in
/__w.
Signed-off-by: Anders Kaseorg <anders@zulip.com>