mirror of https://github.com/zulip/zulip.git
199 lines
10 KiB
Markdown
199 lines
10 KiB
Markdown
# Testing Philosophy
|
|
|
|
Zulip's automated tests are a huge part of what makes the project able
|
|
to make progress. This page records some of the key principles behind
|
|
how we have designed our automated test suites.
|
|
|
|
## Effective testing allows us to move quickly
|
|
|
|
Zulip's engineering strategy can be summarized as "move quickly
|
|
without breaking things". Despite reviewing many code submissions
|
|
from new contributors without deep expertise in the code they are
|
|
changing, Zulip's maintainers spend most of the time they spend
|
|
integrating changes on product decisions and code
|
|
structure/readability questions, not on correctness, style, or
|
|
lower-level issues.
|
|
|
|
This is possible because we have spent years systematically investing
|
|
in testing, tooling, code structure, documentation, and development
|
|
practices to help ensure that our contributors write code that needs
|
|
relatively few changes before it can be merged. The testing element
|
|
of this is to have reliable, extensive, easily extended test suits
|
|
that cover most classes of bugs. Our testing systems have been
|
|
designed to minimize the time spent manually testing or otherwise
|
|
investigating whether changes are correct.
|
|
|
|
For example, our [infrastructure for testing
|
|
authentication](../development/authentication.md) using e.g. a mock
|
|
LDAP database in both automated tests and the development environment
|
|
make it relatively easy to refactor and improve this important part of
|
|
the product than it was when you needed to setup an LDAP server and
|
|
populate it with some test data in order to test LDAP authentication.
|
|
|
|
While not every part of Zulip has a great test suite, many components
|
|
do, and for those components, the tests mean that new contributors can
|
|
often make substantive changes to that component and have their
|
|
changes that are more or less correct by the time they share their
|
|
changes for code review. More importantly, it means that maintainers
|
|
save most the time that would otherwise be spent verifying that the
|
|
changes are simply correct, and instead focus on making sure that the
|
|
codebase remains readable, well-structured, and well-tested.
|
|
|
|
## Test suite performance and reliability are critical
|
|
|
|
When automated test suites are slow or unreliable, developers will
|
|
avoid running them, and furthermore, avoid working on improving them
|
|
(both the system and individual tests). Because changes that make
|
|
tests slow or unreliable are often unintentional side effects of other
|
|
development, problems in this area tend to accumulate as a codebase
|
|
grows. As a result, barring focused effort to prevent this outcome,
|
|
any large software project will eventually have its test suite rot
|
|
into one that is slow, unreliable, untrustworthy, and hated. We aim
|
|
for Zulip to avoid that fate.
|
|
|
|
So we consider it essential to maintaing every automated test suite
|
|
setup in a way where it is fast and reliable (tests pass both in CI
|
|
and locally if there are no problems with the developer's changes).
|
|
|
|
Concretely, our performance goals are for the full backend suite
|
|
(`test-backend`) to complete in about a minute, and our full frontend
|
|
suite (`test-js-with-node`) to run in under 10 seconds.
|
|
|
|
It'd be a long blog post to summarize everything we do to help achieve
|
|
these goals, but a few techniques are worth highlighting:
|
|
* Our test suites are designed to not access the Internet, since the
|
|
Internet might be down or unreliable in the test environment. Where
|
|
outgoing HTTP requests are required to test something, we mock the
|
|
responses with libraries like `responses`.
|
|
* We carefully avoid the potential for contamination of data inside
|
|
services like postgres, redis, and memcached from different tests.
|
|
* Every test case prepends a unique random prefix to all keys it
|
|
uses when accessing redis and memcached.
|
|
* Every test case runs inside a database transaction, which is
|
|
aborted after the test completes. Each test process interacts
|
|
only with a fresh copy of a special template database used for
|
|
server tests that is destroyed after the process completes.
|
|
* We rigorously investigate non-deterministically failing tests as though
|
|
they were priority bugs in the product.
|
|
|
|
## Integration testing or unit testing?
|
|
|
|
Developers frequently ask whether they should write "integration
|
|
tests" or "unit tests". Our view is that tests should be written
|
|
against interfaces that you're already counting on keeping stable, or
|
|
already promising people you'll keep stable. In other words,
|
|
interfaces that you or other people are already counting on mostly not
|
|
changing except in compatible ways.
|
|
|
|
So writing tests for the Zulip server against Zulip's end-to-end API
|
|
is a great example of that: the API is something that people have
|
|
written lots of code against, which means all that code is counting on
|
|
the API generally continuing to work for the ways they're using it.
|
|
|
|
The same would be true even if the only users of the API were our own
|
|
project's clients like the mobile apps -- because there are a bunch of
|
|
already-installed copies of our mobile apps out there, and they're
|
|
counting on the API not suddenly changing incompatibly.
|
|
|
|
One big reason for this principle is that when you write tests against
|
|
an interface, those tests become a cost you pay any time you change
|
|
that interface -- you have to go update a bunch of tests.
|
|
|
|
So in a big codebase if you have a lot of "unit tests" that are for
|
|
tiny internal functions, then any time you refactor something and
|
|
change the internal interfaces -- even though you just made them up,
|
|
and they're completely internal to that codebase so there's nothing
|
|
that will break if you change them at will -- you have to go deal with
|
|
editing a bunch of tests to match the new interfaces. That's
|
|
especially a lot of work if you try to take the tests seriously,
|
|
because you have to think through whether the tests breaking are
|
|
telling you something you should actually listen to.
|
|
|
|
In some big codebases, this can lead to tests feeling a lot like
|
|
busywork... and it's because a lot of those tests really are
|
|
busywork. And that leads to developers not being committed to
|
|
maintaining and expanding the test suite in a thoughtful way.
|
|
|
|
But if your tests are written against an external API, and you make
|
|
some refactoring change and a bunch of tests break... now that's
|
|
telling you something very real! You can always edit the tests... but
|
|
the tests are stand-ins for real users and real code out there beyond
|
|
your reach, which will break the same way.
|
|
|
|
So you can still make the change... but you have to deal with figuring
|
|
out an appropriate migration or backwards-compatibility strategy for
|
|
all those real users out there. Updating the tests is one of the easy
|
|
parts. And those changes to the tests are a nice reminder to code
|
|
reviewers that you've changed an interface, and the reviewer should
|
|
think carefully about whether those interface changes will be a
|
|
problem for any existing clients and whether they're properly reflected
|
|
in any documentation for that interface.
|
|
|
|
Some examples of this philosophy:
|
|
|
|
* If you have a web service that's mainly an API, you want to write
|
|
your tests for that API.
|
|
* If you have a CLI program, you want to write your tests against the
|
|
CLI.
|
|
* If you have a compiler, an interpreter, etc., you want essentially
|
|
all your tests to be example programs, with a bit of metadata for
|
|
things like "should give an error at this line" or "should build and
|
|
run, and produce this output".
|
|
|
|
In the Zulip context:
|
|
* Zulip uses the same API for our webapp as for our mobile clients and
|
|
third-party API clients, and most of our server tests are written
|
|
against the Zulip API.
|
|
* The tests for Zulip's incoming webhooks work by sending actual
|
|
payloads captured from the real third-party service to the webhook
|
|
endpoints, and verifies that the webhook produces the expected Zulip
|
|
message as output, to test the actual interface.
|
|
|
|
So, to summarize our approach to integration vs. unit testing:
|
|
* While we aim to achieve test coverage of every significant code path
|
|
in the Zulip server, which is commonly associated with unit testing,
|
|
most of our tests are integration tests in the sense of sending a
|
|
complete HTTP API query to the Zulip server and checking both the
|
|
HTTP response and internal state of the server following the request
|
|
are correct.
|
|
* Following the end-to-end principle in system design, where possible
|
|
we write tests that execute a complete flow (e.g. registration a new
|
|
Zulip account) rather than testing the implementations of individual
|
|
functions.
|
|
* We invest in the performance of Zulip in part to give users a great
|
|
experience, but just as much for making our test suite fast enough
|
|
that we can write our tests this way.
|
|
|
|
## Avoid duplicating code with security impact
|
|
|
|
Developing secure software with few security bugs is extremely
|
|
difficult. An important part of our strategy for avoiding security
|
|
logic bugs is to design patterns for how all of our code that
|
|
processes untrusted user input can be well tested without either
|
|
writing (and reviewing!) endless tests or requiring every developer to
|
|
be good at thinking about security corner cases.
|
|
|
|
Our strategy for this is to write a small number of carefully-designed
|
|
functions like `access_stream_by_id` that we test carefully, and then
|
|
use linting and other coding conventions to require that all access to
|
|
data from code paths that might share that data with users be mediated
|
|
through those functions. So rather than having each view function do
|
|
it own security checks for whether the user can access a given stream,
|
|
and needing to test each of those copies of the logic, we only need to
|
|
do that work once for each major type of data structure and level of
|
|
access.
|
|
|
|
These `access_*_by_*` functions are written in a special style, with each
|
|
conditional on its own line (so our test coverage tooling helps verify
|
|
that every case is tested), detailed comments, and carefully
|
|
considered error-handling to avoid leaking information such as whether
|
|
the stream ID requested exists or not.
|
|
|
|
We will typically also write tests for a given view verifying that it
|
|
provides the appropriate errors when improper access is attempted, but
|
|
these tests are defense in depth; the main way we prevent invalid
|
|
access to streams is not offering developers a way to get a Stream
|
|
object in server code except as mediated through these security check
|
|
functions.
|
|
|