zulip/docs/testing/philosophy.md

199 lines
10 KiB
Markdown

# Testing Philosophy
Zulip's automated tests are a huge part of what makes the project able
to make progress. This page records some of the key principles behind
how we have designed our automated test suites.
## Effective testing allows us to move quickly
Zulip's engineering strategy can be summarized as "move quickly
without breaking things". Despite reviewing many code submissions
from new contributors without deep expertise in the code they are
changing, Zulip's maintainers spend most of the time they spend
integrating changes on product decisions and code
structure/readability questions, not on correctness, style, or
lower-level issues.
This is possible because we have spent years systematically investing
in testing, tooling, code structure, documentation, and development
practices to help ensure that our contributors write code that needs
relatively few changes before it can be merged. The testing element
of this is to have reliable, extensive, easily extended test suits
that cover most classes of bugs. Our testing systems have been
designed to minimize the time spent manually testing or otherwise
investigating whether changes are correct.
For example, our [infrastructure for testing
authentication](../development/authentication.md) using e.g. a mock
LDAP database in both automated tests and the development environment
make it relatively easy to refactor and improve this important part of
the product than it was when you needed to setup an LDAP server and
populate it with some test data in order to test LDAP authentication.
While not every part of Zulip has a great test suite, many components
do, and for those components, the tests mean that new contributors can
often make substantive changes to that component and have their
changes that are more or less correct by the time they share their
changes for code review. More importantly, it means that maintainers
save most the time that would otherwise be spent verifying that the
changes are simply correct, and instead focus on making sure that the
codebase remains readable, well-structured, and well-tested.
## Test suite performance and reliability are critical
When automated test suites are slow or unreliable, developers will
avoid running them, and furthermore, avoid working on improving them
(both the system and individual tests). Because changes that make
tests slow or unreliable are often unintentional side effects of other
development, problems in this area tend to accumulate as a codebase
grows. As a result, barring focused effort to prevent this outcome,
any large software project will eventually have its test suite rot
into one that is slow, unreliable, untrustworthy, and hated. We aim
for Zulip to avoid that fate.
So we consider it essential to maintaing every automated test suite
setup in a way where it is fast and reliable (tests pass both in CI
and locally if there are no problems with the developer's changes).
Concretely, our performance goals are for the full backend suite
(`test-backend`) to complete in about a minute, and our full frontend
suite (`test-js-with-node`) to run in under 10 seconds.
It'd be a long blog post to summarize everything we do to help achieve
these goals, but a few techniques are worth highlighting:
* Our test suites are designed to not access the Internet, since the
Internet might be down or unreliable in the test environment. Where
outgoing HTTP requests are required to test something, we mock the
responses with libraries like `responses`.
* We carefully avoid the potential for contamination of data inside
services like postgres, redis, and memcached from different tests.
* Every test case prepends a unique random prefix to all keys it
uses when accessing redis and memcached.
* Every test case runs inside a database transaction, which is
aborted after the test completes. Each test process interacts
only with a fresh copy of a special template database used for
server tests that is destroyed after the process completes.
* We rigorously investigate non-deterministically failing tests as though
they were priority bugs in the product.
## Integration testing or unit testing?
Developers frequently ask whether they should write "integration
tests" or "unit tests". Our view is that tests should be written
against interfaces that you're already counting on keeping stable, or
already promising people you'll keep stable. In other words,
interfaces that you or other people are already counting on mostly not
changing except in compatible ways.
So writing tests for the Zulip server against Zulip's end-to-end API
is a great example of that: the API is something that people have
written lots of code against, which means all that code is counting on
the API generally continuing to work for the ways they're using it.
The same would be true even if the only users of the API were our own
project's clients like the mobile apps -- because there are a bunch of
already-installed copies of our mobile apps out there, and they're
counting on the API not suddenly changing incompatibly.
One big reason for this principle is that when you write tests against
an interface, those tests become a cost you pay any time you change
that interface -- you have to go update a bunch of tests.
So in a big codebase if you have a lot of "unit tests" that are for
tiny internal functions, then any time you refactor something and
change the internal interfaces -- even though you just made them up,
and they're completely internal to that codebase so there's nothing
that will break if you change them at will -- you have to go deal with
editing a bunch of tests to match the new interfaces. That's
especially a lot of work if you try to take the tests seriously,
because you have to think through whether the tests breaking are
telling you something you should actually listen to.
In some big codebases, this can lead to tests feeling a lot like
busywork... and it's because a lot of those tests really are
busywork. And that leads to developers not being committed to
maintaining and expanding the test suite in a thoughtful way.
But if your tests are written against an external API, and you make
some refactoring change and a bunch of tests break... now that's
telling you something very real! You can always edit the tests... but
the tests are stand-ins for real users and real code out there beyond
your reach, which will break the same way.
So you can still make the change... but you have to deal with figuring
out an appropriate migration or backwards-compatibility strategy for
all those real users out there. Updating the tests is one of the easy
parts. And those changes to the tests are a nice reminder to code
reviewers that you've changed an interface, and the reviewer should
think carefully about whether those interface changes will be a
problem for any existing clients and whether they're properly reflected
in any documentation for that interface.
Some examples of this philosophy:
* If you have a web service that's mainly an API, you want to write
your tests for that API.
* If you have a CLI program, you want to write your tests against the
CLI.
* If you have a compiler, an interpreter, etc., you want essentially
all your tests to be example programs, with a bit of metadata for
things like "should give an error at this line" or "should build and
run, and produce this output".
In the Zulip context:
* Zulip uses the same API for our webapp as for our mobile clients and
third-party API clients, and most of our server tests are written
against the Zulip API.
* The tests for Zulip's incoming webhooks work by sending actual
payloads captured from the real third-party service to the webhook
endpoints, and verifies that the webhook produces the expected Zulip
message as output, to test the actual interface.
So, to summarize our approach to integration vs. unit testing:
* While we aim to achieve test coverage of every significant code path
in the Zulip server, which is commonly associated with unit testing,
most of our tests are integration tests in the sense of sending a
complete HTTP API query to the Zulip server and checking both the
HTTP response and internal state of the server following the request
are correct.
* Following the end-to-end principle in system design, where possible
we write tests that execute a complete flow (e.g. registration a new
Zulip account) rather than testing the implementations of individual
functions.
* We invest in the performance of Zulip in part to give users a great
experience, but just as much for making our test suite fast enough
that we can write our tests this way.
## Avoid duplicating code with security impact
Developing secure software with few security bugs is extremely
difficult. An important part of our strategy for avoiding security
logic bugs is to design patterns for how all of our code that
processes untrusted user input can be well tested without either
writing (and reviewing!) endless tests or requiring every developer to
be good at thinking about security corner cases.
Our strategy for this is to write a small number of carefully-designed
functions like `access_stream_by_id` that we test carefully, and then
use linting and other coding conventions to require that all access to
data from code paths that might share that data with users be mediated
through those functions. So rather than having each view function do
it own security checks for whether the user can access a given stream,
and needing to test each of those copies of the logic, we only need to
do that work once for each major type of data structure and level of
access.
These `access_*_by_*` functions are written in a special style, with each
conditional on its own line (so our test coverage tooling helps verify
that every case is tested), detailed comments, and carefully
considered error-handling to avoid leaking information such as whether
the stream ID requested exists or not.
We will typically also write tests for a given view verifying that it
provides the appropriate errors when improper access is attempted, but
these tests are defense in depth; the main way we prevent invalid
access to streams is not offering developers a way to get a Stream
object in server code except as mediated through these security check
functions.