mirror of https://github.com/zulip/zulip.git
docs: Add an article explaining our testing philosophy.
This commit is contained in:
parent
2f3a0fb80a
commit
32ff28bf24
|
@ -14,3 +14,4 @@ Code Testing
|
|||
typescript
|
||||
continuous-integration
|
||||
manual-testing
|
||||
philosophy
|
||||
|
|
|
@ -0,0 +1,198 @@
|
|||
# Testing Philosophy
|
||||
|
||||
Zulip's automated tests are a huge part of what makes the project able
|
||||
to make progress. This page records some of the key principles behind
|
||||
how we have designed our automated test suites.
|
||||
|
||||
## Effective testing allows us to move quickly
|
||||
|
||||
Zulip's engineering strategy can be summarized as "move quickly
|
||||
without breaking things". Despite reviewing many code submissions
|
||||
from new contributors without deep expertise in the code they are
|
||||
changing, Zulip's maintainers spend most of the time they spend
|
||||
integrating changes on product decisions and code
|
||||
structure/readability questions, not on correctness, style, or
|
||||
lower-level issues.
|
||||
|
||||
This is possible because we have spent years systematically investing
|
||||
in testing, tooling, code structure, documentation, and development
|
||||
practices to help ensure that our contributors write code that needs
|
||||
relatively few changes before it can be merged. The testing element
|
||||
of this is to have reliable, extensive, easily extended test suits
|
||||
that cover most classes of bugs. Our testing systems have been
|
||||
designed to minimize the time spent manually testing or otherwise
|
||||
investigating whether changes are correct.
|
||||
|
||||
For example, our [infrastructure for testing
|
||||
authentication](../development/authentication.md) using e.g. a mock
|
||||
LDAP database in both automated tests and the development environment
|
||||
make it relatively easy to refactor and improve this important part of
|
||||
the product than it was when you needed to setup an LDAP server and
|
||||
populate it with some test data in order to test LDAP authentication.
|
||||
|
||||
While not every part of Zulip has a great test suite, many components
|
||||
do, and for those components, the tests mean that new contributors can
|
||||
often make substantive changes to that component and have their
|
||||
changes that are more or less correct by the time they share their
|
||||
changes for code review. More importantly, it means that maintainers
|
||||
save most the time that would otherwise be spent verifying that the
|
||||
changes are simply correct, and instead focus on making sure that the
|
||||
codebase remains readable, well-structured, and well-tested.
|
||||
|
||||
## Test suite performance and reliability are critical
|
||||
|
||||
When automated test suites are slow or unreliable, developers will
|
||||
avoid running them, and furthermore, avoid working on improving them
|
||||
(both the system and individual tests). Because changes that make
|
||||
tests slow or unreliable are often unintentional side effects of other
|
||||
development, problems in this area tend to accumulate as a codebase
|
||||
grows. As a result, barring focused effort to prevent this outcome,
|
||||
any large software project will eventually have its test suite rot
|
||||
into one that is slow, unreliable, untrustworthy, and hated. We aim
|
||||
for Zulip to avoid that fate.
|
||||
|
||||
So we consider it essential to maintaing every automated test suite
|
||||
setup in a way where it is fast and reliable (tests pass both in CI
|
||||
and locally if there are no problems with the developer's changes).
|
||||
|
||||
Concretely, our performance goals are for the full backend suite
|
||||
(`test-backend`) to complete in about a minute, and our full frontend
|
||||
suite (`test-js-with-node`) to run in under 10 seconds.
|
||||
|
||||
It'd be a long blog post to summarize everything we do to help achieve
|
||||
these goals, but a few techniques are worth highlighting:
|
||||
* Our test suites are designed to not access the Internet, since the
|
||||
Internet might be down or unreliable in the test environment. Where
|
||||
outgoing HTTP requests are required to test something, we mock the
|
||||
responses with libraries like `httpretty`.
|
||||
* We carefully avoid the potential for contamination of data inside
|
||||
services like potsgres, redis, and memcached from different tests.
|
||||
* Every test case prepends a unique random prefix to all keys it
|
||||
uses when accessing redis and memcached.
|
||||
* Every test case runs inside a database transaction, which is
|
||||
aborted after the test completes. Each test process interacts
|
||||
only with a fresh copy of a special template database used for
|
||||
server tests that is destroyed after the process completes.
|
||||
* We rigorously investigate non-deterministically failing tests though
|
||||
they were priority bugs in the product.
|
||||
|
||||
## Integration testing or unit testing?
|
||||
|
||||
Developers frequently ask whether they should write "integration
|
||||
tests" or "unit tests". Our view is that tests should be written
|
||||
against interfaces that you're already counting on keeping stable, or
|
||||
already promising people you'll keep stable. In other words,
|
||||
interfaces that you or other people are already counting on mostly not
|
||||
changing except in compatible ways.
|
||||
|
||||
So writing tests for the Zulip server against Zulip's end-to-end API
|
||||
is a great example of that: the API is something that people have
|
||||
written lots of code against, which means all that code is counting on
|
||||
the API generally continuing to work for the ways they're using it.
|
||||
|
||||
The same would be true even if the only users of the API were our own
|
||||
project's clients like the mobile apps -- because there are a bunch of
|
||||
already-installed copies of our mobile apps out there, and they're
|
||||
counting on the API not suddenly changing incompatibly.
|
||||
|
||||
One big reason for this principle is that when you write tests against
|
||||
an interface, those tests become a cost you pay any time you change
|
||||
that interface -- you have to go update a bunch of tests.
|
||||
|
||||
So in a big codebase if you have a lot of "unit tests" that are for
|
||||
tiny internal functions, then any time you refactor something and
|
||||
change the internal interfaces -- even though you just made them up,
|
||||
and they're completely internal to that codebase so there's nothing
|
||||
that will break if you change them at will -- you have to go deal with
|
||||
editing a bunch of tests to match the new interfaces. That's
|
||||
especially a lot of work if you try to take the tests seriously,
|
||||
because you have to think through whether the tests breaking are
|
||||
telling you something you should actually listen to.
|
||||
|
||||
In some big codebases, this can lead to tests feeling a lot like
|
||||
busywork... and it's because a lot of those tests really are
|
||||
busywork. And that leads to developers not being committed to
|
||||
maintaining and expanding the test suite in a thoughtful way.
|
||||
|
||||
But if your tests are written against an external API, and you make
|
||||
some refactoring change and a bunch of tests break... now that's
|
||||
telling you something very real! You can always edit the tests... but
|
||||
the tests are stand-ins for real users and real code out there beyond
|
||||
your reach, which will break the same way.
|
||||
|
||||
So you can still make the change... but you have to deal with figuring
|
||||
out an appropriate migration or backwards-compatibility strategy for
|
||||
all those real users out there. Updating the tests is one of the easy
|
||||
parts. And those changes to the tests are a nice reminder to code
|
||||
reviewers that you've changed an interface, and the reviewer should
|
||||
think carefully about whether those interface changes will be a
|
||||
problem for any existing clients and whether they're properly reflected
|
||||
in any documentation for that interface.
|
||||
|
||||
Some examples of this philosophy:
|
||||
|
||||
* If you have a web service that's mainly an API, you want to write
|
||||
your tests for that API.
|
||||
* If you have a CLI program, you want to write your tests against the
|
||||
CLI.
|
||||
* If you have a compiler, an interpreter, etc., you want essentially
|
||||
all your tests to be example programs, with a bit of metadata for
|
||||
things like "should give an error at this line" or "should build and
|
||||
run, and produce this output".
|
||||
|
||||
In the Zulip context:
|
||||
* Zulip uses the same API for our webapp as for our mobile clients and
|
||||
third-party API clients, and most of our server tests are written
|
||||
against the Zulip API.
|
||||
* The tests for Zulip's incoming webhooks work by sending actual
|
||||
payloads captured from the real third-party service to the webhook
|
||||
endpoints, and verifies that the webhook produces the expected Zulip
|
||||
message as output, to test the actual interface.
|
||||
|
||||
So, to summarize our approach to integration vs. unit testing:
|
||||
* While we aim to achieve test coverage of every significant code path
|
||||
in the Zulip server, which is commonly associated with unit testing,
|
||||
most of our tests are integration tests in the sense of sending a
|
||||
complete HTTP API query to the Zulip server and checking both the
|
||||
HTTP response and internal state of the server following the request
|
||||
are correct.
|
||||
* Following the end-to-end principle in system design, where possible
|
||||
we write tests that execute a complete flow (e.g. registration a new
|
||||
Zulip account) rather than testing the implementations of individual
|
||||
functions.
|
||||
* We invest in the performance of Zulip in part to give users a great
|
||||
experience, but just as much for making our test suite fast enough
|
||||
that we can write our tests this way.
|
||||
|
||||
## Avoid duplicating code with security impact
|
||||
|
||||
Developing secure software with few security bugs is extremely
|
||||
difficult. An important part of our strategy for avoiding security
|
||||
logic bugs is to design patterns for how all of our code that
|
||||
processes untrusted user input can be well tested without either
|
||||
writing (and reviewing!) endless tests or requiring every developer to
|
||||
be good at thinking about security corner cases.
|
||||
|
||||
Our strategy for this is to write a small number of carefully-designed
|
||||
functions like `access_stream_by_id` that we test carefully, and then
|
||||
use linting and other coding conventions to require that all access to
|
||||
data from code paths that might share that data with users be mediated
|
||||
through those functions. So rather than having each view function do
|
||||
it own security checks for whether the user can access a given stream,
|
||||
and needing to test each of those copies of the logic, we only need to
|
||||
do that work once for each major type of data structure and level of
|
||||
access.
|
||||
|
||||
These `access_*_by_*` functions are written in a special style, with each
|
||||
conditional on its own line (so our test coverage tooling helps verify
|
||||
that every case is tested), detailed comments, and carefully
|
||||
considered error-handling to avoid leaking information such as whether
|
||||
the stream ID requested exists or not.
|
||||
|
||||
We will typically also write tests for a given view verifying that it
|
||||
provides the appropriate errors when improper access is attempted, but
|
||||
these tests are defense in depth; the main way we prevent invalid
|
||||
access to streams is not offering developers a way to get a Stream
|
||||
object in server code except as mediated through these security check
|
||||
functions.
|
||||
|
Loading…
Reference in New Issue