docs: Add an article explaining our testing philosophy.

2019-09-04 16:51:16 -07:00 · 2019-09-04 16:51:16 -07:00 · 32ff28bf24
parent 2f3a0fb80a
commit 32ff28bf24
2 changed files with 199 additions and 0 deletions
--- a/docs/testing/index.rst
+++ b/docs/testing/index.rst
@ -14,3 +14,4 @@ Code Testing
   typescript
   continuous-integration
   manual-testing
+   philosophy
--- a/docs/testing/philosophy.md
+++ b/docs/testing/philosophy.md
@ -0,0 +1,198 @@
+# Testing Philosophy
+
+Zulip's automated tests are a huge part of what makes the project able
+to make progress.  This page records some of the key principles behind
+how we have designed our automated test suites.
+
+## Effective testing allows us to move quickly
+
+Zulip's engineering strategy can be summarized as "move quickly
+without breaking things".  Despite reviewing many code submissions
+from new contributors without deep expertise in the code they are
+changing, Zulip's maintainers spend most of the time they spend
+integrating changes on product decisions and code
+structure/readability questions, not on correctness, style, or
+lower-level issues.
+
+This is possible because we have spent years systematically investing
+in testing, tooling, code structure, documentation, and development
+practices to help ensure that our contributors write code that needs
+relatively few changes before it can be merged.  The testing element
+of this is to have reliable, extensive, easily extended test suits
+that cover most classes of bugs.  Our testing systems have been
+designed to minimize the time spent manually testing or otherwise
+investigating whether changes are correct.
+
+For example, our [infrastructure for testing
+authentication](../development/authentication.md) using e.g. a mock
+LDAP database in both automated tests and the development environment
+make it relatively easy to refactor and improve this important part of
+the product than it was when you needed to setup an LDAP server and
+populate it with some test data in order to test LDAP authentication.
+
+While not every part of Zulip has a great test suite, many components
+do, and for those components, the tests mean that new contributors can
+often make substantive changes to that component and have their
+changes that are more or less correct by the time they share their
+changes for code review.  More importantly, it means that maintainers
+save most the time that would otherwise be spent verifying that the
+changes are simply correct, and instead focus on making sure that the
+codebase remains readable, well-structured, and well-tested.
+
+## Test suite performance and reliability are critical
+
+When automated test suites are slow or unreliable, developers will
+avoid running them, and furthermore, avoid working on improving them
+(both the system and individual tests).  Because changes that make
+tests slow or unreliable are often unintentional side effects of other
+development, problems in this area tend to accumulate as a codebase
+grows.  As a result, barring focused effort to prevent this outcome,
+any large software project will eventually have its test suite rot
+into one that is slow, unreliable, untrustworthy, and hated.  We aim
+for Zulip to avoid that fate.
+
+So we consider it essential to maintaing every automated test suite
+setup in a way where it is fast and reliable (tests pass both in CI
+and locally if there are no problems with the developer's changes).
+
+Concretely, our performance goals are for the full backend suite
+(`test-backend`) to complete in about a minute, and our full frontend
+suite (`test-js-with-node`) to run in under 10 seconds.
+
+It'd be a long blog post to summarize everything we do to help achieve
+these goals, but a few techniques are worth highlighting:
+* Our test suites are designed to not access the Internet, since the
+  Internet might be down or unreliable in the test environment.  Where
+  outgoing HTTP requests are required to test something, we mock the
+  responses with libraries like `httpretty`.
+* We carefully avoid the potential for contamination of data inside
+  services like potsgres, redis, and memcached from different tests.
+    * Every test case prepends a unique random prefix to all keys it
+      uses when accessing redis and memcached.
+    * Every test case runs inside a database transaction, which is
+      aborted after the test completes.  Each test process interacts
+      only with a fresh copy of a special template database used for
+      server tests that is destroyed after the process completes.
+* We rigorously investigate non-deterministically failing tests though
+  they were priority bugs in the product.
+
+## Integration testing or unit testing?
+
+Developers frequently ask whether they should write "integration
+tests" or "unit tests".  Our view is that tests should be written
+against interfaces that you're already counting on keeping stable, or
+already promising people you'll keep stable.  In other words,
+interfaces that you or other people are already counting on mostly not
+changing except in compatible ways.
+
+So writing tests for the Zulip server against Zulip's end-to-end API
+is a great example of that: the API is something that people have
+written lots of code against, which means all that code is counting on
+the API generally continuing to work for the ways they're using it.
+
+The same would be true even if the only users of the API were our own
+project's clients like the mobile apps -- because there are a bunch of
+already-installed copies of our mobile apps out there, and they're
+counting on the API not suddenly changing incompatibly.
+
+One big reason for this principle is that when you write tests against
+an interface, those tests become a cost you pay any time you change
+that interface -- you have to go update a bunch of tests.
+
+So in a big codebase if you have a lot of "unit tests" that are for
+tiny internal functions, then any time you refactor something and
+change the internal interfaces -- even though you just made them up,
+and they're completely internal to that codebase so there's nothing
+that will break if you change them at will -- you have to go deal with
+editing a bunch of tests to match the new interfaces.  That's
+especially a lot of work if you try to take the tests seriously,
+because you have to think through whether the tests breaking are
+telling you something you should actually listen to.
+
+In some big codebases, this can lead to tests feeling a lot like
+busywork... and it's because a lot of those tests really are
+busywork.  And that leads to developers not being committed to
+maintaining and expanding the test suite in a thoughtful way.
+
+But if your tests are written against an external API, and you make
+some refactoring change and a bunch of tests break... now that's
+telling you something very real!  You can always edit the tests... but
+the tests are stand-ins for real users and real code out there beyond
+your reach, which will break the same way.
+
+So you can still make the change... but you have to deal with figuring
+out an appropriate migration or backwards-compatibility strategy for
+all those real users out there. Updating the tests is one of the easy
+parts.  And those changes to the tests are a nice reminder to code
+reviewers that you've changed an interface, and the reviewer should
+think carefully about whether those interface changes will be a
+problem for any existing clients and whether they're properly reflected
+in any documentation for that interface.
+
+Some examples of this philosophy:
+
+* If you have a web service that's mainly an API, you want to write
+  your tests for that API.
+* If you have a CLI program, you want to write your tests against the
+  CLI.
+* If you have a compiler, an interpreter, etc., you want essentially
+  all your tests to be example programs, with a bit of metadata for
+  things like "should give an error at this line" or "should build and
+  run, and produce this output".
+
+In the Zulip context:
+* Zulip uses the same API for our webapp as for our mobile clients and
+  third-party API clients, and most of our server tests are written
+  against the Zulip API.
+* The tests for Zulip's incoming webhooks work by sending actual
+  payloads captured from the real third-party service to the webhook
+  endpoints, and verifies that the webhook produces the expected Zulip
+  message as output, to test the actual interface.
+
+So, to summarize our approach to integration vs. unit testing:
+* While we aim to achieve test coverage of every significant code path
+  in the Zulip server, which is commonly associated with unit testing,
+  most of our tests are integration tests in the sense of sending a
+  complete HTTP API query to the Zulip server and checking both the
+  HTTP response and internal state of the server following the request
+  are correct.
+* Following the end-to-end principle in system design, where possible
+  we write tests that execute a complete flow (e.g. registration a new
+  Zulip account) rather than testing the implementations of individual
+  functions.
+* We invest in the performance of Zulip in part to give users a great
+  experience, but just as much for making our test suite fast enough
+  that we can write our tests this way.
+
+## Avoid duplicating code with security impact
+
+Developing secure software with few security bugs is extremely
+difficult.  An important part of our strategy for avoiding security
+logic bugs is to design patterns for how all of our code that
+processes untrusted user input can be well tested without either
+writing (and reviewing!) endless tests or requiring every developer to
+be good at thinking about security corner cases.
+
+Our strategy for this is to write a small number of carefully-designed
+functions like `access_stream_by_id` that we test carefully, and then
+use linting and other coding conventions to require that all access to
+data from code paths that might share that data with users be mediated
+through those functions.  So rather than having each view function do
+it own security checks for whether the user can access a given stream,
+and needing to test each of those copies of the logic, we only need to
+do that work once for each major type of data structure and level of
+access.
+
+These `access_*_by_*` functions are written in a special style, with each
+conditional on its own line (so our test coverage tooling helps verify
+that every case is tested), detailed comments, and carefully
+considered error-handling to avoid leaking information such as whether
+the stream ID requested exists or not.
+
+We will typically also write tests for a given view verifying that it
+provides the appropriate errors when improper access is attempted, but
+these tests are defense in depth; the main way we prevent invalid
+access to streams is not offering developers a way to get a Stream
+object in server code except as mediated through these security check
+functions.
+