docs: Add an article explaining our testing philosophy.

2019-09-04 16:51:16 -07:00 · 2019-09-04 16:51:16 -07:00 · 32ff28bf24
parent 2f3a0fb80a
commit 32ff28bf24
2 changed files with 199 additions and 0 deletions
--- a/docs/testing/index.rst
+++ b/docs/testing/index.rst
@@ -14,3 +14,4 @@ Code Testing
   typescript
   continuous-integration
   manual-testing
   philosophy
--- a/docs/testing/philosophy.md
+++ b/docs/testing/philosophy.md
@@ -0,0 +1,198 @@
 # Testing Philosophy
 Zulip's automated tests are a huge part of what makes the project able
 to make progress.  This page records some of the key principles behind
 how we have designed our automated test suites.
 ## Effective testing allows us to move quickly
 Zulip's engineering strategy can be summarized as "move quickly
 without breaking things".  Despite reviewing many code submissions
 from new contributors without deep expertise in the code they are
 changing, Zulip's maintainers spend most of their time integrating
 changes on product decisions and code structure/readability
 questions, not on correctness, style, or lower-level issues.
 This is possible because we have spent years systematically investing
 in testing, tooling, code structure, documentation, and development
 practices to help ensure that our contributors write code that needs
 relatively few changes before it can be merged.  The testing element
 of this is to have reliable, extensive, easily extended test suites
 that cover most classes of bugs.  Our testing systems have been
 designed to minimize the time spent manually testing or otherwise
 investigating whether changes are correct.
 For example, our [infrastructure for testing
 authentication](../development/authentication.md), which uses
 e.g. a mock LDAP database in both the automated tests and the
 development environment, makes it much easier to refactor and
 improve this important part of the product than it was when you
 needed to set up an LDAP server and populate it with test data in
 order to test LDAP authentication.
 While not every part of Zulip has a great test suite, many components
 do, and for those components, the tests mean that new contributors can
 often make substantive changes to that component and have those
 changes be more or less correct by the time they share them for code
 review.  More importantly, it means that maintainers save most of the
 time that would otherwise be spent verifying that the changes are
 simply correct, and can instead focus on making sure that the
 codebase remains readable, well-structured, and well-tested.
 ## Test suite performance and reliability are critical
 When automated test suites are slow or unreliable, developers will
 avoid running them, and furthermore, avoid working on improving them
 (both the system and individual tests).  Because changes that make
 tests slow or unreliable are often unintentional side effects of other
 development, problems in this area tend to accumulate as a codebase
 grows.  As a result, barring focused effort to prevent this outcome,
 any large software project will eventually have its test suite rot
 into one that is slow, unreliable, untrustworthy, and hated.  We aim
 for Zulip to avoid that fate.
 So we consider it essential to maintain every automated test suite
 in a state where it is fast and reliable (tests pass both in CI and
 locally if there are no problems with the developer's changes).
 Concretely, our performance goals are for the full backend suite
 (`test-backend`) to complete in about a minute, and our full frontend
 suite (`test-js-with-node`) to run in under 10 seconds.
 It'd be a long blog post to summarize everything we do to help achieve
 these goals, but a few techniques are worth highlighting:
 * Our test suites are designed to not access the Internet, since the
  Internet might be down or unreliable in the test environment.  Where
  outgoing HTTP requests are required to test something, we mock the
  responses with libraries like `httpretty`.
 * We carefully prevent tests from contaminating each other's data
  inside services like Postgres, Redis, and memcached.
    * Every test case prepends a unique random prefix to all keys it
      uses when accessing Redis and memcached.
    * Every test case runs inside a database transaction, which is
      aborted after the test completes.  Each test process interacts
      only with its own fresh copy of a special template database used
      for server tests, and that copy is destroyed after the process
      completes.
 * We rigorously investigate non-deterministically failing tests as
  though they were priority bugs in the product.
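
The isolation techniques in the list above can be sketched with only the Python standard library. This is a hypothetical illustration, not Zulip's test harness. The names `fetch_status`, `PrefixedCache`, `new_test_cache`, and `run_in_rolled_back_transaction` are made up. `unittest.mock` stands in for `httpretty`, a plain dict stands in for Redis/memcached, and an in-memory SQLite database stands in for the Postgres template database.

```python
import sqlite3
import urllib.request
import uuid
from typing import Callable, Dict, Optional
from unittest import mock


# 1. No Internet access: replace the outgoing HTTP call with a canned response.
def fetch_status(url: str) -> int:
    """Hypothetical production helper that makes an outgoing HTTP request."""
    with urllib.request.urlopen(url) as response:
        return response.status


def test_fetch_status_without_network() -> None:
    fake_response = mock.MagicMock()
    fake_response.status = 200
    # urlopen's result is used as a context manager, so __enter__ must return it.
    fake_response.__enter__.return_value = fake_response
    with mock.patch("urllib.request.urlopen", return_value=fake_response) as fake_urlopen:
        assert fetch_status("https://example.com/api") == 200
    fake_urlopen.assert_called_once_with("https://example.com/api")


# 2. A unique random key prefix per test case (a dict stands in for Redis/memcached).
class PrefixedCache:
    def __init__(self, backend: Dict[str, str], prefix: str) -> None:
        self.backend = backend
        self.prefix = prefix

    def set(self, key: str, value: str) -> None:
        self.backend[self.prefix + key] = value

    def get(self, key: str) -> Optional[str]:
        return self.backend.get(self.prefix + key)


def new_test_cache(shared_backend: Dict[str, str]) -> PrefixedCache:
    return PrefixedCache(shared_backend, f"test-{uuid.uuid4().hex}:")


def test_cache_prefixes_isolate_tests() -> None:
    shared: Dict[str, str] = {}
    first, second = new_test_cache(shared), new_test_cache(shared)
    first.set("user:1", "alice")
    assert first.get("user:1") == "alice"
    assert second.get("user:1") is None  # same backend, different namespace


# 3. Run each test body inside a transaction that is always rolled back.
def run_in_rolled_back_transaction(
    conn: sqlite3.Connection, body: Callable[[sqlite3.Connection], object]
) -> None:
    conn.execute("BEGIN")
    try:
        body(conn)
    finally:
        conn.execute("ROLLBACK")


def test_transaction_is_rolled_back() -> None:
    # isolation_level=None lets us issue BEGIN/ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (name TEXT)")  # schema set up once, outside the test
    run_in_rolled_back_transaction(
        conn, lambda c: c.execute("INSERT INTO users VALUES ('alice')")
    )
    assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 0
```

Rolling back a transaction returns the database to its starting data and is much cheaper than rebuilding the database for every test.
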
 ## Integration testing or unit testing?
 Developers frequently ask whether they should write "integration
 tests" or "unit tests".  Our view is that tests should be written
 against interfaces that you're already counting on keeping stable, or
 already promising people you'll keep stable.  In other words, test
 the interfaces that you or other people are already counting on not
 to change, except in compatible ways.
 So writing tests for the Zulip server against Zulip's end-to-end API
 is a great example of that: the API is something that people have
 written lots of code against, which means all that code is counting on
 the API generally continuing to work for the ways they're using it.
 The same would be true even if the only users of the API were our own
 project's clients like the mobile apps -- because there are a bunch of
 already-installed copies of our mobile apps out there, and they're
 counting on the API not suddenly changing incompatibly.
 One big reason for this principle is that when you write tests against
 an interface, those tests become a cost you pay any time you change
 that interface -- you have to go update a bunch of tests.
 So in a big codebase if you have a lot of "unit tests" that are for
 tiny internal functions, then any time you refactor something and
 change the internal interfaces -- even though you just made them up,
 and they're completely internal to that codebase so there's nothing
 that will break if you change them at will -- you have to go deal with
 editing a bunch of tests to match the new interfaces.  That's
 especially a lot of work if you try to take the tests seriously,
 because you have to think through whether the broken tests are
 telling you something you should actually listen to.
 In some big codebases, this can lead to tests feeling a lot like
 busywork... and it's because a lot of those tests really are
 busywork.  And that leads to developers not being committed to
 maintaining and expanding the test suite in a thoughtful way.
 But if your tests are written against an external API, and you make
 some refactoring change and a bunch of tests break... now that's
 telling you something very real!  You can always edit the tests... but
 the tests are stand-ins for real users and real code out there beyond
 your reach, which will break the same way.
 So you can still make the change... but you have to deal with figuring
 out an appropriate migration or backwards-compatibility strategy for
 all those real users out there. Updating the tests is one of the easy
 parts.  And those changes to the tests are a nice reminder to code
 reviewers that you've changed an interface, and the reviewer should
 think carefully about whether those interface changes will be a
 problem for any existing clients and whether they're properly reflected
 in any documentation for that interface.
 Some examples of this philosophy:
 * If you have a web service that's mainly an API, you want to write
  your tests for that API.
 * If you have a CLI program, you want to write your tests against the
  CLI.
 * If you have a compiler, an interpreter, etc., you want essentially
  all your tests to be example programs, with a bit of metadata for
  things like "should give an error at this line" or "should build and
  run, and produce this output".
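
For the compiler/interpreter case, a suite like this might look like the following minimal sketch. Everything here is hypothetical: the toy `run` interpreter (which only understands lines of the form `print <a> + <b>`), `ProgramError`, and the `Example` record. What matters is that each test is just an example program plus metadata, so the interpreter's internals can be refactored freely without touching the tests.

```python
from dataclasses import dataclass
from typing import Optional


class ProgramError(Exception):
    def __init__(self, line: int, message: str) -> None:
        super().__init__(f"line {line}: {message}")
        self.line = line


def run(program: str) -> str:
    """Hypothetical toy interpreter: every line must be `print <a> + <b>`."""
    output = []
    for lineno, text in enumerate(program.splitlines(), start=1):
        parts = text.split()
        if len(parts) != 4 or parts[0] != "print" or parts[2] != "+":
            raise ProgramError(lineno, f"cannot parse {text!r}")
        try:
            total = int(parts[1]) + int(parts[3])
        except ValueError:
            raise ProgramError(lineno, "operands must be integers") from None
        output.append(str(total))
    return "\n".join(output)


@dataclass
class Example:
    program: str
    expected_output: Optional[str] = None      # "should run and produce this output"
    expected_error_line: Optional[int] = None  # "should give an error at this line"


EXAMPLES = [
    Example("print 1 + 2", expected_output="3"),
    Example("print 2 + 2\nprint 10 + 5", expected_output="4\n15"),
    Example("print 1 + 2\nprint 1 - 2", expected_error_line=2),
]


def check(example: Example) -> None:
    try:
        output = run(example.program)
    except ProgramError as e:
        assert example.expected_error_line == e.line, f"unexpected error: {e}"
        return
    assert example.expected_error_line is None, "expected an error but the program ran"
    assert output == example.expected_output, f"got {output!r}"


for example in EXAMPLES:
    check(example)
```

Adding a test is a one-line change to `EXAMPLES`. A test only needs updating when the language itself changes, and the language is exactly the interface users depend on.
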
 In the Zulip context:
 * Zulip uses the same API for our webapp as for our mobile clients and
  third-party API clients, and most of our server tests are written
  against the Zulip API.
 * The tests for Zulip's incoming webhooks work by sending actual
  payloads captured from the real third-party service to the webhook
  endpoints and verifying that the webhook produces the expected Zulip
  message as output, thereby testing the actual interface.
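
A fixture-driven webhook test in this spirit might look like the sketch below. The payload shape, `handle_push_webhook`, and the `topic`/`content` message format are hypothetical and much simpler than a real integration. A real suite would load payloads captured from the third-party service instead of using an inline literal.

```python
import json

# Hypothetical fixture; a real suite would store a payload captured from the
# third-party service in a fixtures file and load it here.
PUSH_FIXTURE = json.dumps({
    "repository": {"name": "zulip"},
    "pusher": {"name": "alice"},
    "commits": [{"id": "abc123", "message": "Fix typo"}],
})


def handle_push_webhook(raw_body: str) -> dict:
    """Hypothetical webhook endpoint: turn a push payload into a chat message."""
    payload = json.loads(raw_body)
    repo = payload["repository"]["name"]
    commits = payload["commits"]
    lines = [f"* {c['message']} ({c['id'][:7]})" for c in commits]
    header = f"{payload['pusher']['name']} pushed {len(commits)} commit(s) to {repo}."
    return {"topic": repo, "content": header + "\n" + "\n".join(lines)}


def test_push_event() -> None:
    # Drive the endpoint with the raw payload; check only the message it produces.
    message = handle_push_webhook(PUSH_FIXTURE)
    assert message["topic"] == "zulip"
    assert message["content"] == "alice pushed 1 commit(s) to zulip.\n* Fix typo (abc123)"
```
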
 So, to summarize our approach to integration vs. unit testing:
 * While we aim to achieve test coverage of every significant code path
  in the Zulip server, which is commonly associated with unit testing,
  most of our tests are integration tests in the sense of sending a
  complete HTTP API query to the Zulip server and checking that both
  the HTTP response and the internal state of the server following the
  request are correct.
 * Following the end-to-end principle in system design, where possible
  we write tests that execute a complete flow (e.g. registering a new
  Zulip account) rather than testing the implementations of individual
  functions.
 * We invest in the performance of Zulip in part to give users a great
  experience, but just as much to make our test suite fast enough
  that we can write our tests this way.
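
As a rough illustration of testing a complete flow, the sketch below registers an account through an API. It then checks the response, the server's internal state, and a follow-up authenticated request. `FakeServer`, its `/register` and `/users/me` routes, and its response format are hypothetical stand-ins for a real server and HTTP test client; they are not Zulip's API.

```python
import secrets
from typing import Any, Dict, Tuple


class FakeServer:
    """Hypothetical stand-in for a server exposing an HTTP-style API."""

    def __init__(self) -> None:
        self.users: Dict[str, Dict[str, Any]] = {}  # internal state, keyed by email

    def request(self, method: str, path: str, params: Dict[str, str]) -> Tuple[int, Dict[str, Any]]:
        if (method, path) == ("POST", "/register"):
            email = params.get("email", "")
            if "@" not in email:
                return 400, {"result": "error", "msg": "Invalid email"}
            if email in self.users:
                return 400, {"result": "error", "msg": "Email already in use"}
            api_key = secrets.token_hex(16)
            self.users[email] = {"full_name": params.get("full_name", ""), "api_key": api_key}
            return 200, {"result": "success", "api_key": api_key}
        if (method, path) == ("GET", "/users/me"):
            for email, user in self.users.items():
                if user["api_key"] == params.get("api_key"):
                    return 200, {"result": "success", "email": email, "full_name": user["full_name"]}
            return 401, {"result": "error", "msg": "Invalid API key"}
        return 404, {"result": "error", "msg": "Not found"}


def test_registration_flow() -> None:
    server = FakeServer()
    # Step 1: register through the public API, as a real client would.
    status, body = server.request("POST", "/register", {"email": "alice@example.com", "full_name": "Alice"})
    assert status == 200 and body["result"] == "success"
    # Check the server's internal state after the request, not just the response.
    assert server.users["alice@example.com"]["full_name"] == "Alice"
    # Step 2: the returned credentials work for a follow-up API call.
    status, body = server.request("GET", "/users/me", {"api_key": body["api_key"]})
    assert (status, body["email"]) == (200, "alice@example.com")
    # Error paths go through the same interface.
    status, body = server.request("POST", "/register", {"email": "alice@example.com"})
    assert status == 400 and body["msg"] == "Email already in use"
```

The test only touches the request/response interface and the resulting state. As a result, the implementation behind `/register` can be restructured without editing the test.
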
 ## Avoid duplicating code with security impact
 Developing secure software with few security bugs is extremely
 difficult.  An important part of our strategy for avoiding security
 logic bugs is to design patterns for how all of our code that
 processes untrusted user input can be well tested without either
 writing (and reviewing!) endless tests or requiring every developer to
 be good at thinking about security corner cases.
 Our strategy for this is to write a small number of carefully-designed
 functions like `access_stream_by_id` that we test carefully, and then
 use linting and other coding conventions to require that all access to
 data from code paths that might share that data with users be mediated
 through those functions.  So rather than having each view function do
 its own security checks for whether the user can access a given stream,
 and needing to test each of those copies of the logic, we only need to
 do that work once for each major type of data structure and level of
 access.
 These `access_*_by_*` functions are written in a special style, with each
 conditional on its own line (so our test coverage tooling helps verify
 that every case is tested), detailed comments, and carefully
 considered error-handling to avoid leaking information such as whether
 the stream ID requested exists or not.
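
A minimal sketch of a function in this style follows. Only the name `access_stream_by_id` comes from the text above. The signature, the `Stream` dataclass, the in-memory `STREAMS` table, `StreamAccessError`, and the error message are hypothetical simplifications, not Zulip's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


class StreamAccessError(Exception):
    """Hypothetical error type whose message is safe to show to the user."""


@dataclass
class Stream:
    id: int
    name: str
    invite_only: bool
    subscribers: Set[int] = field(default_factory=set)


# Hypothetical in-memory stand-in for the stream table.
STREAMS: Dict[int, Stream] = {
    1: Stream(1, "general", invite_only=False),
    2: Stream(2, "secret", invite_only=True, subscribers={10}),
}


def access_stream_by_id(user_id: int, stream_id: int) -> Stream:
    """Return the stream if `user_id` may access it; otherwise raise.

    Each condition sits on its own line so that line-based coverage reports
    show which cases the tests exercise.
    """
    # One message for both "no such stream" and "not allowed", so a caller
    # cannot probe which private stream IDs exist.
    error = "Invalid stream id"

    stream = STREAMS.get(stream_id)
    if stream is None:
        raise StreamAccessError(error)

    if not stream.invite_only:
        return stream  # public stream: anyone may access it

    if user_id in stream.subscribers:
        return stream  # private stream: subscribers only

    raise StreamAccessError(error)


def test_errors_do_not_leak_existence() -> None:
    assert access_stream_by_id(user_id=20, stream_id=1).name == "general"
    assert access_stream_by_id(user_id=10, stream_id=2).name == "secret"
    messages = []
    for stream_id in (2, 999):  # a private stream vs. a nonexistent one
        try:
            access_stream_by_id(user_id=20, stream_id=stream_id)
        except StreamAccessError as e:
            messages.append(str(e))
    assert messages == ["Invalid stream id", "Invalid stream id"]
```

Every view that needs a stream would call this one function instead of looking streams up directly. The permission logic and its careful tests then live in one place.
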
 We will typically also write tests for a given view verifying that it
 provides the appropriate errors when improper access is attempted, but
 these tests are defense in depth; the main way we prevent invalid
 access to streams is by not offering developers any way to get a
 `Stream` object in server code except through these security check
 functions.