diff --git a/docs/testing/index.rst b/docs/testing/index.rst
index 98fb594118..9543a64000 100644
--- a/docs/testing/index.rst
+++ b/docs/testing/index.rst
@@ -14,3 +14,4 @@ Code Testing
    typescript
    continuous-integration
    manual-testing
+   philosophy
diff --git a/docs/testing/philosophy.md b/docs/testing/philosophy.md
new file mode 100644
index 0000000000..31a5351eb1
--- /dev/null
+++ b/docs/testing/philosophy.md
@@ -0,0 +1,198 @@
# Testing Philosophy

Zulip's automated tests are a huge part of what makes the project able to make progress. This page records some of the key principles behind how we have designed our automated test suites.

## Effective testing allows us to move quickly

Zulip's engineering strategy can be summarized as "move quickly without breaking things". Even though they review many code submissions from new contributors without deep expertise in the code being changed, Zulip's maintainers spend most of their integration time on product decisions and code structure/readability questions, not on correctness, style, or other lower-level issues.

This is possible because we have spent years systematically investing in testing, tooling, code structure, documentation, and development practices to help ensure that our contributors write code that needs relatively few changes before it can be merged. The testing element of this is to have reliable, extensive, easily extended test suites that cover most classes of bugs. Our testing systems have been designed to minimize the time spent manually testing or otherwise investigating whether changes are correct.

For example, our [infrastructure for testing authentication](../development/authentication.md), which uses e.g. a mock LDAP database in both automated tests and the development environment, makes it much easier to refactor and improve this important part of the product than it was when you needed to set up an LDAP server and populate it with test data in order to test LDAP authentication.

While not every part of Zulip has a great test suite, many components do, and for those components, the tests mean that new contributors can often make substantive changes to a component and have those changes be more or less correct by the time they share them for code review. More importantly, it means that maintainers save most of the time that would otherwise be spent verifying that the changes are correct, and can instead focus on making sure that the codebase remains readable, well-structured, and well-tested.

## Test suite performance and reliability are critical

When automated test suites are slow or unreliable, developers will avoid running them, and furthermore, avoid working on improving them (both the system and individual tests). Because changes that make tests slow or unreliable are often unintentional side effects of other development, problems in this area tend to accumulate as a codebase grows. As a result, barring focused effort to prevent this outcome, any large software project will eventually have its test suite rot into one that is slow, unreliable, untrustworthy, and hated. We aim for Zulip to avoid that fate.

So we consider it essential to maintain every automated test suite in a state where it is fast and reliable (tests pass both in CI and locally if there are no problems with the developer's changes).
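
For a small, generic illustration of what "fast and reliable, both in CI and locally" tends to require in practice, here is a hedged sketch of replacing an external directory server with an in-memory fake in a test. It is not Zulip's actual auth-testing infrastructure; the functions and data are invented for illustration.

```python
import unittest
from unittest import mock


# Hypothetical production code for this sketch: looks up a user's email
# address in an external directory server.
def ldap_search(username: str) -> dict:
    raise RuntimeError("would contact a real LDAP server over the network")


def lookup_email(username: str) -> str:
    return ldap_search(username)["mail"]


class LookupEmailTest(unittest.TestCase):
    def test_lookup_email(self) -> None:
        fake_directory = {"alice": {"mail": "alice@example.com"}}
        # Replace the network-backed lookup with an in-memory fake, so the
        # test needs no external server and no network access.
        with mock.patch(
            f"{__name__}.ldap_search", side_effect=fake_directory.__getitem__
        ):
            self.assertEqual(lookup_email("alice"), "alice@example.com")


if __name__ == "__main__":
    unittest.main()
```

Because nothing here depends on the network or on an external server being up, a test in this style runs in milliseconds and gives the same answer in CI and in a developer's environment.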

Concretely, our performance goals are for the full backend suite (`test-backend`) to complete in about a minute, and our full frontend suite (`test-js-with-node`) to run in under 10 seconds.

It would take a long blog post to summarize everything we do to help achieve these goals, but a few techniques are worth highlighting:

* Our test suites are designed not to access the Internet, since the Internet might be down or unreliable in the test environment. Where outgoing HTTP requests are required to test something, we mock the responses with libraries like `httpretty`.
* We carefully avoid the potential for contamination of data inside services like PostgreSQL, Redis, and memcached from different tests.
  * Every test case prepends a unique random prefix to all keys it uses when accessing Redis and memcached.
  * Every test case runs inside a database transaction, which is aborted after the test completes. Each test process interacts only with a fresh copy of a special template database used for server tests, which is destroyed after the process completes.
* We rigorously investigate non-deterministically failing tests as though they were priority bugs in the product.

## Integration testing or unit testing?

Developers frequently ask whether they should write "integration tests" or "unit tests". Our view is that tests should be written against interfaces that you're already counting on keeping stable, or already promising people you'll keep stable. In other words, interfaces that you or other people are already counting on mostly not changing, except in compatible ways.

So writing tests for the Zulip server against Zulip's end-to-end API is a great example of that: the API is something that people have written lots of code against, which means all that code is counting on the API generally continuing to work for the ways they're using it.

The same would be true even if the only users of the API were our own project's clients like the mobile apps -- because there are a bunch of already-installed copies of our mobile apps out there, and they're counting on the API not suddenly changing incompatibly.

One big reason for this principle is that when you write tests against an interface, those tests become a cost you pay any time you change that interface -- you have to go update a bunch of tests.

So in a big codebase, if you have a lot of "unit tests" for tiny internal functions, then any time you refactor something and change the internal interfaces -- even though you just made them up, and they're completely internal to that codebase, so there's nothing that will break if you change them at will -- you have to go deal with editing a bunch of tests to match the new interfaces. That's especially a lot of work if you try to take the tests seriously, because you have to think through whether the breaking tests are telling you something you should actually listen to.

In some big codebases, this can lead to tests feeling a lot like busywork... and it's because a lot of those tests really are busywork. And that leads to developers not being committed to maintaining and expanding the test suite in a thoughtful way.

But if your tests are written against an external API, and you make some refactoring change and a bunch of tests break... now that's telling you something very real! You can always edit the tests... but the tests are stand-ins for real users and real code out there beyond your reach, which will break the same way.
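
To make the contrast concrete, a test "written against the external API" in this sense might look like the following hedged sketch. It is not taken from Zulip's test suite: it assumes a Django-style test client, and the endpoint, payload, and response shape are invented for illustration.

```python
from django.test import TestCase


class SendMessageAPITest(TestCase):
    def test_send_message_over_the_api(self) -> None:
        # Exercise the public HTTP interface rather than internal helpers,
        # so this test only breaks when the contract that real clients
        # depend on changes.
        response = self.client.post(
            "/api/v1/messages",
            {"type": "stream", "to": "general", "content": "hello"},
        )
        self.assertEqual(response.status_code, 200)
        # The (assumed) success response includes the new message's ID.
        self.assertIn("id", response.json())
```

If a refactoring breaks a test like this, real clients making the same request would break in the same way -- which is exactly the signal you want.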

So you can still make the change... but you have to deal with figuring out an appropriate migration or backwards-compatibility strategy for all those real users out there. Updating the tests is one of the easy parts. And those changes to the tests are a nice reminder to code reviewers that you've changed an interface, and that the reviewer should think carefully about whether those interface changes will be a problem for any existing clients and whether they're properly reflected in any documentation for that interface.

Some examples of this philosophy:

* If you have a web service that's mainly an API, you want to write your tests for that API.
* If you have a CLI program, you want to write your tests against the CLI.
* If you have a compiler, an interpreter, etc., you want essentially all your tests to be example programs, with a bit of metadata for things like "should give an error at this line" or "should build and run, and produce this output".

In the Zulip context:

* Zulip uses the same API for our webapp as for our mobile clients and third-party API clients, and most of our server tests are written against the Zulip API.
* The tests for Zulip's incoming webhooks work by sending actual payloads captured from the real third-party service to the webhook endpoints and verifying that the webhook produces the expected Zulip message as output, so that they test the actual interface.

So, to summarize our approach to integration vs. unit testing:

* While we aim to achieve test coverage of every significant code path in the Zulip server, which is commonly associated with unit testing, most of our tests are integration tests in the sense of sending a complete HTTP API query to the Zulip server and checking that both the HTTP response and the internal state of the server following the request are correct.
* Following the end-to-end principle in system design, where possible we write tests that execute a complete flow (e.g., registering a new Zulip account) rather than testing the implementations of individual functions.
* We invest in the performance of Zulip in part to give users a great experience, but just as much to make our test suite fast enough that we can write our tests this way.

## Avoid duplicating code with security impact

Developing secure software with few security bugs is extremely difficult. An important part of our strategy for avoiding security logic bugs is to design patterns for how all of our code that processes untrusted user input can be well tested, without either writing (and reviewing!) endless tests or requiring every developer to be good at thinking about security corner cases.

Our strategy for this is to write a small number of carefully designed functions like `access_stream_by_id` that we test carefully, and then use linting and other coding conventions to require that all access to data from code paths that might share that data with users be mediated through those functions. So rather than having each view function do its own security checks for whether the user can access a given stream, and needing to test each of those copies of the logic, we only need to do that work once for each major type of data structure and level of access.
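
To make this pattern concrete, here is a deliberately simplified, hypothetical sketch of such a helper and a view function that uses it. It is not Zulip's actual implementation: the data model, error type, and signatures are invented for illustration, and only the overall shape is the point.

```python
from dataclasses import dataclass


class JsonableError(Exception):
    """Stand-in for an error type that becomes a clean HTTP error response."""


@dataclass
class Stream:
    id: int
    name: str
    invite_only: bool


# Toy in-memory data, standing in for the real database.
STREAMS = {
    1: Stream(id=1, name="general", invite_only=False),
    2: Stream(id=2, name="private-plans", invite_only=True),
}
SUBSCRIPTIONS = {("iago", 2)}


def access_stream_by_id(user: str, stream_id: int) -> Stream:
    stream = STREAMS.get(stream_id)
    if stream is None:
        # Deliberately the same error as the permission failure below, so a
        # caller cannot use this code path to probe which stream IDs exist.
        raise JsonableError("Invalid stream ID")
    if stream.invite_only and (user, stream_id) not in SUBSCRIPTIONS:
        raise JsonableError("Invalid stream ID")
    return stream


def get_stream_name(user: str, stream_id: int) -> dict:
    # A "view" in this sketch: it never reads STREAMS directly; all access
    # to streams is mediated through the shared, carefully tested helper.
    stream = access_stream_by_id(user, stream_id)
    return {"result": "success", "name": stream.name}
```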

These `access_*_by_*` functions are written in a special style, with each conditional on its own line (so our test coverage tooling helps verify that every case is tested), detailed comments, and carefully considered error-handling to avoid leaking information such as whether the stream ID requested exists or not.

We will typically also write tests for a given view verifying that it provides the appropriate errors when improper access is attempted, but these tests are defense in depth; the main way we prevent invalid access to streams is not offering developers a way to get a Stream object in server code except as mediated through these security check functions.
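
A convention like "the only way to get a Stream is through these helpers" can be enforced mechanically as well as by review. As a purely illustrative sketch -- this is not Zulip's actual lint tooling, and the file path and pattern are assumptions -- a custom lint check for direct data access can be as small as:

```python
#!/usr/bin/env python3
"""Toy lint check: fail if server code fetches Stream rows directly instead
of going through the shared access helpers. Illustrative only; the forbidden
pattern and the allow-listed module are invented for this sketch."""
import subprocess
import sys

ALLOWED_FILE = "zerver/lib/streams.py"  # where the access_* helpers would live
FORBIDDEN_PATTERN = r"Stream\.objects\.(get|filter)\("


def main() -> int:
    result = subprocess.run(
        ["git", "grep", "-nE", FORBIDDEN_PATTERN, "--", "*.py"],
        capture_output=True,
        text=True,
    )
    violations = [
        line
        for line in result.stdout.splitlines()
        if not line.startswith(ALLOWED_FILE + ":")
    ]
    for line in violations:
        print(f"Direct Stream access outside {ALLOWED_FILE}: {line}")
    return 1 if violations else 0


if __name__ == "__main__":
    sys.exit(main())
```

A check like this can run in CI alongside the test suites, so a new code path that bypasses the security helpers fails fast instead of relying on a reviewer to notice it.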