# Billing Zulip uses a third party (Stripe) for billing, so working on the billing system requires a little bit of setup. To set up the development environment to work on the billing code: * Create a Stripe account * Go to , and add the publishable key and secret key as `stripe_publishable_key` and `stripe_secret_key` to `zproject/dev-secrets.conf`. * Run `./manage.py setup_stripe`. It is safe to run `manage.py setup_stripe` multiple times. Nearly all the billing-relevant code lives in `corporate/`. ## General architecture Notes: * Anything that talks directly to Stripe should go in `corporate/lib/stripe.py`. * We generally try to store billing-related data in Stripe, rather than in Zulip database tables. We'd rather pay the penalty of making extra stripe API requests than deal with keeping two sources of data in sync. * A realm should have a customer object in Stripe if and only if it has a `Customer` object in Zulip. The two main billing-related states for a realm are "have never successfully been charged for anything" and its opposite. This is determined by whether the `realm` has a corresponding `Customer` object with `has_billing_relationship=True`. There are only a few cases where a `realm` might have a `Customer` object with `has_billing_relationship=False`: * They are approved as a non-profit or otherwise have a partial discount, but haven't entered any payment info. * They entered valid payment info, but the initial charge failed (rare but possible). If a realm doesn't have a billing relationship, all the messaging, screens, etc. are geared towards making it easy to upgrade. If a realm does have a billing relationship, all the screens are geared toward making it easy to access current and historical billing information. Note that having a billing relationship doesn't necessarily mean they are currently on a paid plan, or that they currently have a card on file. Notes: * When manually testing, I find I often run `Customer.objects.all().delete()` to reset the state. * 4242424242424242 is Stripe's test credit card, also useful for manually testing. You can put anything in the address fields, any future expiry date, and anything for the CVV code. has some other fun ones. ## BillingProcessor The general strategy here is that billing-relevant events get written to RealmAuditLog with `requires_billing_update = True`, and then a worker goes through, reads RealmAuditLog row by row, and makes the appropriate updates in Stripe (in order), keeping track of its state in `BillingProcessor`. An invariant is that it cannot be important when exactly the worker gets around to making the update in Stripe, as long as the updates for each customer (realm) are made in `RealmAuditLog.id` order. Almost all the complexity in the code is due to error handling. We distinguish three kinds of errors: * Transient errors, like rate limiting or network failures, where we just wait a bit and try again. * Card decline errors (see below) * Everything else (e.g. misconfigured API keys, errors thrown by buggy code, etc.), where we just throw an exception and stop the worker. We use the following strategy for card decline errors. There is a global BillingProcessor (with `realm=None`) that processes RealmAuditLog entries for every customer (realm). If it runs into a card decline error on some entry, it gives up on that entry and (temporarily) all future entries of that realm, and spins off a realm-specific BillingProcessor that marks that realm as needing manual attention. When whatever issue has been corrected, the realm-specific BillingProcessor completes any realm-specific RealmAuditLog entries, and then deletes itself. Notes for manually resolving errors: * `BillingProcessor.objects.filter(state='stalled')` is always safe to handle manually. * `BillingProcessor.objects.filter(state='started')` is safe to handle manually only if the billing process worker is not running. * After resolving the issue, set the processor's state to `done`. * Stripe's idempotency keys are only valid for 24 hours. So be mindful of that if manually cleaning something up more than 24 hours after the error occured.