4.2 KiB
Billing
Zulip uses a third party (Stripe) for billing, so working on the billing system requires a little bit of setup.
To set up the development environment to work on the billing code:
- Create a Stripe account
- Go to https://dashboard.stripe.com/account/apikeys, and add the
publishable key and secret key as
stripe_publishable_key
andstripe_secret_key
tozproject/dev-secrets.conf
. - Run
./manage.py setup_stripe
.
It is safe to run manage.py setup_stripe
multiple times.
Nearly all the billing-relevant code lives in zilencer/
.
General architecture
Notes:
- Anything that talks directly to Stripe should go in
zilencer/lib/stripe.py
. - We generally try to store billing-related data in Stripe, rather than in Zulip database tables. We'd rather pay the penalty of making extra stripe API requests than deal with keeping two sources of data in sync.
- A realm should have a customer object in Stripe if and only if it has a
Customer
object in Zulip.
The two main billing-related states for a realm are "have never successfully
been charged for anything" and its opposite. This is determined by whether
the realm
has a corresponding Customer
object with
has_billing_relationship=True
. There are only a few cases where a realm
might have a Customer
object with has_billing_relationship=False
:
- They are approved as a non-profit or otherwise have a partial discount, but haven't entered any payment info.
- They entered valid payment info, but the initial charge failed (rare but possible).
If a realm doesn't have a billing relationship, all the messaging, screens, etc. are geared towards making it easy to upgrade. If a realm does have a billing relationship, all the screens are geared toward making it easy to access current and historical billing information.
Note that having a billing relationship doesn't necessarily mean they are currently on a paid plan, or that they currently have a card on file.
Notes:
- When manually testing, I find I often run
Customer.objects.all().delete()
to reset the state. - 4242424242424242 is Stripe's test credit card, also useful for manually testing. You can put anything in the address fields, any future expiry date, and anything for the CVV code. https://stripe.com/docs/testing#cards-responses has some other fun ones.
BillingProcessor
The general strategy here is that billing-relevant events get written to
RealmAuditLog with requires_billing_update = True
, and then a worker
goes through, reads RealmAuditLog row by row, and makes the appropriate
updates in Stripe (in order), keeping track of its state in
BillingProcessor
. An invariant is that it cannot be important when
exactly the worker gets around to making the update in Stripe, as long
as the updates for each customer (realm) are made in RealmAuditLog.id
order.
Almost all the complexity in the code is due to error handling. We distinguish three kinds of errors:
- Transient errors, like rate limiting or network failures, where we just wait a bit and try again.
- Card decline errors (see below)
- Everything else (e.g. misconfigured API keys, errors thrown by buggy code, etc.), where we just throw an exception and stop the worker.
We use the following strategy for card decline errors. There is a global
BillingProcessor (with realm=None
) that processes RealmAuditLog
entries for every customer (realm). If it runs into a card decline error on
some entry, it gives up on that entry and (temporarily) all future entries
of that realm, and spins off a realm-specific BillingProcessor that
marks that realm as needing manual attention. When whatever issue has been
corrected, the realm-specific BillingProcessor completes any
realm-specific RealmAuditLog entries, and then deletes itself.
Notes for manually resolving errors:
BillingProcessor.objects.filter(state='stalled')
is always safe to handle manually.BillingProcessor.objects.filter(state='started')
is safe to handle manually only if the billing process worker is not running.- After resolving the issue, set the processor's state to
done
. - Stripe's idempotency keys are only valid for 24 hours. So be mindful of that if manually cleaning something up more than 24 hours after the error occured.