zulip

Commit Graph

Author	SHA1	Message	Date
Alex Vandiver	528d0ebcf0	puppet: Serve /etc/zulip/well-known/ in nginx as /.well-known/.	2023-10-04 15:56:42 -07:00
Tim Abbott	5e7d61464d	puppet: Include trusted-proto definition in zulip_ops configurations. This should have been part of `0935d388f0`.	2023-05-29 15:13:45 -07:00
Alex Vandiver	04cf68b45e	uploads: Serve S3 uploads directly from nginx. When file uploads are stored in S3, this means that Zulip serves as a 302 to S3. Because browsers do not cache redirects, this means that no image contents can be cached -- and upon every page load or reload, every recently-posted image must be re-fetched. This incurs extra load on the Zulip server, as well as potentially excessive bandwidth usage from S3, and on the client's connection. Switch to fetching the content from S3 in nginx, and serving the content from nginx. These have `Cache-control: private, immutable` headers set on the response, allowing browsers to cache them locally. Because nginx fetching from S3 can be slow, and requests for uploads will generally be bunched around when a message containing them are first posted, we instruct nginx to cache the contents locally. This is safe because uploaded file contents are immutable; access control is still mediated by Django. The nginx cache key is the URL without query parameters, as those parameters include a time-limited signed authentication parameter which lets nginx fetch the non-public file. This adds a number of nginx-level configuration parameters to control the caching which nginx performs, including the amount of in-memory index for he cache, the maximum storage of the cache on disk, and how long data is retained in the cache. The currently-chosen figures are reasonable for small to medium deployments. The most notable effect of this change is in allowing browsers to cache uploaded image content; however, while there will be many fewer requests, it also has an improvement on request latency. The following tests were done with a non-AWS client in SFO, a server and S3 storage in us-east-1, and with 100 requests after 10 requests of warm-up (to fill the nginx cache). The mean and standard deviation are shown. \| \| Redirect to S3 \| Caching proxy, hot \| Caching proxy, cold \| \| ----------------- \| ------------------- \| ------------------- \| ------------------- \| \| Time in Django \| 263.0 ms ± 28.3 ms \| 258.0 ms ± 12.3 ms \| 258.0 ms ± 12.3 ms \| \| Small file (842b) \| 586.1 ms ± 21.1 ms \| 266.1 ms ± 67.4 ms \| 288.6 ms ± 17.7 ms \| \| Large file (660k) \| 959.6 ms ± 137.9 ms \| 609.5 ms ± 13.0 ms \| 648.1 ms ± 43.2 ms \| The hot-cache performance is faster for both large and small files, since it saves the client the time having to make a second request to a separate host. This performance improvement remains at least 100ms even if the client is on the same coast as the server. Cold nginx caches are only slightly slower than hot caches, because VPC access to S3 endpoints is extremely fast (assuming it is in the same region as the host), and nginx can pool connections to S3 and reuse them. However, all of the 648ms taken to serve a cold-cache large file is occupied in nginx, as opposed to the only 263ms which was spent in nginx when using redirects to S3. This means that to overall spend less time responding to uploaded-file requests in nginx, clients will need to find files in their local cache, and skip making an uploaded-file request, at least 60% of the time. Modeling shows a reduction in the number of client requests by about 70% - 80%. The `Content-Disposition` header logic can now also be entirely shared with the local-file codepath, as can the `url_only` path used by mobile clients. While we could provide the direct-to-S3 temporary signed URL to mobile clients, we choose to provide the served-from-Zulip signed URL, to better control caching headers on it, and greater consistency. In doing so, we adjust the salt used for the URL; since these URLs are only valid for 60s, the effect of this salt change is minimal.	2023-01-09 18:23:58 -05:00
Anders Kaseorg	7666ff603d	sharding: Configure Tornado sharding with nginx map. https://nginx.org/en/docs/http/ngx_http_map_module.html Since Puppet doesn’t manage the contents of nginx_sharding.conf after its initial creation, it needs to be renamed so we can give it different default contents. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-09-15 16:07:50 -07:00
Anders Kaseorg	0da0ee3c92	puppet: Remove nginx configuration for zulip.org. This is unused since commit `1806e0f45e` (#19625). Signed-off-by: Anders Kaseorg <anders@zulip.com>	2022-09-01 10:03:18 -07:00
Alex Vandiver	fc1adef28a	puppet: Fix server_name of internal staging server.	2022-01-18 12:36:56 -08:00
Alex Vandiver	7e630b81f8	puppet: Switch to using snakeoil certs for staging. This parallels `ba3b88c81b`, but for the staging host.	2022-01-18 12:36:56 -08:00
Anders Kaseorg	129ea6dd11	nginx: Consistently listen on IPv6 and with HTTP/2. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2021-03-17 17:46:32 -07:00
Alex Vandiver	06c07109e4	puppet: Add missing semicolons left off in `ba3b88c81b`.	2021-03-12 15:48:53 -08:00
Alex Vandiver	ba3b88c81b	puppet: Explicitly use the snakeoil certificates for nginx. In production, the `wildcard-zulipchat.com.combined-chain.crt` file is just a symlink to the snakeoil certificates; but we do not puppet that symlink, which makes new hosts fail to start cleanly. Instead, point explicitly to the snakeoil certificate, and explain why.	2021-03-12 13:31:54 -08:00
Alex Vandiver	0b736ef4cf	puppet: Remove puppet_ops configuration for separate loadbalancer host.	2021-02-22 16:05:13 -08:00
Alex Vandiver	16af05758d	puppet: Move zulip_org into zulip_ops. This class is not of general interest.	2020-10-22 11:30:53 -07:00
Tim Abbott	7c2c82b190	nginx: Update nginx configuration for fhir/hl7 organization. We should eventually add templating for the set of hosts here, but it's worth merging this change to remove the deleted hostname and replace it with the current one.	2020-10-13 16:50:26 -07:00
Anders Kaseorg	723d285e46	nginx: Redirect {www.,}zulipchat.com, www.zulip.com to zulip.com. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2020-10-13 16:49:23 -07:00
Tim Abbott	220620e7cf	sharding: Add basic sharding configuration for Tornado. This allows straight-forward configuration of realm-based Tornado sharding through simply editing /etc/zulip/zulip.conf to configure shards and running scripts/refresh-sharding-and-restart. Co-Author-By: Mateusz Mandera <mateusz.mandera@zulip.com>	2020-05-20 13:47:20 -07:00
Anders Kaseorg	c0ffa71fa9	nginx: Replace unanchored regexes in location directives. We could anchor the regexes, but there’s no need for the power (and responsibility) of regexes here. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-04-24 16:58:19 -07:00
Anders Kaseorg	ea6934c26d	dependencies: Remove WebSockets system for sending messages. Zulip has had a small use of WebSockets (specifically, for the code path of sending messages, via the webapp only) since ~2013. We originally added this use of WebSockets in the hope that the latency benefits of doing so would allow us to avoid implementing a markdown local echo; they were not. Further, HTTP/2 may have eliminated the latency difference we hoped to exploit by using WebSockets in any case. While we’d originally imagined using WebSockets for other endpoints, there was never a good justification for moving more components to the WebSockets system. This WebSockets code path had a lot of downsides/complexity, including: * The messy hack involving constructing an emulated request object to hook into doing Django requests. * The `message_senders` queue processor system, which increases RAM needs and must be provisioned independently from the rest of the server). * A duplicate check_send_receive_time Nagios test specific to WebSockets. * The requirement for users to have their firewalls/NATs allow WebSocket connections, and a setting to disable them for networks where WebSockets don’t work. * Dependencies on the SockJS family of libraries, which has at times been poorly maintained, and periodically throws random JavaScript exceptions in our production environments without a deep enough traceback to effectively investigate. * A total of about 1600 lines of our code related to the feature. * Increased load on the Tornado system, especially around a Zulip server restart, and especially for large installations like zulipchat.com, resulting in extra delay before messages can be sent again. As detailed in https://github.com/zulip/zulip/pull/12862#issuecomment-536152397, it appears that removing WebSockets moderately increases the time it takes for the `send_message` API query to return from the server, but does not significantly change the time between when a message is sent and when it is received by clients. We don’t understand the reason for that change (suggesting the possibility of a measurement error), and even if it is a real change, we consider that potential small latency regression to be acceptable. If we later want WebSockets, we’ll likely want to just use Django Channels. Signed-off-by: Anders Kaseorg <anders@zulipchat.com>	2020-01-14 22:34:00 -08:00
Tim Abbott	b218c2a70e	loadbalancer: Use same certbot cert for zulipstaging.com. This is a simple configuration improvement.	2018-12-07 13:43:21 -08:00
Tim Abbott	467694c1fa	nginx: Enable http2 in external nginx configuration. This should be a nice performance improvement for browsers that support it. We can't yet enabled this in the Zulip on-premise nginx configuration, because that still has to support Trusty.	2018-12-07 13:43:02 -08:00
Tim Abbott	b53a712856	nginx: Update configuration for using certbot certs everywhere.	2018-08-22 11:59:15 -07:00
Tim Abbott	02ae71f27f	api: Stop using API keys for Django->Tornado authentication. As part of our effort to change the data model away from each user having a single API key, we're eliminating the couple requests that were made from Django to Tornado (as part of a /register or home request) where we used the user's API key grabbed from the database for authentication. Instead, we use the (already existing) internal_notify_view authentication mechanism, which uses the SHARED_SECRET setting for security, for these requests, and just fetch the user object using get_user_profile_by_id directly. Tweaked by Yago to include the new /api/v1/events/internal endpoint in the exempt_patterns list in test_helpers, since it's an endpoint we call through Tornado. Also added a couple missing return type annotations.	2018-07-30 12:28:31 -07:00
Tim Abbott	94554c65da	certbot: Modify nginx configuration to support automated renewal.	2017-11-08 12:32:26 -08:00
Tim Abbott	62bb465896	puppet: Modify lb0 nginx configuration.	2017-11-08 12:32:26 -08:00
Tim Abbott	3af01bed85	puppet: Simplify zulip_ops nginx configuration. Whatever dist/ functionality this had in 2014 is now served by zulip.org, and since this serves as a sample, it should be as simple as possible. Previously, this was more cluttered than it needed to be.	2017-10-05 21:17:57 -07:00
Tim Abbott	9b7a3f040c	Remove now-unused /json/get_events endpoint.	2016-10-27 21:34:58 -07:00
Tim Abbott	8584c05d80	zulip_ops: Remove unnecessary loadbalancer stanzas.	2016-10-25 23:52:37 -07:00
Tim Abbott	869f0724ce	zulip_ops: Remove humbughq.com nginx configuration. The humbughq.com name hasn't been the product's name since 2013, and it's nice to finish clearing it out of the repository.	2016-10-16 20:13:29 -07:00
Tim Abbott	36e336edc3	puppet: Rename zulip_internal to zulip_ops. The old "zulip_internal" name was from back when Zulip, Inc. had two distributions of Zulip, the enterprise distribution in puppet/zulip/ and the "internal" SAAS distribution in puppet/zulip_internal. I think the name is a bit confusing in the new fully open-source Zulip work, so we're replacing it with "zulip_ops". I don't think the new name is perfect, but it's better. In the following commits, we'll delete a bunch of pieces of Zulip, Inc.'s infrastructure that don't exist anymore and thus are no longer useful (e.g. the old Trac configuration), with the goal of cleaning the repository of as much unnecessary content as possible.	2016-10-16 19:23:27 -07:00

28 Commits