Since we flush memcached when we do a server restart, the flurry of
get_updates requests that fly in afterwards are all cache misses for
getting the User/UserProfile objects, so Tornado ends up spending
around 70ms per get_updates request rather than the usual 1-2ms.
So this should substantially improve our Tornado performance around
server restarts.
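A minimal sketch of the idea, not the actual implementation: after a
flush, fill the cache in one pass so that the first get_updates requests
hit the cache instead of the database. The dict-like `cache`, the
`fetch_all_users` and `fetch_one_user` callables, and the `user:<id>` key
format are all hypothetical stand-ins for memcached and the
User/UserProfile queries.

```python
def fill_user_cache(cache, fetch_all_users):
    """Populate `cache` with every user so that later lookups are hits.

    `cache` is any dict-like object and `fetch_all_users` is any callable
    returning (user_id, user_object) pairs; both are hypothetical.
    """
    count = 0
    for user_id, user in fetch_all_users():
        cache["user:%d" % user_id] = user
        count += 1
    return count


def get_user(cache, user_id, fetch_one_user):
    """Return a user from the cache, using the slow path only on a miss."""
    key = "user:%d" % user_id
    if key not in cache:
        # Cache miss: this is the expensive per-request path.
        cache[key] = fetch_one_user(user_id)
    return cache[key]
```

With the cache filled up front, only users missing from the fill hit the
slow path.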
(imported from commit 07b8126bdfd4ff14e4c3362f9eda1fe5fd571c5b)
Our previous code could in theory end up clearing the caches it had
just filled, if Tornado's cache filling work happened to be faster
than the memcached flush.
(imported from commit 48174aadad398fb7a7c917a1df765c1261b12a55)
This is required because our migration is going to go in two phases.
When we do the database migration (on pushing to master), we update
all messages at that point. But prod doesn't know about the new
flags field, so any new messages sent on prod will not have the
read bit set.
When we push to prod, we want to re-run the bit of the migration script
that automatically sets read flags on messages older than the user's
pointer.
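The re-runnable step can be sketched roughly as follows. This is an
illustration only: the `READ_FLAG` bit value, the dict that maps message
IDs to flags, and treating "older than" as a strictly smaller message ID
are all assumptions, not details taken from the migration script.

```python
READ_FLAG = 1  # hypothetical bit value for the "read" flag


def mark_read_up_to_pointer(message_flags, pointer):
    """Set the read bit on every message older than `pointer`.

    `message_flags` maps message ID -> integer flags field. Other bits
    are preserved, and running this twice gives the same result, so it
    is safe to re-run after new messages have arrived.
    """
    for message_id in message_flags:
        if message_id < pointer:
            message_flags[message_id] |= READ_FLAG
    return message_flags
```

Because the update is a bitwise OR, a second run changes nothing for
messages that were already marked.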
(imported from commit 961d33e972eac9ada80089bf1b1269c7fb42d56b)
We now clean up the stream subscription in more places, but some
historical tutorial streams are still around and if an error or page
reload happens during the tutorial it'll stick around.
(imported from commit 8cf0ebda26bf752c1a23296a4ba85d194bbb3004)
With this change,
    pkill -SIGUSR1 -f runtornado
will dump the stack and SIGUSR2 will enable an interactive debugging
session.
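For illustration, handlers along these lines could be written with only
the standard library. This is a sketch under assumptions, not the code in
this commit: the function names and the use of `code.interact` for the
debugging session are hypothetical, and it only works on platforms that
provide SIGUSR1 and SIGUSR2.

```python
import code
import signal
import sys
import traceback


def dump_stack(signum, frame, stream=None):
    """Write the stack of the interrupted code to `stream` (default: stderr)."""
    if stream is None:
        stream = sys.stderr
    stream.write("".join(traceback.format_stack(frame)))


def start_debugger(signum, frame):
    """Open an interactive console that can see the interrupted frame's variables."""
    namespace = dict(frame.f_globals)
    namespace.update(frame.f_locals)
    code.interact(banner="Signal received; entering console", local=namespace)


def install_debug_handlers():
    """Register both handlers; this must be called from the main thread."""
    signal.signal(signal.SIGUSR1, dump_stack)
    signal.signal(signal.SIGUSR2, start_debugger)
```

Calling `install_debug_handlers()` once at process startup is enough; the
handlers then run whenever the signals arrive.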
This fixes #613 for Tornado, which was the original motivation for that
ticket; I'm not sure whether we want to do this for our Django
processes as well, but it would be easy to do so if we did.
(imported from commit a7de7c6070f4bf0404bed6f434e6a6b291d66a26)
The idea here is: part of the onboarding tutorial is going to
be you talking to the tutorial bot and it talking to you, from
our Javascript.
The reason it's driven by Javascript is that then in principle we can
do nice stuff like making popovers appear in places to point things
out to you, whereas if we were to do it strictly server-side, doing so
would be a lot harder.
The downside to doing it in Javascript is that you don't get any of
the Markdown rendering, since that happens on the server. So instead
we add this call where you give it a message, and it responds by
having the tutorial bot send you that message.
I don't think there are any security concerns here because:
(1) The bot only messages you -- so you can't use it to make someone
else think that the system is telling them to do something.
(2) If there were an issue associated with having the server parse
arbitrary Markdown, you could just trigger the issue by sending
a message yourself.
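The shape of the call described above can be sketched as follows. All
names here are hypothetical: `TUTORIAL_BOT`, `send_tutorial_message`, and
the `send_message(sender, recipient, content)` callable stand in for the
real bot account and the server's normal send path (where the Markdown
rendering would happen).

```python
TUTORIAL_BOT = "tutorial-bot@example.com"  # hypothetical bot address


def send_tutorial_message(requesting_user, content, send_message):
    """Have the tutorial bot send `content` to the requesting user.

    The recipient is always the caller and is never taken from
    client-supplied data, which is what keeps the bot from being used
    to message anyone else.
    """
    send_message(TUTORIAL_BOT, requesting_user, content)
    return {"result": "success"}
```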
(imported from commit b34f594dab6be6bcb81899278ae1cbe447404468)
To work around the issue we're having with queue draining between
parallel blocking connections, use the same rabbitmq queue for both
activity and presence events, keyed on a 'type' flag in the message
itself.
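A consumer for such a shared queue might dispatch on the 'type' flag
roughly like this. It is a sketch only: the JSON encoding, the `user` and
`status` fields, and the handler names are assumptions made for
illustration, and the rabbitmq plumbing is left out.

```python
import json


def handle_activity(event, state):
    # Record that the user did something.
    state["activity"].append(event["user"])


def handle_presence(event, state):
    # Record the user's latest presence status.
    state["presence"][event["user"]] = event["status"]


HANDLERS = {"activity": handle_activity, "presence": handle_presence}


def process_message(body, state):
    """Decode one queue message and route it by its 'type' field."""
    event = json.loads(body)
    handler = HANDLERS.get(event.get("type"))
    if handler is None:
        raise ValueError("unknown event type: %r" % (event.get("type"),))
    handler(event, state)
```

Since both kinds of event arrive on one queue, a single blocking consumer
handles them in order and there is no second connection to drain.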
(imported from commit 188e8fda1695734e52c5740db2195072cfc81479)
Note: When deploying, the process-user-activity-commandline script needs
to be restarted.
(imported from commit 63ee795c9c7a7db4a40170cff5636dc1dd0b46a8)
The production database will need to have this user created before
this commit is pushed.
(imported from commit cc8356d8afa0f0747486b7b4c82337c60499d3fd)
We need this so that we can safely expunge old events without interfering with
the running server. See #414.
(imported from commit 4739e59e36ea69f877c158c13ee752bf6a2dacfe)
Before this is deployed, we need to install rabbitmq and pika on the
target server (see the puppet part of this commit for how).
When this is deployed, we need to start the new user activity bot:
    ./manage.py process_user_activity
in the screen session on the relevant server, or user_activity logs
won't be processed (which will eventually result in all users getting
notifications about how their mirrors are out of date).
(imported from commit 44d605aca0290bef2c94fb99267e15e26b21673b)
This commit has the effect of eliminating all of the non-UserActivity
database queries from the Tornado process -- at least in the uncached
case.
This is safe to do, if a bit fragile, since our Tornado code only
accesses these objects (as opposed to their IDs) in a few places that
are all fine with old data, and I don't expect us to add any new ones
soon:
* UserActivity logging, which I plan to move out of Tornado entirely
* Checking whether we're authenticated in our decorators (which could
  be simplified -- the actual security check is just whether the
  Django session object has a particular field)
* Checking the user's realm to decide whether we should sync notices
  to the client about their Zephyr mirror being up to date; this is
  quite static, and I think we can move it out of this code path.
But implementation constraints around mapping the user_ids to
user_profile_ids mean that it makes sense to get the actual objects
for now.
This code is not what I want to do long-term. I expect we'll be able
to clean up the dual User/UserProfile nonsense once we integrate the
upcoming Django 1.5 release, with its support for pluggable User
models, and after that change, I expect it'll be fairly easy to make
the Tornado code only work with the user ID, not the actual objects.
(imported from commit 82e25b62fd0e3af7c86040600c63a4deec7bec06)
Otherwise one gets:
    AttributeError: 'module' object has no attribute 'time'
when trying to use the time module from inside zephyr.lib.
(imported from commit 645368672a3eff68320278dd480edeed56721fcc)
Note that on local dev servers, this will print out every half second, as
Tornado polls for file changes for autoreloading. In production it will only
print out when network events occur.
(imported from commit adfe88879e4e446b7dfa6ee69e0a9ad013e9c4d4)
tornado.web already does this, based on the setting of the 'debug' kwarg.
Dropping this in production saves us waking up twice a second to stat()
a bunch of files.
We already explicitly restart the server on deploys.
(imported from commit 283bb0da609acb2699a04111a74c13224fe5124c)
So, I got annoyed that our test suite was taking forever to run:
    real 2m13.443s
    user 1m32.630s
    sys 0m3.748s
Some quick profiling determined that the test suite is spending all of
its time loading the fixtures file (zephyr/fixtures/messages.json),
which it reloads for each test case (about 3s each time).
To improve this situation, I removed from the test suite's database
most of the users, subscriptions, etc. that aren't used directly by
the test cases. The result is a quite significant speedup:
    real 0m15.176s
    user 0m9.161s
    sys 0m0.508s
We're still spending over a quarter of a second per test, which isn't
great -- but this is at least no longer unbearable.
This commit doesn't make any changes to the populate_db output if you
don't pass the new --test-suite option.
(imported from commit 2334ba5399b33edab3d29ff269fde4ea77ccd48e)
This is needed to avoid exceptions when calling internal_send_message
in any test against a simple populate_db database.
(imported from commit 36927f57cbbb7e30ae249b5f1a0549fb352827f5)
Importing zephyr.views here has the unfortunate side effect of
creating Client ids 1 and 2 automatically (via decorators.py
instantiating the two client objects it makes), before we go ahead and
delete all objects in the database as part of the populate_db startup.
(imported from commit da03cb7606334d5926e42f422ab94d1c884937b9)