mirror of https://github.com/zulip/zulip.git
8cc82c6cbe
I added filter() statements to do_update_message_flags(). Here is some context:

Steve Howell: Case 1, have AND clause to reduce work for DB.

```
humbug=> update zerver_usermessage set flags = (flags & ~1) where id > 9000;
UPDATE 382
humbug=> select count(*) from zerver_usermessage where (flags & 1) = 0;
 count
-------
   382
(1 row)

humbug=> explain analyze update zerver_usermessage set flags = (flags | 1) where (flags & 1) = 0;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Update on zerver_usermessage (cost=0.00..266.85 rows=47 width=27) (actual time=5.727..5.727 rows=0 loops=1)
   -> Seq Scan on zerver_usermessage (cost=0.00..266.85 rows=47 width=27) (actual time=0.045..2.751 rows=382 loops=1)
         Filter: ((flags & 1::bigint) = 0)
         Rows Removed by Filter: 9000
 Total runtime: 5.759 ms
(5 rows)

humbug=> select count(*) from zerver_usermessage where (flags & 1) = 0;
 count
-------
     0
(1 row)
```

Leo Franchi: Sounds reasonable, but I know way less than zev about DBs so I'll defer to his judgement :)

Steve Howell: Case 2, how the code works now:

```
humbug=> update zerver_usermessage set flags = (flags & ~1) where id > 9000;
UPDATE 382
humbug=> select count(*) from zerver_usermessage where (flags & 1) = 0;
 count
-------
   382
(1 row)

humbug=> explain analyze update zerver_usermessage set flags = (flags | 1);
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Update on zerver_usermessage (cost=0.00..243.28 rows=9382 width=27) (actual time=362.075..362.075 rows=0 loops=1)
   -> Seq Scan on zerver_usermessage (cost=0.00..243.28 rows=9382 width=27) (actual time=0.008..6.138 rows=9382 loops=1)
 Total runtime: 362.105 ms
(3 rows)

humbug=> select count(*) from zerver_usermessage where (flags & 1) = 0;
 count
-------
     0
(1 row)
```

Steve Howell: In both trials, we set it up so that only 382 of 9382 rows need to be updated. The first trial runs about 63x as fast. The second trial, if my theory is correct, is doing 24x as many writes as it needs. Both trials are reading all 9382 rows.

Steve Howell: The expense of the update statement seems to be proportional to the number of rows you "update", not the number of rows that you actually change.

Steve Howell: For now I created #1869.

Zev Benjamin: That sounds like a reasonable explanation. The disk IO can be expensive.

(imported from commit d9090daee1f81cad76c430de0956f9bd504da075)
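The difference between the two cases can be reproduced in miniature. The sketch below is an illustration only: it uses Python's built-in `sqlite3` module instead of PostgreSQL, and the table name `usermessage`, the constant `READ_BIT`, and the helper functions are hypothetical stand-ins, not Zulip's schema or the Django code in do_update_message_flags(). It rebuilds the trial setup (9382 rows, bit 1 cleared where `id > 9000`) and compares how many rows each form of the `UPDATE` matches, as reported by `rowcount`. It does not try to reproduce the timings above.

```python
import sqlite3

TOTAL_ROWS = 9382  # same row count as in both trials
READ_BIT = 1       # hypothetical name for flag bit 1 (used in Python only; the SQL uses the literal 1)


def make_table() -> sqlite3.Connection:
    """Build an in-memory stand-in for zerver_usermessage (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE usermessage (id INTEGER PRIMARY KEY, flags INTEGER NOT NULL)")
    # Every row starts with bit 1 set...
    conn.executemany(
        "INSERT INTO usermessage (id, flags) VALUES (?, ?)",
        ((i, READ_BIT) for i in range(1, TOTAL_ROWS + 1)),
    )
    # ...then clear it for ids 9001..9382, leaving 382 rows that need the bit.
    conn.execute("UPDATE usermessage SET flags = flags & ~1 WHERE id > 9000")
    return conn


def rows_needing_update(conn: sqlite3.Connection) -> int:
    """Count rows whose bit 1 is still clear."""
    return conn.execute("SELECT count(*) FROM usermessage WHERE (flags & 1) = 0").fetchone()[0]


def set_bit_unfiltered(conn: sqlite3.Connection) -> int:
    """Case 2: the UPDATE matches every row, including ones that already have the bit."""
    return conn.execute("UPDATE usermessage SET flags = flags | 1").rowcount


def set_bit_filtered(conn: sqlite3.Connection) -> int:
    """Case 1: the extra condition limits the UPDATE to rows that actually change."""
    return conn.execute(
        "UPDATE usermessage SET flags = flags | 1 WHERE (flags & 1) = 0"
    ).rowcount


if __name__ == "__main__":
    for name, update in [("unfiltered", set_bit_unfiltered), ("filtered", set_bit_filtered)]:
        conn = make_table()
        before = rows_needing_update(conn)
        touched = update(conn)
        after = rows_needing_update(conn)
        print(f"{name}: {before} rows needed the bit, UPDATE touched {touched}, {after} left")
```

Run as a script, both variants leave 0 rows without the bit, but the unfiltered statement touches all 9382 rows while the filtered one touches only 382. This matches Steve's theory for PostgreSQL, which writes a new row version for every row an `UPDATE` matches whether or not the stored value changes, so the filter saves writes even though both statements still scan the whole table.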
| Name |
|---|
| `bugdown/` |
| `__init__.py` |
| `actions.py` |
| `alert_words.py` |
| `avatar.py` |
| `bulk_create.py` |
| `cache.py` |
| `cache_helpers.py` |
| `ccache.py` |
| `context_managers.py` |
| `create_user.py` |
| `debug.py` |
| `event_queue.py` |
| `html_diff.py` |
| `initial_password.py` |
| `logging_util.py` |
| `mandrill_client.py` |
| `mention.py` |
| `parallel.py` |
| `query.py` |
| `queue.py` |
| `rate_limiter.py` |
| `response.py` |
| `timeout.py` |
| `timestamp.py` |
| `tornado_ioloop_logging.py` |
| `unminify.py` |
| `upload.py` |
| `utils.py` |