Before this fix, the installer has an extremely annoying bug where
when run inside a container with `lxc-attach`, when the installer
finishes, the `lxc-attach` just hangs and doesn't respond even to
C-c or C-z. The only way to get the terminal back is to root around
from some other terminal to find the PID and kill it; then run
something like `stty sane` to fix the messed-up terminal settings
left behind.
After bisecting pieces of the install script to locate which step
was causing the issue, it comes down to the `service camo restart`.
The comment here indicates that we knew about an annoying bug here
years ago, and just swept it under the rug by skipping this step
when in Travis. >_<
The issue can be reproduced by running simply `service camo restart`
under `lxc-attach` instead of the installer; or `service camo start`,
following a `service camo stop`. If `lxc-attach` is used to get an
interactive shell, these commands appear to work fine; but then when
that shell exits, the same hang appears. So, when we start camo
we're evidently leaving some kind of mess that entangles the daemon
with our shell.
Looking at the camo initscript where it starts the daemon, there's
not much code, and one flag jumps out as suspicious:
start-stop-daemon --start --quiet --pidfile $PIDFILE -bm \
--exec $DAEMON --no-close -c nobody --test > /dev/null 2>&1 \
|| return 1
start-stop-daemon --start --quiet --pidfile $PIDFILE -bm \
--no-close -c nobody --exec $DAEMON -- \
$DAEMON_ARGS >> /var/log/camo/camo.log 2>&1 \
|| return 2
What does `--no-close` do?
-C, --no-close
Do not close any file descriptor when forcing the daemon
into the background (since version 1.16.5). Used for
debugging purposes to see the process output, or to
redirect file descriptors to log the process output.
And in fact, looking in /proc/PID/fd while a hang is happening finds
that fd 0 on the camo daemon process, aka stdin, is connected to our
terminal.
So, stop that by denying the initscript our stdin in the first place.
This fixes the problem.
The Debian maintainer turns out to be "Zulip Debian Packaging Team",
at debian@zulip.com; so this package and its bugs are basically ours.
This is a tool that throws away `fsync` calls and other requests for
the system to sync files to disk. It may make the install faster; for
example, if it has to install a number of system packages, `dpkg` is
known to make a lot of `fsync` calls which slow things down
significantly. Conversely, if there's a power failure in the middle
of running a test install, we really don't mind if the test install's
data becomes corrupt.
When the install script is successful, one of the final things it
wants to do is to move the tree that Zulip was installed from into the
deployments directory. It can't do that, at least not in a naive way
with `mv`, if the tree is actually a mount point. So, stick the tree
inside some other directory that we create just for the purpose of
being the mount point and containing the install tree.
This saves several minutes off the install time. Sadly pip still
clones Git repos for dependencies that point to them, but for many
others (not all? not sure) it just gets a wheel from the cache.
The third-party `install-yarn.sh` script uses `curl`, and we invoke it
in `install-node`. So we need to install it as a dependency.
We've mostly gotten away with this because it's common for `curl` to
already be installed; but it isn't always.
This greatly simplifies iterating on changes to the installer and
associated code: just edit in the shared directory (or edit in your
worktree and rsync to the directory), and rerun.
With this change, the form with a directory is now really the main
way to run the script; the form accepting a tarball is really just
a convenience feature, unpacking the tarball and then proceeding with
that directory.
This will facilitate testing interesting installer features
using its own CLI.
On my laptop, with a recent base image (updated a few days ago with
`prepare-base`), it takes just 7 or 8 seconds to get to the installer
running, as timed by passing `--help` so that the installer promptly
exits.
In order to do development on the installer itself in a sane way,
we need a reasonably fast and automatic way to get a fresh environment
to try to run it in.
This calls for some form of virtualization. Choices include
* A public cloud, like EC2 or Digital Ocean. These could work, if we
wrote some suitable scripts against their APIs, to manage
appropriate base images (as AMIs or snapshots respectively) and to
start fresh instances/droplets from a base image. There'd be some
latency on starting a new VM, and this would also require the user
to have an account on the relevant cloud with API access to create
images and VMs.
* A local whole-machine VM system (hypervisor) like VirtualBox or
VMware, perhaps managing the configuration through Vagrant. These
hypervisors can be unstable and painfully slow. They're often the
only way to get development work done on a Mac or Windows machine,
which is why we use them there for the normal Zulip development
environment; but I don't really want to find out how their
instability scales when constantly spawning fresh VMs from an image.
* Containers. The new hotness, the name on everyone's lips, is Docker.
But Docker is not designed for virtualizing a traditional Unix server,
complete with its own init system and a fleet of processes with a
shared filesystem -- in other words, the platform Zulip's installer
and deployment system are for. Docker brings its own quite
different model of deployment, and someday we may port Zulip from
the traditional Unix server to the Docker-style deployment model,
but for testing our traditional-Unix-server deployment we need a
(virtualized) traditional Unix server.
* Containers, with LXC. LXC provides containers that function as
traditional Unix servers; because of the magic of containers, the
overhead is quite low, and LXC offers handy snapshotting features
so that we can quickly start up a fresh environment from a base
image. Running LXC does require a Linux base system. For
contributors whose local development machine isn't already Linux,
the same solutions are available as for our normal development
environment: the base system for running LXC could be e.g. a
Vagrant-managed VirtualBox VM, or a machine in a public cloud.
This commit adds a first version of such a thing, using LXC to manage
a base image plus a fresh container for each test run. The test
containers function as VMs: once installed, all the Zulip services run
normally in them and can be managed in the normal production ways.
This initial version has a shortage of usage messages or docs, and
likely has some sharp edges. It also requires familiarity with the
basics of LXC commands in order to make good use of the resulting
containers: `lxc-ls -f`, `lxc-attach`, `lxc-stop`, and `lxc-start`,
in particular.
Stripe Checkout means using JS code provided by Stripe to handle
almost all of the UI, which is great for us.
There are more features we should add to this page and changes we
should make, but this gives us an MVP.
[greg: expanded commit message; fixed import ordering and some types.]
This is fairly often -- though not always! -- failing, with a nasty
failure mode where it takes like 6 minutes to time out. See
discussion on #7748 (search for "bad link").
Actually, after seeing it happen just now when running
test-documentation on my laptop, on some other link, it occurs to me
that I've seen this before -- it's fairly common in Travis, too. It's
just that it doesn't actually cause the build to fail :-/, and on
Travis we haven't been paying as close attention to slow builds as we
are on Circle right now.
Generally stderr is the conventional place for this sort of running
commentary, and it's better set up for it: by default stdout may have
a buffer inside the process so that things written to it don't reach
the outside until later, while stderr is always by default unbuffered,
so messages are printed immediately.
Here, until the previous commit, because our color-reset sequence was
being printed without a following newline (with `echo -n`), it was
getting buffered; and then error messages from `scrapy` to stderr were
being erroneously painted with the color intended for the message
"Testing links in documentation...".
The autoreload code of Django works by looping over the files associated
with all the loaded modules. This loop is run after every 1 second. If
the file is found for the first time by the loop, it is assumed that the
file is new and is not modified between the time it is loaded and is
checked by the loop. This assumption is the source of a race condition.
We can either implement a more sensitive version of the loop or we can
just allow enough time to the Django loop to touch every file at least
once.
For the time being, we are going with the second option.
Previously, there were following problems with the implmentation:
* Same file handle was being used to read and write. We used to do
`seek(0)` and then `read()`. This had a chance to overwrite
file data. Now we use different file handles to read and write data.
* We were using text streams. Text streams cannot be used with
`bufferring=0`. Now we use binary streams without buffering so that
data is available for reading without any delay.
This commit also updates the key(s) that we search in the logfile.
Previously, launch of all queues was announced in the log, now we only
anounce the number of threads that were launched.
This commit also makes sure that we always exit after gracefull shutting
down the development server.
Previously, there were following problems with the implmentation:
* Same file handle was being used to read and write. We used to do
`seek(0)` and then `read()`. This had a chance to overwrite file
data. Now we use different file handles to read and write data.
* We were using text streams. Text streams cannot be used with
`bufferring=0`. Now we use binary streams without buffering so that
data is available for reading without any delay.
This commit just copies all the code from MissedMessageSendingWorker
class to a new EmailSendingWorker class. All the logic to send an email
through a queue was already there. This commit only makes the logic
generic. It does so by creating a special purpose queue called
'email_senders' to send any type of email. To make
MissedMessageSendingWorker still work we derive it from
EmailSendingWorker. All the tests that were testing
MissedMessageSendingWorker now run against EmailSendingWorker.
We get the following error (edited slightly):
Dec 19 06:13:27 commit_messages| An error occurred while executing
'/usr/bin/git rev-list --max-count=-1 upstream/master..HEAD':
fatal: ambiguous argument 'upstream/master..HEAD':
unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
We'll need to adjust the remotes to make `upstream` mean what we expect.
This isn't really the right way to do this -- commit
dcd80e665 "travis/setup-backend: Remove the '--travis' flag"
took us in the wrong direction by introducing more magic, deeper in
the stack.
But it's the same way we do it for Travis. For now, just copy that.
[Thanks to hackerkid for cleaning up my original crude hack.]
At this point if we were accidentally using `/srv/zulip-venv` for
anything, we'd have run into it by now. So just drop the bit of
historical logic that we had to ensure that.
This reverts commit 66261f1cc. See parent commit for reason; here,
provision worked but `tools/run-dev.py` would give errors.
We need to figure out a test that reproduces these issues, then make a
version of these changes that keeps that test working, before we
re-merge them.
We should omit these for mypy. For most class definitions,
mypy doesn't need `Any`, and it provides no real useful info.
For clever monkeypatches, you should provide a more specific
type than `Any`.
These commands are super boring standard Docker commands,
so this probably isn't helpful for anyone who is familiar
with building Docker images... but I had to consult docs
to work out the right commands again today, so they'd help me.
Provision was failing at tools/setup/postgres-init-dev-db
with this in the log:
Sep 20 02:27:01 + sudo -i -u postgres psql ''
[sudo] password for circleci:
The issue is that the old version of this line (from Circle upstream)
only lets the `circleci` user sudo to root -- not to other users, or
not directly anyway -- because sudoers syntax is complicated. Fix it,
after studying `man sudoers`.
This is only an optimization -- if this list is missing anything,
we'll get to it in `provision` in the actual build. That's important,
because we want an existing image to work fine for testing new
versions of our codebase, including changes that may install more
packages in `provision`.
What this does accomplish is keeping provision's `apt-get install`
fast, by leaving it very little work to do.
The list comes from looking at the APT output during provision in an
actual run without this step, and leaving out two packages which
aren't available at this stage, because we get them from PPAs:
postgresql-9.3-pgroonga and postgresql-9.3-tsearch-extras.
Install `jq` with APT -- that's a lot simpler to read than this
explicit download.
And coalesce several commands, following Docker upstream's
recommendation and avoiding unnecessary overhead.
This is nearly the same as Circle's version, linked in the comment.
I've
* changed the FROM line to get Ubuntu,
* added a couple of distro packages to compensate, and
* revised the comments.
This seems to have been causing the travis production suite to fail.
It's a direct consequence of removing travis' giant library of apt
sources.list files; now that those are gone, there aren't copies of
all these extra packages available anyway.
This method was new in Tornado 4.0. It saves us from having to get
the time ourselves and do the arithmetic -- which not only makes the
code a bit shorter, but also easier to get right. Tornado docs (see
http://www.tornadoweb.org/en/stable/ioloop.html) say we should have
been getting the time from `ioloop.time()` rather than hardcoding
`time.time()`, because the loop could e.g. be running on the
`time.monotonic()` clock.
tools/setup/generate_zulip_bots_static_files now starts off by
deleting static/generated/bots/ (if it hasn't been removed already)
so that outdated static files from older versions of the zulip_bots
package don't supress errors in the main repo that would otherwise
break.
For more info, see #7542Fixes: #7542.
The `re.match` function in the Python stdlib is a trap for the unwary,
with surprising and asymmetrical semantics; we should probably add a
lint rule to ban it entirely. The docstring says:
> Try to apply the pattern at the start of the string, [...]
In other words, it effectively adds a `^` at the start (or `\A`, where
the distinction matters.) It's bad enough that this differs from what
grep, sed, perl, less, and every other tool I can think of do when
looking for matches to a regex; on top of that, it treats the
beginning of the string differently from the end, for no obvious
reason. The function that does what the rest of the world understands
by "match against this regex" is `re.search`.
In this case, it's unlikely that anyone intended for comments with
URLs, or `api_url` references, to miss out on their respective
exceptions to the long-line rule if they happen to start after the
first column. So fix those rules by just switching to `re.search`
with the same pattern.
I think Markdown URL references may have to start at the beginning of
the line, so I've left a `^` there to preserve -- but now make
explicit -- the `re.match` behavior.
Tweaked by tabbott to move changes from the next commit that are
required for this to pass tests into this commit.
Note that this exports a few items that were not previously exported.
This is part of our efforts to change our integrations/webhooks
docs to follow the same sort of numbered-list format as our /help
docs. In order to indicate that paragraphs separated by newlines
are part of the same numbered-list point, every paragraph must be
indented 4 spaces.
This both improves the comment to be more readable, and also uses the
new and improved exclude feature to limit the exclusion to just the
webhook fixtures (where it's needed).
Also fixes a mypy error.
The previous exclude rules only allowed excluding a directory (and
things in subdirectories would silently still be linted). Anyone
using this would expect it to exclude a directory tree, so we make it
do that.
The readthedocs theme overrides a few settings in their layout template.
We might want to change some settings back to their default values.
This commit copies the original readthedocs layout file from
https://github.com/rtfd/sphinx_rtd_theme/blob/master/sphinx_rtd_theme/layout.html
to _templates/layout.html, and excludes it from lint and template checks.
Addresses #7417.
This fixes a bug where, when a user is unsubscribed from a stream,
they might have unread messages on that stream leak. While it might
seem to be a minor problem, it can cause significant problems for
computing the `unread_msgs` data structures, since it means we need to
add an extra filter for whether the user is still subscribed, either
in the backend or in the UI.
Fixes#7095.
This commit modifies `test-locked-requirements` to use some caching
so that we don't need to use the `update-locked-requirements` tool
everytime for checking the validity of locked requirements as it is
slow.
Fixes: #6969.
This commit renames various source requirements files like `dev.txt`,
`mypy.txt` etc to `dev.in`, `mypy.in` etc and various locked requirements
files like `dev_lock.txt`, `mypy_lock.txt` etc to `dev.txt`, `mypy.txt`
etc. This will help in emphasizing to the user that *.in are actually
input to `update-locked-requirements` tool which should be run after
updating any of these.
This often can cause minor caching problems.
Obviously, it'd be better if we had access to the AST and thus could
do this rule for UserProfile objects in general.
This prevents the caches in /srv from growing to fill up the disk --
e.g., on my laptop after 6 months of regular development the venv cache
was 12G and the NPM cache 5G, making them by far the largest disk hogs
on the machine.
It costs about 0.4s, apart from any time spent actually removing
things. This is a little annoyingly slow to be adding to every
provision, and seems like it could be optimized, but I think already
worth it as is.
We already do this by default in tools/build-docs, but since we
migrated test-documentation to not run that directly (to disable
collapsing), we need to add the recent parallelism fix here too.
It saves about 5-10s when running this test suite for me, which is
good, but definitely leaves me feeling like there could be more
improvement.
In this commit we add new dependencies needed for running thumbor.
Also we add the script for creating the virtual environment ready
for thumbor.
Note: Thumbor will use python2 and thus have different virtualenv
dedicated to it.
Credits to @TigorC and @joshland as well for there work on this.
In this commit we add a new option which could be used to specify
python version. When 'py2' is specified, future/futures are not
removed from the requirements lock file generated.
Even where this is actually used for a temporary checkout, it obscures
the relationship between this and $TMPDIR -- and some of our logic
depends on that. In other places, it isn't actually even a checkout.
In all cases, the expanded version is clearer.
This script, and tools/update-prod-static which it relies on,
have kept getting more complex since this conditional was added
in 2013, and the places we rely on GNU features have probably
multiplied beyond `mktemp -d`. It's unlikely this works on
macOS with BSD tools now, and it'd be hard to maintain that way
if it did; drop the pretense.
There's no need to remove this file here -- the whole tree will be
removed a few commands later, and the `tar` command we do first, to
supplement our tarball with various generated files, is quite
selective and wouldn't look at this file anyway.
With the new portico work we've done, the help documentation does
sorta depend on the database if you're logged in. So it's best to
just require it for these tests.
This commit helps reduce clutter on the navigation sidebar.
Creates new directories and moves relevant files into them.
Modifies index.rst, symlinks, and image paths accordingly.
This commit also enables expandable/collapsible navigation items,
renames files in docs/development and docs/production,
modifies /tools/test-documentation so that it overrides a theme setting,
Also updates links to other docs, file paths in the codebase that point
to developer documents, and files that should be excluded from lint tests.
Note that this commit does not update direct links to
zulip.readthedocs.io in the codebase; those will be resolved in an
upcoming follow-up commit (it'll be easier to verify all the links
once this is merged and ReadTheDocs is updated).
Fixes#5265.
The CSS linter was pretty hard to reason about. It was
pretty flexible about certain things, but then it would
prevent seemingly innocuous code from getting checked in.
This commit overhauls the pretty-printer to be more composable,
where every object in the AST knows how to render itself. It
also cleans up a little bit of the pre_fluff/post_fluff logic
in the parser itself, so comments are more likely to be "attached"
to the AST node that make sense.
The linter is actually a bit more finicky about newlines, but
this is mostly a good thing, as most of the variations before
this commit were pretty arbitrary.
This adds the "--disallow-any=generics" option to run-mypy, which no
longer permits:
- inheriting from "list"; use "List[sometype]" (or a TypeVar)
- generic types with no following square brackets specifying the type
(even if initially 'Any')
Any (and '...' for Callable) is a lot easier to search for than an
absence of square brackets, and should improve overall typing quality.
In addition to decreasing the excessive number of bundles we had, this
will set us up to fix rendering of code blocks when clicking the
sidebar links in the /api-new site.
This commit allows for the /api-new/ page to rendered similarly to our
/help pages. It's based on the old content for /api, but we're not
replacing the old content yet, to give a bit of time to restructure
things reasonably.
Tweaked by eeshangarg and tabbott.
The "subdomain" label is redundant, to the extent it's even
accurate -- this is really just the URL we want to display,
which may or may not involve a subdomain. Similarly "external".
The former `external_api_path_subdomain` was never a path -- it's a
host, followed by a path, which together form a scheme-relative URL.
I'm not quite convinced that value is actually the right thing in
2 of the 3 places we use it, but fixing that can start by giving an
accurate name to the thing we have.
I'd much rather see something like
if (thing_is_permissible(user, thing)
or (user_possesses_hammer(user)
and glass_break_requested(thing))):
than
if (thing_is_permissible(user, thing) or
(user_possesses_hammer(user) and
glass_break_requested(thing))):
because the former makes the overall logic much easier to scan.
Similarly for a formula full of arithmetic rather than Boolean
operators. And the actual PEP 8 agrees (though until 2016 it
unfortunately had the opposite advice.)
The upstream linter still applies the backward rule, so disable that.
This creates a dropdown in place of the normal register/login links
you get when logged out, with an option to go to the app or log out if
that appears you click on the avatar.
A bit more work is needed to make this look really good, but it's a
great start.
Sparkle was the auto-update system used by the legacy desktop app. We
haven't been capable of using it for auto-update in years, so there's
no reason to keep around the configuration.
The new Electron app uses a different system anyway.
Except in:
- docs/writing-bots-guide.md, because bots are supposed to be Python 2
compatible
- puppet/zulip_ops/files/zulip-ec2-configure-interfaces, because this
script is still on python2.7
- tools/lint
- tools/linter_lib
- tools/lister.py
For the latter two, because they might be yanked away to a separate repo
for general use with other FLOSS projects.
This should mean that maintaining two Zulip development environments
using the same Git checkout no longer has caching problems keeping
track of the migration status.
This didn't work at all when one did a `vagrant destroy` and then
`vagrant up`, because the cache state would be preserved even though
the machine is gone.
Fixes#5981.
This commit adds a test to check if the user forgot to run
`tools/update-locked-requirements` after updating dependencies.
Modified by tabbott to disable it by default, since it takes over a
minute to run.
Fixes: #6324.
Unfortunately, GitHub's web UI for generating release tarballs uses
`.gitattributes` to control what files to download, and thus if you
downloaded a source tarball for older Zulip versions using the GitHub
web UI, you'd be missing important files.
We fix this for future releases by moving the blacklist out of
.gitattributes.
Fixes#129.
It appears the mongodb repo is not accessible by Travis CI right now.
This is sadly our problem, because Travis puts a bunch of crap in
their apt `sources.list` file, so `apt-get update` starts failing.
This function was extracted from build_user_sidebar(). We
also slightly streamlined it to not unnecessarily call
filter() when the filter text was blank. This extraction
also eliminated the need for us to have the two-line
filter_and_sort() function.
Also, we get to 100% coverage in this commit.
This lint rule has bitten me a couple of times in working on logging.
These regex rules will inevitably be heuristic, but we can make it a bit more
specific so that the heuristic mainly means it could occasionally miss
something, rather than get in the way with an obviously wrong complaint.
This has a ton of exclude rules, for two reasons:
(1) We haven't been particularly systematic about avoiding unnecessary
inline style in the past, so there's a lot of code we need to fix.
(2) There are cases where one wants to dynamically compute style
rules. For the latter category, ideally we'd figure out a way to
exclude these automatically (e.g. checking for mustache tags in the
style tag).
We just learned we should be using the "onlytranslated" mode of
Transifex. Since the command is getting a bit complex (and you need
to remember to run `makemessages` first), it makes sense to have a
tool for it.
Emojis which are represented by a sequence of codepoints or emojis
with ZWJ are not included until we implement a mechanism for dealing
with their unicode versions.
Fixes: #6279.
While running the mypy script we were not passing the `--force`
argument correctly to the run-mypy which was causing it complain
about provision status.
Tweaked by tabbott to not remove it from lister.py, linter_lib, and
friends, since those are intended to support both Python 2 and 3
(we're planning to extract them from the repository).
This hack was used to fix the broken number emojis in the emoji picker.
It was broken because of the partial migration to the iamcal dataset.
See issue #4775 for more details.
This hack was used to fix the broken flag emojis in emoji-picker.
It was broken due to the incomplete migration to iamcal dataset.
See issue #4775 for more details.
This commit switches to use sprite sheets for rendering emojis
in all the remaining places, i.e., message bodies and composebox
typeahead. This commit also includes some changes to notifications.py
file so that the spans used for rendering emojis can be converted
to corresponding image tags so that we don't break the emoji rendering
in missed message emails since we can't use sprite sheets there.
As part of switching the bugdown system to use sprite sheets, we need
to switch the name_to_codepoint mappings to match the new sprite
sheets. This has the side effect of fixing a bunch of emoji like
numbers and flag emoji in the emoji pickers.
Fixes: #3895.
Fixes: #3972.
These are long enough to still be self-explanatory (the only one I'm
at all in doubt about there is DEBG; I avoided "DBUG" because it reads
"BUG" which suggests a high-priority message, and those are the
opposite of that), while saving a good bit of horizontal space
vs. padding everything to the 8 characters of "CRITICAL".
Also add a linter exception to allow easy-to-read alignment here,
similar to several existing exceptions for other alignment cases.
This doesn't yet do much, but it gives us a suitable place to
add code to customize how log messages are displayed, beyond what
a format string passed to the default formatter can do.
Printing the version in Travis builds will help in debugging when we
get different results there from locally. The new `--version` path
also gives us a handy place to put the "what mypy command are we running"
diagnostic, getting it out of the way of normal interactive use.
Most CLI tools (including GNU tools and Mercurial) use lowercase
sentence fragments, with no period, for option glosses, so we
follow that style. Also make the voice and wording consistent,
and the quote type in the Python source.
This exclusion was getting snuck in at the end bypassing --all (and so
giving the lie to the --help documentation). There is no "stubs"
directory in the tree in any case.
argparse has reasonable default behavior for the `dest` argument,
and for `default` at least in these two obvious cases.
So cut those out where we're just doing the default thing.
Also rewrap a couple of calls to fit at least in 100 columns.