# Emoji

Emoji seem like a simple idea, but there's actually a ton of
complexity that goes into an effective emoji implementation. This
document discusses a number of these issues.

Currently, Zulip supports these four display formats for emoji:

* Google modern
* Google classic
* Twitter
* Plain text

## Emoji codes

The Unicode standard has various ranges of characters set aside for
emoji. So you can put emoji in your terminal using actual Unicode
characters like 😀 and 👍. If you paste those into Zulip, Zulip will
render them as the corresponding emoji image.

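To make "actual Unicode characters" concrete, here is a minimal sketch
that turns an emoji string into a hex code point string. The function
name `emoji_to_hex_code` and the lowercase, hyphen-joined format are
assumptions made for illustration; they are not necessarily the exact
representation Zulip uses internally.

```python
def emoji_to_hex_code(emoji: str) -> str:
    """Return the code points of `emoji` as lowercase hex, joined by '-'.

    Hypothetical helper: the output format is an illustrative convention.
    """
    return "-".join(f"{ord(ch):x}" for ch in emoji)


print(emoji_to_hex_code("😀"))  # 1f600
print(emoji_to_hex_code("👍"))  # 1f44d
```

Some emoji (flags, for example) are sequences of several code points,
which is why the sketch joins the individual values rather than
assuming a single character.
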
However, the Unicode committee did not standardize on a set of
human-readable names for emoji. So, for example, when using the
popular `:`-based style for entering emoji from the keyboard, we have
to decide whether to use `:angry:` or `:angry_face:` to represent an
angry face. Different products use different approaches, but for
purposes like emoji pickers or autocomplete, you definitely want to
pick exactly one of these names, since otherwise users will constantly
see duplicates of a given emoji next to each other.

Picking which emoji name to use is surprisingly complicated! See the
section on [picking emoji names](#picking-emoji-names) below.

### Custom emoji

Zulip supports custom user-uploaded emoji. We manage those by having
the name of the emoji be its "emoji code", and using an `emoji_type`
field to keep track of it. We are in the process of migrating Zulip
to refer to these emoji only by ID, which is a requirement for being
able to support deprecating old realm emoji in a sensible way.

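The difference between the two schemes can be sketched with a small
data structure. This is not Zulip's actual model: the `EmojiRef` class,
the `REALM_EMOJI` label, and both helper functions are hypothetical,
and only the `emoji_type` / "emoji code" terminology comes from the
description above.

```python
from dataclasses import dataclass

# Hypothetical type label; the real values in Zulip may differ.
REALM_EMOJI = "realm_emoji"


@dataclass(frozen=True)
class EmojiRef:
    """Illustrative reference to an emoji as stored on a message or reaction."""

    emoji_type: str
    emoji_code: str
    emoji_name: str


def name_based_ref(name: str) -> EmojiRef:
    # Scheme described above: the custom emoji's code is its name.
    return EmojiRef(REALM_EMOJI, emoji_code=name, emoji_name=name)


def id_based_ref(emoji_id: int, name: str) -> EmojiRef:
    # Target scheme: the code is a stable ID, independent of the name.
    return EmojiRef(REALM_EMOJI, emoji_code=str(emoji_id), emoji_name=name)


# Two different uploads that happen to share a name:
print(name_based_ref("party") == name_based_ref("party"))    # True
print(id_based_ref(1, "party") == id_based_ref(2, "party"))  # False
```

With name-based codes, an old upload and its replacement are
indistinguishable, which is one way to see why ID-based references are
needed before old realm emoji can be deprecated cleanly.
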
## Tooling

We use the [iamcal emoji data package][iamcal] to provide sprite
sheets and individual images for our emoji, as well as a data set of
emoji categories, code points, etc. The sprite sheets are used
by the Zulip web app to display emoji in messages, emoji reactions,
etc. However, we can't use the sprite sheets in some contexts, such
as missed-message and digest emails, that need to have self-contained
assets. For those, we use individual emoji files under
`static/generated/emoji`. That directory contains both actual image
files, named after the Unicode representation of each emoji, and
symlinks pointing to those files.

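The layout described above (real image files named by code point, plus
name-based symlinks pointing at them) can be sketched as follows. This
is a simplified illustration, not the logic of Zulip's tooling: the
function `build_symlink_farm`, the `unicode` subdirectory name, and the
placeholder file contents are assumptions.

```python
import os
import tempfile


def build_symlink_farm(root: str, name_to_code: dict[str, str]) -> None:
    """Create <root>/unicode/<code>.png files and <root>/<name>.png symlinks."""
    unicode_dir = os.path.join(root, "unicode")
    os.makedirs(unicode_dir, exist_ok=True)
    for name, code in name_to_code.items():
        image_path = os.path.join(unicode_dir, f"{code}.png")
        if not os.path.exists(image_path):
            # Placeholder bytes stand in for a real PNG image.
            with open(image_path, "wb") as f:
                f.write(b"placeholder")
        link_path = os.path.join(root, f"{name}.png")
        if os.path.lexists(link_path):
            os.remove(link_path)
        # A relative target keeps the tree valid if it is moved or symlinked.
        os.symlink(os.path.join("unicode", f"{code}.png"), link_path)


with tempfile.TemporaryDirectory() as root:
    build_symlink_farm(root, {"angry": "1f620", "angry_face": "1f620"})
    print(sorted(os.listdir(root)))
    print(os.readlink(os.path.join(root, "angry_face.png")))
```

Several names can point at the same image file, so supporting an extra
name costs one symlink rather than a second copy of the image.
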
We need to maintain those symlinks both for the names used in the
iamcal emoji data set and for the names in our old emoji data set
(`emoji_map.json`). Zulip has a tool, `tools/setup/emoji/build_emoji`,
that combines the `emoji.json` file from iamcal with the old
`emoji_map.json` data set to construct the various symlink farms and
output files described below that support our emoji experience.

The `build_emoji` tool generates the set of files under
`static/generated/emoji`. More precisely, it generates the
`/srv/zulip-emoji-cache/<sha1>/emoji` tree, and
`static/generated/emoji` is a symlink to that tree. We do this in
order to cache old versions, which makes provisioning and production
deployments super fast in the common case that we haven't changed the
emoji tooling. See [our dependencies document](../subsystems/dependencies.md)
for more details on this strategy.

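The caching strategy can be sketched roughly as follows. This is an
illustration under stated assumptions, not the actual `build_emoji`
code: the helper names `hash_inputs` and `ensure_emoji_tree` are
hypothetical, and hashing only the contents of the input files is a
guess at what feeds the `<sha1>` component.

```python
import hashlib
import os
import tempfile
from typing import Callable


def hash_inputs(paths: list[str]) -> str:
    """SHA-1 over the contents of the input files, in a stable order."""
    sha = hashlib.sha1()
    for path in sorted(paths):
        with open(path, "rb") as f:
            sha.update(f.read())
    return sha.hexdigest()


def ensure_emoji_tree(
    cache_root: str,
    link_path: str,
    input_paths: list[str],
    build: Callable[[str], None],
) -> bool:
    """Point link_path at <cache_root>/<sha1>/emoji; return True if rebuilt."""
    target = os.path.join(cache_root, hash_inputs(input_paths), "emoji")
    rebuilt = False
    if not os.path.isdir(target):
        # Cache miss: build a fresh tree. (A real tool would also need to
        # cope with a build that fails partway through.)
        os.makedirs(target)
        build(target)
        rebuilt = True
    if os.path.lexists(link_path):
        os.remove(link_path)
    os.symlink(target, link_path)
    return rebuilt


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "emoji.json")
    with open(source, "w") as f:
        f.write("[]")

    def fake_build(target_dir: str) -> None:
        with open(os.path.join(target_dir, "emoji_codes.json"), "w") as out:
            out.write("{}")

    cache = os.path.join(tmp, "cache")
    link = os.path.join(tmp, "generated-emoji")
    print(ensure_emoji_tree(cache, link, [source], fake_build))  # True
    print(ensure_emoji_tree(cache, link, [source], fake_build))  # False
```

The second call finds the tree for the same hash already in the cache,
so it only re-points the symlink; that is the "common case" the
paragraph above describes.
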
The emoji tree generated by this process contains several important elements:

* `emoji_codes.json`: A set of mappings used by the Zulip frontend to
  understand what Unicode emoji exist and what their shortnames are,
  used for autocomplete, emoji pickers, etc. This has been
  deduplicated using the logic in
  `tools/setup/emoji/emoji_setup_utils.py` to generally only have
  `:angry:` and not also `:angry_face:`, since having both is ugly and
  pointless for purposes like autocomplete and emoji pickers.
* `images/emoji/unicode/*.png`: A farm of emoji image files, named
  after the Unicode representation of each emoji.
* `images/emoji/*.png`: A farm of symlinks from emoji names to the
  `images/emoji/unicode/` tree. This is used to serve individual emoji
  images, as well as for the
  [backend Markdown processor](../subsystems/markdown.md) to know which emoji
  names exist and what Unicode emoji / images they map to. In this
  tree, we currently include all of the emoji in `emoji_map.json`;
  this means that if you send `:angry_face:`, it won't autocomplete,
  but it will still work (though not in previews).
* Some CSS and PNGs for the emoji spritesheets, used in Zulip for
  emoji pickers where we would otherwise need to download over 1000
  individual emoji images (which would cause a browser performance
  problem). We have multiple spritesheets: one for each emoji
  provider that we support (Google, Twitter, EmojiOne, and Apple).

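The deduplication idea behind `emoji_codes.json` can be sketched as
picking one name per code point. The function below is a hypothetical
illustration; the real rules live in
`tools/setup/emoji/emoji_setup_utils.py` and may well differ. In
particular, the `preferred` set and the alphabetical tie-break are
assumptions.

```python
def dedupe_names(name_to_code: dict[str, str], preferred: set[str]) -> dict[str, str]:
    """Return a mapping of code point -> single name to show in pickers.

    A name listed in `preferred` wins over other names for the same code
    point; otherwise the alphabetically first name is kept.
    """
    code_to_name: dict[str, str] = {}
    for name in sorted(name_to_code):
        code = name_to_code[name]
        current = code_to_name.get(code)
        if current is None or (name in preferred and current not in preferred):
            code_to_name[code] = name
    return code_to_name


all_names = {"angry": "1f620", "angry_face": "1f620", "thumbs_up": "1f44d"}
print(dedupe_names(all_names, preferred={"angry"}))
```

The full `all_names` mapping corresponds to what the symlink farm needs
(every accepted name), while the deduplicated result corresponds to
what autocomplete and pickers should display.
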
[iamcal]: https://github.com/iamcal/emoji-data

## Picking emoji names

I think it is fair to say Zulip has by far the best set of emoji names of
any product at the time of writing. If you find an emoji name you don't
like, or think one is missing, please let us know!

The following set of considerations is not comprehensive, but has a few
principles that were applied to the current set of names. We use (strong),
(medium), and (weak) to denote how strong a consideration it is.

* Even with over 1000 symbols, emoji feels surprisingly sparse as a language,
  and more often than not, if you search for something, you don't find an
  appropriate emoji for it. So a primary goal for our set of names is to
  maximize the number of situations in which the user finds an emoji that
  feels appropriate. (strong)

* Conversely, we remove generic words that will gum up the typeahead. So
  `:outbox:` instead of `:outbox_tray:`. Each word should count. (medium)

* We aim for the set of names to be as widely culturally applicable as
  possible, even if the glyphs are not. So `:statue:` instead of
  `:new_york:` for the Statue of Liberty, and `:tower:` instead of
  `:tokyo_tower:`. (strong)

* We remove unnecessary gender descriptions. So `:ok_signal:` instead of
  `:ok_woman:`. (strong)

* We don't add names that could be inappropriate in school or work
  environments, even if the use is common on the internet. For example, we
  have not added `:butt:` for `:peach:`, or `:cheers:` for
  `:beers:`. (strong)

* Names should be compatible with the four emoji sets we support, but don't
  have to be compatible with any other emoji set. (medium)

* We try not to use a creative canonical_name for emoji that are likely to
  be familiar to a large subset of users. This largely applies to certain
  faces. (medium)

* The set of names should be compatible with the iamcal, gemoji, and Unicode
  names. Compatible here means that if there is an emoji name a user knows
  from one of those sets, and the user searches for the key word of that
  name, they will get an emoji in our set. It is okay if this emoji has a
  slightly different name or codepoint from the names/codepoints in the
  other sets. (weak)

Much of the work of picking names went into the first bullet above: making
the emoji language less sparse. Some tricks and heuristics that were used
for that:

* There are many near duplicates, like `:dog:` and `:dog_face:`, or
  `:mailbox:`, `:mailbox_with_mail:`, and `:mailbox_with_no_mail:`. In these
  cases we repurpose the duplicates to be as useful as we can, like `:dog:`
  and `:puppy:`, and `:mailbox:`, `:unread_mail:`, `:inbox_zero:` for the
  ones above. There isn't a ton of flexibility, since we can't change the
  glyphs. But in most cases we have been able to come up with something.

* Many emoji have commonly understood meanings among people who use emoji a
  lot, and there are websites and articles that document some of these
  meanings. A commonly understood meaning can be a great thing to add as an
  alternate name, since often it is a sign that the meaning is addressing a
  real gap in the emoji system.

* Many emoji names are unnecessarily specific in iamcal/etc, like
  `:flower_playing_cards:`, `:izakaya_lantern:`, or `:amphora:`. Renaming
  them to `:playing_cards:`, `:lantern:`, and `:vase:` makes them more
  widely usable. In such cases we often keep the specific name as an
  alternate.

* If there are natural things someone might type, like `:happy:`, we try to
  find an emoji to match. This extends to things that someone might not
  think to type, but that could get wide use as soon as someone in the
  organization discovers them, like `:working_on_it:`. Good future work
  would be to collect (by survey or tooling) things people type into the
  emoji picker typeahead on chat.zulip.org, and find ways to add those
  names as alternates.

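To illustrate how alternate names help with sparseness, here is a
minimal typeahead sketch that searches canonical and alternate names
but returns each emoji only once, under its canonical name. The
`typeahead` function, its ranking (prefix matches before substring
matches), and the sample data are assumptions for illustration, not
Zulip's actual typeahead implementation.

```python
def typeahead(query: str, emoji: dict[str, list[str]], limit: int = 5) -> list[str]:
    """Search canonical and alternate names; return canonical names only.

    `emoji` maps each canonical name to its list of alternate names.
    """
    query = query.lower()
    prefix_hits: list[str] = []
    substring_hits: list[str] = []
    for canonical, alternates in emoji.items():
        names = [canonical, *alternates]
        if any(name.startswith(query) for name in names):
            prefix_hits.append(canonical)
        elif any(query in name for name in names):
            substring_hits.append(canonical)
    return (sorted(prefix_hits) + sorted(substring_hits))[:limit]


# Sample data based on the renames mentioned above.
sample = {
    "lantern": ["izakaya_lantern"],
    "vase": ["amphora"],
    "playing_cards": ["flower_playing_cards"],
}
print(typeahead("amph", sample))  # ['vase']
print(typeahead("la", sample))    # ['lantern', 'playing_cards']
```

A user who only knows the specific name (`amphora`) still finds the
emoji, while the picker shows one entry per glyph rather than one per
name.
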
Other notes:

* Occasionally there are near duplicates where we don't have ideas for
  useful names for the second one. In that case we sometimes remove the
  emoji rather than have two nearly identical glyphs in the emoji picker and
  typeahead. For instance, we kept `:spiral_notepad:` and dropped
  `:spiral_calendar_pad:`. If the concepts are near duplicates but the sets
  of glyphs look very different, we'll find two names that allow them both
  to stay.

* We removed many of the moons and clocks, to make the typeahead experience
  better when searching for something that catches all the moons or all the
  clocks. We kept all the squares and diamonds and other shapes, even though
  they have the same problem, since they are commonly used to make emoji art
  on Twitter, and could conceivably be used the same way on Zulip.