# Emoji

Emoji seem like a simple idea, but there's actually a ton of
complexity that goes into an effective emoji implementation.  This
document discusses a number of these issues.

Currently, Zulip supports these four display formats for emoji:

* Google modern
* Google classic
* Twitter
* Plain text

## Emoji codes

The Unicode standard has various ranges of characters set aside for
emoji.  So you can put emoji in your terminal using actual Unicode
characters like 😀 and 👍.  If you paste those into Zulip, Zulip will
render them as the corresponding emoji image.
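
As a rough illustration of how a pasted emoji character can be tied to
an image, the sketch below converts an emoji string into a lowercase,
hyphen-joined hex code point string.  The function name
`emoji_to_hex_code` and the exact string format are assumptions made
for this example; they are not taken from Zulip's code.

```python
def emoji_to_hex_code(emoji: str) -> str:
    """Return the code points of `emoji` as lowercase hex, joined by hyphens.

    Hypothetical helper for illustration only; the output format is an
    assumption, not necessarily what Zulip uses internally.
    """
    return "-".join(f"{ord(char):x}" for char in emoji)


if __name__ == "__main__":
    # A single-code-point emoji maps to a single hex value.
    print(emoji_to_hex_code("😀"))
    print(emoji_to_hex_code("👍"))
```

A string like this could then serve as a lookup key or image filename,
independent of whichever human-readable name is chosen for the emoji.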

However, the Unicode committee did not standardize on a set of
human-readable names for emoji.  So, for example, when using the
popular `:` based style for entering emoji from the keyboard, we have
to decide whether to use `:angry:` or `:angry_face:` to represent an
angry face.  Different products use different approaches, but for
purposes like emoji pickers or autocomplete, you definitely want to
pick exactly one of these names, since otherwise users will always be
seeing duplicates of a given emoji next to each other.

Picking which emoji name to use is surprisingly complicated! See the
section on [picking emoji names](#picking-emoji-names) below.

### Custom emoji

Zulip supports custom user-uploaded emoji.  We manage those by having
the name of the emoji be its "emoji code", and using an `emoji_type`
field to keep track of it.  We are in the process of migrating Zulip
to refer to these emoji only by ID, which is a requirement for being
able to support deprecating old realm emoji in a sensible way.

## Tooling

We use the [iamcal emoji data package][iamcal] to provide sprite
sheets and individual images for our emoji, as well as a data set of
emoji categories, code points, etc.  The sprite sheets are used
by the Zulip webapp to display emoji in messages, emoji reactions,
etc.  However, we can't use the sprite sheets in some contexts, such
as missed-message and digest emails, that need to have self-contained
assets.  For those, we use individual emoji files under
`static/generated/emoji`.  That directory contains both actual image
files named after the Unicode representation of each emoji and
symlinks pointing to those image files.

We need to maintain those both for the names used in the iamcal emoji
data set as well as our old emoji data set (`emoji_map.json`).  Zulip
has a tool, `tools/setup/emoji/build_emoji`, that combines the
`emoji.json` file from iamcal with the old `emoji_map.json` data set
to construct the various symlink farms and output files described
below that support our emoji experience.

The `build_emoji` tool generates the set of files under
`static/generated/emoji`.  More precisely, it generates the
`/srv/zulip-emoji-cache/<sha1>/emoji` tree, and
`static/generated/emoji` is a symlink to that tree.  We do this to
cache old versions, which makes provisioning and production
deployments very fast in the common case that we haven't changed the
emoji tooling.  See [our dependencies document](../subsystems/dependencies.md)
for more details on this strategy.
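
The caching idea can be sketched as follows.  This is a minimal
illustration, not the actual `build_emoji` implementation: the helper
names `inputs_digest` and `ensure_emoji_tree`, the choice of which
files feed the hash, and the `build` callback are all assumptions.

```python
import hashlib
import os
from typing import Callable, Iterable, Tuple


def inputs_digest(input_paths: Iterable[str]) -> str:
    """Return a SHA-1 hex digest over the contents of the given files."""
    sha1 = hashlib.sha1()
    for path in sorted(input_paths):
        with open(path, "rb") as f:
            sha1.update(f.read())
    return sha1.hexdigest()


def ensure_emoji_tree(
    cache_root: str,
    link_path: str,
    input_paths: Iterable[str],
    build: Callable[[str], None],
) -> Tuple[str, bool]:
    """Build `<cache_root>/<sha1>/emoji` only if missing, then symlink to it.

    Returns the tree path and whether a rebuild happened.  Hypothetical
    sketch: a failed build would leave a partial tree behind, which a
    real tool would need to handle.
    """
    target = os.path.join(cache_root, inputs_digest(input_paths), "emoji")
    rebuilt = not os.path.isdir(target)
    if rebuilt:
        os.makedirs(target)
        build(target)
    # Create the new symlink under a temporary name, then rename it over
    # the old one so that link_path always points at a complete tree.
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, link_path)
    return target, rebuilt
```

With this shape, running the tool twice with unchanged inputs only
repoints the symlink, while changing any input file yields a new
`<sha1>` directory and a fresh build.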

The emoji tree generated by this process contains several important elements:

* `emoji_codes.json`: A set of mappings used by the Zulip frontend to
  understand what Unicode emoji exist and what their shortnames are,
  used for autocomplete, emoji pickers, etc.  This has been
  deduplicated using the logic in
  `tools/setup/emoji/emoji_setup_utils.py` to generally only have
  `:angry:` and not also `:angry_face:`, since having both is ugly and
  pointless for purposes like autocomplete and emoji pickers.
* `images/emoji/unicode/*.png`: A farm of emoji image files, named
  after the Unicode representation of each emoji.
* `images/emoji/*.png`: A farm of symlinks from emoji names to the
  `images/emoji/unicode/` tree.  This is used to serve individual emoji
  images, as well as for the
  [backend Markdown processor](../subsystems/markdown.md) to know which emoji
  names exist and what Unicode emoji / images they map to.  In this
  tree, we currently include all of the emoji in `emoji_map.json`;
  this means that if you send `:angry_face:`, it won't autocomplete,
  but will still work (but not in previews).
* Some CSS and PNGs for the emoji spritesheets, used in Zulip for
  emoji pickers where we would otherwise need to download over 1000
  individual emoji images (which would cause a browser performance
  problem).  We have multiple spritesheets: one for each emoji
  provider that we support (Google, Twitter, EmojiOne, and Apple).
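
The split between "names shown in autocomplete" and "names that still
resolve" can be illustrated with a small sketch.  This is not the
logic in `tools/setup/emoji/emoji_setup_utils.py`; the function
`build_name_tables`, the input shape, and the rule that the first
listed name wins are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def build_name_tables(
    entries: List[Tuple[str, List[str]]]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split emoji names into a deduplicated table and a full lookup table.

    `entries` pairs a code point with its known names; the first name is
    treated as the preferred one (an assumption for this sketch).
    """
    codepoint_to_name: Dict[str, str] = {}  # one name per emoji, for pickers
    name_to_codepoint: Dict[str, str] = {}  # every known name, for rendering
    for codepoint, names in entries:
        codepoint_to_name[codepoint] = names[0]
        for name in names:
            name_to_codepoint[name] = codepoint
    return codepoint_to_name, name_to_codepoint


if __name__ == "__main__":
    picker, lookup = build_name_tables([("1f620", ["angry", "angry_face"])])
    print(sorted(picker.values()))  # names offered by autocomplete
    print(lookup["angry_face"])     # older name still resolves
```

Here the first table plays the role of the deduplicated data used for
autocomplete and emoji pickers, and the second plays the role of the
symlink farm that keeps every known name working.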

[iamcal]: https://github.com/iamcal/emoji-data

## Picking emoji names

I think it is fair to say Zulip has by far the best set of emoji names of
any product at the time of writing. If you find an emoji name you don't
like, or think one is missing, please let us know!

The following set of considerations is not comprehensive, but has a few
principles that were applied to the current set of names. We use (strong),
(medium), and (weak) to denote how strong each consideration is.

* Even with over 1000 symbols, emoji feels surprisingly sparse as a language,
  and more often than not, if you search for something, you don't find an
  appropriate emoji for it. So a primary goal for our set of names is to
  maximize the number of situations in which the user finds an emoji that
  feels appropriate. (strong)

* Conversely, we remove generic words that will gum up the typeahead. So
  `:outbox:` instead of `:outbox_tray:`. Each word should count. (medium)

* We aim for the set of names to be as widely culturally applicable as
  possible, even if the glyphs are not. So `:statue:` instead of
  `:new_york:` for the Statue of Liberty, and `:tower:` instead of
  `:tokyo_tower:`. (strong)

* We remove unnecessary gender descriptions. So `:ok_signal:` instead of
  `:ok_woman:`. (strong)

* We don't add names that could be inappropriate in school or work
  environments, even if the use is common on the internet. For example, we
  have not added `:butt:` for `:peach:`, or `:cheers:` for
  `:beers:`. (strong)

* Names should be compatible with the four emoji sets we support, but don't
  have to be compatible with any other emoji set. (medium)

* We try not to use a creative `canonical_name` for emoji that are likely to
  be familiar to a large subset of users. This largely applies to certain
  faces. (medium)

* The set of names should be compatible with the iamcal, gemoji, and Unicode
  names. Compatible here means that if there is an emoji name a user knows
  from one of those sets, and the user searches for the key word of that
  name, they will get an emoji in our set. It is okay if this emoji has a
  slightly different name or codepoint from the names/codepoints in the
  other sets. (weak)

Much of the work of picking names went into the first bullet above: making
the emoji language less sparse. Some tricks and heuristics that were used
for that:

* There are many near duplicates, like `:dog:` and `:dog_face:`, or
  `:mailbox:`, `:mailbox_with_mail:`, and `:mailbox_with_no_mail:`. In these
  cases we repurpose the duplicates to be as useful as we can, like `:dog:`
  and `:puppy:`, and `:mailbox:`, `:unread_mail:`, `:inbox_zero:` for the
  ones above. There isn't a ton of flexibility, since we can't change the
  glyphs. But in most cases we have been able to come up with something.

* Many emoji have commonly understood meanings among people that use emoji a
  lot, and there are websites and articles that document some of these
  meanings. A commonly understood meaning can be a great thing to add as an
  alternate name, since often it is a sign that the meaning is addressing a
  real gap in the emoji system.

* Many emoji names are unnecessarily specific in iamcal/etc, like
  `:flower_playing_cards:`, `:izakaya_lantern:`, or `:amphora:`. Renaming
  them to `:playing_cards:`, `:lantern:`, and `:vase:` makes them more
  widely usable. In such cases we often keep the specific name as an
  alternate.

* If there are natural things someone might type, like `:happy:`, we try to
  find an emoji to match. This extends to things that someone might not
  think to type, but as soon as someone in the organization discovers it, it
  could get wide use, like `:working_on_it:`. Good future work would be to
  collect (by survey or tooling) things people type into the emoji picker
  typeahead on chat.zulip.org, and find ways to add those names as
  alternates.
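
To make the role of alternate names concrete, here is a minimal
sketch of a typeahead search in which alternates help an emoji get
found, while each emoji still appears only once under its canonical
name.  The `EMOJI_NAMES` table and the `typeahead` function are
hypothetical, and substring matching is an assumption; they reuse the
renaming examples above purely for illustration.

```python
from typing import Dict, List

# Canonical name -> alternate names (hypothetical data for this sketch).
EMOJI_NAMES: Dict[str, List[str]] = {
    "playing_cards": ["flower_playing_cards"],
    "lantern": ["izakaya_lantern"],
    "vase": ["amphora"],
}


def typeahead(query: str, emoji_names: Dict[str, List[str]]) -> List[str]:
    """Return canonical names whose canonical or alternate names match."""
    results = []
    for canonical, alternates in emoji_names.items():
        # An emoji is listed once, no matter how many of its names match.
        if any(query in name for name in [canonical, *alternates]):
            results.append(canonical)
    return results


if __name__ == "__main__":
    print(typeahead("amph", EMOJI_NAMES))
    print(typeahead("lantern", EMOJI_NAMES))
```

A user who only knows the older, more specific name still lands on
the right emoji, without the picker showing duplicate entries.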

Other notes:

* Occasionally there are near duplicates where we don't have ideas for
  useful names for the second one. In that case we sometimes remove the
  emoji rather than have two nearly identical glyphs in the emoji picker and
  typeahead. For instance, we kept `:spiral_notepad:` and dropped
  `:spiral_calendar_pad:`. If the concepts are near duplicates but the sets
  of glyphs look very different, we'll find two names that allow them both
  to stay.

* We removed many of the moons and clocks, to make the typeahead experience
  better when searching for something that catches all the moons or all the
  clocks. We kept all the squares and diamonds and other shapes, even though
  they have the same problem, since they are commonly used to make emoji art
  on Twitter, and could conceivably be used the same way on Zulip.