
# Emoji

Emoji seem like a simple idea, but there's actually a ton of
complexity that goes into an effective emoji implementation.  This
document discusses a number of these issues.

Currently, Zulip supports these four display formats for emoji:

* Google modern
* Google classic
* Twitter
* Plain text

## Emoji codes

The Unicode standard has various ranges of characters set aside for
emoji.  So you can put emoji in your terminal using actual Unicode
characters like 😀  and 👍.  If you paste those into Zulip, Zulip will
render them as the corresponding emoji image.
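
As a small illustration of what "actual Unicode characters" means here,
the following Python sketch converts an emoji character into its hex
code point.  The helper name `emoji_code` is made up for this example,
and the lowercase hex format is just one common way to write code
points, not necessarily the exact format Zulip stores.

```python
def emoji_code(char: str) -> str:
    """Return the hex code point(s) of an emoji, joined by '-' for sequences."""
    return "-".join(f"{ord(c):x}" for c in char)


print(emoji_code("\U0001F600"))  # grinning face: 1f600
print(emoji_code("\U0001F44D"))  # thumbs up: 1f44d
```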

However, the Unicode committee did not standardize on a set of
human-readable names for emoji.  So, for example, when using the
popular `:` based style for entering emoji from the keyboard, we have
to decide whether to use `:angry:` or `:angry_face:` to represent an
angry face.  Different products use different approaches, but for
purposes like emoji pickers or autocomplete, you definitely want to
pick exactly one of these names, since otherwise users will always be
seeing duplicates of a given emoji next to each other.
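
One way to picture this is to keep two tables: one with a single name
per emoji for pickers and autocomplete, and one that accepts every
known name when parsing.  The sketch below is only an illustration; the
data, the `build_tables` helper, and the rule "the first listed name is
canonical" are all assumptions, not Zulip's actual logic.

```python
# Hypothetical data: each code point may be known under several names.
NAMES_BY_CODE = {
    "1f620": ["angry", "angry_face"],
    "1f44d": ["thumbs_up"],
}


def build_tables(names_by_code):
    """Split names into a picker table (one per emoji) and a lookup of all names."""
    canonical = {}  # code -> the single name shown in pickers/autocomplete
    lookup = {}     # every accepted name -> code
    for code, names in names_by_code.items():
        canonical[code] = names[0]  # assumption: first name wins
        for name in names:
            lookup[name] = code
    return canonical, lookup


canonical, lookup = build_tables(NAMES_BY_CODE)
```

With these tables, a picker iterating over `canonical` shows each emoji
once, while `lookup["angry_face"]` still resolves to the same code as
`lookup["angry"]`.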

Picking which emoji name to use is surprisingly complicated! See the
section on [picking emoji names](#picking-emoji-names) below.

### Custom emoji

Zulip supports custom user-uploaded emoji.  We manage those by having
the name of the emoji be its "emoji code", and using an `emoji_type`
field to keep track of it.  We are in the process of migrating Zulip
to refer to these emoji only by ID, which is a requirement for being
able to support deprecating old realm emoji in a sensible way.
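
The sketch below shows why an ID is needed for deprecation.  It is a
toy model, not Zulip's schema: the `EmojiRef` class, the `"custom"`
type value, and the IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmojiRef:
    emoji_type: str  # hypothetical value "custom" for user-uploaded emoji
    emoji_code: str  # the emoji's name today; an ID after the migration


# Name as code: a deprecated upload and its replacement look identical.
old_by_name = EmojiRef("custom", "party")
new_by_name = EmojiRef("custom", "party")

# ID as code: the two uploads stay distinguishable.
old_by_id = EmojiRef("custom", "17")
new_by_id = EmojiRef("custom", "42")
```

Here `old_by_name == new_by_name` is true, so a stored reference cannot
tell which upload it meant, while `old_by_id` and `new_by_id` differ.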

## Tooling

We use the [iamcal emoji data package][iamcal] to provide sprite
sheets and individual images for our emoji, as well as a data set of
emoji categories, code points, etc.  The sprite sheets are used
by the Zulip webapp to display emoji in messages, emoji reactions,
etc.  However, we can't use the sprite sheets in some contexts, such
as missed-message and digest emails, that need to have self-contained
assets.  For those, we use individual emoji files under
`static/generated/emoji`.  That directory contains both image files
named after the Unicode representation of each emoji and symlinks
pointing to those files.

We need to maintain those symlinks both for the names used in the
iamcal emoji data set and for the names in our old emoji data set
(`emoji_map.json`).  Zulip
has a tool, `tools/setup/emoji/build_emoji`, that combines the
`emoji.json` file from iamcal with the old `emoji_map.json` data set
to construct the various symlink farms and output files described
below that support our emoji experience.

The `build_emoji` tool generates the set of files under
`static/generated/emoji` (or really, it generates the
`/srv/zulip-emoji-cache/<sha1>/emoji` tree, and
`static/generated/emoji` is a symlink to that tree; we do this in
order to cache old versions to make provisioning and production
deployments super fast in the common case that we haven't changed the
emoji tooling).  See [our dependencies document](../subsystems/dependencies.md)
for more details on this strategy.
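
The caching idea can be sketched roughly as follows.  This is a minimal
illustration, not the `build_emoji` implementation: the function name
`ensure_cached_tree`, its parameters, and the choice to hash the
contents of the given input files are assumptions.

```python
import hashlib
import os


def ensure_cached_tree(cache_root, link_path, input_files, build):
    """Build cache_root/<sha1>/emoji only if missing, then point link_path at it."""
    digest = hashlib.sha1()
    for path in sorted(input_files):
        with open(path, "rb") as f:
            digest.update(f.read())
    target = os.path.join(cache_root, digest.hexdigest(), "emoji")

    rebuilt = not os.path.isdir(target)
    if rebuilt:
        os.makedirs(target)
        build(target)  # caller-supplied function that writes the generated files

    # Create a temporary symlink, then rename it over the old one.
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, link_path)
    return target, rebuilt
```

When the inputs are unchanged, the hash matches an existing directory
and only the symlink is refreshed, which is what makes the common case
fast.  A production version would also need to handle a build that
fails partway through; this sketch does not.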

The emoji tree generated by this process contains several important elements:
* `emoji_codes.json`: A set of mappings used by the Zulip frontend to
  understand what Unicode emoji exist and what their shortnames are,
  used for autocomplete, emoji pickers, etc.  This has been
  deduplicated using the logic in
  `tools/setup/emoji/emoji_setup_utils.py` to generally only have
  `:angry:` and not also `:angry_face:`, since having both is ugly and
  pointless for purposes like autocomplete and emoji pickers.
* `images/emoji/unicode/*.png`: A farm of emoji images, with each file
  named after the Unicode representation of the emoji.
* `images/emoji/*.png`: A farm of symlinks from emoji names to the
  `images/emoji/unicode/` tree.  This is used to serve individual emoji
  images, as well as for the
  [backend Markdown processor](../subsystems/markdown.md) to know which emoji
  names exist and what Unicode emoji / images they map to.  In this
  tree, we currently include all of the emoji in `emoji_map.json`;
  this means that if you send `:angry_face:`, it won't autocomplete,
  but it will still work (though not in previews).
* Some CSS and PNGs for the emoji spritesheets, used in Zulip for
  emoji pickers where we would otherwise need to download over 1000
  individual emoji images (which would cause a browser performance
  problem).  We have multiple spritesheets: one for each emoji
  provider that we support (Google, Twitter, EmojiOne, and Apple).
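
To make the symlink farm layout concrete, here is a small sketch that
builds a tree of the same shape in a directory of your choice.  The
function `build_symlink_farm` and the sample name-to-code mapping are
illustrative assumptions; the real tree is produced by `build_emoji`.

```python
import os


def build_symlink_farm(root, name_to_code):
    """Create root/unicode/<code>.png placeholders and root/<name>.png symlinks."""
    unicode_dir = os.path.join(root, "unicode")
    os.makedirs(unicode_dir, exist_ok=True)
    for name, code in name_to_code.items():
        image = os.path.join(unicode_dir, f"{code}.png")
        if not os.path.exists(image):
            open(image, "wb").close()  # empty stand-in for a real PNG
        link = os.path.join(root, f"{name}.png")
        if os.path.lexists(link):
            os.remove(link)
        # Relative target, so the tree keeps working if it is moved.
        os.symlink(os.path.join("unicode", f"{code}.png"), link)
```

Calling it with `{"angry": "1f620", "angry_face": "1f620"}` yields one
image file and two name symlinks that resolve to it, which is how both
names can be served even though only one appears in autocomplete.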

[iamcal]: https://github.com/iamcal/emoji-data

## Picking emoji names

I think it is fair to say Zulip has by far the best set of emoji names of
any product at the time of the writing of this document. If you find an
emoji name you don't like, or think is missing, please let us know!

The following set of considerations is not comprehensive, but has a few
principles that were applied to the current set of names. We use (strong),
(medium), and (weak) to denote how strong each consideration is.

* Even with over 1000 symbols, emoji feels surprisingly sparse as a language,
  and more often than not, if you search for something, you don't find an
  appropriate emoji for it. So a primary goal for our set of names is to
  maximize the number of situations in which the user finds an emoji that
  feels appropriate. (strong)

* Conversely, we remove generic words that will gum up the typeahead. So
  `:outbox:` instead of `:outbox_tray:`. Each word should count. (medium)

* We aim for the set of names to be as widely culturally applicable as
  possible, even if the glyphs are not. So `:statue:` instead of
  `:new_york:` for the Statue of Liberty, and `:tower:` instead of
  `:tokyo_tower:`. (strong)

* We remove unnecessary gender descriptions. So `:ok_signal:` instead of
  `:ok_woman:`. (strong)

* We don't add names that could be inappropriate in school or work
  environments, even if the use is common on the internet. For example, we
  have not added `:butt:` for `:peach:`, or `:cheers:` for
  `:beers:`. (strong)

* Names should be compatible with the four emoji sets we support, but don't
  have to be compatible with any other emoji set. (medium)

* We try not to use a creative `canonical_name` for emoji that are likely to
  be familiar to a large subset of users. This largely applies to certain
  faces. (medium)

* The set of names should be compatible with the iamcal, gemoji, and Unicode
  names. Compatible here means that if there is an emoji name a user knows
  from one of those sets, and the user searches for the key word of that
  name, they will get an emoji in our set. It is okay if this emoji has a
  slightly different name or codepoint from the names/codepoints in the
  other sets. (weak)

Much of the work of picking names went into the first bullet above: making
the emoji language less sparse. Some tricks and heuristics that were used
for that:

* There are many near duplicates, like `:dog:` and `:dog_face:`, or
  `:mailbox:`, `:mailbox_with_mail:`, and `:mailbox_with_no_mail:`. In these
  cases we repurpose the duplicates to be as useful as we can, like `:dog:`
  and `:puppy:`, and `:mailbox:`, `:unread_mail:`, `:inbox_zero:` for the
  ones above. There isn't a ton of flexibility, since we can't change the
  glyphs. But in most cases we have been able to come up with something.

* Many emoji have commonly understood meanings among people that use emoji a
  lot, and there are websites and articles that document some of these
  meanings. A commonly understood meaning can be a great thing to add as an
  alternate name, since often it is a sign that the meaning is addressing a
  real gap in the emoji system.

* Many emoji names are unnecessarily specific in iamcal/etc, like
  `:flower_playing_cards:`, `:izakaya_lantern:`, or `:amphora:`. Renaming
  them to `:playing_cards:`, `:lantern:`, and `:vase:` makes them more
  widely usable. In such cases we often keep the specific name as an
  alternate.

* If there are natural things someone might type, like `:happy:`, we try to
  find an emoji to match. This extends to things that someone might not
  think to type, but that could get wide use as soon as someone in the
  organization discovers them, like `:working_on_it:`. Good future work would be to
  collect (by survey or tooling) things people type into the emoji picker
  typeahead on chat.zulip.org, and find ways to add those names as
  alternates.

Other notes:

* Occasionally there are near duplicates where we don't have ideas for
  useful names for the second one. In that case we sometimes remove the
  emoji rather than have two nearly identical glyphs in the emoji picker and
  typeahead. For instance, we kept `:spiral_notepad:` and dropped
  `:spiral_calendar_pad:`. If the concepts are near duplicates but the sets
  of glyphs look very different, we'll find two names that allow them both
  to stay.

* We removed many of the moons and clocks, to make the typeahead experience
  better when searching for something that catches all the moons or all the
  clocks. We kept all the squares and diamonds and other shapes, even though
  they have the same problem, since they are commonly used to make emoji art
  on Twitter, and could conceivably be used the same way on Zulip.