
Emoji

Emoji seem like a simple idea, but there's actually a ton of complexity that goes into an effective emoji implementation. This document discusses a number of these issues.

Currently, Zulip supports these four display formats for emoji:

Google
Twitter
Plain text
Google blob (deprecated)

Emoji codes

The Unicode standard has various ranges of characters set aside for emoji. So you can put emoji in your terminal using actual Unicode characters like 😀 and 👍. If you paste those into Zulip, Zulip will render them as the corresponding emoji image.

However, the Unicode committee did not standardize on a set of human-readable names for emoji. So, for example, when using the popular colon-based style for entering emoji from the keyboard, we have to decide whether to use :angry: or :angry_face: to represent an angry face. Different products use different approaches, but for purposes like emoji pickers or autocomplete, you definitely want to pick exactly one of these names, since otherwise users will constantly see duplicates of a given emoji next to each other.

Picking which emoji name to use is surprisingly complicated! See the section on picking emoji names below.
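As a small illustration of the difference between emoji codes and emoji names, here is a Python sketch. It assumes that an emoji code is written as lowercase hexadecimal code points joined by hyphens (as in 1f44d-1f3fd for a thumbs up with a skin tone modifier); the function names are hypothetical and are not Zulip's API.

```python
def emoji_code_to_char(emoji_code: str) -> str:
    """Convert a hyphen-separated hex emoji code (assumed format) to its characters."""
    return "".join(chr(int(part, 16)) for part in emoji_code.split("-"))


def char_to_emoji_code(emoji: str) -> str:
    """Inverse of emoji_code_to_char: one lowercase hex code point per character."""
    return "-".join(f"{ord(ch):x}" for ch in emoji)


# Unicode fixes the code point, but not the name: both of these
# hypothetical shortcodes point at U+1F620 ANGRY FACE.
name_to_code = {"angry": "1f620", "angry_face": "1f620"}

print(emoji_code_to_char("1f600") == "\U0001F600")         # True (grinning face)
print(char_to_emoji_code("\U0001F44D\U0001F3FD"))          # 1f44d-1f3fd
print(len(name_to_code), len(set(name_to_code.values())))  # 2 1
```

An emoji picker built directly from name_to_code would list the angry face twice, which is the duplication problem that choosing a single name avoids.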

Custom emoji

Zulip supports custom user-uploaded emoji. We manage those by having the name of the emoji be its "emoji code", and using an emoji_type field to keep track of which kind of emoji it is. We are in the process of migrating Zulip to refer to these emoji only by ID, which is a requirement for being able to support deprecating old realm emoji in a sensible way.
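One way to see why referring to custom emoji by ID matters is the Python sketch below. Everything in it is hypothetical: the RealmEmoji and EmojiRef classes, the "realm_emoji" and "unicode_emoji" type tags, and the file names are illustrative stand-ins, not Zulip's models. It compares a lookup table keyed by emoji name with one keyed by ID when an old emoji is deactivated and a new one is uploaded under the same name.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RealmEmoji:
    id: int
    name: str
    file_name: str
    deactivated: bool = False


@dataclass(frozen=True)
class EmojiRef:
    # Hypothetical type tags; a message or reaction stores the pair (type, code).
    emoji_type: str  # "unicode_emoji" or "realm_emoji"
    emoji_code: str  # hex code point(s), or a custom emoji identifier


def lookup_custom(ref: EmojiRef, table: Dict[str, RealmEmoji]) -> Optional[RealmEmoji]:
    """Resolve a custom emoji reference against a table keyed by emoji_code."""
    if ref.emoji_type != "realm_emoji":
        return None
    return table.get(ref.emoji_code)


old = RealmEmoji(id=1, name="parrot", file_name="parrot_v1.gif", deactivated=True)
new = RealmEmoji(id=2, name="parrot", file_name="parrot_v2.gif")

# Code = name: the second upload overwrites the first entry.
by_name = {e.name: e for e in (old, new)}
# Code = ID: both rows survive, each under its own key.
by_id = {str(e.id): e for e in (old, new)}

old_reaction_by_name = EmojiRef("realm_emoji", "parrot")
old_reaction_by_id = EmojiRef("realm_emoji", "1")

print(lookup_custom(old_reaction_by_name, by_name).file_name)  # parrot_v2.gif
print(lookup_custom(old_reaction_by_id, by_id).file_name)      # parrot_v1.gif
```

With the name as the code, the old reaction silently switches to the new image; with the ID as the code, each reference keeps pointing at the emoji it was made with, which is one reason IDs make deprecating old emoji more tractable.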

Tooling

We use the iamcal emoji data package to provide sprite sheets and individual images for our emoji, as well as a data set of emoji categories, code points, etc. The sprite sheets are used by the Zulip web app to display emoji in messages, emoji reactions, etc. However, we can't use the sprite sheets in some contexts, such as missed-message and digest emails, which need self-contained assets. For those, we use individual emoji files under static/generated/emoji. That directory contains both actual image files, named after the Unicode code points of the emoji, and symlinks pointing to those image files.

We need to maintain those symlinks both for the names used in the iamcal emoji data set and for the names in our old emoji data set (emoji_map.json). Zulip has a tool, tools/setup/emoji/build_emoji, that combines the emoji.json file from iamcal with the old emoji_map.json data set to construct the various symlink farms and output files described below that support our emoji experience.
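The symlink farm idea can be sketched in a few lines of Python. This is not build_emoji itself: the build_symlink_farm function, the placeholder image bytes, and the input mapping (standing in for the merged iamcal and emoji_map.json names) are assumptions made for illustration. Each emoji code gets one real file under images/emoji/unicode/, and every name, from either data set, becomes a relative symlink to it.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict


def build_symlink_farm(root: Path, name_to_code: Dict[str, str]) -> None:
    """Create one image per emoji code, plus one symlink per emoji name."""
    emoji_dir = root / "images" / "emoji"
    unicode_dir = emoji_dir / "unicode"
    unicode_dir.mkdir(parents=True, exist_ok=True)
    for name, code in name_to_code.items():
        image = unicode_dir / f"{code}.png"
        if not image.exists():
            # Placeholder bytes; a real tool would copy the PNG from the data package.
            image.write_bytes(b"placeholder")
        # A relative target keeps links valid if the whole tree is moved
        # or reached through another symlink.
        os.symlink(os.path.join("unicode", f"{code}.png"), emoji_dir / f"{name}.png")


# Hypothetical merged input: names from both data sets, several per code.
names = {"angry": "1f620", "angry_face": "1f620", "thumbs_up": "1f44d"}

with tempfile.TemporaryDirectory() as tmp:
    build_symlink_farm(Path(tmp), names)
    link = Path(tmp) / "images" / "emoji" / "angry_face.png"
    print(link.is_symlink(), link.resolve().name)  # True 1f620.png
```

Keeping one real file per code point means adding another name for an existing emoji costs only a symlink, not another copy of the image.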

The build_emoji tool generates the set of files under static/generated/emoji (or really, it generates the /srv/zulip-emoji-cache/<sha1>/emoji tree, and static/generated/emoji is a symlink to that tree; we do this in order to cache old versions to make provisioning and production deployments super fast in the common case that we haven't changed the emoji tooling). See our dependencies document for more details on this strategy.
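The caching strategy can be sketched as follows, under stated assumptions: the cache key is a SHA-1 over the contents of whatever inputs affect the output (data files and the tooling scripts themselves), and ensure_emoji_tree, emoji_cache_dir, and the paths used are hypothetical names, not Zulip's actual helpers. If the keyed directory already exists, the expensive build is skipped and only the symlink is repointed.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def emoji_cache_dir(cache_root: Path, inputs: Iterable[Path]) -> Path:
    """Return <cache_root>/<sha1 of inputs>/emoji."""
    digest = hashlib.sha1()
    for path in sorted(inputs):
        digest.update(path.name.encode() + b"\0")
        digest.update(path.read_bytes() + b"\0")
    return cache_root / digest.hexdigest() / "emoji"


def ensure_emoji_tree(
    cache_root: Path, inputs: Iterable[Path], build: Callable[[Path], None], link: Path
) -> bool:
    """Build the tree only on a cache miss, then point `link` at it. Returns True if built."""
    target = emoji_cache_dir(cache_root, inputs)
    built = not target.exists()
    if built:
        build(target)
    if link.is_symlink():
        link.unlink()
    link.symlink_to(target, target_is_directory=True)
    return built


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    data = base / "emoji.json"
    data.write_text('{"angry": "1f620"}')
    builds = []

    def fake_build(target: Path) -> None:
        builds.append(target)
        target.mkdir(parents=True)

    link = base / "generated-emoji"
    first = ensure_emoji_tree(base / "cache", [data], fake_build, link)
    second = ensure_emoji_tree(base / "cache", [data], fake_build, link)
    print(first, second, len(builds))  # True False 1
```

A production version would more likely build into a temporary directory and rename it into place, so that an interrupted build is never mistaken for a cache hit.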

The emoji tree generated by this process contains several important elements:

emoji_codes.json: A set of mappings used by the Zulip frontend to understand what Unicode emoji exist and what their shortnames are, used for autocomplete, emoji pickers, etc. This has been deduplicated using the logic in tools/setup/emoji/emoji_setup_utils.py to generally only have :angry: and not also :angry_face:, since having both is ugly and pointless for purposes like autocomplete and emoji pickers.
images/emoji/unicode/*.png: A farm of emoji image files, named after the Unicode code points they represent.
images/emoji/*.png: A farm of symlinks from emoji names to the images/emoji/unicode/ tree. This is used to serve individual emoji images, as well as for the backend Markdown processor to know which emoji names exist and what Unicode emoji / images they map to. In this tree, we currently include all of the emoji in emoji_map.json; this means that if you send :angry_face:, it won't autocomplete, but it will still render (though not in previews).
Some CSS and PNGs for the emoji spritesheets, used in Zulip for emoji pickers where we would otherwise need to download over 1000 individual emoji images (which would cause a browser performance problem). We have multiple spritesheets: one for each emoji provider that we support (Google, Twitter, EmojiOne, and Apple).
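The split between a deduplicated name set for autocomplete and a complete name set for rendering can be sketched in Python. The shortest-name rule in dedupe_for_autocomplete is a hypothetical stand-in for the real logic in tools/setup/emoji/emoji_setup_utils.py, and the full mapping is modeled as a plain dict rather than the images/emoji/*.png symlink tree.

```python
import re
from collections import defaultdict
from typing import Dict, List


def dedupe_for_autocomplete(name_to_code: Dict[str, str]) -> Dict[str, str]:
    """Keep one name per emoji code. Hypothetical rule: shortest name wins, ties alphabetical."""
    names_by_code: Dict[str, List[str]] = defaultdict(list)
    for name, code in name_to_code.items():
        names_by_code[code].append(name)
    return {
        min(names, key=lambda n: (len(n), n)): code
        for code, names in names_by_code.items()
    }


def render_shortcodes(text: str, name_to_code: Dict[str, str]) -> str:
    """Replace :name: with the emoji, consulting the *full* name set."""

    def replace(match: "re.Match[str]") -> str:
        code = name_to_code.get(match.group(1))
        if code is None:
            return match.group(0)  # unknown names are left as typed
        return "".join(chr(int(part, 16)) for part in code.split("-"))

    return re.sub(r":([a-z0-9_+-]+):", replace, text)


all_names = {"angry": "1f620", "angry_face": "1f620",
             "thumbs_up": "1f44d", "thumbs_up_sign": "1f44d"}
autocomplete = dedupe_for_autocomplete(all_names)

print(sorted(autocomplete))          # ['angry', 'thumbs_up']
print("angry_face" in autocomplete)  # False
print(render_shortcodes(":angry_face: ok", all_names) == "\U0001F620 ok")  # True
```

This reproduces the behavior described for :angry_face:: it never shows up as a suggestion, yet a message containing it still renders.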

Picking emoji names

I think it is fair to say Zulip has by far the best set of emoji names of any product at the time of writing. If you find an emoji name you don't like, or think a name is missing, please let us know!

The following set of considerations is not comprehensive, but has a few principles that were applied to the current set of names. We use (strong), (medium), and (weak) to denote how strong a consideration it is.

Even with over 1000 symbols, emoji feels surprisingly sparse as a language, and more often than not, if you search for something, you don't find an appropriate emoji for it. So a primary goal for our set of names is to maximize the number of situations in which the user finds an emoji that feels appropriate. (strong)
Conversely, we remove generic words that will gum up the typeahead. So :outbox: instead of :outbox_tray:. Each word should count. (medium)
We aim for the set of names to be as widely culturally applicable as possible, even if the glyphs are not. So :statue: instead of :new_york: for the Statue of Liberty, and :tower: instead of :tokyo_tower:. (strong)
We remove unnecessary gender descriptions. So :ok_signal: instead of :ok_woman:. (strong)
We don't add names that could be inappropriate in school or work environments, even if the use is common on the internet. For example, we have not added :butt: for :peach:, or :cheers: for :beers:. (strong)
Names should be compatible with the four emoji sets we support, but don't have to be compatible with any other emoji set. (medium)
We try not to use a creative canonical_name for emoji that are likely to be familiar to a large subset of users. This largely applies to certain faces. (medium)
The set of names should be compatible with the iamcal, gemoji, and Unicode names. Compatible here means that if there is an emoji name a user knows from one of those sets, and the user searches for the key word of that name, they will get an emoji in our set. It is okay if this emoji has a slightly different name or codepoint from the names/codepoints in the other sets. (weak)
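The (weak) compatibility principle can be turned into a mechanical check. The Python sketch below makes a simplifying assumption about what "the key word of that name" means: an external name counts as covered if any of its underscore-separated words is also a word in one of our names. The function and the sample names are hypothetical and are not taken from Zulip's tooling.

```python
from typing import Iterable, List


def uncovered_names(external_names: Iterable[str], our_names: Iterable[str]) -> List[str]:
    """External names none of whose words appear as a word in any of our names."""
    our_words = {word for name in our_names for word in name.split("_")}
    return sorted(
        name
        for name in external_names
        if not any(word in our_words for word in name.split("_"))
    )


external = ["outbox_tray", "tokyo_tower", "izakaya_lantern", "amphora"]
ours = ["outbox", "tower", "lantern", "vase"]

print(uncovered_names(external, ours))                # ['amphora']
print(uncovered_names(external, ours + ["amphora"]))  # []
```

Renaming :amphora: to :vase: on its own would leave a user who knows the iamcal name with no match, which is one argument for keeping the specific name around as an alternate.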

Much of the work of picking names went into the first bullet above: making the emoji language less sparse. Some tricks and heuristics that were used for that:

There are many near duplicates, like :dog: and :dog_face:, or :mailbox:, :mailbox_with_mail:, and :mailbox_with_no_mail:. In these cases we repurpose the duplicates to be as useful as we can, like :dog: and :puppy:, and :mailbox:, :unread_mail:, :inbox_zero: for the ones above. There isn't a ton of flexibility, since we can't change the glyphs. But in most cases we have been able to come up with something.
Many emoji have commonly understood meanings among people who use emoji a lot, and there are websites and articles that document some of these meanings. A commonly understood meaning can be a great thing to add as an alternate name, since often it is a sign that the meaning is addressing a real gap in the emoji system.
Many emoji names are unnecessarily specific in iamcal/etc, like :flower_playing_cards:, :izakaya_lantern:, or :amphora:. Renaming them to :playing_cards:, :lantern:, and :vase: makes them more widely usable. In such cases we often keep the specific name as an alternate.
If there are natural things someone might type, like :happy:, we try to find an emoji to match. This extends to names someone might not think to type but that could get wide use as soon as someone in the organization discovers them, like :working_on_it:. Good future work would be to collect (by survey or tooling) things people type into the emoji picker typeahead on chat.zulip.org, and find ways to add those names as alternates.
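To show how alternate names widen what a search can find without adding duplicate entries, here is a Python sketch of word-prefix typeahead over a canonical name plus alternates. The EmojiEntry structure and the sample entries (including happy as an alternate for a grinning face) are illustrative assumptions, not Zulip's actual name table or matching algorithm.

```python
from typing import List, NamedTuple


class EmojiEntry(NamedTuple):
    code: str
    canonical_name: str
    aliases: List[str]


def typeahead(query: str, entries: List[EmojiEntry]) -> List[str]:
    """Canonical names of emoji where some word of some name starts with `query`."""
    results = []
    for entry in entries:
        names = [entry.canonical_name, *entry.aliases]
        words = (word for name in names for word in name.split("_"))
        if any(word.startswith(query) for word in words):
            # One result per glyph, however many of its names matched.
            results.append(entry.canonical_name)
    return results


entries = [
    EmojiEntry("1f4eb", "mailbox", []),
    EmojiEntry("1f4ec", "unread_mail", ["mailbox_with_mail"]),
    EmojiEntry("1f4ed", "inbox_zero", ["mailbox_with_no_mail"]),
    EmojiEntry("1f600", "grinning", ["happy"]),
]

print(typeahead("mail", entries))  # ['mailbox', 'unread_mail', 'inbox_zero']
print(typeahead("hap", entries))   # ['grinning']
```

Because every word of every name is a potential match, a generic word such as tray would add matches for unrelated queries, which is the cost behind "each word should count".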

Other notes

Occasionally there are near duplicates where we don't have ideas for useful names for the second one. In that case we sometimes remove the emoji rather than have two nearly identical glyphs in the emoji picker and typeahead. For instance, we kept :spiral_notepad: and dropped :spiral_calendar_pad:. If the concepts are near duplicates but the sets of glyphs look very different, we'll find two names that allow them both to stay.
We removed many of the moons and clocks, to make the typeahead experience better when searching for something that catches all the moons or all the clocks. We kept all the squares and diamonds and other shapes, even though they have the same problem, since they are commonly used to make emoji art on Twitter, and could conceivably be used the same way on Zulip.
