Boneyard Tools

What an emoji really is: code points, ZWJ and skin tones

How emoji are stored in Unicode: pictographic code points, zero-width joiner sequences, skin-tone modifiers and flags, and why some need many bytes.

Emoji are Unicode code points, not pictures

When you type a smiley, your keyboard does not insert an image; it inserts a Unicode code point that your device draws as a picture. Most single emoji carry a Unicode property called Extended_Pictographic, which is exactly what this tool matches when it decides what to delete. Because they are code points, emoji live in the same text stream as your letters and can be searched, counted and removed like any other character. The picture you see is chosen by your operating system's font, which is why the same waving hand looks different on an iPhone, an Android phone and a Windows PC. Removing an emoji simply means deleting its code points from the string.
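To see that an emoji is ordinary text, the short Python sketch below lists the code points in a sample string. The sample text and the helper name describe are illustrative assumptions, not part of this tool.

```python
import unicodedata

# Sample text: "Hi", a space, the waving-hand emoji (U+1F44B) and "!".
text = "Hi \U0001F44B!"

def describe(s):
    """Return (code point, Unicode name) for every character in s."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in s]

for code_point, name in describe(text):
    print(code_point, name)

# The emoji is one code point among the letters, but it takes four UTF-8 bytes.
print(len(text), "code points,", len(text.encode("utf-8")), "bytes")

# Removing the emoji is plain string surgery on that code point.
without_emoji = text.replace("\U0001F44B", "")
print(repr(without_emoji))
```

The waving hand appears in the listing as a single entry between the space and the exclamation mark, exactly like any other character.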

Skin tones and variation selectors

A single visible emoji is often more than one code point. A thumbs-up with a skin tone is the base thumbs-up followed by one of five skin-tone modifiers (U+1F3FB to U+1F3FF). Many symbols also carry a variation selector (U+FE0F), an invisible character that says 'draw the colorful emoji form' rather than the plain black-and-white glyph. A variation selector is invisible on its own, and a skin-tone modifier left without its base typically shows up as a stray color swatch, so a tool that strips only the base symbol leaves junk behind. That is why this remover also sweeps up orphaned joiners and selectors after removing the visible emoji.
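The leftover problem can be reproduced in a few lines of Python. This is a minimal sketch, not this tool's implementation: the sample string and the HELPERS pattern are assumptions chosen for illustration.

```python
import re

THUMBS_UP = "\U0001F44D"
MEDIUM_TONE = "\U0001F3FD"   # one of the five modifiers U+1F3FB..U+1F3FF
RED_HEART = "\u2764"
VS16 = "\uFE0F"              # variation selector requesting emoji presentation

text = "ok " + THUMBS_UP + MEDIUM_TONE + " " + RED_HEART + VS16

# Naive removal: delete only the base symbols.
naive = text.replace(THUMBS_UP, "").replace(RED_HEART, "")
print([f"U+{ord(ch):04X}" for ch in naive])  # the modifier and selector remain

# Second pass (hypothetical): sweep up skin-tone modifiers, variation
# selectors and zero-width joiners that lost their base symbol.
HELPERS = re.compile("[\U0001F3FB-\U0001F3FF\uFE0E\uFE0F\u200D]")
clean = HELPERS.sub("", naive)
print(repr(clean))
```

After the naive pass the string still contains U+1F3FD and U+FE0F; the second pass leaves only the letters and spaces.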

Zero-width joiner sequences and flags

The most complex emoji are built by gluing several together. A family emoji is separate man, woman and child emoji joined by an invisible zero-width joiner (ZWJ, U+200D), and a profession emoji joins a person with an object the same way. Flags work differently: each is a pair of regional-indicator letters, so the United States flag is really the regional-indicator forms of U and S, its country code, drawn as one banner. To delete these correctly a tool has to recognise the whole joined cluster, not just the first piece; otherwise it leaves a stray half-emoji or a lone letter. This tool matches the entire cluster so a family or a flag is removed as one unit and the counter reads one.
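Cluster matching can be sketched with a regular expression. The pattern below is an assumption for illustration only: Python's built-in re module has no Extended_Pictographic escape, so the PICTO ranges are a rough stand-in that both over-matches and under-matches the real property, and the names PICTO, TONE, ELEMENT and CLUSTER are hypothetical.

```python
import re

# Rough approximation of Extended_Pictographic (see the caveat above).
PICTO = "[\u2600-\u27BF\U0001F300-\U0001FAFF]"
TONE = "[\U0001F3FB-\U0001F3FF]"           # skin-tone modifiers
ELEMENT = f"{PICTO}{TONE}?\uFE0F?"         # base + optional tone + optional VS16

CLUSTER = re.compile(
    "[\U0001F1E6-\U0001F1FF]{2}"           # flag: two regional indicators
    f"|{ELEMENT}(?:\u200D{ELEMENT})*"      # emoji, optionally ZWJ-joined to more
)

family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # man ZWJ woman ZWJ girl
us_flag = "\U0001F1FA\U0001F1F8"                       # regional indicators U, S
text = f"Family {family} and flag {us_flag}."

clusters = CLUSTER.findall(text)
print(len(clusters))          # each cluster is counted once

cleaned = CLUSTER.sub("", text)
print(repr(cleaned))
```

Because the joiner and the following emoji are part of the same match, the family is counted and removed as one unit rather than as three people and two joiners.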

What this tool does not touch

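The remover is deliberately conservative so it does not eat your real content. Currency signs, math operators and ordinary punctuation are not pictographic emoji, so they stay. Most arrows stay too, although a handful of symbols that double as emoji, such as the left-right arrow, the double exclamation mark and the copyright and registered signs, do carry the Extended_Pictographic property, so check those after cleaning. Keycap emoji are a tricky edge case: the '5' key emoji is a plain digit five followed by a variation selector and an enclosing keycap mark (U+20E3), and because the digit is genuine text the number can survive the clean. Regional letters that are not paired into a flag are also left alone. If you need to remove those remaining characters, do it by hand after running the tool.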
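The keycap case is easy to inspect in Python. The sketch below only shows what the sequence is made of and what is left if the invisible parts are stripped; how this tool treats keycaps beyond "the number can survive" is not specified here, so the removal step is an assumption.

```python
import unicodedata

# The "5" keycap emoji: digit, variation selector, combining keycap mark.
keycap_five = "5\uFE0F\u20E3"

names = [unicodedata.name(ch) for ch in keycap_five]
print(names)

# Assumed cleaning step: drop the selector and the keycap mark, keep the digit.
survivor = keycap_five.replace("\uFE0F", "").replace("\u20E3", "")
print(repr(survivor))
```

The first character is an ordinary DIGIT FIVE, which is why a conservative remover leaves a plain 5 behind.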

Frequently asked questions

Why does one emoji count as several characters?

A visible emoji can be a base symbol plus skin-tone modifiers, variation selectors and zero-width joiners. Your text editor may report several code points for what looks like one picture, but this tool counts the whole cluster as a single emoji.
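The different counts an editor might report can be reproduced in Python. This is an illustrative sketch; the helper name sizes is hypothetical.

```python
def sizes(s):
    """Count one string three ways: code points, UTF-16 units, UTF-8 bytes."""
    return {
        "code_points": len(s),
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        "utf8_bytes": len(s.encode("utf-8")),
    }

thumbs_up_tone = "\U0001F44D\U0001F3FD"                 # thumbs-up + skin tone
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"   # man ZWJ woman ZWJ girl

print(sizes(thumbs_up_tone))  # {'code_points': 2, 'utf16_units': 4, 'utf8_bytes': 8}
print(sizes(family))          # {'code_points': 5, 'utf16_units': 8, 'utf8_bytes': 18}
```

Each of these strings displays as one picture, which is why a per-cluster count differs from what a byte or code-point counter shows.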

Why do the same emoji look different on my phone and laptop?

Emoji are code points, and each platform ships its own emoji font that decides how to draw them. The stored text is identical; only the picture differs. Removing the emoji removes it everywhere, regardless of how it looked.

Can removing emoji corrupt the surrounding text?

No. Deleting emoji code points and their invisible helpers never changes the letters, digits or punctuation around them. At most you get an extra space where the emoji sat, which the collapse and trim options tidy up.
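A collapse-and-trim pass might look like the Python sketch below. The function name tidy and its two flags are assumptions standing in for the tool's options, and the choice to collapse only spaces and tabs (leaving line breaks alone) is likewise an assumption.

```python
import re

def tidy(s, collapse=True, trim=True):
    """Hypothetical cleanup: collapse runs of spaces/tabs, then trim the ends."""
    if collapse:
        s = re.sub(r"[ \t]{2,}", " ", s)
    if trim:
        s = s.strip()
    return s

# What is left after an emoji was deleted mid-sentence and another at the end.
after_removal = "Great job  today "

print(repr(tidy(after_removal)))              # 'Great job today'
print(repr(tidy(after_removal, trim=False)))  # 'Great job today '
```

The letters and punctuation are never rewritten; only the doubled and trailing spaces left by the removed emoji are affected.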