Unicode code points, planes and UTF-8
What a code point is, how Unicode planes and the BMP are organized, and how UTF-8 encodes any character in one to four bytes.
Code points and the U+ notation
A code point is the unique number Unicode assigns to a character, written as U+ followed by hexadecimal digits. The letter A is U+0041, which is 65 in decimal, and the grinning face emoji is U+1F600, which is 128512 in decimal. The code space runs from U+0000 to U+10FFFF, giving room for 1,114,112 code points. A code point is an abstract identity; it says which character you mean but not how it is stored in memory or in a file.
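As a small illustration of the notation, the sketch below converts between a character, its code point number, and the U+ form. The helper names toUPlus and firstCodePoint are made up for this example; codePointAt, String.fromCodePoint, toString and padStart are standard JavaScript.

```javascript
// Format a code point number in U+ notation, padded to at least four hex digits.
// toUPlus is a hypothetical helper name, not part of any library.
function toUPlus(codePoint) {
  return "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");
}

// Return the code point of the first character in a string.
function firstCodePoint(text) {
  return text.codePointAt(0);
}

console.log(firstCodePoint("A"));                  // 65
console.log(toUPlus(firstCodePoint("A")));         // U+0041
console.log(firstCodePoint("\u{1F600}"));          // 128512
console.log(toUPlus(firstCodePoint("\u{1F600}"))); // U+1F600

// Going the other way: build a string from a code point number.
const face = String.fromCodePoint(0x1f600);
console.log(face === "\u{1F600}");                 // true
```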
Planes and the Basic Multilingual Plane
Unicode divides its code space into 17 planes of 65,536 code points each. Plane 0, the Basic Multilingual Plane or BMP, holds almost every character in common use: Latin, Greek, Cyrillic, Arabic, Hebrew, the main CJK ideographs and thousands of symbols. The higher planes, often called astral or supplementary, hold most emoji, historic scripts, rarer ideographs and specialized symbols. Anything with a code point above U+FFFF lives outside the BMP.
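Because each plane holds exactly 65,536 (2^16) code points, the plane number is simply the code point shifted right by 16 bits. A minimal sketch, with planeOf and isInBMP as hypothetical helper names:

```javascript
// Return the plane (0 to 16) that a code point belongs to.
// planeOf and isInBMP are illustrative names, not library functions.
function planeOf(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError("not a Unicode code point: " + codePoint);
  }
  // Each plane spans 0x10000 code points, so the plane is the value above bit 15.
  return codePoint >> 16;
}

// Plane 0 is the Basic Multilingual Plane.
function isInBMP(codePoint) {
  return planeOf(codePoint) === 0;
}

console.log(planeOf(0x0041));   // 0  (A, in the BMP)
console.log(planeOf(0x1f600));  // 1  (grinning face, first supplementary plane)
console.log(planeOf(0x10ffff)); // 16 (last code point of the last plane)
console.log(isInBMP(0xffff));   // true
console.log(isInBMP(0x10000));  // false
```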
How UTF-8 encodes a code point
UTF-8 is a variable-length encoding that stores a code point in one to four bytes. Code points up to U+007F, the ASCII range, take a single byte, so A stays 0x41. Values up to U+07FF take two bytes, values up to U+FFFF take three, and everything above that takes four. The high bits of the leading byte signal the length (0 for one byte, 110 for two, 1110 for three, 11110 for four), and each continuation byte begins with the bits 10. So é (U+00E9) becomes the two bytes 0xC3 0xA9, and the grinning face U+1F600 becomes the four bytes 0xF0 0x9F 0x98 0x80.
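The byte layout can be sketched by hand with a few shifts and masks. This is an illustrative encoder under the ranges given above, not a replacement for the built-in TextEncoder; encodeUtf8 and toHex are hypothetical names. It also rejects the surrogate range U+D800 to U+DFFF, which UTF-8 does not encode.

```javascript
// Encode one code point as an array of UTF-8 byte values.
// encodeUtf8 is a hand-written sketch for illustration only.
function encodeUtf8(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff) {
    throw new RangeError("not a Unicode code point: " + cp);
  }
  if (cp >= 0xd800 && cp <= 0xdfff) {
    throw new RangeError("surrogates cannot be encoded in UTF-8");
  }
  if (cp <= 0x7f) {
    return [cp];                                   // 0xxxxxxx
  }
  if (cp <= 0x7ff) {
    return [0xc0 | (cp >> 6),                      // 110xxxxx
            0x80 | (cp & 0x3f)];                   // 10xxxxxx
  }
  if (cp <= 0xffff) {
    return [0xe0 | (cp >> 12),                     // 1110xxxx
            0x80 | ((cp >> 6) & 0x3f),             // 10xxxxxx
            0x80 | (cp & 0x3f)];                   // 10xxxxxx
  }
  return [0xf0 | (cp >> 18),                       // 11110xxx
          0x80 | ((cp >> 12) & 0x3f),              // 10xxxxxx
          0x80 | ((cp >> 6) & 0x3f),               // 10xxxxxx
          0x80 | (cp & 0x3f)];                     // 10xxxxxx
}

// Render bytes in the 0xNN form used in the text.
function toHex(bytes) {
  return bytes
    .map((b) => "0x" + b.toString(16).toUpperCase().padStart(2, "0"))
    .join(" ");
}

console.log(toHex(encodeUtf8(0x41)));    // 0x41
console.log(toHex(encodeUtf8(0xe9)));    // 0xC3 0xA9
console.log(toHex(encodeUtf8(0x1f600))); // 0xF0 0x9F 0x98 0x80
```

In real code, new TextEncoder().encode(text) produces the same bytes for a whole string; the manual version is only meant to show where each bit goes.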
Surrogate pairs in UTF-16
JavaScript strings use UTF-16, which stores each BMP character in one 16-bit unit but needs two units, a surrogate pair, for anything above U+FFFF. That is why an emoji such as U+1F600 has a string length of 2 even though it is a single code point. Iterating with the spread operator or a for-of loop walks whole code points rather than 16-bit units, so each surrogate pair comes back as one item. That is exactly what this tool does, so each astral character is counted and shown once.
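The difference between UTF-16 units and code points can be seen directly. The sketch below computes a surrogate pair by hand and compares length with code-point iteration. The names toSurrogatePair and countCodePoints are hypothetical, and this is not the tool's actual source, only an example of the same technique.

```javascript
// Split a supplementary code point (above U+FFFF) into its UTF-16 surrogate pair.
function toSurrogatePair(cp) {
  if (!Number.isInteger(cp) || cp <= 0xffff || cp > 0x10ffff) {
    throw new RangeError("expected a code point from U+10000 to U+10FFFF");
  }
  const offset = cp - 0x10000;            // 20 bits remain after removing the BMP
  const high = 0xd800 + (offset >> 10);   // top 10 bits go in the high surrogate
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits go in the low surrogate
  return [high, low];
}

// Count code points rather than 16-bit units by using string iteration.
function countCodePoints(text) {
  let count = 0;
  for (const _char of text) {
    count += 1;
  }
  return count;
}

const face = "\u{1F600}";
const [high, low] = toSurrogatePair(0x1f600);

console.log(high.toString(16), low.toString(16)); // d83d de00
console.log(face.charCodeAt(0) === high);         // true
console.log(face.charCodeAt(1) === low);          // true
console.log(face.length);                         // 2 (UTF-16 units)
console.log([...face].length);                    // 1 (code points)
console.log(countCodePoints("A" + face));         // 2
```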