Boneyard Tools

Hex, bytes and UTF-8: how text becomes hexadecimal

Why two hex digits make one byte, how UTF-8 maps characters to bytes, and how tolerant parsing of spaces and 0x prefixes works.

Two hex digits are one byte

Hexadecimal is base 16, using the digits 0 to 9 and then A to F for the values ten to fifteen. A single byte holds a value from 0 to 255, which is exactly the range two hex digits can express, from 00 to FF. That one-to-one fit is why hex is the standard way to write raw bytes: every byte becomes a tidy two-character pair. It also explains the even-length rule this tool enforces, because an odd number of digits would leave one byte half-written. When you read '48' you are reading the number seventy-two, which happens to be the byte for a capital H.
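As a minimal sketch of that byte-to-pair fit, the Python below converts one byte value to its two hex digits and back. The function names byte_to_hex and hex_to_byte are made up for this illustration and are not part of the tool.

```python
import string

def byte_to_hex(value: int) -> str:
    """Format one byte (0..255) as exactly two lowercase hex digits."""
    if not 0 <= value <= 255:
        raise ValueError("a byte must be between 0 and 255")
    return format(value, "02x")

def hex_to_byte(pair: str) -> int:
    """Parse exactly two hex digits into one byte value."""
    if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
        raise ValueError("expected exactly two hex digits")
    return int(pair, 16)

print(byte_to_hex(72))          # 48
print(hex_to_byte("48"))        # 72
print(chr(hex_to_byte("48")))   # H
print(byte_to_hex(255))         # ff
```

A single digit such as '4' is rejected here for the same reason the tool enforces even length: it describes only half a byte.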

From bytes to characters with UTF-8

Bytes are just numbers; turning them into letters needs an encoding, and the modern default is UTF-8. In UTF-8 the classic ASCII characters keep their original single-byte values, so 48 65 6c 6c 6f spells Hello just as it always did. Characters outside that range use two, three or four bytes in a defined pattern, which is how one visible accented letter can occupy several hex pairs. This tool decodes with a UTF-8 decoder and encodes with a UTF-8 encoder, so text survives a round trip through hex unchanged. The catch is that once you leave plain ASCII, the hex is no longer exactly two digits per character: an é takes four hex digits (c3 a9) and many emoji take eight.
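The round trip can be sketched in a few lines of Python using the standard str.encode, bytes.hex and bytes.fromhex calls. This is an illustration of the idea rather than the tool's own code, and text_to_hex and hex_to_text are hypothetical names.

```python
def text_to_hex(text: str) -> str:
    """Encode text as UTF-8 and write each byte as a space-separated hex pair."""
    return text.encode("utf-8").hex(" ")

def hex_to_text(hex_string: str) -> str:
    """Read hex pairs back into bytes and decode them as UTF-8."""
    return bytes.fromhex(hex_string).decode("utf-8")

print(text_to_hex("Hello"))   # 48 65 6c 6c 6f
print(text_to_hex("é"))       # c3 a9
print(text_to_hex("€"))       # e2 82 ac
print(hex_to_text("48 65 6c 6c 6f"))  # Hello
```

ASCII letters stay at one pair each, while é needs two pairs and € needs three, which is the length mismatch described above.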

Why spaces and 0x prefixes are tolerated

Hex rarely arrives clean. A hex dump groups bytes with spaces, a C or JavaScript listing writes each byte as 0x48, and copied output may mix the two. Rather than force you to scrub the input, the decoder removes all whitespace and strips a leading 0x or 0X from every byte before it reads anything. So '0x48 0x65', '48 65' and '4865' all decode to the same two characters. Only after this cleanup does it check the even-length and hex-only rules, which keeps genuine mistakes visible while shrugging off harmless formatting.
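One way to sketch this cleanup-then-validate order in Python is shown below. It assumes that prefixed bytes are separated by whitespace, so it strips the prefix per token; the tool's exact rules may differ, and normalize_hex and decode_hex are hypothetical names.

```python
import string

def normalize_hex(raw: str) -> str:
    """Drop whitespace and per-byte 0x/0X prefixes, then validate what is left."""
    tokens = raw.split()  # splits on any run of whitespace
    # Assumption: each prefixed byte is its own whitespace-separated token.
    stripped = [t[2:] if t[:2].lower() == "0x" else t for t in tokens]
    cleaned = "".join(stripped)
    # Validation happens only after the cleanup, as described above.
    if any(c not in string.hexdigits for c in cleaned):
        raise ValueError("input contains a non-hex character")
    if len(cleaned) % 2 != 0:
        raise ValueError("odd number of hex digits")
    return cleaned.lower()

def decode_hex(raw: str) -> str:
    """Decode tolerant hex input as UTF-8 text."""
    return bytes.fromhex(normalize_hex(raw)).decode("utf-8")

print(decode_hex("0x48 0x65"))  # He
print(decode_hex("48 65"))      # He
print(decode_hex("4865"))       # He
```

Inputs such as '48 6' or '48 6g' still raise an error in this sketch, so real mistakes are not hidden by the lenient formatting.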

Common uses and honest limits

Hex text conversion shows up when you inspect a network packet, read a database blob, decode a token, or debug how a string is stored on disk. It is a lossless view of the underlying bytes, not encryption or a checksum, so anyone can reverse it; never treat hex as a way to hide secrets. This tool also assumes UTF-8, so bytes produced by a different encoding such as Latin-1 may decode into unexpected characters. And because it works on bytes, hex that is truncated in the middle of a multi-byte character leaves an incomplete sequence that cannot be decoded into the original character. In either case, confirm the source encoding and check that the hex is complete before trusting the decoded text.
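Both limits are easy to reproduce. The Python sketch below uses errors="replace" so that undecodable bytes become the U+FFFD replacement character; how this tool reports such bytes is not stated here, so treat that choice, and the name decode_lenient, as assumptions of the example.

```python
def decode_lenient(hex_string: str) -> str:
    """Decode hex as UTF-8, replacing undecodable bytes with U+FFFD."""
    return bytes.fromhex(hex_string).decode("utf-8", errors="replace")

# 'é' encoded as Latin-1 is the single byte e9, which is not valid UTF-8 alone.
latin1_hex = "é".encode("latin-1").hex()
print(latin1_hex)                                  # e9
print(decode_lenient(latin1_hex) == "\ufffd")      # True
print(bytes.fromhex(latin1_hex).decode("latin-1")) # é

# '€' is e2 82 ac in UTF-8; cutting off the last byte leaves an incomplete sequence.
truncated_hex = "€".encode("utf-8")[:2].hex()
try:
    bytes.fromhex(truncated_hex).decode("utf-8")   # strict decoding
except UnicodeDecodeError as error:
    print("cannot decode:", error.reason)
```

Decoding the same e9 byte as Latin-1 recovers the é, which is why knowing the source encoding matters more than anything the hex itself can tell you.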

Frequently asked questions

Why is my hex longer than my text?

Each byte is two hex digits, so even plain ASCII text doubles in length, and any character beyond plain ASCII uses more than one byte in UTF-8. An accented letter or emoji therefore expands into several hex pairs, making the hex string much longer than the visible text.

Is hex encoding the same as encryption?

No. Hex is a plain, reversible representation of bytes with no key involved. Anyone can decode it back to the original text, so it protects nothing; use real encryption for secrets.

What if the source was not UTF-8?

The decoder assumes UTF-8, so bytes from another encoding like Latin-1 may turn into wrong or garbled characters. Check the original encoding first if the decoded text looks off.