Boneyard Tools

How hash identification works, and its limits

Why a hash type can be narrowed but rarely proven from the string alone, covering prefixes, length maps, and where the guess fails.

Prefixes are the only sure signal

Some password hashing schemes prepend a marker that names the algorithm and stores its parameters. The Modular Crypt Format uses dollar-delimited tags such as $2y$ for bcrypt, $6$ for SHA-512 crypt, and $argon2id$ for Argon2, while LDAP uses braces like {SSHA}. When one of these is present, the identification is reliable because the format itself declares what it is. The same string also carries the salt and cost parameters, which is why a bcrypt string runs to 60 characters even though the underlying digest is only 184 bits.
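A minimal sketch of prefix matching, assuming a small table that covers only the schemes named above. The names PREFIXES, identify_by_prefix, and parse_bcrypt are hypothetical and are not taken from the tool itself.

```python
# Hypothetical prefix table limited to the schemes named in the text;
# a real identifier would cover many more.
PREFIXES = [
    ("$2a$", "bcrypt"),
    ("$2b$", "bcrypt"),
    ("$2y$", "bcrypt"),
    ("$6$", "SHA-512 crypt"),
    ("$argon2id$", "Argon2id"),
    ("{SSHA}", "salted SHA-1 (LDAP)"),
]


def identify_by_prefix(value):
    """Return the scheme declared by a known prefix, or None if there is none."""
    for prefix, scheme in PREFIXES:
        if value.startswith(prefix):
            return scheme
    return None


def parse_bcrypt(value):
    """Split a bcrypt string into variant, cost, salt, and digest fields."""
    # "$2b$12$<53 chars>" splits into ["", "2b", "12", "<53 chars>"]
    parts = value.split("$")
    if len(parts) != 4 or parts[1] not in ("2a", "2b", "2y"):
        return None
    if not parts[2].isdigit() or len(parts[3]) != 53:
        return None
    return {
        "variant": parts[1],
        "cost": int(parts[2]),
        "salt": parts[3][:22],    # 22 characters encoding the 128-bit salt
        "digest": parts[3][22:],  # 31 characters encoding the digest
    }


# Placeholder with the right shape; it is not a real password hash.
sample = "$2b$12$" + "a" * 22 + "b" * 31
print(identify_by_prefix(sample))    # bcrypt
print(parse_bcrypt(sample)["cost"])  # 12
```

Most of the 60 characters are salt and digest, with the variant and cost packed into the first seven.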

Length and character set as a fallback

When there is no prefix and the string is entirely hexadecimal, the only clues left are its length and its alphabet of 0 to 9 and a to f. Each hex character encodes 4 bits, so a 32-character string is 128 bits, which points to MD5, NTLM, or the older MD4 and MD2. A 40-character string is 160 bits, the size of SHA-1 and RIPEMD-160. A 64-character string is 256 bits, shared by SHA-256, SHA3-256, and BLAKE2s. Because several algorithms produce each size, this stage can only rank, never confirm.
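The length fallback can be sketched as a lookup table keyed by character count. This is an illustration only: LENGTH_MAP and candidates_by_length are hypothetical names, and the strict lowercase check is an assumption about how such a matcher might behave.

```python
import hashlib
import re

HEX_RE = re.compile(r"[0-9a-f]+")

# Candidates per hex length, taken from the sizes discussed above.
LENGTH_MAP = {
    32: ["MD5", "NTLM", "MD4", "MD2"],       # 128 bits
    40: ["SHA-1", "RIPEMD-160"],             # 160 bits
    64: ["SHA-256", "SHA3-256", "BLAKE2s"],  # 256 bits
}


def candidates_by_length(value):
    """Return every listed algorithm whose hex digest has this length."""
    if not HEX_RE.fullmatch(value):
        return []  # not lowercase hex, so the length map does not apply
    return LENGTH_MAP.get(len(value), [])


# Two different algorithms, one shape: both digests land in the same bucket.
sha256_hex = hashlib.sha256(b"example").hexdigest()
sha3_hex = hashlib.sha3_256(b"example").hexdigest()
print(len(sha256_hex) * 4)  # 256
print(candidates_by_length(sha256_hex) == candidates_by_length(sha3_hex))  # True
```

The two digests differ in every meaningful way except length, and length is all the lookup can see.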

Why confidence badges matter

The tool attaches a high, medium, or low badge to each guess so you do not treat a coincidence as a fact. Within a length bucket the most widely deployed algorithm gets the higher rank, which is why MD5 outranks MD2 at 32 characters and SHA-256 outranks the SHA3 and BLAKE2 variants at 64. Confidence reflects real-world prevalence and format certainty, not any inspection of the bytes, because digests of the same length from different algorithms are statistically indistinguishable.
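One way to picture the badges is as a function of where a guess came from and where it sits in its bucket. The sketch below is an assumption, not the tool's actual scoring: it gives "high" to a prefix match, "medium" to the first entry in a length bucket, and "low" to the rest. The prevalence order and the names Guess and rank_guesses are hypothetical.

```python
from typing import NamedTuple


class Guess(NamedTuple):
    algorithm: str
    confidence: str  # "high", "medium", or "low"


# Assumed prevalence order per hex length, most widely deployed first.
BY_PREVALENCE = {
    32: ["MD5", "NTLM", "MD4", "MD2"],
    40: ["SHA-1", "RIPEMD-160"],
    64: ["SHA-256", "SHA3-256", "BLAKE2s"],
}


def rank_guesses(hex_length, prefix_scheme=None):
    """Attach a badge to each candidate without inspecting the digest bytes."""
    if prefix_scheme is not None:
        # A self-declaring format is the only case that earns "high".
        return [Guess(prefix_scheme, "high")]
    ordered = BY_PREVALENCE.get(hex_length, [])
    return [
        Guess(name, "medium" if position == 0 else "low")
        for position, name in enumerate(ordered)
    ]


for guess in rank_guesses(32):
    print(guess.algorithm, guess.confidence)
```

Nothing in the function reads the digest itself. The badge depends only on the format and the bucket position.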

Where identification breaks down

Several situations defeat pattern matching. A hash encoded as base64 rather than hex uses a different alphabet and length, so it will not map cleanly. A salt stored separately from the digest strips away the context that a prefixed format would provide. Truncated or concatenated values, uppercase versus lowercase differences, and stray whitespace can all push a result into the Unknown bucket. Treat the output as a starting hypothesis, then confirm it against the system that generated the hash or by test-hashing a known value.
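Two of the remedies above, cleaning up the input and test-hashing a known value, can be sketched with the Python standard library. The helper names normalize and confirm are hypothetical. The sketch assumes unsalted digests of 128, 160, or 256 bits and covers only algorithms that hashlib provides, so NTLM, MD4, MD2, and RIPEMD-160 are left out.

```python
import base64
import binascii
import hashlib
import re

DIGEST_BYTES = {16, 20, 32}  # 128, 160, and 256 bits
HEX_LENGTHS = {2 * n for n in DIGEST_BYTES}


def normalize(value):
    """Return a lowercase hex digest, or None if the input cannot be cleaned up."""
    compact = "".join(value.split())  # drop spaces, tabs, and newlines
    # Check hex first: a hex string can also look like valid base64.
    if re.fullmatch(r"[0-9a-fA-F]+", compact) and len(compact) in HEX_LENGTHS:
        return compact.lower()
    try:
        raw = base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError):
        return None
    return raw.hex() if len(raw) in DIGEST_BYTES else None


def confirm(known_input, observed_hex,
            algorithms=("md5", "sha1", "sha256", "sha3_256", "blake2s")):
    """Return the hashlib names whose unsalted digest of known_input matches."""
    matches = []
    for name in algorithms:
        try:
            digest = hashlib.new(name, known_input).hexdigest()
        except ValueError:
            continue  # algorithm not available in this Python build
        if digest == observed_hex:
            matches.append(name)
    return matches


# A base64-encoded SHA-256 digest with stray whitespace around it.
encoded = base64.b64encode(hashlib.sha256(b"known value").digest()).decode()
cleaned = normalize("  " + encoded + "\n")
print(confirm(b"known value", cleaned))  # ['sha256']
```

A match from confirm is evidence that inspection of the string alone cannot give, but it only works when you control an input to the system and the hash is unsalted.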

Frequently asked questions

Why is bcrypt identified with high confidence but MD5 only sometimes?

Bcrypt carries a $2a$, $2b$, or $2y$ prefix that no other scheme uses, so the format is self-declaring. A raw MD5 is just 32 hex characters with no marker, a shape it shares with NTLM and others, so it can only be ranked as a likely guess.

The tool says Unknown, but I know it is a hash. Why?

The string is probably not raw hex of a common length. Check for base64 encoding, an attached salt, uppercase or non-hex characters, or accidental whitespace. Isolating the pure hexadecimal digest usually lets the length matcher recognize it.