Boneyard Tools

Alphabetical vs numeric sorting: why 10 can come before 2

Why text sorting puts 10 before 2, how numeric sorting fixes it, and how case, locale, and duplicates change the order of a list.

Sorting reads characters, not values

Default sorting is lexicographic, meaning it compares lines one character at a time from left to right. Under that rule '10' comes before '2' because the first character '1' sorts before '2', and the comparison stops there without ever looking at the numeric value. This is exactly right for words but surprising for numbers, and it is the single most common reason a sorted list looks out of order. Switching on 'Numeric sort' tells the tool to parse the number at the start of each line and compare those values instead.
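The difference is easy to reproduce outside the tool. The sketch below uses Python's built-in sorted as a stand-in; it is an illustration of the two comparison rules, not the tool's own code, and the sample values are made up.

```python
lines = ["10", "2", "33", "4"]

# Text (lexicographic) order: strings are compared character by character,
# so "10" wins on its first character '1' before the value is ever considered.
text_order = sorted(lines)

# Numeric order: each line is converted to an integer and the values are compared.
numeric_order = sorted(lines, key=int)

print(text_order)     # ['10', '2', '33', '4']
print(numeric_order)  # ['2', '4', '10', '33']
```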

How numeric mode handles mixed lines

Numeric sorting reads the leading number of each line, so '9 apples' and '80 pears' order as 9 then 80 rather than 80 then 9. Lines that do not begin with a number cannot be placed on the numeric scale, so they are collected and moved to the end of the list rather than removed. This keeps a stray label or heading from breaking the ordering of the real data. If you want a purely alphabetical arrangement of such mixed content, leave numeric mode off.
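One way to picture this behaviour is a sort key that extracts the leading number and pushes everything else to the back. The sketch below is a rough approximation under stated assumptions: the names LEADING_NUMBER, numeric_key and numeric_sort are hypothetical, and the handling of minus signs, decimals, and the relative order of non-numeric lines is guessed rather than taken from the tool.

```python
import re

# Assumption: a "leading number" is an optional minus sign, digits,
# and an optional decimal part, possibly preceded by whitespace.
LEADING_NUMBER = re.compile(r"\s*(-?\d+(?:\.\d+)?)")

def numeric_key(line):
    """Return a key that sorts numeric lines by value and others last."""
    match = LEADING_NUMBER.match(line)
    if match is None:
        # No leading number: group 1 sorts after every numeric line (group 0).
        return (1, 0.0)
    return (0, float(match.group(1)))

def numeric_sort(lines):
    # sorted() is stable, so non-numeric lines keep their original relative order.
    return sorted(lines, key=numeric_key)

sample = ["80 pears", "Inventory", "9 apples", "Notes", "12.5 kg flour"]
result = numeric_sort(sample)
print(result)
# ['9 apples', '12.5 kg flour', '80 pears', 'Inventory', 'Notes']
```

The two-part key keeps the non-numeric lines in the output instead of dropping them, which matches the behaviour described above.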

Case sensitivity and locale

With case sensitivity off, the tool lowercases each line before comparing, so 'Apple' and 'apple' compare as equal and land next to each other. With case sensitivity on, they are treated as distinct lines. The comparison is locale-aware rather than a raw code-point sort, which means accented characters and mixed alphabets are ordered the way a reader would expect instead of by their underlying code-point values. For most everyday lists the default case-insensitive setting produces the most natural result.
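The effect of lowercasing before comparing can be seen in a short Python sketch. Note the limits of the illustration: Python's default string comparison is a raw code-point sort, so this shows only the case-folding step and does not reproduce the locale-aware ordering described above. The sample words are made up.

```python
words = ["banana", "Apple", "apple", "Cherry"]

# Raw code-point order: every uppercase ASCII letter sorts before
# every lowercase one, so capitalised words cluster at the front.
code_point_order = sorted(words)

# Case-insensitive order: compare lowercased copies, keep the original text.
case_insensitive = sorted(words, key=str.lower)

print(code_point_order)  # ['Apple', 'Cherry', 'apple', 'banana']
print(case_insensitive)  # ['Apple', 'apple', 'banana', 'Cherry']
```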

Removing duplicates cleanly

Duplicate removal runs after the sort, so identical lines sit together and only the first occurrence of each is kept. Whether two lines count as duplicates follows your case setting: with case sensitivity off, 'FIXME' and 'fixme' collapse to one; with case sensitivity on, they survive as two separate entries. Because a single trailing newline is ignored, a file that ends in a line break will not leave a phantom blank line among your results. The live line count under the output lets you confirm how many unique lines remain.
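The sort-then-deduplicate order can be sketched in a few lines of Python. This is an approximation of the behaviour described above, not the tool's implementation; the function name sort_and_dedupe and its case_sensitive parameter are hypothetical.

```python
def sort_and_dedupe(text, case_sensitive=False):
    """Sort the lines of text, then keep the first of each run of duplicates."""
    # Ignore a single trailing newline so it does not become a blank line.
    if text.endswith("\n"):
        text = text[:-1]
    lines = text.split("\n")

    # The same key decides both the sort order and what counts as a duplicate.
    key = (lambda s: s) if case_sensitive else str.lower
    ordered = sorted(lines, key=key)

    unique = []
    previous = None
    for line in ordered:
        current = key(line)
        if current != previous:
            unique.append(line)
            previous = current
    return unique

print(sort_and_dedupe("fixme\nFIXME\ntodo\n"))
# ['fixme', 'todo']
print(sort_and_dedupe("fixme\nFIXME\ntodo\n", case_sensitive=True))
# ['FIXME', 'fixme', 'todo']
```

Because duplicates are adjacent after sorting, a single pass that compares each line with the previous one is enough.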

Frequently asked questions

Why did my version numbers sort in the wrong order?

In text mode, version strings such as 1.10 and 1.2 are compared character by character, so 1.10 lands before 1.2. Numeric sort reads only the leading number of each line, so it cannot compare the later parts of a multi-part version; for those you may need a dedicated version sorter rather than plain line sorting.
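As a rough idea of what a dedicated version sorter does, the Python sketch below splits each version on dots and compares the parts as integers. The helper name version_key is hypothetical, and the approach assumes purely numeric dotted versions; suffixes such as '-beta' would need extra handling.

```python
def version_key(version):
    """Turn '1.10' into (1, 10) so each part is compared as a number."""
    return tuple(int(part) for part in version.split("."))

versions = ["1.10", "1.2", "1.9.1", "1.9"]

text_sorted = sorted(versions)
version_sorted = sorted(versions, key=version_key)

print(text_sorted)     # ['1.10', '1.2', '1.9', '1.9.1']
print(version_sorted)  # ['1.2', '1.9', '1.9.1', '1.10']
```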

Does sorting keep two identical lines unless I ask to remove them?

Yes. Duplicates are preserved by default and simply sit next to each other after sorting. They are only collapsed when you tick 'Remove duplicates'.