How word deduplication works (and when to use it)
Understand how repeated-word removal tokenizes text, why punctuation and case change the result, and where deduping words actually helps.
Tokenizing text into words
Before any duplicate can be removed, the text has to be cut into discrete words. This tool uses the simplest reliable rule: a word is any unbroken run of non-whitespace characters, and words are separated by spaces, tabs or line breaks. That keeps the logic predictable, but it also means whatever is glued to a word stays part of it. A trailing comma, a closing bracket or a period all ride along, so 'done.' at the end of a sentence is a different token from 'done' in the middle.
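The whitespace rule above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function name `tokenize` is hypothetical, and it assumes Python's notion of whitespace is close enough to the tool's.

```python
def tokenize(text: str) -> list[str]:
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, line breaks) and never returns empty strings.
    return text.split()


tokens = tokenize("Are we done. Not done, not\tdone\nyet")
# Attached punctuation stays on the token, so 'done.', 'done,' and
# 'done' all come out as three different words.
print(tokens)
```

Because nothing is stripped from a token, any cleanup of punctuation has to happen as a separate step before or after splitting.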
Why case and punctuation change the count
By default the comparison is exact, so 'Apple' and 'apple' are two separate words and both survive. Turning on Ignore case folds them together and keeps only the first spelling you wrote. Punctuation is never folded, which is deliberate: collapsing 'cat' and 'cat,' would risk deleting a word that genuinely belongs at the end of a clause. If you want variants such as 'cat' and 'cat,' treated as one word, strip the punctuation first, then dedupe.
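As a rough sketch of the behavior described here, the following Python compares words exactly by default, folds case on request, and leaves punctuation alone unless you clean it yourself first. The names `dedupe_words`, `ignore_case` and `strip_edge_punctuation` are assumptions made for illustration, and stripping only ASCII punctuation from the edges of each word is one possible cleanup, not necessarily what you need.

```python
import string


def dedupe_words(text: str, ignore_case: bool = False) -> list[str]:
    seen = set()
    kept = []
    for word in text.split():
        # The comparison key is case-folded only when requested;
        # the word itself is kept in its original spelling.
        key = word.casefold() if ignore_case else word
        if key not in seen:
            seen.add(key)
            kept.append(word)
    return kept


def strip_edge_punctuation(text: str) -> str:
    # Optional pre-cleaning step: remove ASCII punctuation from both
    # ends of each word and drop words that become empty.
    words = (w.strip(string.punctuation) for w in text.split())
    return " ".join(w for w in words if w)


sample = "Apple apple cat cat,"
print(dedupe_words(sample))                    # exact comparison
print(dedupe_words(sample, ignore_case=True))  # case folded, punctuation kept
print(dedupe_words(strip_edge_punctuation(sample), ignore_case=True))
```

With the sample above, the exact comparison keeps all four tokens, ignoring case drops the second 'apple', and only the pre-cleaned run also merges 'cat' with 'cat,'.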
Keeping order and spacing intact
The tool always keeps the first appearance of each word and preserves the original left-to-right order, so the result still reads as a sentence rather than a shuffled set. The 'Collapse extra spaces' switch controls the seams. Left on, kept words are re-joined with single spaces for a clean line. Turned off, the original whitespace that preceded each surviving word is re-emitted, which matters when tabs or deliberate indentation carry meaning.
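One way to picture the two spacing modes is to capture each word together with the whitespace that came before it. The sketch below does that with a regular expression; `dedupe_text` and `collapse_spaces` are hypothetical names, and details such as dropping trailing whitespace are assumptions rather than documented behavior of the tool.

```python
import re


def dedupe_text(text: str, collapse_spaces: bool = True) -> str:
    seen = set()
    pieces = []  # (preceding whitespace, word) for each surviving word
    for match in re.finditer(r"(\s*)(\S+)", text):
        gap, word = match.groups()
        if word in seen:
            continue  # a repeat: drop the word and the gap before it
        seen.add(word)
        pieces.append((gap, word))
    if collapse_spaces:
        # Re-join the survivors with single spaces.
        return " ".join(word for _, word in pieces)
    # Re-emit the original whitespace that preceded each survivor.
    return "".join(gap + word for gap, word in pieces)


line = "a\tb  a   c"
print(repr(dedupe_text(line)))                         # 'a b c'
print(repr(dedupe_text(line, collapse_spaces=False)))  # 'a\tb   c'
```

In both modes the first appearance wins and the left-to-right order is untouched; only the seams between the kept words differ.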
Good and bad uses for word deduping
Removing duplicate words shines on machine-generated lists, stutter typos, repeated tags, and keyword strings where each term should appear once. It is a poor fit for prose you care about, because natural writing repeats common words like 'the' and 'and' on purpose, and stripping them produces broken grammar. When you only want to audit repetition rather than rewrite the text, reach for the Duplicate Word Finder, which reports counts and consecutive doubles without touching your words.