Boneyard Tools

AI Semantic Diff

Compare two texts by what they MEAN, not by characters. A line diff flags every reworded or reordered sentence as a change; a semantic diff matches sentences that say the same thing even when the words and order differ. A MiniLM model turns each sentence into an embedding vector, then a greedy one-to-one alignment pairs the closest matches above a similarity threshold and reports the rest as removed or added ideas. Everything runs in your browser, so nothing is uploaded; the model downloads once on first use, then is cached.
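
Because the comparison works sentence by sentence, each text first has to be split into sentences. The tool's actual splitter is not documented here, so the following is only a minimal sketch: the function name splitSentences and the punctuation-based rule are assumptions made for illustration.

```typescript
// Hypothetical helper: split text into sentences at ., ! or ? followed by whitespace.
// This simple rule is an assumption; it will mis-split abbreviations such as "e.g. this".
function splitSentences(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/) // break after sentence-ending punctuation
    .map((s) => s.trim())
    .filter((s) => s.length > 0); // drop empty fragments
}

const originalSentences = splitSentences("The cat sat on the mat. It was a sunny day.");
// originalSentences is ["The cat sat on the mat.", "It was a sunny day."]
```

Each resulting sentence would then be embedded on its own, so the quality of the split directly affects which ideas can be paired.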

How to diff two texts by meaning

  1. Paste the first version into Original and the edited version into Revised.
  2. Set the same-idea threshold (higher is stricter), then click Compare meaning (the first run loads the model).
  3. Read the three sections: common ideas (matched pairs), only in original (removed), and only in revised (added).

Examples

Reordered and reworded, yet matched

Original: 'The cat sat on the mat. It was a sunny day.'
Revised: 'It was sunny outside. A feline rested on the rug.'
Each original sentence pairs with its reworded counterpart in the revised text, so nothing reads as removed or added despite the rewording and the swapped order.

Frequently asked questions

How is this different from a normal line or character diff?

A line diff compares the literal text, so it marks 'the cat sat on the mat' as a deletion and 'a feline rested on the rug' as an addition. A semantic diff compares meaning: it pairs those sentences as the same idea because their embeddings are close, even though they share almost no words and appear in a different order.
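
To make the contrast concrete, here is a minimal sketch of a literal, exact-match comparison at sentence level. It is not how any particular diff tool is implemented; the function name exactDiff and the case-insensitive normalization are assumptions for illustration.

```typescript
// Hypothetical exact-match diff: a sentence counts as unchanged only if the
// same text (ignoring case and surrounding spaces) appears in both versions.
function exactDiff(
  original: string[],
  revised: string[],
): { removed: string[]; added: string[] } {
  const norm = (s: string) => s.trim().toLowerCase();
  const originalSet = new Set(original.map(norm));
  const revisedSet = new Set(revised.map(norm));
  return {
    removed: original.filter((s) => !revisedSet.has(norm(s))),
    added: revised.filter((s) => !originalSet.has(norm(s))),
  };
}

const literal = exactDiff(["The cat sat on the mat."], ["A feline rested on the rug."]);
// literal.removed is ["The cat sat on the mat."]
// literal.added is ["A feline rested on the rug."]
```

A literal comparison reports one removal and one addition here, whereas a meaning-based comparison would treat the two sentences as one matched idea.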

Is my text uploaded anywhere?

No. The MiniLM embedding model runs entirely in your browser via WebAssembly. Both versions are processed on your device and never sent to a server. The only network transfer is the model itself, which is downloaded once and then cached.

Which AI model does this use?

all-MiniLM-L6-v2, a compact sentence-transformer (about 23 MB) that maps text to 384-dimensional vectors. It is fast, widely used for sentence matching, and runs locally through transformers.js and ONNX.

What does the same-idea threshold control?

It is the minimum cosine similarity for two sentences to count as the same idea and be paired. A higher value is stricter, so only near-identical sentences match and more end up as removed or added. A lower value is looser and pairs roughly related sentences. Re-run after changing it.
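
As a rough illustration of the threshold check, the sketch below computes cosine similarity between two vectors and compares it with a threshold. The function names and the toy 2-dimensional vectors are assumptions for the example (real embeddings have 384 dimensions), and treating a score exactly equal to the threshold as a match is also an assumption.

```typescript
// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: two sentences count as the same idea when their
// similarity reaches the threshold (assumed inclusive).
function isSameIdea(a: number[], b: number[], threshold: number): boolean {
  return cosineSimilarity(a, b) >= threshold;
}

// Toy vectors with similarity of about 0.707:
const loose = isSameIdea([1, 0], [1, 1], 0.7); // true: pairs at a looser threshold
const strict = isSameIdea([1, 0], [1, 1], 0.8); // false: rejected at a stricter one
```

The same pair flips from matched to unmatched as the threshold rises, which is why a stricter setting leaves more sentences in the removed and added sections.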

How are sentences paired when several are similar?

Greedily and one-to-one: the tool repeatedly takes the single highest-similarity pair still above the threshold whose sentences are both unused, locks them together, and repeats until no such pair is left. Leftover original sentences are reported as removed and leftover revised sentences as added.
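
The greedy pairing described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's source: the names Match, SemanticDiff and greedyAlign are hypothetical, the embeddings are assumed to be unit-normalized (so a dot product equals cosine similarity), and sorting all candidate pairs once is used as an equivalent way of repeatedly picking the best remaining pair.

```typescript
interface Match { originalIndex: number; revisedIndex: number; similarity: number; }
interface SemanticDiff { matched: Match[]; removed: number[]; added: number[]; }

// Dot product; equals cosine similarity when both vectors have length 1 (assumed).
function dotProduct(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function greedyAlign(original: number[][], revised: number[][], threshold: number): SemanticDiff {
  // Collect every original/revised pair that reaches the threshold.
  const candidates: Match[] = [];
  original.forEach((o, i) => {
    revised.forEach((r, j) => {
      const similarity = dotProduct(o, r);
      if (similarity >= threshold) candidates.push({ originalIndex: i, revisedIndex: j, similarity });
    });
  });
  // Best pairs first, so each accepted pair is the highest one still available.
  candidates.sort((x, y) => y.similarity - x.similarity);

  const usedOriginal = new Set<number>();
  const usedRevised = new Set<number>();
  const matched: Match[] = [];
  for (const c of candidates) {
    // One-to-one: skip a pair if either sentence is already locked.
    if (usedOriginal.has(c.originalIndex) || usedRevised.has(c.revisedIndex)) continue;
    usedOriginal.add(c.originalIndex);
    usedRevised.add(c.revisedIndex);
    matched.push(c);
  }
  // Anything never paired is reported as removed (original) or added (revised).
  const removed = original.map((_, i) => i).filter((i) => !usedOriginal.has(i));
  const added = revised.map((_, j) => j).filter((j) => !usedRevised.has(j));
  return { matched, removed, added };
}

// Toy unit vectors: the two original sentences reappear in swapped order,
// and the revised text has one extra, unrelated sentence.
const demo = greedyAlign([[1, 0], [0, 1]], [[0, 1], [1, 0], [-1, 0]], 0.5);
// demo.matched pairs original 0 with revised 1 and original 1 with revised 0
// demo.removed is [], demo.added is [2]
```

Greedy pairing is simple and fast, but it does not guarantee the best overall assignment: locking the single best pair first can occasionally leave a worse partner, or none, for another sentence.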
