Boneyard Tools

AI Text Clustering

Paste a list of items, one per line, and group them into topics by meaning rather than shared words. A MiniLM embedding model maps every line to a vector, and items are then clustered by cosine similarity: lines about dogs land in one group, finance lines in another, and cooking lines in a third, even when they share no words. A tightness slider sets how similar items must be to join a cluster (higher = tighter, more clusters). Everything runs in your browser, so nothing is uploaded; the model downloads once on first use and is cached after that.

How to cluster a list by meaning

  1. Paste your items into the box, one per line.
  2. Set the tightness slider (higher values give tighter, more granular clusters), then click Group similar items. The first run also loads the model.
  3. Read each cluster card, titled by a representative item and listing its members; the largest clusters show first.
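
For readers curious how step 3's cards could be assembled, here is a minimal sketch. It assumes the representative item is the member most similar to the rest of its cluster (a medoid); the tool does not say how it picks one, so treat that rule, and the names `buildCards` and `ClusterCard`, as hypothetical. The vectors are made-up 2-dimensional stand-ins for real embeddings.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

interface ClusterCard {
  title: string;     // representative item
  members: string[]; // every item in the cluster
}

// Assumption: the title is the member with the highest total similarity
// to its cluster (a medoid). Cards are sorted largest cluster first.
function buildCards(items: string[], vectors: number[][], clusters: number[][]): ClusterCard[] {
  const cards = clusters.map((cluster) => {
    let bestIndex = cluster[0] ?? 0;
    let bestTotal = -Infinity;
    for (const i of cluster) {
      let total = 0;
      for (const j of cluster) {
        total += cosineSimilarity(vectors[i] ?? [], vectors[j] ?? []);
      }
      if (total > bestTotal) {
        bestTotal = total;
        bestIndex = i;
      }
    }
    return { title: items[bestIndex] ?? "", members: cluster.map((i) => items[i] ?? "") };
  });
  return cards.sort((a, b) => b.members.length - a.members.length);
}

// Toy data: one cooking item and three dog items, already grouped by index.
const items = ["bread", "puppy", "dog", "hound"];
const vectors = [[0, 1], [1, 0.4], [1, 0], [1, -0.4]];
const cards = buildCards(items, vectors, [[0], [1, 2, 3]]);

console.log(cards.map((c) => `${c.title} (${c.members.length})`));
// [ 'dog (3)', 'bread (1)' ]
```

The three-item dog cluster sorts ahead of the single bread item, and "dog" becomes the title because its toy vector sits between the other two.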

Examples

Group a mixed list into topics

Lines: 'The puppy chased a ball.' / 'My savings account earns interest.' / 'I baked sourdough bread.' / 'The dog fetched the stick.'
Cluster 1 (dogs): the puppy and the dog lines. Cluster 2: the savings line. Cluster 3: the bread line.

Frequently asked questions

How is this different from sorting or grouping by keyword?

Keyword grouping needs the same words to appear. Clustering compares meaning: 'the puppy chased a ball' and 'the dog fetched the stick' group together even though they share no words, because their sentence embeddings are close. It uses cosine similarity, not string matching.
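
As a rough illustration of the comparison described above, the sketch below computes cosine similarity between small hand-made vectors. The 3-dimensional values are invented for the example; real all-MiniLM-L6-v2 embeddings have 384 dimensions, and the function name `cosineSimilarity` is only illustrative.

```typescript
// Cosine similarity: dot product divided by the product of vector lengths.
// Returns a value in [-1, 1]; higher means the vectors point the same way.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / Math.sqrt(normA * normB);
}

// Made-up stand-ins for sentence embeddings (not real model output).
const puppy = [0.9, 0.1, 0.0];   // 'The puppy chased a ball.'
const dog = [0.8, 0.2, 0.1];     // 'The dog fetched the stick.'
const savings = [0.0, 0.1, 0.9]; // 'My savings account earns interest.'

console.log(cosineSimilarity(puppy, dog) > cosineSimilarity(puppy, savings)); // true
```

No words are compared anywhere in this function; only the direction of the vectors matters, which is why two sentences with no shared vocabulary can still score as close.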

Is my text uploaded anywhere?

No. The MiniLM embedding model runs entirely in your browser via WebAssembly. Your list is processed on your device and never sent to a server. The only download is the model itself, fetched once and then cached.

Which AI model does this use?

all-MiniLM-L6-v2, a compact sentence-transformer (about 23 MB) that maps text to 384-dimensional vectors. It is fast, widely used for semantic grouping, and runs locally through transformers.js and ONNX.

What does the tightness slider do?

It sets the minimum cosine similarity for an item to join an existing cluster. A higher value makes clusters tighter, so you get more, smaller groups of very similar items. A lower value merges loosely related items into fewer, broader groups. Re-cluster after changing it.
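
To make the slider's effect concrete, here is a minimal sketch of one way a similarity threshold can drive clustering. It assumes a simple greedy scheme in which each item is compared with the first member of every existing cluster and joins the best match at or above the threshold; the tool's actual algorithm is not documented here, so `clusterByThreshold` and the toy 2-dimensional vectors are hypothetical.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Greedy threshold clustering (an assumed scheme, not the tool's confirmed one).
// Each item joins the existing cluster whose first member it is most similar to,
// provided that similarity is at least `threshold`; otherwise it starts a new cluster.
// Returns clusters as arrays of item indices.
function clusterByThreshold(vectors: number[][], threshold: number): number[][] {
  const clusters: number[][] = [];
  vectors.forEach((vec, index) => {
    let bestCluster: number[] | undefined;
    let bestScore = threshold;
    for (const cluster of clusters) {
      const seed = vectors[cluster[0] ?? 0] ?? [];
      const score = cosineSimilarity(vec, seed);
      if (score >= bestScore) {
        bestScore = score;
        bestCluster = cluster;
      }
    }
    if (bestCluster) {
      bestCluster.push(index);
    } else {
      clusters.push([index]);
    }
  });
  return clusters;
}

// Four made-up vectors fanning out from [1, 0] to [0, 1].
const toyVectors = [[1.0, 0.0], [0.9, 0.3], [0.6, 0.8], [0.0, 1.0]];

console.log(clusterByThreshold(toyVectors, 0.5).length); // 2
console.log(clusterByThreshold(toyVectors, 0.9).length); // 3
```

Raising the threshold from 0.5 to 0.9 splits the same four items into three groups instead of two, matching the "higher = tighter, more clusters" behavior described above.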

How big a list can it handle?

It comfortably handles hundreds to a few thousand lines. Embedding time scales with the amount of text, so very large lists take longer to embed on the first pass, but clustering itself is fast once the vectors exist. Everything still runs in your browser; nothing is uploaded.
