Boneyard Tools

AI Semantic Search

Find the lines that MEAN what you are looking for, even when they share no words with your query. Paste a list, document, or set of notes, type what you want in plain language, and a MiniLM embedding model ranks every line (or paragraph, or sentence) by semantic similarity to your query. The corpus is embedded once; after that, results update live as you type. Everything runs in your browser, so your text is never uploaded. Only the model is downloaded, once, on first use, and then cached.

How to search text by meaning

  1. Paste your text into the corpus box and choose whether to search by line, paragraph, or sentence.
  2. Type a query in plain language and click Build index and search (the first run downloads the model and indexes your text).
  3. Read the ranked matches. Keep typing to refine the query; the results re-rank instantly.
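The steps above begin by cutting the corpus into units. The page does not say exactly how lines, paragraphs, or sentences are detected, so this TypeScript sketch assumes simple rules: one unit per line, paragraphs separated by blank lines, and sentences ending in '.', '!' or '?' followed by a space. The helper names splitCorpus and rawUnits are hypothetical, not the tool's code.

```typescript
// Hypothetical splitter: the tool's actual rules are not documented, so
// these are assumed, deliberately simple heuristics.
type Unit = "line" | "paragraph" | "sentence";

function rawUnits(text: string, unit: Unit): string[] {
  if (unit === "line") {
    // One unit per line of the pasted text.
    return text.split(/\r?\n/);
  }
  if (unit === "paragraph") {
    // Assumes paragraphs are separated by one or more blank lines.
    return text.split(/\r?\n\s*\r?\n/);
  }
  // Naive sentence rule: collapse whitespace, then break after ., ! or ?
  // when followed by a space. Abbreviations like "e.g. " split wrongly.
  return text.replace(/\s+/g, " ").replace(/([.!?]) /g, "$1\n").split("\n");
}

function splitCorpus(text: string, unit: Unit): string[] {
  // Trim each unit and drop empty ones so blank lines are never embedded.
  return rawUnits(text, unit)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}
```

With these rules, 'She baked bread. It was warm!' is one unit in line mode but two in sentence mode. A production tool would likely use a more careful sentence splitter than this naive punctuation rule.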

Examples

Find a concept with different words

Corpus lines: 'The puppy chased the ball.' / 'Rates rose at the bank.' / 'She baked bread.'
Query: 'a young dog playing'
Top match: 'The puppy chased the ball.' (high relevance), even though it shares no words with the query.

Frequently asked questions

How is this different from Ctrl+F or keyword search?

Keyword search needs the exact words to appear. Semantic search compares meaning: a query like 'a young dog playing' will surface 'the puppy chased the ball' even though they share no words. Instead of matching strings, it converts the query and each unit of text into a sentence embedding (a vector of numbers) and ranks units by the cosine similarity between their vector and the query's vector.
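As a rough illustration of the ranking step, here is a minimal TypeScript sketch that scores each unit by cosine similarity to the query vector and sorts the results. It assumes the embeddings have already been computed; the names cosineSimilarity, rank and Match are hypothetical, not the tool's code.

```typescript
// Minimal sketch of ranking by cosine similarity. The vectors stand in for
// sentence embeddings; producing them requires a model and is not shown.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Match {
  text: string;
  score: number;
}

// Score every unit against the query, sort best first, keep the top K.
function rank(query: number[], units: string[], vectors: number[][], topK = 5): Match[] {
  return units
    .map((text, i) => ({ text, score: cosineSimilarity(query, vectors[i]) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Cosine similarity compares only the direction of two vectors and ignores their length, which is why it is a common choice for comparing sentence embeddings.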

Is my text uploaded anywhere?

No. The MiniLM embedding model runs entirely in your browser via WebAssembly. Your corpus and queries are processed on your device and never sent to a server. Only the model is downloaded, once, then cached.

Which AI model does this use?

all-MiniLM-L6-v2, a compact sentence-transformer (about 23 MB) that maps text to 384-dimensional vectors. It is fast, widely used for semantic search, and runs locally through transformers.js, which executes an ONNX version of the model.
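Sentence-transformer models such as all-MiniLM-L6-v2 typically turn per-token outputs into one sentence vector by averaging the non-padding tokens (mean pooling) and scaling the result to unit length. In transformers.js this is usually requested through the feature-extraction pipeline with the options pooling: 'mean' and normalize: true; whether this tool uses exactly those settings is an assumption. The sketch below shows the arithmetic with 2-dimensional toy vectors in place of the real 384-dimensional ones; meanPoolAndNormalize and dot are hypothetical names.

```typescript
// Sketch of mean pooling + L2 normalization. Real token vectors would come
// from the model (384 values each); toy 2-D vectors keep this readable.
function meanPoolAndNormalize(tokenVectors: number[][], attentionMask: number[]): number[] {
  if (tokenVectors.length === 0) throw new Error("no token vectors");
  const dim = tokenVectors[0].length;
  const sum: number[] = [];
  for (let d = 0; d < dim; d++) sum.push(0);
  let count = 0;
  for (let t = 0; t < tokenVectors.length; t++) {
    if (attentionMask[t] === 0) continue; // padding tokens do not count
    count++;
    for (let d = 0; d < dim; d++) {
      sum[d] += tokenVectors[t][d];
    }
  }
  if (count === 0) throw new Error("every token was padding");
  const mean = sum.map((v) => v / count);
  // Scale to unit length so that cosine similarity becomes a dot product.
  const norm = Math.sqrt(mean.reduce((acc, v) => acc + v * v, 0));
  return norm === 0 ? mean : mean.map((v) => v / norm);
}

// For unit-length vectors, this dot product equals their cosine similarity.
function dot(a: number[], b: number[]): number {
  return a.reduce((acc, v, i) => acc + v * b[i], 0);
}
```

For example, meanPoolAndNormalize([[3, 0], [0, 4], [100, 100]], [1, 1, 0]) ignores the masked third token, averages the first two to [1.5, 2], and normalizes that to [0.6, 0.8].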

Why does the first search take a moment but the rest are instant?

On the first search the tool downloads the model and embeds your whole corpus once. After that, only your short query needs to be embedded, so re-ranking as you type is nearly instant. Changing the corpus or the unit (line/paragraph/sentence) rebuilds the index.
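The build-once, query-many behavior described here can be sketched as a small cache keyed on the corpus text and the chosen unit. This TypeScript sketch is hypothetical: SearchIndex, Embed and Split are illustrative names, and the embedder is assumed to be synchronous and to return unit-length vectors, whereas a real in-browser model call would be asynchronous.

```typescript
type Unit = "line" | "paragraph" | "sentence";
// Assumed synchronous embedder returning one unit-length vector per text.
type Embed = (texts: string[]) => number[][];
type Split = (text: string, unit: Unit) => string[];

function dot(a: number[], b: number[]): number {
  return a.reduce((acc, v, i) => acc + v * b[i], 0);
}

// Hypothetical index cache: corpus vectors are computed once per
// (corpus, unit) pair and reused until either of them changes.
class SearchIndex {
  private key: string | null = null;
  private units: string[] = [];
  private vectors: number[][] = [];
  private readonly embed: Embed;
  private readonly split: Split;

  constructor(embed: Embed, split: Split) {
    this.embed = embed;
    this.split = split;
  }

  search(corpus: string, unit: Unit, query: string): { text: string; score: number }[] {
    const key = unit + "\u0000" + corpus;
    if (key !== this.key) {
      // Corpus or unit changed: re-split and re-embed everything (slow path).
      this.units = this.split(corpus, unit);
      this.vectors = this.embed(this.units);
      this.key = key;
    }
    // Fast path: only the short query is embedded on each keystroke.
    const q = this.embed([query])[0];
    return this.units
      .map((text, i) => ({ text, score: dot(q, this.vectors[i]) }))
      .sort((a, b) => b.score - a.score);
  }
}
```

Calling search twice with the same corpus and unit embeds the corpus only once; switching the unit (or editing the corpus) triggers a rebuild, matching the behavior described above.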

How big a corpus can it handle?

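It comfortably handles thousands of lines. Indexing time grows with the total amount of text, so very large corpora take longer on the first pass, but querying stays fast because the corpus vectors are computed once and reused: each new query needs only its own short embedding plus a comparison against the stored vectors.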
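A back-of-the-envelope estimate shows why reused corpus vectors keep queries cheap. It assumes, for illustration only, N = 10,000 units, d = 384 dimensions, float32 storage at 4 bytes per value, and a brute-force scan that compares the query with every stored vector:

```latex
\begin{aligned}
\text{memory} &= N \times d \times 4\ \text{bytes} = 10\,000 \times 384 \times 4 = 15\,360\,000\ \text{bytes} \approx 15.4\ \text{MB} \\
\text{work per query} &= N \times d = 10\,000 \times 384 = 3\,840\,000\ \text{multiply-adds}
\end{aligned}
```

Both figures grow linearly with N, while the costly model inference runs once per unit at indexing time and only once per query afterwards.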
