Boneyard Tools

AI Image Tagger

Most auto-taggers only know a fixed list of words. This one tags an image with your own labels: type the candidates you care about, and CLIP, a zero-shot vision model, scores how well each one fits the picture. No training is required. Drop in an image, start from a one-click label pack (scene, pet, content safety, photo type) or write your own, and get a ranked list with confidence bars. Editing the labels re-scores the same image instantly without re-reading it, so you can iterate freely. Everything runs in your browser, so nothing is uploaded. The model downloads once on first use and is then cached for reuse.

How to tag an image with your own labels

  1. Drop an image in, or click to browse.
  2. Type your candidate labels (comma or line separated), or click a preset pack to load one.
  3. Read the ranked results: each label gets a confidence bar and the top match is highlighted.
  4. Tweak the labels and re-tag to re-score the same image instantly.
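
Step 2 accepts labels separated by commas or line breaks. A minimal sketch of how such input could be normalized is shown here. The function name `parseLabels`, the dropping of blank entries, and the case-insensitive de-duplication are illustrative assumptions, not the tool's actual code.

```typescript
// Hypothetical helper: turn free-form input into a clean list of candidate labels.
// Assumptions: blank entries are dropped, duplicates are ignored case-insensitively.
function parseLabels(input: string): string[] {
  const seen = new Set<string>();
  const labels: string[] = [];
  // Split on commas and newlines; trim() also removes a stray "\r" from Windows line endings.
  for (const raw of input.split(/[,\n]/)) {
    const label = raw.trim();
    if (label.length === 0) continue;
    const key = label.toLowerCase();
    if (seen.has(key)) continue;
    seen.add(key);
    labels.push(label);
  }
  return labels;
}

console.log(parseLabels("indoor, outdoor\na beach,, a forest"));
```

The first spelling of each label is kept, so the order you type them in is preserved.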

Examples

Zero-shot tagging with custom labels

Labels: "indoor", "outdoor", "a beach", "a forest"
Ranked by confidence, for example: a beach 71%, outdoor 22%, a forest 5%, indoor 2%.

Frequently asked questions

How is this different from a normal image tagger?

A normal tagger only outputs words from a fixed vocabulary it was trained on. This tool is zero-shot: you supply the candidate labels and CLIP scores how well each one matches the image. That means you can tag for anything, from 'safe for work' to 'a watercolor painting' to your own product categories, without training a model.

Is my image uploaded anywhere?

No. The CLIP model runs entirely in your browser with WebAssembly. Your image is processed on your device and never uploaded. Only the model itself is downloaded, once, and it is then cached for reuse.

How should I phrase the labels?

Short, natural phrases work best because CLIP was trained on image and caption pairs. 'a cat' or 'a photo of a beach' usually scores more reliably than a bare word like 'cat'. The preset packs are written this way, so use them as a template.
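
If you keep a list of bare words, one way to turn them into caption-style phrases is a small template helper. This is a sketch only: the name `toCaptions`, the "{}" placeholder, and the default template "a photo of {}" are assumptions chosen for illustration, and the tool's preset packs may be worded differently.

```typescript
// Hypothetical helper: wrap each label in a caption-style template.
// "{}" marks where the label goes; the default template is an assumption.
function toCaptions(labels: string[], template: string = "a photo of {}"): string[] {
  const prefix = (template.split("{}")[0] ?? "").toLowerCase();
  return labels.map((label) => {
    const text = label.trim();
    // Leave the label alone if it already starts like the template.
    if (prefix.length > 0 && text.toLowerCase().startsWith(prefix)) return text;
    // A replacer function avoids special handling of "$" sequences in the label.
    return template.replace("{}", () => text);
  });
}

console.log(toCaptions(["a beach", "a photo of a cat"]));
```

Labels that include their own article ("a beach", "an owl") read most naturally with this template.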

Why do the scores add up to about 100%?

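CLIP compares the image against all of your labels at once, and the similarity scores are passed through a softmax. The results are therefore relative shares that sum to 100% (apart from rounding in the display), and adding or removing a label changes every score. A label scored entirely on its own would always get 100%, so to judge one label in isolation, run it against a single contrasting alternative.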
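
The relative nature of these scores can be sketched with a plain softmax. The raw scores below are made up for illustration and are not real CLIP outputs, and the function is a generic implementation rather than the tool's own code.

```typescript
// Numerically stable softmax: turns raw similarity scores into shares that sum to 1.
function softmax(scores: number[]): number[] {
  if (scores.length === 0) return [];
  const max = Math.max(...scores);
  // Subtracting the maximum avoids overflow without changing the result.
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map((e) => e / total);
}

// Made-up raw scores for illustration only.
const threeLabels = softmax([2.0, 1.0, 0.5]);
// The same three labels plus one strong competitor: every share shrinks.
const fourLabels = softmax([2.0, 1.0, 0.5, 2.0]);
// A single label always receives the full share.
const oneLabel = softmax([2.0]);

console.log(threeLabels, fourLabels, oneLabel);
```

Comparing the first entry of `threeLabels` and `fourLabels` shows how a new label lowers the share of labels that were already there, even though the image has not changed.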

Which AI model does this use?

CLIP (clip-vit-base-patch16) from OpenAI, run client-side through transformers.js. It maps images and text into the same embedding space, which is what lets it score arbitrary text labels against any image with no task-specific training.
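
To illustrate what scoring in a shared space means, here is a minimal sketch that ranks labels by cosine similarity between one image vector and several text vectors. The toy three-dimensional vectors stand in for real embeddings, and the names `cosineSimilarity` and `rankLabels` are hypothetical; the sketch does not call transformers.js or reproduce the tool's code.

```typescript
type LabelEmbedding = { label: string; embedding: number[] };
type Scored = { label: string; similarity: number };

// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Rank every label against one image embedding, best match first.
function rankLabels(imageEmbedding: number[], labels: LabelEmbedding[]): Scored[] {
  return labels
    .map((l) => ({ label: l.label, similarity: cosineSimilarity(imageEmbedding, l.embedding) }))
    .sort((p, q) => q.similarity - p.similarity);
}

// Toy vectors for illustration only; real embeddings have far more dimensions.
const imageVector = [0.9, 0.1, 0.0];
const ranked = rankLabels(imageVector, [
  { label: "indoor", embedding: [0, 0, 1] },
  { label: "a forest", embedding: [0, 1, 0] },
  { label: "a beach", embedding: [1, 0, 0] },
]);

console.log(ranked.map((r) => r.label));
```

Only the text vectors change when labels are edited, so an image vector computed once can be reused for every new set of labels.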
