AI Image Tagger
Most auto-taggers only know a fixed list of words. This one tags an image with YOUR OWN labels: type the candidates you care about and CLIP, a vision-language model used here zero-shot, scores how well each one fits the picture, with no training required. Drop in an image, start from a one-click label pack (scene, pet, content safety, photo type) or write your own labels, and get a ranked list with confidence bars. Editing the labels re-scores the SAME image instantly without re-reading it, so you can iterate freely. Everything runs in your browser, so your image is never uploaded. The model downloads once on first use and is then cached for instant reuse.
How to tag an image with your own labels
- Drop an image in, or click to browse.
- Type your candidate labels (comma- or line-separated), or click a preset pack to load one.
- Read the ranked results: each label gets a confidence bar and the top match is highlighted.
- Tweak the labels and re-tag to re-score the same image instantly.
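As a rough illustration of the label-entry step, here is a minimal sketch of how comma- or line-separated input could be turned into a clean list of candidates. The function name parseLabels is hypothetical, and dropping blank entries and case-insensitive duplicates is an assumption, not a description of what this tool does internally.

```typescript
// Hypothetical helper: split free-form label input on commas and newlines.
// Assumption: blank entries and case-insensitive duplicates are dropped.
function parseLabels(input: string): string[] {
  const seen = new Set<string>();
  const labels: string[] = [];
  for (const raw of input.split(/[,\n]/)) {
    const label = raw.trim(); // also strips a trailing "\r" from Windows line endings
    const key = label.toLowerCase();
    if (label.length === 0 || seen.has(key)) continue;
    seen.add(key);
    labels.push(label);
  }
  return labels;
}

// Mixed separators, with a stray empty entry between the two commas.
const candidates = parseLabels("indoor, outdoor\na beach,, a forest");
```

Removing duplicates matters more than it looks: because the scores are relative, the same label entered twice would split its share between the two copies.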
Examples
Zero-shot tagging with custom labels
Labels: "indoor", "outdoor", "a beach", "a forest"
Ranked by confidence, for example: a beach 71%, outdoor 22%, a forest 5%, indoor 2%.
Frequently asked questions
How is this different from a normal image tagger?
A normal tagger only outputs words from a fixed vocabulary it was trained on. This tool is zero-shot: you supply the candidate labels and CLIP scores how well each one matches the image. That means you can tag for anything, from 'safe for work' to 'a watercolor painting' to your own product categories, without training a model.
Is my image uploaded anywhere?
No. The CLIP model runs entirely in your browser with WebAssembly. Your image is processed on your device and never uploaded. Only the model itself is downloaded, once, and it is then cached for instant reuse.
How should I phrase the labels?
Short, natural phrases work best because CLIP was trained on image and caption pairs. 'a cat' or 'a photo of a beach' usually scores more reliably than a bare word like 'cat'. The preset packs are written this way, so use them as a template.
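If you generate labels from a list of bare words, one way to follow this advice is to wrap each word in a short caption-like template before scoring. The sketch below is only an illustration: the helper name toPrompt and the default template "a photo of {}" are assumptions, not something this tool is known to apply for you.

```typescript
// Hypothetical helper: wrap a bare word in a caption-style template.
// Assumption: labels that already contain a space are treated as phrases and left alone.
function toPrompt(label: string, template: string = "a photo of {}"): string {
  const text = label.trim();
  if (/\s/.test(text)) return text;
  // A replacer function is used so characters like "$" in the label are inserted literally.
  return template.replace("{}", () => text);
}

const wrapped = toPrompt("cat");       // bare word, gets the template
const untouched = toPrompt("a beach"); // already a phrase, returned as typed
```

Pass your own template as the second argument if a different wording suits your images better, for example "a painting of {}".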
Why do the scores add up to about 100%?
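CLIP compares the image against all of your labels at once and applies a softmax, so the scores are relative to each other and sum to roughly 100% (rounding accounts for any small difference). Adding or removing a label changes every score. To judge a label on its own, score it in a separate run against a contrasting alternative, for example 'a beach' versus 'something else'; a label scored entirely alone would always come out at 100%.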
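To make the softmax step concrete, here is a minimal sketch. The raw scores are made up for illustration and are not real CLIP outputs; the point is only that adding a third label lowers the share of the first one even though its own raw score has not changed.

```typescript
// Numerically stable softmax: turns raw scores into shares that sum to 1.
function softmax(scores: number[]): number[] {
  if (scores.length === 0) return [];
  const max = Math.max(...scores); // subtracting the max avoids overflow in Math.exp
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Made-up raw scores, in label order.
const twoLabels = softmax([2.0, 1.0]);        // "a beach", "a forest"
const threeLabels = softmax([2.0, 1.0, 2.0]); // the same two, plus "outdoor"

// twoLabels[0] is about 0.73, threeLabels[0] is about 0.42:
// the first label's raw score is identical, but its share shrinks.
```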
Which AI model does this use?
CLIP (clip-vit-base-patch16) from OpenAI, run client-side through transformers.js. It maps images and text into the same embedding space, which is what lets it score arbitrary text labels against any image with no task-specific training.
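A shared space means an image and a label each become a vector, and a label fits better when its vector points in nearly the same direction as the image's. The sketch below shows that comparison using cosine similarity on toy three-number vectors invented for this example (real embeddings are much longer). The names Embedding, cosineSimilarity, and rankLabels are hypothetical, and this is not the tool's actual code.

```typescript
type Embedding = number[];

// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
function cosineSimilarity(a: Embedding, b: Embedding): number {
  if (a.length !== b.length) throw new Error("embedding sizes differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error("zero-length embedding");
  return dot / Math.sqrt(normA * normB);
}

// Rank labels against one image embedding, best match first.
function rankLabels(
  image: Embedding,
  labels: string[],
  textEmbeddings: Embedding[],
): { label: string; similarity: number }[] {
  return labels
    .map((label, i) => ({ label, similarity: cosineSimilarity(image, textEmbeddings[i]) }))
    .sort((x, y) => y.similarity - x.similarity);
}

// Toy vectors, made up for illustration only.
const imageEmbedding: Embedding = [0.9, 0.1, 0.0];
const ranked = rankLabels(imageEmbedding, ["a forest", "a beach"], [
  [0, 1, 0], // "a forest"
  [1, 0, 0], // "a beach"
]);
```

Note that rankLabels takes the image embedding as an input rather than the image itself. Under that assumption, the image only needs to be embedded once, and editing the labels means recomputing just the text side, which is one plausible way to re-score the same image without re-reading it.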
Related tools
AI Photo Tagger
Drop a photo and get an AI caption plus clean keyword tags, in your browser. Copy as a comma list or hashtags for stock sites, SEO, and assets.
Object Detector
Detect objects in any photo with YOLO11, right in your browser. Draw labeled boxes, count what it finds, and download the result. Nothing is uploaded.
Image to Text (OCR)
Extract text from an image with OCR, right in your browser. Supports many languages, copy or download the result. Your image is never uploaded.
Add Border to Image
Add a colored border or frame around an image, with adjustable width and color. Live preview, runs in your browser, nothing uploaded.
AI Alt Text Generator
Generate accessible alt text for any image with AI, right in your browser. Copy clean alt text or a ready HTML img tag. Nothing is uploaded.
AI Background Remover
Remove the background from a photo automatically with AI, right in your browser. Get a transparent PNG or a solid color. Nothing is uploaded.