Question 1

How is this different from a normal image tagger?

Accepted Answer

A normal tagger only outputs words from a fixed vocabulary it was trained on. This tool is zero-shot: you supply the candidate labels and CLIP scores how well each one matches the image. That means you can tag for anything, from 'safe for work' to 'a watercolor painting' to your own product categories, without training a model.

Question 2

Is my image uploaded anywhere?

Accepted Answer

No. The CLIP model runs entirely in your browser with WebAssembly. Your image is processed on your device and never uploaded. The only download is the model itself, which is fetched once and then cached so later runs start without downloading it again.

Question 3

How should I phrase the labels?

Accepted Answer

Short, natural phrases work best because CLIP was trained on image and caption pairs. 'a cat' or 'a photo of a beach' usually scores more reliably than a bare word like 'cat'. The preset packs are written this way, so use them as a template.
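
As a rough illustration of the phrasing advice above, here is a minimal TypeScript sketch that turns bare words into caption-style labels. The helper name `toCaption` and the "a photo of" template are assumptions made for this example; they are not taken from the tool or its preset packs.

```typescript
// Hypothetical helper: wrap a bare word in a caption-style phrase.
// Labels that already contain a space are assumed to be phrases and kept as-is.
function toCaption(label: string): string {
  const trimmed = label.trim();
  if (trimmed === "" || trimmed.includes(" ")) {
    return trimmed;
  }
  // Simple article choice based on the first letter; good enough for a sketch.
  const article = /^[aeiou]/i.test(trimmed) ? "an" : "a";
  return `a photo of ${article} ${trimmed}`;
}

const captionLabels = ["cat", "owl", "a watercolor painting"].map(toCaption);
console.log(captionLabels);
// prints [ 'a photo of a cat', 'a photo of an owl', 'a watercolor painting' ]
```

Whether "a photo of" is the best prefix depends on the images; for artwork or screenshots a different template may fit better.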

Question 4

Why do the scores add up to about 100%?

Accepted Answer

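CLIP compares the image against all of your labels at once and applies a softmax, so the scores are relative and sum to 100% (rounding can leave the displayed total slightly off). Adding or removing a label changes every score. To judge a label on its own, score it in a separate run against a contrasting label such as its opposite, because a softmax over a single label always returns 100%.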
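
The effect is easy to see with a small softmax sketch in TypeScript. The raw scores below are invented for illustration and are not real CLIP outputs; the label names in the comments are likewise only examples.

```typescript
// Softmax: turn raw similarity scores into shares that sum to 1.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((x) => Math.exp(x - max));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map((e) => e / total);
}

// Made-up raw scores for the same image.
const twoLabelScores = softmax([24.0, 21.0]); // e.g. 'a cat', 'a dog'
const threeLabelScores = softmax([24.0, 21.0, 23.5]); // add e.g. 'a kitten'

const asPercent = (p: number): string => (p * 100).toFixed(1) + "%";
console.log(twoLabelScores.map(asPercent));
console.log(threeLabelScores.map(asPercent));
```

The first label's raw score is identical in both runs, yet its percentage drops once a third, similar label competes for the same 100%.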

Question 5

Which AI model does this use?

Accepted Answer

CLIP (clip-vit-base-patch16) from OpenAI, run client-side through transformers.js. It maps images and text into the same embedding space, so a label matches well when its text vector lies close to the image vector. That is what lets it score arbitrary text labels against any image with no task-specific training.
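
To make the shared-space idea concrete, here is a minimal TypeScript sketch that ranks labels by cosine similarity to an image vector. The three-dimensional vectors are toy values invented for this example (real CLIP embeddings have hundreds of dimensions), and the sketch does not call transformers.js or reproduce the tool's actual code.

```typescript
// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy embeddings (assumed values, not model output).
const imageEmbedding = [0.9, 0.1, 0.2];
const labelEmbeddings = [
  { label: "a photo of a cat", vector: [0.8, 0.2, 0.1] },
  { label: "a photo of a beach", vector: [0.1, 0.9, 0.3] },
];

// Score every label against the image and sort best match first.
const rankedLabels = labelEmbeddings
  .map(({ label, vector }) => ({
    label,
    score: cosineSimilarity(imageEmbedding, vector),
  }))
  .sort((x, y) => y.score - x.score);

console.log(rankedLabels[0].label); // prints "a photo of a cat"
```

Because both kinds of input end up as vectors of the same length, any new label can be scored the same way without retraining.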

AI Image Tagger

