Boneyard Tools

AI Toxicity Checker

Screen a comment or message for harmful language before you publish it. Paste any text and the toxic-bert model scores it in six categories (toxic, severe toxic, obscene, threat, insult, and identity hate), then gives a single verdict: clean, warning, or flagged. Use it as a moderation and safety aid for community comments, support tickets, or user submissions. The model runs entirely in your browser, so your text is never uploaded. The model itself is downloaded once on first use and cached after that.

How to check text for toxicity

  1. Paste the comment, message, or user submission you want to moderate into the box.
  2. Click Check toxicity. The first check takes a moment longer while the model downloads and loads.
  3. Read the overall verdict, then the per-category scores to see what triggered it.

Examples

A friendly comment

Thanks so much for your help today, I really appreciate it. Have a great weekend!
Verdict: Clean. Every category scores well under the 40 percent warning threshold.

Frequently asked questions

Is my text uploaded anywhere?

No. The toxic-bert model runs entirely in your browser through WebAssembly, so your text is checked on your device and never sent to a server. The only network transfer is the model download, which happens once and is then cached.
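The "download once, then reuse" behaviour can be illustrated with a small in-memory sketch. This is not the tool's actual code: `loadOnce` and the placeholder loader are hypothetical names, and the sketch only covers reuse within a single page session. Keeping the model files across visits would rely on browser storage, which is not shown here.

```typescript
// Hypothetical helper: wraps an expensive async loader so it runs at most once.
// Later calls get the same promise back instead of starting a new download.
function loadOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => {
    if (cached === null) {
      cached = load().catch((err: unknown) => {
        cached = null; // forget a failed attempt so the next call can retry
        throw err;
      });
    }
    return cached;
  };
}

// Stand-in for the real model download (an assumption for illustration).
// It only counts how often it is invoked.
let downloads = 0;
const getModel = loadOnce(() => {
  downloads += 1;
  return Promise.resolve({ label: "placeholder model object" });
});

const first = getModel();
const second = getModel();
console.log(first === second, downloads); // prints: true 1
```

Caching the promise rather than the resolved value means two checks started in quick succession still share one download.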

Which AI model does this use?

It uses toxic-bert, a BERT model fine-tuned on the Jigsaw toxic-comment data. It scores text across several abuse categories at once and runs locally in your browser through transformers.js and ONNX, with no API calls.

What do the categories mean?

Each category is a separate probability from 0 to 100 percent: toxic (rude or disrespectful), severe toxic (very hateful or aggressive), obscene, threat, insult, and identity hate (attacks on a group). Because they are scored independently, the numbers do not add up to 100 percent.
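To see why independent scores need not add up to 100 percent, here is a minimal sketch. It assumes the model emits one raw logit per category and that each logit goes through its own sigmoid, which is the usual multi-label setup. The function names and the logit values are made up for illustration.

```typescript
// A number per category name. The values used below are illustrative, not real model output.
type PerCategory = { [category: string]: number };

// Logistic sigmoid: maps any real-valued logit to a probability between 0 and 1.
function sigmoid(logit: number): number {
  return 1 / (1 + Math.exp(-logit));
}

// Score each category on its own and express it as a percentage with one decimal place.
// No normalisation across categories takes place, unlike a softmax.
function toPercentages(logits: PerCategory): PerCategory {
  const result: PerCategory = {};
  for (const category of Object.keys(logits)) {
    result[category] = Math.round(sigmoid(logits[category]) * 1000) / 10;
  }
  return result;
}

const percentages = toPercentages({ toxic: 2, insult: 1, threat: -4 });
console.log(percentages);
```

With these inputs, toxic and insult both come out high and together exceed 100 percent, while threat stays near zero. A text can be insulting and toxic at the same time, so the categories are not forced to compete for a shared 100 percent.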

How are clean, warning, and flagged decided?

The verdict is driven by the single highest category score: 70 percent or higher is flagged, 40 percent up to just under 70 percent is a warning, and anything below 40 percent is clean. The maximum is used rather than an average because moderation cares about the worst dimension, and averaging would let one severe score be diluted by several low ones.
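The rule can be sketched in a few lines. This is an illustration, not the tool's source. It assumes scores are stored as fractions from 0 to 1, so 70 percent is 0.7. The type names, function names, and category keys are hypothetical.

```typescript
type Verdict = "clean" | "warning" | "flagged";
type CategoryScores = { [category: string]: number };

// Thresholds from the text, expressed as fractions.
const FLAG_THRESHOLD = 0.7;
const WARN_THRESHOLD = 0.4;

// The verdict depends only on the highest category score.
function verdictFor(scores: CategoryScores): Verdict {
  const values = Object.keys(scores).map((category) => scores[category]);
  const worst = Math.max(0, ...values); // an empty score set counts as 0
  if (worst >= FLAG_THRESHOLD) return "flagged";
  if (worst >= WARN_THRESHOLD) return "warning";
  return "clean";
}

// Shown only for contrast with the maximum. The verdict does not use it.
function averageOf(scores: CategoryScores): number {
  const values = Object.keys(scores).map((category) => scores[category]);
  if (values.length === 0) return 0;
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// One severe category among five low ones (made-up values).
const example: CategoryScores = {
  toxic: 0.03,
  severe_toxic: 0.03,
  obscene: 0.03,
  threat: 0.82,
  insult: 0.03,
  identity_hate: 0.03,
};

console.log(verdictFor(example)); // prints: flagged
console.log(averageOf(example) < WARN_THRESHOLD); // prints: true
```

In the example the average falls below the warning threshold even though the text contains a clear threat, which is the case the maximum rule is meant to catch.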

Can I rely on it to moderate automatically?

Treat it as a moderation aid, not a final verdict. Like any model, it can miss context such as sarcasm, reclaimed words, or quoted speech, and it can also flag harmless text. Keep a human in the loop for decisions that matter.
