Boneyard Tools

From CSV columns to valid XML element names

The rules XML places on element names, how this tool sanitizes messy headers, and how text values are escaped to keep the document well-formed.

What XML requires of an element name

XML is strict about tag names in a way CSV headers are not. A name must begin with a letter or an underscore, never a digit. After that it may contain letters, digits, hyphens, dots and underscores, but not spaces or most other punctuation. Names are also case sensitive, so Age and age are different tags. Because a spreadsheet header like 2020 Total or First Name would be illegal as a raw tag, a converter has to clean each header before it can build valid XML.
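The rule above can be approximated with a short check. This is a minimal sketch, not this tool's code: the names NAME_PATTERN and is_valid_name are hypothetical, and the pattern covers ASCII only, whereas the XML specification also permits many non-ASCII letters and the colon.

```python
import re

# ASCII-only approximation of the naming rule (an assumption for
# illustration): first a letter or underscore, then letters, digits,
# dots, hyphens or underscores.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_name(name: str) -> bool:
    """Return True if the whole string fits the simplified name rule."""
    return NAME_PATTERN.fullmatch(name) is not None


for header in ["Age", "age", "_2020", "2020 Total", "First Name"]:
    print(header, is_valid_name(header))
```

Age, age and _2020 pass the check, while 2020 Total and First Name fail because of the leading digit and the space.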

How headers are sanitized here

Each header is first trimmed of surrounding spaces. Any character other than a letter, digit, dot, hyphen or underscore is then replaced with an underscore, so First Name becomes First_Name and A/B becomes A_B. If the cleaned name still starts with a digit, a dot or a hyphen, an underscore is added at the front. A header that is empty after cleaning falls back to a positional name like column3, which guarantees every column ends up with a usable tag.
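The steps above can be sketched as a single function. This is an illustration under stated assumptions rather than the tool's actual source: sanitize_header is a hypothetical name, letters are taken to mean ASCII letters, and the position used for the fallback name is assumed to be 1-based.

```python
import re


def sanitize_header(header: str, position: int) -> str:
    """Turn a CSV header into a usable element name.

    `position` is the 1-based column number (an assumption), used only
    for the fallback name.
    """
    # Step 1: trim surrounding whitespace (strip() removes tabs too).
    name = header.strip()
    # Step 2: replace anything outside letters, digits, dot, hyphen
    # and underscore with an underscore.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Step 3: an empty result falls back to a positional name.
    if not name:
        return f"column{position}"
    # Step 4: a leading digit, dot or hyphen gets an underscore prefix.
    if not re.match(r"[A-Za-z_]", name):
        name = "_" + name
    return name


print(sanitize_header("2020 Total", 1))  # _2020_Total
print(sanitize_header(" First Name ", 2))  # First_Name
print(sanitize_header("", 3))  # column3
```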

Escaping the text inside elements

Even when the tag names are valid, the values between them can break XML. An ampersand starts an entity reference and a less-than sign starts a tag, so both must be escaped, along with greater-than for safety. This tool converts them to &amp;, &lt; and &gt; in every cell before writing the element. Quote characters are not escaped because the values sit in element text rather than inside attribute quotes, where they would otherwise need handling.
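A minimal sketch of that escaping follows; escape_text is a hypothetical name, not taken from this tool. The one detail worth noticing is the order: the ampersand is replaced first, otherwise the ampersands introduced by the other two replacements would themselves be escaped again.

```python
def escape_text(value: str) -> str:
    """Escape the three characters that matter inside element text."""
    return (
        value.replace("&", "&amp;")  # must run first
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


print(escape_text("R&D <b>"))  # R&amp;D &lt;b&gt;
print(escape_text('say "hi"'))  # say "hi"
```

Python's standard library offers xml.sax.saxutils.escape, which by default escapes the same three characters.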

Structure, roots and rows

The document always opens with an XML declaration, then a single root element that wraps everything. Inside it, each data row becomes one row element whose children are the column elements in header order. You can rename the root and row wrappers, for example to catalog and product, to make the output read naturally for your data. The layout stays flat and indented, which keeps it easy to read and simple to load into most XML parsers.
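The layout described above might be assembled roughly like this. It is a sketch under assumptions: rows_to_xml and escape_text are hypothetical names, the default wrapper names root and row and the two-space indent are guesses at reasonable defaults, and the headers are expected to be sanitized already.

```python
def escape_text(value: str) -> str:
    """Escape &, < and > for use in element text (ampersand first)."""
    return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def rows_to_xml(headers, rows, root="root", row="row"):
    """Build a flat, indented XML document from sanitized headers and rows."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f"<{root}>"]
    for record in rows:
        lines.append(f"  <{row}>")
        # Children follow header order; one element per column.
        for tag, value in zip(headers, record):
            lines.append(f"    <{tag}>{escape_text(value)}</{tag}>")
        lines.append(f"  </{row}>")
    lines.append(f"</{root}>")
    return "\n".join(lines)


print(rows_to_xml(["name", "price"], [["Tea & Co", "3.50"]],
                  root="catalog", row="product"))
```

With the catalog and product wrappers, the single sample row prints as a product element containing name and price children, with the ampersand written as &amp;.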

Frequently asked questions

Why did my header get an underscore in front of it?

XML names cannot start with a digit, a dot or a hyphen, so a header like 2020 is prefixed to become _2020. This keeps the document well-formed while staying close to your original label.

Can two columns end up with the same tag name?

Yes, if two headers sanitize to the same string, such as A/B and A B both becoming A_B. The XML is still well-formed, but you may want to rename the source headers to keep the columns distinct.
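If you want to spot such collisions before converting, a small check along these lines can help. It is only a sketch: find_collisions is a hypothetical helper, and it assumes you already have the sanitized name for each original header.

```python
from collections import defaultdict


def find_collisions(originals, sanitized):
    """Map each repeated tag name to the original headers that produced it."""
    groups = defaultdict(list)
    for original, tag in zip(originals, sanitized):
        groups[tag].append(original)
    # Keep only tag names shared by more than one header.
    return {tag: names for tag, names in groups.items() if len(names) > 1}


print(find_collisions(["A/B", "A B", "Price"], ["A_B", "A_B", "Price"]))
# {'A_B': ['A/B', 'A B']}
```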

Will the output validate against a schema?

The output is well-formed XML, meaning correctly nested and escaped. It is not tied to any DTD or XSD, so validating against a specific schema is up to you if your workflow needs one.