Boneyard Tools

Phone number formats and why extraction is tricky

How US, E.164 and international phone formats differ, why a 7 to 15 digit rule catches most of them, and how to clean extracted numbers.

Why phone numbers are hard to pattern match

A phone number has no single universal shape. The same US number can appear as (212) 555-0143, 212-555-0143, 212.555.0143 or 2125550143, and an international number adds a country code and grouping that varies by nation. Punctuation such as brackets, dots and spaces is decorative and differs between writers. Because of this variety, an extractor cannot lean on one rigid template; it has to accept a flexible run of digits and separators and then decide afterwards whether the run is plausibly a phone number.
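The "flexible run of digits and separators" idea can be sketched with a single regular expression. This is a minimal illustration, not the tool's actual pattern: the names CANDIDATE and find_candidates are hypothetical, and the set of accepted separators (space, brackets, dot, hyphen) is an assumption.

```python
import re

# Hypothetical pattern (an assumption, not the tool's own):
# optional "+", optional opening bracket, a digit, then any mix of
# digits and common separators, ending on a digit.
CANDIDATE = re.compile(r"\+?\(?\d[\d ().-]*\d")


def find_candidates(text):
    """Return every loose digit-and-separator run found in text."""
    return [m.group(0) for m in CANDIDATE.finditer(text)]


sample = "Call (212) 555-0143, 212.555.0143 or +44 20 7946 0958 in 2024."
print(find_candidates(sample))
```

The pattern deliberately over-matches: the year 2024 comes back as a candidate too, which is why a second step has to decide whether each run is plausible. Two numbers separated only by a space would also merge into one run, so a real extractor would likely need stricter grouping rules.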

The 7 to 15 digit rule

This tool keeps only candidates whose digit count lands between 7 and 15. The lower bound of 7 reflects the shortest local numbers still in use, while 15 is the maximum length defined by the international E.164 standard, which counts the country code plus the national number. That window is wide enough to catch national and international numbers yet narrow enough to reject four-digit years, six-digit dates and the very long strings used for order IDs or account references. It is a heuristic, so it errs in both directions: a real number outside the window, such as a short code with fewer than 7 digits, is dropped, and a non-phone string inside it, such as an eight-digit date, still passes.
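The digit-count rule is simple enough to show in a few lines. The sketch below assumes only ASCII digits are counted and that separators are ignored; the function names digit_count and in_phone_window are illustrative, not taken from the tool.

```python
def digit_count(candidate):
    """Count ASCII digits, ignoring separators and a leading plus."""
    return sum(1 for ch in candidate if "0" <= ch <= "9")


def in_phone_window(candidate, low=7, high=15):
    """True when the candidate has between low and high digits inclusive."""
    return low <= digit_count(candidate) <= high


for text in ["(212) 555-0143", "2024", "311224", "20240131"]:
    print(text, in_phone_window(text))
```

The last entry, an eight-digit date, passes the check, which illustrates why the rule is only a first filter.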

What E.164 looks like

E.164 is the format telecom systems prefer: a plus sign, a country code of one to three digits, and the national number, with no spaces or punctuation, up to 15 digits in total. A UK number written locally as 020 7946 0958 becomes +442079460958 in E.164: the leading 0, which is a domestic trunk prefix, is dropped and the country code 44 takes its place. When you feed data into a CRM, a messaging API or a spreadsheet, normalising to this form makes numbers comparable and strips formatting noise. This extractor keeps the formatting found in your text, so converting to E.164 is a separate clean-up step you apply once the candidates are pulled out.
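A quick way to test whether a string already has the E.164 shape is a pattern check. The sketch below uses a commonly seen expression: a plus sign, a first digit from 1 to 9, then up to 14 more digits. It checks shape only and assumes nothing about whether the country code is actually assigned; the name looks_like_e164 is hypothetical.

```python
import re

# Shape check only: "+", a non-zero first digit, 2 to 15 digits in total.
E164_SHAPE = re.compile(r"\+[1-9]\d{1,14}")


def looks_like_e164(value):
    """True when value is a plus sign followed by 2 to 15 digits, no separators."""
    return E164_SHAPE.fullmatch(value) is not None


print(looks_like_e164("+442079460958"))     # already normalised
print(looks_like_e164("020 7946 0958"))     # local spelling, no plus sign
```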

Cleaning the results after extraction

The list this tool produces is a starting point, not a verified directory. Before using the numbers, scan for entries that are actually reference codes or timestamps that happened to fall in the 7 to 15 digit range. If you need a single consistent format, strip the punctuation, drop any domestic trunk prefix (such as the leading 0 on a UK number) and prepend the correct country code to build an E.164 string. For dialing or messaging at scale, validate the cleaned numbers against a phone number library that knows real area codes and line types.

Frequently asked questions

Does the extractor understand country codes?

It recognises a leading plus sign and includes the following digits in the match, but it does not look up which country a code belongs to. The digits are captured verbatim and left for you to interpret.

How do I convert the results to a single format?

Remove every separator to get the raw digits, drop any domestic trunk prefix such as the leading 0 on a UK number, then add the country code you need with a plus sign to build an E.164 number. A spreadsheet find and replace or a short script handles this in one pass.
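A short script for this conversion might look like the sketch below. It is an illustration under stated assumptions: the function to_e164 and its parameters are hypothetical, the caller must know which country each number belongs to, and the trunk prefix defaults to "0", which suits UK numbers but not every country (Italian numbers, for example, keep their leading 0 internationally).

```python
def to_e164(raw, country_code, trunk_prefix="0"):
    """Build an E.164-style string from an extracted candidate.

    raw          -- the candidate as extracted, separators included
    country_code -- digits only, e.g. "44" (supplied by the caller)
    trunk_prefix -- domestic prefix to drop; pass "" to drop nothing
    """
    digits = "".join(ch for ch in raw if "0" <= ch <= "9")
    if raw.strip().startswith("+"):
        # Already international: keep the digits as written.
        return "+" + digits
    if trunk_prefix and digits.startswith(trunk_prefix):
        digits = digits[len(trunk_prefix):]
    return "+" + country_code + digits


print(to_e164("020 7946 0958", "44"))
print(to_e164("(212) 555-0143", "1", trunk_prefix=""))
```

The result is only as good as the country code passed in, so it is worth running the output through a dedicated phone number library before relying on it.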