Boneyard Tools

Greedy versus lazy quantifiers in regex

Why regex quantifiers grab as much as they can by default, how the lazy ? modifier reins them in, and when each behaviour is what you want.

What a quantifier does

A quantifier controls how many times the token before it may repeat. The core three on the cheatsheet are * for zero or more, + for one or more, and ? for zero or one. There are also counted forms: {n} for exactly n, {n,} for at least n, and {n,m} for a range. So \d{4} matches exactly four digits like 2026, while ab*c matches ac, abc or abbbc. Getting the count right is half of writing a pattern; the other half is deciding how much text it should swallow.
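As a quick illustration of those forms, here is a minimal sketch using Python's re module. The article does not name a particular engine, so the choice of Python and the sample strings are assumptions made for the example.

```python
import re

# {n}: exactly four digits, nothing more and nothing less.
four_digits = re.fullmatch(r"\d{4}", "2026")

# *: zero or more b characters between a and c.
star_hits = [s for s in ["ac", "abc", "abbbc", "abd"] if re.fullmatch(r"ab*c", s)]

# +: at least one b is required, so "ac" drops out.
plus_hits = [s for s in ["ac", "abc", "abbbc"] if re.fullmatch(r"ab+c", s)]

# ?: at most one b is allowed, so "abbbc" drops out.
optional_hits = [s for s in ["ac", "abc", "abbbc"] if re.fullmatch(r"ab?c", s)]

# {n,m}: between two and three b characters.
range_hits = [s for s in ["abc", "abbc", "abbbc", "abbbbc"] if re.fullmatch(r"ab{2,3}c", s)]

print(star_hits)      # ['ac', 'abc', 'abbbc']
print(plus_hits)      # ['abc', 'abbbc']
print(optional_hits)  # ['ac', 'abc']
print(range_hits)     # ['abbc', 'abbbc']
```

re.fullmatch is used so that each test string has to match from start to finish, which makes the repeat counts easy to see.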

Greedy is the default

By default every quantifier is greedy, the counted forms and ? included, not just * and +. A greedy quantifier consumes as many repetitions as it can while still letting the overall pattern succeed. This is usually what you want, but it can overshoot. The classic trap is matching HTML-like tags with <.*>. Against the string <a><b>, the greedy .* runs to the very end of the string, then backtracks one character at a time until the > in the pattern can match. Because the string already ends in >, a single step back is enough, so the match is the whole <a><b> instead of the first tag. The engine did nothing wrong; greedy simply means each quantifier tries the most repetitions first and gives characters back only when forced.
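The overshoot is easy to reproduce. The sketch below assumes Python's re module (the article is engine-agnostic) and uses the same <a><b> string from the paragraph above.

```python
import re

text = "<a><b>"

# Greedy .* runs to the end of the string, then gives back just enough
# for the final > in the pattern to match.
greedy = re.search(r"<.*>", text)

print(greedy.group())  # <a><b>
print(greedy.span())   # (0, 6)
```

The span covers the entire six-character string, which confirms that both tags were swallowed as one match.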

Adding ? makes a quantifier lazy

Putting a ? after a quantifier flips it to lazy, so it matches as few characters as possible and expands only when forced. The cheatsheet lists *? for lazy zero or more and +? for lazy one or more. Rewriting the earlier pattern as <.*?> makes the .*? stop at the first > it can find, so it matches just <a> in <a><b>. Likewise the pattern ".+?" (quote characters included) stops at the first closing quote rather than the last, which is what you want when scanning several quoted strings on one line.
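A small sketch of both lazy patterns, again assuming Python's re module. The sample line with two quoted strings is made up for the example.

```python
import re

text = "<a><b>"

# Lazy .*? stops at the first > it reaches.
first_tag = re.search(r"<.*?>", text).group()
all_tags = re.findall(r"<.*?>", text)

# Hypothetical input: two quoted strings on one line.
line = 'say "hello" and "goodbye"'
lazy_quotes = re.findall(r'".+?"', line)
greedy_quotes = re.findall(r'".+"', line)

print(first_tag)      # <a>
print(all_tags)       # ['<a>', '<b>']
print(lazy_quotes)    # ['"hello"', '"goodbye"']
print(greedy_quotes)  # ['"hello" and "goodbye"']
```

The greedy version of the quote pattern returns a single match that runs from the first quote to the last, which is the behaviour the lazy modifier is there to prevent.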

Choosing between them

Reach for greedy quantifiers when you genuinely want the longest run, such as capturing everything up to the last separator on a line. Reach for lazy ones when you want the shortest match up to the next delimiter, which is the common case for tags, quotes and brackets. A cleaner alternative to both is often a negated character class, for example <[^>]*> to match a tag. Because [^>] can never consume the closing >, the engine neither overshoots and backs up nor creeps forward one character at a time to find the end of the tag. When performance matters on long inputs, that approach avoids most of the repeated backtracking that a naive greedy or lazy pattern can trigger.
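The three choices can be compared side by side. This is a sketch under assumptions: Python's re module, and a made-up slash-separated path standing in for "a line with separators".

```python
import re

# Hypothetical input with / as the separator.
path = "usr/local/bin/python"

# Greedy: (.*) keeps everything up to the LAST separator.
up_to_last = re.match(r"(.*)/", path).group(1)

# Lazy: (.*?) keeps everything up to the FIRST separator.
up_to_first = re.match(r"(.*?)/", path).group(1)

# Negated class: [^>]* cannot run past a >, so each tag ends at its own >.
tags = re.findall(r"<[^>]*>", "<a><b>")

print(up_to_last)   # usr/local/bin
print(up_to_first)  # usr
print(tags)         # ['<a>', '<b>']
```

The negated class gives the same tags as the lazy pattern here, but it states the intent directly: a tag body is any run of characters that are not >.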

Frequently asked questions

Does lazy matching change what can match, or just how much?

It changes how much. For ordinary patterns, the greedy and lazy versions either both find a match in a given input or both fail; what differs is which substring is reported. Greedy tries the most repetitions first and lazy tries the fewest, which is why they capture different substrings.
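One way to check this is to run both versions over a handful of inputs and compare. The sketch assumes Python's re module, and the sample strings are invented for the example.

```python
import re

# Hypothetical inputs: two tags, one tag, no tag, and an unclosed tag.
samples = ["<a><b>", "<a>", "no tags here", "<unclosed"]

results = []
for s in samples:
    greedy = re.search(r"<.*>", s)
    lazy = re.search(r"<.*?>", s)
    # Record the matched text, or None when the pattern fails.
    results.append((
        greedy.group() if greedy else None,
        lazy.group() if lazy else None,
    ))

for s, pair in zip(samples, results):
    print(repr(s), pair)
# '<a><b>' ('<a><b>', '<a>')
# '<a>' ('<a>', '<a>')
# 'no tags here' (None, None)
# '<unclosed' (None, None)
```

In every row the two versions agree on whether a match exists; only the first row shows them disagreeing on its extent.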

Is a lazy quantifier slower than a greedy one?

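Not inherently. A greedy quantifier overshoots and then gives characters back, while a lazy one starts small and takes characters one at a time, so the cost of each depends on where the delimiter sits. For predictable speed on long text, a negated character class like [^>]* often beats both because it largely avoids backtracking.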