Look-ahead and look-behind
Sometimes we need to find only those matches for a pattern that are followed or preceded by another pattern.
There’s a special syntax for that, called “look-ahead” and “look-behind”, together referred to as “look-around”.
To start, let’s find the price in a string like 1 turkey costs 30€. That is: a number followed by the € sign.
Look-ahead
The syntax is: X(?=Y), it means “look for X, but match only if followed by Y”. There may be any pattern instead of X and Y.
For an integer number followed by €, the regexp will be \d+(?=€):
let str = "1 turkey costs 30€";

alert( str.match(/\d+(?=€)/) ); // 30, the number 1 is ignored, as it's not followed by €
Please note: the look-ahead is merely a test; the contents of the parentheses (?=...) are not included in the result 30.
When we look for X(?=Y), the regular expression engine finds X and then checks if there’s Y immediately after it. If it’s not so, then the potential match is skipped, and the search continues.
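Because the look-ahead is only a test, it consumes no characters. The following is a minimal sketch of that behavior, reusing the string from the example above; the variable names are illustrative and console.log is used instead of alert so the snippet also runs outside a browser.

```javascript
let str = "1 turkey costs 30€";

// The look-ahead only tests for "€": the sign is not part of the match
let match = str.match(/\d+(?=€)/);
console.log(match[0]);    // 30
console.log(match.index); // 15, the position where the match starts

// Without look-ahead, "€" is consumed and becomes part of the match
let consumed = str.match(/\d+€/)[0];
console.log(consumed); // 30€

// In a replacement, only the digits are replaced and "€" stays in place
let replaced = str.replace(/\d+(?=€)/, "45");
console.log(replaced); // 1 turkey costs 45€
```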
More complex tests are possible, e.g., X(?=Y)(?=Z) means:
- Find X.
- Check if Y is immediately after X (skip if it isn’t).
- Check if Z is also immediately after X (skip if it isn’t).
- If both tests passed, then X is a match, otherwise continue searching.
In other words, such a pattern means that we’re looking for X followed by Y and Z at the same time.
That’s only possible if patterns Y and Z aren’t mutually exclusive.
For example, \d+(?=\s)(?=.*30) looks for \d+ that is followed by a space (?=\s), and there’s 30 somewhere after it (?=.*30):
let str = "1 turkey costs 30€";

alert( str.match(/\d+(?=\s)(?=.*30)/) ); // 1
In our string that exactly matches the number 1.
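Both look-aheads are checked at the same position, right after X, which is why their order does not matter and why mutually exclusive tests can never pass. A small sketch to illustrate this, with variable names chosen only for the example:

```javascript
let str = "1 turkey costs 30€";

// Both tests start at the same position, right after the digits
let both = str.match(/\d+(?=\s)(?=.*30)/);
console.log(both[0]); // 1

// Swapping the look-aheads does not change the result
let swapped = str.match(/\d+(?=.*30)(?=\s)/);
console.log(swapped[0]); // 1

// Mutually exclusive tests: the next character cannot be
// both a space and "€", so nothing matches
let exclusive = str.match(/\d+(?=\s)(?=€)/);
console.log(exclusive); // null
```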
Negative look-ahead
Let’s say that we want the quantity instead of the price. That’s a number \d+, NOT followed by €.
For that, a negative look-ahead can be applied.
The syntax is: X(?!Y), it means “search for X, but only if not followed by Y”.
let str = "2 turkeys cost 60€";

alert( str.match(/\d+\b(?!€)/g) ); // 2 (the price is not matched)
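The \b in that pattern matters. A short sketch of what happens without it, assuming the same string (variable names are illustrative): the engine backtracks inside the price and accepts a shorter run of digits.

```javascript
let str = "2 turkeys cost 60€";

// Without \b the engine backtracks: "60" fails the test,
// but "6" is followed by "0", not "€", so "6" is accepted
let withoutBoundary = str.match(/\d+(?!€)/g);
console.log(withoutBoundary); // ["2", "6"]

// With \b the digits must end at a word boundary,
// so a partial number like "6" can no longer match
let withBoundary = str.match(/\d+\b(?!€)/g);
console.log(withBoundary); // ["2"]
```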
Look-behind
Look-behind browser compatibility
Please note: look-behind is not supported in older browsers, such as Internet Explorer and Safari before version 16.4.
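Where support is uncertain, it can be detected at runtime. The sketch below assumes that an engine without look-behind throws a SyntaxError when it compiles such a pattern; supportsLookbehind is a hypothetical helper name, not a built-in function.

```javascript
// Hypothetical helper, not part of any library
function supportsLookbehind() {
  try {
    // The pattern is built from a string on purpose: a regexp literal with
    // unsupported syntax would make the whole script fail to parse
    new RegExp("(?<=a)b");
    return true;
  } catch (err) {
    return false; // the engine rejected the pattern
  }
}

console.log(supportsLookbehind());
```

If the helper returns false, the code can fall back to matching without look-behind and checking the preceding character manually.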
Look-ahead allows adding a condition for “what follows”.
Look-behind is similar, but it looks behind. That is, it allows matching a pattern only if there’s something before it.
The syntax is:
- Positive look-behind: (?<=Y)X, matches X, but only if there’s Y before it.
- Negative look-behind: (?<!Y)X, matches X, but only if there’s no Y before it.
For example, let’s change the price to US dollars. The dollar sign is usually before the number, so to look for $30 we’ll use (?<=\$)\d+ – an amount preceded by $:
let str = "1 turkey costs $30";

// the dollar sign is escaped \$
alert( str.match(/(?<=\$)\d+/) ); // 30 (skipped the sole number)
And, if we need the quantity – a number, not preceded by $, then we can use a negative look-behind (?<!\$)\d+:
let str = "2 turkeys cost $60";

alert( str.match(/(?<!\$)\b\d+/g) ); // 2 (the price is not matched)
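As with the negative look-ahead, the \b is needed here. A brief sketch of the difference, using the same string and illustrative variable names: without the boundary, the engine can start a match in the middle of the price.

```javascript
let str = "2 turkeys cost $60";

// Without \b the engine can start inside the price:
// "0" is preceded by "6", not "$", so it is accepted
let withoutBoundary = str.match(/(?<!\$)\d+/g);
console.log(withoutBoundary); // ["2", "0"]

// With \b the digits must start at a word boundary,
// so the tail of "60" can no longer match on its own
let withBoundary = str.match(/(?<!\$)\b\d+/g);
console.log(withBoundary); // ["2"]
```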
Capturing groups
Generally, the contents of look-around parentheses do not become a part of the result.
E.g., in the pattern \d+(?=€), the € sign doesn’t get captured as a part of the match. That’s natural: we look for a number \d+, while (?=€) is just a test that it should be followed by €.
But in some situations we might want to capture the look-around expression as well, or a part of it. That’s possible: just wrap that part in additional parentheses.
In the example below the currency sign (€|kr) is captured, along with the amount:
let str = "1 turkey costs 30€";
let regexp = /\d+(?=(€|kr))/; // extra parentheses around €|kr

alert( str.match(regexp) ); // 30, €
And here’s the same for look-behind:
let str = "1 turkey costs $30";
let regexp = /(?<=(\$|£))\d+/;

alert( str.match(regexp) ); // 30, $
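The group inside a look-around can also be a named one, which may be easier to read than a numeric index. A minimal sketch, where the group name currency is an arbitrary choice made for this example:

```javascript
let str = "1 turkey costs 30€";

// (?<currency>...) is a named capturing group placed inside the look-ahead
let regexp = /\d+(?=(?<currency>€|kr))/;
let match = str.match(regexp);

console.log(match[0]);              // 30 (the currency sign is still not part of the match)
console.log(match[1]);              // €
console.log(match.groups.currency); // €
```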
Summary
Look-ahead and look-behind (commonly referred to as “look-around”) are useful when we’d like to match something, depending on the context before/after it.
For simple regexps we can do a similar thing manually. That is: match everything, in any context, and then filter by context in a loop.
Remember, str.match (without flag g) and str.matchAll (always) return matches as arrays with an index property, so we know exactly where in the text each match is, and can check the context.
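A rough sketch of that manual approach, finding the numbers that are not followed by € (the variable names are illustrative):

```javascript
let str = "2 turkeys cost 60€";

// Collect the numbers that are not followed by "€", without look-around
let quantities = [];
for (let match of str.matchAll(/\d+/g)) {
  // match.index is where the match starts,
  // so this is the character right after the match
  let nextChar = str[match.index + match[0].length];
  if (nextChar !== "€") {
    quantities.push(match[0]);
  }
}

console.log(quantities); // ["2"]

// The look-around version gives the same result
let viaLookahead = str.match(/\d+\b(?!€)/g);
console.log(viaLookahead); // ["2"]
```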
But generally, look-around is more convenient.
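Look-around types:
- X(?=Y): positive look-ahead, matches X if it is followed by Y.
- X(?!Y): negative look-ahead, matches X if it is not followed by Y.
- (?<=Y)X: positive look-behind, matches X if it comes after Y.
- (?<!Y)X: negative look-behind, matches X if it does not come after Y.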
Tasks
Find non-negative integers
There’s a string of integer numbers.
Create a regexp that looks for only non-negative ones (zero is allowed).
An example of use:
let regexp = /your regexp/g;

let str = "0 12 -5 123 -18";

alert( str.match(regexp) ); // 0, 12, 123
Insert After Head
We have a string with an HTML Document.
Write a regular expression that inserts <h1>Hello</h1> immediately after <body> tag. The tag may have attributes.
For instance:
let regexp = /your regular expression/;

let str = `
<html>
  <body style="height: 200px">
  ...
  </body>
</html>
`;

str = str.replace(regexp, `<h1>Hello</h1>`);
After that the value of str should be:
<html>
  <body style="height: 200px"><h1>Hello</h1>
  ...
  </body>
</html>
Original Content at: https://javascript.info/regexp-lookahead-lookbehind
© 2007–2024 Ilya Kantor, https://javascript.info