This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] regex: fix backreferences in forward assertions
Needs Review · Public

Authored by pammon on Apr 28 2017, 12:02 AM.

Details

Summary

In regex, forward assertions like '(?=stuff)' are implemented by
constructing a child regular expression 'stuff' and matching that.
If the child regular expression contains a backreference, this would
trip an assertion or reference the wrong capture group, because the
child was ignorant of the capture groups of its parent. For example,
/(x)(?=\1)/ would trip an assertion.
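
A minimal reproduction sketch of the example above, using only the standard `std::regex` interface. The helper name `lookahead_backref_matches` is made up for illustration. Under ECMAScript semantics the pattern should match "xx" (the lookahead sees a second "x" equal to group 1) and should not match "xy". On an affected libc++ build the call may instead trip the assertion described here, and the exact failure mode is not confirmed by this sketch.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of libc++): reports whether /(x)(?=\1)/
// finds a match in `input`. A std::regex_error thrown while building or
// running the regex is reported as "no match".
bool lookahead_backref_matches(const std::string& input) {
    try {
        // ECMAScript is the default grammar for std::regex.
        const std::regex re("(x)(?=\\1)");
        std::smatch m;
        // Group 1 is captured by the parent expression; the lookahead's
        // backreference \1 must compare against that capture.
        return std::regex_search(input, m, re) && m[1].matched;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

Calling `lookahead_backref_matches("xx")` and `lookahead_backref_matches("xy")` exercises the matching and non-matching cases respectively.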

Address this by propagating submatches into the child, so that
backreferences reference the correct capture groups. This also allows us
to eliminate the mexp_ field, because the child expression shares the
entire submatch array with the parent.
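
The idea of sharing the submatch array can be sketched with a toy model. This is not the libc++ implementation: `SubMatch`, `match_backref`, and `match_at` are hypothetical names, and the "fresh array" branch only stands in for a child that is ignorant of its parent's capture groups.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy types; they do not mirror libc++ internals.
struct SubMatch {
    std::size_t first = 0;   // start offset of the capture
    std::size_t second = 0;  // one past the end of the capture
    bool matched = false;
};

// Child expression consisting of a single backreference \N. It reads
// capture group N from whichever submatch array it is handed.
bool match_backref(const std::string& text, std::size_t pos,
                   const std::vector<SubMatch>& subs, std::size_t group) {
    if (group >= subs.size() || !subs[group].matched) return false;
    const SubMatch& g = subs[group];
    const std::size_t len = g.second - g.first;
    // Compare the text at `pos` with the captured text of the same length.
    return text.compare(pos, len, text, g.first, len) == 0;
}

// Models /(x)(?=\1)/ anchored at `pos`. With share_submatches == false the
// lookahead receives a fresh, empty array; with true it sees the parent's
// captures, which is the approach the summary describes.
bool match_at(const std::string& text, std::size_t pos, bool share_submatches) {
    std::vector<SubMatch> parent(2);  // index 0: whole match, index 1: (x)
    if (pos >= text.size() || text[pos] != 'x') return false;
    parent[1].first = pos;
    parent[1].second = pos + 1;
    parent[1].matched = true;

    std::vector<SubMatch> fresh(2);   // a child with no knowledge of the parent
    const std::vector<SubMatch>& child_view = share_submatches ? parent : fresh;
    return match_backref(text, pos + 1, child_view, 1);
}
```

In this model, `match_at("xx", 0, true)` succeeds because the child reads group 1 from the shared array, while `match_at("xx", 0, false)` fails because the child's own array has no such capture.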

Event Timeline

pammon created this revision. Apr 28 2017, 12:02 AM
Qix- added a subscriber: Qix-. Jan 25 2019, 2:37 PM

Ping @EricWF - it's been a few years, but this is still an issue, rendering ECMAScript regex backreferences almost entirely broken in libcxx :/ Would be great to get a champion for it. OP has indicated on GitHub that they'd be happy to rebase.

Herald added a project: Restricted Project. Aug 23 2022, 6:28 AM