This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] Handle invalid escaped characters in POSIX regex
Needs ReviewPublic

Authored by miyuki on Jan 30 2018, 9:52 AM.

Details

Summary

Currently when parsing basic and extended POSIX regular expressions
libc++ silently skips invalid escaped characters and trailing escapes.
This patch changes the behavior, so that a std::regex_error with code
set to error_escape is thrown in these cases.
The patch also extends the bad_escape test case to document the
expected behavior for commonly used escape sequences.

Diff Detail

Event Timeline

miyuki created this revision.Jan 30 2018, 9:52 AM

I like this. One nit and a question.

include/regex
3492–3493

Do we need any more cases here?

test/std/re/re.regex/re.regex.construct/bad_escape.pass.cpp
50

should be const. (Yes, this is me being picky)

Quuxplusone added inline comments.
include/regex
3465

FWIW, I don't understand what's going on in this switch.
Is it intentional that '(' and '|' now take different paths here?

test/std/re/re.regex/re.regex.construct/bad_escape.pass.cpp
60

I would think about adding test cases here to document the intended behavior of

  • "\\n" and "\\t" which are valid of course;
  • "\\\n" which could be a common typo and should probably throw;
  • "\\/" which is common in Perl but maybe should throw anyway;
  • "\\1" in a regex mode that doesn't support backreferences;
  • "\\0".

If these are already covered elsewhere in the suite, then never mind me.

miyuki added inline comments.Jan 31 2018, 7:15 AM
include/regex
3465

Yes. As far as I understand, only extended POSIX regular expressions support alternation: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html, so "\\|" should be treated as error_escape

miyuki updated this revision to Diff 132352.Feb 1 2018, 3:32 AM
miyuki edited the summary of this revision. (Show Details)

Updated __parse_QUOTED_CHAR_ERE. Added more test cases.

miyuki marked 2 inline comments as done.Feb 1 2018, 3:33 AM
miyuki added inline comments.
include/regex
3492–3493

Probably not, but we should throw an exception here as well.