This is an archive of the discontinued LLVM Phabricator instance.

[FileCheck] Add ability to match newline characters
Needs Review, Public

Authored by jhenderson on Jan 25 2017, 4:32 AM.

Details

Summary

Currently there is no way to match an explicit newline character using FileCheck. Using, for example, the pattern "{{\n}}" matches an 'n' character, not '\n' (nor for that matter the string "\n"). The current suggested method of using [[:space:]] matches all forms of whitespace, so it may not always be suitable.

This change converts the pattern "\n" into a single '\n' character for use by the regex matcher, within regular expression patterns, if the new switch "--match-new-line-characters" is specified. Backslashes preceding 'n' characters can be escaped using double backslashes, if necessary.
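As a rough illustration of the conversion described above, the following Python sketch rewrites the body of a regular expression pattern before it would be handed to a regex matcher. It is not the code from this patch: the function name `convert_newline_escapes` is hypothetical, and the handling of escape pairs other than "\n" (they are passed through unchanged) is an assumption.

```python
def convert_newline_escapes(pattern: str) -> str:
    """Turn each backslash-n pair into a real newline character.

    Hypothetical helper, not FileCheck's implementation. Any other
    backslash pair, including a double backslash, is copied through
    untouched, so a double backslash followed by 'n' is not converted.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "n":
                out.append("\n")      # backslash-n becomes one newline
            else:
                out.append(ch + nxt)  # keep every other escape pair as-is
            i += 2
        else:
            out.append(ch)            # ordinary character (or trailing backslash)
            i += 1
    return "".join(out)


converted = convert_newline_escapes(r"abc\n\ndef")
escaped = convert_newline_escapes(r"a\\nb")
```

Here `converted` contains two real newline characters between "abc" and "def", while `escaped` is returned unchanged because the backslash before the 'n' is itself escaped.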

I considered putting this change in without the switch. However, in that case FileCheck's behaviour when presented with the string "a\nnewline" would be different to grep's, which matches "annewline" in this case. I'm not sure that this is desirable.

Event Timeline

jhenderson created this revision. Jan 25 2017, 4:32 AM
dsanders edited edge metadata. Feb 10 2017, 3:12 AM

Currently there is no way using FileCheck to match an explicit newline character. Using,
for example, the pattern "{{\n}}" matches an 'n' character, not '\n' (nor for that matter the
string "\n"). The current suggested method of using [[:space:]] matches all forms of
whitespace, so may not be always suitable.

Does this kind of pattern work on Windows? That OS uses CRLF (\r\n) for line endings and will probably fail to match patterns that explicitly check for \n.

Does something like this achieve the effect you need?:

// CHECK: foo{{$}}
// CHECK-NEXT: {{^}}bar

Does this kind of pattern work on Windows? That OS uses CRLF (\r\n) for line endings and will probably fail to match patterns that explicitly check for \n.

Yes, FileCheck normalises line endings to '\n'. In fact, all my development on this was done on Windows, so I'm more confident about that side than the Linux side!
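To make the point about line endings concrete, here is a small Python sketch of why a pattern containing only '\n' can still match input produced on Windows once CRLF pairs have been normalised. The helper name `normalise_line_endings` is hypothetical, and the sketch assumes that only "\r\n" pairs are rewritten; how FileCheck treats a lone '\r' is not stated in this thread.

```python
import re


def normalise_line_endings(text: str) -> str:
    """Rewrite CRLF pairs to a single LF (assumed behaviour, simplified)."""
    return text.replace("\r\n", "\n")


windows_output = "abc\r\n\r\ndef\r\n"
normalised = normalise_line_endings(windows_output)

# A pattern with two explicit newlines matches only after normalisation.
match_after = re.search("abc\n\ndef", normalised)
match_before = re.search("abc\n\ndef", windows_output)
```

In this model `match_after` is a match object and `match_before` is `None`, which is the distinction the question about Windows was probing.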

Does something like this achieve the effect you need?:

// CHECK: foo{{$}}
// CHECK-NEXT: {{^}}bar

Unfortunately, this isn't sufficient for our use case. FileCheck doesn't provide a method for matching explicit blank lines, due to the way the checks work. Suppose the previous check matches to the end of a line: for example, foo\n\nbar becomes \n\nbar in the buffer after matching foo{{$}}. A "CHECK-NEXT: {{^$}}" will then match immediately, but be treated as being on the same line, causing an error. @probinson suggested an alternative method in D28896 which would work, but is not particularly clear, whereas I think a "CHECK: abc{{\n\n}}def" would be quite clear. It also reduces the need to use {{$}} and {{^}} repeatedly, if exact matches are desired.
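The failure mode described above can be modelled in a few lines of Python. This is a simplified sketch rather than FileCheck's actual logic: the function name `check_next_blank_line` is hypothetical, Python's `re` module stands in for the real regex engine, and the "next line" rule is assumed to mean exactly one newline between the end of the previous match and the start of the new one.

```python
import re


def check_next_blank_line(buffer: str, previous_pattern: str) -> str:
    """Model a CHECK followed by 'CHECK-NEXT: {{^$}}' (simplified)."""
    prev = re.search(previous_pattern, buffer, re.MULTILINE)
    if prev is None:
        return "error: previous check did not match"

    # Everything after the previous match is what CHECK-NEXT searches.
    remaining = buffer[prev.end():]
    blank = re.search(r"^$", remaining, re.MULTILINE)
    if blank is None:
        return "error: no match"

    # Count the newlines skipped between the two matches.
    newlines_skipped = remaining[:blank.start()].count("\n")
    if newlines_skipped == 0:
        return "error: match is on the same line as the previous match"
    if newlines_skipped > 1:
        return "error: match is not on the next line"
    return "ok"


result = check_next_blank_line("foo\n\nbar", r"foo$")
```

After `foo$` matches, the remaining buffer starts with the newline that ends the first line, so `^$` matches the empty string at offset 0 without skipping any newline, and the model reports the same-line error.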

Does this kind of pattern work on Windows? That OS uses CRLF (\r\n) for line endings and will probably fail to match patterns that explicitly check for \n.

Yes, FileCheck normalises line endings to '\n'. In fact, all my development on this was done on Windows, so I'm more confident about that side than the Linux side!

:-)

Does something like this achieve the effect you need?:

// CHECK: foo{{$}}
// CHECK-NEXT: {{^}}bar

Unfortunately, this isn't sufficient for our use case. FileCheck doesn't provide a method for matching explicit blank lines, due to the way the checks work. Suppose the previous check matches to the end of a line: for example, foo\n\nbar becomes \n\nbar in the buffer after matching foo{{$}}. A "CHECK-NEXT: {{^$}}" will then match immediately, but be treated as being on the same line, causing an error. @probinson suggested an alternative method in D28896 which would work, but is not particularly clear, whereas I think a "CHECK: abc{{\n\n}}def" would be quite clear. It also reduces the need to use {{$}} and {{^}} repeatedly, if exact matches are desired.

Thanks, I see the problem now. I believe CHECK-NEXT would work as expected if it ate a single leading newline (when present) before attempting the match. That way 'CHECK-NEXT: {{^$}}' would successfully match on the next line. I wouldn't normally be keen on a special case like that but I'd be surprised if anyone would use CHECK-NEXT and {{^}} together and want it to fail on a single leading newline.

The main reason I'm suggesting this as an alternative is because I think that if we're going to support '\n', the right place to do it would be inside llvm::Regex's engine (guarded by a flag). Having had a quick look at that, I wouldn't want to modify it (and more to the point, re-test it) if it can be avoided.
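The alternative suggested in this comment, having CHECK-NEXT consume a single leading newline before it attempts the match, can be sketched in Python as follows. This is only a model of the idea as worded here: the function name `check_next_eating_newline` is hypothetical, Python's `re` module stands in for llvm::Regex, and "next line" is assumed to mean exactly one newline between the previous match and the new one.

```python
import re


def check_next_eating_newline(remaining: str, pattern: str) -> str:
    """Model CHECK-NEXT that first eats one leading newline, if present."""
    newlines_skipped = 0
    if remaining.startswith("\n"):
        remaining = remaining[1:]   # eat the newline ending the previous line
        newlines_skipped = 1

    m = re.search(pattern, remaining, re.MULTILINE)
    if m is None:
        return "error: no match"

    # Add any further newlines skipped before the match starts.
    newlines_skipped += remaining[:m.start()].count("\n")
    if newlines_skipped == 0:
        return "error: match is on the same line as the previous match"
    if newlines_skipped > 1:
        return "error: match is not on the next line"
    return "ok"


# Buffer left over after 'CHECK: foo{{$}}' matched in "foo\n\nbar".
blank_line_result = check_next_eating_newline("\n\nbar", r"^$")
too_far_result = check_next_eating_newline("\n\nbar", r"^bar")
```

Under these assumptions `{{^$}}` now matches the blank line as the next line, while `{{^}}bar` is still rejected because it sits two lines below the previous match.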

jdenny added a subscriber: jdenny. Sep 29 2018, 5:58 AM