This is an archive of the discontinued LLVM Phabricator instance.

[FileCheck] Support comment directives
ClosedPublic

Authored by jdenny on May 1 2020, 4:58 PM.

Details

Summary

Sometimes you want to disable a FileCheck directive without removing
it entirely, or you want to write comments that mention a directive by
name. The COM: directive makes it easy to do this. For example,
you might have:

; X32: pinsrd_1:
; X32:    pinsrd $1, 4(%esp), %xmm0

; COM: FIXME: X64 isn't working correctly yet for this part of codegen, but 
; COM: X64 will have something similar to X32:
; COM:
; COM:   X64: pinsrd_1:
; COM:   X64:    pinsrd $1, %edi, %xmm0

Without this patch, you need to use some combination of rewording and
directive syntax mangling to prevent FileCheck from recognizing the
commented occurrences of X32: and X64: above as directives.
Moreover, FileCheck diagnostics have been proposed that might complain
about the occurrences of X64 that don't have the trailing :
because they look like directive typos:

<http://lists.llvm.org/pipermail/llvm-dev/2020-April/140610.html>

I think dodging all these problems can prove tedious for test authors,
and directive syntax mangling already makes the purpose of existing
test code unclear. COM: can avoid all these problems.

This patch also updates the small set of existing tests that define
COM as a check prefix:

  • clang/test/CodeGen/default-address-space.c
  • clang/test/CodeGenOpenCL/addr-space-struct-arg.cl
  • clang/test/Driver/hip-device-libs.hip
  • llvm/test/Assembler/drop-debug-info-nonzero-alloca.ll

I think lit should support COM: as well. Perhaps clang -verify
should too.

Diff Detail

Event Timeline

jdenny created this revision.May 1 2020, 4:58 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 1 2020, 4:58 PM

I hesitate to suggest this, but is it worth using REM as a comment prefix? It's the comment marker for Windows batch files, so has some precedence.

llvm/docs/CommandGuide/FileCheck.rst
57

Nit: the rest of this document uses single spaces after full stops. Please do that in the new text too.

llvm/lib/Support/FileCheck.cpp
1922

It looks to me like the changes related to this function could be done separately (perhaps first). Is that the case, and if so, could you do so? It's a little hard to follow what's just refactoring of the existing stuff and what is new, with the extra comment stuff thrown in.

llvm/test/FileCheck/comment.txt
1 ↗(On Diff #261575)

You don't need to prefix the lines with '//' or anything else for that matter, since this isn't being used as an assembly/yaml/C/etc input.

If you do want to have the characters (I don't care either way), I'd prefer '#' for RUN/CHECK directive lines and '##' for comments (the latter so they stand out better from directive lines). It's fewer characters, whilst still as explicit ('#' is used widely in some parts of the testsuite at least).

2 ↗(On Diff #261575)

Don't prefix empty lines.

4 ↗(On Diff #261575)

begin -> beginning

9 ↗(On Diff #261575)

Is the lack of space after 'RUN:' deliberate?

10–11 ↗(On Diff #261575)

I have a personal preference for multi-line commands like this to be formatted like this:

// RUN: <command> <args> ... | \
// RUN:   <next command> <args> ...

with the | and \ on the same line, to show that the next line is the start of a new command (as opposed to just being more arguments for that line), and the extra two space indentation showing that the line is a continuation of the previous.

15 ↗(On Diff #261575)

Not a big deal, but it might be easier for debugging the test in the future if the %t.in (and %t.chk) of each case is named differently, e.g. %t2.in, %t2.chk etc. That way, the later ones won't overwrite the earlier ones.

38–43 ↗(On Diff #261575)

I'm struggling a little with this case. Firstly, why the '8' in the column count in the remark? Secondly, if COM: was being treated as a genuine comment here, then the check directive would become CHECK: foo which would still be a match of the input, if I'm not mistaken? I guess the two might somehow be related, but I don't know how if so.

127 ↗(On Diff #261575)

Maybe worth cases like "-comment-prefixes=,FOO", "-comment-prefixes=FOO," and "-comment-prefixes=FOO,,BAR".

133 ↗(On Diff #261575)

Maybe worth an additional case where the invalid character is not the first character in the string.

143–147 ↗(On Diff #261575)

I would naively expect --check-prefixes=FOO,COM to work, and disable the comment prefix. Is that complicated to implement?

llvm/utils/FileCheck/FileCheck.cpp
50

Similar comment to earlier. Every multi-sentence diagnostic I've seen in LLVM uses single spaces, so this probably should too.

jdenny marked 4 inline comments as done.May 4 2020, 8:21 AM

Thanks for the review. I've replied to a few comments, and I'll address the rest later.

I hesitate to suggest this, but is it worth using REM as a comment prefix? It's the comment marker for Windows batch files, so has some precedence.

I don't have a strong preference. Does anyone else following this have a preference?

Some data about existing usage:

$ cd llvm.git
$ git grep '\<REM\(-[a-zA-Z0-9_-]*\)\?:' | wc -l
115

In 21 matches (1 test file), REM is a FileCheck prefix.

In 4 matches (2 test files), // REM: is actually used to prefix a comment. I'm not sure why.

The remaining 90 matches (29 test files) use REM: to define a FileCheck variable and shouldn't actually break behavior if REM is also a FileCheck comment prefix. However, to make test code less confusing, people should probably avoid using such a FileCheck comment prefix for other purposes even if it works fine.

$ git grep '\<COM\(-[a-zA-Z0-9_-]*\)\?:' | wc -l
12

In 11 matches (4 test files), COM is a FileCheck prefix. I fixed those as part of this patch.

In 1 match (1 test file), CHECK-COM is a FileCheck prefix. No conflict there.

llvm/test/FileCheck/comment.txt
1 ↗(On Diff #261575)

I don't much care either. The FileCheck test suite is not consistent in this regard.

What about the following style, making use of COM: (or REM: or whatever we choose) for comments as that's it's purpose:

COM: -------------------------------------------------------------------------
COM: Comment directives successfully comment out check directives.
COM: -------------------------------------------------------------------------

COM: Check all default comment prefixes.
COM: Check that a comment directive at the begin/end of the file is handled.
COM: Check that the preceding/following line's check directive is not affected.
RUN: echo 'foo'                   >  %t.in
RUN: echo 'COM: CHECK: bar'       >  %t.chk
RUN: echo 'CHECK: foo'            >> %t.chk
RUN: echo 'RUN:echo "CHECK: baz"' >> %t.chk
RUN: %ProtectFileCheckOutput FileCheck -vv %t.chk < %t.in 2>&1 \
RUN: | FileCheck -check-prefix=SUPPRESSES-CHECKS -DCHECK_LINE=2 %s

COM: Check the case of one user-specified comment prefix.
COM: Check that a comment directive not at the beginning of a line is handled.
RUN: echo 'foo'                                      >  %t.in
RUN: echo 'CHECK: foo'                               >  %t.chk
RUN: echo 'letters then space MY-PREFIX: CHECK: bar' >> %t.chk
RUN: %ProtectFileCheckOutput \
RUN: FileCheck -vv %t.chk -comment-prefixes=MY-PREFIX < %t.in 2>&1 \
RUN: | FileCheck -check-prefix=SUPPRESSES-CHECKS -DCHECK_LINE=1 %s

COM: Check the case of multiple user-specified comment prefixes.
RUN: echo 'foo'               >  %t.in
RUN: echo 'Bar_2: CHECK: Bar' >  %t.chk
RUN: echo 'CHECK: foo'        >> %t.chk
RUN: echo 'Foo_1: CHECK: Foo' >> %t.chk
RUN: echo 'Baz_3: CHECK: Baz' >> %t.chk
RUN: %ProtectFileCheckOutput \
RUN: FileCheck -vv %t.chk -comment-prefixes=Foo_1,Bar_2 \
RUN:           -comment-prefixes=Baz_3 < %t.in 2>&1 \
RUN: | FileCheck -check-prefix=SUPPRESSES-CHECKS -DCHECK_LINE=2 %s

SUPPRESSES-CHECKS: .chk:[[CHECK_LINE]]:8: remark: CHECK: expected string found in input
2 ↗(On Diff #261575)

I was doing that to keep related lines together in a block, but I can use "-----" lines to separate blocks instead, as shown in my reply above.

10–11 ↗(On Diff #261575)

Has the community ever discussed a preferred style for the LLVM coding standards? I don't have strong feelings about what we choose, but settling on one standard would be nice.

143–147 ↗(On Diff #261575)

It's probably not complicated to implement.

However, there can already be much confusion over check prefixes, and people keep proposing diagnostics to catch mistakes: which check prefixes are enabled, what text is intended to be directives vs. comments, undefined prefixes, unused prefixes, what prefixes are mistyped. The motivation for a comment directive is to address part of this confusion. But I don't want to extend the confusion along a new dimension by forcing people to have to also figure out what comment prefixes are in effect. Instead, I think it would be good for LLVM test suites to settle on a single comment style and stick with it as much as possible. The documentation I wrote specifically discourages using -comment-prefixes unless you're working in some test environment where the default comment prefixes just aren't appropriate.

Following this reasoning, I don't think it should be possible to quietly and perhaps accidentally override the default comment prefixes via -check-prefixes. You should have to do so explicitly with -comment-prefixes.

What do you think?

jdenny added a comment.May 4 2020, 8:47 AM

Thanks for the review. I've replied to a few comments, and I'll address the rest later.

I hesitate to suggest this, but is it worth using REM as a comment prefix? It's the comment marker for Windows batch files, so has some precedence.

I don't have a strong preference. Does anyone else following this have a preference?

One reason not to choose REM: is that it looks much more like RUN: than COM:. For example, imagine the long code sample I wrote in my last comment, but with COM: replaced by REM:.

jdenny updated this revision to Diff 261962.May 4 2020, 4:42 PM
jdenny marked 11 inline comments as done.

Addressed most reviewer comments.

jdenny added inline comments.May 4 2020, 4:47 PM
llvm/docs/CommandGuide/FileCheck.rst
57

It doesn't matter to me which we use, especially given that it doesn't seem to affect how the HTML renders. However, the majority of periods (ignoring .. marking blocks) in this document that are followed by at least one space are followed by two spaces. Does that change your suggestion?

llvm/lib/Support/FileCheck.cpp
1922

Good idea: D79375

llvm/test/FileCheck/comment.txt
38–43 ↗(On Diff #261575)

Firstly, why the '8' in the column count in the remark?

Are you asking why it's not 1? Here, FileCheck gives the location of the first character of the pattern from the check file.

Secondly, if COM: was being treated as a genuine comment here, then the check directive would become CHECK: foo which would still be a match of the input, if I'm not mistaken?

Ah, thanks. Fixed.

jdenny updated this revision to Diff 261998.May 4 2020, 8:33 PM

Fixed a missed rename in the test code in the last update.

jhenderson added inline comments.May 5 2020, 1:23 AM
llvm/docs/CommandGuide/FileCheck.rst
57

Ah, it looks like it depends on who wrote the respective section. I know @probinson prefers double-spaces, and he wrote a lot of this stuff originally. I personally prefer single spaces (why waste extra characters etc etc). As the document is already inconsistent, I don't mind. When I made my original comment, I obviously didn't look at the right bits of the docs.

llvm/lib/Support/FileCheck.cpp
1915

This can probably be "check and comment prefixes"

1922

Similar to what I mentioned in D79375, perhaps it would make more sense for COM and RUN to be defined by the tool itself rather than the library to allow users to customise their respective front-ends as they wish.

1965

This is incredibly nit-picky, but perhaps this should have a check for PrefixRegexStr.empty(). It's such an edge case that it probably doesn't matter, but assuming you adopt my comment about the default prefixes being defined by the tool, you could imagine a situation where a different front-end defined no check prefixes, and the user then didn't specify any either.

llvm/test/FileCheck/comment.txt
1 ↗(On Diff #261575)

Honestly, I don't find COM: stands out enough from other non-comment lines, so I generally wouldn't use it unless I needed to workaround an issue, although I'm not opposed to it in principle.

In this particular test, which is all about FileCheck's testing of the comment directive, I'd suggest it might be best to avoid using it for anything other than testing purposes to avoid any risk of confusion.

2 ↗(On Diff #261575)

Ah, I see what you are doing. Maybe it would just be better to split the related groups into separate test files. That'll make them run faster when run in isolation, and allow for easier debugging too, whilst also keeping relateed stuff clearly together.

10–11 ↗(On Diff #261575)

I don't think there's ever been a discussion. Feel free to start one, and I'll chime in.

38–43 ↗(On Diff #261575)

Are you asking why it's not 1? Here, FileCheck gives the location of the first character of the pattern from the check file.

Apparently I can't count. Thanks.

143–147 ↗(On Diff #261575)

Okay, that sounds reasonable to me. I guess it should be fairly obvious what is going on.

jdenny updated this revision to Diff 262141.May 5 2020, 9:32 AM
jdenny marked 14 inline comments as done.

Applied most reviewer comments.

jdenny added inline comments.May 5 2020, 9:37 AM
llvm/lib/Support/FileCheck.cpp
1922

I suggested there that we handle the check prefix side of that change in a separate patch. Perhaps the same patch could handle the comment prefix side of it.

Do you see a reason to handle this before committing the current patches?

1965

Agreed. As you say, it won't matter until tools can define default prefixes, so it can be addressed wherever that's addressed.

llvm/test/FileCheck/comment.txt
1 ↗(On Diff #261575)

In this particular test, which is all about FileCheck's testing of the comment directive, I'd suggest it might be best to avoid using it for anything other than testing purposes to avoid any risk of confusion.

Fair enough. We can do something different for this test. But I am interested in pursuing more discussion on the general appeal of COM:....

Honestly, I don't find COM: stands out enough from other non-comment lines, so I generally wouldn't use it unless I needed to workaround an issue, although I'm not opposed to it in principle.

By that logic, the example from this patch's summary might end up like this:

; X32: pinsrd_1:
; X32:    pinsrd $1, 4(%esp), %xmm0

; FIXME: X64 isn't working correctly yet for this part of codegen, but 
; COM: X64 will have something similar to X32:
; 
; COM:   X64: pinsrd_1:
; COM:   X64:    pinsrd $1, %edi, %xmm0

After more FileCheck diagnostics land, the FIXME line might break, and a new COM: would have to be added.

Instead, my hope was that people would start to see COM: (usually with a leading ;, //, etc.) as a general way to style entire comment blocks in test files. Once that habit is in place and eyes are trained, true comments would become very obvious, and test authors would never have to think about whether FileCheck would trip on a comment that looks like a directive.

Of course, I'm not suggesting an overhaul of all test suites once this lands. I'm just suggesting people would use it more and more to improve readability of tests, both for humans and for tools like FileCheck.

Is there anything we can do to improve the appeal of COM:? Or is it just a matter of getting used to it, as with any comment style?

10–11 ↗(On Diff #261575)

For this patch, I've switched to the style you suggested.

I'm afraid that leading a general discussion on that issue is low on my list of priorities. Maybe some day.

jdenny updated this revision to Diff 262185.May 5 2020, 11:46 AM

My last update accidentally included D79375. This strips it back out. Sorry about that.

jhenderson added inline comments.May 6 2020, 12:47 AM
llvm/lib/Support/FileCheck.cpp
1922

I don't, as there's no current use-case for it. Let's leave it as-is for now.

llvm/test/FileCheck/comment/bad-comment-prefix.txt
37

Perhaps worth checking against RUN as well as COM?

llvm/test/FileCheck/comment/blank-comments.txt
10

I'm assuming that FileCheck treats the entire line after the first CHECK: here as part of the CHECK, and doesn't try to start a second CHECK directive?

llvm/test/FileCheck/comment/suffixes.txt
2–3

I don't think you should change anything here, but if I'm following this right, this leads to the amusing limitation that you can "comment out" (via --comment-prefixes) any CHECK lines that use a suffix (CHECK-NEXT, CHECK-NOT etc) but not those that don't without changing --check-prefixes value!

By the way, do we need to have a test-case for that? I.e. that --comment-prefixes=CHECK-NEXT disables the CHECK-NEXT lines (assuming it does of course)?

llvm/test/FileCheck/comment/suppresses-checks.txt
2

Missing '#'

jdenny updated this revision to Diff 262406.May 6 2020, 9:53 AM
jdenny marked 8 inline comments as done.

Addressed remaining reviewer comments except one, which is still under discussion.

jdenny added inline comments.May 6 2020, 9:54 AM
llvm/test/FileCheck/comment/blank-comments.txt
10

That's right.

llvm/test/FileCheck/comment/suffixes.txt
2–3

I don't think you should change anything here, but if I'm following this right, this leads to the amusing limitation that you can "comment out" (via --comment-prefixes) any CHECK lines that use a suffix (CHECK-NEXT, CHECK-NOT etc) but not those that don't without changing --check-prefixes value!

That's right, but check prefixes have this problem too. That is, you can do things like -check-prefixes=FOO,FOO-NOT so that FOO-NOT is not negative.

ValidatePrefixes should be extended to catch such cases, I think. But in a separate patch.

By the way, do we need to have a test-case for that? I.e. that --comment-prefixes=CHECK-NEXT disables the CHECK-NEXT lines (assuming it does of course)?

Hmm. I think it's behavior we don't want to support. Maybe the test case should be added when extending ValidatePrefixes as I described above?

jhenderson accepted this revision.May 7 2020, 12:26 AM

LGTM, but might not hurt to have someone else looking at it.

llvm/test/FileCheck/comment/bad-comment-prefix.txt
41

I'm guessing the underscore is to circumvent lit trying to run things? If so, I think we need to fix lit to only run lines where the RUN: is at the start of the line (ignoring whitespace and non alpha-numerics). I don't think anything else makes sense.

Not related to this patch of course, so nothing to do here, unless my guess is incorrect.

llvm/test/FileCheck/comment/suffixes.txt
2–3

I agree it's separate work. FWIW, I just came up with a genuinely useful use-case for it with CHECK directives, but it might just be unnecessary. Imagine the case where you want a test where some specific output is printed under one condition and not another condition. You'd want something like:

# RUN: ... | FileCheck %s --check-prefix=WITH
# RUN: ... | FileCheck %s --check-prefix=WITHOUT

# WITH: some text that should be matched
# WITHOUT-NOT: some text that should be matched

A careleses developer could change the text of "WITH" to match some new behaviour without changing "WITHOUT-NOT", thus breaking the second case. You could instead do:

# RUN: ... | FileCheck %s --check-prefix=CHECK-NOT
# RUN: ... | FileCheck %s

# CHECK-NOT: some text that should be matched

Then, if the output changed, you'd update both the regular and NOT match. I might have used this pattern in a few tests in the past had it occurred to me.

Anyway, I think there are other ways of solving that problem, although not without work on FileCheck (I've been prototyping a method with only limited success so far), and I agree it's otherwise mostly horrible, so I'm not seriously opposing the suggestion.

This revision is now accepted and ready to land.May 7 2020, 12:26 AM
jdenny marked 5 inline comments as done.May 7 2020, 7:54 AM

LGTM, but might not hurt to have someone else looking at it.

I'll probably wait until Monday to give others more time to participate.

Thanks for the review!

llvm/test/FileCheck/comment/bad-comment-prefix.txt
41

I'm guessing the underscore is to circumvent lit trying to run things?

Yep.

If so, I think we need to fix lit to only run lines where the RUN: is at the start of the line (ignoring whitespace and non alpha-numerics). I don't think anything else makes sense.

I agree lit should change. That might be the right condition. People do things like the following, but maybe they shouldn't:

clang/test/Index/complete-access-checks-crash.cpp:    Derived(). // RUN: c-index-test -code-completion-at=%s:10:15 %s | FileCheck %s

At the very least, lit shouldn't recognize RUN: at the ends of other words (where - is considered a word character). Lit even recognizes XRUN:.

llvm/test/FileCheck/comment/suffixes.txt
2–3

I agree the use case is important, and I also agree there must be a better solution.

The underlying issue is that we want to reuse a pattern. Perhaps there should be some way to define a FileCheck variable once and reuse it among multiple FileCheck commands. For example:

# RUN: ... | FileCheck %s --check-prefix=WITH,ALL
# RUN: ... | FileCheck %s --check-prefix=WITHOUT,ALL

# ALL-DEF: [[VAR:some text that should be matched]]
# WITH: [[VAR]]
# WITHOUT-NOT: [[VAR]]

It should probably be possible to use regexes in such a variable. I'm not sure if that's possible now. It might require a special variable type. We currently have # to indicate numeric variables. Perhaps ~ would indicate pattern variables.

jdenny marked 2 inline comments as done.May 7 2020, 7:56 AM
thopre added inline comments.May 7 2020, 9:15 AM
llvm/test/FileCheck/comment/after-words.txt
2

How about characters that cannot be part of an identifier? E.g. would "@COM:" be interpreted as a comment? Should we only accept comment prefix if it's a space or beginning of line before?

jdenny marked an inline comment as done.May 7 2020, 10:04 AM
jdenny added inline comments.
llvm/test/FileCheck/comment/after-words.txt
2

Yes, this patch interprets @COM: as a comment directive.

My justification is that, to keep things simple for the FileCheck user and implementation, I tried to keep parsing of comment directives and check directives as similar as possible. One intentional difference is that comment directives are ignored if they have check suffixes, such as -NOT.

jdenny marked an inline comment as done.May 7 2020, 10:10 AM
jdenny added inline comments.
llvm/test/FileCheck/comment/after-words.txt
2

git grep '@RUN' shows that @ without a following space might be a comment style we should support. Here's an example match:

llvm/test/MC/ARM/ldr-pseudo-cond-darwin.s:@RUN: llvm-mc -triple armv7-base-apple-darwin %s | FileCheck --check-prefix=CHECK-ARM --check-prefix=CHECK %s
thopre accepted this revision.May 7 2020, 10:19 AM

LGTM. Sorry for the late review.

llvm/test/FileCheck/comment/after-words.txt
2

You're right, I forgot CHECK directives have the same problem. This should be fixed by the patch that was proposed to force a comment mark or beginning of line for a prefix to be recognized as a directive. Patch LGTM then.

jdenny marked 2 inline comments as done.May 7 2020, 10:50 AM

Thanks!

jhenderson added inline comments.May 11 2020, 12:26 AM
llvm/test/FileCheck/comment/suffixes.txt
2–3

That's almost exactly what I've been prototyping on-and-off over the past few months, but I've been running into various ordering issues, which I haven't yet solved to my satisfaction. My original thread that inspired the idea is http://lists.llvm.org/pipermail/llvm-dev/2020-January/138822.html.

This revision was automatically updated to reflect the committed changes.
jdenny marked an inline comment as done.May 11 2020, 12:28 PM
jdenny added inline comments.
llvm/test/FileCheck/comment/suffixes.txt
2–3

Cool. I think I could benefit from this feature.