Page MenuHomePhabricator

[RFC][FileCheck] Require colon immediately after CHECK directives
Needs ReviewPublic

Authored by jroelofs on Apr 1 2020, 9:44 AM.

Details

Summary

There are a lot of common pattern that results in testcases that look like they
are checking something, but in fact they silently hide failures, leading to
accidental false negatives.

CHECK:          legitimate test
CHECK           gotcha 1
CHECK :         gotcha 2
CHECK-SAME-DAG: gotcha 3
CHECKNEXT:      gotcha 4
CHECKDAG:       gotcha 5

This patch attempts to catch 1 & 2, but it unfortunately still has a lot of
false positives, as it triggers on comments in testcases that refer to the
CHECKs themselves. The vast majority of false positives come from this
diagnostic triggering on RUN lines, which could be ameliorated (mostly) with a
precursor patch that quotes all the CHECK prefixes:

find -E llvm/test -regex ".*\.(c|cpp|m|ll|mir|yaml|s|td|test|txt)" -exec perl -p -i -e "s/check-prefix(es)?[= ]([^ \r\n]+)/check-prefix\1=\"\2\"/g" {} \;

But even with that, there are still ~455 "failures" this causes in llvm/test alone.

Is this worth moving forward on? If so, how else can we improve FileCheck to
prevent these kinds of issues?

Diff Detail

Unit TestsFailed

TimeTest
120 msClang.CodeGen::bittest-intrin.c
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/lib/clang/11.0.0/include -nostdsysteminc -fms-extensions -triple x86_64-windows-msvc /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/clang/test/CodeGen/bittest-intrin.c -emit-llvm -o - | /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/FileCheck /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/clang/test/CodeGen/bittest-intrin.c --check-prefix=X64
50 msClang.CodeGen::mips-type-sizes-int128.c
Script: -- : 'RUN: at line 1'; not /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/lib/clang/11.0.0/include -nostdsysteminc -triple mips-none-linux-gnu -emit-llvm -w -o - /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/clang/test/CodeGen/mips-type-sizes-int128.c 2> /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/tools/clang/test/CodeGen/Output/mips-type-sizes-int128.c.tmp1
60 msClang.CodeGen::mips-vector-return.c
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/clang -target mipsel-unknown-linux -O3 -S -o - -emit-llvm /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/clang/test/CodeGen/mips-vector-return.c | /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/FileCheck /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/clang/test/CodeGen/mips-vector-return.c -check-prefix=O32
380 msClang.CodeGenCUDA::device-stub.cu
Script: -- : 'RUN: at line 1'; echo "GPU binary would be here" > /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/tools/clang/test/CodeGenCUDA/Output/device-stub.cu.tmp
30 msClang.CodeGenCXX::attr.cpp
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/lib/clang/11.0.0/include -nostdsysteminc -triple x86_64-pc-linux-gnu -emit-llvm -o - /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/clang/test/CodeGenCXX/attr.cpp | /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/FileCheck /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/clang/test/CodeGenCXX/attr.cpp
View Full Test Results (230 Failed)

Event Timeline

jroelofs created this revision.Apr 1 2020, 9:44 AM
jroelofs edited the summary of this revision. (Show Details)Apr 1 2020, 9:45 AM
jroelofs edited the summary of this revision. (Show Details)Apr 1 2020, 9:47 AM

Here's a sample of what I believe to be the true positives for this new diagnostic:

llvm/test/tools/llvm-symbolizer/flag-grouping.test
llvm/test/tools/llvm-readobj/COFF/codeview-linetables.test:107
llvm/test/tools/llvm-objdump/ELF/ARM/v5te-subarch.s:8
llvm/test/tools/llvm-objcopy/COFF/patch-debug-dir.test:20
llvm/test/tools/llvm-dwarfdump/X86/verify_strings.s:88
llvm/test/tools/llvm-dwarfdump/X86/statistics.ll:47
llvm/test/tools/llvm-ar/replace-update.test:60
llvm/test/tools/dsymutil/fat-binary-output.test:27
llvm/test/tools/dsymutil/X86/modules.m:120
llvm/test/tools/dsymutil/ARM/scattered.c:10
llvm/test/Transforms/SimplifyCFG/pr33605.ll:30
llvm/test/Transforms/SimplifyCFG/Hexagon/switch-to-lookup-table.ll
llvm/test/Transforms/SampleProfile/gcc-simple.ll:132
llvm/test/Transforms/PGOProfile/memop_clone.ll:13
llvm/test/Transforms/PGOProfile/icp_invoke.ll:104
llvm/test/Transforms/LoopVectorize/followup.ll:32
llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll:132
llvm/test/Transforms/LoopVectorize/debugloc.ll:20
llvm/test/Transforms/LoopVectorize/ARM/prefer-tail-loop-folding.ll:203
llvm/test/Transforms/LoopUnroll/runtime-loop5.ll:26
llvm/test/Transforms/LoopUnroll/peel-loop-negative.ll:7
llvm/test/Transforms/LoopUnroll/AArch64/falkor-prefetch.ll
llvm/test/Transforms/LoopStrengthReduce/X86/lsr-insns-2.ll:13
llvm/test/Transforms/LoopRotate/pr35210.ll:67
llvm/test/Transforms/LoopInterchange/call-instructions.ll:30
llvm/test/Transforms/LoopDeletion/use-in-unreachable.ll:9
llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll:114
llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/vect-ptr-ptr-size-mismatch.ll:58
llvm/test/Transforms/InstSimplify/compare.ll:281
llvm/test/Transforms/InstCombine/str-int.ll:112
llvm/test/Transforms/InstCombine/str-int-2.ll:111
llvm/test/Transforms/InstCombine/phi-preserve-ir-flags.ll:20
llvm/test/Transforms/InstCombine/intptr1.ll:30

These came from a range of about 6k test files, putting failure rate of this class of bugs right around 0.53%.

jdenny added a subscriber: jdenny.Apr 1 2020, 10:12 AM

Thanks for working on this.

CHECK           gotcha 1

This case seems like it would trigger too many false positives. Do you have numbers on how many true positives that this case alone catches and that are not caught by the other cases?

CHECK :         gotcha 2

As long as only whitespace separates CHECK and :, this one seems ok.

CHECK-SAME-DAG: gotcha 3
CHECKNEXT:      gotcha 4
CHECKDAG:       gotcha 5

These seem much less likely to trigger false positives.

@probinson and @SjoerdMeijer were also involved in previous discussions in this area and might want to take a look.

The first I looked up llvm/test/Transforms/LoopUnroll/runtime-loop5.ll:26 is UNROLL-16-LABEL-NOT:, which may not be allowed.
Real error (partial list):

llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll:114
llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/vect-ptr-ptr-size-mismatch.ll:58
llvm/test/Transforms/InstCombine/intptr1.ll:30
llvm/test/Transforms/InstCombine/phi-preserve-ir-flags.ll:20 [almost all lines]

Harmless errors (partial list):

llvm/test/Transforms/InstSimplify/compare.ll:281
llvm/test/Transforms/InstCombine/str-int.ll:112
llvm/test/Transforms/InstCombine/str-int-2.ll:111

The first I looked up llvm/test/Transforms/LoopUnroll/runtime-loop5.ll:26 is UNROLL-16-LABEL-NOT:, which may not be allowed.

There are already a number of bad NOT combination checks. I'm not sure why it doesn't include LABEL.

Real error (partial list):

llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll:114
llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/vect-ptr-ptr-size-mismatch.ll:58
llvm/test/Transforms/InstCombine/intptr1.ll:30
llvm/test/Transforms/InstCombine/phi-preserve-ir-flags.ll:20 [almost all lines]

That's unfortunate.

Checks that are likely to produce many false positives (like standalone CHECK) should be opt-in. Ideally, such checks would also have error recovery (warning not error) so the user can see all occurrences and ignore false positives instead of having to rewrite false positives to see later occurrences. To reduce false positives, checks might need heuristics like detecting whether CHECK appears at the beginning of a line and possibly in comments. I don't know that we want to complicate FileCheck itself with such heuristics. For example, FileCheck doesn't currently parse comments. A separate script might be better.

Cases that are very unlikely to produce false positives (like bad NOT combinations) are safer to be implemented directly in FileCheck.

The first I looked up llvm/test/Transforms/LoopUnroll/runtime-loop5.ll:26 is UNROLL-16-LABEL-NOT:, which may not be allowed.

There are already a number of bad NOT combination checks. I'm not sure why it doesn't include LABEL.

I was trying to avoid doing a combinatoric check of all suffixes, but maybe I was being too optimistic. People are far more creative about their mistakes than I had imagined. CHECK-SAME-DAG? Whoa.

This patch attempts to catch 1 & 2, but it unfortunately still has a lot of false positives, as it triggers on comments in testcases that refer to the CHECKs themselves.

While FileCheck allows a CHECK directive anywhere, in practice people always put them at the start of a line with possibly comment delimiters of some form in front (which are always punctuation marks of some kind, in the languages that FileCheck supports). Maybe FileCheck should say, that's where directives *have* to be. Are there real tests where that would be a problem?

Thanks for working on this.

CHECK           gotcha 1

This case seems like it would trigger too many false positives. Do you have numbers on how many true positives that this case alone catches and that are not caught by the other cases?

I don't yet, but I'd be happy to take some measurements/estimates.

Subjectively, it was pretty bad. There are a lot of CHECK's without colons in comments (rightfully so).

One solution to that might be to add a FileCheck-specific comment sequence, though I don't imagine that being very ergonomic to use, and wouldn't help with all the existing tests that have these false positive comments.

CHECK :         gotcha 2

As long as only whitespace separates CHECK and :, this one seems ok.

Ok in what sense? If there is any whitespace between the check directive and the colon, FileCheck does not trigger on it. That, to me, makes it look like a valid test that doesn't actually doing anything. Whether this is intentional or not isn't particularly clear to me.

CHECK-SAME-DAG: gotcha 3
CHECKNEXT:      gotcha 4
CHECKDAG:       gotcha 5

These seem much less likely to trigger false positives.

Agreed.

@probinson and @SjoerdMeijer were also involved in previous discussions in this area and might want to take a look.

Maybe FileCheck should say, that's where directives *have* to be. Are there real tests where that would be a problem?

Very good question, though I'm not sure what that buys us.

Big plus is that we would be able to skip the massive patch that quotes all the --check-prefix= arguments on the RUN lines.

If we require that CHECKs happen within the first N columns (for small N), there'd still be a decision to make as to what to do about ones that are detected beyond that limit. If those are forbidden, then we end up diagnosing use in comments again. If they are silently ignored, I'd argue we've made things worse.

Thanks for working on this.

CHECK           gotcha 1

This case seems like it would trigger too many false positives. Do you have numbers on how many true positives that this case alone catches and that are not caught by the other cases?

I don't yet, but I'd be happy to take some measurements/estimates.

Subjectively, it was pretty bad. There are a lot of CHECK's without colons in comments (rightfully so).

I was trying to gauge whether this case is worth the effort. @jdoerfert called out several true positives. If those are representative, then maybe it is worthwhile... at least in some form.

One solution to that might be to add a FileCheck-specific comment sequence,

I think that idea came up once before. It might also be useful when you want to temporarily comment out a directive. It would be better than the current practice of mangling the directive.

though I don't imagine that being very ergonomic to use, and wouldn't help with all the existing tests that have these false positive comments.

Agreed.

CHECK :         gotcha 2

As long as only whitespace separates CHECK and :, this one seems ok.

Ok in what sense? If there is any whitespace between the check directive and the colon, FileCheck does not trigger on it. That, to me, makes it look like a valid test that doesn't actually doing anything. Whether this is intentional or not isn't particularly clear to me.

I'm assuming that the whitespace is likely to be unintentional and that the line is meant to be a directive. We really need numbers for existing test suites to determine the likelihood it's a false positive instead. If so, then it needs to be opt-in (and perhaps in a separate script as I mentioned earlier).

While FileCheck allows a CHECK directive anywhere, in practice people always put them at the start of a line with possibly comment delimiters of some form in front (which are always punctuation marks of some kind, in the languages that FileCheck supports). Maybe FileCheck should say, that's where directives *have* to be. Are there real tests where that would be a problem?

I just tried this in clang/test:

$ git grep '[a-zA-Z0-9].*// CHECK'
1543

Eyeballing it, they're mostly legitimate FileCheck directives.

While FileCheck allows a CHECK directive anywhere, in practice people always put them at the start of a line with possibly comment delimiters of some form in front (which are always punctuation marks of some kind, in the languages that FileCheck supports). Maybe FileCheck should say, that's where directives *have* to be. Are there real tests where that would be a problem?

I just tried this in clang/test:

$ git grep '[a-zA-Z0-9].*// CHECK'

I accidentally dropped the | wc -l there.

1543

Eyeballing it, they're mostly legitimate FileCheck directives.

Okay, 1500+ tests to fix seems excessive. Requiring CHECK directives to be on lines by themselves (with optional comment) is more restrictive than the LLVM test suite can tolerate.

I tried this:
git grep -I "CHECK\b" llvm\test | grep -v "CHECK.*:" | grep -v "RUN:" | wc -l
which found 387 lines with the word CHECK not followed by a colon, and not on a RUN line. That's just for the word CHECK, rather than actual check prefixes. (The -I is, skip binary files.) I captured the results and hand-edited out the commentary lines and other cases that were obviously not mistakes.
Final tally, 178

That's a really high false-positive rate, although I freely admit it's simplistic--many of the captured lines were using CHECK in a variable name and had some other prefix as the actual directive.

On the other hand, I just found 178 CHECK lines across 72 test files that are *probably* mistakes, and at least deserve a closer look. That's a valuable test-validity tool, and it's hard to argue that we *shouldn't* have something like this.

From the original description:

The vast majority of false positives come from this diagnostic triggering on RUN lines,

If I leave out the filtering on "RUN:" then I get 10,660 hits, instead of 387. So, if we want a diagnostic aid like this in FileCheck itself, it really needs to skip RUN: lines.

jdenny added a comment.Apr 1 2020, 2:27 PM

Okay, 1500+ tests to fix seems excessive. Requiring CHECK directives to be on lines by themselves (with optional comment) is more restrictive than the LLVM test suite can tolerate.

Maybe the right rule should be that it *either* appears on a line by itself *or* at the beginning of some single-line comment style (which might not be at the beginning of a line).

If I leave out the filtering on "RUN:" then I get 10,660 hits, instead of 387. So, if we want a diagnostic aid like this in FileCheck itself, it really needs to skip RUN: lines.

The above rule would rule out FileCheck prefixes within RUN lines.

Okay, 1500+ tests to fix seems excessive. Requiring CHECK directives to be on lines by themselves (with optional comment) is more restrictive than the LLVM test suite can tolerate.

Maybe the right rule should be that it *either* appears on a line by itself *or* at the beginning of some single-line comment style (which might not be at the beginning of a line).

So only diagnose if it meets those conditions (with some inferred comment delimiter based on file extension/flag?), but is also missing the colon?

Okay, 1500+ tests to fix seems excessive. Requiring CHECK directives to be on lines by themselves (with optional comment) is more restrictive than the LLVM test suite can tolerate.

Maybe the right rule should be that it *either* appears on a line by itself *or* at the beginning of some single-line comment style (which might not be at the beginning of a line).

Worth a shot.

If I leave out the filtering on "RUN:" then I get 10,660 hits, instead of 387. So, if we want a diagnostic aid like this in FileCheck itself, it really needs to skip RUN: lines.

The above rule would rule out FileCheck prefixes within RUN lines.

There aren't going to be FileCheck directives on a RUN line, so excluding RUN lines from the "does this line have a potentially incorrect directive on it?" check, won't filter out any true positives. It will filter out about 98% of the false positives.

Or am I not understanding your comment?

jdenny added a comment.Apr 1 2020, 3:16 PM

Okay, 1500+ tests to fix seems excessive. Requiring CHECK directives to be on lines by themselves (with optional comment) is more restrictive than the LLVM test suite can tolerate.

Maybe the right rule should be that it *either* appears on a line by itself *or* at the beginning of some single-line comment style (which might not be at the beginning of a line).

So only diagnose if it meets those conditions (with some inferred comment delimiter based on file extension/flag?),

I think it would be easier just to look for any common single-line comment style regardless of file type.

but is also missing the colon?

That still bugs me because prefixes mentioned in English sentences in comments can easily wrap to the start of a comment line.

I still say complex diagnostics like that one should be opt-in, perhaps via a separate script. Besides, it's probably easier to build arbitrary, user-configurable heuristics in python than in C++ within FileCheck.

jdenny added a comment.Apr 1 2020, 3:19 PM

Okay, 1500+ tests to fix seems excessive. Requiring CHECK directives to be on lines by themselves (with optional comment) is more restrictive than the LLVM test suite can tolerate.

Maybe the right rule should be that it *either* appears on a line by itself *or* at the beginning of some single-line comment style (which might not be at the beginning of a line).

Worth a shot.

If I leave out the filtering on "RUN:" then I get 10,660 hits, instead of 387. So, if we want a diagnostic aid like this in FileCheck itself, it really needs to skip RUN: lines.

The above rule would rule out FileCheck prefixes within RUN lines.

There aren't going to be FileCheck directives on a RUN line, so excluding RUN lines from the "does this line have a potentially incorrect directive on it?" check, won't filter out any true positives. It will filter out about 98% of the false positives.

Or am I not understanding your comment?

I agree we don't want these diagnostics to examine FileCheck prefixes within RUN lines. I was just observing that the rule as I formulated it above would be sufficient for that purpose, so we wouldn't need a separate constraint for RUN lines.

Okay, 1500+ tests to fix seems excessive. Requiring CHECK directives to be on lines by themselves (with optional comment) is more restrictive than the LLVM test suite can tolerate.

Maybe the right rule should be that it *either* appears on a line by itself *or* at the beginning of some single-line comment style (which might not be at the beginning of a line).

So only diagnose if it meets those conditions (with some inferred comment delimiter based on file extension/flag?), but is also missing the colon?

Right, once you find a check-prefix, you look at the text before it on the line, which must either be all whitespace, or arbitrary text terminated by a comment delimiter. I wouldn't get fancy about aligning delimiters with file types; the set is pretty small, basically # // ; should cover essentially everything.
The above rule would exclude nearly everything that my "grep" sequence found, that wasn't really a mistake.

Now, there are tests in the list that my "grep" sequence found, that have comment blocks along the lines of

; CHECK1 - look for one thing
; CHECK2 - look for another thing
; CHECK3 - look for something else

which of course look exactly like the mistakes we're trying to find. We could "fix" those tests (there weren't many, IIRC) by inserting some non-comment text before the check-prefix name, and maybe we can live with that restriction.

but is also missing the colon?

That still bugs me because prefixes mentioned in English sentences in comments can easily wrap to the start of a comment line.

If forced to pick between rewriting such comments in those one-off [1] cases vs catching this class of bugs, I would probably lean toward the latter.

1: that those cases are infrequent compared to the general case of this bug is an assumption I need to validate.

I still say complex diagnostics like that one should be opt-in, perhaps via a separate script. Besides, it's probably easier to build arbitrary, user-configurable heuristics in python than in C++ within FileCheck.

I'd like to steer away from making these things super configurable, since that will lead to different test suites having different assumptions, and likely steepen the learning curve for working on said tests. If there's some rule we could arrive at that catches the vast majority of problematic cases, without disturbing too many false positive comments, I think it should be applied as universally as possible across the whole codebase.

That said, I think we should decide what rules to validate before deciding what tool should validate them.

I agree we don't want these diagnostics to examine FileCheck prefixes within RUN lines. I was just observing that the rule as I formulated it above would be sufficient for that purpose, so we wouldn't need a separate constraint for RUN lines.

Ah, so I misunderstood what you meant by "the above rule" sorry about that! If we require a comment marker to precede the directive, then we're good (until some clever language decides to use '=' as their comment delimiter).

I think the time has come to implement the new rule, and see what breaks. It's past the end of my working day or I'd try to come up with another grep sequence.

I am only faintly worried that there might actually be a capricious test out there that says
// this is too clever: CHECK: this happens
but again I'll leave finding that for one of you or for another day.

thopre added a comment.Apr 1 2020, 4:07 PM

I agree we don't want these diagnostics to examine FileCheck prefixes within RUN lines. I was just observing that the rule as I formulated it above would be sufficient for that purpose, so we wouldn't need a separate constraint for RUN lines.

Ah, so I misunderstood what you meant by "the above rule" sorry about that! If we require a comment marker to precede the directive, then we're good (until some clever language decides to use '=' as their comment delimiter).

I would also like to keep FileCheck usable outside of LLVM since I think it's a valuable tool. Therefore it's nice if we can indeed not special case RUN lines which it seems we can.

jdenny added a comment.Apr 1 2020, 4:28 PM

Agreed. At this point, let's try and see what happens.

After sleeping on it....

I wouldn't get fancy about aligning delimiters with file types; the set is pretty small, basically # // ; should cover essentially everything.

It would simplify the check if we looked for single-character delimiters, so / instead of // for C++ (which incidentally also covers Rust and C++-style comments in C). I'm fairly sure Objective-C[++] doesn't add new comment characters.

There are Clang .c tests that use /* CHECK so we should add * to the list. Julia uses # which is already on the list. The flang project is coming so we should be ready for Fortran; historically that used C in column 1, but the language has evolved and ! is now common. I think it's reasonable to require Fortran tests to use the ! comment delimiter, although it would be courteous to get the flang people to say okay. Are there other cases we should consider?

The text-preceding-the-directive would then have to be either whitespace or arbitrary text ending with [#/;*!] which is easy to express with a regex.

I believe where we're headed is that the (whitespace or ending with [#/;*!]) rule is how we'll identify directives that we expect the user *intended* to be directives for FileCheck to act on. If we then find syntactic errors (multiple suffixes, missing colon) we'll report those as errors.

Would it be useful to have one of the verbose modes report directive matches that *don't* satisfy the (whitespace or [#/;*!]) rule? It would report all the false positives, but could be really helpful in diagnosing a malfunctioning test. And it would be easy to add this, as we've already analyzed the line to see if it's one we should act on.

I believe where we're headed is that the (whitespace or ending with [#/;*!]) rule is how we'll identify directives that we expect the user *intended* to be directives for FileCheck to act on. If we then find syntactic errors (multiple suffixes, missing colon) we'll report those as errors.

I'm not sure if I'm reading this correctly. Which of the following do you intend?

A. This new rule constrains where FileCheck recognizes malformed directives for diagnostics.

B. This new rule constrains where FileCheck recognizes both (1) malformed directives for diagnostics and (2) well formed directives for execution.

I thought we were planning A.

I think the only change from A to B is to (at least theoretically) create a new class of malformed directives that won't be executed, and they happen to also be ones we cannot diagnose under this new rule. We would need the mode you describe below to diagnose these, but those diagnostics would apparently be plagued by false positives.

If that new class of malformed directives doesn't exist in practice, then A = B in practice, so we might as well stick with A to avoid the difference in the future.

B seems to have no advantage.

Would it be useful to have one of the verbose modes report directive matches that *don't* satisfy the (whitespace or [#/;*!]) rule? It would report all the false positives, but could be really helpful in diagnosing a malfunctioning test. And it would be easy to add this, as we've already analyzed the line to see if it's one we should act on.

I think such a mode would be helpful. This is the type of configurability I was thinking of.

jroelofs updated this revision to Diff 254562.Apr 2 2020, 11:25 AM

Draft of paulr/jdenny's fuzzy-comment-prefix filter idea.

After sleeping on it....

snip

I believe where we're headed is that the (whitespace or ending with [#/;*!]) rule is how we'll identify directives that we expect the user *intended* to be directives for FileCheck to act on. If we then find syntactic errors (multiple suffixes, missing colon) we'll report those as errors.

Fantastic idea. This rule seems to have a really good signal/noise ratio on this class of bug, at least for llvm/tests.

The diagnostic in this patch triggers 186 times in the suite of 36125 tests in llvm/test. Most of them are true positives, though there are ~45 false-positives that seem easy to work around by rewording the comments:

llvm/test/CodeGen/X86/fp-intrinsics.ll:922
llvm/test/CodeGen/X86/avx-cast.ll:27
llvm/test/CodeGen/Mips/mips64muldiv.ll

Would it be useful to have one of the verbose modes report directive matches that *don't* satisfy the (whitespace or [#/;*!]) rule? It would report all the false positives, but could be really helpful in diagnosing a malfunctioning test. And it would be easy to add this, as we've already analyzed the line to see if it's one we should act on.

Another approach would be to always emit them as warnings, and conditionally elevate that to a hard failure with a -Werror flag.

The diagnostic in this patch triggers 186 times in the suite of 36125 tests in llvm/test. Most of them are true positives, though there are ~45 false-positives that seem easy to work around by rewording the comments:

So 1/4 of positives are false positives? Am I reading those numbers correctly?

The diagnostic in this patch triggers 186 times in the suite of 36125 tests in llvm/test. Most of them are true positives, though there are ~45 false-positives that seem easy to work around by rewording the comments:

So 1/4 of positives are false positives? Am I reading those numbers correctly?

Yes, that's correct.

Here's an example that justifies all of this for me:

llvm/test/Analysis/MemorySSA/invariant-groups.ll:345

If you add the colon there, the test starts failing.

jdenny added a comment.Apr 2 2020, 2:32 PM

The results so far are promising, but it's not clear to me yet that this should be on by default.

I think the next step is to try more test suites. It has to be done eventually. Clang would be a good one.

llvm/lib/Support/FileCheck.cpp
1131–1145

I believe you need to change the order here. For example, CHECK-NEXT-NOT will match CHECK-NEXT and then be ignored. Likewise for CHECK-NOT-NEXT and CHECK-NOT.

If FileCheck's test suite doesn't already cover cases like that, now would be a good time to extend it. It might not be a bad idea to move all that into a parent patch as the refactoring and tests seems worthwhile independently of the new diagnostic.

jdenny added inline comments.Apr 2 2020, 2:35 PM
llvm/lib/Support/FileCheck.cpp
1131–1145

Oh, and please add a comment explaining why order here matters.

I believe where we're headed is that the (whitespace or ending with [#/;*!]) rule is how we'll identify directives that we expect the user *intended* to be directives for FileCheck to act on. If we then find syntactic errors (multiple suffixes, missing colon) we'll report those as errors.

I'm not sure if I'm reading this correctly. Which of the following do you intend?

A. This new rule constrains where FileCheck recognizes malformed directives for diagnostics.

B. This new rule constrains where FileCheck recognizes both (1) malformed directives for diagnostics and (2) well formed directives for execution.

I thought we were planning A.

I think the only change from A to B is to (at least theoretically) create a new class of malformed directives that won't be executed, and they happen to also be ones we cannot diagnose under this new rule. We would need the mode you describe below to diagnose these, but those diagnostics would apparently be plagued by false positives.

Ah. My idea was to constrain executable directives, on the theory that the existing population of actual directives that DON'T follow the (whitespace or comment character) rule is minuscule, and that it would be user-hostile to execute a directive in a context where we can't diagnose the same typo errors that we DO diagnose elsewhere.

Regarding the population of directives that don't follow the rule, I did some grepping:
find llvm/test -type f -exec grep '\bCHECK:' \{} \; | grep -v '[#/;*!_@-]\s*CHECK:' | grep -v '^\s*CHECK:'
(I added _@- because FOO-CHECK: and FOO_CHECK: show up a fair amount; and it seems @ is sometimes an assembler comment character.)

This produced 37 lines of output. Two are from binary files and one from a Python file. 27 are from FileCheck's own tests, and it looks like only one of those tests (line-count.txt) would have to be modified, because it uses lines such as
17 CHECK: [[@LINE-3]] {{a}}aa

That leaves 7 more lines,
llvm/test/tools/llvm-objdump/COFF/large-bss.test has a leading colon.
llvm/test/Transforms/LoopSimplifyCFG/constant-fold-branch.ll has 4 lines that use CHECK as the name of a FileCheck variable; while this blows my mind, it wouldn't actually be treated as a real directive or caught by the diagnostic.
llvm/test/Transforms/[New]GVN/pr25440.ll each have one line with ;%CHECK:.

Repeating the exercise in clang/test, there are 10 matches; 6 are intended to "comment out" a CHECK: directive, two are another use of CHECK as a FileCheck variable, and the last two are of the form // line #4: CHECK: yadayada.

So, the population of genuine directives that don't follow the (whitespace or comment character) rule is extremely small, but not zero.

Do we try to find and fix them all, so that we can use the more restrictive rule?
Or do we fail to diagnose problems with directives that don't follow the rule, because that would introduce too many false positives?

Wondering what y'all think about that.

The diagnostic in this patch triggers 186 times in the suite of 36125 tests in llvm/test. Most of them are true positives, though there are ~45 false-positives that seem easy to work around by rewording the comments:

So 1/4 of positives are false positives? Am I reading those numbers correctly?

Yes, that's correct.

Probably a little higher, actually. I've been working through each positive individually, and have found a few things that should not have been broken. I suspect I've introduced a bug in FindCheckType.

jdenny added a comment.Apr 2 2020, 3:02 PM

So, the population of genuine directives that don't follow the (whitespace or comment character) rule is extremely small, but not zero.

Thanks for the analysis.

Do we try to find and fix them all, so that we can use the more restrictive rule?
Or do we fail to diagnose problems with directives that don't follow the rule, because that would introduce too many false positives?

Instead of ignoring well formed directives that don't follow the rule, maybe FileCheck should report them as errors.

FileCheck still couldn't diagnose *malformed* directives that don't follow the rule, but at least FileCheck would start discouraging people from thinking it's ok not to follow the rule.

jdenny added a comment.Apr 2 2020, 3:22 PM

Instead of ignoring well formed directives that don't follow the rule, maybe FileCheck should report them as errors.

I fear that was not clear. Consider this example that has a well formed directive that doesn't follow the rule:

// FIXME(201806L) CHECK: assert: 0

Approach A (from a previous comment): FileCheck executes the directive. If the user later accidentally removes the :, FileCheck won't execute the directive and won't diagnose the error unless the user is wiling to endure false positives by opting into the more verbose mode Paul suggested.

Approach B (from that some comment): FileCheck ignores the directive. That just makes things worse because the above otherwise well formed directive is then an undiagnosed malformed directive (unless the user opts into a more verbose mode).

Approach C (new proposal): FileCheck reports the directive as an error (in any mode). The more verbose mode is still needed to catch the case that the : is missing here, but at least users are guaranteed to get a slap when they write them with :.

jhenderson added a subscriber: jhenderson.

Chiming in late, as I only just saw this discussion. I'll be honest, I don't fully follow the explanations. Would it be possible to write up the policy in FileCheck's documentation somewhere?

Ok in what sense? If there is any whitespace between the check directive and the colon, FileCheck does not trigger on it. That, to me, makes it look like a valid test that doesn't actually doing anything. Whether this is intentional or not isn't particularly clear to me.

This one seems like something that should be fixed in FileCheck to me. It seems reasonable to add whitespace before or after the "CHECK" part to make things line up in certain cases, particularly when --match-full-lines + --strict-whitespace are enabled.

I've written and/or reviewed several tests which successfully use something like the following pattern for readability:

# RUN: ... | FileCheck %s --check-prefixes=ONE,ANOTHER --strict-whitespace --match-full-lines

#          ONE:some output
#     ONE-NEXT:more output
# ANOTHER-NEXT:yet more output

Beauty is in the eye of the beholder, but some people might prefer:

# RUN: ... | FileCheck %s --check-prefixes=ONE,ANOTHER --strict-whitespace --match-full-lines

# ONE         :some output
# ONE-NEXT    :more output
# ANOTHER-NEXT:yet more output

so it seems like it should work.

One solution to that might be to add a FileCheck-specific comment sequence,

I think that idea came up once before. It might also be useful when you want to temporarily comment out a directive. It would be better than the current practice of mangling the directive.

FWIW, I'd be supportive of something like this too (I'd go with '##' personally, because that's how I've started marking my comments in tests, to distinguish from lit/FileCheck lines). Mangling the CHECK directive is just asking for forgetting to unmangle it later, and I've had to do the former on more than one occasion. FWIW, I think any comment syntax we adopted should be followed by lit too. I like the symmetry in lit directives and FileCheck prefixes as things stand.

Chiming in late, as I only just saw this discussion. I'll be honest, I don't fully follow the explanations. Would it be possible to write up the policy in FileCheck's documentation somewhere?

After we figure out what it should be. :-) But there's a deeper point here, the debate should take place on llvm-dev and not in a review.

Ok in what sense? If there is any whitespace between the check directive and the colon, FileCheck does not trigger on it. That, to me, makes it look like a valid test that doesn't actually doing anything. Whether this is intentional or not isn't particularly clear to me.

This one seems like something that should be fixed in FileCheck to me. It seems reasonable to add whitespace before or after the "CHECK" part to make things line up in certain cases, particularly when --match-full-lines + --strict-whitespace are enabled.

I've written and/or reviewed several tests which successfully use something like the following pattern for readability:

# RUN: ... | FileCheck %s --check-prefixes=ONE,ANOTHER --strict-whitespace --match-full-lines

#          ONE:some output
#     ONE-NEXT:more output
# ANOTHER-NEXT:yet more output

Beauty is in the eye of the beholder, but some people might prefer:

# RUN: ... | FileCheck %s --check-prefixes=ONE,ANOTHER --strict-whitespace --match-full-lines

# ONE         :some output
# ONE-NEXT    :more output
# ANOTHER-NEXT:yet more output

so it seems like it should work.

I'd be okay with that.

One solution to that might be to add a FileCheck-specific comment sequence,

I think that idea came up once before. It might also be useful when you want to temporarily comment out a directive. It would be better than the current practice of mangling the directive.

FWIW, I'd be supportive of something like this too (I'd go with '##' personally, because that's how I've started marking my comments in tests, to distinguish from lit/FileCheck lines). Mangling the CHECK directive is just asking for forgetting to unmangle it later, and I've had to do the former on more than one occasion. FWIW, I think any comment syntax we adopted should be followed by lit too. I like the symmetry in lit directives and FileCheck prefixes as things stand.

I wouldn't use ## as I believe there are some assembler variants that use ## (and not just #) as the comment delimiter. I have a vague memory of writing (possibly Darwin?) assembler where single # was insufficient.

We could use = which would take care of many of the false-positives from RUN: lines. Within llvm\test,
--check-prefix FOO occurs 519 times while --check-prefix=FOO occurs 13313 times.
--check-prefixes FOO occurs 85 times while --check-prefixes=FOO occurs 5312 times
so it would take a chunk of churn before we could require = on the option; that part could be automated, if we went that route.

jdenny added a comment.Apr 3 2020, 7:38 AM

Ok in what sense? If there is any whitespace between the check directive and the colon, FileCheck does not trigger on it. That, to me, makes it look like a valid test that doesn't actually doing anything. Whether this is intentional or not isn't particularly clear to me.

This one seems like something that should be fixed in FileCheck to me. It seems reasonable to add whitespace before or after the "CHECK" part to make things line up in certain cases, particularly when --match-full-lines + --strict-whitespace are enabled.

I've written and/or reviewed several tests which successfully use something like the following pattern for readability:

# RUN: ... | FileCheck %s --check-prefixes=ONE,ANOTHER --strict-whitespace --match-full-lines

#          ONE:some output
#     ONE-NEXT:more output
# ANOTHER-NEXT:yet more output

Beauty is in the eye of the beholder, but some people might prefer:

# RUN: ... | FileCheck %s --check-prefixes=ONE,ANOTHER --strict-whitespace --match-full-lines

# ONE         :some output
# ONE-NEXT    :more output
# ANOTHER-NEXT:yet more output

so it seems like it should work.

I'd be okay with that.

Agreed. In a parent revision.

One solution to that might be to add a FileCheck-specific comment sequence,

I think that idea came up once before. It might also be useful when you want to temporarily comment out a directive. It would be better than the current practice of mangling the directive.

FWIW, I'd be supportive of something like this too (I'd go with '##' personally, because that's how I've started marking my comments in tests, to distinguish from lit/FileCheck lines). Mangling the CHECK directive is just asking for forgetting to unmangle it later, and I've had to do the former on more than one occasion. FWIW, I think any comment syntax we adopted should be followed by lit too. I like the symmetry in lit directives and FileCheck prefixes as things stand.

I wouldn't use ## as I believe there are some assembler variants that use ## (and not just #) as the comment delimiter. I have a vague memory of writing (possibly Darwin?) assembler where single # was insufficient.

We could use = which would take care of many of the false-positives from RUN: lines. Within llvm\test,
--check-prefix FOO occurs 519 times while --check-prefix=FOO occurs 13313 times.
--check-prefixes FOO occurs 85 times while --check-prefixes=FOO occurs 5312 times
so it would take a chunk of churn before we could require = on the option; that part could be automated, if we went that route.

I think we're converging on another solution for RUN lines, so hopefully this isn't needed.

I feel like using any kind of punctuation to comment out directives is going to lead to the same problems we're trying to diagnose now: people will write directives that are ignored but won't realize they're ignored.

I would prefer something very explicit, like a CHECK-COM directive that accepts and discards the rest of the line. It'll be hard to misunderstand that. An issue is that you'd need to use the same prefix as any directive it contains as other prefixes might not be enabled.

Chiming in late, as I only just saw this discussion. I'll be honest, I don't fully follow the explanations. Would it be possible to write up the policy in FileCheck's documentation somewhere?

After we figure out what it should be. :-) But there's a deeper point here, the debate should take place on llvm-dev and not in a review.

Good point. I'll try to summarize our observations / hypotheses so far & start a thread there.

RKSimon added a subscriber: RKSimon.Apr 3 2020, 8:22 AM

Grepping for \bCHECK[A-Z0-9-]*\s+: finds nothing in clang/test or llvm/test. Is "gotcha 2" a real thing we should worry about? If there are any cases like that, we'd catch them under the checks for "gotcha 1".
I'm inclined to change my mind and say let's NOT allow spaces before the colon. There are actual CHECKs in the wild that want to check strings that start with a colon, and the rules about CHECK directives are already a bit much. People are used to indenting directives to make things line up, I don't see spaces-before-colon as really being a huge win for readability.

Grepping for \bCHECK(NEXT|DAG|SAME|NOT): in llvm/test and clang/test (that is, a directive missing the hyphen separator) turns up nothing. Are gotchas 4 and 5 really something we should worry about?

jdenny added a comment.Apr 3 2020, 9:22 AM

Grepping for \bCHECK[A-Z0-9-]*\s+: finds nothing in clang/test or llvm/test. Is "gotcha 2" a real thing we should worry about? If there are any cases like that, we'd catch them under the checks for "gotcha 1".

You need a backslash before that +. It does match.

jdenny added a comment.Apr 3 2020, 9:35 AM

I'm inclined to change my mind and say let's NOT allow spaces before the colon. There are actual CHECKs in the wild that want to check strings that start with a colon, and the rules about CHECK directives are already a bit much. People are used to indenting directives to make things line up, I don't see spaces-before-colon as really being a huge win for readability.

I think I agree. Permitting those spaces would mean we couldn't diagnose this:

// CHECK: :15:1: error: expected identifier or '('
// CHECK  :13:14: note: expanded from macro 'FOOL'
// CHECK: :12:15: note: expanded from macro 'DROOL'

But there's a deeper point here, the debate should take place on llvm-dev and not in a review.

Good point. I'll try to summarize our observations / hypotheses so far & start a thread there.

http://lists.llvm.org/pipermail/llvm-dev/2020-April/140610.html

Grepping for \bCHECK[A-Z0-9-]*\s+: finds nothing in clang/test or llvm/test. Is "gotcha 2" a real thing we should worry about? If there are any cases like that, we'd catch them under the checks for "gotcha 1".

You need a backslash before that +. It does match.

Well I'll be... Matches 46 lines in 10 files under llvm/test, none in clang/test. Worth looking for, then.

Grepping for \bCHECK(NEXT|DAG|SAME|NOT): in llvm/test and clang/test (that is, a directive missing the hyphen separator) turns up nothing. Are gotchas 4 and 5 really something we should worry about?

After Joel pointed out my grep-failure in a previous case, I revisited this with proper backslash quoting.

CHECK_[suffix] (underscore not hyphen) occurs 25 times in llvm/test and 15 times in clang/test.
CHECK[suffix] (missing hyphen) occurs 1 time in llvm/test and 0 times in clang/test.