Page MenuHomePhabricator

Fix PR46880: Fail CHECK-NOT with undefined variable
Needs ReviewPublic

Authored by thopre on Aug 19 2020, 8:48 AM.

Details

Summary

Currently a CHECK-NOT directive succeeds whenever the corresponding
match fails. However match can fail due to an error rather than a lack
of match, for instance if a variable is undefined. This commit makes match
error a failure for CHECK-NOT.

Diff Detail

Event Timeline

thopre created this revision.Aug 19 2020, 8:48 AM
thopre requested review of this revision.Aug 19 2020, 8:48 AM
thopre planned changes to this revision.Aug 19 2020, 3:46 PM

FYI: I've only looked at a couple of the testsuite failures and they are all genuine use of undefined variable. There's 59 of them in total.

I am not well familar with the FileCheck logic probably to verify the code,
but following tests are failing for me with this diff applied:
(e.g.: Transforms/LoopVectorize/pr34681.ll fails with "error: uses undefined variable(s): "TEST"")

LLVM :: CodeGen/AArch64/aarch64-tbz.ll
LLVM :: CodeGen/AArch64/machine-outliner-retaddr-sign-thunk.ll
LLVM :: CodeGen/AArch64/speculation-hardening.ll
LLVM :: CodeGen/AMDGPU/amdgpu.private-memory.ll
LLVM :: CodeGen/AMDGPU/private-memory-r600.ll
LLVM :: CodeGen/Hexagon/hwloop3.ll
LLVM :: CodeGen/Mips/sr1.ll
LLVM :: CodeGen/PowerPC/ctrloops-softfloat.ll
LLVM :: CodeGen/PowerPC/ppc-disable-non-volatile-cr.ll
LLVM :: CodeGen/PowerPC/ppc-shrink-wrapping.ll
LLVM :: CodeGen/PowerPC/ppc64-i128-abi.ll
LLVM :: Transforms/CodeGenPrepare/ARM/sink-add-mul-shufflevector.ll
LLVM :: Transforms/Coroutines/coro-split-sink-lifetime-02.ll
LLVM :: Transforms/GVN/PRE/load-pre-nonlocal.ll
LLVM :: Transforms/GVN/big-endian.ll
LLVM :: Transforms/HardwareLoops/ARM/structure.ll
LLVM :: Transforms/InferFunctionAttrs/annotate.ll
LLVM :: Transforms/LoopVectorize/X86/x86-pr39099.ll
LLVM :: Transforms/LoopVectorize/pr34681.ll
LLVM :: Transforms/NewGVN/big-endian.ll

looks they all have undefined variables. So, seems this patch works well!
I've leaved 2 nits inline too.

llvm/lib/Support/FileCheck.cpp
2023

Perhaps a bit cleaner would be to return false, since it it known that UndefVar is always false here.

llvm/test/FileCheck/var-scope.txt
10

So, there are no ERRUNDEF check line anymore and seems you can remove it from here (and below) and switch to --check-prefix (just here)?

grimar added a comment.EditedAug 20 2020, 2:16 AM

FYI: I've only looked at a couple of the testsuite failures and they are all genuine use of undefined variable. There's 59 of them in total.

Oh, OK, I did not see this comment when wrote mine one.

thopre updated this revision to Diff 286750.Aug 20 2020, 2:22 AM
thopre marked 2 inline comments as done.
  • Address review comments
  • rename PrintNoMatch to printNoMatch to satisfy clang-tidy

There seems to be a number of failures like due to pattern like (taken from clang/test/CodeGen/debug-info-extern-call.c):

DECLS-FOR-EXTERN-NOT: !DICompileUnit({{.*}}retainedTypes: ![[RETTYPES:[0-9]+]]
DECLS-FOR-EXTERN-NOT: ![[RETTYPES]] = !{

I think FileCheck should forbid variable definition in a CHECK-NOT pattern. Granted with this patch we'll catch all the uses of variables defined in CHECK-NOT but we won't catch if they are not used.

There seems to be a number of failures like due to pattern like (taken from clang/test/CodeGen/debug-info-extern-call.c):

DECLS-FOR-EXTERN-NOT: !DICompileUnit({{.*}}retainedTypes: ![[RETTYPES:[0-9]+]]
DECLS-FOR-EXTERN-NOT: ![[RETTYPES]] = !{

I think FileCheck should forbid variable definition in a CHECK-NOT pattern. Granted with this patch we'll catch all the uses of variables defined in CHECK-NOT but we won't catch if they are not used.

I have no idea what that is even trying to do. A definition within a CHECK-NOT doesn't make sense - either the pattern will match, enabling the capture of the variable, and FileCheck to fail (because of a positive CHECK-NOT match), or it will not match, meaning the variable can't be used. In other words, I think forbidding variable definitions in CHECK-NOT makes sense. It's probably worth bringing this up on llvm-dev and cfe-dev though, to give people a head's up/ask for opinions. There may be a way this actually is useful, but if so, I don't know it.

This commit makes match error a failure for CHECK-NOT and changes PrintNoMatch to print all CHECK-NOT failure in a CHECK-NOT block rather than just the first one.

Are these two inherently linked, or could you split them into two separate changes?

llvm/lib/Support/FileCheck.cpp
1997

I'm not a fan on using booleans to indicate success or failure of a function, for two reasons:

  1. It's easy to forget to check the result, meaning errors will be missed.
  2. It's not obvious without reading whether a true indicates success or failure.

Can we use an Error instead?

2002

I'm not a fan of how this variable is being used throughout this function - it feels like it would be easy to forget to check it when doing something or check it incorrectly, making this function hard to maintain.

Could we bail out early if an undefined variable is detected?

llvm/test/FileCheck/numeric-expression.txt
249

Is this change related?

There seems to be a number of failures like due to pattern like (taken from clang/test/CodeGen/debug-info-extern-call.c):

DECLS-FOR-EXTERN-NOT: !DICompileUnit({{.*}}retainedTypes: ![[RETTYPES:[0-9]+]]
DECLS-FOR-EXTERN-NOT: ![[RETTYPES]] = !{

I think FileCheck should forbid variable definition in a CHECK-NOT pattern. Granted with this patch we'll catch all the uses of variables defined in CHECK-NOT but we won't catch if they are not used.

I have no idea what that is even trying to do. A definition within a CHECK-NOT doesn't make sense - either the pattern will match, enabling the capture of the variable, and FileCheck to fail (because of a positive CHECK-NOT match), or it will not match, meaning the variable can't be used. In other words, I think forbidding variable definitions in CHECK-NOT makes sense. It's probably worth bringing this up on llvm-dev and cfe-dev though, to give people a head's up/ask for opinions. There may be a way this actually is useful, but if so, I don't know it.

I have no idea either. I've found another case of 2 different invocation of FileCheck with different prefix where one invocation sets a variable and the other use it. I understand the intent having seen the output but I have so far no idea on how to fix it.

This commit makes match error a failure for CHECK-NOT and changes PrintNoMatch to print all CHECK-NOT failure in a CHECK-NOT block rather than just the first one.

Are these two inherently linked, or could you split them into two separate changes?

I think they can be separated if that change land first.

thopre updated this revision to Diff 287246.Aug 23 2020, 2:45 AM
thopre marked 3 inline comments as done.
  • Address comments
  • Restructure
thopre edited the summary of this revision. (Show Details)Aug 23 2020, 2:45 AM
jhenderson added inline comments.Aug 24 2020, 12:46 AM
llvm/lib/Support/FileCheck.cpp
2030–2031

Why is there commented out code here?

2302–2306

Aside from the DirectiveFail = true line here, this and the other two instances of this pattern look identical to me. I guess that's partly my fault for the suggestion earlier of not returning a boolean to indicate an error. Perhaps part of the problem previously was the suggestion that the printNoMatch function even returned anything. I certainly wouldn't expect it.

How about we wrap this common code in a helper function, called handleMatchFailure or something like that, and that can return a boolean saying whether the failure was due to an error or regular failure? We could invent an enum that indicates the same thing but with better words, to avoid the boolean, but that might be going further than necessary. An alternative would be to pass in a lambda OnMatchError or something, which does nothing in the two other cases, but here sets the DirectiveFail variable.

thopre updated this revision to Diff 288570.Aug 28 2020, 3:30 AM
thopre marked 2 inline comments as done.

Address review comments

thopre added inline comments.Aug 28 2020, 1:55 PM
llvm/lib/Support/FileCheck.cpp
2030–2031

Oops.

Code mostly looks fine now, but there are still plenty of test failures being reported by the pre-merge bot. What's the situation with those currently?

llvm/lib/Support/FileCheck.cpp
2064

This might be a dumb question, but I'm not following why we're skipping '\t' and ' ' here. It certainly doesn't line up with the comment above.

2081–2096

Can't these two be folded into a single handleAllErrors call? IIRC, handleAllErrors takes any number of handler lambdas, and iterates through them until one matches, or it runs out. Thus the above code should look like the something like the suggested edit.

thopre marked an inline comment as done.Sep 3 2020, 3:42 PM

Code mostly looks fine now, but there are still plenty of test failures being reported by the pre-merge bot. What's the situation with those currently?

I've started trying to fix them but sadly most of them are non trivial which is even more difficult when one is not familiar with what the test is trying to check. It seems a fair amount are using several CHECK-NOT as a way to say "I don't want THIS followed by THAT to match the input", where THAT uses a variable defined in THIS. It basically expects a back reference. This can be fixed by merging the two CHECK-NOT when they don't use different prefix but it seems to be yet another example where a grouping directive would be useful, e.g.:

CHECK-REGION-START: REGION-A
CHECK: THIS
CHECK: THAT
CHECK-REGION-END: REGION-A
CHECK-NOT: REGION-A

This is just an example, better syntax would need to be use. The reason I suggest a region rather than a new special-purpose directive is I wanted to have something like REGION-DAG where directives in a region are CHECK: that need to appear linearly but directives in different regions can be interleaved. Anyway, my plan is to try to merge those CHECK-NOT directives and then we can discuss how to introduce new syntax to make this more readable.

llvm/lib/Support/FileCheck.cpp
2064

I'm not sure, this is copied over from PrintNoMatch. I thought it is used so that the "scanning from here" shows the first non blank line after last match but I haven't looked in details.

2081–2096

It can be merged yes. It's a relic from when I would group all undefined variables in one error message in which case the undefined variable errors needs to be processed together. I'll change that once I'm back to work on Monday.

thopre updated this revision to Diff 290106.Sep 5 2020, 12:37 PM
thopre marked an inline comment as done.

Merge handleErrors in handleAllErrors below

jhenderson added inline comments.Sep 8 2020, 1:09 AM
llvm/lib/Support/FileCheck.cpp
2064

Could you investigate a bit more and then update the comment with your findings? This code at the moment also skips some whitespace, which isn't always desirable (imagine if you had a regex match looking for some explicit whitespace before some string).

thopre added inline comments.Sep 15 2020, 7:03 AM
llvm/lib/Support/FileCheck.cpp
2064

The code was added in da108b4ed4e6e7267701e76d5fd3b87609c9ab77 when CHECK-NEXT was added and at the same time as the "scanning from here" diagnostic was added. There was no CHECK-SAME at the time and I guess the goal was to avoid confusion by showing the last matching line (even though the column info would show it's the end of the line). There was no regular expression match either, so no way to match a newline.

I think this should simply be removed, in a separate patch. What is your opinion?

jhenderson added inline comments.Sep 16 2020, 12:38 AM
llvm/lib/Support/FileCheck.cpp
2064

I'm inclined to agree with you. Perhaps worth adding a TODO or FIXME note here and at the other location, referencing each other, saying that this should probably be removed?

@probinson, (or anybody else) do you have any thoughts on this?

Thanks for working on this. I'm sorry to start reviewing so late. I haven't digested the implementation fully yet, but I have a few comments to get started.

llvm/include/llvm/Support/FileCheck.h
107

The name MatchError and its comment seem too general. It could easily describe many of the following enumerators. I suggest:

/// Indicates a match that can never succeed because the pattern is found to
/// be undefined at match time (e.g., contains an undefined variable).
MatchUndefined,

Undefined variables are probably the only such case we have right now, and maybe that will always be true, but I'm trying to ensure this can handle more cases if needed in the future.

Also, the middle paragraph in the preceding comments is now incorrect. Here's a possible revision:

/// A directive's supplied pattern is said to be either undefined, expected,
/// or excluded depending on whether the pattern is well formed at match time
/// and whether it must have or must not have a match in order for the 
/// directive to succeed.  For example, a CHECK directive's pattern is 
/// normally expected, a CHECK-NOT directive's pattern is normally excluded, 
/// but either directive's pattern is undefined if it contains an undefined
/// variable at match time.  The match result type MatchUndefined is for 
/// undefined patterns.  All match result types whose names end with 
/// "Excluded" are for excluded patterns.  All other match result types are 
/// for expected patterns.

Here, I'm assuming you consistently use the new enumerator for undefined variables, regardless of whether a directive is positive or negative. I haven't checked the implementation, but the test suite changes seems to confirm that.

llvm/test/FileCheck/dump-input-annotations.txt
626

This test used to cover printing of variable definitions when a positive directive fails. It no longer does.

A separate test could be added to cover that case. But do we need to drop that info when a variable is undefined? It could be helpful for debugging.

llvm/utils/FileCheck/FileCheck.cpp
201

If you agree to changing MatchError to MatchUndefined, I suggest changing error: match error to error: match undefined.

thopre added inline comments.Thu, Oct 15, 3:19 AM
llvm/test/FileCheck/dump-input-annotations.txt
626

We currently only print substitutions if there are no errors. I think if we want to print substitutions in case of undefined variable we should do it regardless of whether there are errors to be consistent. Obviously that means ignoring substitution errors in printSubstitutions

jdenny added inline comments.Thu, Oct 15, 12:10 PM
llvm/test/FileCheck/dump-input-annotations.txt
626

What's an example of an error besides undefined variable for which we currently don't print substitutions but you're suggesting we would?

thopre marked an inline comment as done.Fri, Oct 16, 1:30 AM
thopre added inline comments.
llvm/test/FileCheck/dump-input-annotations.txt
626

Overflow and underflow come to mind. There might be others.

% cat overflow_undef.test
BIGVAR=10000000000000000
CHECK: BIGVAR: [[#BIGVAR:0x8000000000000000+0x8000000000000000]] [[#UNDEFVAR]]
% ./bin/FileCheck --dump-input=never --input-file overflow_undef.test overflow_undef.test
overflow_undef.test:2:19: error: unable to substitute variable or numeric expression: overflow error
CHECK: BIGVAR: [[#BIGVAR:0x8000000000000000+0x8000000000000000]] [[#UNDEFVAR]]
                  ^
zsh: exit 1     ./bin/FileCheck --dump-input=never --input-file overflow_undef.test
jdenny added inline comments.Fri, Oct 16, 8:27 AM
llvm/include/llvm/Support/FileCheck.h
107

Another possibility that might be clearer is MatchPatternMalformed.

llvm/test/FileCheck/dump-input-annotations.txt
626

In the case of overflow/underflow, I'm in favor of printing used variable definitions. It would be especially helpful if it would print any variable definitions used in the expression that overflowed/underflowed, but printing other used variable definitions could be helpful as well.

Your example is a bit different than the case I was pointing out originally: in your example, there are no used variable definitions to report. There is however an undefined variable. I am also in favor of printing all errors (e.g., overflow and undefined variable) from a directive instead of just the first (overflow), where error recovery makes that feasible.

Again, if supporting all this info isn't feasible, then please at least add another test so we don't lose coverage of -dump-input printing used variable definitions when a positive directive with a well formed pattern fails to match.

(By the way, I just tried a few examples (without the current patch), and I'm noticing issues with overflow errors with respect to -dump-input. The traditional error message doesn't print undefined variables, but -dump-input does. -dump-input reports error: no match found twice and doesn't report the overflow. My master is a couple of months old already, so ignore me if all this has been fixed.)

llvm/utils/FileCheck/FileCheck.cpp
201

And MatchPatternMalformed would obviously become something like error: match pattern malformed.