Currently a CHECK-NOT directive succeeds whenever the corresponding
match fails. However match can fail due to an error rather than a lack
of match, for instance if a variable is undefined. This commit makes match
error a failure for CHECK-NOT.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
FYI: I've only looked at a couple of the testsuite failures and they are all genuine use of undefined variable. There's 59 of them in total.
I am not well familar with the FileCheck logic probably to verify the code,
but following tests are failing for me with this diff applied:
(e.g.: Transforms/LoopVectorize/pr34681.ll fails with "error: uses undefined variable(s): "TEST"")
LLVM :: CodeGen/AArch64/aarch64-tbz.ll LLVM :: CodeGen/AArch64/machine-outliner-retaddr-sign-thunk.ll LLVM :: CodeGen/AArch64/speculation-hardening.ll LLVM :: CodeGen/AMDGPU/amdgpu.private-memory.ll LLVM :: CodeGen/AMDGPU/private-memory-r600.ll LLVM :: CodeGen/Hexagon/hwloop3.ll LLVM :: CodeGen/Mips/sr1.ll LLVM :: CodeGen/PowerPC/ctrloops-softfloat.ll LLVM :: CodeGen/PowerPC/ppc-disable-non-volatile-cr.ll LLVM :: CodeGen/PowerPC/ppc-shrink-wrapping.ll LLVM :: CodeGen/PowerPC/ppc64-i128-abi.ll LLVM :: Transforms/CodeGenPrepare/ARM/sink-add-mul-shufflevector.ll LLVM :: Transforms/Coroutines/coro-split-sink-lifetime-02.ll LLVM :: Transforms/GVN/PRE/load-pre-nonlocal.ll LLVM :: Transforms/GVN/big-endian.ll LLVM :: Transforms/HardwareLoops/ARM/structure.ll LLVM :: Transforms/InferFunctionAttrs/annotate.ll LLVM :: Transforms/LoopVectorize/X86/x86-pr39099.ll LLVM :: Transforms/LoopVectorize/pr34681.ll LLVM :: Transforms/NewGVN/big-endian.ll
looks they all have undefined variables. So, seems this patch works well!
I've leaved 2 nits inline too.
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
2023 | Perhaps a bit cleaner would be to return false, since it it known that UndefVar is always false here. | |
llvm/test/FileCheck/var-scope.txt | ||
10 | So, there are no ERRUNDEF check line anymore and seems you can remove it from here (and below) and switch to --check-prefix (just here)? |
There seems to be a number of failures like due to pattern like (taken from clang/test/CodeGen/debug-info-extern-call.c):
DECLS-FOR-EXTERN-NOT: !DICompileUnit({{.*}}retainedTypes: ![[RETTYPES:[0-9]+]]
DECLS-FOR-EXTERN-NOT: ![[RETTYPES]] = !{
I think FileCheck should forbid variable definition in a CHECK-NOT pattern. Granted with this patch we'll catch all the uses of variables defined in CHECK-NOT but we won't catch if they are not used.
I have no idea what that is even trying to do. A definition within a CHECK-NOT doesn't make sense - either the pattern will match, enabling the capture of the variable, and FileCheck to fail (because of a positive CHECK-NOT match), or it will not match, meaning the variable can't be used. In other words, I think forbidding variable definitions in CHECK-NOT makes sense. It's probably worth bringing this up on llvm-dev and cfe-dev though, to give people a head's up/ask for opinions. There may be a way this actually is useful, but if so, I don't know it.
This commit makes match error a failure for CHECK-NOT and changes PrintNoMatch to print all CHECK-NOT failure in a CHECK-NOT block rather than just the first one.
Are these two inherently linked, or could you split them into two separate changes?
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
1997 | I'm not a fan on using booleans to indicate success or failure of a function, for two reasons:
Can we use an Error instead? | |
2002 | I'm not a fan of how this variable is being used throughout this function - it feels like it would be easy to forget to check it when doing something or check it incorrectly, making this function hard to maintain. Could we bail out early if an undefined variable is detected? | |
llvm/test/FileCheck/numeric-expression.txt | ||
249 | Is this change related? |
I have no idea either. I've found another case of 2 different invocation of FileCheck with different prefix where one invocation sets a variable and the other use it. I understand the intent having seen the output but I have so far no idea on how to fix it.
This commit makes match error a failure for CHECK-NOT and changes PrintNoMatch to print all CHECK-NOT failure in a CHECK-NOT block rather than just the first one.
Are these two inherently linked, or could you split them into two separate changes?
I think they can be separated if that change land first.
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
2030–2031 | Why is there commented out code here? | |
2302–2306 | Aside from the DirectiveFail = true line here, this and the other two instances of this pattern look identical to me. I guess that's partly my fault for the suggestion earlier of not returning a boolean to indicate an error. Perhaps part of the problem previously was the suggestion that the printNoMatch function even returned anything. I certainly wouldn't expect it. How about we wrap this common code in a helper function, called handleMatchFailure or something like that, and that can return a boolean saying whether the failure was due to an error or regular failure? We could invent an enum that indicates the same thing but with better words, to avoid the boolean, but that might be going further than necessary. An alternative would be to pass in a lambda OnMatchError or something, which does nothing in the two other cases, but here sets the DirectiveFail variable. |
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
2030–2031 | Oops. |
Code mostly looks fine now, but there are still plenty of test failures being reported by the pre-merge bot. What's the situation with those currently?
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
2064 | This might be a dumb question, but I'm not following why we're skipping '\t' and ' ' here. It certainly doesn't line up with the comment above. | |
2081–2096 | Can't these two be folded into a single handleAllErrors call? IIRC, handleAllErrors takes any number of handler lambdas, and iterates through them until one matches, or it runs out. Thus the above code should look like the something like the suggested edit. |
I've started trying to fix them but sadly most of them are non trivial which is even more difficult when one is not familiar with what the test is trying to check. It seems a fair amount are using several CHECK-NOT as a way to say "I don't want THIS followed by THAT to match the input", where THAT uses a variable defined in THIS. It basically expects a back reference. This can be fixed by merging the two CHECK-NOT when they don't use different prefix but it seems to be yet another example where a grouping directive would be useful, e.g.:
CHECK-REGION-START: REGION-A
CHECK: THIS
CHECK: THAT
CHECK-REGION-END: REGION-A
CHECK-NOT: REGION-A
This is just an example, better syntax would need to be use. The reason I suggest a region rather than a new special-purpose directive is I wanted to have something like REGION-DAG where directives in a region are CHECK: that need to appear linearly but directives in different regions can be interleaved. Anyway, my plan is to try to merge those CHECK-NOT directives and then we can discuss how to introduce new syntax to make this more readable.
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
2064 | I'm not sure, this is copied over from PrintNoMatch. I thought it is used so that the "scanning from here" shows the first non blank line after last match but I haven't looked in details. | |
2081–2096 | It can be merged yes. It's a relic from when I would group all undefined variables in one error message in which case the undefined variable errors needs to be processed together. I'll change that once I'm back to work on Monday. |
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
2064 | Could you investigate a bit more and then update the comment with your findings? This code at the moment also skips some whitespace, which isn't always desirable (imagine if you had a regex match looking for some explicit whitespace before some string). |
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
2064 | The code was added in da108b4ed4e6e7267701e76d5fd3b87609c9ab77 when CHECK-NEXT was added and at the same time as the "scanning from here" diagnostic was added. There was no CHECK-SAME at the time and I guess the goal was to avoid confusion by showing the last matching line (even though the column info would show it's the end of the line). There was no regular expression match either, so no way to match a newline. I think this should simply be removed, in a separate patch. What is your opinion? |
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
2064 | I'm inclined to agree with you. Perhaps worth adding a TODO or FIXME note here and at the other location, referencing each other, saying that this should probably be removed? @probinson, (or anybody else) do you have any thoughts on this? |
Thanks for working on this. I'm sorry to start reviewing so late. I haven't digested the implementation fully yet, but I have a few comments to get started.
llvm/include/llvm/Support/FileCheck.h | ||
---|---|---|
107 | The name MatchError and its comment seem too general. It could easily describe many of the following enumerators. I suggest: /// Indicates a match that can never succeed because the pattern is found to /// be undefined at match time (e.g., contains an undefined variable). MatchUndefined, Undefined variables are probably the only such case we have right now, and maybe that will always be true, but I'm trying to ensure this can handle more cases if needed in the future. Also, the middle paragraph in the preceding comments is now incorrect. Here's a possible revision: /// A directive's supplied pattern is said to be either undefined, expected, /// or excluded depending on whether the pattern is well formed at match time /// and whether it must have or must not have a match in order for the /// directive to succeed. For example, a CHECK directive's pattern is /// normally expected, a CHECK-NOT directive's pattern is normally excluded, /// but either directive's pattern is undefined if it contains an undefined /// variable at match time. The match result type MatchUndefined is for /// undefined patterns. All match result types whose names end with /// "Excluded" are for excluded patterns. All other match result types are /// for expected patterns. Here, I'm assuming you consistently use the new enumerator for undefined variables, regardless of whether a directive is positive or negative. I haven't checked the implementation, but the test suite changes seems to confirm that. |
llvm/test/FileCheck/dump-input-annotations.txt | ||
---|---|---|
626 | This test used to cover printing of variable definitions when a positive directive fails. It no longer does. A separate test could be added to cover that case. But do we need to drop that info when a variable is undefined? It could be helpful for debugging. | |
llvm/utils/FileCheck/FileCheck.cpp | ||
201 | If you agree to changing MatchError to MatchUndefined, I suggest changing error: match error to error: match undefined. |
llvm/test/FileCheck/dump-input-annotations.txt | ||
---|---|---|
626 | We currently only print substitutions if there are no errors. I think if we want to print substitutions in case of undefined variable we should do it regardless of whether there are errors to be consistent. Obviously that means ignoring substitution errors in printSubstitutions |
llvm/test/FileCheck/dump-input-annotations.txt | ||
---|---|---|
626 | What's an example of an error besides undefined variable for which we currently don't print substitutions but you're suggesting we would? |
llvm/test/FileCheck/dump-input-annotations.txt | ||
---|---|---|
626 | Overflow and underflow come to mind. There might be others. % cat overflow_undef.test BIGVAR=10000000000000000 CHECK: BIGVAR: [[#BIGVAR:0x8000000000000000+0x8000000000000000]] [[#UNDEFVAR]] % ./bin/FileCheck --dump-input=never --input-file overflow_undef.test overflow_undef.test overflow_undef.test:2:19: error: unable to substitute variable or numeric expression: overflow error CHECK: BIGVAR: [[#BIGVAR:0x8000000000000000+0x8000000000000000]] [[#UNDEFVAR]] ^ zsh: exit 1 ./bin/FileCheck --dump-input=never --input-file overflow_undef.test |
llvm/include/llvm/Support/FileCheck.h | ||
---|---|---|
107 | Another possibility that might be clearer is MatchPatternMalformed. | |
llvm/test/FileCheck/dump-input-annotations.txt | ||
626 | In the case of overflow/underflow, I'm in favor of printing used variable definitions. It would be especially helpful if it would print any variable definitions used in the expression that overflowed/underflowed, but printing other used variable definitions could be helpful as well. Your example is a bit different than the case I was pointing out originally: in your example, there are no used variable definitions to report. There is however an undefined variable. I am also in favor of printing all errors (e.g., overflow and undefined variable) from a directive instead of just the first (overflow), where error recovery makes that feasible. Again, if supporting all this info isn't feasible, then please at least add another test so we don't lose coverage of -dump-input printing used variable definitions when a positive directive with a well formed pattern fails to match. (By the way, I just tried a few examples (without the current patch), and I'm noticing issues with overflow errors with respect to -dump-input. The traditional error message doesn't print undefined variables, but -dump-input does. -dump-input reports error: no match found twice and doesn't report the overflow. My master is a couple of months old already, so ignore me if all this has been fixed.) | |
llvm/utils/FileCheck/FileCheck.cpp | ||
201 | And MatchPatternMalformed would obviously become something like error: match pattern malformed. |
@jhenderson I've made a patch to remove that skip-past-end-of-line code: https://reviews.llvm.org/D93341. Sorry for the long delay.
Given that I've gone cold on this change and need to go through it again properly, and the number of other reviews that are on my plate, in addition to higher priority internal work I need to do before the Christmas break (my last day is Friday), I doubt I'll get a chance to look at this until some time in the New Year. Sorry! If people feel it's important to land sooner than that, please don't let my lack of review block this (I can always do post-commit reviews if needed).
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
2064 | I've created https://reviews.llvm.org/D93341. I'll rebase this patch on top of it once the approach is agreed upon. |
Tests still need to be fixed prior to landing this patch anyway, so don't worry it can wait. I've started working on that and sent 3 diff for review. A lot more will be needed though.
Rebase and massively simply thanks to Joel's work.
llvm/test/FileCheck/dump-input-annotations.txt | ||
---|---|---|
626 | Does this updated patch solve your concern? Ignore all the tests failures, I still haven't gotten round to look at those. |
I haven't looked through the tests yet, but the implementation logic looks right to me. Thanks!
llvm/lib/FileCheck/FileCheck.cpp | ||
---|---|---|
2130 ↗ | (On Diff #333458) | Is this change intentional? There's already a NotFoundError handler below (I'd prefer to keep its comment). |
llvm/lib/FileCheck/FileCheckImpl.h | ||
232 ↗ | (On Diff #333458) | When is this function called now? |
759 ↗ | (On Diff #333458) | It looks like everything after the "or" is no longer true. |
Address comments
llvm/lib/FileCheck/FileCheckImpl.h | ||
---|---|---|
232 ↗ | (On Diff #333458) | It is a virtual function and thus needs to be defined. It's not called anywhere but I cannot remove it. |
You say there are many test failures. Should I hold off reviewing the test changes already in this patch, or are those fine?
llvm/lib/FileCheck/FileCheckImpl.h | ||
---|---|---|
232 ↗ | (On Diff #333458) | Makes sense. So the point of the change here is, in case this function is one day called, to have a log message that is consistent with other ErrorInfo log messages now that old message here is no longer used by printSubstitutions. Right? |
Oh no you can review, those are errors in tests that this patch uncover from what I could see (I only looked at the first 10 or so). I'll mail llvm-def to ask people's help. The only thing missing are unit tests.
llvm/lib/FileCheck/FileCheckImpl.h | ||
---|---|---|
232 ↗ | (On Diff #333458) | This change was made because I wanted to have a general error catcher in printNoMatch that would catch both UndefVarError and ErrorDiagnostic using the log() and message() methods from ErrorInfoBase. When changing the message to fit the current one I thought the "uses" is not really needed and is not consistent with other error message and the quotes and escaping are not needed since variable names are known to be well behaved. However UndefVarError has no source manager attached to it and thus cannot print an error message with location info because there's no SourceMgr available in places where the error is created. Therefore I had to do this translation from UndefVarError to ErrorDiagnostic. But I've just found a way to reuse the log message (see last update) so it's all good. |
llvm/lib/FileCheck/FileCheckImpl.h | ||
---|---|---|
232 ↗ | (On Diff #333458) | Ah, looks good. |
llvm/test/FileCheck/dump-input/annotations.txt | ||
633 ↗ | (On Diff #333558) | Please restore the correct alignment here and elsewhere in this file. (I'm starting to think it might have been better to use -strict-whitespace in most of these tests. I find myself turning it on temporarily anyway while adjusting tests because the squashed whitespace makes debugging confusing. My original goal for leaving it off was to make test updates easier if we were to adjust formatting, but that rarely happens. What do you think?) |
645 ↗ | (On Diff #333558) | This fixme can go now. |
Fix whitespace
llvm/test/FileCheck/dump-input/annotations.txt | ||
---|---|---|
633 ↗ | (On Diff #333558) | If whitespace is important it should be on. If formatting changes then we do want things to fail where whitespace is important. |
It seems the main issue left to address is tests that this patch breaks.
llvm/test/FileCheck/dump-input/annotations.txt | ||
---|---|---|
633 ↗ | (On Diff #333558) | The first test in this file checks whitespace in formatting. The remaining tests do not check whitespace, but I find them to be more legible when at least aligned, such as aligning the error and possible intended match lines above with the other diagnostics. To help maintain that and make debugging easier, maybe -strict-whitespace is worthwhile. One difficulty is that some tests involve multiple verbosity levels, across which whitespace varies, but regexes could help there. It seems we're accumulating many stylistic changes to make to these tests, so maybe I'll write a patch one day to tackle it all. |
A lot of the failures (e.g. llvm/test/Transforms/GVN/big-endian.ll, llvm/test/Transforms/LoopVectorize/pr34681.ll, llvm/test/Transforms/LoopVectorize/X86/x86-pr39099.ll) are tests that use several CHECK-NOT with one line defining a variable and another one using it. The intent seems to be to check that a sequence of lines are not happening. These could be converted to a combined CHECK-NOT with {{(.*[[:space:]])+}} between each of the individual CHECK-NOT pattern involved but it quickly becomes unreadable. I think this highlight the need for a new feature like "CHECK-NOT-NEXT" if that makes sense. It might not be the best name, because I'm sure there will be case where they want successive line so we'd want the CHECK-NOT equivalent of (1) two consecutive CHECK, (2) a CHECK and a CHECK-NEXT and (3) a CHECK and a CHECK-SAME.
What do you think? Is adding a new feature first the best way forward? If yes, how would you design it?
A lot of the failures (e.g. llvm/test/Transforms/GVN/big-endian.ll, llvm/test/Transforms/LoopVectorize/pr34681.ll, llvm/test/Transforms/LoopVectorize/X86/x86-pr39099.ll) are tests that use several CHECK-NOT with one line defining a variable and another one using it.
How many tests does this patch break?
What do you think? Is adding a new feature first the best way forward? If yes, how would you design it?
Those tests are passing without this patch, right? That should mean the CHECK-NOT that defines the variable never matches, so it should be impossible for the CHECK-NOT that uses the variable to match even if we had the feature you're suggesting. A quick fix is then to remove or comment out the CHECK-NOT that uses the variable. I believe the tests would be just as robust as now.
Before trying to design a new feature, I'd pursue that quick fix and see what the test authors say.
60, one of them being the FileCheck unittest which I haven't adapted yet. I've already made 4 patches solving 5 tests:
https://reviews.llvm.org/D99581
https://reviews.llvm.org/D99582
https://reviews.llvm.org/D99583
https://reviews.llvm.org/D99589
What do you think? Is adding a new feature first the best way forward? If yes, how would you design it?
Those tests are passing without this patch, right? That should mean the CHECK-NOT that defines the variable never matches, so it should be impossible for the CHECK-NOT that uses the variable to match even if we had the feature you're suggesting. A quick fix is then to remove or comment out the CHECK-NOT that uses the variable. I believe the tests would be just as robust as now.
Before trying to design a new feature, I'd pursue that quick fix and see what the test authors say.
You're right, let's try that.
Down to 19 when applying these diffs on top of current main branch:
https://reviews.llvm.org/D99581
https://reviews.llvm.org/D99582
https://reviews.llvm.org/D99589
https://reviews.llvm.org/D99602
https://reviews.llvm.org/D99770
https://reviews.llvm.org/D99771
All tests fixed except for 2 related ones but with a RFC review. Most fix have already landed, 12 still in review (incl. the RFC one). We're almost there. :-)
@jdenny I have no more changes to make to this patch and the CI is now green. Are you ok with the patch?
Thanks! Note there's still 4 dependent patches waiting for review so don't be surprised if it takes me a few days before I can land this.
The name MatchError and its comment seem too general. It could easily describe many of the following enumerators. I suggest:
Undefined variables are probably the only such case we have right now, and maybe that will always be true, but I'm trying to ensure this can handle more cases if needed in the future.
Also, the middle paragraph in the preceding comments is now incorrect. Here's a possible revision:
Here, I'm assuming you consistently use the new enumerator for undefined variables, regardless of whether a directive is positive or negative. I haven't checked the implementation, but the test suite changes seems to confirm that.