Page MenuHomePhabricator

FileCheck [12/12]: Support use of var defined on same line
Needs ReviewPublic

Authored by thopre on Apr 7 2019, 4:24 PM.

Details

Summary

This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch adds support for using a
numeric variable with empty numeric expression defined on the same line
either directly or indirectly.

The regex language used to represent a constraint corresponding to a
given CHECK directive does not allow to express a numeric relation. It
is therefore necessary to initially match any number in the appropriate
matching format and check the constraints are verified in a follow-up
step. Such a process must then be repeated as long as the constraints
fail because of cases with a check directive like #N: #N+1
where the input contains lines such as:
10 12
10 11

The first line would match the regex "[0-9]+ [0-9]+" generated for the
CHECK directive but the constraint would fail. The second line would
then need to be tried since it both matches successfully and verifies
the constraints of the CHECK directive. Note that this process would
still fail to match if the input was:
10 12 13
because the regex would match 10 and 12 which fail to verify the
constraints and the process would then skip to the next line since there
is no way to ask the regex engine if the line had other possible
matches. This caveat is documented.

Copyright:

  • Linaro (changes up to diff 183612 of revision D55940)
  • GraphCore (changes in later versions of revision D55940 and in new revision created off D55940)

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
thopre added a comment.Apr 9 2019, 7:35 AM

Use camel casing for new functions

thopre updated this revision to Diff 194488.Apr 10 2019, 4:45 AM

Make numeric value constructor explicit

tra removed a subscriber: tra.Apr 11 2019, 1:16 PM
thopre updated this revision to Diff 197124.Apr 29 2019, 8:42 AM

Rebase on latest changes

thopre updated this revision to Diff 198039.May 3 2019, 9:50 AM

Rebase on latest changes

thopre updated this revision to Diff 198101.May 3 2019, 3:03 PM

Rebase on latest unittest changes

thopre updated this revision to Diff 198448.May 7 2019, 5:36 AM

Rebase on top of latest changes

thopre updated this revision to Diff 198554.May 7 2019, 3:32 PM

Rebase on top of latest changes

thopre updated this revision to Diff 198650.May 8 2019, 6:54 AM

Rebase on top of latest changes

thopre updated this revision to Diff 198955.May 9 2019, 4:56 PM

Rebase on top of latest changes

thopre updated this revision to Diff 199143.May 11 2019, 7:18 AM

Rebase on top of latest changes

thopre updated this revision to Diff 199241.May 13 2019, 3:59 AM

Rebase on top of latest changes

thopre updated this revision to Diff 199401.May 14 2019, 4:08 AM

Rebase on top of latest changes

rnk removed a reviewer: rnk.May 14 2019, 11:28 AM
thopre updated this revision to Diff 200478.May 21 2019, 5:56 AM

rebase on top of latest changes

thopre updated this revision to Diff 200555.May 21 2019, 11:15 AM

rebase on top of latest changes

thopre updated this revision to Diff 201987.May 29 2019, 10:53 AM

Rebase on top of latest changes in patch series

thopre updated this revision to Diff 203214.Jun 5 2019, 11:22 AM

Rebase on top of latest changes in the series

thopre updated this revision to Diff 205685.Jun 19 2019, 3:23 PM

Rebase on top of latest changes in patch series

thopre planned changes to this revision.May 11 2020, 3:09 PM

Codebase has changed considerably since last time this review was updated. Patch will need some reworking and before that I want to build consensus on whether the feature should be done given the caveats it has.

I don't have a clear solution to the problems you're describing, but I have already run into one or two places where I wanted the behaviour of being able to use a numeric variable on the same line it was defined. CHECK-SAME is a valid workaround, but has its own issues (for example, it doesn't inherently ensure the next match starts immediately after the previous one, and is just simply more verbose). My instinct is that the right approach is to ignore the constraint initially, match a number, and then verify that it is the right number, failing if not, without further search attempts. I think it's relatively easy to workaround that issue in most cases, with additional checks elsewhere, although I can't be positive that is the case.

I don't have a clear solution to the problems you're describing, but I have already run into one or two places where I wanted the behaviour of being able to use a numeric variable on the same line it was defined. CHECK-SAME is a valid workaround, but has its own issues (for example, it doesn't inherently ensure the next match starts immediately after the previous one, and is just simply more verbose). My instinct is that the right approach is to ignore the constraint initially, match a number, and then verify that it is the right number, failing if not, without further search attempts. I think it's relatively easy to workaround that issue in most cases, with additional checks elsewhere, although I can't be positive that is the case.

What worries me is the effect on a CHECK-NOT directive since it could lead to the directive being fulfilled in a case where the intent was to forbid it. Consider the following:

CHECK-NOT: [[#VAR:]] [[#VAR+1]]

Say the author of the test was looking a a bug where the output being checked was 10 11 14. The test would pass. Now if the bug wasn't fully fixed, perhaps this doesn't happen anymore but we can have cases like 10 13 14. The above directive will turn into ([[:digit:]]+) ([[:digit:]]+) which will match 10 and 13 and will then check if they are consecutive numbers. They are not and thus the CHECK-NOT is satisfied even though there was 13 14 later on the line. Of course here we could retry after the first number matched but that doesn't work in the general case where regex can appear anywhere (e.g. #VAR: {{.*}} #VAR+1 where the input is now 10 8 11).

Perhaps the solution is to forbid using a variable defined on the same line for a CHECK-NOT directive? In all other cases we'll get a false negative (directive not satisfied) instead of a false positive (directive satisfied).

Perhaps the solution is to forbid using a variable defined on the same line for a CHECK-NOT directive? In all other cases we'll get a false negative (directive not satisfied) instead of a false positive (directive satisfied).

Hmm, you make a fair argument, and I think this suggestion might work. I'll need to give it more thought though. Perhaps it needs a mailing list thread to get more eyes on this?

Perhaps the solution is to forbid using a variable defined on the same line for a CHECK-NOT directive? In all other cases we'll get a false negative (directive not satisfied) instead of a false positive (directive satisfied).

Hmm, you make a fair argument, and I think this suggestion might work. I'll need to give it more thought though. Perhaps it needs a mailing list thread to get more eyes on this?

That was my intention. I'm focusing on D60390 for now, then I'll start a thread.

Status update: I've started manually rebasing this patch on top of latest FileCheck. Hopefully I'll update this diff in about 2 weeks. Sorry for the delay.

Status update: I've started manually rebasing this patch on top of latest FileCheck. Hopefully I'll update this diff in about 2 weeks. Sorry for the delay.

Sorry, I dropped the ball on that one but recently picked it up again and have a patch ready. It needs clean up but it's feature complete and all tests are already written. Please hang in there.

thopre updated this revision to Diff 381083.Oct 20 2021, 1:25 PM

Rebase and fix issues with variables defined from an expression and with
verification failure failing to restore initial values of variables.

Sorry for ignoring this for a couple of weeks - it fell off my radar a bit, when I was swamped with deadline-realted tasks.

I recommend doing the llvm/test/FileCheck/numeric-expressions/numeric-expressions.txt renaming in a separate patch, as it's not really related to this patch.

I've not looked at the testing in depth yet, and I'm a little concerned that I'm very cold on the surrounding code. I think it would be useful if someone else could take a look too (@jdenny maybe?).

llvm/docs/CommandGuide/FileCheck.rst
913–915

Where this refers to "line" should it be "CHECK" directive?

Also, I think it would be valuable to give an example.

917–918
llvm/lib/FileCheck/FileCheck.cpp
382

My general rule is that consumeError should always have a comment explaining why you can just "ignore" the error and continue. In this case, I guess you're not ignoring the error per se, but you are throwing away the context, so a comment would still be useful.

411–415

Do you need this change? Can't you just return the result of EvalBinop directly?

449–450
1184

TODO that needs action still?

1307

(or possibly "using variables with empty expressions")

1311–1312

(or "using variables")

Should this definitely be "line"?

Perhaps change "Known issue" to FIXME and file a bug (bug filing to happen once this patch has landed, or shortly beforehand, with the FIXME to contain a link to that bug).

1322
1323–1324
1385–1386
1403
1424
1425–1426
1431
1433

No need for braces for single line if.

1438

Any reason we can't just use a for loop here?

1487
llvm/lib/FileCheck/FileCheckImpl.h
269
292–294

Seems like TentativeValue and StrTentativeValue are unnecessary? Could you not just set Value and StrValue in this context? If you need to distinguish a tentative versus concrete definition, add a boolean member or similar that states isTentative.

520
578

Nit: add new line after this method declaration.

jdenny added a comment.Dec 3 2021, 9:57 AM

I've not looked at the testing in depth yet, and I'm a little concerned that I'm very cold on the surrounding code. I think it would be useful if someone else could take a look too (@jdenny maybe?).

Sorry for the delay, but I need a bit more time. I'll plan to take a look next week. Thanks for working on this.

jdenny added a comment.EditedDec 8 2021, 7:30 PM

I understand the desire to be able to define and reference a variable within the same directive. However, as I understand it, this patch adds that ability in a way that introduces a new subtlety to FileCheck: the semantics of numeric variable references are then not always consistent. If you assume the wrong semantics, you can end up with a false pass of a test. I understand you've disabled the case of CHECK-NOT, but it does not appear to be the only case that can lead to false passes.

Example

Let's say we want to check that the very last value of y on an input line is equal to the first value of y plus 3. Thus, the following input line is then erroneous:

x=20, y=30, z=40, x=21, y=31, z=41, x=22, y=33, z=42, x=23, y=32, z=43,

Check version 1

CHECK: y=[[#VAR:]],{{.*}}y=[[#VAR+3]],

The behavior is:

  • y=[[#VAR:]], matches y=30,.
  • {{.*}} is maximal munch.
  • So y=[[#VAR+3]], matches the last y= expression, which is y=32,.

The directive then fails because 32 is not 30+3. OK, that's the exact behavior I was going for. And now I think I understand how numeric variable references work.

(If the above is not the behavior you expect after applying this patch, let me know. Every example I tried after applying this patch produced seg faults or assertion failures, so I couldn't verify.)

Check version 2

Let's say I later realize I'd rather use VAR as defined to 30 by some earlier directive, so I rewrite check version 1 as follows:

CHECK: y=[[#VAR]],{{.*}}y=[[#VAR+3]],

The behavior is:

  • y=[[#VAR]], is visibly differently than y=[[#VAR:]],, and its semantics are different too: it now matches a previously defined value instead of capturing a new value. Nevertheless, it still matches y=30,. So far, so good.
  • {{.*}} is still maximal munch.
  • Given that previous parts of the pattern match as before and VAR ends up with the same value as before, shouldn't the semantics of y=[[#VAR+3]], be the same as before? No. It can only match y=33, now, and so that's what it does.

The directive then accepts the erroneous input. That's pretty subtle.

Check version 3

Or, let's say I decide that splitting check version 1 into two directives would look better:

     CHECK: y=[[#VAR:]],
CHECK-SAME: {{.*}}y=[[#VAR+3]],

That's equivalent to version 1, right? Normally, I think it would be. Not in this case. Again, y=[[#VAR+3]], now has a different behavior: it only matches y=33,, and the erroneous input is now accepted.

Conclusion

I'm leery of letting the semantics of numeric variable references vary depending on where the variable happens to have been defined. It's too subtle.

I think it would be better to introduce the desired functionality with a different syntax to reflect the semantic shift. For example:

CHECK: y=[[#VAR1:]],{{.*}}y=[[#VAR2:]],[[!#VAR2 == VAR1 + 3]]

That introduces the concept of a post-match variable assertion. It would always appear at the end of a pattern to indicate it's evaluated last. It would always be delimited by [[! and ]], which hopefully doesn't collide with any existing syntax.

There's probably a better syntax I haven't thought of. My main point is to demonstrate how a different syntax could eliminate the inconsistent behavior I am concerned about.

jdenny added a comment.Dec 8 2021, 9:47 PM

Or maybe a CHECK-ASSERT would be clearer and more generally useful:

CHECK: y=[[#VAR1:]],{{.*}}y=[[#VAR2:]]
CHECK-ASSERT: #VAR2 == VAR1 + 3
thopre added a comment.Dec 9 2021, 2:06 AM

Hi Joel,

Thanks for your feedback, good job catching that!

I understand the desire to be able to define and reference a variable within the same directive. However, as I understand it, this patch adds that ability in a way that introduces a new subtlety to FileCheck: the semantics of numeric variable references are then not always consistent. If you assume the wrong semantics, you can end up with a false pass of a test. I understand you've disabled the case of CHECK-NOT, but it does not appear to be the only case that can lead to false passes.

Example

Let's say we want to check that the very last value of y on an input line is equal to the first value of y plus 3. Thus, the following input line is then erroneous:

x=20, y=30, z=40, x=21, y=31, z=41, x=22, y=33, z=42, x=23, y=32, z=43,

Check version 1

CHECK: y=[[#VAR:]],{{.*}}y=[[#VAR+3]],

The behavior is:

  • y=[[#VAR:]], matches y=30,.
  • {{.*}} is maximal munch.
  • So y=[[#VAR+3]], matches the last y= expression, which is y=32,.

The directive then fails because 32 is not 30+3. OK, that's the exact behavior I was going for. And now I think I understand how numeric variable references work.

(If the above is not the behavior you expect after applying this patch, let me know. Every example I tried after applying this patch produced seg faults or assertion failures, so I couldn't verify.)

I think the real issue here is using a wildcard pattern. I feel most often that not the user actually intended that. I remember someone asking for being able to use variables defined on the same line because using CHECK-SAME was different as pointed in your example below. See I think this is the behaviour people usually expect, judging from https://bugs.llvm.org/show_bug.cgi?id=30198 also about wildcard being a risky business.

Check version 2

Let's say I later realize I'd rather use VAR as defined to 30 by some earlier directive, so I rewrite check version 1 as follows:

CHECK: y=[[#VAR]],{{.*}}y=[[#VAR+3]],

The behavior is:

  • y=[[#VAR]], is visibly differently than y=[[#VAR:]],, and its semantics are different too: it now matches a previously defined value instead of capturing a new value. Nevertheless, it still matches y=30,. So far, so good.
  • {{.*}} is still maximal munch.
  • Given that previous parts of the pattern match as before and VAR ends up with the same value as before, shouldn't the semantics of y=[[#VAR+3]], be the same as before? No. It can only match y=33, now, and so that's what it does.

The directive then accepts the erroneous input. That's pretty subtle.

I think the main issue is the regex was badly written in the first place but I agree this patch is making things worse by having inconsistent behaviour. Someone ought to have written either of those two regex instead depending on intended pattern to be matched:

CHECK: y=[[#VAR:]],{{.*}}y=[[#VAR+3]],{{[^y]*$}}

or

CHECK: y=[[#VAR:]],{{[^y]*}}y=[[#VAR+3]],

I like the suggestion of CHECK-ASSERT or a new syntax. I'll think about it and in the meantime anyone with an opinion on this feel free to chime in.

jdenny added a comment.Dec 9 2021, 3:17 PM

Hi Joel,

Hi Thomas.

I like the suggestion of CHECK-ASSERT or a new syntax. I'll think about it

Thanks for considering that.

and in the meantime anyone with an opinion on this feel free to chime in.

Yes, it will be good to get other opinions. @jhenderson?

Hi Joel,

Hi Thomas.

I like the suggestion of CHECK-ASSERT or a new syntax. I'll think about it

Thanks for considering that.

and in the meantime anyone with an opinion on this feel free to chime in.

Yes, it will be good to get other opinions. @jhenderson?

I think @jdenny has hit the nail on the head with regards to a fairly major problem with this patch. I don't think we want to introduce subtleties in behaviour like has been highlighted. Sorry for not considering this before.

Regarding the CHECK-ASSERT suggestion, do we need it? I was under the impression one of the earlier patches added a syntax like [[#VAR2, VAR1 + 3:]] or something to that effect, which achieves the same thing, right? Maybe I'm missing something though.

Regarding the CHECK-ASSERT suggestion, do we need it? I was under the impression one of the earlier patches added a syntax like [[#VAR2, VAR1 + 3:]] or something to that effect, which achieves the same thing, right? Maybe I'm missing something though.

You mean [[#VAR2:VAR1+3]]? Why does that prevent the need for something like CHECK-ASSERT? You still can only refer to a variable defined in an earlier line.

Regarding the CHECK-ASSERT suggestion, do we need it? I was under the impression one of the earlier patches added a syntax like [[#VAR2, VAR1 + 3:]] or something to that effect, which achieves the same thing, right? Maybe I'm missing something though.

You mean [[#VAR2:VAR1+3]]? Why does that prevent the need for something like CHECK-ASSERT? You still can only refer to a variable defined in an earlier line.

Looking back at the example, I guess it doesn't, although you could use CHECK-SAME to achieve the same effect, I think. More generally, I'm not sure how much I like the idea of checks that don't actually match anything, but I could be persuaded, especially if we can leverage the same functionality to support the CHECK-DEFINE concept I proposed previously (but never got as far as finishing off).

Regarding the CHECK-ASSERT suggestion, do we need it? I was under the impression one of the earlier patches added a syntax like [[#VAR2, VAR1 + 3:]] or something to that effect, which achieves the same thing, right? Maybe I'm missing something though.

You mean [[#VAR2:VAR1+3]]? Why does that prevent the need for something like CHECK-ASSERT? You still can only refer to a variable defined in an earlier line.

Looking back at the example, I guess it doesn't, although you could use CHECK-SAME to achieve the same effect, I think. More generally, I'm not sure how much I like the idea of checks that don't actually match anything, but I could be persuaded, especially if we can leverage the same functionality to support the CHECK-DEFINE concept I proposed previously (but never got as far as finishing off).

I too would like a CHECK-DEFINE. Conceptually at least, I imagine CHECK-DEFINE or CHECK-ASSERT would match an empty string at the start of its search range (that is, end of the previous match or beginning of input if none). Giving it a match location in this way should help with diagnostics, such as -dump-input.

rnk removed a subscriber: rnk.Dec 13 2021, 2:46 PM