This is an archive of the discontinued LLVM Phabricator instance.

FileCheck [6/12]: Introduce numeric variable definition
ClosedPublic

Authored by thopre on Apr 7 2019, 4:20 PM.

Details

Summary

This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch introduces support for defining
numeric variable in a CHECK directive.

This commit introduces support for defining numeric variable from a
litteral value in the input text. Numeric expressions can then use the
variable provided it is on a later line.

Copyright:

  • Linaro (changes up to diff 183612 of revision D55940)
  • GraphCore (changes in later versions of revision D55940 and in new revision created off D55940)

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
thopre updated this revision to Diff 199137.May 11 2019, 7:14 AM

Rebase on top of latest changes

thopre updated this revision to Diff 199235.May 13 2019, 3:57 AM

Rebase on top of latest changes

Starting set of comments on this one. Haven't looked at everything yet.

llvm/docs/CommandGuide/FileCheck.rst
623 ↗(On Diff #199235)

Which makes the above example illegal, so you should rewrite the example.

llvm/include/llvm/Support/FileCheck.h
379 ↗(On Diff #199235)

"of the numeric"
comma after "fails"

382 ↗(On Diff #199235)

That last sentence feels awkward... maybe:
"If the expression defines a numeric variable, sets \p DefinedNumericVariable to point to the class representing the variable."
(with maybe "or nullptr if it does not define a variable.")

397 ↗(On Diff #199235)

Need to remove LineNumber from the documentation, if it's no longer a parameter.

llvm/lib/Support/FileCheck.cpp
42 ↗(On Diff #199235)

Omit the != nullptr on pointer assertions.

170 ↗(On Diff #199235)

I think this comment should not have changed.

289 ↗(On Diff #199235)

no else after return

388 ↗(On Diff #199235)

variables -> variable names

412 ↗(On Diff #199235)

= false

467 ↗(On Diff #199235)

I think NumExpr and DefinedNumericVariable could be declared here, much closer to where they are used.

473 ↗(On Diff #199235)

Omit != nullptr

1794 ↗(On Diff #199235)

"detect the use of an undefined"
(missing rebase maybe?)

thopre updated this revision to Diff 199390.May 14 2019, 3:32 AM
thopre marked 13 inline comments as done.
  • Address Paul's review comments
  • Rename numeric expression subsitution for numeric substitution to clearly distinguish numeric expression (the part that gets evaluated) from the whole #<something> that gets substituted (which might include a numeric variable definition)
  • Rename ParsePattern into parsePattern since it is modified
thopre added inline comments.May 14 2019, 3:34 AM
llvm/lib/Support/FileCheck.cpp
1794 ↗(On Diff #199235)

Rather incorrect conflict resouition on my part I suspect.

probinson added inline comments.May 14 2019, 10:09 AM
llvm/docs/CommandGuide/FileCheck.rst
606 ↗(On Diff #199390)

"lines" (or maybe just "text" as I suspect you'll want to go back to the one-line example in patch 12 of this series).

613 ↗(On Diff #199390)

"lines" (or "text")

llvm/lib/Support/FileCheck.cpp
472 ↗(On Diff #199390)

IsNumExpr is guaranteed true here, because of the simplified 'if' expression above.

llvm/test/FileCheck/numeric-expression.txt
78 ↗(On Diff #199390)

It appears that the only reason to include the CHECK prefix here is so it will define VAR1 for you? Maybe use a different variable instead, so the sections of the test are less intertwined.

90 ↗(On Diff #199390)

Needs a period/full-stop at the end of the sentence.

llvm/test/FileCheck/var-scope.txt
4 ↗(On Diff #199390)

Please add a comment noting that this first RUN doesn't enable scoping.
(I've been caught by that twice now! Clearly I need help reading tests.)

thopre updated this revision to Diff 199479.May 14 2019, 10:39 AM
thopre marked 7 inline comments as done.
  • Address review comments
  • Add final point for all comments in numeric-expression.txt
thopre added inline comments.May 14 2019, 10:40 AM
llvm/test/FileCheck/numeric-expression.txt
32–33 ↗(On Diff #194481)

@arsenm: ping?

rnk removed a reviewer: rnk.May 14 2019, 11:30 AM
rnk removed a subscriber: rnk.

I've barely scratched the surface of this so far, but here are some initial comments for where I've got to.

llvm/docs/CommandGuide/FileCheck.rst
576 ↗(On Diff #199479)

Since an example usage is actually [[#FOO:]], either lose the '<' and '>' in the first bit, or add them in the second.

590 ↗(On Diff #199479)

Perhaps worth adding "or on the command line".

llvm/include/llvm/Support/FileCheck.h
45 ↗(On Diff #199479)

class, variable -> class. Variable

56 ↗(On Diff #199479)

I'd be inclined to make this size_t rather than unsigned, as it's tied to the file system/memory availability. (However, see the note below).

Probably this should be assigned to 0 for command-line defined variables?

60–61 ↗(On Diff #199479)

It is theoretically possible for a standards-compliant compiler to find this and the next constructor ambiguous, because there is nothing stopping unsigned to be the same size as uint64_t. This becomes even more likely for size_t versus uint64_t. Furthermore, a user could very easily call the wrong constructor unintentionally.

I'd therefore suggest changing the signature somehow so as to avoid the risk of confusion for users of this class. Options include writing a pair of factory methods (effectively named constructors), or reordering the signature of one. There may be other options too.

145 ↗(On Diff #199479)

I'm not sure I understand why these comments have changed?

225 ↗(On Diff #199479)

What is a "capture number"? It probably needs explaining.

232 ↗(On Diff #199479)

See above comment. I have no idea what a "capture number" is or why it needs parenthesizing.

256 ↗(On Diff #199479)

definition -> definitions.

257 ↗(On Diff #199479)

from -> by (as it was).

259 ↗(On Diff #199479)

What is "NumericVariableDefs" referring to here?

332 ↗(On Diff #199479)

See above comments re. capture parentheses

pointer to corresponding -> pointer to the corresponding

335 ↗(On Diff #199479)

Should this use a StringMap?

354 ↗(On Diff #199479)

I'd change this name to Line, to avoid any risk of clashes with the member.

380 ↗(On Diff #199479)

I'd put a comma after "substituted"

442 ↗(On Diff #199479)

Commas after "expression" and "fails".

The whole first part of this sentence feels clunky. I think it's the use of "that" that's the problem. Probably just replace with "the".

444 ↗(On Diff #199479)

Could this just be parseBinop?

thopre updated this revision to Diff 199579.May 15 2019, 4:44 AM
thopre marked 18 inline comments as done.

Address review comments, in particular:

parenthesized capture number -> parenthesized subexpression number in RegExStr

(to capture the variable value)
llvm/include/llvm/Support/FileCheck.h
145 ↗(On Diff #199479)

Prior to this patch, the only thing #<something> block could contain was a numeric expression to substitute by its value when matching. It can now contain a numeric variable definition which needs to be substituted for a suitable wildcard expression ([0-9]+), in which case there is no numeric expression. I therefore thought the name "numeric expression substitution" was inappropriate and decided to change it to "numeric substitution". While patterns variable are never involved in expression, for consistency I've decided to also rename pattern variable substitution into pattern substitution.

I think there was also some inconsistency (perhaps in later patches in the series) where #<something> used to be referred to as numeric expression, now there is a separate name for this and the numeric expression part of a numeric substitution.

There's still some inconsistency in the terminology, as clearly evidenced by the name and description of the IsNumExpr member of FileCheckPatternSubstitution. That class used to be where we'd find variables (now called pattern variables); but it has had numeric expressions crammed into it, except the comment wants to call it a substitution, and so we have "pattern" and "substitution" and "expression" all competing for clarity.

I'm uncertain what to suggest. It may be that "pattern variable" was not a great name, and instead it should be "string variable" to contrast more naturally with "numeric variable." And then substitutions are just substitutions, which can be either a string substitution or a numeric substitution, with the numeric substitution implicitly using a numeric expression. Following this idea, FileCheckPatternSubstitution would be renamed FileCheckSubstitution and the references to "numeric expression" within the class don't change. I don't know that I'd go so far as to have String and Numeric subclasses of FileCheckSubstitution, the abstraction of a Substitution might not really support that. But the naming can surely be improved.

What do you think? This is a discussion point, not a please-do-this comment.

llvm/include/llvm/Support/FileCheck.h
66 ↗(On Diff #199579)

This leaves DefLineNumber uninitialized. Maybe combine the two constructors into a single 3-parameter constructor? Value can still default to None if it's the last parameter.

130 ↗(On Diff #199579)

See main comment.

355 ↗(On Diff #199579)

Multiple parameters means you don't need 'explicit'.

There's still some inconsistency in the terminology, as clearly evidenced by the name and description of the IsNumExpr member of FileCheckPatternSubstitution. That class used to be where we'd find variables (now called pattern variables); but it has had numeric expressions crammed into it, except the comment wants to call it a substitution, and so we have "pattern" and "substitution" and "expression" all competing for clarity.

I'm uncertain what to suggest. It may be that "pattern variable" was not a great name, and instead it should be "string variable" to contrast more naturally with "numeric variable." And then substitutions are just substitutions, which can be either a string substitution or a numeric substitution, with the numeric substitution implicitly using a numeric expression. Following this idea, FileCheckPatternSubstitution would be renamed FileCheckSubstitution and the references to "numeric expression" within the class don't change. I don't know that I'd go so far as to have String and Numeric subclasses of FileCheckSubstitution, the abstraction of a Substitution might not really support that. But the naming can surely be improved.

That's a great idea, especially the "string variable" and "string substitution". I was not pleased with the name either but the only thing I thought of was regex variable which is plain wrong, I completely missed the obvious. I think there's still a confusion left with "substitution". The class represents the use of a numeric or string variable but I also use it in place to refer to the [[]] blocks (eg. "Numeric substitutions start with a '#' sign after the double brackets and also have the definition and substitution forms"). While even definition do lead to a substitution (since the square brackets are replaced by something else) I think substitution seems to refer more naturally to the use of a variable. So I want a name for the [[]] blocks but the only thing I can come up so far is substitution block. So you'd have string substitution blocks and numeric substitution blocks (or just substitution block when the distinction does not matter) which can be either string/numeric variable definition or string/numeric substitutions. Can you think of a better naming than substitution block? If not, I'll run with it.

I think I might have a go with a class hierarchy for FileCheckSubstitution as it might avoid the isNumExpr member variable, unless you think it's overkill.

What do you think? This is a discussion point, not a please-do-this comment.

I quite like the suggestion even if that means make an extra separate patch obviously. Naming clarity is very important to make the comments and thus the code understandable. It's going to be painful changing some of the error messages and thus the tests but it's worth it IMHO.

That's a great idea, especially the "string variable" and "string substitution". I was not pleased with the name either but the only thing I thought of was regex variable which is plain wrong, I completely missed the obvious.

Naming is hard, I clearly didn't think of it right away either.

I think I might have a go with a class hierarchy for FileCheckSubstitution as it might avoid the isNumExpr member variable, unless you think it's overkill.

The question is really: Is there a reasonable interface common to the two cases for consumers, and can the implementation be separated cleanly into string and numeric cases? The goal here should be to make the interface usable and the implementation understandable. If a hierarchy achieves that goal, then sure.

thopre updated this revision to Diff 200334.May 20 2019, 10:50 AM
thopre marked 2 inline comments as done.

Rebase after renaming patch

thopre marked an inline comment as done.May 21 2019, 2:35 AM
thopre added inline comments.
llvm/lib/Support/FileCheck.cpp
385 ↗(On Diff #200334)

Typo: Numeric substitution* blocks

thopre updated this revision to Diff 200472.May 21 2019, 5:55 AM
  • Rebase on top of latest changes
  • Fix typo
thopre updated this revision to Diff 200535.May 21 2019, 10:23 AM

Rename parseNumericSubstitution into parseNumericSubstitutionBlock since it also parses numeric variable definition

thopre updated this revision to Diff 200885.May 23 2019, 1:45 AM

Fix unit tests

thopre updated this revision to Diff 200888.May 23 2019, 1:56 AM

Prevent double reporting in definedCmdlineVariables

Looked at everything up to, but not including the unit tests. Thanks for all the work on this!

llvm/include/llvm/Support/FileCheck.h
355 ↗(On Diff #199579)

There's still an explicit here (but it's worth noting that explicit does still have an effect on multi-parameter constructor).

332 ↗(On Diff #199479)

I continue to not understand the "capture parenethese number"...

45 ↗(On Diff #200888)

the respective definition -> their respective definitions

52 ↗(On Diff #200888)

Not a new thing, but does this need the llvm::?

61 ↗(On Diff #200888)

I don't think you need to initialize Value here explicitly. The llvm::Optional default constructor should do that.

83 ↗(On Diff #200888)

This should return size_t to match DefLineNumber's type.

245–246 ↗(On Diff #200888)

I still don't understand what a "parenthesized subexpression number is" or what a CaptureParen is.

279 ↗(On Diff #200888)

in -> in the

386 ↗(On Diff #200888)

This type should match that of the earlier DefLineNumber member.

llvm/lib/Support/FileCheck.cpp
152 ↗(On Diff #200888)

variable -> variables

182 ↗(On Diff #200888)

same line -> same line as used (?)

201 ↗(On Diff #200888)

Don't think this should change in this patch, but maybe this is a hint that parseNumericVariable and other similar functions should be returning an llvm::Error rather than a boolean, and for the errors to be handled much higher up the stack somewhere.

Returning booleans can result in confusion or people missing a check somewhere.

jhenderson added inline comments.May 23 2019, 6:30 AM
llvm/lib/Support/FileCheck.cpp
256 ↗(On Diff #200888)

Nit: delete this blank line.

290 ↗(On Diff #200888)

Parse the numeric...

385–387 ↗(On Diff #200888)

I'd change this sentence to be: "Numeric substitution blocks work the same way as string ones, but start with a '#' sign after the double brackets."

409 ↗(On Diff #200888)

I think it would be slightly clearer to just call this IsDefinition.

509 ↗(On Diff #200888)

No need for struct here.

519 ↗(On Diff #200888)

Mark string -> Mark the string

520 ↗(On Diff #200888)

variable -> variables

522 ↗(On Diff #200888)

that -> this

523 ↗(On Diff #200888)

loose -> lose
(loose == opposite of tight, lose == opposite of win)

633 ↗(On Diff #200888)

See earlier comments about CaptureParen etc. As a result, I don't really follow what this code is doing.

643 ↗(On Diff #200888)

I'm not sure you've given the right error message here. Isn't the problem that MatchedValue does not represent a decimal integer? If that's the case, I'd go with an error message saying that.

1713–1714 ↗(On Diff #200888)

This seems like a bit of a weird thing to do. Why not just call the function directly? In other words, I suspect the comment should be updated.

1738 ↗(On Diff #200888)

"Expr test" needs to be explained more. I assume it's referring to the Expr.empty() line below, but perhaps better would be to outline what we're testing a bit more. Also, if FOO+2=10 is invalid here, but the code passes, maybe that implies we should be calling a different function (which presumably would be a subset of the parseNumericVariable function).

llvm/test/FileCheck/numeric-expression.txt
1 ↗(On Diff #200888)

Use double-dash like the test was before here.

11 ↗(On Diff #200888)

"in alternate spacing" is a weird phrase. Perhaps "with different spacing"

17–19 ↗(On Diff #200888)

Do you ever use these definitions? I don't see them being used, but they should be, as otherwise you aren't really testing anything.

31–32 ↗(On Diff #200888)

See above comment about "in alternate spacing". Also, there's a rogue full-stop at the end of line 31, but none at the end of line 32.

78 ↗(On Diff #200888)

Put the double-dashes back, please, throughout the test.

99 ↗(On Diff #200888)

You've used "PAT" here, presumably as an abbreviation for "pattern", but I don't think that abbreviation makes sense any more given the recent naming changes?

llvm/test/FileCheck/var-scope.txt
4 ↗(On Diff #200888)

"variable are remain defined"

Not sure what this is saying exactly, but it isn't good English. Probably simply "variables remain..."

18–19 ↗(On Diff #200888)

Doesn't adding the ':' change the semantics of this test? Why are you doing that?

probinson added inline comments.May 23 2019, 8:02 AM
llvm/include/llvm/Support/FileCheck.h
355 ↗(On Diff #199579)

Really? What effect does explicit have on a multi-parameter constructor? My reading of the cppreference.org description is that it used to apply to T var = {args}; but that is list-initialization starting with C++11.

jhenderson added inline comments.May 23 2019, 8:28 AM
llvm/include/llvm/Support/FileCheck.h
355 ↗(On Diff #199579)

The example on https://en.cppreference.com/w/cpp/language/explicit, shows that copy-list-initialization does not consider conversion constructors marked with explicit, from C++11. Take a look at the A3, A4, B3 and B4 examples. Summary version is that A var = {arg1, arg2}; will match a constructor without explicit, but not one with, whilst A var {arg1, arg2}; will work for both.

Mind you, it's not likely to actually matter in most instances, and we may not actually want the behaviour of explicit.

probinson added inline comments.May 23 2019, 8:58 AM
llvm/include/llvm/Support/FileCheck.h
355 ↗(On Diff #199579)

Okay, C++ is just too weird.
My point, and I did have one, is that explicit on a multi-parameter ctor is generally not used within LLVM, because the syntax where it matters is obviously construction or initialization anyway. LLVM tends to prefer the less-verbose syntax where there's no real benefit to the more-verbose syntax. The explicit was appropriate when this was a one-parameter ctor, and isn't really useful now, so it should be removed.
Or, if @thopre thinks it is useful, say why.

thopre updated this revision to Diff 200993.May 23 2019, 9:06 AM
thopre marked 47 inline comments as done.
  • Address not contested review comments
  • rebase on top of llvm:: prefix removal

Looked at everything up to, but not including the unit tests. Thanks for all the work on this!

And many thanks again for your patience and dedicated reviews. I apologies for all the change which I fail occasionally to carry over when resolving conflict in rebases and for the english and C++ rookie mistakes.

llvm/include/llvm/Support/FileCheck.h
332 ↗(On Diff #199479)

Oops, forgot to change that one. See above for the explanation, can't come up with a more intelligible terminology. pcresyntax(3) manual refer to them as capturing group but I don't think you'll like that any better. Would "regex parenthesized subexpression number" be clearer?

355 ↗(On Diff #199579)

I'm not sure I understand where you're getting at. Are you suggesting that explicit was only useful when the constructor only had the Ty parameter?

52 ↗(On Diff #200888)

No, and there's also some existing function with llvm:: prefix. Is it ok if I schedule a patch to remove all of them just after this patch (i.e. between patch 6 and 7)?

245–246 ↗(On Diff #200888)

I took the naming from the regex (7) manual. It refers to a parenthesis block in a regex that capture a part of the result. E.g. foo([0-9]) would capture the digit 5 if matched against foo5. I don't mind changing it to another more explicit naming (though I like this one) but would be opposed to explaining the term here because I would not know which of the occurences of this name should have the explanation.

Any suggestion?

llvm/lib/Support/FileCheck.cpp
201 ↗(On Diff #200888)

Mmmh, I'll try to schedule a patch to that effect once this patch is in.

633 ↗(On Diff #200888)

Each numeric expression that defines a numeric variable will have its corresponding substring in RegExStr surrounded by parenthesis which will store the text that matches it in an entry of the MatchInfo array below. I thus check that the index (capture parenthesis number) is not higher than the number of entries and then record the text that was match in the numeric variable.

643 ↗(On Diff #200888)

Not only. It could be a too big value (e.g. a value that would require more than 64 bits to represent). That error message covers both cases. Later on it also covers cases where the format of the number does not match the format expected (e.g. the number in the input is 0xAB but the variable has unsigned decimal format).

1713–1714 ↗(On Diff #200888)

I'm not following. The function parseNumericVariable is an instance method so needs an object to invoke it. The reason for it is that it checks for collision between numeric and string variables among other things.

1738 ↗(On Diff #200888)

That would create a function that would be called only here and contain the same code as below, wouldn't it? parseNumericVariable check whether there is a numeric variable at the start of the string it gets as parameter and verifies that this numeric variable is allowed in the context (ie. exists if it is a use, or does not conflict with a string variable if it is a definition). In this example it correctly detects FOO in FOO+2 and strip it from the original string. It is the role of the caller to decide what to do next. parseBinop for instance accepts happily to have an operator after that, but here we only want to have a variable.

This test is effectively saying: does this start with a numeric variable and is there nothing after that, i.e. is this string only a valid numeric variable.

llvm/test/FileCheck/numeric-expression.txt
99 ↗(On Diff #200888)

Argh indeed.

llvm/test/FileCheck/var-scope.txt
18–19 ↗(On Diff #200888)

This test is divided in 3 parts: (i) the first part defines some local and global variable, (ii) the second part checks that they can be matched before a LABEL barrier irregardless of whether --enable-var-scope is used and (iii) the third and last part checks that only global variable can be matched after a LABEL barrier when using --enable-var-scope.

Prior to this test the only way to define a numeric variable was via the -D command-line option hence the first part was actually doing a variable use for numeric variable. With this patch this bit can be turned into numeric variable definition to be consistent. -D command-line option is already tested in another testfile.

thopre updated this revision to Diff 200999.May 23 2019, 9:20 AM

Rebase on new version of llvm:: prefix removal patch

I'm not likely going to have any chance to look at this again by the end of the day, and I'm away all of next week. I'm okay if this gets approved by @probinson and all my outstanding comments have been addressed for this to land without me needing to look further at it.

I'm not likely going to have any chance to look at this again by the end of the day, and I'm away all of next week. I'm okay if this gets approved by @probinson and all my outstanding comments have been addressed for this to land without me needing to look further at it.

Is there any chance you could acknowledge whether the responses I've given to some of your remarks are satisfying?

I've responded to as much as I have time for. Hopefully the rest @probinson will understand what I'm talking about.

llvm/include/llvm/Support/FileCheck.h
392 ↗(On Diff #200999)

size_t here?

245–246 ↗(On Diff #200888)

I think there's a difference between capturing groups and what this code is doing, because the rest of the check isn't a regex usually. Indeed, you could end up with actual groups inside your regex pattern e.g. [[FOO:(a group)]] which could risk confusion. Also, I feel like "group" is a far more commonly understood term in regexes.

I think part of the problem is that this description refers to "RegExStr" which isn't a class member of global variable, so the class is no longer standalone in its interpretation.

If you really think that you should use the same terminology as regular regexes do, call them "CaptureGroup" or "GroupNumber" or something. What is the Paren actually referring to in the variable name below, for example? The name to me implies it's recording a number to do with a parenthesis, but the number's meaning is unclear.

llvm/lib/Support/FileCheck.cpp
1713–1714 ↗(On Diff #200888)

Right, what I'm saying is, could the validation logic move out of parseNumericVariable into a free function, shared by parseNumericVariable and the code here.

thopre updated this revision to Diff 201649.May 28 2019, 6:16 AM
thopre marked 2 inline comments as done.
  • Split parseNumericVariable into parseNumericVariableUse and parseNumericVariableDefinition since little is shared between two usages of this function
  • Change API of parseVariable given that most callers extract the variable name and the trailer from TrailIdx
  • Rename CaptureParen into CaptureParenGroup and hopefully clarify what this is about
thopre marked an inline comment as done.May 28 2019, 6:17 AM
thopre updated this revision to Diff 201979.May 29 2019, 10:40 AM

Make parseNumericVariableDefinition private

thopre marked an inline comment as done.Jun 4 2019, 7:14 AM
thopre added inline comments.
llvm/test/FileCheck/numeric-expression.txt
32–33 ↗(On Diff #194481)

Ping?

Grammar nits and one longer point, which is:
I understand where the "parenthesis group" term came from, but I think it's not appropriate here. While a CHECK line is implicitly a regex, and variables are a kind of back-reference, the syntax for variable references (use or def) is not parentheses and the content is not a "group" in any real sense, and so "parenthesis group" is an un-obvious and confusing term.
Because FileCheck uses square brackets, I propose that the least disruptive change at this point would be to call them "bracket expressions." The "bracket" part is obvious, and "expression" because (a) definitions will have a regex, and (b) numeric variable references are implicitly expressions. (Okay, still a bit of a stretch, but less so that "parenthesis group" IMO.)
WDYT?

Also let me apologize in advance for the review sparsity lately, and going forward. I was at a conference all last week, and I will be on holiday with limited internet through the end of next week.

llvm/include/llvm/Support/FileCheck.h
411 ↗(On Diff #201979)

Comma after "fails"

480 ↗(On Diff #201979)

in which case

thopre added a comment.Jun 4 2019, 9:58 AM

Grammar nits and one longer point, which is:
I understand where the "parenthesis group" term came from, but I think it's not appropriate here. While a CHECK line is implicitly a regex, and variables are a kind of back-reference, the syntax for variable references (use or def) is not parentheses and the content is not a "group" in any real sense, and so "parenthesis group" is an un-obvious and confusing term.
Because FileCheck uses square brackets, I propose that the least disruptive change at this point would be to call them "bracket expressions." The "bracket" part is obvious, and "expression" because (a) definitions will have a regex, and (b) numeric variable references are implicitly expressions. (Okay, still a bit of a stretch, but less so that "parenthesis group" IMO.)
WDYT?

Actually the parenthesis group refers to the regex being generated for a given CHECK line and fed to the regex engine. Let me give an example:

CHECK: foo [[STRVAR:([a-z]+)]] #NUMVAR:

would create the regex "foo (([a-z]+)) ([0-9]+)" and record that the value for STRVAL is captured by the parenthesis group 1 (0 captures what the whole regex matches) and NUMVAR's value is captured by the parenthesis group 3. The Regex::match function takes an array where each entry takes the matched text of the corresponding parenthesis group. The FileCheckNumExprMatch structure records for numeric variable definitions what entries to look for the values in that array so parenthesis group is perfectly adequate (apart from perhaps better naming) IMHO.

Also let me apologize in advance for the review sparsity lately, and going forward. I was at a conference all last week, and I will be on holiday with limited internet through the end of next week.

Sparsity of review is a given when reviewer is from a different company, I adapt my workflow around it. Likewise I try to respond to comments as fast as possible but might not always be able to.

thopre updated this revision to Diff 202969.Jun 4 2019, 10:03 AM
thopre marked an inline comment as done.

Fix typos

Ah ha. The ParenGroup referring to the CHECK line as it has been translated for consumption by the underlying regex package... that was not clear.
In which case the terminology is okay but the commentary could be better, see inline comment.

llvm/include/llvm/Support/FileCheck.h
350 ↗(On Diff #201979)

I think what this is trying to say is:
the pattern "foo[[bar:.*]]baz[[bar]]quux[[bar:.*]]"
is translated into some regex with parentheses,
the second set of parentheses corresponds to the last definition of "bar" on the line,
hence VariableDefs will map "bar" to 2.

Is that right? The translation of check-line-with-square-brackets-with-regex into parenthesized-regex-passed-to-regex-library is not obvious in this comment as written.

thopre updated this revision to Diff 203142.Jun 5 2019, 7:24 AM
thopre marked 3 inline comments as done.

Clarify mapping from string variable definition to parenthesis group number

thopre added inline comments.Jun 5 2019, 7:25 AM
llvm/include/llvm/Support/FileCheck.h
350 ↗(On Diff #201979)

Indeed. The comment was already unclear before this patch series, I think the reason is in most cases the CHECK directive does not contain parenthesis so the square bracket definition maps directly to the parenthesis group number. Hopefully the comment is now clearer.

probinson accepted this revision.Jun 5 2019, 8:16 AM

I am pretty sure all @jhenderson comments have been addressed, and I'm happy, so LGTM.
As I said previously, it may be a while before I get to your next patch.

llvm/include/llvm/Support/FileCheck.h
350 ↗(On Diff #201979)

It looks odd that the translated expression has nothing for the use of [[bar]] in the original, but the comment is much better now, thanks!

This revision is now accepted and ready to land.Jun 5 2019, 8:16 AM

Looks good except for one thing: missing unit tests for parseNumericVariableUse?

thopre updated this revision to Diff 203169.Jun 5 2019, 8:37 AM
thopre marked 2 inline comments as done.

Fix value of RegExStr in comment about VariableDefs and show case of use of variable defined on earlier line.

llvm/include/llvm/Support/FileCheck.h
350 ↗(On Diff #201979)

Indeed, it should contain a backreference to the first definition. When it refers to a definition on a different line the value gets inserted in the regex just before match. I've updated the pattern to show both cases, using <quux value> to represent the value captured on a previous line of QUUX.

thopre added a comment.Jun 5 2019, 9:44 AM

Looks good except for one thing: missing unit tests for parseNumericVariableUse?

This is a private method so I've moved the tests to ParseExpr which tests parseNumericSubstitutionBlock which calls into parseBinop (also private) which calls into parseNumericVariableUse.

jhenderson accepted this revision.Jun 6 2019, 6:14 AM

Looks good except for one thing: missing unit tests for parseNumericVariableUse?

This is a private method so I've moved the tests to ParseExpr which tests parseNumericSubstitutionBlock which calls into parseBinop (also private) which calls into parseNumericVariableUse.

Oh, my bad, I checked before making that comment, and thought it was public. Must have misread the code. LGTM.

This revision was automatically updated to reflect the committed changes.
hliao added a subscriber: hliao.Jan 9 2020, 6:26 AM
hliao added inline comments.
llvm/trunk/docs/CommandGuide/FileCheck.rst
630–632

Just curious why we have such a restriction. Could someone elaborate more?

That makes numeric checks on the same line must be written in split like this

CHECK: #VAR:...
CHECK-SAME: #VAR + 1

instead of

CHECK: #VAR:... #VAR + 1

That's quite verbose.

thopre marked 2 inline comments as done.Jan 9 2020, 6:47 AM
thopre added inline comments.
llvm/trunk/docs/CommandGuide/FileCheck.rst
630–632

This is not currently done for 2 reasons:

  1. The first reason is that the current code assumes the value of numeric expression is known at match time but in the case #VAR: #VAR+1 the value VAR+1 is not known since VAR is not known yet. I have a patch planned to address that later in the patch series [1]
  2. The more serious problem is how to deal with input such as 10 12 13 and the CHECK #VAR: {{.*}} #VAR+1 check. Here you'd want the CHECK to be successful and VAR to be assigned the value 12. However FileCheck with the aforementioned patch will ask the regex engine to match the input against the regex "([0-9]+) ([0-9]+)" and will then check the captured text to verify that the numeric expression is verified. However that regex will capture 10 and 12, being the first 2 numbers. Now given the .* how to look at all other possible matches? We would need help from the regex engine but the API only gives us a match/no match answer with captured text.

[1] https://reviews.llvm.org/D60392 which needs serious rebasing

Hope this answers your question.

hliao added inline comments.Jan 9 2020, 9:18 AM
llvm/trunk/docs/CommandGuide/FileCheck.rst
630–632
  1. I wish [1] could be landed as soon as possible. So far, numeric substitution is inconsistent with the previous string substitution. The later allows the use and def in the same line. E.g. the following works ` 11 11 CHECK: [[VAR:[0-9]+]] [[VAR]] `
  2. I haven't looked into the numeric support in detail. We should reference the original string block in FileCheck. A similar pattern works for the string block: ` 11 22 22 CHECK: [[VAR:[0-9]+]] [[VAR]] `

Any details on why numeric block works differently from string block?

hliao added inline comments.Jan 9 2020, 9:22 AM
llvm/trunk/docs/CommandGuide/FileCheck.rst
630–632

Fix the formatting of the examples aforesaid:

11 11
CHECK: [[VAR:[0-9]+]] [[VAR]]
11 22 22
CHECK: [[VAR:[0-9]+]] [[VAR]]

both of them work for string blocks.

thopre marked 2 inline comments as done.Jan 10 2020, 2:24 AM
thopre added inline comments.
llvm/trunk/docs/CommandGuide/FileCheck.rst
630–632

That's because it gets translated into "([0-9]+) \1" for the regex engine, i.e. string substitution can be expressed in terms of regex while numeric substitution cannot always. The current restriction is to only accept cases that can be expressed at the regex level.

The reason I've kept the patch to life part of the restriction last is that I'm not yet convinced it is a good thing to do. Take my example again but as a CHECK-NOT:

CHECK-NOT: #VAR: #VAR+1

This would be accepted after the last patch but if presented with the input "10 12 13" this would *not* give an error because the corresponding regex ("([0-9]+) ([0-9]+)") would match "10 12" which does not verify the +1 relation and thus the CHECK-NOT would be satisfied, despite the "12 13" bits. Given the potential for errors I'm leaning toward discarding the last patch at the moment.