Page MenuHomePhabricator

FileCheck [6/12]: Introduce numeric variable definition
ClosedPublic

Authored by thopre on Apr 7 2019, 4:20 PM.

Details

Summary

This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch introduces support for defining
numeric variable in a CHECK directive.

This commit introduces support for defining numeric variable from a
litteral value in the input text. Numeric expressions can then use the
variable provided it is on a later line.

Copyright:

  • Linaro (changes up to diff 183612 of revision D55940)
  • GraphCore (changes in later versions of revision D55940 and in new revision created off D55940)

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jhenderson added inline comments.May 23 2019, 6:30 AM
llvm/include/llvm/Support/FileCheck.h
226–227

I still don't understand what a "parenthesized subexpression number is" or what a CaptureParen is.

259

in -> in the

332

I continue to not understand the "capture parenethese number"...

349

This type should match that of the earlier DefLineNumber member.

352–353

There's still an explicit here (but it's worth noting that explicit does still have an effect on multi-parameter constructor).

llvm/lib/Support/FileCheck.cpp
151

variable -> variables

182

same line -> same line as used (?)

201

Don't think this should change in this patch, but maybe this is a hint that parseNumericVariable and other similar functions should be returning an llvm::Error rather than a boolean, and for the errors to be handled much higher up the stack somewhere.

Returning booleans can result in confusion or people missing a check somewhere.

300

Nit: delete this blank line.

334

Parse the numeric...

382–391

I'd change this sentence to be: "Numeric substitution blocks work the same way as string ones, but start with a '#' sign after the double brackets."

412–422

I think it would be slightly clearer to just call this IsDefinition.

511

No need for struct here.

521

Mark string -> Mark the string

522

variable -> variables

524

that -> this

525

loose -> lose
(loose == opposite of tight, lose == opposite of win)

634

See earlier comments about CaptureParen etc. As a result, I don't really follow what this code is doing.

644

I'm not sure you've given the right error message here. Isn't the problem that MatchedValue does not represent a decimal integer? If that's the case, I'd go with an error message saying that.

1705–1706

This seems like a bit of a weird thing to do. Why not just call the function directly? In other words, I suspect the comment should be updated.

1730

"Expr test" needs to be explained more. I assume it's referring to the Expr.empty() line below, but perhaps better would be to outline what we're testing a bit more. Also, if FOO+2=10 is invalid here, but the code passes, maybe that implies we should be calling a different function (which presumably would be a subset of the parseNumericVariable function).

llvm/test/FileCheck/numeric-expression.txt
1

Use double-dash like the test was before here.

11

"in alternate spacing" is a weird phrase. Perhaps "with different spacing"

17–19

Do you ever use these definitions? I don't see them being used, but they should be, as otherwise you aren't really testing anything.

31–32

See above comment about "in alternate spacing". Also, there's a rogue full-stop at the end of line 31, but none at the end of line 32.

79

Put the double-dashes back, please, throughout the test.

98

You've used "PAT" here, presumably as an abbreviation for "pattern", but I don't think that abbreviation makes sense any more given the recent naming changes?

llvm/test/FileCheck/var-scope.txt
0–1

"variable are remain defined"

Not sure what this is saying exactly, but it isn't good English. Probably simply "variables remain..."

13–16

Doesn't adding the ':' change the semantics of this test? Why are you doing that?

probinson added inline comments.May 23 2019, 8:02 AM
llvm/include/llvm/Support/FileCheck.h
352–353

Really? What effect does explicit have on a multi-parameter constructor? My reading of the cppreference.org description is that it used to apply to T var = {args}; but that is list-initialization starting with C++11.

jhenderson added inline comments.May 23 2019, 8:28 AM
llvm/include/llvm/Support/FileCheck.h
352–353

The example on https://en.cppreference.com/w/cpp/language/explicit, shows that copy-list-initialization does not consider conversion constructors marked with explicit, from C++11. Take a look at the A3, A4, B3 and B4 examples. Summary version is that A var = {arg1, arg2}; will match a constructor without explicit, but not one with, whilst A var {arg1, arg2}; will work for both.

Mind you, it's not likely to actually matter in most instances, and we may not actually want the behaviour of explicit.

probinson added inline comments.May 23 2019, 8:58 AM
llvm/include/llvm/Support/FileCheck.h
352–353

Okay, C++ is just too weird.
My point, and I did have one, is that explicit on a multi-parameter ctor is generally not used within LLVM, because the syntax where it matters is obviously construction or initialization anyway. LLVM tends to prefer the less-verbose syntax where there's no real benefit to the more-verbose syntax. The explicit was appropriate when this was a one-parameter ctor, and isn't really useful now, so it should be removed.
Or, if @thopre thinks it is useful, say why.

thopre updated this revision to Diff 200993.May 23 2019, 9:06 AM
thopre marked 47 inline comments as done.
  • Address not contested review comments
  • rebase on top of llvm:: prefix removal

Looked at everything up to, but not including the unit tests. Thanks for all the work on this!

And many thanks again for your patience and dedicated reviews. I apologies for all the change which I fail occasionally to carry over when resolving conflict in rebases and for the english and C++ rookie mistakes.

llvm/include/llvm/Support/FileCheck.h
52

No, and there's also some existing function with llvm:: prefix. Is it ok if I schedule a patch to remove all of them just after this patch (i.e. between patch 6 and 7)?

226–227

I took the naming from the regex (7) manual. It refers to a parenthesis block in a regex that capture a part of the result. E.g. foo([0-9]) would capture the digit 5 if matched against foo5. I don't mind changing it to another more explicit naming (though I like this one) but would be opposed to explaining the term here because I would not know which of the occurences of this name should have the explanation.

Any suggestion?

332

Oops, forgot to change that one. See above for the explanation, can't come up with a more intelligible terminology. pcresyntax(3) manual refer to them as capturing group but I don't think you'll like that any better. Would "regex parenthesized subexpression number" be clearer?

352–353

I'm not sure I understand where you're getting at. Are you suggesting that explicit was only useful when the constructor only had the Ty parameter?

llvm/lib/Support/FileCheck.cpp
201

Mmmh, I'll try to schedule a patch to that effect once this patch is in.

634

Each numeric expression that defines a numeric variable will have its corresponding substring in RegExStr surrounded by parenthesis which will store the text that matches it in an entry of the MatchInfo array below. I thus check that the index (capture parenthesis number) is not higher than the number of entries and then record the text that was match in the numeric variable.

644

Not only. It could be a too big value (e.g. a value that would require more than 64 bits to represent). That error message covers both cases. Later on it also covers cases where the format of the number does not match the format expected (e.g. the number in the input is 0xAB but the variable has unsigned decimal format).

1705–1706

I'm not following. The function parseNumericVariable is an instance method so needs an object to invoke it. The reason for it is that it checks for collision between numeric and string variables among other things.

1730

That would create a function that would be called only here and contain the same code as below, wouldn't it? parseNumericVariable check whether there is a numeric variable at the start of the string it gets as parameter and verifies that this numeric variable is allowed in the context (ie. exists if it is a use, or does not conflict with a string variable if it is a definition). In this example it correctly detects FOO in FOO+2 and strip it from the original string. It is the role of the caller to decide what to do next. parseBinop for instance accepts happily to have an operator after that, but here we only want to have a variable.

This test is effectively saying: does this start with a numeric variable and is there nothing after that, i.e. is this string only a valid numeric variable.

llvm/test/FileCheck/numeric-expression.txt
98

Argh indeed.

llvm/test/FileCheck/var-scope.txt
13–16

This test is divided in 3 parts: (i) the first part defines some local and global variable, (ii) the second part checks that they can be matched before a LABEL barrier irregardless of whether --enable-var-scope is used and (iii) the third and last part checks that only global variable can be matched after a LABEL barrier when using --enable-var-scope.

Prior to this test the only way to define a numeric variable was via the -D command-line option hence the first part was actually doing a variable use for numeric variable. With this patch this bit can be turned into numeric variable definition to be consistent. -D command-line option is already tested in another testfile.

thopre updated this revision to Diff 200999.May 23 2019, 9:20 AM

Rebase on new version of llvm:: prefix removal patch

I'm not likely going to have any chance to look at this again by the end of the day, and I'm away all of next week. I'm okay if this gets approved by @probinson and all my outstanding comments have been addressed for this to land without me needing to look further at it.

I'm not likely going to have any chance to look at this again by the end of the day, and I'm away all of next week. I'm okay if this gets approved by @probinson and all my outstanding comments have been addressed for this to land without me needing to look further at it.

Is there any chance you could acknowledge whether the responses I've given to some of your remarks are satisfying?

I've responded to as much as I have time for. Hopefully the rest @probinson will understand what I'm talking about.

llvm/include/llvm/Support/FileCheck.h
226–227

I think there's a difference between capturing groups and what this code is doing, because the rest of the check isn't a regex usually. Indeed, you could end up with actual groups inside your regex pattern e.g. [[FOO:(a group)]] which could risk confusion. Also, I feel like "group" is a far more commonly understood term in regexes.

I think part of the problem is that this description refers to "RegExStr" which isn't a class member of global variable, so the class is no longer standalone in its interpretation.

If you really think that you should use the same terminology as regular regexes do, call them "CaptureGroup" or "GroupNumber" or something. What is the Paren actually referring to in the variable name below, for example? The name to me implies it's recording a number to do with a parenthesis, but the number's meaning is unclear.

354

size_t here?

llvm/lib/Support/FileCheck.cpp
1705–1706

Right, what I'm saying is, could the validation logic move out of parseNumericVariable into a free function, shared by parseNumericVariable and the code here.

thopre updated this revision to Diff 201649.May 28 2019, 6:16 AM
thopre marked 2 inline comments as done.
  • Split parseNumericVariable into parseNumericVariableUse and parseNumericVariableDefinition since little is shared between two usages of this function
  • Change API of parseVariable given that most callers extract the variable name and the trailer from TrailIdx
  • Rename CaptureParen into CaptureParenGroup and hopefully clarify what this is about
thopre marked an inline comment as done.May 28 2019, 6:17 AM
thopre updated this revision to Diff 201979.May 29 2019, 10:40 AM

Make parseNumericVariableDefinition private

thopre marked an inline comment as done.Jun 4 2019, 7:14 AM
thopre added inline comments.
llvm/test/FileCheck/numeric-expression.txt
21–22

Ping?

Grammar nits and one longer point, which is:
I understand where the "parenthesis group" term came from, but I think it's not appropriate here. While a CHECK line is implicitly a regex, and variables are a kind of back-reference, the syntax for variable references (use or def) is not parentheses and the content is not a "group" in any real sense, and so "parenthesis group" is an un-obvious and confusing term.
Because FileCheck uses square brackets, I propose that the least disruptive change at this point would be to call them "bracket expressions." The "bracket" part is obvious, and "expression" because (a) definitions will have a regex, and (b) numeric variable references are implicitly expressions. (Okay, still a bit of a stretch, but less so that "parenthesis group" IMO.)
WDYT?

Also let me apologize in advance for the review sparsity lately, and going forward. I was at a conference all last week, and I will be on holiday with limited internet through the end of next week.

llvm/include/llvm/Support/FileCheck.h
379

Comma after "fails"

442

in which case

thopre added a comment.Jun 4 2019, 9:58 AM

Grammar nits and one longer point, which is:
I understand where the "parenthesis group" term came from, but I think it's not appropriate here. While a CHECK line is implicitly a regex, and variables are a kind of back-reference, the syntax for variable references (use or def) is not parentheses and the content is not a "group" in any real sense, and so "parenthesis group" is an un-obvious and confusing term.
Because FileCheck uses square brackets, I propose that the least disruptive change at this point would be to call them "bracket expressions." The "bracket" part is obvious, and "expression" because (a) definitions will have a regex, and (b) numeric variable references are implicitly expressions. (Okay, still a bit of a stretch, but less so that "parenthesis group" IMO.)
WDYT?

Actually the parenthesis group refers to the regex being generated for a given CHECK line and fed to the regex engine. Let me give an example:

CHECK: foo [[STRVAR:([a-z]+)]] #NUMVAR:

would create the regex "foo (([a-z]+)) ([0-9]+)" and record that the value for STRVAL is captured by the parenthesis group 1 (0 captures what the whole regex matches) and NUMVAR's value is captured by the parenthesis group 3. The Regex::match function takes an array where each entry takes the matched text of the corresponding parenthesis group. The FileCheckNumExprMatch structure records for numeric variable definitions what entries to look for the values in that array so parenthesis group is perfectly adequate (apart from perhaps better naming) IMHO.

Also let me apologize in advance for the review sparsity lately, and going forward. I was at a conference all last week, and I will be on holiday with limited internet through the end of next week.

Sparsity of review is a given when reviewer is from a different company, I adapt my workflow around it. Likewise I try to respond to comments as fast as possible but might not always be able to.

thopre updated this revision to Diff 202969.Jun 4 2019, 10:03 AM
thopre marked an inline comment as done.

Fix typos

Ah ha. The ParenGroup referring to the CHECK line as it has been translated for consumption by the underlying regex package... that was not clear.
In which case the terminology is okay but the commentary could be better, see inline comment.

llvm/include/llvm/Support/FileCheck.h
326

I think what this is trying to say is:
the pattern "foo[[bar:.*]]baz[[bar]]quux[[bar:.*]]"
is translated into some regex with parentheses,
the second set of parentheses corresponds to the last definition of "bar" on the line,
hence VariableDefs will map "bar" to 2.

Is that right? The translation of check-line-with-square-brackets-with-regex into parenthesized-regex-passed-to-regex-library is not obvious in this comment as written.

thopre updated this revision to Diff 203142.Jun 5 2019, 7:24 AM
thopre marked 3 inline comments as done.

Clarify mapping from string variable definition to parenthesis group number

thopre added inline comments.Jun 5 2019, 7:25 AM
llvm/include/llvm/Support/FileCheck.h
326

Indeed. The comment was already unclear before this patch series, I think the reason is in most cases the CHECK directive does not contain parenthesis so the square bracket definition maps directly to the parenthesis group number. Hopefully the comment is now clearer.

probinson accepted this revision.Jun 5 2019, 8:16 AM

I am pretty sure all @jhenderson comments have been addressed, and I'm happy, so LGTM.
As I said previously, it may be a while before I get to your next patch.

llvm/include/llvm/Support/FileCheck.h
326

It looks odd that the translated expression has nothing for the use of [[bar]] in the original, but the comment is much better now, thanks!

This revision is now accepted and ready to land.Jun 5 2019, 8:16 AM

Looks good except for one thing: missing unit tests for parseNumericVariableUse?

thopre updated this revision to Diff 203169.Jun 5 2019, 8:37 AM
thopre marked 2 inline comments as done.

Fix value of RegExStr in comment about VariableDefs and show case of use of variable defined on earlier line.

llvm/include/llvm/Support/FileCheck.h
326

Indeed, it should contain a backreference to the first definition. When it refers to a definition on a different line the value gets inserted in the regex just before match. I've updated the pattern to show both cases, using <quux value> to represent the value captured on a previous line of QUUX.

thopre added a comment.Jun 5 2019, 9:44 AM

Looks good except for one thing: missing unit tests for parseNumericVariableUse?

This is a private method so I've moved the tests to ParseExpr which tests parseNumericSubstitutionBlock which calls into parseBinop (also private) which calls into parseNumericVariableUse.

jhenderson accepted this revision.Jun 6 2019, 6:14 AM

Looks good except for one thing: missing unit tests for parseNumericVariableUse?

This is a private method so I've moved the tests to ParseExpr which tests parseNumericSubstitutionBlock which calls into parseBinop (also private) which calls into parseNumericVariableUse.

Oh, my bad, I checked before making that comment, and thought it was public. Must have misread the code. LGTM.

This revision was automatically updated to reflect the committed changes.