Page MenuHomePhabricator

FileCheck: Add support for numeric variables and expressions
AbandonedPublic

Authored by thopre on Jul 9 2018, 9:36 AM.

Details

Summary

Sometimes there's a need to verify output which contain several numeric
values that are related by a numeric expression. For example, one might
want to check that two numbers are consecutive but accept any value for
the smallest of the two.

This patch introduces support for numeric variables and expressions to
express this sort of constraint. A user can then define variables to
have a numeric value and write numeric expression using those variable.
A matching format specifier allows to match numbers in a variety of
format: signed and unsigned decimal numbers as well as hex number in
lower or capital case. For example, matching 2 consecutive signed
decimal integers separated by a space is done via:

[[%d,#N:]] #N+1

The syntax is modeled after that of use and definition of the already
supported variables flavour (renamed pattern variable for clarity) and
pseudo variable by adding a # just after the double opening square
bracket [[. Full details of the syntax can be found in the updated
docs/CommandGuide/FileCheck.rst.

One notable pitfall is that CHECK pattern with numeric expression using
variable defined on the same line might fail to match despite there be a
line which satisfy the conditions expressed by those numeric expression.
This is due to lack of needed support from the regular expression engine
that FileCheck relies on and is documented in the updated
docs/CommandGuide/FileCheck.rst as well as possible workaround.

Copyright:

  • Linaro (changes up to diff 183612 of revision D55940)
  • GraphCore (changes in later versions)

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
grimar added inline comments.Jul 11 2018, 1:28 AM
utils/FileCheck/FileCheck.cpp
373 ↗(On Diff #154626)

You are doing a lookup in VariableDefs 3 times here. Please avoid.

Thanks all for the valuable comments. It seems clear that more discussion is needed on the syntax before doing any code change. I'll think about it and post a proposal to the ML.

Maybe this should be a new class of "numeric variables" with distinct syntax, e.g., [[#VAR]] or something, with an implicit regex of [[:digit:]]+ instead of making the test-writer use a valid regex.

I'm not sure I'm following you. Do you mean using a different syntax when (i) defining numeric variable or (ii) when using numeric expression?
I also do not understand how this would prevent the test-writer from having to use a valid regex. Could you give an example of the problem you see?

Maybe this should be a new class of "numeric variables" with distinct syntax, e.g., [[#VAR]] or something, with an implicit regex of [[:digit:]]+ instead of making the test-writer use a valid regex.

I'm not sure I'm following you. Do you mean using a different syntax when (i) defining numeric variable or (ii) when using numeric expression?
I also do not understand how this would prevent the test-writer from having to use a valid regex. Could you give an example of the problem you see?

Oh you mean using #VAR to define a numeric variable, therefore avoiding the need to have a regular expression. Yes, I like the idea, especially since it can work with the idea of a syntax to specify the base in which to interpret things as suggested by @jhenderson. What it needs though is a way to distinguish between definition and use though. I'll think about it.

I case people are interested I have also implemented something similar in our fork of llvm. I chose [[@EXPR var + 0x42]] as the syntax. I also added some functions such as [[@EXPR hex(VAR)]].
Variables can either by referenced by name or if they conflict with one of the builtin functions using ${x} syntax: [[@EXPR toupper(hex(${hex}))]]
Some examples of what it can do can be seen here: https://github.com/CTSRD-CHERI/llvm/blob/master/test/FileCheck/expressions.txt

The implementation is IMO quite ugly (mostly a stripped down copy of lld's linker script parser) so I never got around to cleaning it up for upstream submission.

I case people are interested I have also implemented something similar in our fork of llvm. I chose [[@EXPR var + 0x42]] as the syntax. I also added some functions such as [[@EXPR hex(VAR)]].
Variables can either by referenced by name or if they conflict with one of the builtin functions using ${x} syntax: [[@EXPR toupper(hex(${hex}))]]

Simple functions are also on my wish-list! The main ones we care about are converting to and from hex numbers, and being able to do additions and subtractions. But I think just the basic integer and hex reading and additions would be a good start.

I case people are interested I have also implemented something similar in our fork of llvm. I chose [[@EXPR var + 0x42]] as the syntax. I also added some functions such as [[@EXPR hex(VAR)]].
Variables can either by referenced by name or if they conflict with one of the builtin functions using ${x} syntax: [[@EXPR toupper(hex(${hex}))]]

Simple functions are also on my wish-list! The main ones we care about are converting to and from hex numbers, and being able to do additions and subtractions. But I think just the basic integer and hex reading and additions would be a good start.

I added hex, oct, bin, dec + toupper/tolower (https://github.com/CTSRD-CHERI/llvm/blob/master/utils/FileCheck/FileCheck.cpp#L697) but I think I only ever ended up using dec/hex/toupper in the tests that I've written since.

tra removed a reviewer: tra.Jul 20 2018, 3:20 PM
tra added a subscriber: tra.
rnk added a subscriber: rnk.Aug 16 2018, 2:30 PM

FWIW I had a use case for this recently, and found it hadn't landed. Is there anything blocking that you need help with?

In D49084#1203189, @rnk wrote:

FWIW I had a use case for this recently, and found it hadn't landed. Is there anything blocking that you need help with?

My apologies for not replying earlier, didn't see the comment until now. Nothing blocking, splitting my time between GNU and LLVM work so progress has been slower than expected. I am nearly done now, I expect to publish a completely new version sometimes this week. Stay tuned!

Best regards.

thopre updated this revision to Diff 179094.Dec 20 2018, 9:50 AM

FileCheck: Add support for numeric variables and expressions

Summary:
Sometimes there's a need to verify output which contain several numeric
values that are related by a numeric expression. For example, one might
want to check that two numbers are consecutive but accept any value for
the smallest of the two.

This patch introduces support for numeric variables and expressions to
express this sort of constraint. A user can then define variables to
have a numeric value and write numeric expression using those variable.
A matching format specifier allows to match numbers in a variety of
format: signed and unsigned decimal numbers as well as hex number in
lower or capital case. For example, matching 2 consecutive signed
decimal integers separated by a space is done via:

[[%d,#N:]] #N+1

The syntax is modeled after that of use and definition of the already
supported variables flavour (renamed pattern variable for clarity) and
pseudo variable by adding a # just after the double opening square
bracket [[. Full details of the syntax can be found in the updated
docs/CommandGuide/FileCheck.rst.

One notable pitfall is that CHECK pattern with numeric expression using
variable defined on the same line might fail to match despite there be a
line which satisfy the conditions expressed by those numeric expression.
This is due to lack of needed support from the regular expression engine
that FileCheck relies on and is documented in the updated
docs/CommandGuide/FileCheck.rst as well as possible workaround.

Reviewers: probinson, arichardson, jhenderson, rnk, jdenny

Subscribers: llvm-commits

thopre retitled this revision from FileCheck: Add support for variable expressions to FileCheck: Add support for numeric variables and expressions.Dec 20 2018, 9:54 AM
thopre edited the summary of this revision. (Show Details)

I am willing to review this; however it is a very large patch, which will take time to go through. Also, after tomorrow, I am on holiday until 3 January.

Some parts of it look like they could be split off into separate patches; for example, the command-line option handling looks like it could be done separately. I wonder if the refactoring of the @LINE handling into the new expression handling could also be separated out.
If not, that's okay, it will just make the review take a bit longer.

As a first step, I have a structural suggestion for the documentation. Once that is done, I will probably have some more detailed comments on wording, but they can wait for now.

docs/CommandGuide/FileCheck.rst
570 ↗(On Diff #179094)

The topic and syntax are more complicated than anything else FileCheck does, so I think it would be better separate the syntax description into two parts: using numeric expressions, and defining numeric variables. The syntax for a numeric expression would be
[[# <constraint> <expr> ]]
and then you can describe how to define the variables,
[[# <fmtspec>,<NUMVAR>: <constraint> <expr> ]]
which builds on the syntax for using variables. I think this ordering will be easier to explain and to understand.

596 ↗(On Diff #179094)

Please give examples of what these lines will match, and what they will not match.

thopre updated this revision to Diff 179194.Dec 20 2018, 4:08 PM

Document how to set numeric variable with -D option

Thanks for updating this patch. It will be very useful!

include/llvm/Support/FileCheck.h
225 ↗(On Diff #179194)

Does this need to be a shared_ptr? Could it be a unique_ptr instead?

lib/Support/FileCheck.cpp
81 ↗(On Diff #179194)

Could this return a llvm::Optional<std::string> to avoid the output-parameter?

1076 ↗(On Diff #179194)

This could be deleted if you initialize CaptureParenNeeded to false at the beginning of the scope. It would also ensure that the variable is never uninitialized.

Thanks for updating this patch. It will be very useful!

+1

thopre marked 5 inline comments as done.Dec 21 2018, 2:08 AM
thopre added inline comments.
docs/CommandGuide/FileCheck.rst
514–518 ↗(On Diff #154626)

It changes for 2 reasons:

  • it redefines @LINE as a (pseudo) numeric variable as opposed to a pattern variable since it behaves like a numeric variable
  • it removes the explanation about offset since it comes naturally from being a numeric variable and then more (eg. #@LINE + OFFSETVAR).

In short, because we now have numeric variable it is easier to say @LINE is one and say what it evaluates to and leave what can be done to it to the general description of numeric variable.

include/llvm/Support/FileCheck.h
225 ↗(On Diff #179194)

Several numeric expressions can refer to the same numeric variable in which case you want the reference counting so that when calling the destructor for those numeric expression delete is called only once on that variable.

lib/Support/FileCheck.cpp
81 ↗(On Diff #179194)

Yes indeed and perhaps in a few other places as well. Thanks

This change is so large that it's going to take quite a while to review. However, I've made a few initial comments on the header. I'll come back to this next week and continue from there.

include/llvm/Support/FileCheck.h
41 ↗(On Diff #179194)

Nit: Code -> code

44 ↗(On Diff #179194)

Why the declaration here?

52 ↗(On Diff #179194)

Nit: print -> printed

57 ↗(On Diff #179194)

I don't really understand this comment. I think you are saying that this "Set" variable indicates whether the format is explicit or not, right? If so, rename the variable to something like IsExplicit or similar.

60 ↗(On Diff #179194)

Several -> If set, there are ...

81 ↗(On Diff #179194)

Don't namespace this. Just make the enum inside a scoped enum (i.e. enum class).

82–84 ↗(On Diff #179194)

I think the general style in LLVM is single space after full stop (applies here and elsewhere).

"is a numeric expression (and thus value ...) available" -> "a numeric expression is available (and thus the value ...)"

95–96 ↗(On Diff #179194)

I'm not sure the second sentence here is necessary.

97 ↗(On Diff #179194)

I'm not sure you need to name all your classes "FileCheckXXX". You can shorten their names by dropping the FileCheck part, I reckon.

99–103 ↗(On Diff #179194)

Why not just always store in a single uint64_t field called Value and convert to signed when requested?

Also, don't unnecessarily abbreviate variables like this.

131–132 ↗(On Diff #179194)

I don't think you need this comment. operator== is fairly self-explanatory as to its purpose...!

139 ↗(On Diff #179194)

LLVM style is for camel-case, starting with lower-case letters for method and function names. Also, don't bother commenting if the function purpose is obvious from its name.

145 ↗(On Diff #179194)

What is a tentative value?

169 ↗(On Diff #179194)

You don't need the "struct" here or in any of the other function arguments or variable declarations etc.

169–170 ↗(On Diff #179194)

This should probably return an Expected<std::string> or Optional<std::string> and not use an output parameter.

172–180 ↗(On Diff #179194)

Could these two just be overloads of operator+ and operator-?

183–184 ↗(On Diff #179194)

Only the first sentence here is needed. The other two are implicit by the fact that this is a virtual class with abstract methods.

191–192 ↗(On Diff #179194)

What does "null" mean in this context? We're not dealing with pointers... Assuming it's some kind of no-format, perhaps calling it by the corresponding variable name might be better.

Again, the last sentence is unnecessary, since it's implicit in the "= 0" part of the function declaration.

195–198 ↗(On Diff #179194)

This isn't the behaviour I'd expect a function of this name to do. I'd expect it to return a vector of names, not append to an input parameter.

Other nit in the comment: overrided -> overridden.

201 ↗(On Diff #179194)

Nit: litteral -> literal

209–213 ↗(On Diff #179194)

Similar to the FileCheckNumExprVal class, I don't think you need both constructors. One should suffice.

215–217 ↗(On Diff #179194)

I'd drop the first sentence and rephrase the second:
"Return the node itself and set \p Fmt to none, since literals do not have conversion formats" (where none is a placeholder for a better word than null).

227–228 ↗(On Diff #179194)

from x 2 -> by

232 ↗(On Diff #179194)

It's a little unclear what is meant by "matching format" here. Is it the format this expression can match against (i.e. hex/decimal etc), or what?

236 ↗(On Diff #179194)

Like struct, you don't nee the enum here.

246 ↗(On Diff #179194)

for numeric -> for a numeric

247–248 ↗(On Diff #179194)

eg. -> e.g.

I'd drop the bit in parentheses at the end of the sentence. If somebody starts using this for another person, this comment will almost certainly go stale.

252–253 ↗(On Diff #179194)

Drop the reference to the shared_ptr in this sentence, and change the second sentence to something like "The pointer is guaranteed to be valid as long as this object is."

256 ↗(On Diff #179194)

What is meant by "matched" in this context?

268 ↗(On Diff #179194)

Remove the first "can"

274 ↗(On Diff #179194)

Rather than returning a bool, and passing in a parameter that will be set, this and similar functions should probably return an Expected<T> or Optional<T>.

294–299 ↗(On Diff #179194)

Why are these two different functions?

thopre updated this revision to Diff 183612.Jan 25 2019, 1:31 PM
thopre marked 42 inline comments as done.

Besides typo and wording improvements:

  • Merge Eval and SetValueFromEval
  • Use union to store an int64_t or uint64_t in FileCheckNumExprVal
  • Turn Add and Sub into operator overloading.
thopre added inline comments.Jan 25 2019, 1:31 PM
include/llvm/Support/FileCheck.h
81 ↗(On Diff #179194)

I was following what is done for the Check enum. Should that enum be changed (in a separate patch)?

82–84 ↗(On Diff #179194)

I saw many occurrences of 2 spaces after dot in FileCheck and thought it had a different code style. Upon inspection I suppose it must be mistakes as single space is more common:

$ git grep -E -c "\. [^ ]" fcefe03bf94005e045499ed6650212cd1e9bed01 llvm/lib/Support/FileCheck.cpp
fcefe03bf94005e045499ed6650212cd1e9bed01:llvm/lib/Support/FileCheck.cpp:20
$ git grep -E -c "\. [^ ]" fcefe03bf94005e045499ed6650212cd1e9bed01 llvm/lib/Support/FileCheck.cpp
fcefe03bf94005e045499ed6650212cd1e9bed01:llvm/lib/Support/FileCheck.cpp:12

$ git grep -E -c "\. [^ ]" fcefe03bf94005e045499ed6650212cd1e9bed01 llvm/include/llvm/Support/FileCheck.h
fcefe03bf94005e045499ed6650212cd1e9bed01:llvm/include/llvm/Support/FileCheck.h:4
$ git grep -E -c "\. [^ ]" fcefe03bf94005e045499ed6650212cd1e9bed01 llvm/include/llvm/Support/FileCheck.h
fcefe03bf94005e045499ed6650212cd1e9bed01:llvm/include/llvm/Support/FileCheck.h:3

97 ↗(On Diff #179194)

The FileCheck prefix was added when FileCheck logic was moved into a library. See comment from Bogner in https://reviews.llvm.org/D50283

99–103 ↗(On Diff #179194)

Since unsigned to signed conversion is implementation defined and reinterpret_cast does not work between integer types, I would have to do the two-complement manually. I've used a union instead, hope you like it better.

139 ↗(On Diff #179194)

There are actually more examples of camel-case starting with capital letter: all the CheckFoo (eg. CheckSame), the PrintFoo (eg. PrintMatch).

$ git grep -E -c "[a-zA-Z]+ [a-z]+[A-Z][a-zA-Z]+\(" fcefe03bf94005e045499ed6650212cd1e9bed01 llvm/include/llvm/Support/FileCheck.h
fcefe03bf94005e045499ed6650212cd1e9bed01:llvm/include/llvm/Support/FileCheck.h:7
$ git grep -E -c "[a-zA-Z]+ [a-z]+[A-Z][a-zA-Z]+\(" fcefe03bf94005e045499ed6650212cd1e9bed01 llvm/lib/Support/FileCheck.cpp
<no result>

$ git grep -E -c "[a-zA-Z]+ [A-Z][a-z]+[A-Z][a-zA-Z]+\(" fcefe03bf94005e045499ed6650212cd1e9bed01 llvm/include/llvm/Support/FileCheck.h
fcefe03bf94005e045499ed6650212cd1e9bed01:llvm/include/llvm/Support/FileCheck.h:17
$ git grep -E -c "[a-zA-Z]+ [A-Z][a-z]+[A-Z][a-zA-Z]+\(" fcefe03bf94005e045499ed6650212cd1e9bed01 llvm/lib/Support/FileCheck.cpp
fcefe03bf94005e045499ed6650212cd1e9bed01:llvm/lib/Support/FileCheck.cpp:13

Do you still want me to fix that?

209–213 ↗(On Diff #179194)

My same concern applies here: I can easily "reinterpret_cast" to a uint64_t parameter when calling the constructor and pass an extra Signed parameter but then to do signed arithmetic I cannot think of an *easy* way to reinterpret it back into an int64_t without tripping on undefined or implementation defined behaviour.

294–299 ↗(On Diff #179194)

I can't remember now except that I needed it at some point. It is clearly not needed now, maybe as a result of having the known phase info. Anyway, merged into one.

thopre edited the summary of this revision. (Show Details)Jan 25 2019, 1:36 PM
thopre edited the summary of this revision. (Show Details)Feb 5 2019, 3:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 5 2019, 3:31 AM

For some reason the new diff has lost all of the inline comments...

I was following what is done for the Check enum. Should that enum be changed (in a separate patch)?

Yes, probably.

The FileCheck prefix was added when FileCheck logic was moved into a library. See comment from Bogner in https://reviews.llvm.org/D50283

Ah, I hadn't noticed that it had moved to Support. Okay, this is fine as is.

There are actually more examples of camel-case starting with capital letter: all the CheckFoo (eg. CheckSame), the PrintFoo (eg. PrintMatch).

$ git grep -E -c "[a-zA-Z]+ [a-z]+[A-Z][a-zA-Z]+\(" fcefe03bf94005e045499ed6650212cd1e9bed01 llvm/include/llvm/Support/FileCheck.h
fcefe03bf94005e045499ed6650212cd1e9bed01:llvm/include/llvm/Support/FileCheck.h:7
$ git grep -E -c "[a-zA-Z]+ [a-z]+[A-Z][a-zA-Z]+\(" fcefe03bf94005e045499ed6650212cd1e9bed01 llvm/lib/Support/FileCheck.cpp
<no result>

$ git grep -E -c "[a-zA-Z]+ [A-Z][a-z]+[A-Z][a-zA-Z]+\(" fcefe03bf94005e045499ed6650212cd1e9bed01 llvm/include/llvm/Support/FileCheck.h
fcefe03bf94005e045499ed6650212cd1e9bed01:llvm/include/llvm/Support/FileCheck.h:17
$ git grep -E -c "[a-zA-Z]+ [A-Z][a-z]+[A-Z][a-zA-Z]+\(" fcefe03bf94005e045499ed6650212cd1e9bed01 llvm/lib/Support/FileCheck.cpp
fcefe03bf94005e045499ed6650212cd1e9bed01:llvm/lib/Support/FileCheck.cpp:13

Do you still want me to fix that?

As there are already a mixture, we should follow the LLVM style for the new code, especially given the amount of new code being added. If it was all a different style there might be a different case to be made.


I definitely think that you need to fragment this up into small incremental patches, as I've found I'm not getting enough time to digest the whole thing at once (and I'm guessing the same goes for other reviewers). If you could try to find incremental ways of doing things that would be great. One example would be to add the command-line switches as non-functional switches first (make them hidden until you're ready for them to be published), then incrementally add features (and expand testing) to them. Similarly accepting some forms of the pattern etc could be done in one change (but do nothing with them) and then add more functionality over time.

llvm/include/llvm/Support/FileCheck.h
109

Nit: ie -> i.e.

129

Nit: other -> Other

174–177

Why not make these member functions? I.e:

FileCheckNumExprVal operator+(const FileChcekNumExprVal &Other);
FileCheckNumExprVal operator-(const FileChcekNumExprVal &Other);

Also Nit: lhs/rhs should be LHS/RHS (but this is moot if switching to member operators).

191

Nit: overriden -> overridden.

205–208

My same concern applies here: I can easily "reinterpret_cast" to a uint64_t parameter when calling the constructor and pass an extra Signed parameter but then to do signed arithmetic I cannot think of an *easy* way to reinterpret it back into an int64_t without tripping on undefined or implementation defined behaviour.

Okay, fair enough. I wonder if they could be templated then instead? I.e:

template <typename T> FileCheckNumExprLiteral(T Val) : Value(Val) {}

Also, should these be explicit, or are you deliberately allowing implicit conversions?

221–223

The second sentence is hard to parse. Could you maybe break it up into the different bits, as separate sentences?

226

Nit: eg. -> e.g.

245

is -> is alive.

264

correspond -> corresponds

thopre abandoned this revision.Apr 7 2019, 4:58 PM

I definitely think that you need to fragment this up into small incremental patches, as I've found I'm not getting enough time to digest the whole thing at once (and I'm guessing the same goes for other reviewers). If you could try to find incremental ways of doing things that would be great. One example would be to add the command-line switches as non-functional switches first (make them hidden until you're ready for them to be published), then incrementally add features (and expand testing) to them. Similarly accepting some forms of the pattern etc could be done in one change (but do nothing with them) and then add more functionality over time.

Done in patch series starting from https://reviews.llvm.org/D60381. Thus closing this revision. I've just noticed there's a few of your comments (casing of function notably) I haven't incorporated in the new patch series. I'll address them now.