Page MenuHomePhabricator

FileCheck: Add support for numeric variables and expressions
Needs ReviewPublic

Authored by thopre on Jul 9 2018, 9:36 AM.

Details

Summary

Sometimes there's a need to verify output which contain several numeric
values that are related by a numeric expression. For example, one might
want to check that two numbers are consecutive but accept any value for
the smallest of the two.

This patch introduces support for numeric variables and expressions to
express this sort of constraint. A user can then define variables to
have a numeric value and write numeric expression using those variable.
A matching format specifier allows to match numbers in a variety of
format: signed and unsigned decimal numbers as well as hex number in
lower or capital case. For example, matching 2 consecutive signed
decimal integers separated by a space is done via:

[[%d,#N:]] #N+1

The syntax is modeled after that of use and definition of the already
supported variables flavour (renamed pattern variable for clarity) and
pseudo variable by adding a # just after the double opening square
bracket [[. Full details of the syntax can be found in the updated
docs/CommandGuide/FileCheck.rst.

One notable pitfall is that CHECK pattern with numeric expression using
variable defined on the same line might fail to match despite there be a
line which satisfy the conditions expressed by those numeric expression.
This is due to lack of needed support from the regular expression engine
that FileCheck relies on and is documented in the updated
docs/CommandGuide/FileCheck.rst as well as possible workaround.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jhenderson added inline comments.Jul 10 2018, 3:28 AM
utils/FileCheck/FileCheck.cpp
428 ↗(On Diff #154626)

ie -> i.e.

435 ↗(On Diff #154626)

val -> Val

I'd actually avoid using both val and Value in the same scope, as it gets very confusing. Please come up with a different name for one of them. E.g. Value could become ValueString.

470–472 ↗(On Diff #154626)

op -> Op
offset -> Offset

490 ↗(On Diff #154626)

variable Name -> variable \p Name (I think?)

497 ↗(On Diff #154626)

index -> Index

500 ↗(On Diff #154626)

isExpression -> IsExpression

502–505 ↗(On Diff #154626)

same line due -> same line, due
Remove extra space before "Therefore"
explicitely -> explicitly

509 ↗(On Diff #154626)

exist -> exists

531–532 ↗(On Diff #154626)

same-line use are -> either "same-line uses are" or "same-line use is"

533 ↗(On Diff #154626)

git? Please use a more descriptive name.

Also, should be capitalised.

569–570 ↗(On Diff #154626)

I don't think you need the initialiser here.

575–581 ↗(On Diff #154626)

Are we referring to a specific variable here? If so, please use "the variable" not just "variable". If not, please use "a variable". If there are multiple variables the first sentences refer to, please use the plural to make it clear.
"First pass" -> "The first pass". Similarly "Second pass" -> "The second pass".

I'm a little confused by the first use of variable in this bit: "Note that non-expression variable use of variables...". Maybe it needs re-wording.

590–591 ↗(On Diff #154626)

is... -> Is...

604 ↗(On Diff #154626)

I'm not sure how to parse this sentence. I think you might mean: "Wait for the second pass to replace uses in the expression." But I'm not certain.

622 ↗(On Diff #154626)

During first pass -> During the first pass

625 ↗(On Diff #154626)

in turns -> in turn

626 ↗(On Diff #154626)

expression -> the expression.

655 ↗(On Diff #154626)

I'd get rid of "for each of the definition" in this comment, as I think it's implicit. If you think it's important get rid of the "of the".
Also "expression" -> "the expression" or "expressions"

679 ↗(On Diff #154626)

Use llvm_unreachable here.

grimar added a subscriber: grimar.Jul 11 2018, 1:28 AM

Few comments from me.

utils/FileCheck/FileCheck.cpp
348 ↗(On Diff #154626)

I would inline:

if (IsExpression)
  Name = Expr.substr(0, Expr.find_first_of(validArithOps));
354 ↗(On Diff #154626)

You do a lookup in the VariableDefs two times here, please avoid.
Also can you use vector's back() instead of constructions like --(VariableDefs[Name].end())?

373 ↗(On Diff #154626)

You are doing a lookup in VariableDefs 3 times here. Please avoid.

463 ↗(On Diff #154626)

You do not need else after return:

if (Expr.empty() && CheckOnly)
  return true;

 if (!Expr.empty()) {
   ...
507 ↗(On Diff #154626)

You are doing a search of validArithOps in Name second time here. I would try to avoid.

667 ↗(On Diff #154626)

In this block you have a StringMap LocalVariableTable and you perform 3 lookups on this map,
though can do that only once I think. For example like that:

auto P = LocalVariableTable.insert({VariableDef.first, {}});
if (P.second)
  P.first->second.push_back(
      {MatchInfo[VariableDefIt.first], VariableDefIt.second});
671 ↗(On Diff #154626)

Since there are only 2 cases here, that might be better looking as IFs:

if (State == Unprocessed)
  State = DefProcessed;
else if (State == DefProcessed)
  State = ExprProcessed;
else
  llvm_unreachable("...");
690 ↗(On Diff #154626)

I would suggest something without decrementing of the iterators:

for (const auto &VariableDef : VariableDefs) {
  const std::pair<unsigned, unsigned > &P = VariableDef.second.back();
  assert(P.first < MatchInfo.size() && "Internal paren error");
  GlobalVariableTable[VariableDef.first] = MatchInfo[P.first];
}

Also, btw this assert does not make sense here, because MatchInfo is the SmallVector
and it's operator[] already has an assert, so you can remove it I think.

Thanks all for the valuable comments. It seems clear that more discussion is needed on the syntax before doing any code change. I'll think about it and post a proposal to the ML.

Maybe this should be a new class of "numeric variables" with distinct syntax, e.g., [[#VAR]] or something, with an implicit regex of [[:digit:]]+ instead of making the test-writer use a valid regex.

I'm not sure I'm following you. Do you mean using a different syntax when (i) defining numeric variable or (ii) when using numeric expression?
I also do not understand how this would prevent the test-writer from having to use a valid regex. Could you give an example of the problem you see?

Maybe this should be a new class of "numeric variables" with distinct syntax, e.g., [[#VAR]] or something, with an implicit regex of [[:digit:]]+ instead of making the test-writer use a valid regex.

I'm not sure I'm following you. Do you mean using a different syntax when (i) defining numeric variable or (ii) when using numeric expression?
I also do not understand how this would prevent the test-writer from having to use a valid regex. Could you give an example of the problem you see?

Oh you mean using #VAR to define a numeric variable, therefore avoiding the need to have a regular expression. Yes, I like the idea, especially since it can work with the idea of a syntax to specify the base in which to interpret things as suggested by @jhenderson. What it needs though is a way to distinguish between definition and use though. I'll think about it.

I case people are interested I have also implemented something similar in our fork of llvm. I chose [[@EXPR var + 0x42]] as the syntax. I also added some functions such as [[@EXPR hex(VAR)]].
Variables can either by referenced by name or if they conflict with one of the builtin functions using ${x} syntax: [[@EXPR toupper(hex(${hex}))]]
Some examples of what it can do can be seen here: https://github.com/CTSRD-CHERI/llvm/blob/master/test/FileCheck/expressions.txt

The implementation is IMO quite ugly (mostly a stripped down copy of lld's linker script parser) so I never got around to cleaning it up for upstream submission.

I case people are interested I have also implemented something similar in our fork of llvm. I chose [[@EXPR var + 0x42]] as the syntax. I also added some functions such as [[@EXPR hex(VAR)]].
Variables can either by referenced by name or if they conflict with one of the builtin functions using ${x} syntax: [[@EXPR toupper(hex(${hex}))]]

Simple functions are also on my wish-list! The main ones we care about are converting to and from hex numbers, and being able to do additions and subtractions. But I think just the basic integer and hex reading and additions would be a good start.

I case people are interested I have also implemented something similar in our fork of llvm. I chose [[@EXPR var + 0x42]] as the syntax. I also added some functions such as [[@EXPR hex(VAR)]].
Variables can either by referenced by name or if they conflict with one of the builtin functions using ${x} syntax: [[@EXPR toupper(hex(${hex}))]]

Simple functions are also on my wish-list! The main ones we care about are converting to and from hex numbers, and being able to do additions and subtractions. But I think just the basic integer and hex reading and additions would be a good start.

I added hex, oct, bin, dec + toupper/tolower (https://github.com/CTSRD-CHERI/llvm/blob/master/utils/FileCheck/FileCheck.cpp#L697) but I think I only ever ended up using dec/hex/toupper in the tests that I've written since.

tra removed a reviewer: tra.Jul 20 2018, 3:20 PM
tra added a subscriber: tra.
rnk added a subscriber: rnk.Aug 16 2018, 2:30 PM

FWIW I had a use case for this recently, and found it hadn't landed. Is there anything blocking that you need help with?

In D49084#1203189, @rnk wrote:

FWIW I had a use case for this recently, and found it hadn't landed. Is there anything blocking that you need help with?

My apologies for not replying earlier, didn't see the comment until now. Nothing blocking, splitting my time between GNU and LLVM work so progress has been slower than expected. I am nearly done now, I expect to publish a completely new version sometimes this week. Stay tuned!

Best regards.

thopre updated this revision to Diff 179094.Thu, Dec 20, 9:50 AM

FileCheck: Add support for numeric variables and expressions

Summary:
Sometimes there's a need to verify output which contain several numeric
values that are related by a numeric expression. For example, one might
want to check that two numbers are consecutive but accept any value for
the smallest of the two.

This patch introduces support for numeric variables and expressions to
express this sort of constraint. A user can then define variables to
have a numeric value and write numeric expression using those variable.
A matching format specifier allows to match numbers in a variety of
format: signed and unsigned decimal numbers as well as hex number in
lower or capital case. For example, matching 2 consecutive signed
decimal integers separated by a space is done via:

[[%d,#N:]] #N+1

The syntax is modeled after that of use and definition of the already
supported variables flavour (renamed pattern variable for clarity) and
pseudo variable by adding a # just after the double opening square
bracket [[. Full details of the syntax can be found in the updated
docs/CommandGuide/FileCheck.rst.

One notable pitfall is that CHECK pattern with numeric expression using
variable defined on the same line might fail to match despite there be a
line which satisfy the conditions expressed by those numeric expression.
This is due to lack of needed support from the regular expression engine
that FileCheck relies on and is documented in the updated
docs/CommandGuide/FileCheck.rst as well as possible workaround.

Reviewers: probinson, arichardson, jhenderson, rnk, jdenny

Subscribers: llvm-commits

thopre retitled this revision from FileCheck: Add support for variable expressions to FileCheck: Add support for numeric variables and expressions.Thu, Dec 20, 9:54 AM
thopre edited the summary of this revision. (Show Details)

I am willing to review this; however it is a very large patch, which will take time to go through. Also, after tomorrow, I am on holiday until 3 January.

Some parts of it look like they could be split off into separate patches; for example, the command-line option handling looks like it could be done separately. I wonder if the refactoring of the @LINE handling into the new expression handling could also be separated out.
If not, that's okay, it will just make the review take a bit longer.

As a first step, I have a structural suggestion for the documentation. Once that is done, I will probably have some more detailed comments on wording, but they can wait for now.

docs/CommandGuide/FileCheck.rst
577

The topic and syntax are more complicated than anything else FileCheck does, so I think it would be better separate the syntax description into two parts: using numeric expressions, and defining numeric variables. The syntax for a numeric expression would be
[[# <constraint> <expr> ]]
and then you can describe how to define the variables,
[[# <fmtspec>,<NUMVAR>: <constraint> <expr> ]]
which builds on the syntax for using variables. I think this ordering will be easier to explain and to understand.

603

Please give examples of what these lines will match, and what they will not match.

thopre updated this revision to Diff 179194.Thu, Dec 20, 4:08 PM

Document how to set numeric variable with -D option

Thanks for updating this patch. It will be very useful!

include/llvm/Support/FileCheck.h
225

Does this need to be a shared_ptr? Could it be a unique_ptr instead?

lib/Support/FileCheck.cpp
81

Could this return a llvm::Optional<std::string> to avoid the output-parameter?

1076

This could be deleted if you initialize CaptureParenNeeded to false at the beginning of the scope. It would also ensure that the variable is never uninitialized.

Thanks for updating this patch. It will be very useful!

+1

thopre marked 5 inline comments as done.Fri, Dec 21, 2:08 AM
thopre added inline comments.
docs/CommandGuide/FileCheck.rst
626–630

It changes for 2 reasons:

  • it redefines @LINE as a (pseudo) numeric variable as opposed to a pattern variable since it behaves like a numeric variable
  • it removes the explanation about offset since it comes naturally from being a numeric variable and then more (eg. #@LINE + OFFSETVAR).

In short, because we now have numeric variable it is easier to say @LINE is one and say what it evaluates to and leave what can be done to it to the general description of numeric variable.

include/llvm/Support/FileCheck.h
225

Several numeric expressions can refer to the same numeric variable in which case you want the reference counting so that when calling the destructor for those numeric expression delete is called only once on that variable.

lib/Support/FileCheck.cpp
81

Yes indeed and perhaps in a few other places as well. Thanks

This change is so large that it's going to take quite a while to review. However, I've made a few initial comments on the header. I'll come back to this next week and continue from there.

include/llvm/Support/FileCheck.h
41

Nit: Code -> code

44

Why the declaration here?

52

Nit: print -> printed

57

I don't really understand this comment. I think you are saying that this "Set" variable indicates whether the format is explicit or not, right? If so, rename the variable to something like IsExplicit or similar.

60

Several -> If set, there are ...

81

Don't namespace this. Just make the enum inside a scoped enum (i.e. enum class).

82–84

I think the general style in LLVM is single space after full stop (applies here and elsewhere).

"is a numeric expression (and thus value ...) available" -> "a numeric expression is available (and thus the value ...)"

95–96

I'm not sure the second sentence here is necessary.

97

I'm not sure you need to name all your classes "FileCheckXXX". You can shorten their names by dropping the FileCheck part, I reckon.

99–103

Why not just always store in a single uint64_t field called Value and convert to signed when requested?

Also, don't unnecessarily abbreviate variables like this.

131–132

I don't think you need this comment. operator== is fairly self-explanatory as to its purpose...!

139

LLVM style is for camel-case, starting with lower-case letters for method and function names. Also, don't bother commenting if the function purpose is obvious from its name.

145

What is a tentative value?

169

You don't need the "struct" here or in any of the other function arguments or variable declarations etc.

169–170

This should probably return an Expected<std::string> or Optional<std::string> and not use an output parameter.

172–180

Could these two just be overloads of operator+ and operator-?

183–184

Only the first sentence here is needed. The other two are implicit by the fact that this is a virtual class with abstract methods.

191–192

What does "null" mean in this context? We're not dealing with pointers... Assuming it's some kind of no-format, perhaps calling it by the corresponding variable name might be better.

Again, the last sentence is unnecessary, since it's implicit in the "= 0" part of the function declaration.

195–198

This isn't the behaviour I'd expect a function of this name to do. I'd expect it to return a vector of names, not append to an input parameter.

Other nit in the comment: overrided -> overridden.

201

Nit: litteral -> literal

209–213

Similar to the FileCheckNumExprVal class, I don't think you need both constructors. One should suffice.

215–217

I'd drop the first sentence and rephrase the second:
"Return the node itself and set \p Fmt to none, since literals do not have conversion formats" (where none is a placeholder for a better word than null).

227–228

from x 2 -> by

232

It's a little unclear what is meant by "matching format" here. Is it the format this expression can match against (i.e. hex/decimal etc), or what?

236

Like struct, you don't nee the enum here.

246

for numeric -> for a numeric

247–248

eg. -> e.g.

I'd drop the bit in parentheses at the end of the sentence. If somebody starts using this for another person, this comment will almost certainly go stale.

252–253

Drop the reference to the shared_ptr in this sentence, and change the second sentence to something like "The pointer is guaranteed to be valid as long as this object is."

256

What is meant by "matched" in this context?

268

Remove the first "can"

274

Rather than returning a bool, and passing in a parameter that will be set, this and similar functions should probably return an Expected<T> or Optional<T>.

294–299

Why are these two different functions?