This is an archive of the discontinued LLVM Phabricator instance.

FileCheck [7/12]: Arbitrary long numeric expressions
ClosedPublic

Authored by thopre on Apr 7 2019, 4:21 PM.

Details

Summary

This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch extend numeric expression to
support an arbitrary number of operands, either variable or literals.

Copyright:

  • Linaro (changes up to diff 183612 of revision D55940)
  • GraphCore (changes in later versions of revision D55940 and in new revision created off D55940)

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
thopre updated this revision to Diff 198547.May 7 2019, 3:28 PM

Rebase on top of latest changes

thopre updated this revision to Diff 198645.May 8 2019, 6:52 AM

Rebase on top of latest changes

thopre updated this revision to Diff 198948.May 9 2019, 4:51 PM

Rebase on top of latest changes

thopre updated this revision to Diff 199138.May 11 2019, 7:14 AM

Rebase on top of latest changes

thopre updated this revision to Diff 199236.May 13 2019, 3:58 AM

Rebase on top of latest changes

thopre updated this revision to Diff 199393.May 14 2019, 3:44 AM

Rebase on top of latest changes

rnk removed a reviewer: rnk.May 14 2019, 11:29 AM
rnk removed a subscriber: rnk.
thopre updated this revision to Diff 200473.May 21 2019, 5:55 AM

rebase on top of latest changes

thopre updated this revision to Diff 200550.May 21 2019, 11:15 AM

Get rid of the useless FileCheckNumExpr class which is only needed when matching format are introduced

thopre updated this revision to Diff 201982.May 29 2019, 10:52 AM

Rebase on top of latest changes in patch series

thopre updated this revision to Diff 203209.Jun 5 2019, 11:21 AM

Rebase on top of latest changes in the series

arichardson added inline comments.Jun 11 2019, 4:24 AM
llvm/include/llvm/Support/FileCheck.h
105 ↗(On Diff #203209)

Maybe this should use Expected instead of Optional to report errors?

llvm/lib/Support/FileCheck.cpp
738 ↗(On Diff #203209)

Could this be replaced with Error/Expected so that there is no more need for getUndefVarNames/appendUndefVarNames calls?

I've reviewed up to the tests, and got lots of comments for you.

llvm/docs/CommandGuide/FileCheck.rst
594–595 ↗(On Diff #203209)

numeric variables previously defined -> previously defined numeric variables

llvm/include/llvm/Support/FileCheck.h
43 ↗(On Diff #203209)

Are there other kinds of expressions except "numeric expressions"? Just wondering if we can simplify the terminology a little e.g. by simply referring to them as expressions and the FileCheckNumExprAST can become FileCheckExpressionAST.

66–67 ↗(On Diff #203209)

Are there plans for signed literals later? If not, then I'd get rid of "unsigned" from the comment.

69–70 ↗(On Diff #203209)

You could probably simplify this down to \returns the literal's value.

103–104 ↗(On Diff #203209)

Similar comment to above. No need to explain that its evaluating the expression for a single variable. Just keep the "returns this variable's value" part.

130 ↗(On Diff #203209)

Do these need to be shared_ptrs? Do multiple places need ownership, or is it sufficient that only place has it? Same question applies for all uses of shared_ptr further down.

152 ↗(On Diff #203209)

Could this use a MutableArrayRef?

193–194 ↗(On Diff #203209)

How about "the name of the undefined variables used in this substitution"?

Also use "Appends" instead of "Records" to show that it doesn't override the old ones.

Same comments apply to the other examples of getUndefVarNames.

209–210 ↗(On Diff #203209)

What if some string variables are defined and others aren't? The comment as it stands suggests that all are stored. If that's not intended, I'd rephrase it as "the name of the undefined string variables used in this substitution"

441 ↗(On Diff #203209)

What's a "legacy numeric substitution block"?

446 ↗(On Diff #203209)

defined -> define

509 ↗(On Diff #203209)

I don't like the terminology "Legacy", as it doesn't convey any useful meaning, unless defined somewhere clearly.

516 ↗(On Diff #203209)

I don't think you need enum here. Same applies throughout the code.

llvm/lib/Support/FileCheck.cpp
81 ↗(On Diff #203209)

Are you sure you want to clear this? That seems unnecessary, and confusing if you are passing in a reference. If you want an empty vector, I'd just expect the vector to be returned.

208 ↗(On Diff #203209)

Get rid of this blank line.

222 ↗(On Diff #203209)

Magic number?

thopre updated this revision to Diff 204497.Jun 13 2019, 5:15 AM
thopre marked 19 inline comments as done.
  • Rebase on latest changes in patch series
  • Address obvious changes
thopre added inline comments.Jun 13 2019, 5:16 AM
llvm/include/llvm/Support/FileCheck.h
43 ↗(On Diff #203209)

There's only numeric expression indeed but wouldn't the use of "expression" only induce the reader into thinking it only applies when there is an operator involved? Numeric expression as it stands also cover a single numeric variable in a numeric substitution. Also if changing to expression only, should I remove the "numeric" adjective in the documentation as well?

66–67 ↗(On Diff #203209)

There are but the comment can be changed when they are introduced. Fixed.

130 ↗(On Diff #203209)

Some AST nodes are referenced from a single place only (eg. literals and binary operations) while numeric variables can be shared among several numeric substitutions, eg. "add r[[#REG]], r[[#REG+1]], 42" would have 2 pointers to the class instance for REG. In the first case std::unique_ptr could be used, while in the latter case std::shared_ptr would make sense.

The problem comes from AST nodes being referred via FileCheckAST * pointers (since each operand can be either of the possible nodes) and thus I need to decide on a given pointer type for all nodes. I could go with regular pointers and keep an array of pointer in FileCheckPatternContext to all nodes created for them to be freed all at once when matching is done. Wouldn't change much over what is happening now since substitutions are only deleted once all pattern have been matched and substitutions points to the AST indirectly. Does that sound good?

441 ↗(On Diff #203209)

It refers to the "legacy use of @LINE" mentioned in the section "FileCheck Pseudo Numeric Variables" of the documentation, namely [[@LINE]] and [[@LINE+offset]] instead of the same with a '#' sign. Would "indicates whether Expr is a legacy use of @LINE" be clearer? Or perhaps "legacy @LINE expression"? If not, any other suggestion?

509 ↗(On Diff #203209)

See above.

Regarding what to call the @LINE+offset form, I think it's fine to call it "legacy" in the user-facing documentation, but it gets to be a bit much in the internals. I haven't commented every use but you will get the idea in the inline comments.

llvm/docs/CommandGuide/FileCheck.rst
593 ↗(On Diff #204497)

<expr>> should be <expr>

597 ↗(On Diff #204497)

There are a few different ways to describe expressions, I happen to prefer a recursive definition, something like this:

A numeric expression is recursively defined as:

  • a numeric operand, or
  • a numeric expression followed by an operator and a numeric operand.

A numeric operand is a previously defined numeric variable, or an integer literal. The supported operators are + and -. Spaces are accepted before, after and between any of these elements.

llvm/include/llvm/Support/FileCheck.h
538 ↗(On Diff #204497)

These could be { LineVar, Literal, Any } perhaps.

540 ↗(On Diff #204497)

I don't see a parameter named AllowedOperandFlag.

548 ↗(On Diff #204497)

\p LineExpr ... parsing a @LINE expression.

43 ↗(On Diff #203209)

I think "expression" does not necessarily imply an operator, at least not to a programmer.
You can probably also remove some uses of "numeric" in the documentation, when you are talking about expressions (but keep it when talking about variables).

441 ↗(On Diff #203209)

I don't know about James but I would prefer that the commentary say "legacy @LINE expression" and the parameter could be named LineExpr.

llvm/lib/Support/FileCheck.cpp
189 ↗(On Diff #204497)

The previous if guarantees AO is LegacyVar or Any here. So, this condition simplifies to AO == LegacyVar correct? (or LineVar if you rename as suggested.)

191 ↗(On Diff #204497)

Maybe a comment here, "Ignore the error and retry parsing as a literal."

194 ↗(On Diff #204497)

I think the only possibilities at this point are LegacyLiteral and Any, so this if is always true and should be removed.

244 ↗(On Diff #204497)

... in a @LINE expression ...

427 ↗(On Diff #204497)

IsLineExpr

719 ↗(On Diff #204497)

Delete the commented-out assertion.

727 ↗(On Diff #204497)

llvm_unreachable instead of assert(false)?

thopre updated this revision to Diff 205459.Jun 18 2019, 3:53 PM
thopre marked 19 inline comments as done.

Address all comments but shared_ptr

thopre added inline comments.Jun 18 2019, 3:56 PM
llvm/include/llvm/Support/FileCheck.h
441 ↗(On Diff #203209)

I went for LegacyLineExpr then since when false @LINE is also allowed and when true it limits what can be done with @LINE.

llvm/lib/Support/FileCheck.cpp
194 ↗(On Diff #204497)

I've addressed this comment and the other one about AO above but I wanted to say that although I realize that LLVM code style aims at reducing clutter to make the code more readable, I feel in many case (eg. here and some of the suppression of else after return) it makes the code less explicit and more fragile.

Apologies for the delay in coming back to this. It's been a busy few days.

llvm/include/llvm/Support/FileCheck.h
67–68 ↗(On Diff #205484)

I think "Class to represent an undefined variable error, which quotes that variable's name when printed." would be a bit more concise.

94 ↗(On Diff #205484)

definition -> definitions

121 ↗(On Diff #205484)

If you made this eval() return an error if the variable is undefined, it would nicely separate concerns of use versus definition (see my comment a bit further down about shared_ptrs).

158 ↗(On Diff #205484)

This should say "returns an error if a subexpression returns an error, or ..."

480 ↗(On Diff #205484)

substitued -> substituted

549 ↗(On Diff #205484)

Perhaps worth making this an enum class.

553 ↗(On Diff #205484)

"against \p SM"? I'm not sure I understand this change in terminology.

130 ↗(On Diff #203209)

I wonder whether it might make more sense to separate the concept of variables from the AST. The AST could have a variable reference class in it, which refers by raw pointer to a variable definition. An AST expression then solely "owns" each subexpression, whilst something higher up (e.g. the Context - not sure about this), owns the variables and contains the definitions. That way each item in the tree is only owned by its parents, which feels more logical.

Does that make sense?

As things stand the fact that a single FileCheckNumericVariable can be owned by multiple places implies it's not really a tree at all. (In graph theory a tree has no cross-references - everything is owned by a single parent only).

441 ↗(On Diff #203209)

How about IsLegacyLineExpr? It's a bit more verbose but maybe a little clearer too. Otherwise LegacyLineExpr is okay. It's just that the variable isn't actually an expression itself.

llvm/lib/Support/FileCheck.cpp
198 ↗(On Diff #205484)

I'm guessing there is a plan to allow other Radixes too? It seems like it would be normal to want to use hex, for example.

243 ↗(On Diff #205484)

in a legacy

270 ↗(On Diff #205484)

Err would be a more common variable name.

jhenderson added inline comments.Jun 19 2019, 7:45 AM
llvm/lib/Support/FileCheck.cpp
290 ↗(On Diff #205484)

in a legacy

715 ↗(On Diff #205484)

Trailing full stop required.

725–727 ↗(On Diff #205484)

I think this is actually unnecessary. handleAllErrors terminates the program with an llvm_unreachable anyway if there are any unhandled errors.

thopre updated this revision to Diff 205711.Jun 19 2019, 4:49 PM

Make new function parameter immediate use the /*ParamName=*/Value code style

thopre marked an inline comment as done.Jun 19 2019, 4:52 PM

I think I am happy with this, leaving the rest up to James.

llvm/include/llvm/Support/FileCheck.h
158 ↗(On Diff #205711)

I prefer the description of the return value to give the expected/normal case first, and the error case afterward. So,

\returns the expression value, or an error ...
thopre updated this revision to Diff 208099.Jul 4 2019, 4:53 PM
thopre marked 17 inline comments as done.
  • Rebase on top of refactoring patches
  • Address comments
llvm/include/llvm/Support/FileCheck.h
553 ↗(On Diff #205484)

I've used it in other place already. I'm using "against" because the diagnostic relates to the buffer held by SM (it is created via SM.GetMessage)

llvm/lib/Support/FileCheck.cpp
198 ↗(On Diff #205484)

/There is yes. See patch 9/12

thopre updated this revision to Diff 208151.Jul 5 2019, 4:58 AM

Rebase on latest changes in patch series

thopre updated this revision to Diff 208184.Jul 5 2019, 7:35 AM

Rebase on top of latest changes in patch series

Mostly looks good. A few nits still and one or two more significant comments. I think this structure looks much cleaner than the previous version, personally.

llvm/include/llvm/Support/FileCheck.h
54 ↗(On Diff #208184)

As things stand, this only can represent an unsigned literal. I think you'll want to support negative literals or unary negation operands fairly quickly. Otherwise, the expression "-1" doesn't work.

I'm also slightly concerned in the current form that somebody could try to use FileCheckExpressionLiteral for negative numbers, and get surprising results. Perhaps worth updating the comment to say that the class represents an unsigned integer literal.

60 ↗(On Diff #208184)

Perhaps "Construct a literal with the specified value."

llvm/lib/Support/FileCheck.cpp
49 ↗(On Diff #208184)

Might be worth making this comment more generic e.g. "Handle a failed operation, e.g. due to an undefined variable." That'll make the comment robust to other possible future failure types.

258 ↗(On Diff #208184)

is always a literal

The comment should probably start with "The second..."

301 ↗(On Diff #208184)

The comment should probably start with "The first..."

722 ↗(On Diff #208184)

You've changed this message implying it can have more than one undefined variable, but I don't see any lit test case change that has more than one. You should add/update one.

llvm/unittests/Support/FileCheckTest.cpp
17 ↗(On Diff #208184)

I think it would be a good idea to have a test case showing that this can take std::numeric_limits<uint64_t>::max().

25 ↗(On Diff #208184)

const std::set &?

30 ↗(On Diff #208184)

Perhaps worth having this print what names are still remaining if not empty. I think you can do something like EXPECT_TRUE(ExpectedUndefVarNames.empty()) << ExpectedUndefVarNames.

50 ↗(On Diff #208184)

getValue not getvalue.

397 ↗(On Diff #208184)

In other places you've used bool(SubstValue) for this sort of conversion. I think that might be nicer, but don't care that much. Regardless, it should be consistent throughout.

479 ↗(On Diff #208184)

unique_ptr?

thopre updated this revision to Diff 208975.Jul 10 2019, 9:00 AM
thopre marked 15 inline comments as done.

Address review comments.

thopre added inline comments.Jul 10 2019, 9:00 AM
llvm/include/llvm/Support/FileCheck.h
553 ↗(On Diff #205484)

Is my explanation satisfying or do you want to use another phrasing?

llvm/lib/Support/FileCheck.cpp
722 ↗(On Diff #208184)

There was a unit test already but it doesn't test that message indeed. I've added 1 lit test and updated another one, the former for correct substitution and the latter for undefined variables.

llvm/unittests/Support/FileCheckTest.cpp
25 ↗(On Diff #208184)

This cannot be const since erase is called on that set. Making it a reference means the set needs to be created in all the caller when now the caller can provide an initializer as is done in expectUndefError below.

thopre updated this revision to Diff 208978.Jul 10 2019, 9:02 AM

Run clang-format-diff

jhenderson added inline comments.Jul 10 2019, 9:34 AM
llvm/include/llvm/Support/FileCheck.h
553 ↗(On Diff #205484)

Yes, it's fine. Not the phrasing I would use but I don't care much about it.

60 ↗(On Diff #208978)

litteral -> literal

llvm/lib/Support/FileCheck.cpp
49 ↗(On Diff #208978)

eg. -> e.g.
either "an undefined variable" or "undefined variables"

llvm/unittests/Support/FileCheckTest.cpp
19 ↗(On Diff #208978)

Why not FileCheckExpressionLiteral Ten(10)?

25 ↗(On Diff #208978)

Why not FileCheckExpressionLiteral Max(...)?

33 ↗(On Diff #208978)

Other instances of overloading operator<< in the LLVM unit tests tend to just be in the llvm namespace, not in std. See for example ADT/StringRefTest.cpp.

The variables in this function don't use LLVM naming style.

35 ↗(On Diff #208978)

StringRef?

47 ↗(On Diff #208978)

Use std::unordered_set?

grimar added inline comments.Jul 10 2019, 10:03 AM
llvm/lib/Support/FileCheck.cpp
427 ↗(On Diff #204497)

I'd suggest adding a comment.

47 ↗(On Diff #208978)

Doesn't look like you need this->.

199 ↗(On Diff #208978)

btw it was a bit confusing for me to find there is a method that take bool IsPseudo
and method that take bool &IsPseudo, i.e. that one of them can change this flag, but another - not.

201 ↗(On Diff #208978)

You can avoid having this temorarily variable:

if (ParseVarResult)
  return parseNumericVariableUse(*ParseVarResult, IsPseudo, SM);

But actually it seems you also need to check the result of Expected,
not sure why it is assumed there is no error?

llvm/unittests/Support/FileCheckTest.cpp
34 ↗(On Diff #208978)

Seems a different variables naming style used here? Ten, Value vs first, set. You need stick to one I think.
(To upper case I guess? LLVM did not do a switch to a new one yet, except LLD project I believe).

grimar added inline comments.Jul 10 2019, 10:05 AM
llvm/lib/Support/FileCheck.cpp
201 ↗(On Diff #208978)

Oh, sorry, I misread the code, you actually check it. Then you still can avoid having the temp var :)

thopre marked 16 inline comments as done.Jul 11 2019, 8:53 AM
thopre added inline comments.
llvm/lib/Support/FileCheck.cpp
199 ↗(On Diff #208978)

You mean parseVariable Vs parseNumericVariableUse? Would you rather me have both be reference even though the latter only needs to get the value? Or is there a naming convention to denote a reference as opposed to a parameter passed by value?

llvm/unittests/Support/FileCheckTest.cpp
33 ↗(On Diff #208978)

I'm not sure why but when I try with llvm namespace and I switch to unordered_set as suggested below the overloaded operator does not get used and the code fails to compile. However the code compiles and works with llvm namespace if I use a set instead of unordered_set.

I've thus decided to just write a setToString method and output the result of calling it. It does create lots of temporaries though.

thopre updated this revision to Diff 209237.Jul 11 2019, 8:54 AM
thopre marked an inline comment as done.

Address review comments

jhenderson added inline comments.Jul 12 2019, 1:20 AM
llvm/unittests/Support/FileCheckTest.cpp
32 ↗(On Diff #209237)

static?

Though saying that, this is inside an anonymous namespace, so perhaps the other functions need static removing.

This function returns a StringRef, yet the base type is a local variable. I think that will result in the memory being freed under the reference. Perhaps better would be to make Str a Twine and then use Twine::str() to return a std::string.

38 ↗(On Diff #209237)

Perhaps make this ", ", and start and end the full string with '{'/'}' to make it easier to read.

grimar added inline comments.Jul 12 2019, 2:15 AM
llvm/include/llvm/Support/FileCheck.h
580 ↗(On Diff #209237)

This comment still mentions Expr and SM.

llvm/lib/Support/FileCheck.cpp
199 ↗(On Diff #208978)

There is no special naming conventions, but given that parseVariable method always must set IsPseudo flag,
but another parse* methods can only read the flags they are given, I wonder if it be better/cleaner for parseVariable to return Expected<std::pair<StringRef, bool> for example?

thopre updated this revision to Diff 209456.Jul 12 2019, 3:59 AM
thopre marked 7 inline comments as done.

Address review comments

llvm/lib/Support/FileCheck.cpp
199 ↗(On Diff #208978)

I'm not a big fan of pairs because its members are accessed without a name (first or second) which I find it less readable. Anyway, done but I've reverted the change to make the call to parseNumericVariableUse a one liner so that it is clear what is being passed to that function (ParseVarResult->first is not very descriptive).

llvm/unittests/Support/FileCheckTest.cpp
32 ↗(On Diff #209237)

Ah yes indeed, the StringRef does not own the string it references. I cannot use a Twine for Str because you cannot store a Twine so while I can do Str + S, I cannot store the result in Str (see https://llvm.org/doxygen/classllvm_1_1Twine.html#a28b850ffd65d5997cf730d644e06d89b I've been bitten by this before in this patchset). I guess I'll return a std::string then. I've used a Twine to do the " " concatenation though.

jhenderson added inline comments.Jul 12 2019, 5:19 AM
llvm/lib/Support/FileCheck.cpp
199 ↗(On Diff #208978)

You could use a small struct to name the members, perhaps called VariableProperties:

struct VariableProperties {
  StringRef Name;
  bool IsPseudo;
}
thopre updated this revision to Diff 209481.Jul 12 2019, 7:37 AM
thopre marked an inline comment as done.

Return a VariableProperties structure in parseVariable

jhenderson accepted this revision.Jul 12 2019, 7:47 AM

LGTM. Thanks for the hard work on this!

llvm/lib/Support/FileCheck.cpp
108 ↗(On Diff #209481)

You might be able to just do return {Name, IsPseudo} to avoid the temporary.

This revision is now accepted and ready to land.Jul 12 2019, 7:47 AM
thopre marked 2 inline comments as done.Jul 12 2019, 8:04 AM
thopre added inline comments.
llvm/lib/Support/FileCheck.cpp
108 ↗(On Diff #209481)

I tried but it didn't work. I think it would if I was returning the structure, but here I'm returning an Expected.

As per 6.6.3 C++11 draft:
"A return statement with a braced-init-list initializes the object or reference to be returned from the function by copy-list-initialization (8.5.4) from the specified initializer list."

So it tries to build an Expected with that initialization list which fails because I think an Expected<foo> only accepts a Foo.

thopre marked 2 inline comments as done.Jul 12 2019, 8:08 AM
thopre added inline comments.
llvm/lib/Support/FileCheck.cpp
108 ↗(On Diff #209481)

Yes, that's it:

/home/thomasp/repos/llvm-project/llvm/lib/Support/FileCheck.cpp:107:10: error: no matching constructor for initialization of 'Expected<FileCheckPattern::VariableProperties>'

return {Name, IsPseudo};
       ^~~~~~~~~~~~~~~~

/home/thomasp/repos/llvm-project/llvm/include/llvm/Support/Error.h:474:3: note: candidate template ignored: requirement 'std::is_convertible<StringRef &, VariableProperties>::value' was not satisfied [with OtherT = llvm::StringRef &]

Expected(OtherT &&Val,
^

It tries to convert Name into a VariableProperties and fails. All the other constructors expect an Error or another Expected and fail as well.

Thanks for your work on this! Looking forward to be able to use these new features the next time I update our fork.

llvm/lib/Support/FileCheck.cpp
108 ↗(On Diff #209481)

Maybe return VariableProperties{Name, IsPseudo}would work?

jhenderson added inline comments.Jul 12 2019, 8:25 AM
llvm/lib/Support/FileCheck.cpp
108 ↗(On Diff #209481)

Yes of course. I think you could still do it inline if you want, (not tried, so not 100% certain off the top of my head) and don't mind either way:

return VariableProperties {Name, IsPseudo};
thopre updated this revision to Diff 209680.Jul 13 2019, 1:11 AM

Return structure in parseVariable as a temporary

thopre marked 2 inline comments as done.Jul 13 2019, 1:12 AM
This revision was automatically updated to reflect the committed changes.