Page MenuHomePhabricator

[FileCheck] Add function call support to numerical expressions.
ClosedPublic

Authored by paulwalker-arm on May 14 2020, 3:48 AM.

Details

Summary
This patch extends numerical expressions to allow calls to
predefined functions. These calls can be combined with the
existing numerical operators, which includes nesting calls.

The call syntax is:

  <func>(<args>)

Where <func> is a predefined string literal, currently limited to
one of add, max, min and sub. <arg> is a comma seperated list of
numerical expressions.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
thopre added inline comments.May 26 2020, 9:01 AM
llvm/docs/CommandGuide/FileCheck.rst
720–721

I think the exclamation mark should be reserved for function call. I find it a bit confusing otherwise but let's wait to see what other reviewers might think.

Do the tests you need this feature for require some way to force precedence or can this be dealt later?

arichardson added inline comments.May 26 2020, 9:12 AM
llvm/docs/CommandGuide/FileCheck.rst
720–721

Apologies for the delay, I've now rebased the parentheses revision (D77383).

paulwalker-arm marked 2 inline comments as done.May 26 2020, 9:27 AM
paulwalker-arm added inline comments.
llvm/docs/CommandGuide/FileCheck.rst
688

So something like:

A numeric operand is a previously defined numeric variable, an integer literal or the result of a function call.
720–721

I currently have no use for the precedence operator, the function calls are why I started down this path, so am happy either way.

Part of me just assumed that in the future the parser might be reworked to remove the need for the ! prefix, but I suppose providing the power of precedence early might prevent that work from happening :)

paulwalker-arm marked 2 inline comments as not done.May 26 2020, 9:37 AM
thopre added inline comments.May 26 2020, 9:51 AM
llvm/docs/CommandGuide/FileCheck.rst
688

I'm nitpicking but technically the operand is the function call itself, but evaluates to the return value of the function call.

720–721

I wouldn't hold my breath for a parser rewrite. While I'd love to make the code nicer and more flexible I have little free time to work on it myself. Anyway, if you don't require operator precedence just error on empty function name for now and we can extend it to be used for operator precedence later if needed.

rebase and post code review fixes

paulwalker-arm marked 3 inline comments as done.May 27 2020, 5:40 AM
paulwalker-arm added inline comments.
llvm/docs/CommandGuide/FileCheck.rst
720–721

Done. Perhaps it's a good idea to mandate that all symbolic operators have a named function counterpart. That way in the short term if somebody does want to force precedence they just need to write a slightly more verbose check line.

paulwalker-arm marked 2 inline comments as done.May 27 2020, 5:41 AM
thopre added inline comments.May 27 2020, 6:36 AM
llvm/docs/CommandGuide/FileCheck.rst
720–721

No objection to that, should be a 2 line changes, right?

paulwalker-arm added inline comments.May 27 2020, 7:05 AM
llvm/docs/CommandGuide/FileCheck.rst
720–721

Yep. I'll add entries for add and sub to this patch and resubmit later today.

Added functions for add and sub.

arichardson accepted this revision.May 28 2020, 5:38 AM

LGTM if @thopre and @jhenderson are happy with this change too.

llvm/unittests/Support/FileCheckTest.cpp
1076

Might make sense to add case with missing operators such as 2!mul(FOO,2) or FOO !mul(FOO,2) or !mul(FOO(!mul(3,2)))

This revision is now accepted and ready to land.May 28 2020, 5:38 AM

Some of my testing suggestions might better belong in the unit tests. Also, you're probably going to need to rebase and expand the behaviour somewhat now that signed values support has landed in D60390.

llvm/lib/Support/FileCheck.cpp
461

I'm not strongly opposed to the use of ! to indicate a function call, but is it actually necessary? It seems like a function call could just be identified by <sequence of identifier chars>(<some chars>). The code would look something like the following semi-pseudo code:

size_t Parenthesis = Expr.find_first_of('(');
if (Parenthesis != npos) {
  if (all_of(Expr.take_front(Parenthesis), [](char C) { 
      return isValidIdentifierChar(C);
    }) {
    if (AO != AllowedOperand::Any)
      return ErrorDiagnostic::get(SM, Expr, "unexpected function call");

    return parseCallExpr(Expr, LineNumber, Context, SM);
  }
}

Assuming I've not missed something, that would allow us to simplify all the usages of function calls.

595–596

I believe you could change this case to an asseertion - the parseExpr function treats a leading '(' as a different kind of expression.

667

In conjunction with my suggestion above about not having a function specifier, you could change this code to bail out without error in some cases, perhaps by starting with looking for the % followed by some specific characters, followed by a ,.

llvm/lib/Support/FileCheckImpl.h
728–734

I think you need to delete "both" here, since there are now three different things it accepts.

llvm/test/FileCheck/numeric-expression.txt
120

Perhaps change the inner UNSI to UNSI+1 or something to show that the argument of a function is any kind of expression? Up to you.

385

I would prefer these to be interleaved with their corresponding CHECK and input text:

RUN: ... --check-prefix CALL-MISSING-OPENING-BRACKET ...

CALL MISSING OPENING BRACKET
30
CALL-MISSING-OPENING-BRACKET-LABEL: ...
...

RUN: ... --check-prefix CALL-MISSING-CLOSING-BRACKET ...

CALL MISSING CLOSING BRACKET

etc. It helps reduce the distance I have to look to find the thing being checked for.

410

There might want to be some interaction testing with plain parentheses. Something like [[#!mul(NUMVAR,(NUMVAR+3))]] and [[#!mul(NUMVAR,(NUMVAR+3)]] (the first should work, but not the second).

418

Nit: it would probably be best to make this call take two arguments.

426

I think you also want the following: [[#!mul(,NUMVAR)]]

Possibly also [[#!mul(NUMVAR,,NUMVAR)]]

442

Nit: it would probably be best to make this call take two arguments.

llvm/unittests/Support/FileCheckTest.cpp
1076

+1

thopre added inline comments.May 29 2020, 4:07 AM
llvm/lib/Support/FileCheck.cpp
461

Regardless of the ease of implementation, I like the ! prefix since these are builtin functions/operators, not something the user can define. YMMV of course

jhenderson added inline comments.May 29 2020, 4:48 AM
llvm/lib/Support/FileCheck.cpp
461

I'm not sure why it matters that they are builtin? Even if we do provide the ability for users to define their own functions, surely their behaviour should be identical to built-in functions from the majority of the code's point-of-view? I'd actually think that including the ! would make it harder to parse, since we'd have to support function calls both with and without the !.

arichardson added inline comments.May 29 2020, 5:27 AM
llvm/lib/Support/FileCheck.cpp
461

If we can make it work without the ! without making the implementation much more complicated, I'd prefer that.
But I don't feel strongly either way.

Since the name needs to be followed by an open paren, even variables that have the same name as builtin functions should work: [[#mul(mul, 2)]]. The first one is a function name, the second must be a variable since there is no open paren.

Prefix aside I'm just doing another rebase to bring in the signedness work. I'd like to know if it's going to be a requirement to support the reporting of overflow/underflow for the builtins in my patch. Originally I had gone down the llvm route of making the operations signed rather than the data but I see the signedness patch implements the opposite. Ultimately I need to know if there's light at the end of the tunnel or whether to give up and just write ugly tests.

Rebased but still need to remove the function prefix and tighen up the mul/div operators.

Added the suggested tests to the FileCheck unitest plus fleshed out overflow reporting.

I have a clang-format query. I'm getting failures because clang-format is suggesting to use "-<space><digit>" to format a negative number. This doesn't seem correct to me and is not the style I see for existing code in FileCheckTests.cpp. Is this something I can ignore?

I'd like to know if it's going to be a requirement to support the reporting of overflow/underflow for the builtins in my patch. Originally I had gone down the llvm route of making the operations signed rather than the data but I see the signedness patch implements the opposite. Ultimately I need to know if there's light at the end of the tunnel or whether to give up and just write ugly tests.

(Wrote this comment before I saw you added overflow/underflow support, but leaving it because it might give an idea of my thought process on why): Not quite sure I fully followed this comment. I think my preference would be to error out for overflows/underflows, rather than silently allowing them. If things are going to be significantly more complex adding them but you are also going to address them immediately, I'm okay with it being deferred to a future patch. What I don't want long-term is for people to be able to write unintentionally broken test cases because they happen to be triggering underflow/overflow behaviour. Broken test cases are bad!

I have a clang-format query. I'm getting failures because clang-format is suggesting to use "-<space><digit>" to format a negative number. This doesn't seem correct to me and is not the style I see for existing code in FileCheckTests.cpp. Is this something I can ignore?

@thopre ran into this recently too. I consider it a bug in clang-format personally, so you can ignore it, but if @thopre hasn't already, you should file a clang-format bug so that it can get fixed.

llvm/docs/CommandGuide/FileCheck.rst
690

The "and have a 64-bit precision" bit seems a bit out of place here. It should probably be its own sentence.

691

Given the new stuff about functions, I might be tempted to pull out the sentence about supported operators into a list, a bit like that used for the accepted values, especially since it needs updating as things stand!

llvm/lib/Support/FileCheck.cpp
233

I think adding operator* etc makes sense, but it should be a separate patch to adding function support (probably a prerequisite). We don't want to cloud the intent of this patch by adding in other useful functionality, and it will make it easier to focus the reviewing.

I'll save reviewing them for that patch.

317–318

Did you consider writing min in terms of max (or vice versa)? Not sure if it is a good thing to do or not, but I believe it would lead to less duplicated and more concise code. Something like:

if (max(LeftOperand, RightOperand) == LeftOperand)
  return RightOperand;
return LeftOperand;
paulwalker-arm marked 2 inline comments as done.Jun 1 2020, 3:32 AM
paulwalker-arm added inline comments.
llvm/lib/Support/FileCheck.cpp
233

I've created D80915 to add the new operator functions.

thopre added a comment.Jun 1 2020, 6:01 AM

I'd like to know if it's going to be a requirement to support the reporting of overflow/underflow for the builtins in my patch. Originally I had gone down the llvm route of making the operations signed rather than the data but I see the signedness patch implements the opposite. Ultimately I need to know if there's light at the end of the tunnel or whether to give up and just write ugly tests.

(Wrote this comment before I saw you added overflow/underflow support, but leaving it because it might give an idea of my thought process on why): Not quite sure I fully followed this comment. I think my preference would be to error out for overflows/underflows, rather than silently allowing them. If things are going to be significantly more complex adding them but you are also going to address them immediately, I'm okay with it being deferred to a future patch. What I don't want long-term is for people to be able to write unintentionally broken test cases because they happen to be triggering underflow/overflow behaviour. Broken test cases are bad!

I have a clang-format query. I'm getting failures because clang-format is suggesting to use "-<space><digit>" to format a negative number. This doesn't seem correct to me and is not the style I see for existing code in FileCheckTests.cpp. Is this something I can ignore?

@thopre ran into this recently too. I consider it a bug in clang-format personally, so you can ignore it, but if @thopre hasn't already, you should file a clang-format bug so that it can get fixed.

I'm planning to take a look at this clang-format bug today.

thopre added a comment.Jun 1 2020, 6:08 AM

Prefix aside I'm just doing another rebase to bring in the signedness work. I'd like to know if it's going to be a requirement to support the reporting of overflow/underflow for the builtins in my patch. Originally I had gone down the llvm route of making the operations signed rather than the data but I see the signedness patch implements the opposite. Ultimately I need to know if there's light at the end of the tunnel or whether to give up and just write ugly tests.

Using signed operation would mean throwing overflow when the result could be represented in uint64_t in some cases which felt weird, especially since we support unsigned values (e.g. addresses).

thopre added a comment.Jun 1 2020, 7:56 AM

I'd like to know if it's going to be a requirement to support the reporting of overflow/underflow for the builtins in my patch. Originally I had gone down the llvm route of making the operations signed rather than the data but I see the signedness patch implements the opposite. Ultimately I need to know if there's light at the end of the tunnel or whether to give up and just write ugly tests.

(Wrote this comment before I saw you added overflow/underflow support, but leaving it because it might give an idea of my thought process on why): Not quite sure I fully followed this comment. I think my preference would be to error out for overflows/underflows, rather than silently allowing them. If things are going to be significantly more complex adding them but you are also going to address them immediately, I'm okay with it being deferred to a future patch. What I don't want long-term is for people to be able to write unintentionally broken test cases because they happen to be triggering underflow/overflow behaviour. Broken test cases are bad!

I have a clang-format query. I'm getting failures because clang-format is suggesting to use "-<space><digit>" to format a negative number. This doesn't seem correct to me and is not the style I see for existing code in FileCheckTests.cpp. Is this something I can ignore?

@thopre ran into this recently too. I consider it a bug in clang-format personally, so you can ignore it, but if @thopre hasn't already, you should file a clang-format bug so that it can get fixed.

I'm planning to take a look at this clang-format bug today.

Seems to be related to the use of operator. I've created PR46157

thopre added a comment.Jun 1 2020, 9:41 AM

I'm planning to take a look at this clang-format bug today.

Seems to be related to the use of operator. I've created PR46157

Someone already posted a patch for it: https://reviews.llvm.org/D80933. It works for 23ac16cf9bd4cc0bb434efcf6385baf083a2ff7b.

paulwalker-arm marked an inline comment as done.Jun 1 2020, 11:03 AM
paulwalker-arm added inline comments.
llvm/lib/Support/FileCheck.cpp
461

For what it's worth I took a run at an implementation that doesn't require a call prefix and whilst almost as simple as suggested above there are a couple of downsides.

(1) Calls without a prefix require look ahead parsing, which means redundant work continually looking for functions that might never be there. For example, take the parsing of var_a + var_b + var_c + var_d - (var_e - var_f) where parseNumericOperand is likely to perform many failed parse attempts.

(2) Some diagnostics become harder or impossible. For example, is "mu(la+b)" a call to an unknown function, a missing operator or a bracket typo. I know the same scenario is true if the user forgets the prefix but when they don't, we can emit a more useful message. You can see another example in this patch where it's easier to spot and report a missing operator before a function call.

(3) A prefix allows the use of symbols that might otherwise be confusing. Tenuous I know but consider "a + !operator+(b+c)".

(4) Are there any plans for VAR1(VAR2+VAR3) as short hand for mul(VAR1, VAR2+VAR3)?

A downside of the prefix is that we cannot easily use "!" to mean "not". That said "!(var)" support is only a minor modification.

I don't know if these are strong reasons to go with a prefix and ultimately either approach works for me, so just let me know the preference and I'll make the necessary changes.

Taking conversation out-of-line to make it easier. My personal beliefs are as follows:

(1) Calls without a prefix require look ahead parsing, which means redundant work continually looking for functions that might never be there. For example, take the parsing of var_a + var_b + var_c + var_d - (var_e - var_f) where parseNumericOperand is likely to perform many failed parse attempts.

By look ahead parsing are you saying something like to identify whether var_a is actually a function, we have to parse the +? I've not given too much thought to this, but I think this extra work can be avoided by delaying handling of a token until the next token is identified. Thus an identifier token is left unprocessed until the next token has been read in, at which point it is either processed as a function or a variable. However, I accept that might need some rewriting of the existing parsing code. Trying to parse something as a numeric operand as a first attempt before trying to read it as something else seems like the wrong approach long-term as we add more power to these expressions.

(2) Some diagnostics become harder or impossible. For example, is "mu(la+b)" a call to an unknown function, a missing operator or a bracket typo. I know the same scenario is true if the user forgets the prefix but when they don't, we can emit a more useful message. You can see another example in this patch where it's easier to spot and report a missing operator before a function call.

I think it's okay in that case to treat that as an attempt to call function mu, which probably doesn't exist. I'm not sure having the prefix helps in this case: !mu(la+b) is just as unknown a function. The function name is delimited by the end of the previous token and the opening parenthesis in the unprefixed case. I'm not sure I see the case where it's easier to spot a missing operand. The examples in the patch are (I think):

  • (1)(2) - this is unaffected - no identifier means these are treated as parenthesised expressions.
  • 2(X) - 2 will be parsed as a numeric literal, so this is still a missing operator.
  • 2!mul(FOO,2) - without the !, either the 2 is treated as a separate token, because it can't be the first character of an identifier (and therefore again a missing operator diagnostic is still possible), or 2mul becomes an invalid identifier, with corresponding message. I think either is an acceptable error.
  • FOO !mul(FOO,2) - without the !, FOO is still a separate token because of the whitespace, so the missing operand is easily identifiable.

(3) A prefix allows the use of symbols that might otherwise be confusing. Tenuous I know but consider "a + !operator+(b+c)".

I think we either a) don't need to support such expressions or b) special-case + following the term operator or equivalent sequences. I think we just want to keep our function names to valid identifiers like we already have for variable names.

(4) Are there any plans for VAR1(VAR2+VAR3) as short hand for mul(VAR1, VAR2+VAR3)?

I'm not aware of any plans, and I don't think they're really needed (especially if we add support for * as a binary operator for multiplication).

A downside of the prefix is that we cannot easily use "!" to mean "not". That said "!(var)" support is only a minor modification.

I don't think we have any plans or need for boolean support (tools don't generally print "true" or "false" and other things like 1 or 0 can be supported using regular numeric expressions). We will probably want to support != as a comparator though at some point, so we need to allow for that.

I don't know if these are strong reasons to go with a prefix and ultimately either approach works for me, so just let me know the preference and I'll make the necessary changes.

My personal preference is still no prefix. Related aside is this comic: https://xkcd.com/1306/ (especially the alt text).

Thanks for the discussion @jhenderson.

I've removed the function prefix and updated the tests accordingly. I've also removed div and mul support so that it's no longer dependent on D80915, which I'll progress separately.

arichardson added inline comments.Jun 4 2020, 11:58 AM
llvm/lib/Support/FileCheck.cpp
608

Should this be ltrim?

paulwalker-arm marked an inline comment as done.Jun 4 2020, 1:33 PM
paulwalker-arm added inline comments.
llvm/lib/Support/FileCheck.cpp
608

It looks like redundant code as space is already trimmed before loop entry and exit. I'll remove it.

Removed redundant rtrim.

A couple of test cases that might want adding:

  1. Trying to use a variable as a function (e.g. VAR1(1, 2))
  2. Trying to use a function as a variable (e.g. max + min)
  3. Maybe even defining a variable explicitly as a recognised function name (e.g. max + max(1, 2) or even max + max(max, max)).

I reckon the first should be treated as an unrecognised variable, and the others allowed (although the second one probably would be using undefined variables).

llvm/docs/CommandGuide/FileCheck.rst
702

Perhaps "Accepted" rather than "Acceptable"

llvm/lib/Support/FileCheck.cpp
233

Should this be an Expected if it can't fail? Same for min.

255

This cantFail call suggests to me that this shouldn't be an Expected return.

369–375

We should probably allow for optional whitespace between the end of the function name and the (.

paulwalker-arm edited the summary of this revision. (Show Details)Jun 5 2020, 3:20 AM
paulwalker-arm marked 3 inline comments as done.Jun 5 2020, 3:45 AM
llvm/lib/Support/FileCheck.cpp
233

This is to match the binop_eval_t typedef required by BinaryOperation.

255

As explained above. There did not seem much value in creating a duplicate set of functions (i.e. with and without an Expected result) given this is the only other use.

369–375

In LLVM's c/c++ world clang-format will remove such whitespace (presumably to aid readability) so do we really want to allow it in FileCheck code?

jhenderson added inline comments.Jun 5 2020, 4:00 AM
llvm/lib/Support/FileCheck.cpp
233

Thanks, makes sense.

369–375

We've allowed arbitrary whitespace everywhere else in the expressions, so I think we should. Not all environments will necessarily follow LLVM's coding standards.

I initially misread the "max + max" related comments and went down the wrong path. Happily it was not in vain as it prompted me to reuse parseVariable because the function relates to identifiers rather than just variables. I've also simplied parseVariable a little but stopped short of renaming things.

jhenderson accepted this revision.Jun 8 2020, 12:45 AM

Tip to help reviewers: click the "Done" box on inline comments before uploading a patch to indicate you have addressed a specific comment. The comment will then be marked as Done when you upload the next diff.

LGTM, thanks, but wait for others.

llvm/lib/Support/FileCheck.cpp
350

I know this was here before, but since you are modifying this line, you can fix on the way (or in a separate commit before if you prefer): use size_t rather than unsigned to match the return type of Str.size(). Optionally also update I to size_t for the same reason.

arichardson accepted this revision.Jun 9 2020, 4:35 AM
This revision was automatically updated to reflect the committed changes.