This patch extends numerical expressions to allow calls to predefined functions. These calls can be combined with the existing numerical operators, which includes nesting calls. The call syntax is: <func>(<args>) Where <func> is a predefined string literal, currently limited to one of add, max, min and sub. <arg> is a comma seperated list of numerical expressions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This is very much work in progress but I welcome early feedback.
I don't know if a function name prefix is necessary but at this stage is allows me to ignore some corner cases. I'd also like to know whether I can get away with only supporting arbitrary argument counts at the parsing layer since currently I only need support for the usual two operand math operations.
Only a minor code change so still WIP, but with basic tests and documentation update.
I like this approach. Starting functions with ! seems reasonable since it is similar tablegen and if we decide that we don't need the prefix, we can always drop it.
The advantage of requiring the prefix is that we can give functions names that might also be commonly used capture names.
There should probably be some tests for error messages in llvm/unittests/Support/FileCheckTest.cpp.
I believe the patch is now ready for review. There's an open question as to
whether the precedence operator (i.e. !()) is required, but since it's
implementation came largely for free I ran with it.
I don't know if there's an official mechanism beyond adding people but can I request code review please.
Please note that the patch to add support for signed values (https://reviews.llvm.org/D60390) is at an advanced stage of review.
llvm/docs/CommandGuide/FileCheck.rst | ||
---|---|---|
693 | Why change this sentence? Recursion only happens on one of the operand only. | |
725–726 | I don't think the exclamation mark should be required here. A parenthesis pair should be enough to force precedence. Note that there was a patch to adds support for that and you might want to rebase your patch on top of it. | |
llvm/lib/Support/FileCheck.cpp | ||
646–649 | Please group the two tests together, together they test whether it's an exit condition. |
Adding people as reviewer is the official way. The review policy [1] also says you can ping after 1 week if you didn't have any reply.
llvm/docs/CommandGuide/FileCheck.rst | ||
---|---|---|
693 | I don't understand the distinction. I changed the sentence as I didn't what to say: an expression followed by an operator and either a numerical operand or a function call. Are you saying the above is more correct? Also with functions you can have !(mul(VAR+(umin(VAR2,4)) + !(udivl(VAR3+(umax(VAR3,4)) | |
725–726 | Sure, it's more that this implementation comes for free, whereas supporting arbitrary parenthesis pairs along side function calls requires more complex parsing. Is it worth the extra effort? If so I'll happy prevent this usage and error out for unnamed functions. I'd rather not wait for the parenthesis work because I've got other work at review whose tests are much improved when I can make use of function calls. |
llvm/docs/CommandGuide/FileCheck.rst | ||
---|---|---|
693 | Oh right, I forgot about function altogether. That said, I think an expression is a kind of numeric operand so we should just expand the definition below saying it can also be a function call. |
llvm/docs/CommandGuide/FileCheck.rst | ||
---|---|---|
725–726 | I think the exclamation mark should be reserved for function call. I find it a bit confusing otherwise but let's wait to see what other reviewers might think. Do the tests you need this feature for require some way to force precedence or can this be dealt later? |
llvm/docs/CommandGuide/FileCheck.rst | ||
---|---|---|
693 | So something like: A numeric operand is a previously defined numeric variable, an integer literal or the result of a function call. | |
725–726 | I currently have no use for the precedence operator, the function calls are why I started down this path, so am happy either way. Part of me just assumed that in the future the parser might be reworked to remove the need for the ! prefix, but I suppose providing the power of precedence early might prevent that work from happening :) |
llvm/docs/CommandGuide/FileCheck.rst | ||
---|---|---|
693 | I'm nitpicking but technically the operand is the function call itself, but evaluates to the return value of the function call. | |
725–726 | I wouldn't hold my breath for a parser rewrite. While I'd love to make the code nicer and more flexible I have little free time to work on it myself. Anyway, if you don't require operator precedence just error on empty function name for now and we can extend it to be used for operator precedence later if needed. |
llvm/docs/CommandGuide/FileCheck.rst | ||
---|---|---|
725–726 | Done. Perhaps it's a good idea to mandate that all symbolic operators have a named function counterpart. That way in the short term if somebody does want to force precedence they just need to write a slightly more verbose check line. |
llvm/docs/CommandGuide/FileCheck.rst | ||
---|---|---|
725–726 | No objection to that, should be a 2 line changes, right? |
llvm/docs/CommandGuide/FileCheck.rst | ||
---|---|---|
725–726 | Yep. I'll add entries for add and sub to this patch and resubmit later today. |
LGTM if @thopre and @jhenderson are happy with this change too.
llvm/unittests/Support/FileCheckTest.cpp | ||
---|---|---|
1076 | Might make sense to add case with missing operators such as 2!mul(FOO,2) or FOO !mul(FOO,2) or !mul(FOO(!mul(3,2))) |
Some of my testing suggestions might better belong in the unit tests. Also, you're probably going to need to rebase and expand the behaviour somewhat now that signed values support has landed in D60390.
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
486 | I'm not strongly opposed to the use of ! to indicate a function call, but is it actually necessary? It seems like a function call could just be identified by <sequence of identifier chars>(<some chars>). The code would look something like the following semi-pseudo code: size_t Parenthesis = Expr.find_first_of('('); if (Parenthesis != npos) { if (all_of(Expr.take_front(Parenthesis), [](char C) { return isValidIdentifierChar(C); }) { if (AO != AllowedOperand::Any) return ErrorDiagnostic::get(SM, Expr, "unexpected function call"); return parseCallExpr(Expr, LineNumber, Context, SM); } } Assuming I've not missed something, that would allow us to simplify all the usages of function calls. | |
616–617 | I believe you could change this case to an asseertion - the parseExpr function treats a leading '(' as a different kind of expression. | |
693 | In conjunction with my suggestion above about not having a function specifier, you could change this code to bail out without error in some cases, perhaps by starting with looking for the % followed by some specific characters, followed by a ,. | |
llvm/lib/Support/FileCheckImpl.h | ||
735–741 | I think you need to delete "both" here, since there are now three different things it accepts. | |
llvm/test/FileCheck/numeric-expression.txt | ||
120 | Perhaps change the inner UNSI to UNSI+1 or something to show that the argument of a function is any kind of expression? Up to you. | |
384 | I would prefer these to be interleaved with their corresponding CHECK and input text: RUN: ... --check-prefix CALL-MISSING-OPENING-BRACKET ... CALL MISSING OPENING BRACKET 30 CALL-MISSING-OPENING-BRACKET-LABEL: ... ... RUN: ... --check-prefix CALL-MISSING-CLOSING-BRACKET ... CALL MISSING CLOSING BRACKET etc. It helps reduce the distance I have to look to find the thing being checked for. | |
409 | There might want to be some interaction testing with plain parentheses. Something like [[#!mul(NUMVAR,(NUMVAR+3))]] and [[#!mul(NUMVAR,(NUMVAR+3)]] (the first should work, but not the second). | |
417 | Nit: it would probably be best to make this call take two arguments. | |
425 | I think you also want the following: [[#!mul(,NUMVAR)]] Possibly also [[#!mul(NUMVAR,,NUMVAR)]] | |
441 | Nit: it would probably be best to make this call take two arguments. | |
llvm/unittests/Support/FileCheckTest.cpp | ||
1076 | +1 |
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
486 | Regardless of the ease of implementation, I like the ! prefix since these are builtin functions/operators, not something the user can define. YMMV of course |
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
486 | I'm not sure why it matters that they are builtin? Even if we do provide the ability for users to define their own functions, surely their behaviour should be identical to built-in functions from the majority of the code's point-of-view? I'd actually think that including the ! would make it harder to parse, since we'd have to support function calls both with and without the !. |
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
486 | If we can make it work without the ! without making the implementation much more complicated, I'd prefer that. Since the name needs to be followed by an open paren, even variables that have the same name as builtin functions should work: [[#mul(mul, 2)]]. The first one is a function name, the second must be a variable since there is no open paren. |
Prefix aside I'm just doing another rebase to bring in the signedness work. I'd like to know if it's going to be a requirement to support the reporting of overflow/underflow for the builtins in my patch. Originally I had gone down the llvm route of making the operations signed rather than the data but I see the signedness patch implements the opposite. Ultimately I need to know if there's light at the end of the tunnel or whether to give up and just write ugly tests.
Rebased but still need to remove the function prefix and tighen up the mul/div operators.
Added the suggested tests to the FileCheck unitest plus fleshed out overflow reporting.
I have a clang-format query. I'm getting failures because clang-format is suggesting to use "-<space><digit>" to format a negative number. This doesn't seem correct to me and is not the style I see for existing code in FileCheckTests.cpp. Is this something I can ignore?
(Wrote this comment before I saw you added overflow/underflow support, but leaving it because it might give an idea of my thought process on why): Not quite sure I fully followed this comment. I think my preference would be to error out for overflows/underflows, rather than silently allowing them. If things are going to be significantly more complex adding them but you are also going to address them immediately, I'm okay with it being deferred to a future patch. What I don't want long-term is for people to be able to write unintentionally broken test cases because they happen to be triggering underflow/overflow behaviour. Broken test cases are bad!
@thopre ran into this recently too. I consider it a bug in clang-format personally, so you can ignore it, but if @thopre hasn't already, you should file a clang-format bug so that it can get fixed.
llvm/docs/CommandGuide/FileCheck.rst | ||
---|---|---|
695 | The "and have a 64-bit precision" bit seems a bit out of place here. It should probably be its own sentence. | |
696 | Given the new stuff about functions, I might be tempted to pull out the sentence about supported operators into a list, a bit like that used for the accepted values, especially since it needs updating as things stand! | |
llvm/lib/Support/FileCheck.cpp | ||
233 | I think adding operator* etc makes sense, but it should be a separate patch to adding function support (probably a prerequisite). We don't want to cloud the intent of this patch by adding in other useful functionality, and it will make it easier to focus the reviewing. I'll save reviewing them for that patch. | |
317–318 | Did you consider writing min in terms of max (or vice versa)? Not sure if it is a good thing to do or not, but I believe it would lead to less duplicated and more concise code. Something like: if (max(LeftOperand, RightOperand) == LeftOperand) return RightOperand; return LeftOperand; |
Using signed operation would mean throwing overflow when the result could be represented in uint64_t in some cases which felt weird, especially since we support unsigned values (e.g. addresses).
Someone already posted a patch for it: https://reviews.llvm.org/D80933. It works for 23ac16cf9bd4cc0bb434efcf6385baf083a2ff7b.
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
486 | For what it's worth I took a run at an implementation that doesn't require a call prefix and whilst almost as simple as suggested above there are a couple of downsides. (1) Calls without a prefix require look ahead parsing, which means redundant work continually looking for functions that might never be there. For example, take the parsing of var_a + var_b + var_c + var_d - (var_e - var_f) where parseNumericOperand is likely to perform many failed parse attempts. (2) Some diagnostics become harder or impossible. For example, is "mu(la+b)" a call to an unknown function, a missing operator or a bracket typo. I know the same scenario is true if the user forgets the prefix but when they don't, we can emit a more useful message. You can see another example in this patch where it's easier to spot and report a missing operator before a function call. (3) A prefix allows the use of symbols that might otherwise be confusing. Tenuous I know but consider "a + !operator+(b+c)". (4) Are there any plans for VAR1(VAR2+VAR3) as short hand for mul(VAR1, VAR2+VAR3)? A downside of the prefix is that we cannot easily use "!" to mean "not". That said "!(var)" support is only a minor modification. I don't know if these are strong reasons to go with a prefix and ultimately either approach works for me, so just let me know the preference and I'll make the necessary changes. |
Taking conversation out-of-line to make it easier. My personal beliefs are as follows:
(1) Calls without a prefix require look ahead parsing, which means redundant work continually looking for functions that might never be there. For example, take the parsing of var_a + var_b + var_c + var_d - (var_e - var_f) where parseNumericOperand is likely to perform many failed parse attempts.
By look ahead parsing are you saying something like to identify whether var_a is actually a function, we have to parse the +? I've not given too much thought to this, but I think this extra work can be avoided by delaying handling of a token until the next token is identified. Thus an identifier token is left unprocessed until the next token has been read in, at which point it is either processed as a function or a variable. However, I accept that might need some rewriting of the existing parsing code. Trying to parse something as a numeric operand as a first attempt before trying to read it as something else seems like the wrong approach long-term as we add more power to these expressions.
(2) Some diagnostics become harder or impossible. For example, is "mu(la+b)" a call to an unknown function, a missing operator or a bracket typo. I know the same scenario is true if the user forgets the prefix but when they don't, we can emit a more useful message. You can see another example in this patch where it's easier to spot and report a missing operator before a function call.
I think it's okay in that case to treat that as an attempt to call function mu, which probably doesn't exist. I'm not sure having the prefix helps in this case: !mu(la+b) is just as unknown a function. The function name is delimited by the end of the previous token and the opening parenthesis in the unprefixed case. I'm not sure I see the case where it's easier to spot a missing operand. The examples in the patch are (I think):
- (1)(2) - this is unaffected - no identifier means these are treated as parenthesised expressions.
- 2(X) - 2 will be parsed as a numeric literal, so this is still a missing operator.
- 2!mul(FOO,2) - without the !, either the 2 is treated as a separate token, because it can't be the first character of an identifier (and therefore again a missing operator diagnostic is still possible), or 2mul becomes an invalid identifier, with corresponding message. I think either is an acceptable error.
- FOO !mul(FOO,2) - without the !, FOO is still a separate token because of the whitespace, so the missing operand is easily identifiable.
(3) A prefix allows the use of symbols that might otherwise be confusing. Tenuous I know but consider "a + !operator+(b+c)".
I think we either a) don't need to support such expressions or b) special-case + following the term operator or equivalent sequences. I think we just want to keep our function names to valid identifiers like we already have for variable names.
(4) Are there any plans for VAR1(VAR2+VAR3) as short hand for mul(VAR1, VAR2+VAR3)?
I'm not aware of any plans, and I don't think they're really needed (especially if we add support for * as a binary operator for multiplication).
A downside of the prefix is that we cannot easily use "!" to mean "not". That said "!(var)" support is only a minor modification.
I don't think we have any plans or need for boolean support (tools don't generally print "true" or "false" and other things like 1 or 0 can be supported using regular numeric expressions). We will probably want to support != as a comparator though at some point, so we need to allow for that.
I don't know if these are strong reasons to go with a prefix and ultimately either approach works for me, so just let me know the preference and I'll make the necessary changes.
My personal preference is still no prefix. Related aside is this comic: https://xkcd.com/1306/ (especially the alt text).
Thanks for the discussion @jhenderson.
I've removed the function prefix and updated the tests accordingly. I've also removed div and mul support so that it's no longer dependent on D80915, which I'll progress separately.
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
629 | Should this be ltrim? |
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
629 | It looks like redundant code as space is already trimmed before loop entry and exit. I'll remove it. |
A couple of test cases that might want adding:
- Trying to use a variable as a function (e.g. VAR1(1, 2))
- Trying to use a function as a variable (e.g. max + min)
- Maybe even defining a variable explicitly as a recognised function name (e.g. max + max(1, 2) or even max + max(max, max)).
I reckon the first should be treated as an unrecognised variable, and the others allowed (although the second one probably would be using undefined variables).
llvm/docs/CommandGuide/FileCheck.rst | ||
---|---|---|
707 | Perhaps "Accepted" rather than "Acceptable" | |
llvm/lib/Support/FileCheck.cpp | ||
233 | Should this be an Expected if it can't fail? Same for min. | |
255 | This cantFail call suggests to me that this shouldn't be an Expected return. | |
374–380 | We should probably allow for optional whitespace between the end of the function name and the (. |
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
233 | This is to match the binop_eval_t typedef required by BinaryOperation. | |
255 | As explained above. There did not seem much value in creating a duplicate set of functions (i.e. with and without an Expected result) given this is the only other use. | |
374–380 | In LLVM's c/c++ world clang-format will remove such whitespace (presumably to aid readability) so do we really want to allow it in FileCheck code? |
I initially misread the "max + max" related comments and went down the wrong path. Happily it was not in vain as it prompted me to reuse parseVariable because the function relates to identifiers rather than just variables. I've also simplied parseVariable a little but stopped short of renaming things.
Tip to help reviewers: click the "Done" box on inline comments before uploading a patch to indicate you have addressed a specific comment. The comment will then be marked as Done when you upload the next diff.
LGTM, thanks, but wait for others.
llvm/lib/Support/FileCheck.cpp | ||
---|---|---|
355 | I know this was here before, but since you are modifying this line, you can fix on the way (or in a separate commit before if you prefer): use size_t rather than unsigned to match the return type of Str.size(). Optionally also update I to size_t for the same reason. |
Why change this sentence? Recursion only happens on one of the operand only.