This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch adds support for selecting a
matching format to match a numeric value against (ie. decimal, hex lower
case letters or hex upper case letters).
This commit allows to select what format a numeric value should be
matched against. The following formats are supported: decimal value,
lower case hex value and upper case hex value. Matching formats impact
both the format of numeric value to be matched as well as the format of
accepted numbers in a definition with empty numeric expression
constraint.
Default for absence of format is decimal value unless the numeric
expression constraint is non null and use a variable in which case the
format is the one used to define that variable. Conclict of format in
case of several variable being used is diagnosed and forces the user to
select a matching format explicitely.
This commit also enables immediates in numeric expressions to be in any
radix known to StringRef's GetAsInteger method, except for legacy
numeric expressions (ie [[@LINE+<offset>]] which only support decimal
immediates.
Copyright:
Linaro (changes up to diff 183612 of revision D55940)
GraphCore (changes in later versions of revision D55940 and in new revision created off D55940)
@jhenderson I presume you were answering to Paul? Because FileCheck will indeed check the input file for 0xC when encountering CHECK: 0x[[#%X, 12]] so it's all good. As to the prefix, I supposed you meant for the literals inside the # syntax since prefix is not needed for printf/scanf: scanf("%X", x) will happily scan CAFE without expecting a prefix.
@probinson Since we are talking about parallels between variable definition and scanf on one side and numeric substitution and printf on the other side, we can look at the following examples:
0x[[#%X, VAR2:]] is equivalent to: scanf("%X", VAR2).
0x[[#%X, VAR1+12]] when VAR1 was defined with -D#VAR1=3 will be equivalent to checking the output of the following against the input file:
VAR1=3;
printf("0x%X", VAR1+12);
Note that C interprets 12 in decimal despite the %X format specifier of the printf which only influences the conversion of the resulting numeric value (3+12=15 is converted to F and thus the input file is checked for "0xF").
#SOMEVAR:<EXPR> is a shorthand for both, and thus a combination of printf+scanf with some input file check inbetween, e.g. 0x[[#%X, VAR2:VAR1+12]] when VAR1 is defined with -D#VAR1=3 is equivalent to checking the output of the following against the input file:
VAR1=3;
printf("0x%X", VAR1+12);
and scanning the matched input with:
scanf("0x%X", VAR2);
The case of #%X,VAR:12 is then just a special case, i.e. matching the input file against:
LGTM from me on this. I can't think of anything else I'd like addressing, but please wait for @arichardson and anybody else who wants to comment to be happy.
I was responding to the general conversation, and giving my thoughts, so not addressing anybody specifically. Yes, the as-it-stands behaviour is what I would prefer, precisely for the reasons you outlined with the printf semantics.
@thopre Thanks for the detailed explanation. I think the current behaviour in the tests makes sense. Having the match format not affect the parsing of whats to the right of the = seems simpler and avoids problems.
My apologies but I still think there's changes needed on this patch. I had changed the status earlier but I presume updating the patch did reset the status to "needs review". I haven't covered all the API changes in the unittests and I'd like to address the comment from Paul about the lack of clarity of which variables have implicit format conflict. I'm happy to deal with the latter in a separate patch if people are eager to have this change (I'm personally looking forward to it being committed as well) but I want to at least finish the unit testing.
Since this is already a sizable diff, I'd like to address this in a separate patch if you don't mind. It'll require recording more parsing state and I'm thinking about using a parser object for that and thus simplifying the interface of parsing functions (lots of info would be kept as internal state of the object).
Is everybody happy with the constness changes to the ExpressionFormat operators and the new unit tests?
Looks fine to me. I would say say this can be committed if @jhenderson is happy with it and all tests pass. If there are any remaining issues those can always be fixed in follow-up commit.
It won't do any harm, I think, but it's inconsistent with style elsewhere, so I think it should be deleted (the const on the method is fine though of course).
Nit: this comment isn't quite right. You probably wanted "methods' output", since its the output of the methods. However, I think a clearer phrase this and the similar comments to "Check unsgined decimal format methods", or possibly even "Check unsigned decimal format properties".
In fact, now that I think about it, perhaps it would make more sense for this test to be split into a separate TEST for each different format. The test name would then document what the test is for, and you wouldn't need the comment. You might even want to consider a parameterised test (see the TEST_P function), to avoid code duplication, for the Unsigned, HexLower and HexUpper cases.
I've given this a bit more thought, and I think it would be better here and in similar situations with failing Expecteds to actually check the error contents. You can see examples of this in the DWARFDebugLineTest.cpp unit tests (look for the checkError function for a rough guide).
takeError on an Expected seems weird without first having checked that the Expected actually failed. You probably want EXPECT_THAT_EXPECTED(someFunctionThatReturnsAnExpected(), Failed()); if you aren't going to actually check the properties of the Error within the Expected.
Test "NoFormat" explicitly by passing it into the constructor and then consider a separate test to show that the default constructor generates a NoFormat kind.
I've lost track. What's the difference between parsePatternExpect and parseSubstExpect? Why is it pattern, not subst here? Finally, what do these test cases have to do with formats?
I know this isn't something directly related to your change, so it should be a later one, but you should also remove all uses of errorToBool in favour of checking the actual Error.
Sorry I didn't pick up on those in earlier reviews.
parsePatternExpect will call parsePattern which parse the rhs of a CHECK directive. parseSubstExpect calls parseNumericSubstitutionBlock which parses what's inside a # block. The use of parsePatternExpect is because I'm testing that legacy @LINE expression only accept what the old [[@LINE]] (without #) accepted before this patch set. The restriction for legacy @LINE expression is in parseNumericSubstitutionBlock and private functions it calls but the detection of a legacy @LINE expression is in parsePattern hence the use of parsePatternExpect.
The last of the 3 tests is format related because this patch also enables hex literals (I didn't want to make a separate patch for a few line diff). The other 2 should have been done in earlier patch. I've split these 2 into a separate patch: https://reviews.llvm.org/D72912
It's an ASSERT because if the match fails it does not make sense checking Matches[0]. I've added negative testing, it's only for valueFromStringRepr that it is not possible (I've added a comment for that there).