Page MenuHomePhabricator

FileCheck [9/12]: Add support for matching formats

Authored by thopre on Apr 7 2019, 4:22 PM.



This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch adds support for selecting a
matching format to match a numeric value against (ie. decimal, hex lower
case letters or hex upper case letters).

This commit allows to select what format a numeric value should be
matched against. The following formats are supported: decimal value,
lower case hex value and upper case hex value. Matching formats impact
both the format of numeric value to be matched as well as the format of
accepted numbers in a definition with empty numeric expression

Default for absence of format is decimal value unless the numeric
expression constraint is non null and use a variable in which case the
format is the one used to define that variable. Conclict of format in
case of several variable being used is diagnosed and forces the user to
select a matching format explicitely.

This commit also enables immediates in numeric expressions to be in any
radix known to StringRef's GetAsInteger method, except for legacy
numeric expressions (ie [[@LINE+<offset>]] which only support decimal


  • Linaro (changes up to diff 183612 of revision D55940)
  • GraphCore (changes in later versions of revision D55940 and in new revision created off D55940)

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
arichardson added inline comments.Fri, Jan 3, 7:12 AM

I'm not entirely sure what the best solution there is.
I think NUMVAL2 should always evaluate to 19.
However, I don't have a strong preference whether it should capture decimal 19 or hex 0x13 (i.e. parse the number in the expresssion as decimal, but inherit format from the other variable).

Alternatively we could require an explicit format for those (probably rare?) cases.

jhenderson added inline comments.Fri, Jan 3, 8:58 AM

Am I right in thinking you can get rid of the = 0 now?


This comment isn't clear to me. I assume what it means by "if it can be matched" is "if it is a format that can be used in a match" ot something similar?


if neither is NoFormat and their kinds are the same.


Nit: trailing full stop.

You're also mixing your comment characters. Here you use '#', but two lines above you use ';'.


Nit: too many blank lines (1 is enough)


I'm not familiar with this "%ProtectFileCheckOutput". What is it for, and why do only some cases seem to use it?


You should probably also have this case where the format is on the second value, and not the first.


Aside from being fewer in number, how is this set different from the "USE DEF FMT IMPL MATCH" set?


It might be interesting to show what happens when you pass an upper-case hex digit to a lower-case format and vice versa.


What's going on here?

thopre updated this revision to Diff 236098.Fri, Jan 3, 11:47 AM
thopre marked 6 inline comments as done.

Add example showing literals are always parsed as decimal


Yes, in one case one could diagnose a missing comma while in the other a missing percentage. This approach allows to reorder the code to parse each element of a numeric substitution (format specifier, variable definition, expression) when internal API change, as was done in earlier version of this patch. If you don't mind I'll keep it this way.


Is the way you expect 12 to be interpreted in '-D#%X,NUMVAL=12' and in '-D#%X,NUMVAL1=7 -D#NUMVAL2=12+NUMVAL1' different because of the explicit Vs implicit format or because you see the former as similar to a value in an input text (and thus 12 as 0x12) and the latter as a literal in a numeric expression (and thus 12 as decimal 12)?

I find it even more confusing to distinguish the behavior of literals between implicit and explicit format (e.g. '-D#%X,NUMVAL1=7 -D#%X,NUMVAL2=12+NUMVAL1' would interpret 12 as 0x12 then).

I also see several reasons to avoid the distinction between the 2 examples based on whether a variable is present after the equal sign:

  • complexity: (i) more text needs to be added in the documentation to describe this distinction thus making the feature harder to comprehend IMHO and (ii) the parsing code needs to distinguish between these 2 cases
  • consistency: the same syntax is used in both cases but the behaviour is different due to the use of a numeric variable after the equal sign
  • unnecessary: the format specifier is needed because (i) the input text can contain a mixture of text and numeric values and (ii) hex numeric values can come with or without prefix, in lowercase or uppercase. It is thus necessary to indicate when something like "dead" is to be interpreted as a numeric value and when it shouldn't. There is not problem in a numeric substitution (whether one defined on the command line or in a check file) since there is no text in it and it is under control of the test writer which can add a prefix.

Therefore I strongly think anything after the equal sign is to be interpreted as a numeric expression and not an input. The distinction might appear clearer in the provision made by one of the later patch to support numeric subsitutions such as #%X,NUMVAL:<12 where the numeric expression value would be 12 and the input value could be 11 (input being B) and thus match.

I've added an example in the documentation to document that a literal is always intepreted as decimal in the absence of prefix (0x12 is allowed and interpreted as hex). Hopefully that'll alleviate some of your concern.

thopre updated this revision to Diff 236109.Fri, Jan 3, 12:33 PM
thopre marked 12 inline comments as done.

Address latest round of review comments


This is used when parsing the output of FileCheck to avoid things such as localization, see for more details


valueFromStringRepr does not give an error in that case as it expects the value to match the format. This is because getAsInteger does not allow to check for the casing and I don't want to make the function more complex by checking it.

jhenderson added inline comments.Mon, Jan 6, 2:33 AM

FWIW, I would find treating any number without 0x as hex to be ambiguous. I don't think the format specifier really should make a difference to that. After all, if you think of it in printf terms: printf("0x%x", 12) prints "0xc", not "0x12".


Got it, thanks!


Have you considered using ASSERT_THAT_ERROR and EXPECT_THAT_ERROR in these tests?

thopre marked 4 inline comments as done.Mon, Jan 6, 5:41 AM
thopre added inline comments.

@jhenderson I presume you were answering to Paul? Because FileCheck will indeed check the input file for 0xC when encountering CHECK: 0x[[#%X, 12]] so it's all good. As to the prefix, I supposed you meant for the literals inside the # syntax since prefix is not needed for printf/scanf: scanf("%X", x) will happily scan CAFE without expecting a prefix.

@probinson Since we are talking about parallels between variable definition and scanf on one side and numeric substitution and printf on the other side, we can look at the following examples:

  • 0x[[#%X, VAR2:]] is equivalent to: scanf("%X", VAR2).
  • 0x[[#%X, VAR1+12]] when VAR1 was defined with -D#VAR1=3 will be equivalent to checking the output of the following against the input file:
printf("0x%X", VAR1+12);

Note that C interprets 12 in decimal despite the %X format specifier of the printf which only influences the conversion of the resulting numeric value (3+12=15 is converted to F and thus the input file is checked for "0xF").

#SOMEVAR:<EXPR> is a shorthand for both, and thus a combination of printf+scanf with some input file check inbetween, e.g. 0x[[#%X, VAR2:VAR1+12]] when VAR1 is defined with -D#VAR1=3 is equivalent to checking the output of the following against the input file:

printf("0x%X", VAR1+12);

and scanning the matched input with:

scanf("0x%X", VAR2);

The case of #%X,VAR:12 is then just a special case, i.e. matching the input file against:

printf("%X", 12);

followed by scanning the matched text with:

scanf("%X", VAR);
thopre updated this revision to Diff 236336.Mon, Jan 6, 5:41 AM
thopre marked an inline comment as done.

Use EXPECT_THAT_ERROR when testing for an error in unittest

jhenderson accepted this revision.Mon, Jan 6, 5:49 AM

LGTM from me on this. I can't think of anything else I'd like addressing, but please wait for @arichardson and anybody else who wants to comment to be happy.


@thopre - I assume you meant @arichardson here, not @probinson...

I was responding to the general conversation, and giving my thoughts, so not addressing anybody specifically. Yes, the as-it-stands behaviour is what I would prefer, precisely for the reasons you outlined with the printf semantics.

This revision is now accepted and ready to land.Mon, Jan 6, 5:49 AM
arichardson accepted this revision.Mon, Jan 6, 6:59 AM

LGTM. Thanks for working on this feature!


Yes that seems perfectly fine.


@thopre Thanks for the detailed explanation. I think the current behaviour in the tests makes sense. Having the match format not affect the parsing of whats to the right of the = seems simpler and avoids problems.

thopre planned changes to this revision.Mon, Jan 6, 7:10 AM

My apologies but I still think there's changes needed on this patch. I had changed the status earlier but I presume updating the patch did reset the status to "needs review". I haven't covered all the API changes in the unittests and I'd like to address the comment from Paul about the lack of clarity of which variables have implicit format conflict. I'm happy to deal with the latter in a separate patch if people are eager to have this change (I'm personally looking forward to it being committed as well) but I want to at least finish the unit testing.

thopre updated this revision to Diff 236817.Wed, Jan 8, 6:58 AM

Complete unit testing of changes in the patch

This revision is now accepted and ready to land.Wed, Jan 8, 6:58 AM
thopre requested review of this revision.Wed, Jan 8, 6:59 AM

Is everybody happy with the constness changes to the ExpressionFormat operators and the new unit tests?

thopre marked 4 inline comments as done.Wed, Jan 8, 7:01 AM
thopre added inline comments.

Since this is already a sizable diff, I'd like to address this in a separate patch if you don't mind. It'll require recording more parsing state and I'm thinking about using a parser object for that and thus simplifying the interface of parsing functions (lots of info would be kept as internal state of the object).

arichardson accepted this revision.Sun, Jan 12, 2:50 AM

Is everybody happy with the constness changes to the ExpressionFormat operators and the new unit tests?

Looks fine to me. I would say say this can be committed if @jhenderson is happy with it and all tests pass. If there are any remaining issues those can always be fixed in follow-up commit.


Since Kind is just an enum, the const is unnecessary for by-value arguments but I don't think it does any harm either.

This revision is now accepted and ready to land.Sun, Jan 12, 2:50 AM

No problem with the constness changes.


It won't do any harm, I think, but it's inconsistent with style elsewhere, so I think it should be deleted (the const on the method is fine though of course).


Nit: this comment isn't quite right. You probably wanted "methods' output", since its the output of the methods. However, I think a clearer phrase this and the similar comments to "Check unsgined decimal format methods", or possibly even "Check unsigned decimal format properties".

In fact, now that I think about it, perhaps it would make more sense for this test to be split into a separate TEST for each different format. The test name would then document what the test is for, and you wouldn't need the comment. You might even want to consider a parameterised test (see the TEST_P function), to avoid code duplication, for the Unsigned, HexLower and HexUpper cases.


ASSERT_THAT_EXPECTED(Wildcard, Succeeded());

Assert because otherwise the next line will crash if it fails, and the _THAT_EXPECTED macro for readability.

Same comment below.

Perhaps worth naming this UnsignedWildcard (HexLowerWildcard etc), and avoid reusing the variable in the other cases.


I've given this a bit more thought, and I think it would be better here and in similar situations with failing Expecteds to actually check the error contents. You can see examples of this in the DWARFDebugLineTest.cpp unit tests (look for the checkError function for a rough guide).

takeError on an Expected seems weird without first having checked that the Expected actually failed. You probably want EXPECT_THAT_EXPECTED(someFunctionThatReturnsAnExpected(), Failed()); if you aren't going to actually check the properties of the Error within the Expected.


Test "NoFormat" explicitly by passing it into the constructor and then consider a separate test to show that the default constructor generates a NoFormat kind.

Also NoneFormat -> NoFormat.


What about the other formats? Why is this here ratehr than in the individual test cases?


Self-comparison feels like a special case. You probably want to use a second format instance, I reckon.


Vs -> versus


This comment seems stale compared to what is being done below (i.e. it's not just about the eval() function it appears on the surface).


This '++' change, and the corresponding changes below, seem unrelated?


I've lost track. What's the difference between parsePatternExpect and parseSubstExpect? Why is it pattern, not subst here? Finally, what do these test cases have to do with formats?


does a check against "C" here make sense?


Does a check against "b" here make sense?


Debug printing left in by mistake?


ASSERT_THAT_EXPECTED(ExpressionPointer, Succeeded());
Same all over the place.


I know this isn't something directly related to your change, so it should be a later one, but you should also remove all uses of errorToBool in favour of checking the actual Error.

Sorry I didn't pick up on those in earlier reviews.

thopre updated this revision to Diff 238737.Fri, Jan 17, 3:52 AM
thopre marked 20 inline comments as done.
  • Address unit tests review comments
  • fix casing of first letter in an error message
thopre marked an inline comment as done.Fri, Jan 17, 3:54 AM
thopre added inline comments.

Indeed, made it into a separate patch:


parsePatternExpect will call parsePattern which parse the rhs of a CHECK directive. parseSubstExpect calls parseNumericSubstitutionBlock which parses what's inside a # block. The use of parsePatternExpect is because I'm testing that legacy @LINE expression only accept what the old [[@LINE]] (without #) accepted before this patch set. The restriction for legacy @LINE expression is in parseNumericSubstitutionBlock and private functions it calls but the detection of a legacy @LINE expression is in parsePattern hence the use of parsePatternExpect.

The last of the 3 tests is format related because this patch also enables hex literals (I didn't want to make a separate patch for a few line diff). The other 2 should have been done in earlier patch. I've split these 2 into a separate patch:

jhenderson added inline comments.Mon, Jan 20, 2:28 AM

How about:

StringRef HexDigits = AllowUpperHex ? "ABCDEF" : "abcdef";
EXPECT_TRUE(WildcardRegex.match(HexDigits, &Matches));
EXPECT_EQ(Matches[0], HexDigits);

Remind me why we can't check the opposite cases are rejected here? That needs commenting at the very least.

Also, why is the check at line 70 ASSERT_TRUE, rather than EXPECT_TRUE?


I'd recommend the following, to reduce duplication of checks.

StringRef Ten;
StringRef Fifteen;
if (AllowHex) {
  if (AllowUpperHex) {
    Ten = "A";
    Fifteen = "F";
  } else {
    Ten = "a";
    Fifteen = "f";
} else {
  Ten = "10";
  Fifteen = "15;
EXPECT_EQ(*TenMatchingString, Ten);
EXPECT_EQ(*FifteenMatchingString, Fifteen);

Formats -> Format


Probably worth a blank line to separate out the Conflict and NoFormat cases. Or just factor them into separate tests.


Formats -> Format


This comment is unnecessary - the test code is self-documenting.


Put a line break above the start of each case, i.e. at the start of each (set of) declaration(s). This will show which the cases more distinctly.


On further thought, consider replacing this comment with a new TEST, with the test naem documenting the test purpose.


, -> ;


format -> formats


format -> formats


What's the diference between this test case and the one above?


See comments in another review. Are this and the similar patterns below safe, if the Expected is in a success state?


Should this test the actual Error contents, like you do elsewhere?



thopre updated this revision to Diff 239204.Mon, Jan 20, 3:33 PM
thopre marked 17 inline comments as done.

Address all review comments