Page MenuHomePhabricator

FileCheck [9/12]: Add support for matching formats

Authored by thopre on Apr 7 2019, 4:22 PM.



This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch adds support for selecting a
matching format to match a numeric value against (ie. decimal, hex lower
case letters or hex upper case letters).

This commit allows to select what format a numeric value should be
matched against. The following formats are supported: decimal value,
lower case hex value and upper case hex value. Matching formats impact
both the format of numeric value to be matched as well as the format of
accepted numbers in a definition with empty numeric expression

Default for absence of format is decimal value unless the numeric
expression constraint is non null and use a variable in which case the
format is the one used to define that variable. Conclict of format in
case of several variable being used is diagnosed and forces the user to
select a matching format explicitely.

This commit also enables immediates in numeric expressions to be in any
radix known to StringRef's GetAsInteger method, except for legacy
numeric expressions (ie [[@LINE+<offset>]] which only support decimal


  • Linaro (changes up to diff 183612 of revision D55940)
  • GraphCore (changes in later versions of revision D55940 and in new revision created off D55940)

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jhenderson added inline comments.Jan 6 2020, 2:33 AM

Got it, thanks!


Have you considered using ASSERT_THAT_ERROR and EXPECT_THAT_ERROR in these tests?

thopre marked 4 inline comments as done.Jan 6 2020, 5:41 AM
thopre added inline comments.

@jhenderson I presume you were answering to Paul? Because FileCheck will indeed check the input file for 0xC when encountering CHECK: 0x[[#%X, 12]] so it's all good. As to the prefix, I supposed you meant for the literals inside the # syntax since prefix is not needed for printf/scanf: scanf("%X", x) will happily scan CAFE without expecting a prefix.

@probinson Since we are talking about parallels between variable definition and scanf on one side and numeric substitution and printf on the other side, we can look at the following examples:

  • 0x[[#%X, VAR2:]] is equivalent to: scanf("%X", VAR2).
  • 0x[[#%X, VAR1+12]] when VAR1 was defined with -D#VAR1=3 will be equivalent to checking the output of the following against the input file:
printf("0x%X", VAR1+12);

Note that C interprets 12 in decimal despite the %X format specifier of the printf which only influences the conversion of the resulting numeric value (3+12=15 is converted to F and thus the input file is checked for "0xF").

#SOMEVAR:<EXPR> is a shorthand for both, and thus a combination of printf+scanf with some input file check inbetween, e.g. 0x[[#%X, VAR2:VAR1+12]] when VAR1 is defined with -D#VAR1=3 is equivalent to checking the output of the following against the input file:

printf("0x%X", VAR1+12);

and scanning the matched input with:

scanf("0x%X", VAR2);

The case of #%X,VAR:12 is then just a special case, i.e. matching the input file against:

printf("%X", 12);

followed by scanning the matched text with:

scanf("%X", VAR);
thopre updated this revision to Diff 236336.Jan 6 2020, 5:41 AM
thopre marked an inline comment as done.

Use EXPECT_THAT_ERROR when testing for an error in unittest

jhenderson accepted this revision.Jan 6 2020, 5:49 AM

LGTM from me on this. I can't think of anything else I'd like addressing, but please wait for @arichardson and anybody else who wants to comment to be happy.


@thopre - I assume you meant @arichardson here, not @probinson...

I was responding to the general conversation, and giving my thoughts, so not addressing anybody specifically. Yes, the as-it-stands behaviour is what I would prefer, precisely for the reasons you outlined with the printf semantics.

This revision is now accepted and ready to land.Jan 6 2020, 5:49 AM
arichardson accepted this revision.Jan 6 2020, 6:59 AM

LGTM. Thanks for working on this feature!


Yes that seems perfectly fine.


@thopre Thanks for the detailed explanation. I think the current behaviour in the tests makes sense. Having the match format not affect the parsing of whats to the right of the = seems simpler and avoids problems.

thopre planned changes to this revision.Jan 6 2020, 7:10 AM

My apologies but I still think there's changes needed on this patch. I had changed the status earlier but I presume updating the patch did reset the status to "needs review". I haven't covered all the API changes in the unittests and I'd like to address the comment from Paul about the lack of clarity of which variables have implicit format conflict. I'm happy to deal with the latter in a separate patch if people are eager to have this change (I'm personally looking forward to it being committed as well) but I want to at least finish the unit testing.

thopre updated this revision to Diff 236817.Jan 8 2020, 6:58 AM

Complete unit testing of changes in the patch

This revision is now accepted and ready to land.Jan 8 2020, 6:58 AM
thopre requested review of this revision.Jan 8 2020, 6:59 AM

Is everybody happy with the constness changes to the ExpressionFormat operators and the new unit tests?

thopre marked 4 inline comments as done.Jan 8 2020, 7:01 AM
thopre added inline comments.

Since this is already a sizable diff, I'd like to address this in a separate patch if you don't mind. It'll require recording more parsing state and I'm thinking about using a parser object for that and thus simplifying the interface of parsing functions (lots of info would be kept as internal state of the object).

arichardson accepted this revision.Jan 12 2020, 2:50 AM

Is everybody happy with the constness changes to the ExpressionFormat operators and the new unit tests?

Looks fine to me. I would say say this can be committed if @jhenderson is happy with it and all tests pass. If there are any remaining issues those can always be fixed in follow-up commit.


Since Kind is just an enum, the const is unnecessary for by-value arguments but I don't think it does any harm either.

This revision is now accepted and ready to land.Jan 12 2020, 2:50 AM

No problem with the constness changes.


It won't do any harm, I think, but it's inconsistent with style elsewhere, so I think it should be deleted (the const on the method is fine though of course).


Nit: this comment isn't quite right. You probably wanted "methods' output", since its the output of the methods. However, I think a clearer phrase this and the similar comments to "Check unsgined decimal format methods", or possibly even "Check unsigned decimal format properties".

In fact, now that I think about it, perhaps it would make more sense for this test to be split into a separate TEST for each different format. The test name would then document what the test is for, and you wouldn't need the comment. You might even want to consider a parameterised test (see the TEST_P function), to avoid code duplication, for the Unsigned, HexLower and HexUpper cases.


ASSERT_THAT_EXPECTED(Wildcard, Succeeded());

Assert because otherwise the next line will crash if it fails, and the _THAT_EXPECTED macro for readability.

Same comment below.

Perhaps worth naming this UnsignedWildcard (HexLowerWildcard etc), and avoid reusing the variable in the other cases.


I've given this a bit more thought, and I think it would be better here and in similar situations with failing Expecteds to actually check the error contents. You can see examples of this in the DWARFDebugLineTest.cpp unit tests (look for the checkError function for a rough guide).

takeError on an Expected seems weird without first having checked that the Expected actually failed. You probably want EXPECT_THAT_EXPECTED(someFunctionThatReturnsAnExpected(), Failed()); if you aren't going to actually check the properties of the Error within the Expected.


Test "NoFormat" explicitly by passing it into the constructor and then consider a separate test to show that the default constructor generates a NoFormat kind.

Also NoneFormat -> NoFormat.


What about the other formats? Why is this here ratehr than in the individual test cases?


Self-comparison feels like a special case. You probably want to use a second format instance, I reckon.


Vs -> versus


This comment seems stale compared to what is being done below (i.e. it's not just about the eval() function it appears on the surface).


This '++' change, and the corresponding changes below, seem unrelated?


I've lost track. What's the difference between parsePatternExpect and parseSubstExpect? Why is it pattern, not subst here? Finally, what do these test cases have to do with formats?


does a check against "C" here make sense?


Does a check against "b" here make sense?


Debug printing left in by mistake?


ASSERT_THAT_EXPECTED(ExpressionPointer, Succeeded());
Same all over the place.


I know this isn't something directly related to your change, so it should be a later one, but you should also remove all uses of errorToBool in favour of checking the actual Error.

Sorry I didn't pick up on those in earlier reviews.

thopre updated this revision to Diff 238737.Jan 17 2020, 3:52 AM
thopre marked 20 inline comments as done.
  • Address unit tests review comments
  • fix casing of first letter in an error message
thopre marked an inline comment as done.Jan 17 2020, 3:54 AM
thopre added inline comments.

Indeed, made it into a separate patch:


parsePatternExpect will call parsePattern which parse the rhs of a CHECK directive. parseSubstExpect calls parseNumericSubstitutionBlock which parses what's inside a # block. The use of parsePatternExpect is because I'm testing that legacy @LINE expression only accept what the old [[@LINE]] (without #) accepted before this patch set. The restriction for legacy @LINE expression is in parseNumericSubstitutionBlock and private functions it calls but the detection of a legacy @LINE expression is in parsePattern hence the use of parsePatternExpect.

The last of the 3 tests is format related because this patch also enables hex literals (I didn't want to make a separate patch for a few line diff). The other 2 should have been done in earlier patch. I've split these 2 into a separate patch:

jhenderson added inline comments.Jan 20 2020, 2:28 AM

How about:

StringRef HexDigits = AllowUpperHex ? "ABCDEF" : "abcdef";
EXPECT_TRUE(WildcardRegex.match(HexDigits, &Matches));
EXPECT_EQ(Matches[0], HexDigits);

Remind me why we can't check the opposite cases are rejected here? That needs commenting at the very least.

Also, why is the check at line 70 ASSERT_TRUE, rather than EXPECT_TRUE?


I'd recommend the following, to reduce duplication of checks.

StringRef Ten;
StringRef Fifteen;
if (AllowHex) {
  if (AllowUpperHex) {
    Ten = "A";
    Fifteen = "F";
  } else {
    Ten = "a";
    Fifteen = "f";
} else {
  Ten = "10";
  Fifteen = "15;
EXPECT_EQ(*TenMatchingString, Ten);
EXPECT_EQ(*FifteenMatchingString, Fifteen);

Formats -> Format


Probably worth a blank line to separate out the Conflict and NoFormat cases. Or just factor them into separate tests.


Formats -> Format


This comment is unnecessary - the test code is self-documenting.


Put a line break above the start of each case, i.e. at the start of each (set of) declaration(s). This will show which the cases more distinctly.


On further thought, consider replacing this comment with a new TEST, with the test naem documenting the test purpose.


, -> ;


format -> formats


format -> formats


What's the diference between this test case and the one above?


See comments in another review. Are this and the similar patterns below safe, if the Expected is in a success state?


Should this test the actual Error contents, like you do elsewhere?



thopre updated this revision to Diff 239204.Jan 20 2020, 3:33 PM
thopre marked 17 inline comments as done.

Address all review comments

thopre added inline comments.Jan 21 2020, 1:58 AM

It's an ASSERT because if the match fails it does not make sense checking Matches[0]. I've added negative testing, it's only for valueFromStringRepr that it is not possible (I've added a comment for that there).


With the latest previous patch yes.

Nearly there, from my point of view.


This doesn't appear to have been done here, only in the bufferized bit below.


Spurious "to allow"?

thopre updated this revision to Diff 239393.Jan 21 2020, 11:35 AM
thopre marked 3 inline comments as done.

Address unit tests review comments


My bad, I don't know how I missed that.

jhenderson accepted this revision.Jan 23 2020, 1:29 AM


@arichardson, do you have any more comments?

arichardson accepted this revision.Jan 23 2020, 5:01 PM

Thanks! Looks good, just one minor typo in the documentation.


accepted format specifiers

thopre updated this revision to Diff 240173.Jan 24 2020, 6:15 AM
thopre marked an inline comment as done.
  • Fix typo in doc
  • remove sneaky literal for LineNumber in unit test
This revision was automatically updated to reflect the committed changes.