This is an archive of the discontinued LLVM Phabricator instance.

FileCheck [10/12]: Add support for signed numeric values
ClosedPublic

Authored by thopre on Apr 7 2019, 4:23 PM.

Details

Summary

This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch adds support signed numeric
values, thus allowing negative numeric values.

As such, the patch adds a new class to represent a signed or unsigned
value and add the logic for type promotion and type conversion in
numeric expression mixing signed and unsigned values. It also adds
the %d format specifier to represent signed value.

Finally, it also adds underflow and overflow detection when performing a
binary operation.

Copyright:

  • Linaro (changes up to diff 183612 of revision D55940)
  • GraphCore (changes in later versions of revision D55940 and in new revision created off D55940)

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
thopre updated this revision to Diff 203212.Jun 5 2019, 11:21 AM

Rebase on top of latest changes in the series

thopre updated this revision to Diff 205631.Jun 19 2019, 10:16 AM

Rebase on top of latest changes in the patch series

thopre updated this revision to Diff 205683.Jun 19 2019, 3:15 PM
  • Fix a case of numeric expression -> expression
  • Fix code style
thopre updated this revision to Diff 205765.Jun 20 2019, 2:45 AM

Rebase on latest changes in patch series

arichardson added inline comments.Jul 26 2019, 2:59 AM
llvm/lib/Support/FileCheck.cpp
91

llvm/include/llvm/Support/CheckedArithmetic.h contains llvm::checkedAdd, llvm::checkedSub, llvm::checkedAddUnsigned and llvm::checkedSubUnsigned, which should make this code much simpler.

thopre updated this revision to Diff 233340.Dec 11 2019, 6:03 AM
thopre marked an inline comment as done.

Address comment and rebase

thopre planned changes to this revision.Dec 11 2019, 6:03 AM

Uploading the current status of this patch. I know there are still issues (in particular add more unittests), and I'll address them.

thopre planned changes to this revision.Jan 8 2020, 7:27 AM

Need to add unit tests

arichardson accepted this revision.Jan 12 2020, 3:10 AM

Looks good to me once the tests have been added and other reviewers are also happy.

llvm/lib/Support/FileCheck.cpp
84

This is fine for the initial version of the patch, but I think in the future it will be useful to permit adding an unsigned variable to a signed one, etc. as long as the result is still within the representable range.

242

I'm not sure if this is necessary but I feel like it may be useful to permit adding a signed (negative value) to an unsigned value as long as the result is >= 0. Maybe these checks should move into operator+/-?

llvm/test/FileCheck/numeric-expression.txt
68

Since there are quite a lot of CHECK lines now, it might be helpful to add a comment after every line that checks a new variable. For example:

11 // VAR1
12
10
-30 // VAR2
-29
-31
c // VAR3 (hex)
...

Sorry for the delay in looking at this. Comments inline.

llvm/lib/Support/FileCheck.cpp
33

I'd not abbreviate this name. It's not clear to me that "Conv" == "Conversion"

Also, it's more idiomatic to declare the Error within the if, to reduce its scope.

66

other -> Other

67

I think you need a check that Signed == other.Signed, as otherwise there will be some cases where the two values match but aren't of the same signed-ness (e.g. -1 and uint64_t max). Unless that's intentional of course.

73

I'd make the names of this and convertUnsigned a bit more explicit, by adding "to", e.g. "toSigned" or "convertToSigned"

1149

As far as I can see, this block is just skipping the error? Why the switch/case?

llvm/lib/Support/FileCheckImpl.h
133 ↗(On Diff #236825)

Perhaps a not_supported or invalid_argument would make more sense here.

158–162 ↗(On Diff #236825)

This pair of constructors feels like a recipe for confusion or compilation problems due to ambiguous resolution. I think it might be better to make Signed an additional parameter.

166–167 ↗(On Diff #236825)

other -> Other

225–229 ↗(On Diff #236825)

As with above, this could lead to ambiguous compiler resolution, if Val is neither a uint64_t or int64_t. Perhaps not an issue in internal code, but it feels like there might be a better solution to me, perhaps using templates and std::is_signed in this case?

291 ↗(On Diff #236825)

You've lost the reference to when this can be None. When can it be?

llvm/test/FileCheck/numeric-expression.txt
71

Maybe there should be som testing for signed positive values here?

thopre updated this revision to Diff 263088.May 10 2020, 4:09 PM
thopre marked 17 inline comments as done.

Address review comments

This revision is now accepted and ready to land.May 10 2020, 4:09 PM
thopre added inline comments.May 10 2020, 4:10 PM
llvm/lib/Support/FileCheck.cpp
1149

There's now only an OverflowError without type but FYI the goal was to catch the case of a new type of ValueError in the future that shouldn't be ignored here.

llvm/lib/Support/FileCheckImpl.h
133 ↗(On Diff #236825)

From llvm/include/llvm/Support/Error.h:

/// The value returned by this function can be returned from convertToErrorCode
/// for Error values where no sensible translation to std::error_code exists.
/// It should only be used in this situation, and should never be used where a
/// sensible conversion to std::error_code is available, as attempts to convert
/// to/from this error will result in a fatal error. (i.e. it is a programmatic
///error to try to convert such a value).
std::error_code inconvertibleErrorCode();

That seems to be the right function to use when that function does not make sense for a given error?

thopre requested review of this revision.May 10 2020, 4:10 PM

Last build failure were due to the pre commit check script trying to apply one of the earlier patch of this patch series as a needed dependency even though that diff is marked as closed.

jhenderson added inline comments.May 12 2020, 2:28 AM
llvm/docs/CommandGuide/FileCheck.rst
606

promotion rules

llvm/lib/Support/FileCheck.cpp
50–56

It might be worth a comment saying why signed plus unsigned results in an unsigned (of course, that assumes I'm following the logic correctly).

56

I'm a little confused as to what is going on here, as I'm not up-to-speed on APInt. Could you give a brief explanation, please?

79

Doesn't this assume 2s complement, which isn't standard-guaranteed as far as I'm aware? I'm okay with that, if there is precendent for it in LLVM already, since it seems like C++ is moving in that direction (see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r1.html), but I'd be nervous otherwise.

1054

Why do we handle OverflowError here, but not other error kinds (e.g. undefined variables)?

1056

Unable -> unable

(diagnostic messages should not be capitalized)

llvm/lib/Support/FileCheckImpl.h
87 ↗(On Diff #263290)

Delete the "if" at the end of this line (it's not needed).

106 ↗(On Diff #263290)

std::errc::value_too_large is for overflows, so maybe that can be used here?

109 ↗(On Diff #263290)

This comment doesn't seem right?

110 ↗(On Diff #263290)

Probably should be lower-case "overflow"

143 ↗(On Diff #263290)

I'm slightly nervous "error in case of overflow" might go stale, in the event we introduce other failure modes. Perhaps "an error in case of a problem, such as an overflow."

llvm/test/FileCheck/numeric-expression.txt
44–48

Any particular reason you've changed this case?

82–84

Nit: here and below, add space after comment markers, i.e. // VAR1

Also, has this been addressed:

Maybe there should be som testing for signed positive values here?

152

Here and elsewhere there seem to be some extra cases added that aren't to do with signed decimal values. Do they belong in this change?

175–176

It seems like we need testing showing the behaviour of signed decimals converting to hex.

295–296

Here and elsewhere, minor nit: be consistent with single or double-dashes.

llvm/unittests/Support/FileCheckTest.cpp
108 ↗(On Diff #263290)

MinusFortyTwo

385 ↗(On Diff #263290)

What about 10u + -11 or -11 + 10u?

405 ↗(On Diff #263290)

What about MinusTen - MinusTen?

461 ↗(On Diff #263290)

10ull would be more typical, I'd think, here and elsewhere. Any particular reason you've chosen the cast?

arichardson added inline comments.May 12 2020, 3:09 AM
llvm/lib/Support/FileCheck.cpp
56

Using APInt here seems unnecessary. How about return (int64_t)Value;?

Or if you would like to avoid the casts you could use a union.

79

Since we know the value is signed after if (!Signed), can't we change these three lines to
return ExpressionValue(std::abs((int64_t)Value), /*Signed=*/false)?

llvm/lib/Support/FileCheckImpl.h
127 ↗(On Diff #263290)

Signed && (int64_t)Value < 0?

arichardson added inline comments.May 12 2020, 3:10 AM
llvm/unittests/Support/FileCheckTest.cpp
461 ↗(On Diff #263290)

ull is not guaranteed to be the same as a uint64_t constant since some platforms use unsigned long instead of unsigned long long. stdint.h provides UINT64_C(10) if you want to avoid the cast.

thopre marked 2 inline comments as done.May 12 2020, 9:35 AM
thopre added inline comments.
llvm/lib/Support/FileCheck.cpp
56

Value is uint64_t and C++14 says:

If the destination type is signed, the value is unchanged if it can be represented in the destination type;otherwise, the value is implementation-defined

So if Value corresponds to a negative signed integer the conversion is implementation-defined. I don't know how APInt deal with this but I hope it does something C++14 compliant.

79

As per above, the conversion to (int64_t) is implementation-defined in case where Value has its msb set (ie. corresponds to a negative signed value).

thopre updated this revision to Diff 263557.May 12 2020, 4:04 PM
thopre marked 29 inline comments as done.

Address all comments

thopre updated this revision to Diff 263558.May 12 2020, 4:05 PM

Fix casing

MaskRay added inline comments.
llvm/lib/Support/FileCheckImpl.h
46 ↗(On Diff #263558)

a signed integer

jhenderson added inline comments.May 13 2020, 12:56 AM
llvm/test/FileCheck/numeric-expression.txt
87–88

I'm not following the purpose of the blank line here?

llvm/unittests/Support/FileCheckTest.cpp
385 ↗(On Diff #263290)

Where has this been addressed?

405 ↗(On Diff #263290)

Where has this been addressed?

thopre added inline comments.May 13 2020, 6:25 AM
llvm/lib/Support/FileCheckImpl.h
127 ↗(On Diff #263290)

Casting to int64_t is implementation-defined if Value is > std::numeric_limits<int64_t>::max(). That said current code assumes 2-complement so I've changed for memcpy. Hopefully the compiler can elide that call.

llvm/test/FileCheck/numeric-expression.txt
82–84

Yes, that's the VAR2+72 case.

152

Ah yes indeed, I noticed some bug in the testcase. I forgot to split the fix out in its own patch. Now done in https://reviews.llvm.org/D79820

llvm/unittests/Support/FileCheckTest.cpp
385 ↗(On Diff #263290)

Those are ExpressionValue since this test is testing ExpressionValue operator+. I've renamed all such variable to add the suffix "Value"

405 ↗(On Diff #263290)

Ditto.

thopre marked 3 inline comments as done.May 13 2020, 6:30 AM
thopre added inline comments.
llvm/test/FileCheck/numeric-expression.txt
87–88

The CHECK-NEXT: #VAR2a is trying to match c which is also found in "// VAR1c". The CHECK-NEXT thus fails because it found c after the previous match on the same line.

llvm/unittests/Support/FileCheckTest.cpp
385 ↗(On Diff #263290)

I forgot to send the comments I had written. I hope that answers why I didn't do it.

arichardson added inline comments.May 13 2020, 7:19 AM
llvm/lib/Support/FileCheckImpl.h
127 ↗(On Diff #263290)

Yes it definitely can (even at -O1): https://godbolt.org/z/5jcz9L
I'm still not sure that it's possible that casting gives a different value from memcpy on any architecture that LLVM builds on, but I'm also fine with the memcpy.

thopre updated this revision to Diff 263718.May 13 2020, 7:34 AM
thopre marked 2 inline comments as done.

Fix typo

thopre planned changes to this revision.May 13 2020, 7:37 AM

The current promotion rules prevent to compute VAR-10 when VAR is an unsigned variable whose value is 5 because the result would be negative but the expression should be unsigned. I'll address that by accepting all values as long as they fit in an int64_t or uint64_t and only throw an overflow error when trying to match the value if it doesn't fit with the expression format. I should have done that in the first place.

thopre marked 2 inline comments as done.May 13 2020, 7:56 AM
thopre added inline comments.
llvm/lib/Support/FileCheckImpl.h
127 ↗(On Diff #263290)

architecture -> system, surely the compiler can mess things up as well? It's such a shame that there isn't a better way to achieve this while being perfectly well defined.

jhenderson added inline comments.May 14 2020, 12:36 AM
llvm/test/FileCheck/numeric-expression.txt
87–88

I see. I'm beginning to think the names of the variables need improving. Rather than have VAR1, VAR2, VAR1a etc, how about "SIGN", "UNSI" "UHEX", "LHEX", etc, possibly with additional context for their actual values. The CHECKs might also want an additional {{^}} to avoid matching things spuriously.

llvm/unittests/Support/FileCheckTest.cpp
385 ↗(On Diff #263290)

I'm not sure it does, if I'm honest. An unsigned 10 plus a signed -11 should trigger the clause at line 174 in FileCheck.cpp, whilst the other way around triggers the clause at 170. They are distinctly interesting from the other cases that trigger those clauses because they should trigger overflows, if I've understood things right, rather than resulting in max uint64_t, but I don't think that's tested directly. The fact that this happens to call into other functions seems somewhat irrelevant from testing a distinct unit perspective - if you refactored the code to no longer use the same code path as something else that might be tested independently, you still want the test coverage.

405 ↗(On Diff #263290)

It seems to me like operator- is a distinct unit, and its interface should be fully tested independently (see also my comments above). I believe this ends path ends up calling into checkedSub for which I can see no direct testing (does it handle double-negatives correctly, for example?)

thopre updated this revision to Diff 264003.May 14 2020, 8:25 AM
thopre marked 8 inline comments as done.

Address review comments

llvm/unittests/Support/FileCheckTest.cpp
385 ↗(On Diff #263290)

I've reworked the ExpressionValue storage which should simplify the codepath somewhat and added one testcase to cover for an overflow case that was not tested (I was relying on testing of checkedSub but as you pointed out testing code would risk to bitrot in case the operator code is changed).

Please let me know if that doesn't address all your concerns.

405 ↗(On Diff #263290)

How about now? Whether the value is signed or not doesn't matter anymore, only whether it's negative which should allow to reduce the amount of overflow codepaths.

jhenderson added inline comments.May 15 2020, 12:55 AM
llvm/docs/CommandGuide/FileCheck.rst
606–608

Is this still true?

llvm/lib/Support/FileCheck.cpp
105

I think there's an overflow issue here, right? If RightValue is bigger than max int64_t, it's possible for the expression to end up outside a representable int64_t, which seems like it should be an error?

llvm/lib/Support/FileCheckImpl.h
127–131 ↗(On Diff #264003)

I'm looking at this and thinking can't this and the above constructor not just be changed to:

template <class T>
explicit ExpressionValue(T Val)
  : Value(Val), Negative(Val < 0) {}
133 ↗(On Diff #264003)

true is -> true if

llvm/unittests/Support/FileCheckTest.cpp
385 ↗(On Diff #263290)

I'm starting to get a bit of review fatigue, so I'm starting to miss things, but if I've followed things right, my suggested simplification to the constructors should enable us to get rid of the distinction between SignedTen and UnsignedTen, which in turn should allow the tests to be much simpler. At that point, I'd expect the test cases to be:

  1. Negative + Positive
  2. Positive + Negative
  3. Negative + Negative (no overflow)
  4. Positive + Positive (no overflow)

5-6) Overflowing variations of 3) and 4)

with an equivalent set for operator-

thopre updated this revision to Diff 264263.May 15 2020, 9:17 AM
thopre marked 9 inline comments as done.

Address review comments and fix a few more bugs

llvm/lib/Support/FileCheck.cpp
47–48

Undefined on 2-complement system when Value == min int64_t.

83

This can fail since RightOperand, being positive, is represented by a uint64_t.

105

Indeed, and it made me realize that getAbsolute doesn't work for min int64_t either.

llvm/unittests/Support/FileCheckTest.cpp
385 ↗(On Diff #263290)

My apologies for causing this fatigue. I've added comments for each individual checks for each of the function/operator tested. Hope this helps. I've also caught a bug in the -A -B case where B would not fit in an int64_t. There's now an extra underflow here with an extra test.

thopre updated this revision to Diff 264382.May 15 2020, 4:20 PM

Fix format issue

I've been thinking more about test cases, and suggested a number more. If you can reduce the boiler plate for each individual test case, that would be great too. Perhaps even consider a TEST_P solution to specify input and expected output, whilst sharing the same code body.

llvm/unittests/Support/FileCheckTest.cpp
312 ↗(On Diff #264382)

I'd test the range edge of -1 instead of -10.

It may be worth showing it fails for int64_t min too.

Also, a 0 check would be good to ensure no off-by-1 errors/incorrect use of < versus <= etc.

324–325 ↗(On Diff #264382)

Perhaps worth adding a int64_t max + 1 case.

330 ↗(On Diff #264382)

Maybe also int64_t min and max cases?

339 ↗(On Diff #264382)

Maybe worth considering test cases for 0, and uint64_t max too?

368 ↗(On Diff #264382)

Perhaps use -1 instead of -10.

Perhaps also MinInt64 - MinInt64.

373–377 ↗(On Diff #264382)

These checks all have a lot of repeated code. It's probably time to factor out much of this into a function. It might also be a good idea to factor them into separate TESTs to allow the rest of the test to run in the event of a failure with an ASSERT etc, and to provide finer-grained test failures.

385 ↗(On Diff #264382)

I'd prefer these slightly to not just always give a result of 0. I think it is more interesting to show that 10 + -11 can produce a negative number, for example, whilst 11 + -10 produces a positive one. Same goes for the case above.

396 ↗(On Diff #264382)

Perhaps 1 instead of 10 here.

Perhaps also MaxUint64Value + MaxUint64Value would be a useful case, to maximise surface area.

401 ↗(On Diff #264382)

What is meant by "too big" here? Be more precise in this comment.

406 ↗(On Diff #264382)

- 1 rather than - 10 would probably be best here.

419 ↗(On Diff #264382)

Similar to the + case, it might be nice for this to result in a non-zero value.

434 ↗(On Diff #264382)

"or nul"?

435 ↗(On Diff #264382)

I'm not sure Value is a useful suffix to all these names. It wasn't confusing to me before either.

444 ↗(On Diff #264382)

Maybe Zero instead of Ten here would be good, for maximum possible underflow.
Maybe consider 0 - (std::abs(MinInt64) + 1) for minimum possible underflow.

455 ↗(On Diff #264382)

Why not test the exact value?

458 ↗(On Diff #264382)

Seems to me like there are two interesting cases here: "result is positive" and "result is negative".

thopre updated this revision to Diff 265083.May 19 2020, 4:31 PM
thopre marked 20 inline comments as done.

Address unittest review comments

thopre updated this revision to Diff 265084.May 19 2020, 4:34 PM

Fix format by using source tree clang-format-diff.py instead of system clang-format-diff

llvm/unittests/Support/FileCheckTest.cpp
373–377 ↗(On Diff #264382)

Is that fine grain enough or do you want me to split further?

444 ↗(On Diff #264382)

I can't use std::abs(MinInt64) since it's undefined in 2-complement (std::abs returns an int64_t for some reason).

458 ↗(On Diff #264382)

Result is positive is already handled above. I meant to handle the negative case where result < -(max int64_t). Fixed now.

The latest changes are a big improvement on readability, thanks. Did you consider using TEST_P to factor things out further? That way you'd only need to specify the input and expected output values for each test case, with only a single TEST itself.

llvm/unittests/Support/FileCheckTest.cpp
298 ↗(On Diff #265084)

Don't know how plausible it is, but did you consider passing in a function_ref to the operator+/operator- functions instead of the boolean? That might be useful in the future if support is extended to e.g. multiplication.

365 ↗(On Diff #265084)

I think these sorts of comments indicate a need to split the test up. The suggestion I have is one method == one (or more) TESTs.

Thus there'd be tests for getAbsolute, getSignedValue etc etc and I'd call them ExpressionValueGetAbsolute, ExpressionValueGetSignedValue etc.

411 ↗(On Diff #265084)

Typo

444 ↗(On Diff #264382)

(Just to be clear, I only used std::abs as an intended example, rather than literally what should have been written).

thopre updated this revision to Diff 265602.May 21 2020, 2:24 PM
thopre marked 4 inline comments as done.

Address inline review comments

The latest changes are a big improvement on readability, thanks. Did you consider using TEST_P to factor things out further? That way you'd only need to specify the input and expected output values for each test case, with only a single TEST itself.

To make the result more concise that would require sharing fixture and instantiation among TEST_P but I'm not sure how to do it since the values generated in the instantiation are passed to all TEST_P but even operator+ and operator- have different values. If I need separate instantiation per TEST_P then I need separate fixture as well and that becomes more faff that what we have now I feel.

If there's some clever trick please give me some pointers on how to do it as I cannot see it right now.

jhenderson accepted this revision.May 27 2020, 12:51 AM

LGTM, with nit.

Re. TEST_P, the best suggestion I've got is look at the examples in DWARFDebugLineTest.cpp, where I've used a few different tricks to achieve the goal of code re-use. In particular, AdjustAddressFixtureBase related test cases might be interesting to you. In your case, I'd probably have a set of parameters each for failing and passing combinations, and for addition and subtraction, for a total of 4 combinations, if I'm not mistaken. You'd have thin fixtures for each, but they could probably share some code via a base class.

That all being said, only make those changes if you think they are worthwhile. I'm happy enough with this as-is.

llvm/unittests/Support/FileCheckTest.cpp
422 ↗(On Diff #265602)

Here and in various other places, not sure you want the space between - and 10 (same for - 20 and various cases below).

This revision is now accepted and ready to land.May 27 2020, 12:51 AM
thopre marked an inline comment as done.May 27 2020, 11:21 AM
thopre added inline comments.
llvm/unittests/Support/FileCheckTest.cpp
422 ↗(On Diff #265602)

I don't like it either but this comes from clang-format-diff.py. Should I go against it?

LGTM, with nit.

Re. TEST_P, the best suggestion I've got is look at the examples in DWARFDebugLineTest.cpp, where I've used a few different tricks to achieve the goal of code re-use. In particular, AdjustAddressFixtureBase related test cases might be interesting to you. In your case, I'd probably have a set of parameters each for failing and passing combinations, and for addition and subtraction, for a total of 4 combinations, if I'm not mistaken. You'd have thin fixtures for each, but they could probably share some code via a base class.

That all being said, only make those changes if you think they are worthwhile. I'm happy enough with this as-is.

So far, I'm getting:

llvm/unittests/Support/FileCheckTest.cpp | 413 +++++++++++++++++++++++++++++++++-------------------
1 file changed, 263 insertions(+), 150 deletions(-)

It's also quite a lot of code change, so even if that is acceptable because the code is deemed more readable, it's better to look at it as a separate patch and land the current code (with the suggested spacing fix)

thopre updated this revision to Diff 266789.May 28 2020, 2:43 AM
thopre marked an inline comment as done.

Rebase and fix spacing issue for negative literals

Argh sorry, forgot to remove the unnecessary phabricator marks when landing.

This revision was automatically updated to reflect the committed changes.

LGTM, with nit.

Re. TEST_P, the best suggestion I've got is look at the examples in DWARFDebugLineTest.cpp, where I've used a few different tricks to achieve the goal of code re-use. In particular, AdjustAddressFixtureBase related test cases might be interesting to you. In your case, I'd probably have a set of parameters each for failing and passing combinations, and for addition and subtraction, for a total of 4 combinations, if I'm not mistaken. You'd have thin fixtures for each, but they could probably share some code via a base class.

That all being said, only make those changes if you think they are worthwhile. I'm happy enough with this as-is.

So far, I'm getting:

llvm/unittests/Support/FileCheckTest.cpp | 413 +++++++++++++++++++++++++++++++++-------------------
1 file changed, 263 insertions(+), 150 deletions(-)

It's also quite a lot of code change, so even if that is acceptable because the code is deemed more readable, it's better to look at it as a separate patch and land the current code (with the suggested spacing fix)

I'll leave it up to you as to whether you want to. This is probably sufficient as is.

llvm/unittests/Support/FileCheckTest.cpp
422 ↗(On Diff #265602)

If clang-format-diff.py was trying to format it that way, I'd consider that a bug in clang-format, so should be reported as such. The - there is a unary operator, and should be formatted as such (without a space).