This is an archive of the discontinued LLVM Phabricator instance.

[clang][ubsan] Implicit Conversion Sanitizer - integer sign change - clang part
ClosedPublic

Authored by lebedev.ri on Aug 3 2018, 6:22 AM.

Details

Summary

C and C++ are interesting languages. They are statically typed, but weakly.
The implicit conversions are allowed. This is nice, allows to write code
while balancing between getting drowned in everything being convertible,
and nothing being convertible. As usual, this comes with a price:

void consume(unsigned int val);

void test(int val) {
  consume(val);
  // The 'val' is `signed int`, but `consume()` takes `unsigned int`.
  // If val is negative, then consume() will be operating on a large
  // unsigned value, and you may or may not have a bug.

  // But yes, sometimes this is intentional.
  // Making the conversion explicit silences the sanitizer.
  consume((unsigned int)val);
}

Yes, there is a -Wsign-conversion` diagnostic group, but first, it is kinda
noisy, since it warns on everything (unlike sanitizers, warning on an
actual issues), and second, likely there are cases where it does not warn.

The actual detection is pretty easy. We just need to check each of the values
whether it is negative, and equality-compare the results of those comparisons.
The unsigned value is obviously non-negative. Zero is non-negative too.
https://godbolt.org/g/w93oj2

We do not have to emit the check *always*, there are obvious situations
where we can avoid emitting it, since it would always get optimized-out.
But i do think the tautological IR (icmp ult %x, 0, which is always false)
should be emitted, and the middle-end should cleanup it.

This sanitizer is in the -fsanitize=implicit-conversion group,
and is a logical continuation of D48958 -fsanitize=implicit-integer-truncation.
As for the ordering, i'we opted to emit the check after
-fsanitize=implicit-integer-truncation. At least on these simple 16 test cases,
this results in 1 of the 12 emitted checks being optimized away,
as compared to 0 checks being optimized away if the order is reversed.

This is a clang part.
The compiler-rt part is D50251.

Finishes fixing PR21530, PR37552, PR35409.
Finishes partially fixing PR9821.
Finishes fixing https://github.com/google/sanitizers/issues/940.

Only the bitfield handling is missing.

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Aug 3 2018, 6:22 AM
erichkeane added inline comments.Aug 3 2018, 6:42 AM
lib/CodeGen/CGExprScalar.cpp
1008 ↗(On Diff #158987)

I'd rather !SrcType->isInt || !DestType->isInt

1011 ↗(On Diff #158987)

This seems like a silly assert, since you did the check above.

1030 ↗(On Diff #158987)

Does this really need its own scope?

1032 ↗(On Diff #158987)

Again, I'd rather we distribute the '!'.

1035 ↗(On Diff #158987)

These scopes are getting out of hand... just kill them all. Introducing CanonSrcType/CanonDstType into the larger scope isn't that big of a deal.

1044 ↗(On Diff #158987)

Curious what prevents this?

1054 ↗(On Diff #158987)

// Is this value a signed type?

2004 ↗(On Diff #158987)

Is this an error? You swapped a 'has' with a 'hasOneOf' but only listed a single thing.

lebedev.ri added inline comments.Aug 3 2018, 6:48 AM
lib/CodeGen/CGExprScalar.cpp
2004 ↗(On Diff #158987)

ImplicitConversion is a *group*, which includes ImplicitIntegerTruncation and ImplicitIntegerSignChange.
So hasOneOf(group) checks that at least one check of the group is enabled.
No, this is correct, else the tests would break.

erichkeane added inline comments.Aug 3 2018, 6:50 AM
lib/CodeGen/CGExprScalar.cpp
2004 ↗(On Diff #158987)

Ah! I missed that subtly, just LOOKed odd.

lebedev.ri updated this revision to Diff 159004.Aug 3 2018, 7:13 AM
lebedev.ri marked 5 inline comments as done.

Address most of @erichkeane review notes.

lib/CodeGen/CGExprScalar.cpp
1008 ↗(On Diff #158987)

This i'd prefer to keep this as-is, since this is copied verbatim from ScalarExprEmitter::EmitIntegerTruncationCheck().

1011 ↗(On Diff #158987)

If this assertion doesn't hold, we'll (hopefully!) hit an assertion somewhere down in the [IR] Builder.
I think it would be best to be proactive here.
(Similarly, this is copied verbatim from ScalarExprEmitter::EmitIntegerTruncationCheck().)

1030 ↗(On Diff #158987)

Yeah, they got out of hand here..

1032 ↗(On Diff #158987)

Here - ok.

1044 ↗(On Diff #158987)

Right now this was simply copied from ScalarExprEmitter::EmitIntegerTruncationCheck(),
where it made sense (since conversion to bool should be comparison with 0, not truncation).
I'm not quite sure about booleans here. I think i should just drop it, at least for now.

1054 ↗(On Diff #158987)

That reads strange, but i don't have a better idea.

erichkeane added inline comments.Aug 3 2018, 7:17 AM
lib/CodeGen/CGExprScalar.cpp
1008 ↗(On Diff #158987)

If so much is being copied from EmitIntegerTruncationCheck, perhaps the two of these need to be the same function with an option/check on the sanitizer option on which it should do?

lebedev.ri added a project: Restricted Project.Aug 3 2018, 9:40 AM
lebedev.ri added inline comments.
lib/CodeGen/CGExprScalar.cpp
1008 ↗(On Diff #158987)

I agree that code duplication is bad, but i'm not sure that inlining
both of these functions into an one huge one is the right solution.
The amount of actually duplicated code is somewhat small - one early-return
for non-integer types, and an assert that the llvm type is integer..

rsmith added inline comments.Aug 3 2018, 1:21 PM
lib/CodeGen/CGExprScalar.cpp
1036 ↗(On Diff #159004)

I don't like the overlap between the implicit truncation check and this check. I think you should aim for exactly one of those checks to fire for any given integer conversion. There are the following cases:

  • Dst is smaller than Src: if the value changes at all (with sign change or without), then the truncation check already catches it, and catching it here does not seem useful
  • Dst is the same size as Src or larger: sign change is the only problem, and is only possible if exactly one of Src and Dst is signed

So I think you should bail out of this function if either Src and Dst are both unsigned or both are signed, and also if Src is larger than Dst (because we treat that case as a lossy truncation rather than as a sign change).

And when you do emit a check here, the only thing you need to check is if the signed value is negative (if so, you definitely changed the sign, and if not, you definitely didn't -- except in the truncation cases that the truncation sanitizer catches).

1050–1051 ↗(On Diff #159004)

If !VSigned, the result is a constant false; you don't need to emit an icmp to work that out.

@erichkeane, @rsmith thanks for taking a look!

lib/CodeGen/CGExprScalar.cpp
1036 ↗(On Diff #159004)

To be clear: we want to skip emitting in those cases if the other check (truncation) is enabled, right?
It does seem to make sense, (and i did thought about that a bit), but i need to think about it more..

1050–1051 ↗(On Diff #159004)

Ok, if you insist.
I didn't do that in the first place because we will now have an icmp
where one operand being a constant, so we can simplify it further.
And i don't want to complicate this logic if middle-end already handles it :)

rsmith added inline comments.Aug 3 2018, 2:26 PM
lib/CodeGen/CGExprScalar.cpp
1036 ↗(On Diff #159004)

I think we want to skip emitting those checks always (regardless of whether the other sanitizer is enabled). One way to think of it: this sanitizer checks for non-truncating implicit integer conversions that change the value of the result. The other sanitizer checks for truncating implicit integer conversions that change the value of the result.

I don't see any point in allowing the user to ask to sanitize sign-changing truncation but not other value-changing truncations. That would lead to this:

int a = 0x17fffffff; // no sanitizer warning
int b = 0x180000000; // sanitizer warning
int c = 0x1ffffffff; // sanitizer warning
int d = 0x200000000; // no sanitizer warning

... which I think makes no sense.

1050–1051 ↗(On Diff #159004)

This becomes a lot simpler with the approach I described in the other comment thread, because you don't need a second icmp eq at all.

rsmith added inline comments.Aug 3 2018, 2:46 PM
lib/CodeGen/CGExprScalar.cpp
1036 ↗(On Diff #159004)

Hmm, wait, the "truncation" sanitizer doesn't catch this:

int a = 0x80000000u;

... does it? (Because it looks for cases where the value doesn't round-trip, not for cases where the value was changed by the truncation.)

I've thought a bit more about the user model and use cases for these sanitizers, and I think what we want is:

  • a sanitizer that checks for implicit conversions with data loss (the existing truncation sanitizer)
  • a sanitizer that checks for implicit conversions that change the value, where either the source or destination was signed (approximately what this sanitizer is doing)

The difference between that and what you have here is that I think the new sanitizer should catch all of these cases:

int a = 0x17fffffff;
int b = 0x180000000;
int c = 0x1ffffffff;
int d = 0x200000000;

... because while the initializations of a and d don't change the sign of the result, that's only because they wrap around *past* a sign change.

So, I think what you have here is fine for the SrcBits <= DstBits case, but for the SrcBits > DstBits case, you should also check whether the value is the same as the original (that is, perform the truncation check).

In order to avoid duplicating work when both sanitizers are enabled, it'd make sense to combine the two sanitizer functions into a single function and reuse the checks.

lebedev.ri marked 11 inline comments as done.

Address most of @rsmith review notes.

lib/CodeGen/CGExprScalar.cpp
1036 ↗(On Diff #159004)

Yep, makes sense. I don't think i have followed the recommendations to the letter,
but i think the end result is not worse than suggested. Added tests shows how it works now.

1050–1051 ↗(On Diff #159004)

Humm. So i have initially did this. It is probably broken for non-scalars, but we don't care probably.

But then i thought more.

If we do not emit truncation check, we get icmp eq (icmp ...), false, which is tautological.
We can't just drop the outer icmp eq since we'd get the opposite value.
We could emit xor %icmp, -1 to invert it. Or simply invert the predicate, and avoid the second icmp.
By itself, either of these options doesn't sound that bad.

But if both are signed, we can't do that. So we have to have two different code paths...

If we do emit the icmp ult %x, 0, [it naturally works with vectors], we avoid complicating the front-end,
and the middle-end playfully simplifies this IR with no sweat.

So why do we want to complicate the front-end in this case, and not let the middle-end do it's job?
I'm unconvinced, and i have kept this as is. :/

lebedev.ri added inline comments.Aug 5 2018, 7:51 AM
lib/CodeGen/CGExprScalar.cpp
1129–1130 ↗(On Diff #159187)

Actually, after messing with souper a little, if we are converting from *larger* *signed* type,
then the truncation check is sufficient already.
https://godbolt.org/g/DLVCy8
https://rise4fun.com/Alive/u2h

So it *seems* only the unsigned int -> signed char case is problematic.

Do not emit sign-change check in signed int -> signed char case.
The truncation check is sufficient:
https://godbolt.org/g/r1wgQG
https://rise4fun.com/Alive/ifj

The middle-end [clearly] does not understand that,
but since the sign-change is completely unneeded here, it's not a blocker.

The unsigned int -> signed char case is the only oh-so-special one,
that needs both the truncation and sign change checks,
but the IR can be significantly improved, will handle that:
https://godbolt.org/g/q7e76x
https://rise4fun.com/Alive/2W8

lebedev.ri marked an inline comment as done.Aug 5 2018, 11:58 AM

Ping.

Rebased, now that the D50465 has landed, and we are now able to properly optimize the ugliest case:

This comes with Implicit Conversion Sanitizer - integer sign change (D50250):

signed char test(unsigned int x) { return x; }

clang++ -fsanitize=implicit-conversion -S -emit-llvm -o - /tmp/test.cpp -O3

lebedev.ri planned changes to this revision.Aug 17 2018, 9:37 AM

Depends on D50901.
(which should land first, ideally.)

lebedev.ri added a subscriber: chandlerc.

Rebased ontop of D50901, added

-  ``-fsanitize=implicit-integer-arithmetic-value-change``: Catches implicit
   conversions that change the arithmetic value of the integer. Enables
   ``implicit-signed-integer-truncation`` and ``implicit-integer-sign-change``.

as requested by @rsmith and @chandlerc.

lebedev.ri added inline comments.Aug 23 2018, 4:27 AM
lib/CodeGen/CGExprScalar.cpp
1050–1051 ↗(On Diff #159004)

If !VSigned, the result is a constant false; you don't need to emit an icmp to work that out.

Even at -O0, dagcombine constant-folds (unsurprizingly) this case, https://godbolt.org/z/D5ueOq

Ping once again :)

It might help if you're more specific about whose review you're asking for.

It might help if you're more specific about whose review you're asking for.

I suppose the main suspect is still the @rsmith.
Though, D50901 is less controversial, so maybe best to start there..

@rsmith Ping.
Though, D50901 is less controversial, so maybe best to start there..

Ping.
The prerequisite "split truncation sanitizer into unsigned and signed cases" has landed.
I believe i have replied/addressed all the points previously raised here.
Would be awesome to get this going at long last :)

rsmith accepted this revision.Oct 23 2018, 3:14 PM

Just some minor nits.

docs/ReleaseNotes.rst
186 ↗(On Diff #160458)

This seems inaccurate: -fsanitize=signed-integer-overflow is part of -fsanitize=integer and catches UB. Maybe "Like some other [...]"

190–191 ↗(On Diff #160458)

This is a bit hard to read. Maybe reverse the order of these two lines.

lib/CodeGen/CGExprScalar.cpp
1071 ↗(On Diff #162050)

Just emit i1 false directly in this case. IRBuilder generally only constant-folds values for you if all the operands are constant, and sanitizers are often used at -O0, so there's no guarantee that anyone else will clean this up.

1091–1103 ↗(On Diff #162050)

Do we really need this comment? This seems kinda obvious.

1036 ↗(On Diff #159004)

OK, so to be clear I'm following:

  • Any implicit conversion that truncates and changes the value triggers the truncation sanitizer (unsigned if both source and destination are unsigned, signed otherwise)
  • Any implicit conversion that results in a sign change triggers the sign change sanitizer
  • Any implicit conversion that triggers both sanitizers produces a single warning classified as ICCK_SignedIntegerTruncationOrSignChange (eg, the truncation changed the value, and the sign changed -- possibly multiple times -- when dropping bits)

That seems fine to me.

This revision is now accepted and ready to land.Oct 23 2018, 3:14 PM
lebedev.ri marked 4 inline comments as done.

Just some minor nits.

YES! Thank you for the [long-awaited] review!

Addressed review notes.
The compiler-rt part D50251 still needs a review before this can finally go in.

There is now a problem with SanitizerOrdinal - it's out of bits :)
After this, there isn't a single bit available.
I even had to drop one new sanitizer group that wasn't *strictly* needed just yet.
I should probably file a bug so this knowledge will not be lost.

lib/CodeGen/CGExprScalar.cpp
1071 ↗(On Diff #162050)

Ok, fair enough.

1036 ↗(On Diff #159004)

OK, so to be clear I'm following:

I do think that is what is going on.
It does warn on all of the cases you highlighted:

int a = 0x80000000u;

int a = 0x17fffffff;
int b = 0x180000000;
int c = 0x1ffffffff;
int d = 0x200000000;

Rebased, NFC.