This is an archive of the discontinued LLVM Phabricator instance.

[ubsan] Omit superflous overflow checks for promoted arithmetic (PR20193)
ClosedPublic

Authored by vsk on Jan 31 2017, 9:17 PM.

Details

Summary

C requires the operands of arithmetic expressions to be promoted if
their types are smaller than an int. Ubsan emits overflow checks when
this sort of type promotion occurs, even if there is no way to actually
get an overflow with the promoted type.

This patch teaches clang how to omit the superflous overflow checks
(addressing PR20193).

Testing: check-clang and check-ubsan.

Diff Detail

Event Timeline

vsk created this revision.Jan 31 2017, 9:17 PM
efriedma added inline comments.
lib/CodeGen/CGExprScalar.cpp
73

Checking isPromotableIntegerType doesn't work the way you want it to; types can be "promoted" without actually widening them. For example, enum types are promotable, and in C++ wchar_t is promotable.

vsk marked an inline comment as done.Feb 1 2017, 3:28 PM
vsk added inline comments.
lib/CodeGen/CGExprScalar.cpp
73

Thanks for catching this! I have fixed the issue and will update this patch shortly.

vsk updated this revision to Diff 86739.Feb 1 2017, 3:31 PM
vsk marked an inline comment as done.
  • Per Eli's comment: check that integers are actually widened, instead of incorrectly assuming they are always widened. I added some test cases for this.
  • Address the 'fixme' regarding multiplication with unsigned operands.
vsk updated this revision to Diff 86746.Feb 1 2017, 4:22 PM
vsk edited the summary of this revision. (Show Details)
  • Remove a stale test case in unsigned-promotion.c.
regehr added a subscriber: regehr.Feb 1 2017, 8:51 PM

Out of curiosity, how many of these superfluous checks are not subsequently eliminated by InstCombine?

vsk added a comment.Feb 1 2017, 11:36 PM

Out of curiosity, how many of these superfluous checks are not subsequently eliminated by InstCombine?

I don't have numbers from a benchmark prepped. Here's what we get with the 'ubsan-promoted-arith.cpp' test case from this patch:

Setup# of overflow checks
unpatched, -O022
unpatched, -O0 + instcombine7
patched, -O08
patched, -O0 + instcombine7

(There's a difference between the "patched, -O0" setup and the "patched, -O0 + instcombine" setup because llvm figures out that the symbol 'a' is 0, and gets rid of an addition that way.)

At least for us, this patch is still worthwhile, because our use case is -O0 -fsanitized=undefined. Also, this makes less work for instcombine, but I haven't measured the compile-time effect.

filcab edited edge metadata.Feb 3 2017, 8:36 AM

Why the switch to if instead of a fully-covered switch/case?

In D29369#664426, @vsk wrote:

Out of curiosity, how many of these superfluous checks are not subsequently eliminated by InstCombine?

I don't have numbers from a benchmark prepped. Here's what we get with the 'ubsan-promoted-arith.cpp' test case from this patch:

Setup# of overflow checks
unpatched, -O022
unpatched, -O0 + instcombine7
patched, -O08
patched, -O0 + instcombine7

(There's a difference between the "patched, -O0" setup and the "patched, -O0 + instcombine" setup because llvm figures out that the symbol 'a' is 0, and gets rid of an addition that way.)

At least for us, this patch is still worthwhile, because our use case is -O0 -fsanitized=undefined. Also, this makes less work for instcombine, but I haven't measured the compile-time effect.

Probably running mem2reg and others before instcombine would make it elide more checks. But if you're using -O0 anyway, I guess this would help anyway.

test/CodeGen/ubsan-promoted-arith.cpp
57

Nit: Maybe USHRT_MAX * USHRT_MAX is more understandable?

100

Maybe put the rdar ID next to the FIXME? Like this it looks like you might have written that instead of CHECK by mistake.

vsk marked an inline comment as done.Feb 3 2017, 1:57 PM

Why the switch to if instead of a fully-covered switch/case?

It lets me avoid repeating two function calls:

switch (CGF.getLangOpts().getSignedOverflowBehavior()) {
case LangOptions::SOB_Defined:
  return Builder.CreateMul(Ops.LHS, Ops.RHS, "mul");
case LangOptions::SOB_Undefined:
  if (CGF.SanOpts.has(SanitizerKind::SignedIntegerOverflow) &&
      !CanElideOverflowCheck(CGF.getContext(), Ops))
    return EmitOverflowCheckedBinOp(Ops);
  return Builder.CreateNSWMul(Ops.LHS, Ops.RHS, "mul");
case LangOptions::SOB_Trapping:
  if (CanElideOverflowCheck(CGF.getContext(), Ops))
    return Builder.CreateNSWMul(Ops.LHS, Ops.RHS, "mul");
  return EmitOverflowCheckedBinOp(Ops);
}
test/CodeGen/ubsan-promoted-arith.cpp
57

Yes, I'll fix this.

100

I placed the rdar ID here to communicate that the line should become a CHECK line once the bug is fixed. It will go away with D29437.

vsk marked an inline comment as done.Feb 3 2017, 1:59 PM
In D29369#666308, @vsk wrote:

Why the switch to if instead of a fully-covered switch/case?

It lets me avoid repeating two function calls:

Ah, sorry about that, I think I understand what you had in mind now. I'll fix that too.

vsk updated this revision to Diff 87030.Feb 3 2017, 2:37 PM
vsk edited the summary of this revision. (Show Details)
  • Use switches per Filipe's comment, and fix a comment in the test case.
filcab added a comment.Feb 9 2017, 6:25 AM

Minor nits, now.
LGTM, but having someone more familiar with clang chime in would be great.

lib/CodeGen/CGExprScalar.cpp
1723

Maybe a helper IsWidenedIntegerOp(...) (or IsOpWiderThanBaseTypeor something) would make this (and others, like the first return of getUnwidenedIntegerType) easier to read?

test/CodeGen/ubsan-promoted-arith.cpp
91

Extra -. INT_MIN/-1 is what you want. You already have a test above for -INT_MIN (which would overflow before the division.

filcab accepted this revision.Feb 9 2017, 6:26 AM
This revision is now accepted and ready to land.Feb 9 2017, 6:26 AM
dtzWill accepted this revision.Feb 9 2017, 9:13 AM

I've been bitten when attempting to use existence/nature of casts in the AST to reason about the original code, but this looks like it does the right thing in all the situations I can think of.

Missing overflows because of a bugged attempt to optimize the -O0 case would be unfortunate-- has this been tested and compared on larger codes (test-suite, other projects)?

When it comes to C/C++ standards and constructs, it seems there's always some extension/language feature/flag that someone (ab)uses that you forgot all about when reasoning about these things abstractly... so it'd be good to see this checked out before the next major release.

+1 to suggestion for a more readable name or wrapper for the common pattern of 'getUnwidenedIntegerType..hasValue()', if you get a chance.

LGTM, thanks!

dtzWill requested changes to this revision.EditedFeb 9 2017, 9:42 AM

After some thought, can we discuss why this is a good idea?

This increases the cyclomatic complexity of code that already is difficult to reason about, and seems like it's both brittle and out-of-place in CGExprScalar.

It really seems it would be better to let InstCombine or some other analysis/transform deal with proving checks redundant instead of attempting to do so on-the-fly during CodeGen.

Can you better motivate why this is worth these costs, or explain your use case a bit more?

This revision now requires changes to proceed.Feb 9 2017, 9:42 AM
vsk added a comment.Feb 9 2017, 6:35 PM

After some thought, can we discuss why this is a good idea?

The goal is to lower ubsan's compile-time + instrumentation overhead at -O0, since this reduces the friction of debugging a ubsan-instrumented project.

This increases the cyclomatic complexity of code that already is difficult to reason about, and seems like it's both brittle and out-of-place in CGExprScalar.

Are there cleanups or ways to reorganize the code that would make this sort of change less complex / brittle? I'm open to taking that on.

It really seems it would be better to let InstCombine or some other analysis/transform deal with proving checks redundant instead of attempting to do so on-the-fly during CodeGen.

-O1/-O2 do get rid of a lot of checks, but they also degrade the debugging experience, so it's not really a solution for this use case.

Can you better motivate why this is worth these costs, or explain your use case a bit more?

I have some numbers from LNT. I did a pre-patch and post-patch run at -O0 + -fsanitize=signed-integer-overflow,unsigned-integer-overflow. There were 4,672 object files produced in each run. This patch brings the average object size down from 36,472.0 to 36,378.3 bytes (a 0.26% improvement), and the average number of overflow checks per object down from 66.8 to 66.2 (a 0.81% improvement).

I don't have reliable compile-time numbers, but not emitting IR really seems like a straightforward improvement over emitting/analyzing/removing it.

So, those are the benefits. IMO getting close to 1% better at reducing instrumentation overhead is worth the complexity cost here.

In D29369#673064, @vsk wrote:

After some thought, can we discuss why this is a good idea?

The goal is to lower ubsan's compile-time + instrumentation overhead at -O0, since this reduces the friction of debugging a ubsan-instrumented project.

Apologies for the delay, thank you for the explanation.

This increases the cyclomatic complexity of code that already is difficult to reason about, and seems like it's both brittle and out-of-place in CGExprScalar.

Are there cleanups or ways to reorganize the code that would make this sort of change less complex / brittle? I'm open to taking that on.

None that I see immediately (heh, otherwise I'd be working on them myself...) but the code paths for trapping/non-trapping are particularly what I meant re:complexity, and while I suppose the AST or its interface is probably unlikely to change much (?) I'm concerned about these checks silently removing checks they shouldn't in the future.

(Who would notice if this happened?)

It really seems it would be better to let InstCombine or some other analysis/transform deal with proving checks redundant instead of attempting to do so on-the-fly during CodeGen.

-O1/-O2 do get rid of a lot of checks, but they also degrade the debugging experience, so it's not really a solution for this use case.

Understood, that makes sense.

Would running InstCombine (and only InstCombine):

  1. Fail to remove any checks elided by this change?
  2. Have a negative impact on debugging experience? For this I'm mostly asking for a guess, I don't know how to exactly quantify this easily.

(3) Have an undesirable impact on compilation time or other negative consequence?)

Can you better motivate why this is worth these costs, or explain your use case a bit more?

I have some numbers from LNT. I did a pre-patch and post-patch run at -O0 + -fsanitize=signed-integer-overflow,unsigned-integer-overflow. There were 4,672 object files produced in each run. This patch brings the average object size down from 36,472.0 to 36,378.3 bytes (a 0.26% improvement), and the average number of overflow checks per object down from 66.8 to 66.2 (a 0.81% improvement).

Wonderful, thank you for producing and sharing these numbers. Those improvements don't convince me, but if you're saying this is important to you and your use-case/users I'm happy to go with that.

I don't have reliable compile-time numbers, but not emitting IR really seems like a straightforward improvement over emitting/analyzing/removing it.

Hard to say. Separation of concerns is important too, but of course there's trade-offs everywhere :). I'd suspect this doesn't change compile-time much either way.

So, those are the benefits. IMO getting close to 1% better at reducing instrumentation overhead is worth the complexity cost here.

Couldn't say, but that sounds reasonable to me and I don't mean to stand in the way of progress!

Can you answer my questions about InstCombine above? Thanks!

vsk added a comment.Feb 14 2017, 1:37 PM
In D29369#673064, @vsk wrote:

After some thought, can we discuss why this is a good idea?

The goal is to lower ubsan's compile-time + instrumentation overhead at -O0, since this reduces the friction of debugging a ubsan-instrumented project.

Apologies for the delay, thank you for the explanation.

Np, thanks for taking a look!

This increases the cyclomatic complexity of code that already is difficult to reason about, and seems like it's both brittle and out-of-place in CGExprScalar.

Are there cleanups or ways to reorganize the code that would make this sort of change less complex / brittle? I'm open to taking that on.

None that I see immediately (heh, otherwise I'd be working on them myself...) but the code paths for trapping/non-trapping are particularly what I meant re:complexity, and while I suppose the AST or its interface is probably unlikely to change much (?) I'm concerned about these checks silently removing checks they shouldn't in the future.

(Who would notice if this happened?)

I don't have a good answer for this. I've tried to make sure we don't introduce any false negatives with this patch by covering all the cases I can think of, but it's possible we could have missed something. There are enough people using this feature that I think we'd be alerted to + fix false negatives.

It really seems it would be better to let InstCombine or some other analysis/transform deal with proving checks redundant instead of attempting to do so on-the-fly during CodeGen.

-O1/-O2 do get rid of a lot of checks, but they also degrade the debugging experience, so it's not really a solution for this use case.

Understood, that makes sense.

Would running InstCombine (and only InstCombine):

  1. Fail to remove any checks elided by this change?

No, instcombine gets all of these.

  1. Have a negative impact on debugging experience? For this I'm mostly asking for a guess, I don't know how to exactly quantify this easily.

Probably, but I'm not 100% sure. Instcombine can touch a fair amount of debug info.

(3) Have an undesirable impact on compilation time or other negative consequence?)

Instcombine is one of the slower llvm passes, IIRC. At any rate, the idea of modifying the -O0 pipeline when ubsan is enabled just to turn on instcombine doesn't seem palatable..

Can you better motivate why this is worth these costs, or explain your use case a bit more?

I have some numbers from LNT. I did a pre-patch and post-patch run at -O0 + -fsanitize=signed-integer-overflow,unsigned-integer-overflow. There were 4,672 object files produced in each run. This patch brings the average object size down from 36,472.0 to 36,378.3 bytes (a 0.26% improvement), and the average number of overflow checks per object down from 66.8 to 66.2 (a 0.81% improvement).

Wonderful, thank you for producing and sharing these numbers. Those improvements don't convince me, but if you're saying this is important to you and your use-case/users I'm happy to go with that.

Yeah, on average, the patch isn't a huge improvement. What makes it worthwhile (imo) is that the risk is also very low, and that it can pay to emit less IR (for the one person out there that wants to add a million shorts together).

Some context: we have a project that adds ~17,000 integers together in straight-line code (someone must have auto-generated the C code that does this ><...). The amount of add/sub overflow checks generated there brings clang to its knees at -O0, -Os, etc. We had to kill the build of this project. I get that it's not a representative example, but this is the kind of behavior I really don't want unsuspecting users to hit.

I don't have reliable compile-time numbers, but not emitting IR really seems like a straightforward improvement over emitting/analyzing/removing it.

Hard to say. Separation of concerns is important too, but of course there's trade-offs everywhere :). I'd suspect this doesn't change compile-time much either way.

So, those are the benefits. IMO getting close to 1% better at reducing instrumentation overhead is worth the complexity cost here.

Couldn't say, but that sounds reasonable to me and I don't mean to stand in the way of progress!

Can you answer my questions about InstCombine above? Thanks!

vsk added a comment.Feb 22 2017, 4:26 PM

Ping, is the argument in favor of making the change in my last comment satisfactory?

vsk updated this revision to Diff 89733.Feb 24 2017, 3:20 PM
vsk edited edge metadata.
  • Make the suggested readability improvements, and fix a comment in the test case.
dtzWill accepted this revision.Feb 24 2017, 4:41 PM

Sorry for the delay!

LGTM, thanks!

This revision is now accepted and ready to land.Feb 24 2017, 4:41 PM
This revision was automatically updated to reflect the committed changes.