This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fix for trunc folding build break
ClosedPublic

Authored by anna on Jun 28 2016, 6:39 AM.

Details

Summary

We can fold truncs whose operand feeds from a load, if the trunc value is available through a prior load/store.

This change is from: http://reviews.llvm.org/D21246, which folded the trunc but missed the bitcast or ptrtoint/inttoptr required in the RAUW call, when the load type didnt match the prior load/store type.

The change was reverted in: r273703 due to fail in Chromium and sanitizer build. This was the bad RAUW assertion:
Assertion failed: New->getType() == getType() && "replaceAllUses of value with new value of different type!", file ..\lib\IR\Value.cpp, line 375

Load being used:

%138 = load float, float* %m_x.i527, align 4, !dbg !447

Trunc being replaced:

%155 = trunc i64 %140 to i32, !dbg !470

This change fixes the above issue, and I've added a testcase to make sure that we don't fail the assertion (@trunc_avoid_bitcast). Also, couple of other testcases added.

Diff Detail

Repository
rL LLVM

Event Timeline

anna updated this revision to Diff 62083.Jun 28 2016, 6:39 AM
anna retitled this revision from to [InstCombine] Fix for trunc folding build break.
anna updated this object.
anna set the repository for this revision to rL LLVM.
anna added a subscriber: llvm-commits.
anna added a comment.Jun 28 2016, 7:10 AM

Could think of 3 possible ways to fix the build breakage caused by type mismatch (the test case which showcases the breakage is @trunc_avoid_bitcast):

  1. The RAUW return replaceInstUsesWith(CI, AvailableVal); should be return replaceInstUsesWith( CI, Builder->CreateBitOrPointerCast(AvailableVal, CI.getType(), CI.getName() + ".cast"));
  1. Avoid the RAUW when the AvailableVal type is not the same as DestTy of trunc.
  2. Special case the trunc type in FindAvailableLoadedValue (i.e. check for exact match rather than CastInst::isBitOrNoopPointerCastable(LI->getType(), AccessTy, DL)) Error-prone solution whose benefit is minimal.

I followed #2.

What #1 does is replace the trunc with a bitcast or PointerCast. While generating bitcast would be useful if FindAvailableLoadedValue was used to replace loads, there does not seem to be a *strong* benefit in replacing the trunc with bitcast or pointercast. Looking at SelectionDAG::visitBitCast and all the pointer cast instructions, it may or may not be a noop. I didnot see an obvious benefit (i.e. hold true in all cases) in using BitOrPointerCast instead of trunc.
Also, this avoids possible performance regression creeping up due to some specific target lowering or due to optimizations that benefit trunc rather than BitOrPointerCast.

sanjoy edited edge metadata.Jun 28 2016, 10:57 AM

Hi Anna,

In D21791#468817, @anna wrote:

Could think of 3 possible ways to fix the build breakage caused by
type mismatch (the test case which showcases the breakage is
@trunc_avoid_bitcast):

  1. The RAUW return replaceInstUsesWith(CI, AvailableVal); should be return replaceInstUsesWith( CI, Builder->CreateBitOrPointerCast(AvailableVal, CI.getType(), CI.getName() + ".cast"));
  1. Avoid the RAUW when the AvailableVal type is not the same as DestTy of trunc.
  2. Special case the trunc type in FindAvailableLoadedValue (i.e. check for exact match rather than CastInst::isBitOrNoopPointerCastable(LI->getType(), AccessTy, DL)) Error-prone solution whose benefit is minimal.

I followed #2.

What #1 does is replace the trunc with a bitcast or
PointerCast. While generating bitcast would be useful if
FindAvailableLoadedValue was used to replace loads, there does not
seem to be a *strong* benefit in replacing the trunc with bitcast or
pointercast. Looking at SelectionDAG::visitBitCast and all the
pointer cast instructions, it may or may not be a noop. I didnot see
an obvious benefit (i.e. hold true in all cases) in using
BitOrPointerCast instead of trunc.

The normal LLVM meme is "bitcasts are free", so I'd be tempted to with
#1. That is also less surprising (a "spurious" pointer cast won't
break optimization). ISDOpcodes.h says that there are cases where
ISD::BITCAST will not be a no-op, but my guess is that it is still
better to replace a trunc with bitcast (i.e. even in the "worst"
case a bitcast will be no worse than a trunc).

Also, this avoids possible performance regression creeping up due to
some specific target lowering or due to optimizations that benefit
trunc rather than BitOrPointerCast.

Given how ubiquitous bitcast s are in LLVM IR, I think that ^ is
unlikely (and if we find such an optimization that is broken by a
bitcast, we should fix the optimization).

anna updated this revision to Diff 62216.Jun 29 2016, 6:36 AM
anna edited edge metadata.

The RAUW in visitTrunc inserts BitOrPointerCast if required and replaces all uses of the trunc.

sanjoy requested changes to this revision.Jun 29 2016, 11:34 AM
sanjoy edited edge metadata.
sanjoy added inline comments.
lib/Transforms/InstCombine/InstCombineCasts.cpp
592 ↗(On Diff #62216)

Don't we have to check if LI is at most unordered? I realize I missed this the last time.

Please also add a test case.

This revision now requires changes to proceed.Jun 29 2016, 11:34 AM
anna updated this revision to Diff 62379.Jun 30 2016, 10:16 AM
anna edited edge metadata.
anna marked an inline comment as done.

addressed Sanjoy's comments - load atomic ordering should be at most unordered. Added test case trunc_atomic_monotonic

anna updated this revision to Diff 62380.Jun 30 2016, 10:22 AM
anna edited edge metadata.

corrected trunc_atomic_monotonic test Filecheck statement.

reames accepted this revision.Jul 7 2016, 11:32 AM
reames edited edge metadata.

LGTM w/comments addressed.

lib/Analysis/Loads.cpp
328 ↗(On Diff #62380)

I'm not following why you need this new interface. Why can't we reuse the existing interface? We have the load instruction don't we?

... I think I see it. The issue is we need to use the narrower type right? The comment seams to be stale. You're really using the size of the access ty. Can you adjust the comment to make this more obvious.

p.s. Please don't repeat function comments in the cpp. And yes, I know the near by code does. This is older style.

lib/Transforms/InstCombine/InstCombineCasts.cpp
595 ↗(On Diff #62380)

I think what you've got here is entirely correct, but let me sketch and alternate framing to see if you think this is simpler. Instead of splitting out the properties of the load, we leave the interface that of the load, but add the accessty (which is asserts as less than the load type). If we do that, the atomicity check can be sunk into the common code.

test/Transforms/InstCombine/trunc.ll
186 ↗(On Diff #62380)

Add a comment about this only be legal for non-atomic loads since you essentially split the load here.

198 ↗(On Diff #62380)

Add check lines for the other half of the original load.

213 ↗(On Diff #62380)

Add positive checks for the two consume lines at least.

258 ↗(On Diff #62380)

Add a positive test for the two consume lines.

276 ↗(On Diff #62380)

This looks to be a missed optimization. Please comment it as such.

anna added inline comments.Jul 7 2016, 1:12 PM
lib/Analysis/Loads.cpp
328 ↗(On Diff #62380)

This function FindAvailableLoadedValue is used to see if we have the contents of type AccessTy at location Ptr.

This function can be called for either full or partial sizes, as mentioned in start of the comment:

Scan the ScanBB block backwards checking to see if we have the value at the memory address \p Ptr of type \p AccessTy

Will remove the comment from cpp, and try to make the need for the interface more obvious.

lib/Transforms/InstCombine/InstCombineCasts.cpp
595 ↗(On Diff #62380)

We had this in one of the iterations in prior review.

We changed the function signature to be a memory location, rather than an load instruction (http://reviews.llvm.org/D21246#463566).

Discussed with Sanjoy and reasons were:

  1. we can use this function to see (partial or full) memory contents of any operation -- memcpy, load etc. Required constraints should be verified at the caller, as mentioned in function comments.
/// Note that we assume the \p *Ptr is accessed through a non-volatile but
/// potentially atomic load. Any other constraints should be verified at the
/// caller.
  1. Use FindAvailableLoadedValue as-is in most optimizations which require the full load value. No change to function signature.

I had added the Type *accessTy to the original interface with the LoadInst, but this was error prone since there was ambiguity on whether LoadInst->getType() or AccessTy (from visitTrunc) are used. Your suggestion would allow us to use a single type accessTy , but change the calls to this function used in various optimizations.

anna marked 3 inline comments as done.Jul 8 2016, 6:50 AM
anna added inline comments.
test/Transforms/InstCombine/trunc.ll
186 ↗(On Diff #62380)

the load is not split/shrunk. Only the trunc is removed, and all uses of trunc replaced by corresponding store value.

198 ↗(On Diff #62380)

Added checks for making sure load is as it is.

anna updated this revision to Diff 63219.Jul 8 2016, 7:24 AM
anna updated this object.
anna edited edge metadata.

Addressed review comments:
Updated comment for AvailableLoadedValue. Updated comments and file checks for tests

This revision was automatically updated to reflect the committed changes.