This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Add rule to fold away trunc of partial load
ClosedPublic

Authored by anna on Jun 10 2016, 3:15 PM.

Details

Summary

This instcombine rule folds away trunc operations that are generated for partial loads. The trunc can be folded away when the value of the partial load is available from a prior store or load.

This kind of code can be generated by GVN widening, and it also occurs directly in some source code. The current rule handles partial loads/stores of the form:

  %pload = load i8, i8* %a, align 2
  %0 = bitcast i8* %a to i16*
  %wide.load = load i16, i16* %0, align 2
  %lowhalf.1 = trunc i16 %wide.load to i8
  call void @consume(i8 %lowhalf.1)
  call void @consume(i8 %pload)

A test case in which a store partially feeds into a load is also added.
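
Concretely, the rule has the following shape in InstCombiner::visitTrunc (a minimal sketch, not the verbatim diff; the exact helper signature is worked out in the review below):

// Sketch: if the operand of the trunc is a load, ask the available-value
// scan whether a value of the narrow destination type already exists at
// the loaded address, and reuse it instead of truncating the wide load.
Value *Src = CI.getOperand(0);
if (auto *LI = dyn_cast<LoadInst>(Src)) {
  BasicBlock::iterator ScanFrom(LI);
  if (Value *AvailableVal = FindAvailableLoadedValue(
          LI->getPointerOperand(), CI.getType(), LI->getParent(), ScanFrom,
          DefMaxInstsToScan))
    return replaceInstUsesWith(CI, AvailableVal);
}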

Diff Detail

Repository
rL LLVM

Event Timeline

anna updated this revision to Diff 60414.Jun 10 2016, 3:15 PM
anna retitled this revision from to [InstCombine] Add rule to fold away trunc of partial load.
anna updated this object.
anna added reviewers: reames, sanjoy.
anna set the repository for this revision to rL LLVM.
anna added a reviewer: majnemer.EditedJun 10 2016, 3:18 PM
anna added a subscriber: llvm-commits.

The current rule gets the lower part of the load instruction on a little-endian (LE) machine. We could also add rules to get the upper half of the load (that would require a different pattern match in visitTrunc and passing the offset in to FindAvailableLoadedValue).

sanjoy requested changes to this revision.Jun 10 2016, 3:24 PM
sanjoy edited edge metadata.

Some minor comments inline.

lib/Analysis/Loads.cpp
388 ↗(On Diff #60414)

Doesn't the atomicity concern above apply here too?

429 ↗(On Diff #60414)

Why not use CastInst::isBitOrNoopPointerCastable here as well (same for the load case)?

lib/Transforms/InstCombine/InstCombineCasts.cpp
580 ↗(On Diff #60414)

Is this clang-formatted?

test/Transforms/InstCombine/trunc.ll
163 ↗(On Diff #60414)

Add a store test case, and also one or two negative test cases.

This revision now requires changes to proceed.Jun 10 2016, 3:24 PM
anna updated this revision to Diff 60522.Jun 13 2016, 7:35 AM
anna edited edge metadata.
anna removed rL LLVM as the repository for this revision.
anna marked 4 inline comments as done.

Addressed the above comments.

sanjoy requested changes to this revision.Jun 16 2016, 6:30 PM
sanjoy edited edge metadata.
sanjoy added inline comments.
lib/Analysis/Loads.cpp
385 ↗(On Diff #60522)

It feels odd that we have to special-case PartialType this way. I think a better design would be to introduce a new function FindAvailableMemoryContents(Value *Ptr, Type *AccessTy, ...) without the PartialTy parameter (perhaps with a different name), and have this version of FindAvailableLoadedValue dispatch to FindAvailableMemoryContents with LI->getPointerOperand(), LI->getType(). visitTrunc can then call FindAvailableMemoryContents directly.
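
In other words, the layering would look roughly like this (a sketch, assuming the existing parameters are simply threaded through):

// The load-based entry point becomes a thin wrapper that asks for the
// full width of the load; visitTrunc can instead call
// FindAvailableMemoryContents directly with the narrower type it wants.
Value *FindAvailableLoadedValue(LoadInst *LI, BasicBlock *ScanBB,
                                BasicBlock::iterator &ScanFrom,
                                unsigned MaxInstsToScan, AliasAnalysis *AA,
                                AAMDNodes *AATags) {
  return FindAvailableMemoryContents(LI->getPointerOperand(), LI->getType(),
                                     ScanBB, ScanFrom, MaxInstsToScan, AA,
                                     AATags);
}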

lib/Transforms/InstCombine/InstCombineCasts.cpp
588 ↗(On Diff #60522)

Here if we have:

store i64 %x, i64* %ptr
%val = load i64, i64* %ptr
%val.tr = trunc i64 %val to i32

won't we have a type mismatch and try to replace %val.tr with %x?

This revision now requires changes to proceed.Jun 16 2016, 6:30 PM
anna added inline comments.Jun 17 2016, 8:48 AM
lib/Analysis/Loads.cpp
385 ↗(On Diff #60522)

It seems ok, but are we fine with changing FindAvailableLoadedValue into a wrapper function, with all the actual code in FindAvailableMemoryContents?

lib/Transforms/InstCombine/InstCombineCasts.cpp
588 ↗(On Diff #60522)

We won't replace %val.tr with %x since the DestTy is i32: the check in FindAvailableLoadedValue (LI: %val, ..., DestTy: i32) requires that i64 be bit-or-noop-pointer-castable to the PartialType i32, which it is not.
I've added this test case as well now.

sanjoy added inline comments.Jun 17 2016, 10:21 AM
lib/Analysis/Loads.cpp
385 ↗(On Diff #60522)

I don't see any problems with that (you can put them both in this .cpp file for the time being).

lib/Transforms/InstCombine/InstCombineCasts.cpp
588 ↗(On Diff #60522)

I was thinking of the case where the first check, i.e.

if (AreEquivalentAddressValues(LoadPtr, StrippedPtr) &&
    CastInst::isBitOrNoopPointerCastable(LI->getType(), AccessTy, DL)) {

or

if (AreEquivalentAddressValues(StorePtr, StrippedPtr) &&
    CastInst::isBitOrNoopPointerCastable(SI->getValueOperand()->getType(),
                                         AccessTy, DL)) {

would fire and you'd return a wider value. So the test case has to be:

define i1 @trunc_different_size_load(i32* align 4 %a) {
  store i32 0, i32* %a, align 4
  %wide.load = load i32, i32* %a, align 4
  %lowhalf = trunc i32 %wide.load to i8
  call void @consume(i8 %lowhalf)
  %cmp.2 = icmp ult i32 %wide.load, 256
  ret i1 %cmp.2
; CHECK-LABEL: @trunc_different_size_load
; CHECK: %lowhalf = trunc i32 %wide.load to i8
}
anna added inline comments.Jun 17 2016, 11:07 AM
lib/Transforms/InstCombine/InstCombineCasts.cpp
588 ↗(On Diff #60522)

I see your point about the widening and the mismatch of types. The reason the firing of the first check did not trigger any failures is that a prior instcombine rule (I think it is combineLoadStore, which calls FindAvailableLoadedValue as well) already removes the trunc. So with or without this trunc rule, opt with -instcombine changes the above test case to:

define i1 @trunc_different_size_loads(i32* align 4 %a) {
  store i32 0, i32* %a, align 4
  call void @consume(i8 0)
  ret i1 false
}

Also verified the value is correctly truncated, say in this case:

store i32 257, i32* %a, align 4
call void @consume(i8 1)
ret i1 false

That said, I agree we should not rely on a particular ordering of instcombine rules here, since there would be a type mismatch in the absence of the load/store instcombine rules.
Will make the change. Thanks :)

anna added inline comments.Jun 18 2016, 1:09 PM
lib/Analysis/Loads.cpp
385 ↗(On Diff #60522)

Just noticed there is a similar change that went in for the same function.

That change adds a new parameter, isLoadCSE, to the function, with nullptr as the default.

I'll have to rethink the design for my change. It makes sense to create a separate function which checks for partial memory contents. However, we do not want to keep adding extra parameters to the FindAvailableLoadedValue function, since this would mean keeping track of the parameters (or at least knowing what each parameter does) when we call FindAvailableLoadedValue from FindAvailableMemoryContents.

anna added a comment.EditedJun 18 2016, 6:54 PM

Just noticed there is a similar change that went in for the same function. That change adds a new parameter, isLoadCSE, to the function, with nullptr as the default.

change in question: http://reviews.llvm.org/D21271

anna updated this revision to Diff 61365.Jun 21 2016, 7:23 AM
anna edited edge metadata.
anna set the repository for this revision to rL LLVM.

Added the function FindAvailableMemoryContents, which returns the partial or full value at the memory location accessed by the load.

sanjoy requested changes to this revision.Jun 21 2016, 12:06 PM
sanjoy edited edge metadata.
sanjoy added inline comments.
include/llvm/Analysis/Loads.h
64 ↗(On Diff #61365)

Nit: newer code does not repeat the function name at the beginning of comments.

67 ↗(On Diff #61365)

This is not quite the interface I was thinking of. I was thinking of

Value *FindAvailableMemoryContents(Value *Location, Type *AccessTy,
                                   BasicBlock *BB,
                                   BasicBlock::iterator &ScanFrom,
                                   unsigned MaxInstsToScan,
                                   AliasAnalysis *AA = nullptr,
                                   AAMDNodes *AATags = nullptr,
                                   bool *IsLoadCSE = nullptr);

That is, FindAvailableMemoryContents tries to give a Value * that would be the result of loading a value of type AccessTy from Location (this is how it should be specified too). Whether the Location came from a load instruction or elsewhere is not something it needs to worry about.

lib/Analysis/Loads.cpp
303 ↗(On Diff #61365)

Nit: remove the \brief -- newer code can rely on doxygen using autobrief.

313 ↗(On Diff #61365)

Why did you remove this? Isn't this pertinent to FindAvailableMemoryContents as well?

455 ↗(On Diff #61365)

I think this "partial or complete" term (here and elsewhere) is more confusing than helpful. The invariant here is that FindAvailableMemoryContents will return a value of type AccessTy; whether AccessTy is partial or full from the perspective of the caller is not something FindAvailableMemoryContents should worry about. For instance, with the edited interface that takes a pointer and not a LoadInst, I'd rather we have:

// We want a value covering the whole of the load.
return FindAvailableMemoryContents(Load->getPointerOperand(),
                                   Load->getType(), ScanBB, ScanFrom,
                                   MaxInstsToScan, AA, AATags, IsLoadCSE);

This revision now requires changes to proceed.Jun 21 2016, 12:06 PM
anna updated this revision to Diff 61450.Jun 21 2016, 2:36 PM
anna edited edge metadata.
anna removed rL LLVM as the repository for this revision.
anna marked 5 inline comments as done.

The added function's signature changed to take Value *Ptr instead of LoadInst *Load. The name also changed from FindAvailableMemoryContents to an overloaded version of FindAvailableLoadedValue.

My main concern with the signature taking the pointer operand Value *Ptr instead of LoadInst *Load was making sure the atomicity checks would be correct (LI->isAtomic() < Load->isAtomic()), along with the corresponding check in the store instruction (SI) case.

Discussed offline with Sanjoy; we decided to leave the sanity checking to the callers of this function (both the atomicity concerns and any other checks needed, depending on the nature of the transformation done once the value is available).
Calling this function to see if a store's value is available, i.e. FindAvailableLoadedValue(SI->getPointerOperand(), SI->getValueOperand()->getType(), ...), would possibly require careful consideration of the atomicity rules.
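
For reference, the atomicity guard being discussed sits inside the scan loop and looks roughly like this (a sketch; IsAtomicMemOp is the caller-supplied flag described in the comments below):

// A candidate load can only forward its value if it is at least as
// strongly ordered as the memory operation we are trying to satisfy;
// a non-atomic candidate must not satisfy an atomic request.
if (auto *LI = dyn_cast<LoadInst>(Inst))
  if (AreEquivalentAddressValues(LI->getPointerOperand(), StrippedPtr) &&
      CastInst::isBitOrNoopPointerCastable(LI->getType(), AccessTy, DL)) {
    if (LI->isAtomic() < IsAtomicMemOp)
      return nullptr;
    return LI;
  }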

sanjoy requested changes to this revision.Jun 21 2016, 9:33 PM
sanjoy edited edge metadata.

Mostly minor comments.

include/llvm/Analysis/Loads.h
65 ↗(On Diff #61450)

Nit here and later: \p Ptr

(This is optional) Also, instead of a separate line about AccessTy, I'd just restate this to say "if we have the value at address \p Ptr of type \p AccessTy"

80 ↗(On Diff #61450)

Describe isAtomicMemOp, and note that we're constant folding an unordered load (actually, maybe that's a better name for this function - FoldLoadFromAddress?).

Also, by convention, it should be IsAtomicMemOp.

82 ↗(On Diff #61450)

I think "sanity checks" is too vague to be useful. I think it is sufficient to state that we've assumed that we're folding a non-volatile unordered (but potentially atomic) load -- it is understood that people calling this function know what they want. :)

lib/Analysis/Loads.cpp
363 ↗(On Diff #61450)

This bit of change looks unnecessary now

This revision now requires changes to proceed.Jun 21 2016, 9:33 PM
anna marked 2 inline comments as done.Jun 22 2016, 7:27 AM
anna added inline comments.
include/llvm/Analysis/Loads.h
80 ↗(On Diff #61450)

Described IsAtomicMemOp as

IsAtomicMemOp specifies the atomicity of the memory operation that accesses \p *Ptr. We verify that the atomicity constraints are satisfied when forwarding a value from another memory operation that makes the value at \p *Ptr available.

But FoldLoadFromAddress is perhaps a misleading name, since the transformation done by the caller need not be a fold of the load (explained below).

82 ↗(On Diff #61450)

I think we will miss some subtle points in that case, and will only be considering one usage of this function.
Basically, we do not make any assumptions about the memory operation; i.e., it need not be an unordered, non-volatile load. That constraint is only required for the call from FindAvailableLoadedValue(LoadInst *Load, ...). We do not need it for the call from InstCombiner::visitTrunc, since we do not fold away the load.
Also, the transformation using the returned value can be a fold of the load, as done in most cases, or a fold of the trunc with nothing done to the load.

The idea is that it is up to the caller to decide what sort of memory operations they are interested in (may or may not be an unordered load) and what transformation needs to be done based on the value (may or may not be folding of load).

I thought spelling all of this out would be overly detailed, so I summarized it as 'sanity checks' :)

sanjoy added inline comments.Jun 22 2016, 10:59 AM
include/llvm/Analysis/Loads.h
82 ↗(On Diff #61450)

We do not need this constraint for the call from
InstCombiner::visitTrunc, since we do not fold away the load.

I'm not sure if we are okay folding volatile loads, but I realized
(just now) that we're not okay folding atomic loads in
visitTrunc. Our program could have been:

%ptr is initially 0

ThreadA:
  store atomic i32 1, i32* %ptr

ThreadB:
  %prevva = load atomic i16, (bitcast %ptr)
  %val = load atomic i32, i32* %ptr
  print(%val)
  print(trunc(%val))

Right now the program above can print 0, 0 or 1, 1, but after your
change, ThreadB will be:

ThreadB:
  %prevva = load atomic i16, (bitcast %ptr)
  %val = load atomic i32, i32* %ptr
  print(%val)
  print(%prevva)

and will allow the executions 1, 0 and 0, 1. I think we can fold
atomic loads only if the load has trunc as the only use.

sanjoy added inline comments.Jun 22 2016, 11:25 AM
include/llvm/Analysis/Loads.h
82 ↗(On Diff #61450)

That example was not correct; to be precise, this is what I'm talking
about:

%ptr is 0 initially

ThreadA:
  store atomic i32 -1, i32* %ptr

ThreadB:
  %val0 = load atomic i16, i16* (bitcast %ptr)
  %val1 = load atomic i32, i32* %ptr
  %val2 = trunc i32 %val1 to i16
  print(%val1, %val2)

Now ThreadB can print only "0, 0" or "-1, -1", but after you replace
%val2 with %val0, it will be possible for it to print "0, -1" or "-1,
0".

anna added inline comments.Jun 22 2016, 12:17 PM
include/llvm/Analysis/Loads.h
82 ↗(On Diff #61450)

Good catch! I initially thought the atomicity check in FindAvailableLoadedValue might handle the case you've written: if (LI->isAtomic() < IsAtomicMemOp) return nullptr. But here both loads are atomic, so this would cause incorrect results only in visitTrunc.

I'll add this check before deciding whether visitTrunc can call this function (including a volatile check for safety):

if (auto *LI = dyn_cast<LoadInst>(Src)) { // source operand of trunc is a load
  if (!LI->isVolatile() && (!LI->isAtomic() || LI->hasOneUse()))
    // ... call FindAvailableLoadedValue ...
}

and add the above example within ThreadB as a test case, to check that the trunc is not removed.

anna updated this revision to Diff 61669.Jun 23 2016, 6:55 AM
anna edited edge metadata.
anna marked 3 inline comments as done.

Added a check in visitTrunc to confirm that the load is non-volatile (volatile loads have target-dependent results, and I'm not sure whether the fold would affect any targets) and that, if the load is atomic, it has only a single use, which is the trunc.
Added a test case to verify that the trunc is not removed in the case of an atomic load with more than one use.
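
Putting the pieces together, the guarded hook in visitTrunc now reads roughly as follows (a sketch reconstructed from the comments above; the parameter order of the new overload is assumed):

Value *Src = CI.getOperand(0);
if (auto *LI = dyn_cast<LoadInst>(Src)) {
  // Leave volatile loads alone, and touch an atomic load only when the
  // trunc is its sole use (see the multi-threaded example above).
  if (!LI->isVolatile() && (!LI->isAtomic() || LI->hasOneUse())) {
    BasicBlock::iterator ScanFrom(LI);
    if (Value *AvailableVal = FindAvailableLoadedValue(
            LI->getPointerOperand(), CI.getType(), LI->isAtomic(),
            LI->getParent(), ScanFrom, DefMaxInstsToScan))
      return replaceInstUsesWith(CI, AvailableVal);
  }
}

Note that this replacement still uses AvailableVal directly, without casting it to the trunc's result type; that detail resurfaces in the reopening below.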

sanjoy accepted this revision.Jun 23 2016, 11:02 AM
sanjoy edited edge metadata.

lgtm

This revision is now accepted and ready to land.Jun 23 2016, 11:02 AM
This revision was automatically updated to reflect the committed changes.
anna reopened this revision.Jun 28 2016, 6:45 AM

The above change caused a build break when the result type of the trunc being folded did not match the type of the prior load/store that provided the value.

The change was reverted in r273703 due to failures in Chromium and sanitizer builds. This was the failing RAUW assertion:
Assertion failed: New->getType() == getType() && "replaceAllUses of value with new value of different type!", file ..\lib\IR\Value.cpp, line 375

Load being used:

  %138 = load float, float* %m_x.i527, align 4, !dbg !447

Trunc being replaced:

%155 = trunc i64 %140 to i32, !dbg !470

The RAUW, return replaceInstUsesWith(CI, AvailableVal);, was missing a Builder->CreateBitOrPointerCast of AvailableVal.
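
The corrected replacement is then roughly (a sketch; the follow-up review has the actual diff):

// Cast the available value to the trunc's result type before RAUW, so
// the replacement always type-checks.
if (AvailableVal)
  return replaceInstUsesWith(
      CI, Builder->CreateBitOrPointerCast(AvailableVal, CI.getType()));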

This revision is now accepted and ready to land.Jun 28 2016, 6:45 AM
anna closed this revision.Jun 28 2016, 6:47 AM

All changes will be tracked in the new review, which addresses the build break: http://reviews.llvm.org/D21791