This is an archive of the discontinued LLVM Phabricator instance.


Differential D31539

Hoisting invariant.group in LICM
Abandoned, Public

Authored by Prazek on Mar 31 2017, 9:05 AM.


Details

Reviewers

chandlerc
sanjoy

Summary

Loads with !invariant.group metadata whose pointer operand dominates
the loop body can be hoisted, because the load is known to produce
the same value in every loop iteration.
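A toy model of the rule above, written in C++ without LLVM. `Value`, `ToyLoop`, `ToyLoad` and `isInvariantGroupHoistCandidate` are hypothetical names. The invariance test follows what `Loop::isLoopInvariant` checks (a value that is not an instruction, or an instruction defined outside the loop) instead of the dominance query used in the diff, and none of LICM's other legality checks are modeled.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for an SSA value: DefBlock names the basic block of
// the defining instruction; an empty name means an argument or a constant.
struct Value {
  std::string DefBlock;
};

// Hypothetical stand-in for a loop: just the set of blocks it contains.
struct ToyLoop {
  std::set<std::string> Blocks;

  // Same idea as Loop::isLoopInvariant: non-instructions are invariant, and
  // so is any instruction whose block is not part of the loop.
  bool isLoopInvariant(const Value &V) const {
    return V.DefBlock.empty() || Blocks.count(V.DefBlock) == 0;
  }
};

// Hypothetical stand-in for a load instruction.
struct ToyLoad {
  bool HasInvariantGroup;
  Value Pointer;
};

// The summary's rule: an !invariant.group load whose pointer operand is
// defined outside the loop is a hoisting candidate.
bool isInvariantGroupHoistCandidate(const ToyLoad &LI, const ToyLoop &L) {
  return LI.HasInvariantGroup && L.isLoopInvariant(LI.Pointer);
}
```

For example, a pointer computed in `while.body.lr.ph` (as in `@hoist` in the test below) makes the load a candidate, while a pointer recomputed in the loop body (as in `@dontHoist`) does not.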

Diff Detail

Build Status

Buildable 5248
Build 5248: arc lint + arc unit

Event Timeline

Prazek created this revision. (Mar 31 2017, 9:05 AM)

small updates

add newline

Harbormaster completed remote builds in B5229: Diff 93664. (Mar 31 2017, 9:15 AM)

Prazek added a subscriber: amharc. (Mar 31 2017, 9:52 AM)

Comments inline.

lib/Transforms/Scalar/LICM.cpp
495	You can use `Loop::isLoopInvariant` here.
571	I'm not sure that the langref lets you do this. What it allows is rewriting `%a = *ptr, !invariant.group; %b = *ptr, !invariant.group` to `%a = *ptr, !invariant.group; %b = %a`, which is slightly different from what you're doing here. So I'd recommend changing the langref wording to be more like what we have for `!invariant.load`.
877	This bit does not look correct. Why can't these attributes be control dependent?
test/Transforms/LICM/hoist-invariant-group-load.ll
40	Please use `-instnamer` to name all the instructions. Otherwise editing the tests will be painful.

This revision now requires changes to proceed. (Mar 31 2017, 6:59 PM)

dberlin added a subscriber: dberlin. (Mar 31 2017, 7:00 PM)

dberlin added inline comments.

test/Transforms/LICM/hoist-invariant-group-load.ll
40	Also, if you can, I'd just use update_test_checks at this point.

Prazek added inline comments. (Apr 1 2017, 8:26 AM)

lib/Transforms/Scalar/LICM.cpp
571	I am not sure how I can make it clearer in LangRef, but I am happy to change it if you have any ideas. Right now it says "The existence of the invariant.group metadata on the instruction tells the optimizer that every load and store to the same pointer operand within the same invariant group can be assumed to load or store the same value", so if my pointer operand doesn't change in any loop iteration, then I think it works. Another way to look at it: unrolling the loop once and then optimizing the load in the loop based on invariant.group gives exactly the same result. But I agree that the docs should probably mention that the instruction has to be executed, etc.
877	Good catch, I didn't think about it because this works for devirtualization. This means that we need a way to say that a given property holds globally. See the mailing list.

Fixed test

Prazek marked 2 inline comments as done. (Apr 1 2017, 11:56 AM)

Hi Piotr,

Won't this patch allow a situation like this:

for (;;) {
  vtable = load ptr0, !invariant.group
  use(vtable)
  store new vtable to ptr0
  ptr1 = barrier(ptr0)
  // ptr1 is not used, say
}

to

for (;;) {
  store new vtable to ptr0
  ptr1 = barrier(ptr0)
  // ptr1 is not used, say
}
vtable = load ptr0, !invariant.group
use(vtable)
?

After this, the load of vtable will return the newer vtable, which seems problematic.
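A toy C++ simulation of the transformation in the question, assuming a single integer cell stands in for `ptr0`, an integer stands in for each vtable, and the loop runs once; `barrier(ptr0)` is left out because its result is unused. The function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Original loop: each iteration loads the vtable, uses it, then stores a new
// vtable into the same object (destructor + placement new).
std::vector<int> runOriginal(int Iterations, int OldVTable, int NewVTable) {
  int Ptr0 = OldVTable;      // memory at ptr0
  std::vector<int> Used;     // every value passed to use()
  for (int I = 0; I < Iterations; ++I) {
    int VTable = Ptr0;       // vtable = load ptr0, !invariant.group
    Used.push_back(VTable);  // use(vtable)
    Ptr0 = NewVTable;        // store new vtable to ptr0
  }
  return Used;
}

// Same program after the load and its use are sunk below the loop.
std::vector<int> runSunk(int Iterations, int OldVTable, int NewVTable) {
  // This form only models loops whose body ran at least once.
  assert(Iterations >= 1 && "sinking assumes the body executed");
  int Ptr0 = OldVTable;
  std::vector<int> Used;
  for (int I = 0; I < Iterations; ++I)
    Ptr0 = NewVTable;        // store new vtable to ptr0
  int VTable = Ptr0;         // vtable = load ptr0, !invariant.group
  Used.push_back(VTable);    // use(vtable), now after the loop
  return Used;
}
```

With one iteration, the original program passes the old vtable to `use()`, while the sunk version passes the new one.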

In D31539#716345, @sanjoy wrote:
> Hi Piotr,
>
> Won't this patch allow a situation like this:
> for (;;) {
>   vtable = load ptr0, !invariant.group
>   use(vtable)
>   store new vtable to ptr0
>   ptr1 = barrier(ptr0)
>   // ptr1 is not used, say
> }
> to
> for (;;) {
>   store new vtable to ptr0
>   ptr1 = barrier(ptr0)
>   // ptr1 is not used, say
> }
> vtable = load ptr0, !invariant.group
> use(vtable)
> ?
>
> After this, the load of vtable will return the newer vtable, which seems problematic.

It would, but that would mean that either the !invariant.group metadata is invalid there, or the store is storing the same value that was loaded (in which case it would probably carry !invariant.group too, but that doesn't matter).
If I unroll one iteration of this loop:

vtable.pre = load ptr0, !invariant.group
use(vtable.pre)
store new vtable to ptr0
ptr1 = barrier(ptr0)
for (;;) {
  vtable = load ptr0, !invariant.group
  use(vtable)
  store new vtable to ptr0
  ptr2 = barrier(ptr0)
  // ptr1 is not used, say
}

Then you can clearly see that vtable.pre dominates vtable and has the same pointer operand, so it has to load the same value.
Or maybe you are referring to the fact that the invariant.group metadata would be preserved? If that is the case, see my post on the mailing list :)

In D31539#716347, @Prazek wrote:
> It would, but that would mean that either the !invariant.group metadata is invalid there, or the store is storing the same value that was loaded (in which case it would probably carry !invariant.group too, but that doesn't matter).
> If I unroll one iteration of this loop:
> vtable.pre = load ptr0, !invariant.group
> use(vtable.pre)
> store new vtable to ptr0
> ptr1 = barrier(ptr0)
> for (;;) {
>   vtable = load ptr0, !invariant.group
>   use(vtable)
>   store new vtable to ptr0
>   ptr2 = barrier(ptr0)
>   // ptr1 is not used, say
> }
> Then you can clearly see that vtable.pre dominates vtable and has the same pointer operand, so it has to load the same value.

Does this reasoning still hold if the loop has just one iteration? That is, say the loop was

for (i = 0; i < 1; i++) {
  vtable = load ptr0, !invariant.group
  use(vtable)
  store new vtable to ptr0  // S0
  ptr1 = barrier(ptr0)
  // ptr1 is not used, say
}

then (I claim that) the program does not have UB even if S0 is storing a new vtable. I think this is the kind of IR we'll get from

for (i = 0; i < 1; i++) {
  storage->f();
  storage->~Foo();
  new(storage) Bar;
}

> Or maybe you are referring to the fact that the invariant.group metadata would be preserved? If that is the case, see my post on the mailing list :)

I thought your mailing list post was about speculating loads. Here we're not speculating anything -- we're only sinking. In any case, you do not need the !invariant.group metadata on the sunk load for the example above to "work".
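A runnable version of the C++ pattern above, assuming hypothetical classes `Foo` and `Bar` where `Bar` overrides a virtual `f`. After the loop it uses the pointer returned by placement new, because reaching the new `Bar` through the old `Foo *` is not allowed when the type changes.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

struct Foo {
  virtual ~Foo() = default;
  virtual std::string f() const { return "Foo::f"; }
};

struct Bar : Foo {
  std::string f() const override { return "Bar::f"; }
};

// Runs the one-iteration loop and reports which f() was called inside the
// loop and which f() the object at the same address dispatches to afterwards.
std::pair<std::string, std::string> runOnce() {
  alignas(Bar) unsigned char Buf[sizeof(Bar)];
  Foo *Storage = new (Buf) Foo;  // storage initially holds a Foo
  Foo *Current = Storage;
  std::string InLoop;
  for (int I = 0; I < 1; I++) {
    InLoop = Storage->f();       // storage->f(): dispatch through Foo's vtable
    Storage->~Foo();             // storage->~Foo(): ends the Foo's lifetime
    Current = new (Storage) Bar; // new(storage) Bar: new vtable, same address
  }
  std::string After = Current->f(); // dispatch through Bar's vtable
  Current->~Foo();                  // virtual, so this runs ~Bar
  return {InLoop, After};
}
```

The call inside the loop dispatches to `Foo::f`, while a call on the object at the same address after the loop dispatches to `Bar::f`, so a vtable load sunk below the loop would observe a different value even though the program has no UB.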

In D31539#716384, @sanjoy wrote:
>> It would, but that would mean that either the !invariant.group metadata is invalid there, or the store is storing the same value that was loaded (in which case it would probably carry !invariant.group too, but that doesn't matter).
>> If I unroll one iteration of this loop:
>> vtable.pre = load ptr0, !invariant.group
>> use(vtable.pre)
>> store new vtable to ptr0
>> ptr1 = barrier(ptr0)
>> for (;;) {
>>   vtable = load ptr0, !invariant.group
>>   use(vtable)
>
> Does this reasoning still hold if the loop has just one iteration? That is, say the loop was
> for (i = 0; i < 1; i++) {
>   vtable = load ptr0, !invariant.group
>   use(vtable)
>   store new vtable to ptr0  // S0
>   ptr1 = barrier(ptr0)
>   // ptr1 is not used, say
> }
> then (I claim that) the program does not have UB even if S0 is storing a new vtable. I think this is the kind of IR we'll get from
> for (i = 0; i < 1; i++) {
>   storage->f();
>   storage->~Foo();
>   new(storage) Bar;
> }
>> Or maybe you are referring to the fact that the invariant.group metadata would be preserved? If that is the case, see my post on the mailing list :)
>
> I thought your mailing list post was about speculating loads. Here we're not speculating anything -- we're only sinking. In any case, you do not need the !invariant.group metadata on the sunk load for the example above to "work".

So it looks like we can't sink based on invariant.group, but I think hoisting is still possible, because it would only be invalid if there were a store before the load:

for (i = 0; i < 1; i++) {
  store new vtable to ptr0  // S0
  vtable = load ptr0, !invariant.group
  use(vtable)
}

but then, if the store writes a different value, the program has UB. Is this right?
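A toy C++ simulation of the case above (hypothetical names; one integer cell stands in for `ptr0`), comparing a loop body that stores and then loads with the same loop after the load is hoisted into the preheader.

```cpp
#include <cassert>

// Loop body stores to ptr0 and then loads from it; returns the value that
// use(vtable) saw in the last iteration.
int runStoreThenLoad(int Initial, int Stored, int Iterations) {
  assert(Iterations >= 1 && "compare only loops whose body runs");
  int Ptr0 = Initial;
  int Seen = 0;
  for (int I = 0; I < Iterations; ++I) {
    Ptr0 = Stored;  // store new vtable to ptr0 (S0)
    Seen = Ptr0;    // vtable = load ptr0, !invariant.group; use(vtable)
  }
  return Seen;
}

// Same loop after hoisting the load into the preheader.
int runHoisted(int Initial, int Stored, int Iterations) {
  assert(Iterations >= 1 && "compare only loops whose body runs");
  int Ptr0 = Initial;
  int Hoisted = Ptr0;  // vtable = load ptr0, !invariant.group (preheader)
  int Seen = 0;
  for (int I = 0; I < Iterations; ++I) {
    Ptr0 = Stored;     // S0 still executes inside the loop
    Seen = Hoisted;    // use(vtable) now sees the hoisted value
  }
  (void)Ptr0;          // the final memory state is not inspected here
  return Seen;
}
```

The two versions only disagree when the store writes a value different from the one already in memory, which is the case argued above to be UB when the load carries !invariant.group.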

> I thought your mailing list post was about speculating loads. Here we're not speculating anything -- we're only sinking. In any case, you do not need the !invariant.group metadata on the sunk load for the example above to "work".

The !dereferenceable metadata is useful for speculative loads, but in that last mail I showed the problem with the other metadata:
if I remove the invariant.group metadata while hoisting out of the loop, then I probably won't be able to devirtualize the call further. Consider:

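struct A {
  virtual void foo(); // assumed: foo is a virtual member of A
};

void clobber(A *a); // defined in another translation unit

void loop(A *a, int p) {
  for (int i = 0; i < p; i++)
    a->foo();
}

void call() {
  A a;
  clobber(&a); // external function
  loop(&a, 15);
}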

If I hoist the vtable load out of the loop before inlining, then the call can't be turned into a direct call to A::foo(), because the invariant.group metadata will have been removed.
If I don't hoist it, then the call will be devirtualized after inlining, but call sites of loop() that weren't inlined will be worse off.

That means we need a way of specifying that a property holds globally, i.e. that it does not depend on any branch. For vtables and virtual functions, !invariant.group, !invariant.load and !dereferenceable are global properties, and I can't lose this information.

A few problems I have to address before it makes sense to hoist based on invariant.group:

1a. We should preserve the metadata if we are not hoisting loads speculatively (i.e. if the load is guaranteed to execute). This way we won't lose the invariant.group metadata, which is crucial to preserve.
1b. Given 1a, we should only hoist invariant.group instructions if we will preserve the metadata.

We have to find a way to preserve the metadata if we want to hoist speculatively. I proposed a solution on the mailing list, and there is also a very similar one here: https://reviews.llvm.org/D18738
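Items 1a and 1b could be summarized as the decision below. This is a hypothetical C++ sketch of the proposed policy, not code from the patch; `HoistDecision` and `decideLoadHoist` are made-up names, and `OtherwiseHoistable` stands for all the legality checks LICM already performs.

```cpp
#include <cassert>

enum class HoistDecision {
  DontHoist,         // leave the load in the loop
  HoistKeepMetadata, // move to the preheader; metadata is still valid there
  HoistDropMetadata, // speculative hoist; control-dependent metadata stripped
};

HoistDecision decideLoadHoist(bool OtherwiseHoistable, bool HasInvariantGroup,
                              bool GuaranteedToExecute) {
  if (!OtherwiseHoistable)
    return HoistDecision::DontHoist;
  // 1a: if the load runs whenever the loop is entered, its metadata also
  // holds in the preheader and can be kept.
  if (GuaranteedToExecute)
    return HoistDecision::HoistKeepMetadata;
  // 1b: a speculative hoist would strip !invariant.group, so don't hoist.
  if (HasInvariantGroup)
    return HoistDecision::DontHoist;
  // Other loads keep the existing behavior: hoist and drop the metadata.
  return HoistDecision::HoistDropMetadata;
}
```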

Here are the features that I needed:
https://reviews.llvm.org/D45150
https://reviews.llvm.org/D45151

Revision Contents

Path                                                Size
lib/Transforms/Scalar/LICM.cpp                      19 lines
test/Transforms/LICM/hoist-invariant-group-load.ll  105 lines

Diff 93750

lib/Transforms/Scalar/LICM.cpp

@@ ... @@ void llvm::computeLoopSafetyInfo(LoopSafetyInfo *SafetyInfo, Loop *CurLoop) {
   // personality routine.
   Function *Fn = CurLoop->getHeader()->getParent();
   if (Fn->hasPersonalityFn())
     if (Constant *PersonalityFn = Fn->getPersonalityFn())
       if (isFuncletEHPersonality(classifyEHPersonality(PersonalityFn)))
         SafetyInfo->BlockColors = colorEHFunclets(*Fn);
 }
 
+static bool isLoadInvariantGroupInLoop(LoadInst *LI, DominatorTree *DT,
+                                       Loop *CurLoop) {
+  if (!LI->getMetadata(LLVMContext::MD_invariant_group))
+    return false;
+
+  // TODO can I do this without casting to Instruction?
+  if (auto *PointerOperandInst = dyn_cast<Instruction>(LI->getPointerOperand()))
+    return DT->properlyDominates(PointerOperandInst->getParent(),
+                                 CurLoop->getHeader());
+  return true; // If it is not an instruction then it always dominates
+  // TODO check if it actually happens.
+}
+
 // Return true if LI is invariant within scope of the loop. LI is invariant if
 // CurLoop is dominated by an invariant.start representing the same memory location
 // and size as the memory location LI loads from, and also the invariant.start
 // has no uses.
 static bool isLoadInvariantInLoop(LoadInst *LI, DominatorTree *DT,
                                   Loop *CurLoop) {
   Value *Addr = LI->getOperand(0);
   const DataLayout &DL = LI->getModule()->getDataLayout();
@@ ... @@ if (LoadInst *LI = dyn_cast<LoadInst>(&I)) {
 
     // Loads from constant memory are always safe to move, even if they end up
     // in the same alias set as something that ends up being modified.
     if (AA->pointsToConstantMemory(LI->getOperand(0)))
       return true;
     if (LI->getMetadata(LLVMContext::MD_invariant_load))
       return true;
 
+    if (isLoadInvariantGroupInLoop(LI, DT, CurLoop))
+      return true;
+
     // This checks for an invariant.start dominating the load.
     if (isLoadInvariantInLoop(LI, DT, CurLoop))
       return true;
 
     // Don't hoist loads which have may-aliased stores in loop.
     uint64_t Size = 0;
     if (LI->getType()->isSized())
       Size = I.getModule()->getDataLayout().getTypeStoreSize(LI->getType());
@@ ... @@ static bool hoist(Instruction &I, const DominatorTree *DT, const Loop *CurLoop,
   // Metadata can be dependent on conditions we are hoisting above.
   // Conservatively strip all metadata on the instruction unless we were
   // guaranteed to execute I if we entered the loop, in which case the metadata
   // is valid in the loop preheader.
   if (I.hasMetadataOtherThanDebugLoc() &&
       // The check on hasMetadataOtherThanDebugLoc is to prevent us from burning
       // time in isGuaranteedToExecute if we don't actually have anything to
       // drop. It is a compile time optimization, not required for correctness.
       !isGuaranteedToExecute(I, DT, CurLoop, SafetyInfo))
-    I.dropUnknownNonDebugMetadata();
+    I.dropUnknownNonDebugMetadata({LLVMContext::MD_invariant_group,
+                                   LLVMContext::MD_invariant_load});
 
   // Move the new node to the Preheader, before its terminator.
   I.moveBefore(Preheader->getTerminator());
 
   // Do not retain debug locations when we are moving instructions to different
   // basic blocks, because we want to avoid jumpy line tables. Calls, however,
   // need to retain their debug locs because they may be inlined.
   // FIXME: How do we retain source locations without causing poor debugging
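The effect of the changed `dropUnknownNonDebugMetadata` call can be modeled with a toy map of metadata kinds. This is a hypothetical C++ sketch, not LLVM's implementation: `MetadataMap` and `dropUnknownNonDebug` are made-up names, and the `"dbg"` entry only stands in for the debug location, which LLVM keeps separately and which `dropUnknownNonDebugMetadata` does not remove.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <set>
#include <string>

// Hypothetical model: metadata kind name -> attached node (as text).
using MetadataMap = std::map<std::string, std::string>;

// Removes every kind that is neither the debug location ("dbg") nor listed
// in KnownKinds, mirroring the keep-list passed in the hunk above.
void dropUnknownNonDebug(MetadataMap &MD,
                         std::initializer_list<std::string> KnownKinds) {
  std::set<std::string> Keep(KnownKinds);
  Keep.insert("dbg");
  for (auto It = MD.begin(); It != MD.end();) {
    if (Keep.count(It->first))
      ++It;
    else
      It = MD.erase(It); // erase returns the next iterator
  }
}
```

With the keep-list from the diff, a speculatively hoisted vtable load keeps !invariant.group but loses !dereferenceable; sanjoy's inline comment on this hunk questions whether keeping those kinds is valid when they are control dependent.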

test/Transforms/LICM/hoist-invariant-group-load.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -licm -disable-basicaa -S < %s | FileCheck %s

%struct.A = type { i32 (...)** }

; CHECK-LABEL: @hoist(
define void @hoist(%struct.A* dereferenceable(8) %arg) {
entry:
  %call1 = tail call i32 @bar()
  %tobool2 = icmp eq i32 %call1, 0
  br i1 %tobool2, label %while.end, label %while.body.lr.ph

while.body.lr.ph: ; preds = %entry
; CHECK: [[B:%.*]] = bitcast %struct.A* [[ARG:%.*]] to void (%struct.A*)***
; CHECK-NEXT: [[VTABLE:%.*]] = load void (%struct.A*)**, void (%struct.A*)*** [[B]], align 8, !invariant.group
; CHECK-NEXT: br label [[WHILE_BODY:%.*]]
  %b = bitcast %struct.A* %arg to void (%struct.A*)***
  br label %while.body

while.body: ; preds = %while.body, %while.body.lr.ph
; CHECK: while.body:
; CHECK-NEXT: [[TMP:%.*]] = load void (%struct.A*)*, void (%struct.A*)** [[VTABLE]], align 8, !invariant.load
  %vtable = load void (%struct.A*)**, void (%struct.A*)*** %b, align 8, !dereferenceable !0, !invariant.group !1
  %tmp = load void (%struct.A*)*, void (%struct.A*)** %vtable, align 8, !invariant.load !1
  tail call void %tmp(%struct.A* %arg)
  %call = tail call i32 @bar()
  %tobool = icmp eq i32 %call, 0
  br i1 %tobool, label %while.end.loopexit, label %while.body

while.end.loopexit: ; preds = %while.body
  br label %while.end

while.end: ; preds = %while.end.loopexit, %entry
  ret void
}

; CHECK-LABEL: @hoist2(
define void @hoist2(i8** dereferenceable(8) %arg) {
entry:
  %call1 = tail call i32 @bar()
  %tobool2 = icmp eq i32 %call1, 0
  br i1 %tobool2, label %while.end, label %while.body.lr.ph

while.body.lr.ph: ; preds = %entry
; CHECK: while.body.lr.ph:
; CHECK-NEXT: [[X:%.*]] = load i8*, i8** [[ARG:%.*]], align 8, !invariant.group
; CHECK-NEXT: br label [[WHILE_BODY:%.*]]

  br label %while.body
; CHECK: while.body:
while.body: ; preds = %while.body, %while.body.lr.ph
  %x = load i8*, i8** %arg, align 8, !invariant.group !1
  call void @foo(i8* %x)
  %call = tail call i32 @bar()
  %tobool = icmp eq i32 %call, 0
  br i1 %tobool, label %while.end.loopexit, label %while.body

while.end.loopexit: ; preds = %while.body
  br label %while.end

while.end: ; preds = %while.end.loopexit, %entry
  ret void
}

declare void @foo(i8*)

declare i32 @bar()

; CHECK-LABEL: @dontHoist(
define void @dontHoist(%struct.A** %a) {
entry:
  %call4 = tail call i32 @bar()
  %cmp5 = icmp sgt i32 %call4, 0
  br i1 %cmp5, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader: ; preds = %entry
  br label %for.body

for.cond.cleanup.loopexit: ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
  ret void
; CHECK: for.body:
for.body:
; CHECK: [[VTABLE:%.*]] = load void (%struct.A*)**, void (%struct.A*)*** {{.*}}, align 8, !dereferenceable !{{.*}}, !invariant.group
; CHECK-NEXT: [[TMP2:%.*]] = load void (%struct.A*)*, void (%struct.A*)** [[VTABLE]], align 8, !invariant.load
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ]
  %arrayidx = getelementptr inbounds %struct.A*, %struct.A** %a, i64 %indvars.iv
  %tmp = load %struct.A*, %struct.A** %arrayidx, align 8
  %tmp1 = bitcast %struct.A* %tmp to void (%struct.A*)***
  %vtable = load void (%struct.A*)**, void (%struct.A*)*** %tmp1, align 8, !dereferenceable !0, !invariant.group !1
  %tmp2 = load void (%struct.A*)*, void (%struct.A*)** %vtable, align 8, !invariant.load !1
  tail call void %tmp2(%struct.A* %tmp)
  %indvars.iv.next = add nuw i64 %indvars.iv, 1
  %call = tail call i32 @bar()
  %tmp3 = sext i32 %call to i64
  %cmp = icmp slt i64 %indvars.iv.next, %tmp3
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

!0 = !{i64 8}
!1 = !{}