This is an archive of the discontinued LLVM Phabricator instance.


Differential D45151

[LICM] Hoisting invariant.group loads
Needs Review, Public

Authored by Prazek on Apr 1 2018, 2:05 PM.


Details

Reviewers

• dberlin
amharc
junbuml
hfinkel
rsmith
sanjoy
kuhar
reames

Summary

This patch introduces hoisting of !invariant.group loads that
are guaranteed to execute whenever the loop is executed. This way
we never drop any metadata. This is important because we do not know
yet whether it is profitable to hoist !invariant.group loads speculatively,
which would require stripping the metadata.
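The rule can be sketched as a small self-contained model. `LoadFacts` and its fields are hypothetical stand-ins for what LICM actually derives from LoopSafetyInfo and alias analysis, and `canHoistLoad` is a simplified slice of the load case in `canSinkOrHoistInst`. The real code also checks constant memory, !invariant.load, atomics and invariant.start, and hoisting additionally requires loop-invariant operands and speculation safety.

```cpp
#include <cassert>

// Hypothetical summary of what LICM knows about one load in the loop.
struct LoadFacts {
  bool hasInvariantGroup;    // carries !invariant.group metadata
  bool guaranteedToExecute;  // executes whenever the loop is entered
  bool mayAliasStoreInLoop;  // something in the loop may write the location
};

// Metadata stays valid in the preheader only if the load would have run
// anyway once the loop was entered.
bool canKeepMetadata(const LoadFacts &L) { return L.guaranteedToExecute; }

// Mirrors the helper added by the patch: only invariant.group loads whose
// metadata survives hoisting are treated as invariant.
bool isUnconditionalInvariantGroupLoad(const LoadFacts &L) {
  if (!L.hasInvariantGroup)
    return false;
  return canKeepMetadata(L);
}

// Simplified slice of the load case: an unconditional invariant.group load
// is hoistable even if a store in the loop may alias it; any other load
// still needs the usual "no may-aliased stores in the loop" check.
bool canHoistLoad(const LoadFacts &L) {
  if (isUnconditionalInvariantGroupLoad(L))
    return true;
  return !L.mayAliasStoreInLoop;
}

// What hoist() does with the metadata of a load it has decided to move.
bool hoistedLoadKeepsMetadata(const LoadFacts &L) {
  assert(canHoistLoad(L) && "only meaningful for loads LICM will hoist");
  return canKeepMetadata(L);
}
```

In this model, a vptr load that runs on every iteration is hoistable even though something in the loop may write to its location, and it keeps its !invariant.group; the same load behind a condition stays in the loop.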

Diff Detail

Repository

rL LLVM

Build Status

Buildable 16619
Build 16619: arc lint + arc unit

Event Timeline

Prazek created this revision. Apr 1 2018, 2:05 PM

Prazek added a parent revision: D45150: Less conservative LoopSafetyInfo for headers. Apr 1 2018, 2:05 PM

kuhar accepted this revision. Apr 3 2018, 8:26 PM

kuhar added inline comments.

lib/Transforms/Scalar/LICM.cpp
Line 522: nit: I'd remove the second 'only'

This revision is now accepted and ready to land. Apr 3 2018, 8:26 PM

Prazek added a reviewer: junbuml. Apr 30 2018, 4:31 PM

Prazek added a reviewer: reames. May 3 2018, 4:20 AM

Can someone with experience in Loop passes take a look at this?

You say that you don't know yet whether this is profitable. Do you have reason to believe that it might not be profitable (e.g., some example where it seems like we might want to constrain the behavior)? We almost never favor keeping metadata over doing other transformations, so I think it's worth being explicit about our reasoning here. Just saying "I don't know yet" is probably not sufficient, as we could say that about nearly all metadata, and in that case we should change the default logic.

lib/Transforms/Scalar/LICM.cpp
Line 1052: This now sounds fairly tautological. How about saying, "Except when we can prove the metadata independent of any such conditions, strip it." (instead of "Conservatively strip all metadata on the instruction unless we can keep it.")
Line 1055: isGuaranteedToExecute -> canKeepMetadata

Replying to @hfinkel (D45151#1113727):

The reasoning is as follows. Given the function:

void foo(A *a) {
  other(a);
  while (...) {
    something();
    a->bar();
  }
}

we could hoist the vtable and virtual function loads out of the loop, but that would strip the metadata.
This could be very bad once the function is inlined into a context where the dynamic type is known through invariant.group:

void caller() {
  A a;
  foo(&a);
}

Now, because we stripped the metadata, and because the call to "other" prevents us from constant-propagating the vptr value, we would not be able to resolve the target of a->bar().
If we did not hoist the vptr load, then another store or load with !invariant.group on the same pointer (for example, the vptr store emitted by A's constructor under -fstrict-vtable-pointers) would let us constant-propagate the value into the load inside the loop.

I think that instead of always hoisting the vtable load, we should unroll the loop once, so that the vptr value can be propagated from the first iteration. This way we would still keep the metadata.

I have not yet collected benchmark data with and without this patch, but looking at one LNT microbenchmark that had a speedup of around 70%, I am fairly confident I could write a very similar benchmark, based on the example above, showing that stripping the metadata while hoisting the vtable load would not be as beneficial.

s/loop unrolling/loop peeling/
I will have to look at this at some point, but from what I've heard, loop peeling in LLVM is not very sophisticated, and implementing the logic that I propose might be somewhat hard.
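Peeling the first iteration can be illustrated at the source level. This is a hand-written sketch, not output of LLVM's loop peeling utility; the class `A`, its call counter and the trip-count loop are hypothetical stand-ins for the `foo` example above.

```cpp
#include <cassert>

// Hypothetical stand-in for the class A in the example above.
struct A {
  int calls = 0;
  virtual ~A() = default;
  virtual void bar() { ++calls; }
};

// Original shape: at the IR level every iteration performs the virtual
// call (vptr load, vtable load, indirect call) inside the loop.
void runOriginal(A *a, int tripCount) {
  int i = 0;
  while (i < tripCount) {
    a->bar();
    ++i;
  }
}

// Peeled shape: the first iteration is copied in front of the loop. The
// copy dominates the remaining iterations, so a vptr value loaded there is
// available to the loop without hoisting (and stripping) the load itself.
void runPeeled(A *a, int tripCount) {
  int i = 0;
  if (i < tripCount) {
    a->bar();  // peeled first iteration
    ++i;
    while (i < tripCount) {  // remaining iterations
      a->bar();
      ++i;
    }
  }
}

// Peeling must not change behavior: both shapes make the same calls.
int countCalls(void (*run)(A *, int), int tripCount) {
  assert(tripCount >= 0 && "trip count must be non-negative");
  A a;
  run(&a, tripCount);
  return a.calls;
}
```

Both shapes evaluate the loop condition the same number of times and make the same calls; only the position of the first iteration changes, which is what keeps the metadata-carrying load in place.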

Fixed comments

@hfinkel, is my response reasonable?

Harbormaster completed remote builds in B19286: Diff 151158. Jun 13 2018, 7:21 AM

Herald added a subscriber: hiraditya. Jun 13 2018, 7:21 AM

friendly ping

Prazek added a reviewer: rsmith. Jul 16 2018, 9:36 PM

Prazek removed a parent revision: D45150: Less conservative LoopSafetyInfo for headers. Jul 19 2018, 2:56 PM

Can we spec !invariant.group in a way that lets us always keep the metadata when hoisting? Right now it isn't clear what happens if its contract is violated, i.e. what the behavior of this program is:

store i32 0, i32* %ptr, !invariant.group !0
%x = load i32, i32* %ptr, !invariant.group !0
store i32 1, i32* %ptr, !invariant.group !0
%y = load i32, i32* %ptr, !invariant.group !0

though it seems like you're assuming the *load* has UB? Can we instead say that the second store has UB? That way we should be able to hoist the load instruction without dropping the metadata.

llvm/lib/Transforms/Scalar/LICM.cpp
Line 542 (on Diff #151158): lose
Line 1075 (on Diff #151158): Not sure what this replacement adds -- do you plan to add more stuff to `canKeepMetadata` in the future?

Prazek added inline comments. Aug 13 2018, 4:00 PM

llvm/lib/Transforms/Scalar/LICM.cpp
Line 1075 (on Diff #151158): It is a good way to document the code: this way I do not need to comment both calls to isGuaranteedToExecute explaining the logic behind them. It is also more future-proof: if someone changes the logic of canKeepMetadata, all callers are affected, which would not be the case if someone changed only one call site.

Replying to @sanjoy (D45151#1197581):

I talked with Sanjoy offline, but for the record, consider:

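entry:
  %ptr = alloca i32
  br i1 false, label %if.then, label %if.end
if.then:
  %val = load i32, i32* %ptr, !invariant.group !0
  br label %if.end
if.end:
  store i32 1, i32* %ptr, !invariant.group !0
  %x = load i32, i32* %ptr, !invariant.group !0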

I think the current LangRef says that a store with !invariant.group must store the same value (so it could also be removed if there is a dominating store or load with !invariant.group).
But even without that rule, if we kept the !invariant.group metadata on the hoisted first load, we could forward undef from it to the second load in this example.
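The hazard can be reproduced in a toy model of how an optimizer uses !invariant.group to forward values. This is a deliberate simplification, not LLVM code: `Access` and `assumedValue` are hypothetical, the model covers one pointer and one group in straight-line code, and `std::nullopt` stands in for undef. It compares the original program, where the guarded load never runs, with the two hoisted variants.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model: a straight-line sequence of accesses to the same pointer.
// std::nullopt models an undef value, e.g. a load from a freshly
// allocated, never-written slot.
struct Access {
  bool hasInvariantGroup;
  std::optional<int> value;  // stored value, or value the load really reads
};

// Value an optimizer may assume for the load at `idx`: with
// !invariant.group, all accesses to the pointer in the group are promised
// to see the same value, so the earliest such access can be forwarded.
// Without the metadata, the load just reads what is in memory.
std::optional<int> assumedValue(const std::vector<Access> &seq,
                                std::size_t idx) {
  assert(idx < seq.size() && "index out of range");
  if (seq[idx].hasInvariantGroup) {
    for (std::size_t i = 0; i < idx; ++i)
      if (seq[i].hasInvariantGroup)
        return seq[i].value;
  }
  return seq[idx].value;
}
```

Keeping the metadata on the hoisted load makes the final load's assumed value undef even though it really reads 1, while stripping the metadata from the hoisted load restores the correct value.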

Sanjoy suggested that such a load could return poison instead of undef. I am not familiar with that, and I would expect that getting such a change in would take about as long as getting this simple change (posted on April 1st) in,
so if there are no objections, I would like to commit this one to trunk.

@hfinkel, you had some objections. Can you take a look again?

I'm sorry I can't be more decisive here since I wasn't deeply involved with the devirtualization work early on and so lack a lot of context. The general direction here seems fine to me -- given that we use metadata to express a large range of things, it seems ok to have metadata specific MD dropping policies. However, given that Hal originally objected to this, we should make sure he is on board.

Replying to @sanjoy (D45151#1199948):

I understand the motivation, thanks. The problem here is that, while we might want to avoid hoisting so that we don't strip the metadata, that is only better if we happen to inline into a function that then provides a concrete type. Otherwise, we should have hoisted. How about this: add a parameter to LICM to control this choice. We can choose not to hoist during the early runs of LICM (which happen during inlining), and then hoist later (during the LICM invocation that runs after loop unrolling).
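The suggested knob can be sketched as follows, assuming a hypothetical `LICMOptions::AllowStrippingInvariantGroup` flag (no such option exists in LLVM; all names here are illustrative). The sketch only covers !invariant.group loads that LICM could otherwise legally hoist, and decides whether to hoist them and what happens to the metadata.

```cpp
#include <cassert>

// Hypothetical LICM knob: false for the early runs (during inlining),
// true for the run after loop unrolling.
struct LICMOptions {
  bool AllowStrippingInvariantGroup;
};

// Hypothetical facts about a load that is otherwise legal to hoist.
struct LoadInfo {
  bool hasInvariantGroup;
  bool guaranteedToExecute;
};

enum class Decision { LeaveInLoop, HoistKeepMetadata, HoistStripMetadata };

Decision decideInvariantGroupLoad(const LoadInfo &L, const LICMOptions &Opts) {
  assert(L.hasInvariantGroup && "only models !invariant.group loads");
  if (L.guaranteedToExecute)
    return Decision::HoistKeepMetadata;   // hoisting loses nothing
  if (Opts.AllowStrippingInvariantGroup)
    return Decision::HoistStripMetadata;  // late run: inlining is done
  return Decision::LeaveInLoop;           // early run: keep the metadata
}
```

Under this scheme, only conditionally executed loads behave differently between the two runs; loads guaranteed to execute are hoisted with their metadata either way.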

Replying to @hfinkel (D45151#1200094):

Sorry for the late response; I didn't have time to look into this.
I thought about it and would like to stick with the current implementation. I think that having LICM behave differently in different runs introduces complexity that is not worth the cost.
We would also need to introduce new named passes, like licm-pre-inline and licm-after-inline. It is also not clear what we should do after inlining when doing LTO.

I would stick with this approach, as it does not pessimize the current implementation -- in some cases we can now hoist vtable loads that we could not before, and all of the vtable loads that were hoisted before are still hoisted.

kuhar resigned from this revision. Sep 30 2019, 9:48 AM

This revision now requires review to proceed. Sep 30 2019, 9:48 AM

Herald added a project: Restricted Project. Sep 30 2019, 9:48 AM

Herald added a subscriber: asbirlea.

reames resigned from this revision. Mar 25 2020, 1:53 PM

asbirlea removed a subscriber: asbirlea. Mar 25 2020, 2:29 PM

Prazek mentioned this in D99784: [LICM] Hoist loads with invariant.group metadata. May 11 2021, 6:14 AM

sanjoy resigned from this revision. Jan 29 2022, 5:31 PM

Revision Contents

Path                                                  Size
lib/Transforms/Scalar/LICM.cpp                        31 lines
test/Transforms/LICM/hoist-invariant-group-load.ll    128 lines

Diff 140595

lib/Transforms/Scalar/LICM.cpp

@@ (498 lines not shown) @@ if (!inSubLoop(BB, CurLoop, LI))
               CurLoop->getLoopPreheader()->getTerminator()))
         Changed |= hoist(I, DT, CurLoop, SafetyInfo, ORE);
     }
   }
 
   return Changed;
 }
 
+static bool canKeepMetadata(const Instruction &I, const DominatorTree *DT,
+                            const Loop *CurLoop,
+                            const LoopSafetyInfo *SafetyInfo) {
+  // The metadata is valid in the loop preheader if we are guaranteed to
+  // execute I if we entered the loop.
+  return isGuaranteedToExecute(I, DT, CurLoop, SafetyInfo);
+}
+
+static bool
+isUnconditionalInvariantGroupLoad(LoadInst *LI, const DominatorTree *DT,
+                                  const Loop *CurLoop,
+                                  const LoopSafetyInfo *SafetyInfo) {
+  if (!LI->getMetadata(LLVMContext::MD_invariant_group))
+    return false;
+
+  // For now we only want to hoist invariant.group loads only if we can keep
+  // the metadata. This is because we don't know yet if it's better to hoist it
+  // and loose metadata, or to keep the metadata counting that we will be able
+  // to merge this load with another outside the loop.
+  return canKeepMetadata(*LI, DT, CurLoop, SafetyInfo);
+}
+
 // Return true if LI is invariant within scope of the loop. LI is invariant if
 // CurLoop is dominated by an invariant.start representing the same memory
 // location and size as the memory location LI loads from, and also the
 // invariant.start has no uses.
 static bool isLoadInvariantInLoop(LoadInst *LI, DominatorTree *DT,
                                   Loop *CurLoop) {
   Value *Addr = LI->getOperand(0);
   const DataLayout &DL = LI->getModule()->getDataLayout();
@@ (59 lines not shown) @@ if (LoadInst *LI = dyn_cast<LoadInst>(&I)) {
     if (AA->pointsToConstantMemory(LI->getOperand(0)))
       return true;
     if (LI->getMetadata(LLVMContext::MD_invariant_load))
       return true;
 
     if (LI->isAtomic() && SinkingToLoopBody)
       return false; // Don't sink unordered atomic loads to loop body.
 
+    if (isUnconditionalInvariantGroupLoad(LI, DT, CurLoop, SafetyInfo))
+      return true;
+
     // This checks for an invariant.start dominating the load.
     if (isLoadInvariantInLoop(LI, DT, CurLoop))
       return true;
 
     // Don't hoist loads which have may-aliased stores in loop.
     uint64_t Size = 0;
     if (LI->getType()->isSized())
       Size = I.getModule()->getDataLayout().getTypeStoreSize(LI->getType());
@@ (429 lines not shown) @@ static bool hoist(Instruction &I, const DominatorTree *DT, const Loop *CurLoop,
   DEBUG(dbgs() << "LICM hoisting to " << Preheader->getName() << ": " << I
                << "\n");
   ORE->emit([&]() {
     return OptimizationRemark(DEBUG_TYPE, "Hoisted", &I) << "hoisting "
                                                          << ore::NV("Inst", &I);
   });
 
   // Metadata can be dependent on conditions we are hoisting above.
-  // Conservatively strip all metadata on the instruction unless we were
-  // guaranteed to execute I if we entered the loop, in which case the metadata
-  // is valid in the loop preheader.
+  // Conservatively strip all metadata on the instruction unless we can keep it.
   if (I.hasMetadataOtherThanDebugLoc() &&
       // The check on hasMetadataOtherThanDebugLoc is to prevent us from burning
       // time in isGuaranteedToExecute if we don't actually have anything to
       // drop. It is a compile time optimization, not required for correctness.
-      !isGuaranteedToExecute(I, DT, CurLoop, SafetyInfo))
+      !canKeepMetadata(I, DT, CurLoop, SafetyInfo))
     I.dropUnknownNonDebugMetadata();
 
   // Move the new node to the Preheader, before its terminator.
   I.moveBefore(Preheader->getTerminator());
 
   // Do not retain debug locations when we are moving instructions to different
   // basic blocks, because we want to avoid jumpy line tables. Calls, however,
   // need to retain their debug locs because they may be inlined.
@@ (remaining 504 lines not shown) @@

test/Transforms/LICM/hoist-invariant-group-load.ll

This file was added.

; RUN: opt -licm -disable-basicaa -S < %s | FileCheck %s

%struct.A = type { i32 (...)** }

; CHECK-LABEL: @hoist(
define void @hoist(%struct.A* %arg) {
entry:
  br i1 undef, label %while.end, label %while.body.lr.ph

; CHECK: while.body.lr.ph:
while.body.lr.ph: ; preds = %entry
; CHECK: [[VTABLE:%.*]] = load void (%struct.A*)**, void (%struct.A*)*** [[B:%.*]], align 8, !invariant.group
; CHECK-NEXT: [[TMP:%.*]] = load void (%struct.A*)*, void (%struct.A*)** [[VTABLE]], align 8, !invariant.load
; CHECK-NEXT: br label [[WHILE_BODY:%.*]]
  %b = bitcast %struct.A* %arg to void (%struct.A*)***
  br label %while.body

while.body: ; preds = %while.body, %while.body.lr.ph
; CHECK: while.body:

  %vtable = load void (%struct.A*)**, void (%struct.A*)*** %b, align 8, !invariant.group !1
  %tmp = load void (%struct.A*)*, void (%struct.A*)** %vtable, align 8, !invariant.load !1
  tail call void %tmp(%struct.A* %arg)
  %call = tail call i32 @bar()
  %tobool = icmp eq i32 %call, 0
  br i1 %tobool, label %while.end.loopexit, label %while.body

while.end.loopexit: ; preds = %while.body
  br label %while.end

while.end: ; preds = %while.end.loopexit, %entry
  ret void
}

; CHECK-LABEL: @hoist2(
define void @hoist2(i8** %arg) {
entry:
  %call1 = tail call i32 @bar()
  %tobool2 = icmp eq i32 %call1, 0
  br i1 %tobool2, label %while.end, label %while.body.lr.ph

while.body.lr.ph: ; preds = %entry
; CHECK: while.body.lr.ph:
; CHECK-NEXT: [[X:%.*]] = load i8*, i8** [[ARG:%.*]], align 8, !invariant.group
; CHECK-NEXT: br label [[WHILE_BODY:%.*]]
  br label %while.body

; CHECK: while.body:
while.body: ; preds = %while.body, %while.body.lr.ph
  %x = load i8*, i8** %arg, align 8, !invariant.group !1
  call void @foo(i8* %x)
  %call = tail call i32 @bar()
  %tobool = icmp eq i32 %call, 0
  br i1 %tobool, label %while.end.loopexit, label %while.body

while.end.loopexit: ; preds = %while.body
  br label %while.end

while.end: ; preds = %while.end.loopexit, %entry
  ret void
}

declare void @foo(i8*)

declare i32 @bar()

; CHECK-LABEL: @dontHoist(
define void @dontHoist(%struct.A** %a) {

entry:
  %call4 = tail call i32 @bar()
  %cmp5 = icmp sgt i32 %call4, 0
  br i1 %cmp5, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader: ; preds = %entry
  br label %for.body

for.cond.cleanup.loopexit: ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
  ret void

; CHECK: for.body:
for.body:
; CHECK: [[VTABLE:%.*]] = load void (%struct.A*)**, void (%struct.A*)*** {{.*}}, align 8, !dereferenceable !{{.*}}, !invariant.group
; CHECK-NEXT: [[TMP2:%.*]] = load void (%struct.A*)*, void (%struct.A*)** [[VTABLE]], align 8, !invariant.load
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ]
  %arrayidx = getelementptr inbounds %struct.A*, %struct.A** %a, i64 %indvars.iv
  %tmp = load %struct.A*, %struct.A** %arrayidx, align 8
  %tmp1 = bitcast %struct.A* %tmp to void (%struct.A*)***
  %vtable = load void (%struct.A*)**, void (%struct.A*)*** %tmp1, align 8, !dereferenceable !0, !invariant.group !1
  %tmp2 = load void (%struct.A*)*, void (%struct.A*)** %vtable, align 8, !invariant.load !1
  tail call void %tmp2(%struct.A* %tmp)
  %indvars.iv.next = add nuw i64 %indvars.iv, 1
  %call = tail call i32 @bar()
  %tmp3 = sext i32 %call to i64
  %cmp = icmp slt i64 %indvars.iv.next, %tmp3
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; CHECK-LABEL: @donthoist2(
define void @donthoist2(i8** dereferenceable(8) %arg) {
entry:
  br i1 undef, label %while.end, label %while.body.lr.ph

while.body.lr.ph: ; preds = %entry
  br label %while.body

; CHECK: while.body:
while.body: ; preds = %while.body, %while.body.lr.ph
; CHECK: [[X:%.*]] = load i8*, i8** [[ARG:%.*]], align 8, !invariant.group
  %call = tail call i32 @bar()
  %x = load i8*, i8** %arg, align 8, !invariant.group !1
  call void @foo(i8* %x)

  %tobool = icmp eq i32 %call, 0
  br i1 %tobool, label %while.end.loopexit, label %while.body

while.end.loopexit: ; preds = %while.body
  br label %while.end

while.end: ; preds = %while.end.loopexit, %entry
  ret void
}

!0 = !{i64 8}
!1 = !{}