Details

Reviewers

fhahn
efriedma
dmgreen
rupprecht

Commits

rG5344d8e10bb7: [CodeGenPrepare] Estimate liveness of loop invariants when checking for address…

Summary

When checking the profitability of folding an address computation
into a memory instruction, the compiler tries to determine the liveness
of the values that make up the address at the point of the memory instruction.
This patch improves the live variable estimates by including
the loop invariants that are referenced in the loop body.
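To make the summary concrete, here is a minimal, self-contained sketch of the liveness estimate before and after the patch. It is a toy model and not LLVM code: `ToyValue`, `ToyLoop`, `liveBefore` and `liveAfter` are hypothetical names, a loop is reduced to a set of block names, a value counts as loop invariant when it is not defined inside the loop, and the limit on scanned users is left out.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins for IR concepts; none of these are LLVM types.
struct ToyValue {
  std::string DefBlock;               // defining block ("" for a function argument)
  std::vector<std::string> UseBlocks; // blocks that contain a user of the value
};

// A loop is modelled as the set of names of the blocks it contains.
using ToyLoop = std::set<std::string>;

// Assumption: a value is invariant if it is not defined inside the loop.
bool isLoopInvariant(const ToyValue &V, const ToyLoop &L) {
  return L.count(V.DefBlock) == 0;
}

bool isUsedInBlock(const ToyValue &V, const std::string &BB) {
  for (const std::string &B : V.UseBlocks)
    if (B == BB)
      return true;
  return false;
}

bool isUsedInLoop(const ToyValue &V, const ToyLoop &L) {
  for (const std::string &B : V.UseBlocks)
    if (L.count(B) != 0)
      return true;
  return false;
}

// Estimate before the patch: only a use in the memory instruction's own
// block makes the value count as live.
bool liveBefore(const ToyValue &V, const std::string &MemBB) {
  return isUsedInBlock(V, MemBB);
}

// Estimate with the patch: a loop invariant used anywhere in the loop that
// contains the memory instruction also counts as live.
bool liveAfter(const ToyValue &V, const std::string &MemBB, const ToyLoop &L) {
  if (L.count(MemBB) != 0 && isLoopInvariant(V, L) && isUsedInLoop(V, L))
    return true;
  return isUsedInBlock(V, MemBB);
}
```

In the test added by this revision, `%a` is a function argument used in the `cond` and `if.else` blocks, while the load sits in the `loop` block. Modelling that as `ToyValue{"", {"cond", "if.else"}}` gives "not live" under `liveBefore` and "live" under `liveAfter`, which matches the address being sunk next to the load in the CHECK lines.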

Diff Detail

Event Timeline

chill created this revision. · Feb 13 2023, 2:50 AM

Herald added a project: Restricted Project. · Feb 13 2023, 2:50 AM

Herald added a subscriber: hiraditya.

chill requested review of this revision. · Feb 13 2023, 2:50 AM

Herald added a project: Restricted Project. · Feb 13 2023, 2:50 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B213379: Diff 496896. · Feb 13 2023, 2:51 AM

chill added a parent revision: D143896: [NFC][CodeGenPrepare] Match against the correct instruction when checking profitability of folding an address. · Feb 13 2023, 2:57 AM

chill added a child revision: D143898: [CodeGenPrepare] Relax conditions for folding addressing mode into loads/stores.

chill retitled this revision from [CodeGenPrepare] Loop invariant liveness to [CodeGenPrepare] Estimate liveness of loop invariants when checking for address folding profitability. · Feb 13 2023, 6:52 AM

chill edited the summary of this revision.

chill added reviewers: fhahn, efriedma. · Feb 13 2023, 8:37 AM

Herald added a subscriber: StephenFan. · Feb 13 2023, 8:37 AM

chill updated this revision to Diff 503841. · Mar 9 2023, 10:35 AM

Harbormaster completed remote builds in B218442: Diff 503841. · Mar 9 2023, 12:16 PM

Ping?

Changes: make the limit on loop invariant users to scan a parameter

Harbormaster completed remote builds in B222543: Diff 509399. · Mar 29 2023, 9:33 AM

Ping?

dmgreen added a subscriber: dmgreen. · Apr 19 2023, 3:51 AM

dmgreen added inline comments.

llvm/lib/CodeGen/CodeGenPrepare.cpp
274	-> loop invariant
5124	"If the value is loop invariant"

chill updated this revision to Diff 515008. · Apr 19 2023, 10:06 AM

chill marked 2 inline comments as done.

Harbormaster completed remote builds in B226640: Diff 515008. · Apr 19 2023, 10:06 AM

mingmingl added a subscriber: mingmingl. · Apr 19 2023, 10:22 AM

I ran some tests and this showed some nice improvements. LGTM, thanks.

This revision is now accepted and ready to land. · Apr 20 2023, 5:54 AM

This revision was landed with ongoing or failed builds. · Apr 24 2023, 2:26 AM

Closed by commit rG5344d8e10bb7: [CodeGenPrepare] Estimate liveness of loop invariants when checking for address… (authored by chill).

This revision was automatically updated to reflect the committed changes.

chill added a commit: rG5344d8e10bb7: [CodeGenPrepare] Estimate liveness of loop invariants when checking for address….

This discovers unstable behavior.
For me, MCA/ResourceManager.o sometimes differs.
Reproduced by stage1 clang.

Something sensitive might be in LoopInfo.

llvm/lib/CodeGen/CodeGenPrepare.cpp
5127	L is unstable.

In D143897#4301532, @chapuni wrote:

This discovers unstable behavior.
For me, MCA/ResourceManager.o sometimes differs.
Reproduced by stage1 clang.

Same here, I also bisected to this commit. I found it separately via CodeGen/PrologEpilogInserter.o which differs ~10-20% of the time. The minimal set of flags to reproduce it with (for me at least) is -O3 -fno-exceptions.

rupprecht added a reverting change: rGfbf42f1fe2b5: Revert "[CodeGenPrepare] Estimate liveness of loop invariants when checking for…. · Apr 27 2023, 7:16 PM

Thank you, I'll have a look.

This revision is now accepted and ready to land. · Apr 28 2023, 1:43 AM

chill planned changes to this revision. · Apr 28 2023, 1:43 AM

@chill, are you able to reproduce this?

I was reducing this last night so that I could give you a .ll test case that exhibits this behavior, but that was derailed by something and I have to start over. If you can already reproduce it yourself, I'll just stop the process.

In D143897#4305401, @rupprecht wrote:

@chill, are you able to reproduce this?

I was reducing this last night so that I could give you a .ll test case that exhibits this behavior, but that was derailed by something and I have to start over. If you can already reproduce it yourself, I'll just stop the process.

No, I don't have anything (I've not done anything for this problem either, I'm busy elsewhere).

Any input/data would be greatly appreciated!

I'm not making very good progress with a test case. So far the only interesting insights I may have:

This only reproduces in optimized builds. Setting -UNDEBUG makes the problem go away. Maybe debug builds are normalizing something?
The problem obviously goes away with -mllvm -cgp-max-loop-inv-users-to-scan=0. One of my guesses was that V->uses() ordering might be non-deterministic, so you could get different answers in isUsedInLoop if the loop is terminated before you visit the instruction that would cause you to return true. That is not the case: the problem still exists with -mllvm -cgp-max-loop-inv-users-to-scan=10000.

Otherwise, if I do the usual process of getting clang to dump the IR and pipe that into opt, or various things like that, the process is deterministic. If you have any guess for what might get a reproducer, I can give it a shot.
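The guess about use ordering discussed above can be illustrated with a small standalone model of the bounded scan. This is only a sketch: `isUsedInLoopBounded` is a hypothetical name, and each user is reduced to a flag saying whether it sits inside the loop. The control flow mirrors `isUsedInLoop` from the patch: stop after `MaxUsersToScan` users and report "not used" if no in-loop user was seen by then.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each element says whether the corresponding user is inside the loop.
// The scan visits users in the order given and gives up after
// MaxUsersToScan of them, reporting "not used in the loop" in that case.
bool isUsedInLoopBounded(const std::vector<bool> &UserInLoop,
                         std::size_t MaxUsersToScan) {
  std::size_t N = 0;
  for (bool InLoop : UserInLoop) {
    if (++N > MaxUsersToScan)
      break; // limit reached before an in-loop user was found
    if (InLoop)
      return true;
  }
  return false;
}
```

With a limit of 2, the order `{true, false, false}` yields true while `{false, false, true}` yields false, so a small limit makes the answer depend on the order of the use list. A limit of 0 always yields false, which effectively disables the new check, and a limit larger than the number of users removes the order dependence. That is consistent with the observation above that raising the limit to 10000 did not make the problem disappear, so use ordering alone does not explain it.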

All the operations involved here should be deterministic. You're probably looking for a bug elsewhere. One possibility is that LoopInfo doesn't contain the correct information... for example, the CFG changes, but that change isn't recorded in LoopInfo.

In D143897#4310910, @efriedma wrote:

All the operations involved here should be deterministic. You're probably looking for a bug elsewhere. One possibility is that LoopInfo doesn't contain the correct information... for example, the CFG changes, but that change isn't recorded in LoopInfo.

Ack. To clarify, I'm fairly confident this patch _causes_ non-determinism, as I can bisect to this commit and reproduce it, and not at the commit prior to it, etc. But of course this might be doing a valid, deterministic transformation that triggers an existing non-determinism bug later on, or something like that.
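One way an outdated analysis could give answers that vary between runs is sketched below. This is a guess at a mechanism that fits the suggestion above, not something confirmed in this thread. Everything here is a toy: `SlotAllocator`, `ToyLoopInfo` and `loopReportedForNewBlock` are hypothetical names, and integer slots stand in for block addresses.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy allocator: hands out integer slots and reuses freed slots first,
// standing in for heap addresses being recycled.
class SlotAllocator {
  std::vector<int> Free;
  int Next = 0;

public:
  int allocate() {
    if (!Free.empty()) {
      int Slot = Free.back();
      Free.pop_back();
      return Slot;
    }
    return Next++;
  }
  void release(int Slot) { Free.push_back(Slot); }
};

// Toy analysis cache: maps a block's slot to a loop id; -1 means "no loop".
class ToyLoopInfo {
  std::map<int, int> LoopOf;

public:
  void record(int Slot, int LoopId) { LoopOf[Slot] = LoopId; }
  void forget(int Slot) { LoopOf.erase(Slot); }
  int getLoopFor(int Slot) const {
    auto It = LoopOf.find(Slot);
    return It == LoopOf.end() ? -1 : It->second;
  }
};

// Erases a block that was in loop 0, then creates a new block that is in no
// loop, and returns the loop the cache reports for the new block.
int loopReportedForNewBlock(bool UpdateAnalysis) {
  SlotAllocator Alloc;
  ToyLoopInfo LI;

  int OldBlock = Alloc.allocate();
  LI.record(OldBlock, /*LoopId=*/0);

  Alloc.release(OldBlock); // the CFG changes: the block is erased
  if (UpdateAnalysis)
    LI.forget(OldBlock);   // the change is recorded in the analysis

  int NewBlock = Alloc.allocate(); // reuses the freed slot
  return LI.getLoopFor(NewBlock);
}
```

Here `loopReportedForNewBlock(false)` returns 0, a loop the new block was never part of, while `loopReportedForNewBlock(true)` returns -1. The toy allocator always reuses the slot, so the model itself is deterministic. In a real process, whether a freed address is handed out again depends on the allocator and the allocation history, which is why a stale entry of this kind could plausibly show up only in some runs.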

Reproduced with a stage2 compiler when compiling llvm/lib/MCA/HardwareUnits/ResourceManager.cpp.

Minor update on top of D150210

This revision is now accepted and ready to land. · May 9 2023, 10:42 AM

chill edited parent revisions, added: D150210: [CodeGenPrepare] Fix for using outdated/corrupt LoopInfo; removed: D143896: [NFC][CodeGenPrepare] Match against the correct instruction when checking profitability of folding an address. · May 9 2023, 10:43 AM

chill marked an inline comment as done. · May 9 2023, 10:47 AM

Harbormaster completed remote builds in B230913: Diff 520757. · May 9 2023, 12:37 PM

No non-determinism detected with this on top of D150210, so LGTM. Thanks!

chill added a child revision: D150384: [CodeGenPrepare] Fix for using outdated/corrupt LoopInfo. · May 11 2023, 10:33 AM

chill removed a parent revision: D150210: [CodeGenPrepare] Fix for using outdated/corrupt LoopInfo.

chill removed a child revision: D150384: [CodeGenPrepare] Fix for using outdated/corrupt LoopInfo.

chill added a parent revision: D150384: [CodeGenPrepare] Fix for using outdated/corrupt LoopInfo.

chill updated this revision to Diff 521367. · May 11 2023, 10:35 AM

Harbormaster completed remote builds in B231383: Diff 521367. · May 11 2023, 12:38 PM

chill removed a child revision: D143898: [CodeGenPrepare] Relax conditions for folding addressing mode into loads/stores. · Jun 2 2023, 6:14 AM

Rebase/refresh, no changes

chill added a child revision: D152827: [AArch64] Correctly determine if {ADD,SUB}{W,X}rs instructions are cheap. · Jun 13 2023, 9:23 AM

Harbormaster completed remote builds in B238514: Diff 530938. · Jun 13 2023, 10:46 AM

chill removed a child revision: D152827: [AArch64] Correctly determine if {ADD,SUB}{W,X}rs instructions are cheap. · Jul 7 2023, 8:19 AM

chill edited parent revisions, added: D153638: [CodeGenPrepare][NFC] Update the dominator tree instead of rebuilding it; removed: D150384: [CodeGenPrepare] Fix for using outdated/corrupt LoopInfo.

Diff 530938

llvm/lib/CodeGen/CodeGenPrepare.cpp

[262 lines not shown]
 static cl::opt<unsigned>
     HugeFuncThresholdInCGPP("cgpp-huge-func", cl::init(10000), cl::Hidden,
                             cl::desc("Least BB number of huge function."));

 static cl::opt<unsigned>
     MaxAddressUsersToScan("cgp-max-address-users-to-scan", cl::init(100),
                           cl::Hidden,
                           cl::desc("Max number of address users to look at"));

+static cl::opt<unsigned> MaxLoopInvUsersToScan(
+    "cgp-max-loop-inv-users-to-scan", cl::init(20), cl::Hidden,
+    cl::desc("Max number of loop invariant users to look at"));
+
 namespace {

 enum ExtType {
   ZeroExtension, // Zero extension has been seen.
   SignExtension, // Sign extension has been seen.
   BothExtension  // This extension type is used if we saw sext after
                  // ZeroExtension had been set, or if we saw zext after
                  // SignExtension had been set. It makes the type
[4,797 lines not shown; context: static bool FindAllMemoryUses(]
     const TargetLowering &TLI, const TargetRegisterInfo &TRI, bool OptSize,
     ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI) {
   unsigned SeenInsts = 0;
   SmallPtrSet<Instruction *, 16> ConsideredInsts;
   return FindAllMemoryUses(I, MemoryUses, ConsideredInsts, TLI, TRI, OptSize,
                            PSI, BFI, SeenInsts);
 }

+static bool isUsedInLoop(const Value *V, const Loop *L) {
+  unsigned N = 0;
+
+  for (const Use &U : V->uses()) {
+    if (++N > MaxLoopInvUsersToScan)
+      break;
+    const Instruction *UserI = cast<Instruction>(U.getUser());
+    if (L->contains(UserI->getParent()))
+      return true;
+  }
+
+  return false;
+}
+
 /// Return true if Val is already known to be live at the use site that we're
 /// folding it into. If so, there is no cost to include it in the addressing
 /// mode. KnownLive1 and KnownLive2 are two values that we know are live at the
 /// instruction already.
 bool AddressingModeMatcher::valueAlreadyLiveAtInst(Value *Val,
                                                    Value *KnownLive1,
                                                    Value *KnownLive2) {
   // If Val is either of the known-live values, we know it is live!
   if (Val == nullptr || Val == KnownLive1 || Val == KnownLive2)
     return true;

   // All values other than instructions and arguments (e.g. constants) are live.
   if (!isa<Instruction>(Val) && !isa<Argument>(Val))
     return true;

   // If Val is a constant sized alloca in the entry block, it is live, this is
   // true because it is just a reference to the stack/frame pointer, which is
   // live for the whole function.
   if (AllocaInst *AI = dyn_cast<AllocaInst>(Val))
     if (AI->isStaticAlloca())
       return true;

+  // If the value is loop invariant and is used in the loop which contains the
+  // memory instruction, it's live.
+  BasicBlock *BB = MemoryInst->getParent();
+  if (Loop *L = LI.getLoopFor(BB);
+      L && L->isLoopInvariant(Val) && isUsedInLoop(Val, L))
+    return true;
+
   // Check to see if this value is already used in the memory instruction's
   // block. If so, it's already live into the block at the very least, so we
   // can reasonably fold it.
-  return Val->isUsedInBasicBlock(MemoryInst->getParent());
+  return Val->isUsedInBasicBlock(BB);
 }

 /// It is possible for the addressing mode of the machine to fold the specified
 /// instruction into a load or store that ultimately uses it.
 /// However, the specified instruction has multiple uses.
 /// Given this, it may actually increase register pressure to fold it
 /// into the load. For example, consider this code:
 ///
[3,502 lines not shown]

llvm/test/CodeGen/AArch64/gep-sink-loop-inv-live.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -S -codegenprepare < %s | FileCheck %s
target triple = "aarch64-linux"

declare void @use(...)
declare i64 @next(i64)

define void @f(ptr %a, i64 %k, i64 %n, ptr %q) {
; CHECK-LABEL: @f(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    br label [[COND:%.*]]
; CHECK:       cond:
; CHECK-NEXT:    [[I:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[J:%.*]], [[IF_THEN:%.*]] ], [ [[I]], [[IF_ELSE:%.*]] ]
; CHECK-NEXT:    [[P:%.*]] = getelementptr i32, ptr [[A:%.*]], i64 [[I]]
; CHECK-NEXT:    [[C0:%.*]] = icmp ult i64 [[I]], [[N:%.*]]
; CHECK-NEXT:    br i1 [[C0]], label [[LOOP:%.*]], label [[EXIT:%.*]]
; CHECK:       loop:
; CHECK-NEXT:    [[J]] = call i64 @next(i64 [[I]])
; CHECK-NEXT:    [[SUNKADDR:%.*]] = mul i64 [[I]], 4
; CHECK-NEXT:    [[SUNKADDR1:%.*]] = getelementptr i8, ptr [[A]], i64 [[SUNKADDR]]
; CHECK-NEXT:    [[V:%.*]] = load i32, ptr [[SUNKADDR1]], align 4
; CHECK-NEXT:    [[C1:%.*]] = icmp slt i32 [[V]], 0
; CHECK-NEXT:    br i1 [[C1]], label [[IF_THEN]], label [[IF_ELSE]]
; CHECK:       if.then:
; CHECK-NEXT:    store ptr [[P]], ptr [[Q:%.*]], align 8
; CHECK-NEXT:    br label [[COND]]
; CHECK:       if.else:
; CHECK-NEXT:    call void @use(ptr [[A]])
; CHECK-NEXT:    br label [[COND]]
; CHECK:       exit:
; CHECK-NEXT:    ret void
;
entry:
  br label %cond

cond:
  %i = phi i64 [0, %entry], [%i.next, %next]
  %p = getelementptr i32, ptr %a, i64 %i
  %c0 = icmp ult i64 %i, %n
  br i1 %c0, label %loop, label %exit

loop:
  %j = call i64 @next(i64 %i)
  %v = load i32, ptr %p
  %c1 = icmp slt i32 %v, 0
  br i1 %c1, label %if.then, label %if.else

if.then:
  store ptr %p, ptr %q
  br label %next

if.else:
  call void @use(ptr %a)
  br label %next

next:
  %i.next = phi i64 [%j, %if.then], [%i, %if.else]
  br label %cond

exit:
  ret void
}

This is an archive of the discontinued LLVM Phabricator instance.

[CodeGenPrepare] Estimate liveness of loop invariants when checking for address folding profitability
Accepted · Public
