This is an archive of the discontinued LLVM Phabricator instance.

Differential D143896

[NFC][CodeGenPrepare] Match against the correct instruction when checking profitability of folding an address
ClosedPublic

Authored by chill on Feb 13 2023, 2:49 AM.

Details

Reviewers

mkazantsev
reames

Commits

rG4f02a0f60693: [NFC][CodeGenPrepare] Match against the correct instruction when checking…

Summary

The "nested" AddressingModeMatchers in
AddressingModeMatcher::isProfitableToFoldIntoAddressingMode are constructed
using the original memory instruction, even though they check whether the
address operand of a different memory instruction is foldable. The memory
instruction is used only for a dominance check (when not checking for
profitability), and using the wrong memory instruction does not change the
outcome of the test: if an address is foldable, the dominance test affects which
of the two possible ways to fold is chosen, but this result is discarded.

As an example, in

target triple = "x86_64-linux"

declare i1 @check(i64, i64)
define i32 @f(i1 %cc, ptr %p, ptr %q, i64 %n) {
entry:
  br label %loop

loop:
  %iv = phi i64 [ %i, %C ], [ 0, %entry ]
  %offs = mul i64 %iv, 4

  %c.0 = icmp  ult i64 %iv, %n
  br i1 %c.0, label %A, label %fail

A:
  br i1 %cc, label %B, label %C

C:
  %u = phi i32 [0, %A], [%w, %B]
  %i = add i64 %iv, 1
  %a.0 = getelementptr i8, ptr %p, i64 %offs
  %a.1 = getelementptr i8, ptr %a.0, i64 4
  %v = load i32, ptr %a.1
  %c.1 = icmp eq i32 %v, %u
  br i1 %c.1, label %exit, label %loop

B:
  %a.2 = getelementptr i8, ptr %p, i64 %offs
  %a.3 = getelementptr i8, ptr %a.2, i64 4
  %w = load i32, ptr %a.3
  br label %C

exit:
  ret i32 -1

fail:
   ret i32 0
}

the dominance test is performed between %i = ... and %v = ... at the moment
we're checking whether %a.3 = ... is foldable.

Using the memory instruction that actually uses the address in question is "more
correct", and this change is needed by a future patch.
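
To illustrate why the patch records a Use rather than a bare Value for each memory use, here is a minimal, self-contained sketch. It does not use the LLVM API: ToyValue, ToyInst, ToyUse, MemoryUseList and describeMemoryUses are hypothetical stand-ins that model only the relationship between a used value and its user. The point is that a use gives access to both the address and the memory instruction consuming it, whereas a bare value pointer gives only the address.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for llvm::Value, llvm::Instruction and llvm::Use.
// These are hypothetical types; only the value/user relationship is modelled.
struct ToyValue {
  std::string Name;
  explicit ToyValue(std::string N) : Name(std::move(N)) {}
};

struct ToyInst : ToyValue {
  using ToyValue::ToyValue;
};

struct ToyUse {
  ToyValue *Val;  // the value being used, e.g. the address %a.3
  ToyInst *User;  // the instruction using it, e.g. the load %w
  ToyValue *get() const { return Val; }
  ToyInst *getUser() const { return User; }
};

// Each memory use is recorded as (use, accessed type name). Because the pair
// keeps the use rather than only the address, both the address and the
// memory instruction that consumes it can be recovered later.
using MemoryUseList = std::vector<std::pair<const ToyUse *, std::string>>;

// Returns "user <- address" for each recorded memory use.
std::vector<std::string> describeMemoryUses(const MemoryUseList &Uses) {
  std::vector<std::string> Out;
  for (const auto &Pair : Uses) {
    ToyValue *Address = Pair.first->get();
    ToyInst *UserI = Pair.first->getUser();
    Out.push_back(UserI->Name + " <- " + Address->Name);
  }
  return Out;
}
```

In the IR example above, the two recorded uses would pair %a.1 with the load %v and %a.3 with the load %w, so a nested matcher for %a.3 could be handed %w instead of the original memory instruction.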

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

chill created this revision.Feb 13 2023, 2:49 AM

Herald added a project: Restricted Project. · View Herald TranscriptFeb 13 2023, 2:49 AM

Herald added a subscriber: hiraditya. · View Herald Transcript

chill requested review of this revision.Feb 13 2023, 2:49 AM

Herald added a project: Restricted Project. · View Herald TranscriptFeb 13 2023, 2:49 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

Harbormaster completed remote builds in B213378: Diff 496895.Feb 13 2023, 2:50 AM

chill added a parent revision: D143895: [AArch64] Fix incorrect `isLegalAddressingMode`.Feb 13 2023, 2:57 AM

chill added a child revision: D143897: [CodeGenPrepare] Estimate liveness of loop invariants when checking for address folding profitability.

chill edited the summary of this revision. (Show Details)Feb 13 2023, 6:44 AM

Herald added a subscriber: pengfei. · View Herald TranscriptFeb 13 2023, 6:44 AM

chill added reviewers: mkazantsev, reames.Feb 13 2023, 8:36 AM

Herald added a subscriber: StephenFan. · View Herald TranscriptFeb 13 2023, 8:36 AM

chill updated this revision to Diff 503840.Mar 9 2023, 10:34 AM

Harbormaster completed remote builds in B218441: Diff 503840.Mar 9 2023, 12:14 PM

Ping?

Ping?

I think this is fine.

This revision is now accepted and ready to land.Apr 12 2023, 2:22 AM

Rebased to resolve a trivial merge conflict (int &SeenInsts vs. unsigned &SeenInsts)

This revision was landed with ongoing or failed builds.Apr 21 2023, 10:10 AM

Closed by commit rG4f02a0f60693: [NFC][CodeGenPrepare] Match against the correct instruction when checking… (authored by chill).

This revision was automatically updated to reflect the committed changes.

chill added a commit: rG4f02a0f60693: [NFC][CodeGenPrepare] Match against the correct instruction when checking….

Harbormaster completed remote builds in B227229: Diff 515793.Apr 21 2023, 11:47 AM

chill removed a child revision: D143897: [CodeGenPrepare] Estimate liveness of loop invariants when checking for address folding profitability.May 9 2023, 10:43 AM

Revision Contents

Path: llvm/lib/CodeGen/CodeGenPrepare.cpp
Size: 21 lines

Diff 515817

llvm/lib/CodeGen/CodeGenPrepare.cpp

[4,970 lines not shown]	static bool IsOperandAMemoryOperand(CallInst *CI, InlineAsm *IA, Value *OpVal,

return true;		return true;
}		}

/// Recursively walk all the uses of I until we find a memory use.		/// Recursively walk all the uses of I until we find a memory use.
/// If we find an obviously non-foldable instruction, return true.		/// If we find an obviously non-foldable instruction, return true.
/// Add accessed addresses and types to MemoryUses.		/// Add accessed addresses and types to MemoryUses.
static bool FindAllMemoryUses(		static bool FindAllMemoryUses(
Instruction *I, SmallVectorImpl<std::pair<Value *, Type *>> &MemoryUses,		Instruction *I, SmallVectorImpl<std::pair<Use *, Type *>> &MemoryUses,
SmallPtrSetImpl<Instruction *> &ConsideredInsts, const TargetLowering &TLI,		SmallPtrSetImpl<Instruction *> &ConsideredInsts, const TargetLowering &TLI,
const TargetRegisterInfo &TRI, bool OptSize, ProfileSummaryInfo *PSI,		const TargetRegisterInfo &TRI, bool OptSize, ProfileSummaryInfo *PSI,
BlockFrequencyInfo *BFI, unsigned &SeenInsts) {		BlockFrequencyInfo *BFI, unsigned &SeenInsts) {
// If we already considered this instruction, we're done.		// If we already considered this instruction, we're done.
if (!ConsideredInsts.insert(I).second)		if (!ConsideredInsts.insert(I).second)
return false;		return false;

// If this is an obviously unfoldable instruction, bail out.		// If this is an obviously unfoldable instruction, bail out.
if (!MightBeFoldableInst(I))		if (!MightBeFoldableInst(I))
return true;		return true;

// Loop over all the uses, recursively processing them.		// Loop over all the uses, recursively processing them.
for (Use &U : I->uses()) {		for (Use &U : I->uses()) {
// Conservatively return true if we're seeing a large number or a deep chain		// Conservatively return true if we're seeing a large number or a deep chain
// of users. This avoids excessive compilation times in pathological cases.		// of users. This avoids excessive compilation times in pathological cases.
if (SeenInsts++ >= MaxAddressUsersToScan)		if (SeenInsts++ >= MaxAddressUsersToScan)
return true;		return true;

Instruction *UserI = cast<Instruction>(U.getUser());		Instruction *UserI = cast<Instruction>(U.getUser());
if (LoadInst *LI = dyn_cast<LoadInst>(UserI)) {		if (LoadInst *LI = dyn_cast<LoadInst>(UserI)) {
MemoryUses.push_back({U.get(), LI->getType()});		MemoryUses.push_back({&U, LI->getType()});
continue;		continue;
}		}

if (StoreInst *SI = dyn_cast<StoreInst>(UserI)) {		if (StoreInst *SI = dyn_cast<StoreInst>(UserI)) {
if (U.getOperandNo() != StoreInst::getPointerOperandIndex())		if (U.getOperandNo() != StoreInst::getPointerOperandIndex())
return true; // Storing addr, not into addr.		return true; // Storing addr, not into addr.
MemoryUses.push_back({U.get(), SI->getValueOperand()->getType()});		MemoryUses.push_back({&U, SI->getValueOperand()->getType()});
continue;		continue;
}		}

if (AtomicRMWInst *RMW = dyn_cast<AtomicRMWInst>(UserI)) {		if (AtomicRMWInst *RMW = dyn_cast<AtomicRMWInst>(UserI)) {
if (U.getOperandNo() != AtomicRMWInst::getPointerOperandIndex())		if (U.getOperandNo() != AtomicRMWInst::getPointerOperandIndex())
return true; // Storing addr, not into addr.		return true; // Storing addr, not into addr.
MemoryUses.push_back({U.get(), RMW->getValOperand()->getType()});		MemoryUses.push_back({&U, RMW->getValOperand()->getType()});
continue;		continue;
}		}

if (AtomicCmpXchgInst *CmpX = dyn_cast<AtomicCmpXchgInst>(UserI)) {		if (AtomicCmpXchgInst *CmpX = dyn_cast<AtomicCmpXchgInst>(UserI)) {
if (U.getOperandNo() != AtomicCmpXchgInst::getPointerOperandIndex())		if (U.getOperandNo() != AtomicCmpXchgInst::getPointerOperandIndex())
return true; // Storing addr, not into addr.		return true; // Storing addr, not into addr.
MemoryUses.push_back({U.get(), CmpX->getCompareOperand()->getType()});		MemoryUses.push_back({&U, CmpX->getCompareOperand()->getType()});
continue;		continue;
}		}

if (CallInst *CI = dyn_cast<CallInst>(UserI)) {		if (CallInst *CI = dyn_cast<CallInst>(UserI)) {
if (CI->hasFnAttr(Attribute::Cold)) {		if (CI->hasFnAttr(Attribute::Cold)) {
// If this is a cold call, we can sink the addressing calculation into		// If this is a cold call, we can sink the addressing calculation into
// the cold path. See optimizeCallInst		// the cold path. See optimizeCallInst
bool OptForSize =		bool OptForSize =
[16 lines not shown]	if (FindAllMemoryUses(UserI, MemoryUses, ConsideredInsts, TLI, TRI, OptSize,
PSI, BFI, SeenInsts))		PSI, BFI, SeenInsts))
return true;		return true;
}		}

return false;		return false;
}		}

static bool FindAllMemoryUses(		static bool FindAllMemoryUses(
Instruction *I, SmallVectorImpl<std::pair<Value *, Type *>> &MemoryUses,		Instruction *I, SmallVectorImpl<std::pair<Use *, Type *>> &MemoryUses,
const TargetLowering &TLI, const TargetRegisterInfo &TRI, bool OptSize,		const TargetLowering &TLI, const TargetRegisterInfo &TRI, bool OptSize,
ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI) {		ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI) {
unsigned SeenInsts = 0;		unsigned SeenInsts = 0;
SmallPtrSet<Instruction *, 16> ConsideredInsts;		SmallPtrSet<Instruction *, 16> ConsideredInsts;
return FindAllMemoryUses(I, MemoryUses, ConsideredInsts, TLI, TRI, OptSize,		return FindAllMemoryUses(I, MemoryUses, ConsideredInsts, TLI, TRI, OptSize,
PSI, BFI, SeenInsts);		PSI, BFI, SeenInsts);
}		}

[74 lines not shown]	bool AddressingModeMatcher::isProfitableToFoldIntoAddressingMode(
// ranges, we're ok with it.		// ranges, we're ok with it.
if (!BaseReg && !ScaledReg)		if (!BaseReg && !ScaledReg)
return true;		return true;

// If all uses of this instruction can have the address mode sunk into them,		// If all uses of this instruction can have the address mode sunk into them,
// we can remove the addressing mode and effectively trade one live register		// we can remove the addressing mode and effectively trade one live register
// for another (at worst.) In this context, folding an addressing mode into		// for another (at worst.) In this context, folding an addressing mode into
// the use is just a particularly nice way of sinking it.		// the use is just a particularly nice way of sinking it.
SmallVector<std::pair<Value *, Type *>, 16> MemoryUses;		SmallVector<std::pair<Use *, Type *>, 16> MemoryUses;
if (FindAllMemoryUses(I, MemoryUses, TLI, TRI, OptSize, PSI, BFI))		if (FindAllMemoryUses(I, MemoryUses, TLI, TRI, OptSize, PSI, BFI))
return false; // Has a non-memory, non-foldable use!		return false; // Has a non-memory, non-foldable use!

// Now that we know that all uses of this instruction are part of a chain of		// Now that we know that all uses of this instruction are part of a chain of
// computation involving only operations that could theoretically be folded		// computation involving only operations that could theoretically be folded
// into a memory use, loop over each of these memory operation uses and see		// into a memory use, loop over each of these memory operation uses and see
// if they could actually fold the instruction. The assumption is that		// if they could actually fold the instruction. The assumption is that
// addressing modes are cheap and that duplicating the computation involved		// addressing modes are cheap and that duplicating the computation involved
// many times is worthwhile, even on a fastpath. For sinking candidates		// many times is worthwhile, even on a fastpath. For sinking candidates
// (i.e. cold call sites), this serves as a way to prevent excessive code		// (i.e. cold call sites), this serves as a way to prevent excessive code
// growth since most architectures have some reasonable small and fast way to		// growth since most architectures have some reasonable small and fast way to
// compute an effective address. (i.e LEA on x86)		// compute an effective address. (i.e LEA on x86)
SmallVector<Instruction *, 32> MatchedAddrModeInsts;		SmallVector<Instruction *, 32> MatchedAddrModeInsts;
for (const std::pair<Value *, Type *> &Pair : MemoryUses) {		for (const std::pair<Use *, Type *> &Pair : MemoryUses) {
Value *Address = Pair.first;		Value *Address = Pair.first->get();
		Instruction *UserI = cast<Instruction>(Pair.first->getUser());
Type *AddressAccessTy = Pair.second;		Type *AddressAccessTy = Pair.second;
unsigned AS = Address->getType()->getPointerAddressSpace();		unsigned AS = Address->getType()->getPointerAddressSpace();

// Do a match against the root of this address, ignoring profitability. This		// Do a match against the root of this address, ignoring profitability. This
// will tell us if the addressing mode for the memory operation will		// will tell us if the addressing mode for the memory operation will
// actually cover the shared instruction.		// actually cover the shared instruction.
ExtAddrMode Result;		ExtAddrMode Result;
std::pair<AssertingVH<GetElementPtrInst>, int64_t> LargeOffsetGEP(nullptr,		std::pair<AssertingVH<GetElementPtrInst>, int64_t> LargeOffsetGEP(nullptr,
0);		0);
TypePromotionTransaction::ConstRestorationPt LastKnownGood =		TypePromotionTransaction::ConstRestorationPt LastKnownGood =
TPT.getRestorationPoint();		TPT.getRestorationPoint();
AddressingModeMatcher Matcher(MatchedAddrModeInsts, TLI, TRI, LI, getDTFn,		AddressingModeMatcher Matcher(MatchedAddrModeInsts, TLI, TRI, LI, getDTFn,
AddressAccessTy, AS, MemoryInst, Result,		AddressAccessTy, AS, UserI, Result,
InsertedInsts, PromotedInsts, TPT,		InsertedInsts, PromotedInsts, TPT,
LargeOffsetGEP, OptSize, PSI, BFI);		LargeOffsetGEP, OptSize, PSI, BFI);
Matcher.IgnoreProfitability = true;		Matcher.IgnoreProfitability = true;
bool Success = Matcher.matchAddr(Address, 0);		bool Success = Matcher.matchAddr(Address, 0);
(void)Success;		(void)Success;
assert(Success && "Couldn't select anything?");		assert(Success && "Couldn't select anything?");

// The match was to check the profitability, the changes made are not		// The match was to check the profitability, the changes made are not
[3,410 lines not shown]