This is an archive of the discontinued LLVM Phabricator instance.

Differential D143898

[CodeGenPrepare] Relax conditions for folding addressing mode into loads/stores
Abandoned, Public

Authored by chill on Feb 13 2023, 2:50 AM.

Details

Reviewers

fhahn
t.p.northover
efriedma

Summary

The sinking of address computations to their users (loads/stores)
is often blocked by call instructions that take the address as
an argument: unless the call is "cold", it is considered a non-foldable use.

Considering the whole call sequence, including passing the arguments,
it is sometimes possible to materialize an address computation directly
into a hard register, in a sense "to fold the addressing mode into the call".

For example, on AArch64 the register-to-register copy
instruction ("C6.2.190 MOV (register)"), which would likely be used to pass
a pre-computed address argument, is an alias of "C6.2.207 ORR (shifted register)"
and typically has the same latency and throughput as an "ADD" instruction.

This change tries to allow sinking of more addresses to load/store instructions
by preventing some call instructions from being blockers.
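
To make the pattern concrete, here is a minimal C++ sketch of source code that could produce it. The names `consume` and `pick` are hypothetical and do not come from the patch, and the register assignments mentioned in the comments assume the standard AArch64 calling convention.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical callee standing in for any non-cold call that takes the
// address as an argument. Defined here only so the sketch links.
int64_t consume(const int64_t *p) { return *p + 1; }

// The address `base + idx` has two users in different blocks:
// a call and a load.
int64_t pick(const int64_t *base, int64_t idx, bool viaCall) {
  assert(base != nullptr);
  const int64_t *addr = base + idx; // one address computation (a GEP in IR)
  if (viaCall)
    return consume(addr); // call user: the address must end up in x0
  return *addr;           // load user: could fold as `ldr x0, [x0, x1, lsl #3]`
}
```

Without the change, the non-cold call to `consume` counts as a non-foldable use, so the address computation is not sunk to the load. The summary's argument is that an `add` writing the argument register directly should cost about the same as the `mov` that would otherwise copy the pre-computed address there.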

With this change CodeGenPrepare still does sinking only towards memory
loads/stores. It works in synergy with a MachineSink patch in
https://reviews.llvm.org/D145706, which does sinking towards calls.

This patch (together with the others up/down the stack) improves
SPECv6 500.perlbench_r by about 3.26% and the whole
of SPECv6 intrate by about 0.46% (geomean).

Diff Detail

Event Timeline

chill created this revision. Feb 13 2023, 2:50 AM

Herald added a project: Restricted Project. Feb 13 2023, 2:50 AM

Herald added subscribers: pengfei, hiraditya.

chill requested review of this revision. Feb 13 2023, 2:50 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B213380: Diff 496899. Feb 13 2023, 2:51 AM

chill added a parent revision: D143897: [CodeGenPrepare] Estimate liveness of loop invariants when checking for address folding profitability. Feb 13 2023, 2:57 AM

chill edited the summary of this revision. Feb 13 2023, 7:14 AM

Herald added a subscriber: kristof.beyls. Feb 13 2023, 7:14 AM

chill edited the summary of this revision. Feb 13 2023, 7:54 AM

chill added reviewers: fhahn, t.p.northover, efriedma. Feb 13 2023, 8:37 AM

Herald added a subscriber: StephenFan. Feb 13 2023, 8:37 AM

chill edited the summary of this revision. Feb 13 2023, 8:38 AM

If I'm understanding correctly, the point is that we don't want to block sinking if an address computation has multiple uses, where only some are foldable?

Expressing this in terms of "folding into a call" seems confusing and unnecessary; we should just change the logic to allow sinking if we can fold some, but not all, of the uses. (Maybe including some sort of analysis of whether sinking increases the number of times we perform the address computation at runtime.)

In D143898#4123601, @efriedma wrote:

If I'm understanding correctly, the point is that we don't want to block sinking if an address computation has multiple uses, where only some are foldable?

Yes, for when we cannot say we aren't extending live ranges of the address "registers".

Expressing this in terms of "folding into a call" seems confusing and unnecessary;

To me it looks like a straightforward analogy.
For load/stores we have an addressing computation that's essentially free
when it's a part of the load/store instruction, as opposed to it being a separate instruction and using a simple indirect load/store.
For calls we have an addressing computation that's essentially free
when it's a part of the call sequence, as opposed to it being a separate instruction and using a simple register-to-register move.

we should just change the logic to allow sinking if we can fold some, but not all, of the uses.
(Maybe including some sort of analysis of whether sinking increases the number of times we perform the address computation at runtime.)

IMHO the main issue here is not extending live ranges too much. As for the runtime overhead, sinking a foldable address is assumed to
not increase the execution time, because of the checks in isLegalAddressingMode and (with this patch) in canFoldAddrModeIntoCall.

Perhaps, with PGO, we could use block frequencies in a way that a non-foldable, infrequently executed user does not prevent sinking, much like
the way we currently use Attribute::Cold. However, I don't think that's an alternative to this patch.
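
The order of the checks mentioned here can be summarized in a small model. This is only a sketch of the decision flow in `AddressingModeMatcher::canFoldAddr` from this diff, with each LLVM query reduced to a boolean; the `FoldQuery` struct and its field names are hypothetical.

```cpp
#include <cassert>

// Each field stands in for one query made by the patch.
struct FoldQuery {
  bool isRegularCallArg;    // ArgNo != INVALID_ARG_NO
  bool targetFoldsIntoCall; // STI.enableCallAddrFold() &&
                            // TLI.canFoldAddrModeIntoCall(...)
  bool isColdCall;          // CB.hasFnAttr(Attribute::Cold)
  bool optForSize;          // OptSize || shouldOptimizeForSize(...)
  bool isLegalAddrMode;     // TLI.isLegalAddressingMode(...)
};

bool canFoldAddr(const FoldQuery &q) {
  if (q.isRegularCallArg) {
    // The target says the computation is free within the call sequence.
    if (q.targetFoldsIntoCall)
      return true;
    // Otherwise only cold calls qualify, and not when optimizing for size.
    if (!q.isColdCall || q.optForSize)
      return false;
  }
  // Loads, stores, and accepted cold calls need a legal addressing mode.
  return q.isLegalAddrMode;
}
```

In this model a non-cold call is accepted only through the target hook, while a cold call falls back to the same legality check used for loads and stores.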

For calls we have an addressing computation that's essentially free when it's a part of the call sequence, as opposed to it being a separate instruction and using a simple register-to-register move.

There are a lot of cases where there isn't any extra "mov" that can hide the cost; the most common of those being the case where the value in question is spilled. But I guess I can see the analogy. I'd generally prefer to consider that sort of transform in terms of rematerialization, though. We can compute the cost of remat much more accurately in the register allocator.

IMHO the main issue here is not extending live ranges too much. As for the runtime overhead, sinking a foldable address is assumed to not increase the execution time, because of the checks in isLegalAddressingMode and (with this patch) in canFoldAddrModeIntoCall.

I was thinking more in terms of sinking to arbitrary uses, as opposed to only sinking some uses.

chill updated this revision to Diff 503842. Mar 9 2023, 10:35 AM

chill added a child revision: D145706: [MachineSink] Sink instruction copies when they can replace copy into hard register.

chill edited the summary of this revision. Mar 9 2023, 10:49 AM

Harbormaster completed remote builds in B218443: Diff 503842. Mar 9 2023, 12:25 PM

chill retitled this revision from "[CodeGenPrepare] Fold addressing mode into calls" to "[CodeGenPrepare] Relax conditions for folding addressing mode into loads/stores". Mar 10 2023, 8:25 AM

Ping?

For constructs like the given testcase, you don't need to reason about whether it's profitable to fold into a call; cloning a GEP into both sides of an if-else doesn't actually increase execution time, so it's obviously profitable even if you can only fold it on one side of the if-else. Can you add some examples that actually require predicting whether the GEP is as cheap as a mov, and the argument is passed in a register?

llvm/include/llvm/CodeGen/TargetLowering.h
3178	In this case, the addressing mode doesn't actually represent any computation, so it isn't relevant for this transform; when do you expect it to become relevant?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24528	Counting arguments like this won't do what you want if there are arguments which are passed in multiple registers.
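
The concern about counting can be illustrated with a toy model. The code at line 24528 is not reproduced in this archive, so the sketch below is an assumption about its general shape, not a copy of it. `NumArgGPRs`, `inRegisterNaive`, and `inRegisterCounted` are hypothetical names, and the model ignores floating-point and vector registers, alignment rules, and everything else in the real calling convention.

```cpp
#include <cassert>
#include <vector>

// Assumption: eight general-purpose argument registers (x0-x7 on AArch64).
constexpr unsigned NumArgGPRs = 8;

// Naive check: treats every argument as occupying exactly one register.
bool inRegisterNaive(unsigned argNo) { return argNo < NumArgGPRs; }

// Register-aware check: sums the registers taken by the preceding
// arguments before deciding whether argument `argNo` still fits.
bool inRegisterCounted(const std::vector<unsigned> &regsPerArg,
                       unsigned argNo) {
  assert(argNo < regsPerArg.size());
  unsigned next = 0; // index of the next free argument register
  for (unsigned i = 0; i < argNo; ++i)
    next += regsPerArg[i];
  return next + regsPerArg[argNo] <= NumArgGPRs;
}
```

With four two-register arguments followed by a pointer, the naive check reports the pointer (argument 4) as register-passed, while the counted check finds all eight registers already taken.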

chill added inline comments. May 10 2023, 7:45 AM

llvm/include/llvm/CodeGen/TargetLowering.h
3178	The logic of the `AddressingModeMatcher` is that it accumulates an addressing mode from partial "expressions". For example an addressing mode which ends up as `[reg + imm]` or `[reg + reg]` would have involved checks for legality of just `[reg]` here: https://github.com/llvm/llvm-project/blob/ddfb974d0fca62e3eaeb98b79b5e29738c9082d2/llvm/lib/CodeGen/CodeGenPrepare.cpp#L4926
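
A small sketch of what the default hook accepts may help with this exchange. `AddrMode` below is a cut-down stand-in for `TargetLowering::AddrMode`, not the LLVM type, and `defaultCanFoldIntoCall` is a hypothetical free function that repeats the condition from the default `canFoldAddrModeIntoCall` in this diff.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in with the fields the default hook inspects.
// BaseGV is a GlobalValue pointer in LLVM; a void pointer keeps the
// sketch self-contained.
struct AddrMode {
  const void *BaseGV = nullptr;
  int64_t BaseOffs = 0;
  bool HasBaseReg = false;
  int64_t Scale = 0;
};

// Same condition as the default hook: only a bare [reg] is accepted.
bool defaultCanFoldIntoCall(const AddrMode &AM) {
  return AM.HasBaseReg && AM.BaseGV == nullptr && AM.Scale == 0 &&
         AM.BaseOffs == 0;
}
```

Under this predicate a bare `[reg]` query succeeds while `[reg + imm]` and `[reg + reg]` are rejected, so any real computation would have to be accepted by a target override.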

chill planned changes to this revision. Jun 2 2023, 6:13 AM

chill removed a child revision: D145706: [MachineSink] Sink instruction copies when they can replace copy into hard register.

chill removed a parent revision: D143897: [CodeGenPrepare] Estimate liveness of loop invariants when checking for address folding profitability.

chill added inline comments. Jun 13 2023, 9:42 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24528	Indeed. For now, I'm proposing a different approach, doing the transformation entirely in `MachineSink`. The relevant review is D152828 (https://reviews.llvm.org/D152828). If it doesn't fly, I'll come back to this review and think about how to solve the issue above.

In D143898#4303461, @efriedma wrote:

For constructs like the given testcase, you don't need to reason about whether it's profitable to fold into a call; cloning a GEP into both sides of an if-else doesn't actually increase execution time, so it's obviously profitable even if you can only fold it on one side of the if-else. Can you add some examples that actually require predicting whether the GEP is as cheap as a mov, and the argument is passed in a register?

These test cases are constructed just to have the addressing computation and its users in separate basic blocks.
In our motivating examples, we have the addressing computation as loop invariant and its users inside a loop body, so
just sinking copies might increase run time.

I have added such test cases in https://reviews.llvm.org/D152828 : sink-and-fold.ll, functions f4 and f5.
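
For readers without access to those tests, the loop-invariant shape described above looks roughly like the following C++ sketch. It is not derived from `sink-and-fold.ll`; the names `consume` and `sumInLoop` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical callee; defined here only so the sketch links.
int64_t consume(const int64_t *p) { return *p; }

int64_t sumInLoop(const int64_t *base, int64_t idx, int n) {
  assert(n >= 0);
  const int64_t *addr = base + idx; // loop-invariant, computed once up front
  int64_t sum = 0;
  for (int i = 0; i < n; ++i) {
    if (i % 2)
      sum += consume(addr); // call user inside the loop body
    else
      sum += *addr;         // load user inside the loop body
  }
  return sum;
}
```

Here a copy of the address computation sunk into the loop body would execute on every iteration, so it only pays off if it folds into the load or replaces the register copy that feeds the call.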

chill abandoned this revision. Oct 23 2023, 10:38 AM

Revision Contents

Path                                               Size
llvm/include/llvm/CodeGen/TargetLowering.h         7 lines
llvm/include/llvm/CodeGen/TargetSubtargetInfo.h    3 lines
llvm/lib/CodeGen/CodeGenPrepare.cpp                128 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h      3 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp    32 lines
llvm/lib/Target/AArch64/AArch64Subtarget.h         3 lines
llvm/test/CodeGen/AArch64/call-addr-fold.ll        173 lines

Diff 503842

llvm/include/llvm/CodeGen/TargetLowering.h

[3,165 lines not shown]	public:
/// returned.		/// returned.
virtual Value *createComplexDeinterleavingIR(		virtual Value *createComplexDeinterleavingIR(
Instruction *I, ComplexDeinterleavingOperation OperationType,		Instruction *I, ComplexDeinterleavingOperation OperationType,
ComplexDeinterleavingRotation Rotation, Value *InputA, Value *InputB,		ComplexDeinterleavingRotation Rotation, Value *InputA, Value *InputB,
Value *Accumulator = nullptr) const {		Value *Accumulator = nullptr) const {
return nullptr;		return nullptr;
}		}

		/// Return true if it is beneficial to fold address calculation into a call
		/// sequence.
		virtual bool canFoldAddrModeIntoCall(const CallBase &CB, unsigned ArgNo,
		const AddrMode &AM) const {
		return AM.HasBaseReg && AM.BaseGV == nullptr && AM.Scale == 0 && AM.BaseOffs == 0;
		}

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
// Runtime Library hooks		// Runtime Library hooks
//		//

/// Rename the default libcall routine name for the specified libcall.		/// Rename the default libcall routine name for the specified libcall.
void setLibcallName(RTLIB::Libcall Call, const char *Name) {		void setLibcallName(RTLIB::Libcall Call, const char *Name) {
LibcallRoutineNames[Call] = Name;		LibcallRoutineNames[Call] = Name;
}		}
[2,081 lines not shown]

llvm/include/llvm/CodeGen/TargetSubtargetInfo.h

[214 lines not shown]	public:
virtual bool enablePostRAMachineScheduler() const;		virtual bool enablePostRAMachineScheduler() const;

/// True if the subtarget should run the atomic expansion pass.		/// True if the subtarget should run the atomic expansion pass.
virtual bool enableAtomicExpand() const;		virtual bool enableAtomicExpand() const;

/// True if the subtarget should run the indirectbr expansion pass.		/// True if the subtarget should run the indirectbr expansion pass.
virtual bool enableIndirectBrExpand() const;		virtual bool enableIndirectBrExpand() const;

		/// Enable folding of address computations into call sequences.
		virtual bool enableCallAddrFold() const { return false; }

/// Override generic scheduling policy within a region.		/// Override generic scheduling policy within a region.
///		///
/// This is a convenient way for targets that don't provide any custom		/// This is a convenient way for targets that don't provide any custom
/// scheduling heuristics (no custom MachineSchedStrategy) to make		/// scheduling heuristics (no custom MachineSchedStrategy) to make
/// changes to the generic scheduling policy.		/// changes to the generic scheduling policy.
virtual void overrideSchedPolicy(MachineSchedPolicy &Policy,		virtual void overrideSchedPolicy(MachineSchedPolicy &Policy,
unsigned NumRegionInstrs) const {}		unsigned NumRegionInstrs) const {}

[100 lines not shown]

llvm/lib/CodeGen/CodeGenPrepare.cpp

[2,194 lines not shown]	static bool despeculateCountZeros(IntrinsicInst *CountZeros,
// We are explicitly handling the zero case, so we can set the intrinsic's		// We are explicitly handling the zero case, so we can set the intrinsic's
// undefined zero argument to 'true'. This will also prevent reprocessing the		// undefined zero argument to 'true'. This will also prevent reprocessing the
// intrinsic; we only despeculate when a zero input is defined.		// intrinsic; we only despeculate when a zero input is defined.
CountZeros->setArgOperand(1, Builder.getTrue());		CountZeros->setArgOperand(1, Builder.getTrue());
ModifiedDT = ModifyDT::ModifyBBDT;		ModifiedDT = ModifyDT::ModifyBBDT;
return true;		return true;
}		}

		static bool isRegularCall(const CallBase *CB) {
		return !CB->isInlineAsm() && CB->getIntrinsicID() == Intrinsic::not_intrinsic;
		}

		static const CallBase *getRegularCall(const Instruction *I) {
		if (const auto *CB = dyn_cast<CallBase>(I); CB && isRegularCall(CB))
		return CB;
		return nullptr;
		}

		static unsigned getRegularCallArgNo(const Instruction *I, Value *V) {
		if (const CallBase *CB = getRegularCall(I)) {
		unsigned AN = 0;
		for (const Use &U : CB->args()) {
		if (U == V)
		return AN;
		++AN;
		}
		}
		return ~0;
		}

bool CodeGenPrepare::optimizeCallInst(CallInst *CI, ModifyDT &ModifiedDT) {		bool CodeGenPrepare::optimizeCallInst(CallInst *CI, ModifyDT &ModifiedDT) {
BasicBlock *BB = CI->getParent();		BasicBlock *BB = CI->getParent();

// Lower inline assembly if we can.		// Lower inline assembly if we can.
// If we found an inline asm expession, and if the target knows how to		// If we found an inline asm expession, and if the target knows how to
// lower it to normal LLVM code, do so now.		// lower it to normal LLVM code, do so now.
if (CI->isInlineAsm()) {		if (CI->isInlineAsm()) {
if (TLI->ExpandInlineAsm(CI)) {		if (TLI->ExpandInlineAsm(CI)) {
[57 lines not shown]	if (MemTransferInst *MTI = dyn_cast<MemTransferInst>(MI)) {
MTI->setSourceAlignment(SrcAlign);		MTI->setSourceAlignment(SrcAlign);
}		}
}		}

// If we have a cold call site, try to sink addressing computation into the		// If we have a cold call site, try to sink addressing computation into the
// cold block. This interacts with our handling for loads and stores to		// cold block. This interacts with our handling for loads and stores to
// ensure that we can fold all uses of a potential addressing computation		// ensure that we can fold all uses of a potential addressing computation
// into their uses. TODO: generalize this to work over profiling data		// into their uses. TODO: generalize this to work over profiling data
if (CI->hasFnAttr(Attribute::Cold) && !OptSize &&		if (isRegularCall(CI) && CI->hasFnAttr(Attribute::Cold) && !OptSize &&
!llvm::shouldOptimizeForSize(BB, PSI, BFI.get()))		!llvm::shouldOptimizeForSize(BB, PSI, BFI.get())) {
for (auto &Arg : CI->args()) {		for (auto &Arg : CI->args()) {
if (!Arg->getType()->isPointerTy())		if (!Arg->getType()->isPointerTy())
continue;		continue;
unsigned AS = Arg->getType()->getPointerAddressSpace();		unsigned AS = Arg->getType()->getPointerAddressSpace();
if (optimizeMemoryInst(CI, Arg, Arg->getType(), AS))		if (optimizeMemoryInst(CI, Arg, Arg->getType(), AS))
return true;		return true;
}		}
		}

IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI);		IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI);
if (II) {		if (II) {
switch (II->getIntrinsicID()) {		switch (II->getIntrinsicID()) {
default:		default:
break;		break;
case Intrinsic::assume:		case Intrinsic::assume:
llvm_unreachable("llvm.assume should have been removed already");		llvm_unreachable("llvm.assume should have been removed already");
[909 lines not shown]
}		}

namespace {		namespace {

/// A helper class for matching addressing modes.		/// A helper class for matching addressing modes.
///		///
/// This encapsulates the logic for matching the target-legal addressing modes.		/// This encapsulates the logic for matching the target-legal addressing modes.
class AddressingModeMatcher {		class AddressingModeMatcher {
		static constexpr unsigned INVALID_ARG_NO = ~0u;

SmallVectorImpl<Instruction *> &AddrModeInsts;		SmallVectorImpl<Instruction *> &AddrModeInsts;
const TargetLowering &TLI;		const TargetLowering &TLI;
		const TargetSubtargetInfo &STI;
const TargetRegisterInfo &TRI;		const TargetRegisterInfo &TRI;
const DataLayout &DL;		const DataLayout &DL;
const LoopInfo &LI;		const LoopInfo &LI;
const std::function<const DominatorTree &()> getDTFn;		const std::function<const DominatorTree &()> getDTFn;

/// AccessTy/MemoryInst - This is the type for the access (e.g. double) and		/// AccessTy/MemoryInst - This is the type for the access (e.g. double) and
/// the memory instruction that we're computing this address for.		/// the memory instruction that we're computing this address for.
Type *AccessTy;		Type *AccessTy;
unsigned AddrSpace;		unsigned AddrSpace;
		unsigned ArgNo;
Instruction *MemoryInst;		Instruction *MemoryInst;

/// This is the addressing mode that we're building up. This is		/// This is the addressing mode that we're building up. This is
/// part of the return value of this addressing mode matching stuff.		/// part of the return value of this addressing mode matching stuff.
ExtAddrMode &AddrMode;		ExtAddrMode &AddrMode;

/// The instructions inserted by other CodeGenPrepare optimizations.		/// The instructions inserted by other CodeGenPrepare optimizations.
const SetOfInstrs &InsertedInsts;		const SetOfInstrs &InsertedInsts;
[14 lines not shown]	class AddressingModeMatcher {
/// True if we are optimizing for size.		/// True if we are optimizing for size.
bool OptSize;		bool OptSize;

ProfileSummaryInfo *PSI;		ProfileSummaryInfo *PSI;
BlockFrequencyInfo *BFI;		BlockFrequencyInfo *BFI;

AddressingModeMatcher(		AddressingModeMatcher(
SmallVectorImpl<Instruction *> &AMI, const TargetLowering &TLI,		SmallVectorImpl<Instruction *> &AMI, const TargetLowering &TLI,
		const TargetSubtargetInfo &STI,
const TargetRegisterInfo &TRI, const LoopInfo &LI,		const TargetRegisterInfo &TRI, const LoopInfo &LI,
const std::function<const DominatorTree &()> getDTFn, Type *AT,		const std::function<const DominatorTree &()> getDTFn, Type *AT,
unsigned AS, Instruction *MI, ExtAddrMode &AM,		unsigned AS, Instruction *MI, ExtAddrMode &AM,
const SetOfInstrs &InsertedInsts, InstrToOrigTy &PromotedInsts,		const SetOfInstrs &InsertedInsts, InstrToOrigTy &PromotedInsts,
TypePromotionTransaction &TPT,		TypePromotionTransaction &TPT,
std::pair<AssertingVH<GetElementPtrInst>, int64_t> &LargeOffsetGEP,		std::pair<AssertingVH<GetElementPtrInst>, int64_t> &LargeOffsetGEP,
bool OptSize, ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI)		bool OptSize, ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI)
: AddrModeInsts(AMI), TLI(TLI), TRI(TRI),		: AddrModeInsts(AMI), TLI(TLI), STI(STI), TRI(TRI),
DL(MI->getModule()->getDataLayout()), LI(LI), getDTFn(getDTFn),		DL(MI->getModule()->getDataLayout()), LI(LI), getDTFn(getDTFn),
AccessTy(AT), AddrSpace(AS), MemoryInst(MI), AddrMode(AM),		AccessTy(AT), AddrSpace(AS), ArgNo(INVALID_ARG_NO), MemoryInst(MI),
InsertedInsts(InsertedInsts), PromotedInsts(PromotedInsts), TPT(TPT),		AddrMode(AM), InsertedInsts(InsertedInsts),
LargeOffsetGEP(LargeOffsetGEP), OptSize(OptSize), PSI(PSI), BFI(BFI) {		PromotedInsts(PromotedInsts), TPT(TPT), LargeOffsetGEP(LargeOffsetGEP),
		OptSize(OptSize), PSI(PSI), BFI(BFI) {
IgnoreProfitability = false;		IgnoreProfitability = false;
}		}

public:		public:
/// Find the maximal addressing mode that a load/store of V can fold,		/// Find the maximal addressing mode that a load/store of V can fold,
/// give an access type of AccessTy. This returns a list of involved		/// give an access type of AccessTy. This returns a list of involved
/// instructions in AddrModeInsts.		/// instructions in AddrModeInsts.
/// \p InsertedInsts The instructions inserted by other CodeGenPrepare		/// \p InsertedInsts The instructions inserted by other CodeGenPrepare
/// optimizations.		/// optimizations.
/// \p PromotedInsts maps the instructions to their type before promotion.		/// \p PromotedInsts maps the instructions to their type before promotion.
/// \p The ongoing transaction where every action should be registered.		/// \p The ongoing transaction where every action should be registered.
static ExtAddrMode		static ExtAddrMode
Match(Value *V, Type *AccessTy, unsigned AS, Instruction *MemoryInst,		Match(Value *V, Type *AccessTy, unsigned AS, Instruction *MemoryInst,
SmallVectorImpl<Instruction *> &AddrModeInsts,		SmallVectorImpl<Instruction *> &AddrModeInsts,
const TargetLowering &TLI, const LoopInfo &LI,		const TargetLowering &TLI, const LoopInfo &LI,
const std::function<const DominatorTree &()> getDTFn,		const std::function<const DominatorTree &()> getDTFn,
		const TargetSubtargetInfo &STI,
const TargetRegisterInfo &TRI, const SetOfInstrs &InsertedInsts,		const TargetRegisterInfo &TRI, const SetOfInstrs &InsertedInsts,
InstrToOrigTy &PromotedInsts, TypePromotionTransaction &TPT,		InstrToOrigTy &PromotedInsts, TypePromotionTransaction &TPT,
std::pair<AssertingVH<GetElementPtrInst>, int64_t> &LargeOffsetGEP,		std::pair<AssertingVH<GetElementPtrInst>, int64_t> &LargeOffsetGEP,
bool OptSize, ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI) {		bool OptSize, ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI) {
ExtAddrMode Result;		ExtAddrMode Result;

bool Success = AddressingModeMatcher(AddrModeInsts, TLI, TRI, LI, getDTFn,		AddressingModeMatcher Matcher(
AccessTy, AS, MemoryInst, Result,		AddrModeInsts, TLI, STI, TRI, LI, getDTFn, AccessTy, AS, MemoryInst, Result,
InsertedInsts, PromotedInsts, TPT,		InsertedInsts, PromotedInsts, TPT, LargeOffsetGEP, OptSize, PSI, BFI);
LargeOffsetGEP, OptSize, PSI, BFI)		if (STI.enableCallAddrFold())
.matchAddr(V, 0);		Matcher.ArgNo = getRegularCallArgNo(MemoryInst, V);
		bool Success = Matcher.matchAddr(V, 0);
(void)Success;		(void)Success;
assert(Success && "Couldn't select anything?");		assert(Success && "Couldn't select anything?");
return Result;		return Result;
}		}

private:		private:
bool matchScaledValue(Value *ScaleReg, int64_t Scale, unsigned Depth);		bool matchScaledValue(Value *ScaleReg, int64_t Scale, unsigned Depth);
bool matchAddr(Value *Addr, unsigned Depth);		bool matchAddr(Value *Addr, unsigned Depth);
bool matchOperationAddr(User *AddrInst, unsigned Opcode, unsigned Depth,		bool matchOperationAddr(User *AddrInst, unsigned Opcode, unsigned Depth,
bool *MovedAway = nullptr);		bool *MovedAway = nullptr);
bool isProfitableToFoldIntoAddressingMode(Instruction *I,		bool isProfitableToFoldIntoAddressingMode(Instruction *I,
ExtAddrMode &AMBefore,		ExtAddrMode &AMBefore,
ExtAddrMode &AMAfter);		ExtAddrMode &AMAfter);
bool valueAlreadyLiveAtInst(Value *Val, Value *KnownLive1, Value *KnownLive2);		bool valueAlreadyLiveAtInst(Value *Val, Value *KnownLive1, Value *KnownLive2);
bool isPromotionProfitable(unsigned NewCost, unsigned OldCost,		bool isPromotionProfitable(unsigned NewCost, unsigned OldCost,
Value *PromotedOperand) const;		Value *PromotedOperand) const;
		bool canFoldAddr(const ExtAddrMode &TestAddrMode) const;
};		};

class PhiNodeSet;		class PhiNodeSet;

/// An iterator for PhiNodeSet.		/// An iterator for PhiNodeSet.
class PhiNodeSetIterator {		class PhiNodeSetIterator {
PhiNodeSet *const Set;		PhiNodeSet *const Set;
size_t CurrentIndex = 0;		size_t CurrentIndex = 0;
[648 lines not shown]	bool AddressingModeMatcher::matchScaledValue(Value *ScaleReg, int64_t Scale,
ExtAddrMode TestAddrMode = AddrMode;		ExtAddrMode TestAddrMode = AddrMode;

// Add scale to turn X*4+X*3 -> X*7. This could also do things like		// Add scale to turn X*4+X*3 -> X*7. This could also do things like
// [A+B + A*7] -> [B+A*8].		// [A+B + A*7] -> [B+A*8].
TestAddrMode.Scale += Scale;		TestAddrMode.Scale += Scale;
TestAddrMode.ScaledReg = ScaleReg;		TestAddrMode.ScaledReg = ScaleReg;

// If the new address isn't legal, bail out.		// If the new address isn't legal, bail out.
if (!TLI.isLegalAddressingMode(DL, TestAddrMode, AccessTy, AddrSpace))		if (!canFoldAddr(TestAddrMode))
return false;		return false;

// It was legal, so commit it.		// It was legal, so commit it.
AddrMode = TestAddrMode;		AddrMode = TestAddrMode;

// Okay, we decided that we can add ScaleReg+Scale to AddrMode. Check now		// Okay, we decided that we can add ScaleReg+Scale to AddrMode. Check now
// to see if ScaleReg is actually X+C. If so, we can turn this into adding		// to see if ScaleReg is actually X+C. If so, we can turn this into adding
// X*Scale + C*Scale to addr mode. If we found available IV increment, do not		// X*Scale + C*Scale to addr mode. If we found available IV increment, do not
// go any further: we can reuse it and cannot eliminate it.		// go any further: we can reuse it and cannot eliminate it.
ConstantInt *CI = nullptr;		ConstantInt *CI = nullptr;
Value *AddLHS = nullptr;		Value *AddLHS = nullptr;
if (isa<Instruction>(ScaleReg) && // not a constant expr.		if (isa<Instruction>(ScaleReg) && // not a constant expr.
match(ScaleReg, m_Add(m_Value(AddLHS), m_ConstantInt(CI))) &&		match(ScaleReg, m_Add(m_Value(AddLHS), m_ConstantInt(CI))) &&
!isIVIncrement(ScaleReg, &LI) && CI->getValue().isSignedIntN(64)) {		!isIVIncrement(ScaleReg, &LI) && CI->getValue().isSignedIntN(64)) {
TestAddrMode.InBounds = false;		TestAddrMode.InBounds = false;
TestAddrMode.ScaledReg = AddLHS;		TestAddrMode.ScaledReg = AddLHS;
TestAddrMode.BaseOffs += CI->getSExtValue() * TestAddrMode.Scale;		TestAddrMode.BaseOffs += CI->getSExtValue() * TestAddrMode.Scale;

// If this addressing mode is legal, commit it and remember that we folded		// If this addressing mode is legal, commit it and remember that we folded
// this instruction.		// this instruction.
if (TLI.isLegalAddressingMode(DL, TestAddrMode, AccessTy, AddrSpace)) {		if (canFoldAddr(TestAddrMode)) {
AddrModeInsts.push_back(cast<Instruction>(ScaleReg));		AddrModeInsts.push_back(cast<Instruction>(ScaleReg));
AddrMode = TestAddrMode;		AddrMode = TestAddrMode;
return true;		return true;
}		}
// Restore status quo.		// Restore status quo.
TestAddrMode = AddrMode;		TestAddrMode = AddrMode;
}		}

[44 lines not shown]	if (auto IVStep = GetConstantStep(ScaleReg)) {
APInt Offset = Step * AddrMode.Scale;		APInt Offset = Step * AddrMode.Scale;
if (Offset.isSignedIntN(64)) {		if (Offset.isSignedIntN(64)) {
TestAddrMode.InBounds = false;		TestAddrMode.InBounds = false;
TestAddrMode.ScaledReg = IVInc;		TestAddrMode.ScaledReg = IVInc;
TestAddrMode.BaseOffs -= Offset.getLimitedValue();		TestAddrMode.BaseOffs -= Offset.getLimitedValue();
// If this addressing mode is legal, commit it..		// If this addressing mode is legal, commit it..
// (Note that we defer the (expensive) domtree base legality check		// (Note that we defer the (expensive) domtree base legality check
// to the very last possible point.)		// to the very last possible point.)
if (TLI.isLegalAddressingMode(DL, TestAddrMode, AccessTy, AddrSpace) &&		if (canFoldAddr(TestAddrMode) &&
getDTFn().dominates(IVInc, MemoryInst)) {		getDTFn().dominates(IVInc, MemoryInst)) {
AddrModeInsts.push_back(cast<Instruction>(IVInc));		AddrModeInsts.push_back(cast<Instruction>(IVInc));
AddrMode = TestAddrMode;		AddrMode = TestAddrMode;
return true;		return true;
}		}
// Restore status quo.		// Restore status quo.
TestAddrMode = AddrMode;		TestAddrMode = AddrMode;
}		}
[491 lines not shown]	bool AddressingModeMatcher::isPromotionProfitable(
if (NewCost < OldCost)		if (NewCost < OldCost)
return true;		return true;
// The promotion is neutral but it may help folding the sign extension in		// The promotion is neutral but it may help folding the sign extension in
// loads for instance.		// loads for instance.
// Check that we did not create an illegal instruction.		// Check that we did not create an illegal instruction.
return isPromotedInstructionLegal(TLI, DL, PromotedOperand);		return isPromotedInstructionLegal(TLI, DL, PromotedOperand);
}		}

		bool AddressingModeMatcher::canFoldAddr(const ExtAddrMode &TestAddrMode) const {
		if (ArgNo != INVALID_ARG_NO) {
		// Check if the address is "foldable" into a regular call.
		const auto &CB = cast<CallBase>(*MemoryInst);
		if (STI.enableCallAddrFold() &&
		TLI.canFoldAddrModeIntoCall(CB, ArgNo, TestAddrMode))
		return true;

		// Even if it isn't, accept it if we have a cold call, but still require
		// legal addressing mode. This limits the amount of code we potentially
		// sink.
		if (!CB.hasFnAttr(Attribute::Cold) || OptSize ||
		llvm::shouldOptimizeForSize(CB.getParent(), PSI, BFI))
		return false;
		}

		return TLI.isLegalAddressingMode(DL, TestAddrMode, AccessTy, AddrSpace);
		}

/// Given an instruction or constant expr, see if we can fold the operation		/// Given an instruction or constant expr, see if we can fold the operation
/// into the addressing mode. If so, update the addressing mode and return		/// into the addressing mode. If so, update the addressing mode and return
/// true, otherwise return false without modifying AddrMode.		/// true, otherwise return false without modifying AddrMode.
/// If \p MovedAway is not NULL, it contains the information of whether or		/// If \p MovedAway is not NULL, it contains the information of whether or
/// not AddrInst has to be folded into the addressing mode on success.		/// not AddrInst has to be folded into the addressing mode on success.
/// If \p MovedAway == true, \p AddrInst will not be part of the addressing		/// If \p MovedAway == true, \p AddrInst will not be part of the addressing
/// because it has been moved away.		/// because it has been moved away.
/// Thus AddrInst must not be added in the matched instructions.		/// Thus AddrInst must not be added in the matched instructions.
[135 lines not shown]	for (unsigned i = 1, e = AddrInst->getNumOperands(); i != e; ++i, ++GTI) {
VariableScale = TypeSize;		VariableScale = TypeSize;
}		}
}		}
}		}

// A common case is for the GEP to only do a constant offset. In this case,		// A common case is for the GEP to only do a constant offset. In this case,
// just add it to the disp field and check validity.		// just add it to the disp field and check validity.
if (VariableOperand == -1) {		if (VariableOperand == -1) {
		bool InBounds = AddrMode.InBounds;
		int64_t BaseOffs = AddrMode.BaseOffs;
AddrMode.BaseOffs += ConstantOffset;		AddrMode.BaseOffs += ConstantOffset;
if (matchAddr(AddrInst->getOperand(0), Depth + 1)) {
if (!cast<GEPOperator>(AddrInst)->isInBounds())		if (!cast<GEPOperator>(AddrInst)->isInBounds())
AddrMode.InBounds = false;		AddrMode.InBounds = false;
		if (matchAddr(AddrInst->getOperand(0), Depth + 1))
return true;		return true;
}		AddrMode.InBounds = InBounds;
AddrMode.BaseOffs -= ConstantOffset;		AddrMode.BaseOffs = BaseOffs;

if (EnableGEPOffsetSplit && isa<GetElementPtrInst>(AddrInst) &&		if (EnableGEPOffsetSplit && isa<GetElementPtrInst>(AddrInst) &&
TLI.shouldConsiderGEPOffsetSplit() && Depth == 0 &&		TLI.shouldConsiderGEPOffsetSplit() && Depth == 0 &&
ConstantOffset > 0) {		ConstantOffset > 0) {
// Record GEPs with non-zero offsets as candidates for splitting in		// Record GEPs with non-zero offsets as candidates for splitting in
// the event that the offset cannot fit into the r+i addressing mode.		// the event that the offset cannot fit into the r+i addressing mode.
// Simple and common case that only one GEP is used in calculating the		// Simple and common case that only one GEP is used in calculating the
// address for the memory access.		// address for the memory access.
▲ Show 20 Lines • Show All 127 Lines • ▼ Show 20 Lines	bool AddressingModeMatcher::matchAddr(Value *Addr, unsigned Depth) {
// Start a transaction at this point that we will rollback if the matching		// Start a transaction at this point that we will rollback if the matching
// fails.		// fails.
TypePromotionTransaction::ConstRestorationPt LastKnownGood =		TypePromotionTransaction::ConstRestorationPt LastKnownGood =
TPT.getRestorationPoint();		TPT.getRestorationPoint();
if (ConstantInt *CI = dyn_cast<ConstantInt>(Addr)) {		if (ConstantInt *CI = dyn_cast<ConstantInt>(Addr)) {
if (CI->getValue().isSignedIntN(64)) {		if (CI->getValue().isSignedIntN(64)) {
// Fold in immediates if legal for the target.		// Fold in immediates if legal for the target.
AddrMode.BaseOffs += CI->getSExtValue();		AddrMode.BaseOffs += CI->getSExtValue();
if (TLI.isLegalAddressingMode(DL, AddrMode, AccessTy, AddrSpace))		if (canFoldAddr(AddrMode))
return true;		return true;
AddrMode.BaseOffs -= CI->getSExtValue();		AddrMode.BaseOffs -= CI->getSExtValue();
}		}
} else if (GlobalValue *GV = dyn_cast<GlobalValue>(Addr)) {		} else if (GlobalValue *GV = dyn_cast<GlobalValue>(Addr)) {
// If this is a global variable, try to fold it into the addressing mode.		// If this is a global variable, try to fold it into the addressing mode.
if (!AddrMode.BaseGV) {		if (!AddrMode.BaseGV) {
AddrMode.BaseGV = GV;		AddrMode.BaseGV = GV;
if (TLI.isLegalAddressingMode(DL, AddrMode, AccessTy, AddrSpace))		if (canFoldAddr(AddrMode))
return true;		return true;
AddrMode.BaseGV = nullptr;		AddrMode.BaseGV = nullptr;
}		}
} else if (Instruction *I = dyn_cast<Instruction>(Addr)) {		} else if (Instruction *I = dyn_cast<Instruction>(Addr)) {
ExtAddrMode BackupAddrMode = AddrMode;		ExtAddrMode BackupAddrMode = AddrMode;
unsigned OldSize = AddrModeInsts.size();		unsigned OldSize = AddrModeInsts.size();

// Check to see if it is possible to fold this operation.		// Check to see if it is possible to fold this operation.
Show All 26 Lines	if (ConstantInt *CI = dyn_cast<ConstantInt>(Addr)) {
return true;		return true;
}		}

// Worse case, the target should support [reg] addressing modes. :)		// Worse case, the target should support [reg] addressing modes. :)
if (!AddrMode.HasBaseReg) {		if (!AddrMode.HasBaseReg) {
AddrMode.HasBaseReg = true;		AddrMode.HasBaseReg = true;
AddrMode.BaseReg = Addr;		AddrMode.BaseReg = Addr;
// Still check for legality in case the target supports [imm] but not [i+r].		// Still check for legality in case the target supports [imm] but not [i+r].
if (TLI.isLegalAddressingMode(DL, AddrMode, AccessTy, AddrSpace))		if (canFoldAddr(AddrMode))
return true;		return true;
AddrMode.HasBaseReg = false;		AddrMode.HasBaseReg = false;
AddrMode.BaseReg = nullptr;		AddrMode.BaseReg = nullptr;
}		}

// If the base register is already taken, see if we can do [r+r].		// If the base register is already taken, see if we can do [r+r].
if (AddrMode.Scale == 0) {		if (AddrMode.Scale == 0) {
AddrMode.Scale = 1;		AddrMode.Scale = 1;
AddrMode.ScaledReg = Addr;		AddrMode.ScaledReg = Addr;
if (TLI.isLegalAddressingMode(DL, AddrMode, AccessTy, AddrSpace))		if (canFoldAddr(AddrMode))
return true;		return true;
AddrMode.Scale = 0;		AddrMode.Scale = 0;
AddrMode.ScaledReg = nullptr;		AddrMode.ScaledReg = nullptr;
}		}
// Couldn't match.		// Couldn't match.
TPT.rollback(LastKnownGood);		TPT.rollback(LastKnownGood);
return false;		return false;
}		}
Show All 27 Lines
static constexpr int MaxMemoryUsesToScan = 100;		static constexpr int MaxMemoryUsesToScan = 100;

/// Recursively walk all the uses of I until we find a memory use.		/// Recursively walk all the uses of I until we find a memory use.
/// If we find an obviously non-foldable instruction, return true.		/// If we find an obviously non-foldable instruction, return true.
/// Add accessed addresses and types to MemoryUses.		/// Add accessed addresses and types to MemoryUses.
static bool FindAllMemoryUses(		static bool FindAllMemoryUses(
Instruction *I, SmallVectorImpl<std::pair<Use *, Type *>> &MemoryUses,		Instruction *I, SmallVectorImpl<std::pair<Use *, Type *>> &MemoryUses,
SmallPtrSetImpl<Instruction *> &ConsideredInsts, const TargetLowering &TLI,		SmallPtrSetImpl<Instruction *> &ConsideredInsts, const TargetLowering &TLI,
		const TargetSubtargetInfo &STI,
const TargetRegisterInfo &TRI, bool OptSize, ProfileSummaryInfo *PSI,		const TargetRegisterInfo &TRI, bool OptSize, ProfileSummaryInfo *PSI,
BlockFrequencyInfo *BFI, int &SeenInsts) {		BlockFrequencyInfo *BFI, int &SeenInsts) {
// If we already considered this instruction, we're done.		// If we already considered this instruction, we're done.
if (!ConsideredInsts.insert(I).second)		if (!ConsideredInsts.insert(I).second)
return false;		return false;

// If this is an obviously unfoldable instruction, bail out.		// If this is an obviously unfoldable instruction, bail out.
if (!MightBeFoldableInst(I))		if (!MightBeFoldableInst(I))
Show All 38 Lines	if (CallInst *CI = dyn_cast<CallInst>(UserI)) {
// If this is a cold call, we can sink the addressing calculation into		// If this is a cold call, we can sink the addressing calculation into
// the cold path. See optimizeCallInst		// the cold path. See optimizeCallInst
bool OptForSize =		bool OptForSize =
OptSize || llvm::shouldOptimizeForSize(CI->getParent(), PSI, BFI);		OptSize || llvm::shouldOptimizeForSize(CI->getParent(), PSI, BFI);
if (!OptForSize)		if (!OptForSize)
continue;		continue;
}		}

InlineAsm *IA = dyn_cast<InlineAsm>(CI->getCalledOperand());		if (InlineAsm *IA = dyn_cast<InlineAsm>(CI->getCalledOperand())) {
if (!IA)
return true;

// If this is a memory operand, we're cool, otherwise bail out.		// If this is a memory operand, we're cool, otherwise bail out.
if (!IsOperandAMemoryOperand(CI, IA, I, TLI, TRI))		if (!IsOperandAMemoryOperand(CI, IA, I, TLI, TRI))
return true;		return true;
continue;		continue;
}		}

if (FindAllMemoryUses(UserI, MemoryUses, ConsideredInsts, TLI, TRI, OptSize,		// Bail if call folding is not enabled.
		if (!STI.enableCallAddrFold())
		return true;

		// Intrinsics are handled elsewhere and we can't quite handle non-pointer
		// types yet.
		if (isa<IntrinsicInst>(UserI) || !I->getType()->isPointerTy())
		return true;

		MemoryUses.push_back({&U, I->getType()});
		continue;
		}

		if (FindAllMemoryUses(UserI, MemoryUses, ConsideredInsts, TLI, STI, TRI, OptSize,
PSI, BFI, SeenInsts))		PSI, BFI, SeenInsts))
return true;		return true;
}		}

return false;		return false;
}		}

static bool FindAllMemoryUses(		static bool FindAllMemoryUses(
Instruction *I, SmallVectorImpl<std::pair<Use *, Type *>> &MemoryUses,		Instruction *I, SmallVectorImpl<std::pair<Use *, Type *>> &MemoryUses,
const TargetLowering &TLI, const TargetRegisterInfo &TRI, bool OptSize,		const TargetLowering &TLI, const TargetSubtargetInfo &STI, const TargetRegisterInfo &TRI, bool OptSize,
ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI) {		ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI) {
int SeenInsts = 0;		int SeenInsts = 0;
SmallPtrSet<Instruction *, 16> ConsideredInsts;		SmallPtrSet<Instruction *, 16> ConsideredInsts;
return FindAllMemoryUses(I, MemoryUses, ConsideredInsts, TLI, TRI, OptSize,		return FindAllMemoryUses(I, MemoryUses, ConsideredInsts, TLI, STI, TRI, OptSize,
PSI, BFI, SeenInsts);		PSI, BFI, SeenInsts);
}		}

static unsigned MaxLoopInvariantUsesToScan = 20;		static unsigned MaxLoopInvariantUsesToScan = 20;
static bool isUsedInLoop(const Value *V, const Loop *L) {		static bool isUsedInLoop(const Value *V, const Loop *L) {
unsigned N = 0;		unsigned N = 0;

for (const Use &U : V->uses()) {		for (const Use &U : V->uses()) {
▲ Show 20 Lines • Show All 90 Lines • ▼ Show 20 Lines	bool AddressingModeMatcher::isProfitableToFoldIntoAddressingMode(
if (!BaseReg && !ScaledReg)		if (!BaseReg && !ScaledReg)
return true;		return true;

// If all uses of this instruction can have the address mode sunk into them,		// If all uses of this instruction can have the address mode sunk into them,
// we can remove the addressing mode and effectively trade one live register		// we can remove the addressing mode and effectively trade one live register
// for another (at worst.) In this context, folding an addressing mode into		// for another (at worst.) In this context, folding an addressing mode into
// the use is just a particularly nice way of sinking it.		// the use is just a particularly nice way of sinking it.
SmallVector<std::pair<Use *, Type *>, 16> MemoryUses;		SmallVector<std::pair<Use *, Type *>, 16> MemoryUses;
if (FindAllMemoryUses(I, MemoryUses, TLI, TRI, OptSize, PSI, BFI))		if (FindAllMemoryUses(I, MemoryUses, TLI, STI, TRI, OptSize, PSI, BFI))
return false; // Has a non-memory, non-foldable use!		return false; // Has a non-memory, non-foldable use!

// Now that we know that all uses of this instruction are part of a chain of		// Now that we know that all uses of this instruction are part of a chain of
// computation involving only operations that could theoretically be folded		// computation involving only operations that could theoretically be folded
// into a memory use, loop over each of these memory operation uses and see		// into a memory use, loop over each of these memory operation uses and see
// if they could actually fold the instruction. The assumption is that		// if they could actually fold the instruction. The assumption is that
// addressing modes are cheap and that duplicating the computation involved		// addressing modes are cheap and that duplicating the computation involved
// many times is worthwhile, even on a fastpath. For sinking candidates		// many times is worthwhile, even on a fastpath. For sinking candidates
Show All 10 Lines	for (const std::pair<Use , Type > &Pair : MemoryUses) {
// Do a match against the root of this address, ignoring profitability. This		// Do a match against the root of this address, ignoring profitability. This
// will tell us if the addressing mode for the memory operation will		// will tell us if the addressing mode for the memory operation will
// actually cover the shared instruction.		// actually cover the shared instruction.
ExtAddrMode Result;		ExtAddrMode Result;
std::pair<AssertingVH<GetElementPtrInst>, int64_t> LargeOffsetGEP(nullptr,		std::pair<AssertingVH<GetElementPtrInst>, int64_t> LargeOffsetGEP(nullptr,
0);		0);
TypePromotionTransaction::ConstRestorationPt LastKnownGood =		TypePromotionTransaction::ConstRestorationPt LastKnownGood =
TPT.getRestorationPoint();		TPT.getRestorationPoint();
AddressingModeMatcher Matcher(MatchedAddrModeInsts, TLI, TRI, LI, getDTFn,		AddressingModeMatcher Matcher(MatchedAddrModeInsts, TLI, STI, TRI, LI, getDTFn,
AddressAccessTy, AS, UserI, Result,		AddressAccessTy, AS, UserI, Result,
InsertedInsts, PromotedInsts, TPT,		InsertedInsts, PromotedInsts, TPT,
LargeOffsetGEP, OptSize, PSI, BFI);		LargeOffsetGEP, OptSize, PSI, BFI);
Matcher.IgnoreProfitability = true;		Matcher.IgnoreProfitability = true;
		if (STI.enableCallAddrFold())
		Matcher.ArgNo = getRegularCallArgNo(UserI, Address);
bool Success = Matcher.matchAddr(Address, 0);		bool Success = Matcher.matchAddr(Address, 0);
(void)Success;		(void)Success;
assert(Success && "Couldn't select anything?");		assert(Success && "Couldn't select anything?");

// The match was to check the profitability, the changes made are not		// The match was to check the profitability, the changes made are not
// part of the original matcher. Therefore, they should be dropped		// part of the original matcher. Therefore, they should be dropped
// otherwise the original matcher will not present the right state.		// otherwise the original matcher will not present the right state.
TPT.rollback(LastKnownGood);		TPT.rollback(LastKnownGood);
▲ Show 20 Lines • Show All 94 Lines • ▼ Show 20 Lines	while (!worklist.empty()) {
// actual use. It's expected that most address matches don't actually need		// actual use. It's expected that most address matches don't actually need
// the domtree.		// the domtree.
auto getDTFn = [MemoryInst, this]() -> const DominatorTree & {		auto getDTFn = [MemoryInst, this]() -> const DominatorTree & {
Function *F = MemoryInst->getParent()->getParent();		Function *F = MemoryInst->getParent()->getParent();
return this->getDT(*F);		return this->getDT(*F);
};		};
ExtAddrMode NewAddrMode = AddressingModeMatcher::Match(		ExtAddrMode NewAddrMode = AddressingModeMatcher::Match(
V, AccessTy, AddrSpace, MemoryInst, AddrModeInsts, TLI, LI, getDTFn,		V, AccessTy, AddrSpace, MemoryInst, AddrModeInsts, TLI, LI, getDTFn,
*TRI, InsertedInsts, PromotedInsts, TPT, LargeOffsetGEP, OptSize, PSI,		SubtargetInfo, TRI, InsertedInsts, PromotedInsts, TPT, LargeOffsetGEP, OptSize, PSI,
BFI.get());		BFI.get());

GetElementPtrInst *GEP = LargeOffsetGEP.first;		GetElementPtrInst *GEP = LargeOffsetGEP.first;
if (GEP && !NewGEPBases.count(GEP)) {		if (GEP && !NewGEPBases.count(GEP)) {
// If splitting the underlying data structure can reduce the offset of a		// If splitting the underlying data structure can reduce the offset of a
// GEP, collect the GEP. Skip the GEPs that are the new bases of		// GEP, collect the GEP. Skip the GEPs that are the new bases of
// previously split data structures.		// previously split data structures.
LargeOffsetGEPMap[GEP->getPointerOperand()].push_back(LargeOffsetGEP);		LargeOffsetGEPMap[GEP->getPointerOperand()].push_back(LargeOffsetGEP);

llvm/lib/Target/AArch64/AArch64ISelLowering.h

Show First 20 Lines • Show All 836 Lines • ▼ Show 20 Lines	public:
bool isComplexDeinterleavingOperationSupported(		bool isComplexDeinterleavingOperationSupported(
ComplexDeinterleavingOperation Operation, Type *Ty) const override;		ComplexDeinterleavingOperation Operation, Type *Ty) const override;

Value *createComplexDeinterleavingIR(		Value *createComplexDeinterleavingIR(
Instruction *I, ComplexDeinterleavingOperation OperationType,		Instruction *I, ComplexDeinterleavingOperation OperationType,
ComplexDeinterleavingRotation Rotation, Value *InputA, Value *InputB,		ComplexDeinterleavingRotation Rotation, Value *InputA, Value *InputB,
Value *Accumulator = nullptr) const override;		Value *Accumulator = nullptr) const override;

		bool canFoldAddrModeIntoCall(const CallBase &CB, unsigned ArgNo,
		const AddrMode &AM) const override;

bool supportSplitCSR(MachineFunction *MF) const override {		bool supportSplitCSR(MachineFunction *MF) const override {
return MF->getFunction().getCallingConv() == CallingConv::CXX_FAST_TLS &&		return MF->getFunction().getCallingConv() == CallingConv::CXX_FAST_TLS &&
MF->getFunction().hasFnAttribute(Attribute::NoUnwind);		MF->getFunction().hasFnAttribute(Attribute::NoUnwind);
}		}
void initializeSplitCSR(MachineBasicBlock *Entry) const override;		void initializeSplitCSR(MachineBasicBlock *Entry) const override;
void insertCopiesSplitCSR(		void insertCopiesSplitCSR(
MachineBasicBlock *Entry,		MachineBasicBlock *Entry,
const SmallVectorImpl<MachineBasicBlock *> &Exits) const override;		const SmallVectorImpl<MachineBasicBlock *> &Exits) const override;

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Show First 20 Lines • Show All 24,511 Lines • ▼ Show 20 Lines	if (OperationType == ComplexDeinterleavingOperation::CAdd) {
if (IntId == Intrinsic::not_intrinsic)		if (IntId == Intrinsic::not_intrinsic)
return nullptr;		return nullptr;

return B.CreateIntrinsic(IntId, Ty, {InputA, InputB});		return B.CreateIntrinsic(IntId, Ty, {InputA, InputB});
}		}

return nullptr;		return nullptr;
}		}

		bool AArch64TargetLowering::canFoldAddrModeIntoCall(const CallBase &CB,
		unsigned ArgNo,
		const AddrMode &AM) const {
		// We should always accept a single base register.
		if (TargetLowering::canFoldAddrModeIntoCall(CB, ArgNo, AM))
		return true;

		if (ArgNo > 7 || AM.BaseGV || !AM.HasBaseReg)
		efriedma: Counting arguments like this won't do what you want if there are arguments which are passed in multiple registers.
		chill: Indeed. For now, I'm proposing a different approach, doing the transformation entirely in `MachineSink`. The relevant review is https://reviews.llvm.org/D152828 If it doesn't fly, I'll come back to this review and think how to solve this above issue.
		return false;

		// For more complex addressing modes, check the possibility of a cheap
		// materialisation into an argument register.

		// reg + imm
		if (AM.Scale == 0)
		return isLegalAddImmediate(AM.BaseOffs);

		if (AM.BaseOffs != 0)
		return false;

		// reg + scale * reg
		if (AM.Scale == 1)
		return true;

		// Some CPUs have fast `reg + scale * reg` instruction, for scales of 2, 4, 8, and 16.
		if (!Subtarget->hasLSLFast() || AM.Scale <= 0)
		return false;

		uint64_t S = uint64_t(AM.Scale);
		return (S & (S - 1)) == 0 && S <= 16;
		}
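
As a reading aid, the order of checks in `AArch64TargetLowering::canFoldAddrModeIntoCall` above can be modelled outside LLVM. What follows is a minimal standalone sketch, not the patch's code. `AddrModeSketch`, `makeAM`, `baseCanFoldSketch`, `isLegalAddImmediateSketch` and the `HasLSLFast` parameter are hypothetical stand-ins. The base-class behaviour is assumed from the comment "We should always accept a single base register", and the immediate check is assumed to model an AArch64 ADD/SUB immediate as a 12-bit value, optionally shifted left by 12.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical, simplified stand-in for TargetLowering::AddrMode.
// The real struct stores a GlobalValue pointer; a flag is enough here.
struct AddrModeSketch {
  bool HasBaseGV;
  int64_t BaseOffs;
  bool HasBaseReg;
  int64_t Scale;
};

// Convenience constructor for the examples (hypothetical helper).
AddrModeSketch makeAM(bool HasBaseReg, int64_t BaseOffs, int64_t Scale) {
  return AddrModeSketch{false, BaseOffs, HasBaseReg, Scale};
}

// Assumption: an ADD/SUB immediate is 12 bits, optionally shifted left by 12.
bool isLegalAddImmediateSketch(int64_t Immed) {
  if (Immed == std::numeric_limits<int64_t>::min())
    return false;
  uint64_t Abs = Immed < 0 ? uint64_t(-Immed) : uint64_t(Immed);
  return (Abs >> 12) == 0 || ((Abs & 0xfff) == 0 && (Abs >> 24) == 0);
}

// Assumption: the default hook accepts only a plain base register.
bool baseCanFoldSketch(const AddrModeSketch &AM) {
  return !AM.HasBaseGV && AM.HasBaseReg && AM.BaseOffs == 0 && AM.Scale == 0;
}

// Mirrors the order of checks in the AArch64 override shown above.
bool canFoldAddrModeIntoCallSketch(unsigned ArgNo, const AddrModeSketch &AM,
                                   bool HasLSLFast) {
  if (baseCanFoldSketch(AM))
    return true;
  if (ArgNo > 7 || AM.HasBaseGV || !AM.HasBaseReg)
    return false;
  if (AM.Scale == 0) // reg + imm
    return isLegalAddImmediateSketch(AM.BaseOffs);
  if (AM.BaseOffs != 0)
    return false;
  if (AM.Scale == 1) // reg + reg
    return true;
  if (!HasLSLFast || AM.Scale <= 0)
    return false;
  uint64_t S = uint64_t(AM.Scale); // reg + scale * reg, scale in {2, 4, 8, 16}
  return (S & (S - 1)) == 0 && S <= 16;
}

// Address shapes resembling those in call-addr-fold.ll.
void checkExamples() {
  assert(canFoldAddrModeIntoCallSketch(0, makeAM(true, 8, 0), false));  // like f0
  assert(canFoldAddrModeIntoCallSketch(0, makeAM(true, 0, 1), false));  // like f1
  assert(!canFoldAddrModeIntoCallSketch(0, makeAM(true, 0, 4), false)); // like f2
  assert(canFoldAddrModeIntoCallSketch(0, makeAM(true, 0, 4), true));   // like f3
}
```

Note that the sketch inherits the limitation raised in the inline comment: `ArgNo > 7` counts IR arguments, not argument registers, so it gives the wrong answer when an earlier argument occupies more than one register.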

llvm/lib/Target/AArch64/AArch64Subtarget.h

Show First 20 Lines • Show All 184 Lines • ▼ Show 20 Lines	#include "AArch64GenSubtargetInfo.inc"
const CallLowering *getCallLowering() const override;		const CallLowering *getCallLowering() const override;
const InlineAsmLowering *getInlineAsmLowering() const override;		const InlineAsmLowering *getInlineAsmLowering() const override;
InstructionSelector *getInstructionSelector() const override;		InstructionSelector *getInstructionSelector() const override;
const LegalizerInfo *getLegalizerInfo() const override;		const LegalizerInfo *getLegalizerInfo() const override;
const RegisterBankInfo *getRegBankInfo() const override;		const RegisterBankInfo *getRegBankInfo() const override;
const Triple &getTargetTriple() const { return TargetTriple; }		const Triple &getTargetTriple() const { return TargetTriple; }
bool enableMachineScheduler() const override { return true; }		bool enableMachineScheduler() const override { return true; }
bool enablePostRAScheduler() const override { return usePostRAScheduler(); }		bool enablePostRAScheduler() const override { return usePostRAScheduler(); }
		bool enableCallAddrFold() const override { return true; }

/// Returns ARM processor family.		/// Returns ARM processor family.
/// Avoid this function! CPU specifics should be kept local to this class		/// Avoid this function! CPU specifics should be kept local to this class
/// and preferably modeled with SubtargetFeatures or properties in		/// and preferably modeled with SubtargetFeatures or properties in
/// initializeProperties().		/// initializeProperties().
ARMProcFamilyEnum getProcFamily() const {		ARMProcFamilyEnum getProcFamily() const {
return ARMProcFamily;		return ARMProcFamily;
}		}


llvm/test/CodeGen/AArch64/call-addr-fold.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -S -codegenprepare < %s | FileCheck %s
				target triple = "aarch64-linux"

				declare void @use(...)

				; Check address is sunk towards the load, since CodeGenPrepare considers it likely to be
				; "folded" into the call as well.
				define i32 @f0(i1 %c1, ptr %p) {
				; CHECK-LABEL: @f0(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[A:%.*]] = getelementptr i32, ptr [[P:%.*]], i32 2
				; CHECK-NEXT: br i1 [[C1:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[V0:%.*]] = call i32 @use(ptr [[A]])
				; CHECK-NEXT: br label [[EXIT:%.*]]
				; CHECK: if.else:
				; CHECK-NEXT: [[SUNKADDR:%.*]] = getelementptr i8, ptr [[P]], i64 8
				; CHECK-NEXT: [[V1:%.*]] = load i32, ptr [[SUNKADDR]], align 4
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: [[V:%.*]] = phi i32 [ [[V0]], [[IF_THEN]] ], [ [[V1]], [[IF_ELSE]] ]
				; CHECK-NEXT: ret i32 [[V]]
				;
				entry:
				%a = getelementptr i32, ptr %p, i32 2
				br i1 %c1, label %if.then, label %if.else

				if.then:
				%v0 = call i32 @use(ptr %a)
				br label %exit

				if.else:
				%v1 = load i32, ptr %a
				br label %exit

				exit:
				%v = phi i32 [%v0, %if.then], [%v1, %if.else]
				ret i32 %v
				}

				define i32 @f1(i1 %c1, ptr %p, i64 %i) {
				; CHECK-LABEL: @f1(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[A:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 [[I:%.*]]
				; CHECK-NEXT: br i1 [[C1:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[V0:%.*]] = call i32 @use(ptr [[A]])
				; CHECK-NEXT: br label [[EXIT:%.*]]
				; CHECK: if.else:
				; CHECK-NEXT: [[SUNKADDR:%.*]] = getelementptr i8, ptr [[P]], i64 [[I]]
				; CHECK-NEXT: [[V1:%.*]] = load i32, ptr [[SUNKADDR]], align 4
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: [[V:%.*]] = phi i32 [ [[V0]], [[IF_THEN]] ], [ [[V1]], [[IF_ELSE]] ]
				; CHECK-NEXT: ret i32 [[V]]
				;
				entry:
				%a = getelementptr i8, ptr %p, i64 %i
				br i1 %c1, label %if.then, label %if.else

				if.then:
				%v0 = call i32 @use(ptr %a)
				br label %exit

				if.else:
				%v1 = load i32, ptr %a
				br label %exit

				exit:
				%v = phi i32 [%v0, %if.then], [%v1, %if.else]
				ret i32 %v
				}

				; Address calculation too complex.
				define i32 @f2(i1 %c1, ptr %p, i64 %i) {
				; CHECK-LABEL: @f2(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[A:%.*]] = getelementptr i32, ptr [[P:%.*]], i64 [[I:%.*]]
				; CHECK-NEXT: br i1 [[C1:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[V0:%.*]] = call i32 @use(ptr [[A]])
				; CHECK-NEXT: br label [[EXIT:%.*]]
				; CHECK: if.else:
				; CHECK-NEXT: [[V1:%.*]] = load i32, ptr [[A]], align 4
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: [[V:%.*]] = phi i32 [ [[V0]], [[IF_THEN]] ], [ [[V1]], [[IF_ELSE]] ]
				; CHECK-NEXT: ret i32 [[V]]
				;
				entry:
				%a = getelementptr i32, ptr %p, i64 %i
				br i1 %c1, label %if.then, label %if.else

				if.then:
				%v0 = call i32 @use(ptr %a)
				br label %exit

				if.else:
				%v1 = load i32, ptr %a
				br label %exit

				exit:
				%v = phi i32 [%v0, %if.then], [%v1, %if.else]
				ret i32 %v
				}

				; Address calculation cheap enough on some cores.
				define i32 @f3(i1 %c1, ptr %p, i64 %i) "target-cpu"="neoverse-n1" {
				; CHECK-LABEL: @f3(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[A:%.*]] = getelementptr i32, ptr [[P:%.*]], i64 [[I:%.*]]
				; CHECK-NEXT: br i1 [[C1:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[V0:%.*]] = call i32 @use(ptr [[A]])
				; CHECK-NEXT: br label [[EXIT:%.*]]
				; CHECK: if.else:
				; CHECK-NEXT: [[SUNKADDR:%.*]] = mul i64 [[I]], 4
				; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, ptr [[P]], i64 [[SUNKADDR]]
				; CHECK-NEXT: [[V1:%.*]] = load i32, ptr [[SUNKADDR1]], align 4
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: [[V:%.*]] = phi i32 [ [[V0]], [[IF_THEN]] ], [ [[V1]], [[IF_ELSE]] ]
				; CHECK-NEXT: ret i32 [[V]]
				;
				entry:
				%a = getelementptr i32, ptr %p, i64 %i
				br i1 %c1, label %if.then, label %if.else

				if.then:
				%v0 = call i32 @use(ptr %a)
				br label %exit

				if.else:
				%v1 = load i32, ptr %a
				br label %exit

				exit:
				%v = phi i32 [%v0, %if.then], [%v1, %if.else]
				ret i32 %v
				}

				define i32 @f4(i1 %c1, ptr %p, i64 %i) {
				; CHECK-LABEL: @f4(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[A:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 [[I:%.*]]
				; CHECK-NEXT: br i1 [[C1:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[V0:%.*]] = call i32 @use(ptr [[A]])
				; CHECK-NEXT: br label [[EXIT:%.*]]
				; CHECK: if.else:
				; CHECK-NEXT: [[V1:%.*]] = call i32 @use(i32 1, ptr [[A]])
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: [[V:%.*]] = phi i32 [ [[V0]], [[IF_THEN]] ], [ [[V1]], [[IF_ELSE]] ]
				; CHECK-NEXT: ret i32 [[V]]
				;
				entry:
				%a = getelementptr i8, ptr %p, i64 %i
				br i1 %c1, label %if.then, label %if.else

				if.then:
				%v0 = call i32 @use(ptr %a)
				br label %exit

				if.else:
				%v1 = call i32 @use(i32 1, ptr %a)
				br label %exit

				exit:
				%v = phi i32 [%v0, %if.then], [%v1, %if.else]
				ret i32 %v
				}
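
The sunk addresses in the CHECK lines are byte-based: in `@f0` the `i32` index 2 reappears as `getelementptr i8, ptr [[P]], i64 8`, and in `@f3` the index is first multiplied by 4. A minimal C++ sketch of that equivalence, assuming a 4-byte `i32`, is below. `byteOffset`, `typedAddr` and `byteAddr` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Byte offset of element Index when each element is ElemSize bytes wide.
std::int64_t byteOffset(std::int64_t Index, std::int64_t ElemSize) {
  return Index * ElemSize;
}

// Typed addressing, analogous to `getelementptr i32, ptr %p, i64 %i`.
const std::int32_t *typedAddr(const std::int32_t *P, std::int64_t I) {
  return P + I;
}

// Byte addressing, analogous to `mul i64 %i, 4` followed by
// `getelementptr i8, ptr %p, i64 %scaled`.
const std::int32_t *byteAddr(const std::int32_t *P, std::int64_t I) {
  const char *Raw = reinterpret_cast<const char *>(P);
  std::int64_t Offset =
      byteOffset(I, static_cast<std::int64_t>(sizeof(std::int32_t)));
  return reinterpret_cast<const std::int32_t *>(Raw + Offset);
}
```

Both functions yield the same address for an in-bounds index, which is why the sunk form can replace the original GEP at the load.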
