This is an archive of the discontinued LLVM Phabricator instance.

Differential D143898

[CodeGenPrepare] Relax conditions for folding addressing mode into loads/stores
Abandoned, Public

Authored by chill on Feb 13 2023, 2:50 AM.

Details

Reviewers

fhahn
t.p.northover
efriedma

Summary

The sinking of address computations to their users (loads/stores)
is often blocked by call instructions, which take the address as
a parameter - unless the call is "cold", it's considered a non-foldable use.

Considering the whole call sequence, including passing the arguments,
it is sometimes possible to materialize an address computation directly
into a hard register, in a sense "to fold the addressing mode into the call".

For example, on AArch64 the register-to-register copy
instruction ("C6.2.190 MOV (register)"), which would likely be used to pass
a pre-computed address argument, is an alias of "C6.2.207 ORR (shifted register)"
and typically has the same latency and throughput as an "ADD" instruction.

This change tries to allow sinking of more addresses to load/store instructions
by preventing some call instructions from being blockers.

With this change CodeGenPrepare still does sinking only towards memory
loads/stores. It works in synergy with a MachineSink patch in
https://reviews.llvm.org/D145706, which does sinking towards calls.

This patch (together with the others up/down the stack) improves
SPECv6 500.perlbench_r by about 3.26% and the whole
of SPECv6 intrate by about 0.46% (geomean).

Event Timeline

chill created this revision. Feb 13 2023, 2:50 AM

Herald added a project: Restricted Project. Feb 13 2023, 2:50 AM

Herald added subscribers: pengfei, hiraditya.

chill requested review of this revision. Feb 13 2023, 2:50 AM

Herald added a project: Restricted Project. Feb 13 2023, 2:50 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B213380: Diff 496899. Feb 13 2023, 2:51 AM

chill added a parent revision: D143897: [CodeGenPrepare] Estimate liveness of loop invariants when checking for address folding profitability. Feb 13 2023, 2:57 AM

chill edited the summary of this revision. Feb 13 2023, 7:14 AM

Herald added a subscriber: kristof.beyls. Feb 13 2023, 7:14 AM

chill edited the summary of this revision. Feb 13 2023, 7:54 AM

chill added reviewers: fhahn, t.p.northover, efriedma. Feb 13 2023, 8:37 AM

Herald added a subscriber: StephenFan. Feb 13 2023, 8:37 AM

chill edited the summary of this revision. Feb 13 2023, 8:38 AM

If I'm understanding correctly, the point is that we don't want to block sinking if an address computation has multiple uses, where only some are foldable?

Expressing this in terms of "folding into a call" seems confusing and unnecessary; we should just change the logic to allow sinking if we can fold some, but not all, of the uses. (Maybe including some sort of analysis of whether sinking increases the number of times we perform the address computation at runtime.)

In D143898#4123601, @efriedma wrote:

If I'm understanding correctly, the point is that we don't want to block sinking if an address computation has multiple uses, where only some are foldable?

Yes, for when we cannot say we aren't extending live ranges of the address "registers".

Expressing this in terms of "folding into a call" seems confusing and unnecessary;

To me it looks like a straightforward analogy.
For load/stores we have an addressing computation that's essentially free
when it's a part of the load/store instruction, as opposed to it being a separate instruction and using a simple indirect load/store.
For calls we have an addressing computation that's essentially free
when it's a part of the call sequence, as opposed to it being a separate instruction and using a simple register-to-register move.

we should just change the logic to allow sinking if we can fold some, but not all, of the uses.
(Maybe including some sort of analysis of whether sinking increases the number of times we perform the address computation at runtime.)

IMHO the main issue here is not extending live ranges too much. As for the runtime overhead, sinking a foldable address is assumed to
not increase the execution time, because of the checks in isLegalAddressingMode and (with this patch) in canFoldAddrModeIntoCall.

Perhaps, with PGO, we could use block frequencies in a way that a non-foldable, infrequently executed user does not prevent sinking, much like
the way we currently use Attribute::Cold. However, I don't think that's an alternative to this patch.
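
The decision described here can be condensed into a small standalone sketch. It mirrors the shape of `canFoldAddr` in the diff below, but `FoldQuery` and `canFoldAddrSketch` are hypothetical names, and each boolean field only stands in for the result of the corresponding LLVM query.

```cpp
#include <cassert>

// Hypothetical stand-ins for the queries canFoldAddr makes in this patch.
struct FoldQuery {
  bool IsCallArg;         // the user is a regular call taking the address as an argument
  bool FoldableIntoCall;  // stands in for TLI.canFoldAddrModeIntoCall(...)
  bool CallIsCold;        // stands in for CB.hasFnAttr(Attribute::Cold)
  bool OptForSize;        // stands in for OptSize || shouldOptimizeForSize(...)
  bool LegalForMemAccess; // stands in for TLI.isLegalAddressingMode(...)
};

// Sketch of the acceptance rule: a call user accepts the address if the target
// says it folds into the call sequence; otherwise only a cold call that is not
// optimized for size falls through to the usual legality check.
bool canFoldAddrSketch(const FoldQuery &Q) {
  if (Q.IsCallArg) {
    if (Q.FoldableIntoCall)
      return true;
    if (!Q.CallIsCold || Q.OptForSize)
      return false;
  }
  return Q.LegalForMemAccess;
}
```

Under this rule a hot call whose argument the target cannot fold still blocks sinking, which is the behaviour the patch keeps for such calls.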

For calls we have an addressing computation that's essentially free when it's a part of the call sequence, as opposed to it being a separate instruction and using a simple register-to-register move.

There are a lot of cases where there isn't any extra "mov" that can hide the cost; the most common of those being the case where the value in question is spilled. But I guess I can see the analogy. I'd generally prefer to consider that sort of transform in terms of rematerialization, though. We can compute the cost of remat much more accurately in the register allocator.

IMHO the main issue here is not extending live ranges too much. As for the runtime overhead, sinking a foldable address is assumed to not increase the execution time, because of the checks in isLegalAddressingMode and (with this patch) in canFoldAddrModeIntoCall.

I was thinking more in terms of sinking to arbitrary uses, as opposed to only sinking some uses.

chill updated this revision to Diff 503842. Mar 9 2023, 10:35 AM

chill added a child revision: D145706: [MachineSink] Sink instruction copies when they can replace copy into hard register.

chill edited the summary of this revision. Mar 9 2023, 10:49 AM

Harbormaster completed remote builds in B218443: Diff 503842. Mar 9 2023, 12:25 PM

chill retitled this revision from "[CodeGenPrepare] Fold addressing mode into calls" to "[CodeGenPrepare] Relax conditions for folding addressing mode into loads/stores". Mar 10 2023, 8:25 AM

Ping?

For constructs like the given testcase, you don't need to reason about whether it's profitable to fold into a call; cloning a GEP into both sides of an if-else doesn't actually increase execution time, so it's obviously profitable even if you can only fold it on one side of the if-else. Can you add some examples that actually require predicting whether the GEP is as cheap as a mov, and the argument is passed in a register?
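
A source-level sketch of the if-else shape discussed in this comment follows. It assumes the test case resembles this pattern (the contents of call-addr-fold.ll are not shown on this page), and `useAddress`, `computeThenBranch` and `branchThenCompute` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical callee that receives the address as an argument.
long useAddress(const int *P) { return *P + 1; }

// Address computed once, above the branch; both paths use it.
long computeThenBranch(const int *Base, long Idx, bool TakeCall) {
  const int *Addr = Base + Idx + 4;
  if (TakeCall)
    return useAddress(Addr);
  return *Addr;
}

// Address computation cloned into each side of the branch. Every execution
// path still computes it exactly once, and the load side can use a
// base + index + offset addressing mode where the target has one.
long branchThenCompute(const int *Base, long Idx, bool TakeCall) {
  if (TakeCall)
    return useAddress(Base + Idx + 4);
  return Base[Idx + 4];
}
```

Both functions perform one address computation per call on either path, so the cloned form does no extra dynamic work even when only the load side folds.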

llvm/include/llvm/CodeGen/TargetLowering.h
3170	In this case, the addressing mode doesn't actually represent any computation, so it isn't relevant for this transform; when do you expect it to become relevant?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24358	Counting arguments like this won't do what you want if there are arguments which are passed in multiple registers.
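
The concern about counting arguments can be illustrated with a toy model. The AArch64ISelLowering.cpp hunk is not shown on this page, so this sketch only assumes a check that compares an argument index against the eight general-purpose argument registers x0..x7. The function names are hypothetical, and the model ignores register alignment rules and floating-point arguments.

```cpp
#include <cassert>
#include <vector>

// AAPCS64 passes the first eight integer/pointer arguments in x0..x7.
constexpr unsigned NumGPRArgRegs = 8;

// Naive check: treat the argument index as if it were the register index.
bool inRegisterByArgIndex(unsigned ArgNo) { return ArgNo < NumGPRArgRegs; }

// Register-aware check: add up the registers consumed by earlier arguments.
// RegsPerArg[I] is the number of GPRs argument I occupies (e.g. 2 for i128).
bool inRegisterByRegCount(const std::vector<unsigned> &RegsPerArg,
                          unsigned ArgNo) {
  unsigned Used = 0;
  for (unsigned I = 0; I < ArgNo; ++I)
    Used += RegsPerArg[I];
  return Used + RegsPerArg[ArgNo] <= NumGPRArgRegs;
}
```

With four two-register arguments followed by a pointer, the naive check reports the pointer (index 4) as register-passed, while the register-aware count shows all eight registers are already taken.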

chill added inline comments. May 10 2023, 7:45 AM

llvm/include/llvm/CodeGen/TargetLowering.h
3170	The logic of the `AddressingModeMatcher` is that it accumulates an addressing mode from partial "expressions". For example an addressing mode which ends up as `[reg + imm]` or `[reg + reg]` would have involved checks for legality of just `[reg]` here: https://github.com/llvm/llvm-project/blob/ddfb974d0fca62e3eaeb98b79b5e29738c9082d2/llvm/lib/CodeGen/CodeGenPrepare.cpp#L4926
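
The incremental matching described in this comment can be modelled with a toy matcher. This is not the real `AddressingModeMatcher`: `ToyAddrMode`, `ToyMatcher` and the legality rule are hypothetical, chosen only to show that a plain `[reg]` candidate is checked before `[reg + imm]` is reached.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy addressing mode: an optional base register plus a constant offset.
struct ToyAddrMode {
  bool HasBaseReg = false;
  int64_t BaseOffs = 0;
};

// Toy matcher: each step builds a candidate from the current mode, submits it
// to a legality check, and commits it only if the check passes.
class ToyMatcher {
public:
  std::vector<ToyAddrMode> Checked; // every candidate that was checked
  ToyAddrMode Mode;                 // the mode accumulated so far

  bool addBaseReg() {
    if (Mode.HasBaseReg)
      return false;
    ToyAddrMode T = Mode;
    T.HasBaseReg = true;
    return commitIfLegal(T);
  }

  bool addOffset(int64_t Off) {
    ToyAddrMode T = Mode;
    T.BaseOffs += Off;
    return commitIfLegal(T);
  }

private:
  // Hypothetical legality rule: needs a base register and a small offset.
  static bool isLegal(const ToyAddrMode &M) {
    return M.HasBaseReg && M.BaseOffs >= 0 && M.BaseOffs < 4096;
  }

  bool commitIfLegal(const ToyAddrMode &T) {
    Checked.push_back(T);
    if (!isLegal(T))
      return false;
    Mode = T;
    return true;
  }
};
```

Calling `addBaseReg()` and then `addOffset(16)` records two checked candidates, `[reg]` followed by `[reg + 16]`. A hook that rejected plain `[reg]` would stop this toy matcher at its first step.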

chill planned changes to this revision. Jun 2 2023, 6:13 AM

chill removed a child revision: D145706: [MachineSink] Sink instruction copies when they can replace copy into hard register.

chill removed a parent revision: D143897: [CodeGenPrepare] Estimate liveness of loop invariants when checking for address folding profitability.

chill added inline comments. Jun 13 2023, 9:42 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24358	Indeed. For now, I'm proposing a different approach, doing the transformation entirely in `MachineSink`. The relevant review is https://reviews.llvm.org/D152828. If it doesn't fly, I'll come back to this review and think about how to solve the above issue.

In D143898#4303461, @efriedma wrote:

For constructs like the given testcase, you don't need to reason about whether it's profitable to fold into a call; cloning a GEP into both sides of an if-else doesn't actually increase execution time, so it's obviously profitable even if you can only fold it on one side of the if-else. Can you add some examples that actually require predicting whether the GEP is as cheap as a mov, and the argument is passed in a register?

These test cases are constructed just to have the addressing computation and its users in separate basic blocks.
In our motivating examples, we have the addressing computation as loop invariant and its users inside a loop body, so
just sinking copies might increase run time.

I have added such test cases in https://reviews.llvm.org/D152828 : sink-and-fold.ll, functions f4 and f5.
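
The loop-invariant case mentioned above can be made concrete by counting address computations. This is a toy illustration with hypothetical names (`computeAddr`, `readThrough`, `hoisted`, `sunk`); the counter stands in for the dynamic number of executed address-computation instructions.

```cpp
#include <cassert>

// Counts how many times the address computation runs.
static unsigned NumAddrComputations = 0;

const int *computeAddr(const int *Base, long Off) {
  ++NumAddrComputations;
  return Base + Off;
}

// Hypothetical user of the address inside the loop body.
long readThrough(const int *P) { return *P; }

// Loop-invariant address computed once, outside the loop.
long hoisted(const int *Base, long Off, unsigned Trips) {
  const int *Addr = computeAddr(Base, Off);
  long Sum = 0;
  for (unsigned I = 0; I < Trips; ++I)
    Sum += readThrough(Addr);
  return Sum;
}

// Computation copied next to its user: it now runs on every iteration.
long sunk(const int *Base, long Off, unsigned Trips) {
  long Sum = 0;
  for (unsigned I = 0; I < Trips; ++I)
    Sum += readThrough(computeAddr(Base, Off));
  return Sum;
}
```

For ten iterations the hoisted form runs the computation once and the sunk form runs it ten times, so the sunk copy only pays off if each copy is effectively free at its user.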

chill abandoned this revision. Oct 23 2023, 10:38 AM

Revision Contents

Path                                                           Size
llvm/include/llvm/CodeGen/TargetLowering.h                     7 lines
llvm/lib/CodeGen/CodeGenPrepare.cpp                            111 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h                  3 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp                32 lines
llvm/test/CodeGen/AArch64/call-addr-fold.ll                    172 lines
llvm/test/CodeGen/X86/dagcombine-tokenfactor-limit-crash.ll    2 lines

Diff 496899

llvm/include/llvm/CodeGen/TargetLowering.h

[...]	public:
/// returned.		/// returned.
virtual Value *createComplexDeinterleavingIR(		virtual Value *createComplexDeinterleavingIR(
Instruction *I, ComplexDeinterleavingOperation OperationType,		Instruction *I, ComplexDeinterleavingOperation OperationType,
ComplexDeinterleavingRotation Rotation, Value *InputA, Value *InputB,		ComplexDeinterleavingRotation Rotation, Value *InputA, Value *InputB,
Value *Accumulator = nullptr) const {		Value *Accumulator = nullptr) const {
return nullptr;		return nullptr;
}		}

		/// Return true if it is beneficial to fold address calculation into a call
		/// sequence.
		virtual bool canFoldAddrModeIntoCall(const CallBase &CB, unsigned ArgNo,
		const AddrMode &AM) const {
		return AM.HasBaseReg && AM.BaseGV == nullptr && AM.Scale == 0 && AM.BaseOffs == 0;
		efriedma: In this case, the addressing mode doesn't actually represent any computation, so it isn't relevant for this transform; when do you expect it to become relevant?
		chill: The logic of the `AddressingModeMatcher` is that it accumulates an addressing mode from partial "expressions". For example an addressing mode which ends up as `[reg + imm]` or `[reg + reg]` would have involved checks for legality of just `[reg]` here: https://github.com/llvm/llvm-project/blob/ddfb974d0fca62e3eaeb98b79b5e29738c9082d2/llvm/lib/CodeGen/CodeGenPrepare.cpp#L4926
		}

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
// Runtime Library hooks		// Runtime Library hooks
//		//

/// Rename the default libcall routine name for the specified libcall.		/// Rename the default libcall routine name for the specified libcall.
void setLibcallName(RTLIB::Libcall Call, const char *Name) {		void setLibcallName(RTLIB::Libcall Call, const char *Name) {
LibcallRoutineNames[Call] = Name;		LibcallRoutineNames[Call] = Name;
}		}

llvm/lib/CodeGen/CodeGenPrepare.cpp

[...]	static bool despeculateCountZeros(IntrinsicInst *CountZeros,
// We are explicitly handling the zero case, so we can set the intrinsic's		// We are explicitly handling the zero case, so we can set the intrinsic's
// undefined zero argument to 'true'. This will also prevent reprocessing the		// undefined zero argument to 'true'. This will also prevent reprocessing the
// intrinsic; we only despeculate when a zero input is defined.		// intrinsic; we only despeculate when a zero input is defined.
CountZeros->setArgOperand(1, Builder.getTrue());		CountZeros->setArgOperand(1, Builder.getTrue());
ModifiedDT = ModifyDT::ModifyBBDT;		ModifiedDT = ModifyDT::ModifyBBDT;
return true;		return true;
}		}

		static bool isRegularCall(const CallBase *CB) {
		return !CB->isInlineAsm() && CB->getIntrinsicID() == Intrinsic::not_intrinsic;
		}

		static const CallBase *getRegularCall(const Instruction *I) {
		if (const auto *CB = dyn_cast<CallBase>(I); CB && isRegularCall(CB))
		return CB;
		return nullptr;
		}

bool CodeGenPrepare::optimizeCallInst(CallInst *CI, ModifyDT &ModifiedDT) {		bool CodeGenPrepare::optimizeCallInst(CallInst *CI, ModifyDT &ModifiedDT) {
BasicBlock *BB = CI->getParent();		BasicBlock *BB = CI->getParent();

// Lower inline assembly if we can.		// Lower inline assembly if we can.
// If we found an inline asm expression, and if the target knows how to		// If we found an inline asm expression, and if the target knows how to
// lower it to normal LLVM code, do so now.		// lower it to normal LLVM code, do so now.
if (CI->isInlineAsm()) {		if (CI->isInlineAsm()) {
if (TLI->ExpandInlineAsm(CI)) {		if (TLI->ExpandInlineAsm(CI)) {
[...]	if (MemTransferInst *MTI = dyn_cast<MemTransferInst>(MI)) {
MTI->setSourceAlignment(SrcAlign);		MTI->setSourceAlignment(SrcAlign);
}		}
}		}

// If we have a cold call site, try to sink addressing computation into the		// If we have a cold call site, try to sink addressing computation into the
// cold block. This interacts with our handling for loads and stores to		// cold block. This interacts with our handling for loads and stores to
// ensure that we can fold all uses of a potential addressing computation		// ensure that we can fold all uses of a potential addressing computation
// into their uses. TODO: generalize this to work over profiling data		// into their uses. TODO: generalize this to work over profiling data
if (CI->hasFnAttr(Attribute::Cold) && !OptSize &&		if (isRegularCall(CI)) {
!llvm::shouldOptimizeForSize(BB, PSI, BFI.get()))
for (auto &Arg : CI->args()) {		for (auto &Arg : CI->args()) {
if (!Arg->getType()->isPointerTy())		if (!Arg->getType()->isPointerTy())
continue;		continue;
unsigned AS = Arg->getType()->getPointerAddressSpace();		unsigned AS = Arg->getType()->getPointerAddressSpace();
if (optimizeMemoryInst(CI, Arg, Arg->getType(), AS))		if (optimizeMemoryInst(CI, Arg, Arg->getType(), AS))
return true;		return true;
}		}
		}

IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI);		IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI);
if (II) {		if (II) {
switch (II->getIntrinsicID()) {		switch (II->getIntrinsicID()) {
default:		default:
break;		break;
case Intrinsic::assume:		case Intrinsic::assume:
llvm_unreachable("llvm.assume should have been removed already");		llvm_unreachable("llvm.assume should have been removed already");
[...]
}		}

namespace {		namespace {

/// A helper class for matching addressing modes.		/// A helper class for matching addressing modes.
///		///
/// This encapsulates the logic for matching the target-legal addressing modes.		/// This encapsulates the logic for matching the target-legal addressing modes.
class AddressingModeMatcher {		class AddressingModeMatcher {
		static constexpr unsigned INVALID_ARG_NO = ~0u;

SmallVectorImpl<Instruction *> &AddrModeInsts;		SmallVectorImpl<Instruction *> &AddrModeInsts;
const TargetLowering &TLI;		const TargetLowering &TLI;
const TargetRegisterInfo &TRI;		const TargetRegisterInfo &TRI;
const DataLayout &DL;		const DataLayout &DL;
const LoopInfo &LI;		const LoopInfo &LI;
const std::function<const DominatorTree &()> getDTFn;		const std::function<const DominatorTree &()> getDTFn;

/// AccessTy/MemoryInst - This is the type for the access (e.g. double) and		/// AccessTy/MemoryInst - This is the type for the access (e.g. double) and
/// the memory instruction that we're computing this address for.		/// the memory instruction that we're computing this address for.
Type *AccessTy;		Type *AccessTy;
unsigned AddrSpace;		unsigned AddrSpace;
		unsigned ArgNo;
Instruction *MemoryInst;		Instruction *MemoryInst;

/// This is the addressing mode that we're building up. This is		/// This is the addressing mode that we're building up. This is
/// part of the return value of this addressing mode matching stuff.		/// part of the return value of this addressing mode matching stuff.
ExtAddrMode &AddrMode;		ExtAddrMode &AddrMode;

/// The instructions inserted by other CodeGenPrepare optimizations.		/// The instructions inserted by other CodeGenPrepare optimizations.
const SetOfInstrs &InsertedInsts;		const SetOfInstrs &InsertedInsts;
[...]	AddressingModeMatcher(
const std::function<const DominatorTree &()> getDTFn, Type *AT,		const std::function<const DominatorTree &()> getDTFn, Type *AT,
unsigned AS, Instruction *MI, ExtAddrMode &AM,		unsigned AS, Instruction *MI, ExtAddrMode &AM,
const SetOfInstrs &InsertedInsts, InstrToOrigTy &PromotedInsts,		const SetOfInstrs &InsertedInsts, InstrToOrigTy &PromotedInsts,
TypePromotionTransaction &TPT,		TypePromotionTransaction &TPT,
std::pair<AssertingVH<GetElementPtrInst>, int64_t> &LargeOffsetGEP,		std::pair<AssertingVH<GetElementPtrInst>, int64_t> &LargeOffsetGEP,
bool OptSize, ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI)		bool OptSize, ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI)
: AddrModeInsts(AMI), TLI(TLI), TRI(TRI),		: AddrModeInsts(AMI), TLI(TLI), TRI(TRI),
DL(MI->getModule()->getDataLayout()), LI(LI), getDTFn(getDTFn),		DL(MI->getModule()->getDataLayout()), LI(LI), getDTFn(getDTFn),
AccessTy(AT), AddrSpace(AS), MemoryInst(MI), AddrMode(AM),		AccessTy(AT), AddrSpace(AS), ArgNo(INVALID_ARG_NO), MemoryInst(MI),
InsertedInsts(InsertedInsts), PromotedInsts(PromotedInsts), TPT(TPT),		AddrMode(AM), InsertedInsts(InsertedInsts),
LargeOffsetGEP(LargeOffsetGEP), OptSize(OptSize), PSI(PSI), BFI(BFI) {		PromotedInsts(PromotedInsts), TPT(TPT), LargeOffsetGEP(LargeOffsetGEP),
		OptSize(OptSize), PSI(PSI), BFI(BFI) {
IgnoreProfitability = false;		IgnoreProfitability = false;
}		}

public:		public:
/// Find the maximal addressing mode that a load/store of V can fold,		/// Find the maximal addressing mode that a load/store of V can fold,
/// give an access type of AccessTy. This returns a list of involved		/// give an access type of AccessTy. This returns a list of involved
/// instructions in AddrModeInsts.		/// instructions in AddrModeInsts.
/// \p InsertedInsts The instructions inserted by other CodeGenPrepare		/// \p InsertedInsts The instructions inserted by other CodeGenPrepare
/// optimizations.		/// optimizations.
/// \p PromotedInsts maps the instructions to their type before promotion.		/// \p PromotedInsts maps the instructions to their type before promotion.
/// \p The ongoing transaction where every action should be registered.		/// \p The ongoing transaction where every action should be registered.
static ExtAddrMode		static ExtAddrMode
Match(Value *V, Type *AccessTy, unsigned AS, Instruction *MemoryInst,		Match(Value *V, Type *AccessTy, unsigned AS, Instruction *MemoryInst,
SmallVectorImpl<Instruction *> &AddrModeInsts,		SmallVectorImpl<Instruction *> &AddrModeInsts,
const TargetLowering &TLI, const LoopInfo &LI,		const TargetLowering &TLI, const LoopInfo &LI,
const std::function<const DominatorTree &()> getDTFn,		const std::function<const DominatorTree &()> getDTFn,
const TargetRegisterInfo &TRI, const SetOfInstrs &InsertedInsts,		const TargetRegisterInfo &TRI, const SetOfInstrs &InsertedInsts,
InstrToOrigTy &PromotedInsts, TypePromotionTransaction &TPT,		InstrToOrigTy &PromotedInsts, TypePromotionTransaction &TPT,
std::pair<AssertingVH<GetElementPtrInst>, int64_t> &LargeOffsetGEP,		std::pair<AssertingVH<GetElementPtrInst>, int64_t> &LargeOffsetGEP,
bool OptSize, ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI) {		bool OptSize, ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI) {
ExtAddrMode Result;		ExtAddrMode Result;

bool Success = AddressingModeMatcher(AddrModeInsts, TLI, TRI, LI, getDTFn,		AddressingModeMatcher Matcher(
AccessTy, AS, MemoryInst, Result,		AddrModeInsts, TLI, TRI, LI, getDTFn, AccessTy, AS, MemoryInst, Result,
InsertedInsts, PromotedInsts, TPT,		InsertedInsts, PromotedInsts, TPT, LargeOffsetGEP, OptSize, PSI, BFI);
LargeOffsetGEP, OptSize, PSI, BFI)		if (const CallBase *CB = getRegularCall(MemoryInst)) {
.matchAddr(V, 0);		unsigned AN = 0;
		for (const Use &U : CB->args()) {
		if (U == V) {
		Matcher.ArgNo = AN;
		break;
		}
		++AN;
		}
		}
		bool Success = Matcher.matchAddr(V, 0);

(void)Success;		(void)Success;
assert(Success && "Couldn't select anything?");		assert(Success && "Couldn't select anything?");
return Result;		return Result;
}		}

private:		private:
bool matchScaledValue(Value *ScaleReg, int64_t Scale, unsigned Depth);		bool matchScaledValue(Value *ScaleReg, int64_t Scale, unsigned Depth);
bool matchAddr(Value *Addr, unsigned Depth);		bool matchAddr(Value *Addr, unsigned Depth);
bool matchOperationAddr(User *AddrInst, unsigned Opcode, unsigned Depth,		bool matchOperationAddr(User *AddrInst, unsigned Opcode, unsigned Depth,
bool *MovedAway = nullptr);		bool *MovedAway = nullptr);
bool isProfitableToFoldIntoAddressingMode(Instruction *I,		bool isProfitableToFoldIntoAddressingMode(Instruction *I,
ExtAddrMode &AMBefore,		ExtAddrMode &AMBefore,
ExtAddrMode &AMAfter);		ExtAddrMode &AMAfter);
bool valueAlreadyLiveAtInst(Value *Val, Value *KnownLive1, Value *KnownLive2);		bool valueAlreadyLiveAtInst(Value *Val, Value *KnownLive1, Value *KnownLive2);
bool isPromotionProfitable(unsigned NewCost, unsigned OldCost,		bool isPromotionProfitable(unsigned NewCost, unsigned OldCost,
Value *PromotedOperand) const;		Value *PromotedOperand) const;
		bool canFoldAddr(const ExtAddrMode &TestAddrMode) const;
};		};

class PhiNodeSet;		class PhiNodeSet;

/// An iterator for PhiNodeSet.		/// An iterator for PhiNodeSet.
class PhiNodeSetIterator {		class PhiNodeSetIterator {
PhiNodeSet *const Set;		PhiNodeSet *const Set;
size_t CurrentIndex = 0;		size_t CurrentIndex = 0;
[...]	bool AddressingModeMatcher::matchScaledValue(Value *ScaleReg, int64_t Scale,
ExtAddrMode TestAddrMode = AddrMode;		ExtAddrMode TestAddrMode = AddrMode;

// Add scale to turn X*4+X*3 -> X*7. This could also do things like		// Add scale to turn X*4+X*3 -> X*7. This could also do things like
// [A+B + A*7] -> [B+A*8].		// [A+B + A*7] -> [B+A*8].
TestAddrMode.Scale += Scale;		TestAddrMode.Scale += Scale;
TestAddrMode.ScaledReg = ScaleReg;		TestAddrMode.ScaledReg = ScaleReg;

// If the new address isn't legal, bail out.		// If the new address isn't legal, bail out.
if (!TLI.isLegalAddressingMode(DL, TestAddrMode, AccessTy, AddrSpace))		if (!canFoldAddr(TestAddrMode))
return false;		return false;

// It was legal, so commit it.		// It was legal, so commit it.
AddrMode = TestAddrMode;		AddrMode = TestAddrMode;

// Okay, we decided that we can add ScaleReg+Scale to AddrMode. Check now		// Okay, we decided that we can add ScaleReg+Scale to AddrMode. Check now
// to see if ScaleReg is actually X+C. If so, we can turn this into adding		// to see if ScaleReg is actually X+C. If so, we can turn this into adding
// X*Scale + C*Scale to addr mode. If we found available IV increment, do not		// X*Scale + C*Scale to addr mode. If we found available IV increment, do not
// go any further: we can reuse it and cannot eliminate it.		// go any further: we can reuse it and cannot eliminate it.
ConstantInt *CI = nullptr;		ConstantInt *CI = nullptr;
Value *AddLHS = nullptr;		Value *AddLHS = nullptr;
if (isa<Instruction>(ScaleReg) && // not a constant expr.		if (isa<Instruction>(ScaleReg) && // not a constant expr.
match(ScaleReg, m_Add(m_Value(AddLHS), m_ConstantInt(CI))) &&		match(ScaleReg, m_Add(m_Value(AddLHS), m_ConstantInt(CI))) &&
!isIVIncrement(ScaleReg, &LI) && CI->getValue().isSignedIntN(64)) {		!isIVIncrement(ScaleReg, &LI) && CI->getValue().isSignedIntN(64)) {
TestAddrMode.InBounds = false;		TestAddrMode.InBounds = false;
TestAddrMode.ScaledReg = AddLHS;		TestAddrMode.ScaledReg = AddLHS;
TestAddrMode.BaseOffs += CI->getSExtValue() * TestAddrMode.Scale;		TestAddrMode.BaseOffs += CI->getSExtValue() * TestAddrMode.Scale;

// If this addressing mode is legal, commit it and remember that we folded		// If this addressing mode is legal, commit it and remember that we folded
// this instruction.		// this instruction.
if (TLI.isLegalAddressingMode(DL, TestAddrMode, AccessTy, AddrSpace)) {		if (canFoldAddr(TestAddrMode)) {
AddrModeInsts.push_back(cast<Instruction>(ScaleReg));		AddrModeInsts.push_back(cast<Instruction>(ScaleReg));
AddrMode = TestAddrMode;		AddrMode = TestAddrMode;
return true;		return true;
}		}
// Restore status quo.		// Restore status quo.
TestAddrMode = AddrMode;		TestAddrMode = AddrMode;
}		}

[...]	if (auto IVStep = GetConstantStep(ScaleReg)) {
APInt Offset = Step * AddrMode.Scale;		APInt Offset = Step * AddrMode.Scale;
if (Offset.isSignedIntN(64)) {		if (Offset.isSignedIntN(64)) {
TestAddrMode.InBounds = false;		TestAddrMode.InBounds = false;
TestAddrMode.ScaledReg = IVInc;		TestAddrMode.ScaledReg = IVInc;
TestAddrMode.BaseOffs -= Offset.getLimitedValue();		TestAddrMode.BaseOffs -= Offset.getLimitedValue();
// If this addressing mode is legal, commit it..		// If this addressing mode is legal, commit it..
// (Note that we defer the (expensive) domtree base legality check		// (Note that we defer the (expensive) domtree base legality check
// to the very last possible point.)		// to the very last possible point.)
if (TLI.isLegalAddressingMode(DL, TestAddrMode, AccessTy, AddrSpace) &&		if (canFoldAddr(TestAddrMode) &&
getDTFn().dominates(IVInc, MemoryInst)) {		getDTFn().dominates(IVInc, MemoryInst)) {
AddrModeInsts.push_back(cast<Instruction>(IVInc));		AddrModeInsts.push_back(cast<Instruction>(IVInc));
AddrMode = TestAddrMode;		AddrMode = TestAddrMode;
return true;		return true;
}		}
// Restore status quo.		// Restore status quo.
TestAddrMode = AddrMode;		TestAddrMode = AddrMode;
}		}
[...]	bool AddressingModeMatcher::isPromotionProfitable(
if (NewCost < OldCost)		if (NewCost < OldCost)
return true;		return true;
// The promotion is neutral but it may help folding the sign extension in		// The promotion is neutral but it may help folding the sign extension in
// loads for instance.		// loads for instance.
// Check that we did not create an illegal instruction.		// Check that we did not create an illegal instruction.
return isPromotedInstructionLegal(TLI, DL, PromotedOperand);		return isPromotedInstructionLegal(TLI, DL, PromotedOperand);
}		}

		bool AddressingModeMatcher::canFoldAddr(const ExtAddrMode &TestAddrMode) const {
		if (ArgNo != INVALID_ARG_NO) {
		// Check if the address is "foldable" into a regular call.
		const auto &CB = cast<CallBase>(*MemoryInst);
		if (TLI.canFoldAddrModeIntoCall(CB, ArgNo, TestAddrMode))
		return true;

		// Even if it isn't, accept it if we have a cold call, but still require
		// legal addressing mode. This limits the amount of code we potentially
		// sink.
		if (!CB.hasFnAttr(Attribute::Cold) || OptSize ||
		llvm::shouldOptimizeForSize(CB.getParent(), PSI, BFI))
		return false;
		}

		return TLI.isLegalAddressingMode(DL, TestAddrMode, AccessTy, AddrSpace);
		}

/// Given an instruction or constant expr, see if we can fold the operation		/// Given an instruction or constant expr, see if we can fold the operation
/// into the addressing mode. If so, update the addressing mode and return		/// into the addressing mode. If so, update the addressing mode and return
/// true, otherwise return false without modifying AddrMode.		/// true, otherwise return false without modifying AddrMode.
/// If \p MovedAway is not NULL, it contains the information of whether or		/// If \p MovedAway is not NULL, it contains the information of whether or
/// not AddrInst has to be folded into the addressing mode on success.		/// not AddrInst has to be folded into the addressing mode on success.
/// If \p MovedAway == true, \p AddrInst will not be part of the addressing		/// If \p MovedAway == true, \p AddrInst will not be part of the addressing
/// because it has been moved away.		/// because it has been moved away.
/// Thus AddrInst must not be added in the matched instructions.		/// Thus AddrInst must not be added in the matched instructions.
[...]	for (unsigned i = 1, e = AddrInst->getNumOperands(); i != e; ++i, ++GTI) {
VariableScale = TypeSize;		VariableScale = TypeSize;
}		}
}		}
}		}

// A common case is for the GEP to only do a constant offset. In this case,		// A common case is for the GEP to only do a constant offset. In this case,
// just add it to the disp field and check validity.		// just add it to the disp field and check validity.
if (VariableOperand == -1) {		if (VariableOperand == -1) {
		bool InBounds = AddrMode.InBounds;
		int64_t BaseOffs = AddrMode.BaseOffs;
AddrMode.BaseOffs += ConstantOffset;		AddrMode.BaseOffs += ConstantOffset;
if (matchAddr(AddrInst->getOperand(0), Depth + 1)) {
if (!cast<GEPOperator>(AddrInst)->isInBounds())		if (!cast<GEPOperator>(AddrInst)->isInBounds())
AddrMode.InBounds = false;		AddrMode.InBounds = false;
		if (matchAddr(AddrInst->getOperand(0), Depth + 1))
return true;		return true;
}		AddrMode.InBounds = InBounds;
AddrMode.BaseOffs -= ConstantOffset;		AddrMode.BaseOffs = BaseOffs;

if (EnableGEPOffsetSplit && isa<GetElementPtrInst>(AddrInst) &&		if (EnableGEPOffsetSplit && isa<GetElementPtrInst>(AddrInst) &&
TLI.shouldConsiderGEPOffsetSplit() && Depth == 0 &&		TLI.shouldConsiderGEPOffsetSplit() && Depth == 0 &&
ConstantOffset > 0) {		ConstantOffset > 0) {
// Record GEPs with non-zero offsets as candidates for splitting in		// Record GEPs with non-zero offsets as candidates for splitting in
// the event that the offset cannot fit into the r+i addressing mode.		// the event that the offset cannot fit into the r+i addressing mode.
// Simple and common case that only one GEP is used in calculating the		// Simple and common case that only one GEP is used in calculating the
// address for the memory access.		// address for the memory access.
[...]	bool AddressingModeMatcher::matchAddr(Value *Addr, unsigned Depth) {
// Start a transaction at this point that we will rollback if the matching		// Start a transaction at this point that we will rollback if the matching
// fails.		// fails.
TypePromotionTransaction::ConstRestorationPt LastKnownGood =		TypePromotionTransaction::ConstRestorationPt LastKnownGood =
TPT.getRestorationPoint();		TPT.getRestorationPoint();
if (ConstantInt *CI = dyn_cast<ConstantInt>(Addr)) {		if (ConstantInt *CI = dyn_cast<ConstantInt>(Addr)) {
if (CI->getValue().isSignedIntN(64)) {		if (CI->getValue().isSignedIntN(64)) {
// Fold in immediates if legal for the target.		// Fold in immediates if legal for the target.
AddrMode.BaseOffs += CI->getSExtValue();		AddrMode.BaseOffs += CI->getSExtValue();
if (TLI.isLegalAddressingMode(DL, AddrMode, AccessTy, AddrSpace))		if (canFoldAddr(AddrMode))
return true;		return true;
AddrMode.BaseOffs -= CI->getSExtValue();		AddrMode.BaseOffs -= CI->getSExtValue();
}		}
} else if (GlobalValue *GV = dyn_cast<GlobalValue>(Addr)) {		} else if (GlobalValue *GV = dyn_cast<GlobalValue>(Addr)) {
// If this is a global variable, try to fold it into the addressing mode.		// If this is a global variable, try to fold it into the addressing mode.
if (!AddrMode.BaseGV) {		if (!AddrMode.BaseGV) {
AddrMode.BaseGV = GV;		AddrMode.BaseGV = GV;
if (TLI.isLegalAddressingMode(DL, AddrMode, AccessTy, AddrSpace))		if (canFoldAddr(AddrMode))
return true;		return true;
AddrMode.BaseGV = nullptr;		AddrMode.BaseGV = nullptr;
}		}
} else if (Instruction *I = dyn_cast<Instruction>(Addr)) {		} else if (Instruction *I = dyn_cast<Instruction>(Addr)) {
ExtAddrMode BackupAddrMode = AddrMode;		ExtAddrMode BackupAddrMode = AddrMode;
unsigned OldSize = AddrModeInsts.size();		unsigned OldSize = AddrModeInsts.size();

// Check to see if it is possible to fold this operation.		// Check to see if it is possible to fold this operation.
(26 lines not shown)
if (ConstantInt *CI = dyn_cast<ConstantInt>(Addr)) {
return true;		return true;
}		}

// Worse case, the target should support [reg] addressing modes. :)		// Worse case, the target should support [reg] addressing modes. :)
if (!AddrMode.HasBaseReg) {		if (!AddrMode.HasBaseReg) {
AddrMode.HasBaseReg = true;		AddrMode.HasBaseReg = true;
AddrMode.BaseReg = Addr;		AddrMode.BaseReg = Addr;
// Still check for legality in case the target supports [imm] but not [i+r].		// Still check for legality in case the target supports [imm] but not [i+r].
if (TLI.isLegalAddressingMode(DL, AddrMode, AccessTy, AddrSpace))		if (canFoldAddr(AddrMode))
return true;		return true;
AddrMode.HasBaseReg = false;		AddrMode.HasBaseReg = false;
AddrMode.BaseReg = nullptr;		AddrMode.BaseReg = nullptr;
}		}

// If the base register is already taken, see if we can do [r+r].		// If the base register is already taken, see if we can do [r+r].
if (AddrMode.Scale == 0) {		if (AddrMode.Scale == 0) {
AddrMode.Scale = 1;		AddrMode.Scale = 1;
AddrMode.ScaledReg = Addr;		AddrMode.ScaledReg = Addr;
if (TLI.isLegalAddressingMode(DL, AddrMode, AccessTy, AddrSpace))		if (canFoldAddr(AddrMode))
return true;		return true;
AddrMode.Scale = 0;		AddrMode.Scale = 0;
AddrMode.ScaledReg = nullptr;		AddrMode.ScaledReg = nullptr;
}		}
// Couldn't match.		// Couldn't match.
TPT.rollback(LastKnownGood);		TPT.rollback(LastKnownGood);
return false;		return false;
}		}
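
The fallback steps at the end of `matchAddr` follow a try-then-undo pattern: tentatively claim the base-register slot, ask whether the resulting mode is acceptable, restore the field if it is not, and then repeat with the scaled-register slot. A minimal self-contained sketch of that pattern follows. `ToyAddrMode`, `LegalFn` and `matchAsRegister` are hypothetical names for illustration only; the real code works on `ExtAddrMode` and, in this patch, asks `canFoldAddr`.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical, simplified stand-in for ExtAddrMode: values are named by
// strings instead of llvm::Value pointers.
struct ToyAddrMode {
  bool HasBaseReg = false;
  std::string BaseReg;   // empty when unused
  long Scale = 0;
  std::string ScaledReg; // empty when unused
};

using LegalFn = std::function<bool(const ToyAddrMode &)>;

// Try to place Addr as the base register, then as a scaled register with
// scale 1. Each tentative change is undone if IsLegal rejects it.
bool matchAsRegister(ToyAddrMode &AM, const std::string &Addr,
                     const LegalFn &IsLegal) {
  if (!AM.HasBaseReg) {
    AM.HasBaseReg = true;
    AM.BaseReg = Addr;
    if (IsLegal(AM))
      return true;
    AM.HasBaseReg = false; // undo the tentative change
    AM.BaseReg.clear();
  }
  if (AM.Scale == 0) {
    AM.Scale = 1;
    AM.ScaledReg = Addr;
    if (IsLegal(AM))
      return true;
    AM.Scale = 0; // undo the tentative change
    AM.ScaledReg.clear();
  }
  return false; // couldn't match; AM is left as it was on entry
}
```

Because every rejected attempt restores the fields it touched, a failed call leaves the addressing mode exactly as the caller passed it in.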
(81 lines not shown)
if (CallInst *CI = dyn_cast<CallInst>(UserI)) {
// If this is a cold call, we can sink the addressing calculation into		// If this is a cold call, we can sink the addressing calculation into
// the cold path. See optimizeCallInst		// the cold path. See optimizeCallInst
bool OptForSize =		bool OptForSize =
OptSize || llvm::shouldOptimizeForSize(CI->getParent(), PSI, BFI);		OptSize || llvm::shouldOptimizeForSize(CI->getParent(), PSI, BFI);
if (!OptForSize)		if (!OptForSize)
continue;		continue;
}		}

InlineAsm *IA = dyn_cast<InlineAsm>(CI->getCalledOperand());		if (InlineAsm *IA = dyn_cast<InlineAsm>(CI->getCalledOperand())) {
if (!IA)
return true;

// If this is a memory operand, we're cool, otherwise bail out.		// If this is a memory operand, we're cool, otherwise bail out.
if (!IsOperandAMemoryOperand(CI, IA, I, TLI, TRI))		if (!IsOperandAMemoryOperand(CI, IA, I, TLI, TRI))
return true;		return true;
continue;		continue;
}		}

		// Intrinsics are handled elsewhere and we can't quite handle non-pointer
		// types yet.
		if (isa<IntrinsicInst>(UserI) || !I->getType()->isPointerTy())
		return true;

		MemoryUses.push_back({&U, I->getType()});
		continue;
		}

if (FindAllMemoryUses(UserI, MemoryUses, ConsideredInsts, TLI, TRI, OptSize,		if (FindAllMemoryUses(UserI, MemoryUses, ConsideredInsts, TLI, TRI, OptSize,
PSI, BFI, SeenInsts))		PSI, BFI, SeenInsts))
return true;		return true;
}		}

return false;		return false;
}		}

(137 lines not shown)
std::pair<AssertingVH<GetElementPtrInst>, int64_t> LargeOffsetGEP(nullptr,
0);		0);
TypePromotionTransaction::ConstRestorationPt LastKnownGood =		TypePromotionTransaction::ConstRestorationPt LastKnownGood =
TPT.getRestorationPoint();		TPT.getRestorationPoint();
AddressingModeMatcher Matcher(MatchedAddrModeInsts, TLI, TRI, LI, getDTFn,		AddressingModeMatcher Matcher(MatchedAddrModeInsts, TLI, TRI, LI, getDTFn,
AddressAccessTy, AS, UserI, Result,		AddressAccessTy, AS, UserI, Result,
InsertedInsts, PromotedInsts, TPT,		InsertedInsts, PromotedInsts, TPT,
LargeOffsetGEP, OptSize, PSI, BFI);		LargeOffsetGEP, OptSize, PSI, BFI);
Matcher.IgnoreProfitability = true;		Matcher.IgnoreProfitability = true;
		if (const CallBase *CB = getRegularCall(UserI)) {
		unsigned AN = 0;
		for (const Use &U : CB->args()) {
		if (U == Address) {
		Matcher.ArgNo = AN;
		break;
		}
		++AN;
		}
		}
bool Success = Matcher.matchAddr(Address, 0);		bool Success = Matcher.matchAddr(Address, 0);
(void)Success;		(void)Success;
assert(Success && "Couldn't select anything?");		assert(Success && "Couldn't select anything?");

// The match was to check the profitability, the changes made are not		// The match was to check the profitability, the changes made are not
// part of the original matcher. Therefore, they should be dropped		// part of the original matcher. Therefore, they should be dropped
// otherwise the original matcher will not present the right state.		// otherwise the original matcher will not present the right state.
TPT.rollback(LastKnownGood);		TPT.rollback(LastKnownGood);
(3,407 lines not shown)

llvm/lib/Target/AArch64/AArch64ISelLowering.h

(833 lines not shown)
public:
bool isComplexDeinterleavingOperationSupported(		bool isComplexDeinterleavingOperationSupported(
ComplexDeinterleavingOperation Operation, Type *Ty) const override;		ComplexDeinterleavingOperation Operation, Type *Ty) const override;

Value *createComplexDeinterleavingIR(		Value *createComplexDeinterleavingIR(
Instruction *I, ComplexDeinterleavingOperation OperationType,		Instruction *I, ComplexDeinterleavingOperation OperationType,
ComplexDeinterleavingRotation Rotation, Value *InputA, Value *InputB,		ComplexDeinterleavingRotation Rotation, Value *InputA, Value *InputB,
Value *Accumulator = nullptr) const override;		Value *Accumulator = nullptr) const override;

		bool canFoldAddrModeIntoCall(const CallBase &CB, unsigned ArgNo,
		const AddrMode &AM) const override;

bool hasBitPreservingFPLogic(EVT VT) const override {		bool hasBitPreservingFPLogic(EVT VT) const override {
// FIXME: Is this always true? It should be true for vectors at least.		// FIXME: Is this always true? It should be true for vectors at least.
return VT == MVT::f32 || VT == MVT::f64;		return VT == MVT::f32 || VT == MVT::f64;
}		}

bool supportSplitCSR(MachineFunction *MF) const override {		bool supportSplitCSR(MachineFunction *MF) const override {
return MF->getFunction().getCallingConv() == CallingConv::CXX_FAST_TLS &&		return MF->getFunction().getCallingConv() == CallingConv::CXX_FAST_TLS &&
MF->getFunction().hasFnAttribute(Attribute::NoUnwind);		MF->getFunction().hasFnAttribute(Attribute::NoUnwind);
(386 lines not shown)

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


(24,341 lines not shown)
if (OperationType == ComplexDeinterleavingOperation::CAdd) {
if (IntId == Intrinsic::not_intrinsic)		if (IntId == Intrinsic::not_intrinsic)
return nullptr;		return nullptr;

return B.CreateIntrinsic(IntId, Ty, {InputA, InputB});		return B.CreateIntrinsic(IntId, Ty, {InputA, InputB});
}		}

return nullptr;		return nullptr;
}		}

		bool AArch64TargetLowering::canFoldAddrModeIntoCall(const CallBase &CB,
		unsigned ArgNo,
		const AddrMode &AM) const {
		// We should always accept a single base register.
		if (TargetLowering::canFoldAddrModeIntoCall(CB, ArgNo, AM))
		return true;

		if (ArgNo > 7 || AM.BaseGV || !AM.HasBaseReg)
		efriedma: Counting arguments like this won't do what you want if there are arguments which are passed in multiple registers.
		chill (author): Indeed. For now, I'm proposing a different approach, doing the transformation entirely in `MachineSink`. The relevant review is https://reviews.llvm.org/D152828. If it doesn't fly, I'll come back to this review and think how to solve this above issue.
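
The reviewer's point can be illustrated with a toy model of integer argument-register assignment. The sketch below assumes eight integer argument registers (as the `ArgNo > 7` check implies) and a hypothetical per-argument register count. It deliberately ignores register-pair alignment, floating-point registers and the other details of the real AArch64 calling convention, and `firstRegForArg` is an illustrative name, not an LLVM API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: RegsPerArg[i] is the number of integer registers that
// argument i occupies (for example 1 for a pointer or i64, 2 for an i128).
// Returns the index of the first register used by argument ArgNo, or -1 if
// the argument does not fit entirely within NumArgRegs registers.
int firstRegForArg(const std::vector<unsigned> &RegsPerArg, unsigned ArgNo,
                   unsigned NumArgRegs = 8) {
  if (ArgNo >= RegsPerArg.size())
    return -1;
  unsigned NextReg = 0;
  for (unsigned I = 0; I < ArgNo; ++I)
    NextReg += RegsPerArg[I];
  if (NextReg + RegsPerArg[ArgNo] > NumArgRegs)
    return -1; // in this model, passed at least partly on the stack
  return static_cast<int>(NextReg);
}
```

With four two-register arguments ahead of it, the argument at index 4 passes an `ArgNo > 7` test yet has no register left in this model, which is the mismatch the comment describes.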
		return false;

		// For more complex addressing modes, check the possibility of a cheap
		// materialisation into an argument register.

		// reg + imm
		if (AM.Scale == 0)
		return isLegalAddImmediate(AM.BaseOffs);

		if (AM.BaseOffs != 0)
		return false;

		// reg + scale * reg
		if (AM.Scale == 1)
		return true;

		// Some CPUs have fast `reg + scale * reg` instruction, for scales of 2, 4, 8, and 16.
		if (!Subtarget->hasLSLFast() || AM.Scale <= 0)
		return false;

		uint64_t S = uint64_t(AM.Scale);
		return (S & (S - 1)) == 0 && S <= 16;
		}
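
The order of checks in `canFoldAddrModeIntoCall` can be exercised outside LLVM with a small stand-alone model. The sketch below is only an approximation under stated assumptions: `ToyAddrMode`, `toyIsLegalAddImmediate` and `toyCanFoldIntoCall` are hypothetical names, the base-class check is assumed to accept exactly "a single base register and nothing else", and the add-immediate rule is modelled as a 12-bit value optionally shifted left by 12 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for TargetLowering::AddrMode.
struct ToyAddrMode {
  bool HasBaseGV = false;
  bool HasBaseReg = false;
  int64_t BaseOffs = 0;
  int64_t Scale = 0;
};

// Assumption: an ADD/SUB immediate is a 12-bit value, optionally shifted
// left by 12, applied to the magnitude of the offset. The patch itself
// calls isLegalAddImmediate instead.
bool toyIsLegalAddImmediate(int64_t Imm) {
  uint64_t Mag = Imm < 0 ? 0 - static_cast<uint64_t>(Imm)
                         : static_cast<uint64_t>(Imm);
  return (Mag >> 12) == 0 || ((Mag & 0xfff) == 0 && (Mag >> 24) == 0);
}

// Mirrors the sequence of checks in the patch.
bool toyCanFoldIntoCall(unsigned ArgNo, const ToyAddrMode &AM,
                        bool HasLSLFast) {
  // Assumed behaviour of the generic check: a lone base register is fine.
  if (AM.HasBaseReg && !AM.HasBaseGV && AM.BaseOffs == 0 && AM.Scale == 0)
    return true;
  if (ArgNo > 7 || AM.HasBaseGV || !AM.HasBaseReg)
    return false;
  if (AM.Scale == 0) // reg + imm
    return toyIsLegalAddImmediate(AM.BaseOffs);
  if (AM.BaseOffs != 0)
    return false;
  if (AM.Scale == 1) // reg + reg
    return true;
  if (!HasLSLFast || AM.Scale <= 0)
    return false;
  uint64_t S = static_cast<uint64_t>(AM.Scale);
  return (S & (S - 1)) == 0 && S <= 16; // power of two, at most 16
}
```

In this model a scale of 4 with no offset is accepted only when `HasLSLFast` is true, which appears consistent with the test file: `f2` keeps its GEP in the entry block, while `f3` (with `"target-cpu"="neoverse-n1"`) has it sunk.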

llvm/test/CodeGen/AArch64/call-addr-fold.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -S -codegenprepare < %s | FileCheck %s
				target triple = "aarch64-linux"

				declare void @use(...)

				define i32 @f0(i1 %c1, ptr %p) {
				; CHECK-LABEL: @f0(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 [[C1:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[SUNKADDR:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 8
				; CHECK-NEXT: [[V0:%.*]] = call i32 @use(ptr [[SUNKADDR]])
				; CHECK-NEXT: br label [[EXIT:%.*]]
				; CHECK: if.else:
				; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, ptr [[P]], i64 8
				; CHECK-NEXT: [[V1:%.*]] = load i32, ptr [[SUNKADDR1]], align 4
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: [[V:%.*]] = phi i32 [ [[V0]], [[IF_THEN]] ], [ [[V1]], [[IF_ELSE]] ]
				; CHECK-NEXT: ret i32 [[V]]
				;
				entry:
				%a = getelementptr i32, ptr %p, i32 2
				br i1 %c1, label %if.then, label %if.else

				if.then:
				%v0 = call i32 @use(ptr %a)
				br label %exit

				if.else:
				%v1 = load i32, ptr %a
				br label %exit

				exit:
				%v = phi i32 [%v0, %if.then], [%v1, %if.else]
				ret i32 %v
				}

				define i32 @f1(i1 %c1, ptr %p, i64 %i) {
				; CHECK-LABEL: @f1(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 [[C1:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[SUNKADDR:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 [[I:%.*]]
				; CHECK-NEXT: [[V0:%.*]] = call i32 @use(ptr [[SUNKADDR]])
				; CHECK-NEXT: br label [[EXIT:%.*]]
				; CHECK: if.else:
				; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, ptr [[P]], i64 [[I]]
				; CHECK-NEXT: [[V1:%.*]] = load i32, ptr [[SUNKADDR1]], align 4
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: [[V:%.*]] = phi i32 [ [[V0]], [[IF_THEN]] ], [ [[V1]], [[IF_ELSE]] ]
				; CHECK-NEXT: ret i32 [[V]]
				;
				entry:
				%a = getelementptr i8, ptr %p, i64 %i
				br i1 %c1, label %if.then, label %if.else

				if.then:
				%v0 = call i32 @use(ptr %a)
				br label %exit

				if.else:
				%v1 = load i32, ptr %a
				br label %exit

				exit:
				%v = phi i32 [%v0, %if.then], [%v1, %if.else]
				ret i32 %v
				}

				define i32 @f2(i1 %c1, ptr %p, i64 %i) {
				; CHECK-LABEL: @f2(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[A:%.*]] = getelementptr i32, ptr [[P:%.*]], i64 [[I:%.*]]
				; CHECK-NEXT: br i1 [[C1:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[V0:%.*]] = call i32 @use(ptr [[A]])
				; CHECK-NEXT: br label [[EXIT:%.*]]
				; CHECK: if.else:
				; CHECK-NEXT: [[V1:%.*]] = load i32, ptr [[A]], align 4
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: [[V:%.*]] = phi i32 [ [[V0]], [[IF_THEN]] ], [ [[V1]], [[IF_ELSE]] ]
				; CHECK-NEXT: ret i32 [[V]]
				;
				entry:
				%a = getelementptr i32, ptr %p, i64 %i
				br i1 %c1, label %if.then, label %if.else

				if.then:
				%v0 = call i32 @use(ptr %a)
				br label %exit

				if.else:
				%v1 = load i32, ptr %a
				br label %exit

				exit:
				%v = phi i32 [%v0, %if.then], [%v1, %if.else]
				ret i32 %v
				}


				define i32 @f3(i1 %c1, ptr %p, i64 %i) "target-cpu"="neoverse-n1" {
				; CHECK-LABEL: @f3(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 [[C1:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[SUNKADDR:%.*]] = mul i64 [[I:%.*]], 4
				; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 [[SUNKADDR]]
				; CHECK-NEXT: [[V0:%.*]] = call i32 @use(ptr [[SUNKADDR1]])
				; CHECK-NEXT: br label [[EXIT:%.*]]
				; CHECK: if.else:
				; CHECK-NEXT: [[SUNKADDR2:%.*]] = mul i64 [[I]], 4
				; CHECK-NEXT: [[SUNKADDR3:%.*]] = getelementptr i8, ptr [[P]], i64 [[SUNKADDR2]]
				; CHECK-NEXT: [[V1:%.*]] = load i32, ptr [[SUNKADDR3]], align 4
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: [[V:%.*]] = phi i32 [ [[V0]], [[IF_THEN]] ], [ [[V1]], [[IF_ELSE]] ]
				; CHECK-NEXT: ret i32 [[V]]
				;
				entry:
				%a = getelementptr i32, ptr %p, i64 %i
				br i1 %c1, label %if.then, label %if.else

				if.then:
				%v0 = call i32 @use(ptr %a)
				br label %exit

				if.else:
				%v1 = load i32, ptr %a
				br label %exit

				exit:
				%v = phi i32 [%v0, %if.then], [%v1, %if.else]
				ret i32 %v
				}

				define i32 @f4(i1 %c1, ptr %p, i64 %i) {
				; CHECK-LABEL: @f4(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 [[C1:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[SUNKADDR:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 [[I:%.*]]
				; CHECK-NEXT: [[V0:%.*]] = call i32 @use(ptr [[SUNKADDR]])
				; CHECK-NEXT: br label [[EXIT:%.*]]
				; CHECK: if.else:
				; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, ptr [[P]], i64 [[I]]
				; CHECK-NEXT: [[V1:%.*]] = call i32 @use(i32 1, ptr [[SUNKADDR1]])
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: [[V:%.*]] = phi i32 [ [[V0]], [[IF_THEN]] ], [ [[V1]], [[IF_ELSE]] ]
				; CHECK-NEXT: ret i32 [[V]]
				;
				entry:
				%a = getelementptr i8, ptr %p, i64 %i
				br i1 %c1, label %if.then, label %if.else

				if.then:
				%v0 = call i32 @use(ptr %a)
				br label %exit

				if.else:
				%v1 = call i32 @use(i32 1, ptr %a)
				br label %exit

				exit:
				%v = phi i32 [%v0, %if.then], [%v1, %if.else]
				ret i32 %v
				}
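
The `CHECK` lines in this file express each sunk address as an `i8` GEP with a byte offset, so the typed index of the original GEP is multiplied by the element size: `getelementptr i32, ptr %p, i32 2` in `f0` becomes an offset of 8 bytes, and the variable `i32` index in `f3` becomes `mul i64 %i, 4`. A small worked example of that arithmetic, using a hypothetical helper `byteOffset` and assuming the usual 4-byte `i32`:

```cpp
#include <cassert>
#include <cstdint>

// Byte offset of the element at Index when each element occupies
// ElemSizeBytes bytes (4 for i32, 1 for i8).
constexpr int64_t byteOffset(int64_t Index, int64_t ElemSizeBytes) {
  return Index * ElemSizeBytes;
}

// f0: index 2 into i32 elements is byte offset 8.
static_assert(byteOffset(2, 4) == 8, "i32 index 2 is 8 bytes");
```

For `i8` elements the factor is 1, which is why `f1` and `f4` use `%i` directly as the byte offset.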

llvm/test/CodeGen/X86/dagcombine-tokenfactor-limit-crash.ll

	; RUN: llc %s -combiner-tokenfactor-inline-limit=5 -o - | FileCheck %s			; RUN: llc %s -combiner-tokenfactor-inline-limit=5 -o - | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	%struct.snork = type { i8 }			%struct.snork = type { i8 }
	%struct.wombat = type { [15 x i32] }			%struct.wombat = type { [15 x i32] }

	; CHECK: pushq %rbx			; CHECK: pushq %rbx
	; CHECK-NEXT: andq $-32, %rsp			; CHECK-NEXT: andq $-32, %rsp
	; CHECK-NEXT: subq $66144, %rsp # imm = 0x10260			; CHECK-NEXT: subq $66144, %rsp # imm = 0x10260
	; CHECK-NEXT: .cfi_offset %rbx, -24			; CHECK-NEXT: .cfi_offset %rbx, -24
				; CHECK-NEXT: movq %rdi, %rbx
	; CHECK-NEXT: movabsq $-868076584853899022, %rax # imm = 0xF3F3F8F201F2F8F2			; CHECK-NEXT: movabsq $-868076584853899022, %rax # imm = 0xF3F3F8F201F2F8F2
	; CHECK-NEXT: movq %rax, (%rsp)			; CHECK-NEXT: movq %rax, (%rsp)
	; CHECK-NEXT: movb $-13, 8263(%rsp)			; CHECK-NEXT: movb $-13, 8263(%rsp)
	; CHECK-NEXT: movq %rdi, %rbx
	; CHECK-NEXT: callq hoge			; CHECK-NEXT: callq hoge
	; CHECK-NEXT: movq %rbx, %rdi			; CHECK-NEXT: movq %rbx, %rdi
	; CHECK-NEXT: callq hoge			; CHECK-NEXT: callq hoge
	; CHECK-NEXT: callq hoge			; CHECK-NEXT: callq hoge
	; CHECK-NEXT: callq hoge			; CHECK-NEXT: callq hoge
	; CHECK-NEXT: callq eggs			; CHECK-NEXT: callq eggs
	; CHECK-NEXT: callq hoge			; CHECK-NEXT: callq hoge
	; CHECK-NEXT: movq %rbx, %rax			; CHECK-NEXT: movq %rbx, %rax
	(35 lines not shown)

Revision Contents

Diff 496899
