This is an archive of the discontinued LLVM Phabricator instance.

Differential D6054

[AArch64] Inline memcpy() as a sequence of ldp-stp with 64-bit registers
Abandoned, Public

Authored by sdmitrouk on Oct 31 2014, 7:25 AM.


Details

Reviewers

t.p.northover
jmolloy
Jiangning

Summary

Here is the result of my attempts to inline memcpy() in an "optimal" way, meaning interleaved load/store pair instructions that use 64-bit registers.

It was suggested to do this in the AArch64LoadStoreOptimizer pass, which worked until the PostRA Machine Instruction Scheduler was enabled for the AArch64 target, so it became a separate pass that runs after PostRA MISched. The pass is disabled by default, and the test changes make the tests pass both with and without it.

When ldr/str instructions are in the middle, they are reordered as well, except for cases like:

ldr
ldp
stp
str

which occur only when copying a small amount of data, and I'm not sure if it's worth reordering them to

ldr
str
ldp
stp

but that can be done.

Unfortunately, I don't have AArch64 hardware to run performance tests on yet, so I can't back this up with numbers, but such a sequence was claimed to be preferred. At least this gives a way to test it. Or it can just stay here for now.
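
To make the intended ordering concrete, here is a minimal, self-contained C++ sketch. It is not part of the patch: instructions are modelled as plain strings, `loads[i]` is assumed to feed `stores[i]`, and the names `Inst` and `interleavePairs` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction: just its textual form.
using Inst = std::string;

// Given N loads and their N matching stores (loads[i] feeds stores[i]),
// produce the interleaved order load0, store0, load1, store1, ...
std::vector<Inst> interleavePairs(const std::vector<Inst> &loads,
                                  const std::vector<Inst> &stores) {
  assert(loads.size() == stores.size() && "each load needs a matching store");
  std::vector<Inst> out;
  out.reserve(loads.size() * 2);
  for (std::size_t i = 0; i < loads.size(); ++i) {
    out.push_back(loads[i]);
    out.push_back(stores[i]);
  }
  return out;
}
```

Calling `interleavePairs` with two `ldp` and two `stp` strings yields the `ldp`, `stp`, `ldp`, `stp` order that the summary describes as preferred. The sketch ignores register dependencies and unrelated instructions, which the real pass has to check.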

Event Timeline

sdmitrouk updated this revision to Diff 15614. Oct 31 2014, 7:25 AM

sdmitrouk retitled this revision from to [AArch64] Inline memcpy() as a sequence of ldp-stp with 64-bit registers.

sdmitrouk updated this object.

sdmitrouk edited the test plan for this revision.

sdmitrouk added reviewers: jmolloy, t.p.northover, Jiangning.

sdmitrouk set the repository for this revision to rL LLVM.

sdmitrouk added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. Oct 31 2014, 7:25 AM

Hi,

I think the principle here is OK. It'd have been nicer if we could convince the scheduler to do this instead, rather than going behind its back though. Have you talked to Andy Trick or Dave Estes to work out if this is possible?

Comments inline.

I'd also like Tim's signoff before this goes in.

James

lib/Target/AArch64/AArch64ISelLowering.cpp
474	s/af/as
lib/Target/AArch64/AArch64LoadStoreInterleave.cpp
26	The important thing is that we have ldp/stp in that order, ideally with increasing addresses. We don't need to cluster them all together - it's the ordering of memory operations that counts I think. So we can have:

ldp
stp
add # unrelated operation
ldp
stp

This should be fine, and may be a good thing, depending on the microarchitecture.
49	This is a fairly generic statistic name. Something more A64 specific perhaps?
145	Wouldn't isSafeToSpeculate() conservatively do the same job here?
206	This was not mentioned in the comment; why should all loads come before all stores?
291	Here and elsewhere: single-line if's should have their {}'s removed.
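
The ordering property from the comment on line 26 can be expressed as a small check. The following is only a sketch of one reading of that comment: memory operations must alternate load, store with matching offsets, while unrelated operations may appear anywhere. The types `Kind` and `Op` and the function `memOpsAlternate` are hypothetical, and the "increasing addresses" preference is deliberately not checked.

```cpp
#include <vector>

enum class Kind { Load, Store, Other };

// Toy instruction: its kind and, for memory operations, its offset.
struct Op {
  Kind kind;
  int offset;
};

// Returns true if, ignoring unrelated operations, the memory operations form
// load/store pairs in that order, each store using its load's offset.
bool memOpsAlternate(const std::vector<Op> &ops) {
  bool expectLoad = true;
  int pendingOffset = 0;
  for (const Op &op : ops) {
    if (op.kind == Kind::Other)
      continue; // unrelated operations do not affect memory ordering
    if (expectLoad) {
      if (op.kind != Kind::Load)
        return false;
      pendingOffset = op.offset;
    } else if (op.kind != Kind::Store || op.offset != pendingOffset) {
      return false;
    }
    expectLoad = !expectLoad;
  }
  return expectLoad; // no load may be left without its store
}
```

Under this reading, `ldp stp add ldp stp` passes, while `ldp ldp stp stp` does not.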

This revision now requires changes to proceed. Nov 3 2014, 4:04 AM

Addressed some of the comments (will comment on the rest):

Fix typo s/af/as.
NumSequences => NumLdStSequencesUpdated.
Fixed rather confusing comment.
Removed extra curly braces around body of single-line if statements.

Hi James,

> It'd have been nicer if we could convince the scheduler to do this instead, rather than going behind its back though.

That's what I tried initially, but I wasn't able to do it.

> Have you talked to Andy Trick or Dave Estes to work out if this is possible?

I don't think so. There were a couple of related threads on llvm-dev, and doing
this similarly to the load/store optimizer was the only proposed solution. The scheduler
doesn't seem to have extension points where one could provide hints about
instructions; at least I didn't find a way other than to subclass it.

Comments inline.

Those not answered inline here are addressed in the newer revision.

Sergey

lib/Target/AArch64/AArch64LoadStoreInterleave.cpp
26	I'll try that, it shouldn't require a lot of changes.

> ideally with increasing addresses

Actually, input of the pass is already in reverse order:

ldp x10, x11, [x8, #48]
stp x10, x11, [x9, #48]
ldp x10, x11, [x8, #32]
stp x10, x11, [x9, #32]
ldp x10, x11, [x8, #16]
stp x10, x11, [x9, #16]
ldp x10, x8, [x8]
stp x10, x8, [x9]

which might come from `getMemcpyLoadsAndStores()` in `SelectionDAG.cpp`, which doesn't specify order.
145	I don't see any function with exactly this name; functions with similar names don't seem to fit, and some are also static.
206	Wrong comment, I meant that first load should go before first store.
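
One detail of the sequence quoted in the reply to line 26: its last load, `ldp x10, x8, [x8]`, overwrites the base register `x8`, so that pair cannot simply be moved ahead of the others to get increasing addresses. The sketch below illustrates a conservative way to handle this, assuming a toy model where each copy pair records its offset and the two registers its load writes. `CopyPair` and `ascendingIfSafe` are hypothetical names, and nothing here reflects how the patch itself treats this case.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy model of one ldp/stp copy pair.
struct CopyPair {
  int offset;             // immediate offset shared by the ldp and the stp
  std::string dst0, dst1; // registers written by the ldp
};

// Returns the pairs sorted by ascending offset, unless some load overwrites
// the load base register; in that case the order is left unchanged.
std::vector<CopyPair> ascendingIfSafe(std::vector<CopyPair> pairs,
                                      const std::string &base) {
  for (const CopyPair &p : pairs)
    if (p.dst0 == base || p.dst1 == base)
      return pairs; // conservative: do not reorder around a base clobber
  std::stable_sort(pairs.begin(), pairs.end(),
                   [](const CopyPair &a, const CopyPair &b) {
                     return a.offset < b.offset;
                   });
  return pairs;
}
```

For the quoted sequence, the function returns the input untouched because the pair at offset 0 writes `x8`. A less conservative variant could still sort the remaining pairs while keeping the clobbering pair last.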

I'm really not sure about this one. I agree with James that hacking around with the instructions after the scheduler seems really iffy. It sounds much more like we're hitting a scheduler defect that we want to fix properly instead, unless it's a constraint that's just impossible to represent.

Perhaps some kind of forwarding from a load to a dependent store has been omitted?

I've also got some other issues with the actual implementation.

Cheers.

Tim.

lib/Target/AArch64/AArch64ISelLowering.cpp
6615–6620	How general is this? We should be writing for future cores as well as existing ones, and always preferring 64-bit operations seems like it'll be more and more of an oddity in future. It also seems like it belongs in a completely separate patch to the interleaving one.
lib/Target/AArch64/AArch64LoadStoreInterleave.cpp
242–245	This seems like a really fragile way to do this. It's only ever going to work on a basic block with a single memcpy operation and no other loads/stores.

> It sounds much more like we're hitting a scheduler defect that we want to fix properly instead, unless it's a constraint that's just impossible to represent.

The scheduler seems to do its job correctly for the generic case, but it seems to be missing
information about instruction operands. In this case it could ignore latency of ldp when
it's followed by stp with same operands.

> Perhaps some kind of forwarding from a load to a dependent store has been omitted?

I tried gluing and/or combining nodes in a lot of ways; the scheduler doesn't care about any
of these. Another way would be to introduce a pseudo-instruction and expand it after
scheduling, but that requires temporary registers and it's too late to allocate registers
at that point.

Regards,
Sergey

lib/Target/AArch64/AArch64ISelLowering.cpp
6615–6620	> How general is this? We should be writing for future cores as well as existing ones, and always preferring 64-bit operations seems like it'll be more and more of an oddity in future.

If I get it right, the problem with 128-bit registers is that they are floating point registers rather than general purpose ones, so as long as there are no 128-bit GP registers, this should hold.

> It also seems like it belongs in a completely separate patch to the interleaving one.

The pass is there to allow better code generation for `memcpy()`, but they can be separated technically (pass can go first, maybe with slightly changed tests).
lib/Target/AArch64/AArch64LoadStoreInterleave.cpp
242–245	> It's only ever going to work on a basic block with a single memcpy operation and no other loads/stores.

The condition isn't exactly this one, but it does have a similar constraint. The next revision, which works on pairs of instructions, should change this.
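
For readers following the 6615–6620 thread, the type selection that this diff gives `getOptimalMemOpType()` can be modelled in isolation. The sketch below mirrors the conditions visible in the diff (128-bit type only on Cyclone, otherwise i64 or i32 by size). It is a standalone toy and not LLVM code: `MemOpType`, `MemOpQuery` and `chooseMemOpType` are hypothetical names, and the alignment test is collapsed into a single boolean.

```cpp
#include <cstdint>

enum class MemOpType { I32, I64, F128 };

// Inputs the decision depends on, flattened into plain fields.
struct MemOpQuery {
  std::uint64_t size;             // bytes to copy or set
  bool isCyclone;                 // stands in for Subtarget->isCyclone()
  bool hasFPARMv8;                // FP/SIMD registers available
  bool isMemset;                  // memset rather than memcpy
  bool noImplicitFloat;           // function forbids implicit FP use
  bool aligned16OrFastMisaligned; // 16-byte aligned, or misaligned is fast
};

// With the patch, only Cyclone keeps the 128-bit path; every other subtarget
// falls through to 64-bit (or 32-bit for small sizes) integer types.
MemOpType chooseMemOpType(const MemOpQuery &q) {
  if (q.isCyclone && q.hasFPARMv8 && !q.isMemset && q.size >= 16 &&
      !q.noImplicitFloat && q.aligned16OrFastMisaligned)
    return MemOpType::F128;
  return q.size >= 8 ? MemOpType::I64 : MemOpType::I32;
}
```

This makes the reviewer's concern easy to see: the choice is keyed on a single named core rather than on a property a future core could declare.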

Changed code to process pairs of loads and stores rather than all of them inside a basic block.
Removed changes that are not directly related to new pass.

Tim, James,

I've updated the diff, but I'll also ask Andy Trick and/or Dave Estes if there
is a better way that involves the instruction scheduler, using this review as an
example of what I want to achieve.

Cheers,
Sergey

> In this case it could ignore latency of ldp when it's followed by stp with same operands.

I don't think that's right. We're not magically going to make the stp less quick, we can just issue them back to back in the same cycle. Potentially a ScheduleHazardRecognizer might be the right thing here?

>> In this case it could ignore latency of ldp when it's followed by stp with same operands.

> I don't think that's right. We're not magically going to make the stp less quick, we can just issue them back to back in the same cycle.

Well, I didn't assume there is some magic; what I meant is that when the scheduler looks for the next instruction after stp, ldp should be the best match among all predecessors.

> Potentially a ScheduleHazardRecognizer might be the right thing here?

From its description, I'd say that it does the opposite: it allows postponing execution of an instruction until the next cycle.

Dave's advice to look at clustering in the scheduler, applied through DAG mutations, almost worked. The only issue is that some "free" instructions can still be inserted between ldp and stp (previously these were instructions that compute addresses), and I'm not sure this can be solved using clustering. The next thing to try is a custom scheduling strategy; it might be an option, but I have only just started adding it.

A custom instruction scheduler (actually, both pre-RA and post-RA ones might be needed) might be a solution, but it replaces the generic scheduler, which will only make scheduling worse.

No more ideas; it looks like such instruction interleaving can't be achieved in LLVM with reasonable effort.

Revision Contents

Path                                                Size
lib/Target/AArch64/AArch64.h                        1 line
lib/Target/AArch64/AArch64ISelLowering.cpp          39 lines
lib/Target/AArch64/AArch64LoadStoreInterleave.cpp   331 lines
lib/Target/AArch64/AArch64TargetMachine.cpp         10 lines
lib/Target/AArch64/CMakeLists.txt                   1 line
test/CodeGen/AArch64/arm64-variadic-aapcs.ll        2 lines
test/CodeGen/AArch64/arm64-virtual_base.ll          2 lines
test/CodeGen/AArch64/func-calls.ll                  10 lines
test/CodeGen/AArch64/memcpy-f128.ll                 2 lines
test/CodeGen/AArch64/optimal-load-store-pairs.ll    66 lines

Diff 15691

lib/Target/AArch64/AArch64.h

[29 lines not shown]
 FunctionPass *createAArch64ConditionalCompares();
 FunctionPass *createAArch64AdvSIMDScalar();
 FunctionPass *createAArch64BranchRelaxation();
 FunctionPass *createAArch64ISelDag(AArch64TargetMachine &TM,
                                  CodeGenOpt::Level OptLevel);
 FunctionPass *createAArch64StorePairSuppressPass();
 FunctionPass *createAArch64ExpandPseudoPass();
 FunctionPass *createAArch64LoadStoreOptimizationPass();
+FunctionPass *createAArch64LoadStoreInterleavePass();
 ModulePass *createAArch64PromoteConstantPass();
 FunctionPass *createAArch64ConditionOptimizerPass();
 FunctionPass *createAArch64AddressTypePromotionPass();
 FunctionPass *createAArch64A57FPLoadBalancing();
 FunctionPass *createAArch64A53Fix835769();
 /// \brief Creates an ARM-specific Target Transformation Info pass.
 ImmutablePass *
 createAArch64TargetTransformInfoPass(const AArch64TargetMachine *TM);

 FunctionPass *createAArch64CleanupLocalDynamicTLSPass();

 FunctionPass *createAArch64CollectLOHPass();
 } // end namespace llvm

 #endif

lib/Target/AArch64/AArch64ISelLowering.cpp


[458 lines not shown; context: AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM)]
   setTargetDAGCombine(ISD::SELECT);
   setTargetDAGCombine(ISD::VSELECT);

   setTargetDAGCombine(ISD::INTRINSIC_VOID);
   setTargetDAGCombine(ISD::INTRINSIC_W_CHAIN);
   setTargetDAGCombine(ISD::INSERT_VECTOR_ELT);

   MaxStoresPerMemset = MaxStoresPerMemsetOptSize = 8;
-  MaxStoresPerMemcpy = MaxStoresPerMemcpyOptSize = 4;
   MaxStoresPerMemmove = MaxStoresPerMemmoveOptSize = 4;
+  if (Subtarget->isCyclone()) {
+    MaxStoresPerMemcpy = MaxStoresPerMemcpyOptSize = 4;
+  } else {
+    // It's optimal to use 64-bit registers with load/store pair instructions for
+    // memcpy() inlining, rather than doing the same with regular load/store
+    // instructions operating on 128-bit registers. Allow twice as big
+    // instructions as for memmove().
+    MaxStoresPerMemcpy = MaxStoresPerMemcpyOptSize = 8;
+  }

   setStackPointerRegisterToSaveRestore(AArch64::SP);

   setSchedulingPreference(Sched::Hybrid);

   // Enable TBZ/TBNZ
   MaskAndBranchFoldingIsLegal = true;

[6,122 lines not shown; context: return ((SrcAlign == 0 || SrcAlign % AlignCheck == 0) &&]
           (DstAlign == 0 || DstAlign % AlignCheck == 0));
 }

 EVT AArch64TargetLowering::getOptimalMemOpType(uint64_t Size, unsigned DstAlign,
                                                unsigned SrcAlign, bool IsMemset,
                                                bool ZeroMemset,
                                                bool MemcpyStrSrc,
                                                MachineFunction &MF) const {
+  // In general it's optimal to use 64-bit registers with load/store pair
+  // instructions for memcpy() inlining, rather than doing the same with regular
+  // load/store instructions operating on 128-bit registers. Do not use 128-bit
+  // types.
+
+  if (Subtarget->isCyclone()) {
   // Don't use AdvSIMD to implement 16-byte memset. It would have taken one
   // instruction to materialize the v2i64 zero and one store (with restrictive
   // addressing mode). Just do two i64 store of zero-registers.
   bool Fast;
   const Function *F = MF.getFunction();
   if (Subtarget->hasFPARMv8() && !IsMemset && Size >= 16 &&
       !F->getAttributes().hasAttribute(AttributeSet::FunctionIndex,
                                        Attribute::NoImplicitFloat) &&
       (memOpAlign(SrcAlign, DstAlign, 16) ||
        (allowsMisalignedMemoryAccesses(MVT::f128, 0, 1, &Fast) && Fast)))
     return MVT::f128;
+  }

   return Size >= 8 ? MVT::i64 : MVT::i32;
 }

 // 12-bit optionally shifted immediates are legal for adds.
 bool AArch64TargetLowering::isLegalAddImmediate(int64_t Immed) const {
   if ((Immed >> 12) == 0 || ((Immed & 0xfff) == 0 && Immed >> 24 == 0))
     return true;
[2,219 lines not shown]

lib/Target/AArch64/AArch64LoadStoreInterleave.cpp

This file was added.

				//=- AArch64LoadStoreInterleave.cpp - Optimize Load/Store pairs for AArch64 -=//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass reorders load/store pair instructions to achieve better
				// performance. Preferred sequence of operations is as follows:
				//
				// * [1]: load pair of 64-bit registers
				// * [1]: store pair of 64-bit registers
				// * [2]: load pair of 64-bit registers
				// * [2]: store pair of 64-bit registers
				// * ...
				//
				// Example of transformation:
				//
				// Before: After:
				//
				// 1. <load1> 1. <something1>
				// 2. <something1> 2. <something2>
				// 3. <store1> 3. <load1>
				// 4. <something2> 4. <store1>
				// 5. <load2> 5. <load2>
				// 6. <store2> 6. <store2>
				// 7. <something3> 7. <something3>
				//
				//===----------------------------------------------------------------------===//

				#include "AArch64.h"
				#include "AArch64InstrInfo.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Target/TargetInstrInfo.h"
				#include "llvm/Target/TargetSubtargetInfo.h"

				using namespace llvm;

				#define DEBUG_TYPE "aarch64-ldst-itl"

				STATISTIC(NumLdStSequencesUpdated, "Number of load/pair sequences updated");

				namespace {
				class AArch64LoadStoreInterleave : public MachineFunctionPass {
				const TargetInstrInfo *TII;
				const TargetRegisterInfo *TRI;

				public:
				static char ID;
				AArch64LoadStoreInterleave() : MachineFunctionPass(ID) {}

				bool runOnMachineFunction(MachineFunction &MF) override;
				bool interleaveMemOp(MachineBasicBlock &MBB);
				MachineInstr *moveInstruction(MachineInstr *I,
				MachineBasicBlock::iterator InsertionPoint);
				const char *getPassName() const override {
				return "AArch64 LoadStore Interleave";
				}
				};
				} // end anonymous namespace

				char AArch64LoadStoreInterleave::ID = 0;

				FunctionPass *llvm::createAArch64LoadStoreInterleavePass() {
				return new AArch64LoadStoreInterleave();
				}

				// Optimizes every basic block of the function.
				bool AArch64LoadStoreInterleave::runOnMachineFunction(MachineFunction &MF) {
				DEBUG(dbgs() << "******** AArch64 LoadStore Interleaving ********\n"
				<< "********** Function: " << MF.getName() << '\n');

				const TargetMachine &TM = MF.getTarget();
				TII = static_cast<const AArch64InstrInfo *>(
				TM.getSubtargetImpl()->getInstrInfo());
				TRI = TM.getSubtargetImpl()->getRegisterInfo();

				bool Modified = false;
				for (auto &MBB : MF) {
				Modified |= interleaveMemOp(MBB);
				}

				return Modified;
				}

				// Gets size of operands of load or store pair instruction in bytes.
				static int getOperandWidth(int Opcode) {
				switch (Opcode) {
				default:
				llvm_unreachable("Didn't expect anything except load and store pairs.");

				case AArch64::STPWi:
				case AArch64::LDPWi:
				case AArch64::STRWui:
				case AArch64::STURWi:
				case AArch64::LDRWui:
				case AArch64::LDURWi:
				return 4;

				case AArch64::STPXi:
				case AArch64::LDPXi:
				case AArch64::STRXui:
				case AArch64::STURXi:
				case AArch64::LDRXui:
				case AArch64::LDURXi:
				return 8;

				case AArch64::STPSi:
				case AArch64::LDPSi:
				case AArch64::STRSui:
				case AArch64::STURSi:
				case AArch64::LDRSui:
				case AArch64::LDURSi:
				return 4;

				case AArch64::STPDi:
				case AArch64::LDPDi:
				case AArch64::STRDui:
				case AArch64::STURDi:
				case AArch64::LDRDui:
				case AArch64::LDURDi:
				return 8;

				case AArch64::STPQi:
				case AArch64::LDPQi:
				case AArch64::STRQui:
				case AArch64::STURQi:
				case AArch64::LDRQui:
				case AArch64::LDURQi:
				return 16;
				}
				}

				// Checks that instruction can safely be moved outside sequence of load and
				// store pair instruction.
				static bool isSafeInstruction(unsigned LdBase, unsigned StBase, MachineInstr *I,
				const TargetRegisterInfo *TRI, int SeenStore) {
				if (I->isDebugValue())
				return true;

				if (I->isCall() || I->isTerminator() || I->hasUnmodeledSideEffects())
				return false;

				if (I->mayStore() || (SeenStore && I->mayLoad()))
				return false;

				for (const MachineOperand &MO : I->operands()) {
				if (!MO.isReg())
				continue;

				unsigned Reg = MO.getReg();
				if (MO.isDef() && TRI->regsOverlap(Reg, LdBase))
				return false;
				if (SeenStore && MO.isDef() && TRI->regsOverlap(Reg, StBase))
				return false;
				}

				return true;
				}

				// Collects links to load and store instructions from the basic block. Return
				// value indicating whether at least one of instructions is a pair load or
				// store.
				static bool collectLoadAndStores(MachineBasicBlock &MBB,
				SmallVectorImpl<MachineInstr*> &Lds,
				SmallVectorImpl<MachineInstr*> &Sts) {
				bool SeenPair = false;
				for (MachineInstr &MI : MBB) {
				switch (MI.getOpcode()) {
				default:
				// Just move on to the next instruction.
				break;

				case AArch64::STPSi:
				case AArch64::STPDi:
				case AArch64::STPQi:
				case AArch64::STPWi:
				case AArch64::STPXi:
				SeenPair = true;
				// Fall through.

				case AArch64::STRSui:
				case AArch64::STURSi:
				case AArch64::STRDui:
				case AArch64::STURDi:
				case AArch64::STRQui:
				case AArch64::STURQi:
				case AArch64::STRWui:
				case AArch64::STURWi:
				case AArch64::STRXui:
				case AArch64::STURXi:
				// Sequence of interesting operations should go first.
				if (!Lds.empty())
				Sts.push_back(&MI);
				break;

				case AArch64::LDPDi:
				case AArch64::LDPQi:
				case AArch64::LDPWi:
				case AArch64::LDPXi:
				SeenPair = true;
				// Fall through.

				case AArch64::LDRSui:
				case AArch64::LDURSi:
				case AArch64::LDRDui:
				case AArch64::LDURDi:
				case AArch64::LDRQui:
				case AArch64::LDURQi:
				case AArch64::LDRWui:
				case AArch64::LDURWi:
				case AArch64::LDRXui:
				case AArch64::LDURXi:
				Lds.push_back(&MI);
				break;
				}
				}

				return SeenPair;
				}

				// Extract base address from the instruction.
				static inline unsigned getBase(const MachineInstr* I) {
				unsigned OpNum = (I->getNumOperands() == 4) ? 2 : 1;
				return I->getOperand(OpNum).getReg();
				}

				// Extract offset from the instruction.
				static inline int64_t getOffset(const MachineInstr* I) {
				unsigned OpNum = (I->getNumOperands() == 4) ? 3 : 2;
				return I->getOperand(OpNum).getImm();
				}

				// Checks if a set of load and store instructions can be safely reordered.
				static bool isSafeToReorder(MachineBasicBlock &MBB,
				const SmallVectorImpl<MachineInstr*> &Lds,
				const SmallVectorImpl<MachineInstr*> &Sts,
				const TargetRegisterInfo *TRI) {
				if (Sts.empty() || Sts.size() != Lds.size())
				return false;

				unsigned N = Sts.size();

				// Check that each pair of instructions operate on data of the same width.
				for (unsigned i = 0; i < N; ++i) {
				const int LoadWidth = getOperandWidth(Lds[i]->getOpcode());
				const int StoreWidth = getOperandWidth(Sts[i]->getOpcode());
				if (LoadWidth != StoreWidth)
				return false;
				}

				const unsigned LdBase = getBase(Lds[0]);
				const unsigned StBase = getBase(Sts[0]);

				// Check that all load and store instructions use same base register and
				// each pair has same offset.
				for (unsigned i = 0; i < N; ++i) {
				if (getBase(Lds[i]) != LdBase || getBase(Sts[i]) != StBase)
				return false;

				if (getOffset(Sts[i]) != getOffset(Lds[i]))
				return false;
				}

				bool SeenStore = false;
				for (MachineBasicBlock::iterator I = Lds[0], E = Sts[N - 1]; I != E; ++I) {
				if (std::find(Sts.begin(), Sts.end(), (MachineInstr*)I) != Sts.end()) {
				SeenStore = true;
				continue;
				}

				if (std::find(Lds.begin(), Lds.end(), (MachineInstr*)I) != Lds.end())
				continue;

				if (!isSafeInstruction(LdBase, StBase, I, TRI, SeenStore))
				return false;
				}

				return true;
				}

				// Evaluates possibility and performs reordering of load and store instructions
				// within basic block.
				bool AArch64LoadStoreInterleave::interleaveMemOp(MachineBasicBlock &MBB) {
				SmallVector<MachineInstr*, 8> Lds;
				SmallVector<MachineInstr*, 8> Sts;

				if (!collectLoadAndStores(MBB, Lds, Sts))
				return false;

				if (!isSafeToReorder(MBB, Lds, Sts, TRI))
				return false;

				const unsigned N = Sts.size();

				DEBUG(dbgs() << "Interleaving sequence of " << N << " instructions "
				"in " << MBB.getName() << "\n");

				MachineBasicBlock::iterator InsertionPoint = Sts[N - 1];

				for (unsigned i = 0; i < N; ++i) {
				InsertionPoint = moveInstruction(Sts[N - 1 - i], InsertionPoint);
				InsertionPoint = moveInstruction(Lds[N - 1 - i], InsertionPoint);
				}

				++NumLdStSequencesUpdated;

				return true;
				}

				// Moves load or store pair instruction before the insertion point and returns
				// next position for insertion.
				MachineInstr *AArch64LoadStoreInterleave::moveInstruction(
				MachineInstr *I, MachineBasicBlock::iterator InsertionPoint) {
				MachineInstr *NewI = BuildMI(*I->getParent(), InsertionPoint,
				I->getDebugLoc(), TII->get(I->getOpcode()));
				for (const MachineOperand &operand : I->operands()) {
				NewI->addOperand(operand);
				}

				I->eraseFromParent();

				return NewI;
				}
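
The net effect of `interleaveMemOp()` and `moveInstruction()` above can be reproduced on a toy basic block. The sketch below is not LLVM code. It models instructions as strings, takes the positions of the collected loads and stores as index lists in program order, and assumes the last store comes after every load. The name `interleaveBeforeLastStore` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Rebuilds `block` so that all load/store pairs sit, interleaved, where the
// last store used to be. Everything else keeps its relative order.
std::vector<std::string> interleaveBeforeLastStore(
    const std::vector<std::string> &block,
    const std::vector<std::size_t> &loadIdx,
    const std::vector<std::size_t> &storeIdx) {
  assert(!storeIdx.empty() && loadIdx.size() == storeIdx.size());
  std::set<std::size_t> moved(loadIdx.begin(), loadIdx.end());
  moved.insert(storeIdx.begin(), storeIdx.end());
  const std::size_t last = storeIdx.back(); // position of the final store

  std::vector<std::string> out;
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (i == last) {
      // Emit every pair here: load0, store0, load1, store1, ...
      for (std::size_t k = 0; k < loadIdx.size(); ++k) {
        out.push_back(block[loadIdx[k]]);
        out.push_back(block[storeIdx[k]]);
      }
    } else if (moved.count(i) == 0) {
      out.push_back(block[i]); // unrelated instruction stays in order
    }
  }
  return out;
}
```

Applied to the "Before" column of the file header comment (load1, something1, store1, something2, load2, store2, something3), this yields the "After" column. The pass itself walks the pairs from last to first and inserts each instruction before the previous insertion point, which produces the same final order.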

lib/Target/AArch64/AArch64TargetMachine.cpp

Show First 20 Lines • Show All 74 Lines • ▼ Show 20 Lines	EnableCondOpt("aarch64-condopt",
cl::desc("Enable the condition optimizer pass"),		cl::desc("Enable the condition optimizer pass"),
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

static cl::opt<bool>		static cl::opt<bool>
EnableA53Fix835769("aarch64-fix-cortex-a53-835769", cl::Hidden,		EnableA53Fix835769("aarch64-fix-cortex-a53-835769", cl::Hidden,
cl::desc("Work around Cortex-A53 erratum 835769"),		cl::desc("Work around Cortex-A53 erratum 835769"),
cl::init(false));		cl::init(false));

		static cl::opt<bool>
		EnableAArch64InterleavedMemOp("aarch64-interleaved-ldstp", cl::Hidden,
		cl::desc("Allow AArch64 load/store clustering and "
		"interleaving"),
		cl::init(false));

extern "C" void LLVMInitializeAArch64Target() {		extern "C" void LLVMInitializeAArch64Target() {
// Register the target.		// Register the target.
RegisterTargetMachine<AArch64leTargetMachine> X(TheAArch64leTarget);		RegisterTargetMachine<AArch64leTargetMachine> X(TheAArch64leTarget);
RegisterTargetMachine<AArch64beTargetMachine> Y(TheAArch64beTarget);		RegisterTargetMachine<AArch64beTargetMachine> Y(TheAArch64beTarget);
RegisterTargetMachine<AArch64leTargetMachine> Z(TheARM64Target);		RegisterTargetMachine<AArch64leTargetMachine> Z(TheARM64Target);
}		}

/// TargetMachine ctor - Create an AArch64 architecture model.		/// TargetMachine ctor - Create an AArch64 architecture model.
▲ Show 20 Lines • Show All 173 Lines • ▼ Show 20 Lines	bool AArch64PassConfig::addPreSched2() {
addPass(createAArch64ExpandPseudoPass());		addPass(createAArch64ExpandPseudoPass());
// Use load/store pair instructions when possible.		// Use load/store pair instructions when possible.
if (TM->getOptLevel() != CodeGenOpt::None && EnableLoadStoreOpt)		if (TM->getOptLevel() != CodeGenOpt::None && EnableLoadStoreOpt)
addPass(createAArch64LoadStoreOptimizationPass());		addPass(createAArch64LoadStoreOptimizationPass());
return true;		return true;
}		}

bool AArch64PassConfig::addPreEmitPass() {		bool AArch64PassConfig::addPreEmitPass() {
		// Reorder load/store pair instructions for better performance.
		if (TM->getOptLevel() != CodeGenOpt::None && EnableLoadStoreOpt &&
		EnableAArch64InterleavedMemOp)
		addPass(createAArch64LoadStoreInterleavePass());
if (EnableA53Fix835769)		if (EnableA53Fix835769)
addPass(createAArch64A53Fix835769());		addPass(createAArch64A53Fix835769());
// Relax conditional branch instructions if they're otherwise out of		// Relax conditional branch instructions if they're otherwise out of
// range of their destination.		// range of their destination.
addPass(createAArch64BranchRelaxation());		addPass(createAArch64BranchRelaxation());
if (TM->getOptLevel() != CodeGenOpt::None && EnableCollectLOH &&		if (TM->getOptLevel() != CodeGenOpt::None && EnableCollectLOH &&
TM->getSubtarget<AArch64Subtarget>().isTargetMachO())		TM->getSubtarget<AArch64Subtarget>().isTargetMachO())
addPass(createAArch64CollectLOHPass());		addPass(createAArch64CollectLOHPass());
return true;		return true;
}		}

lib/Target/AArch64/CMakeLists.txt

Show All 27 Lines	add_llvm_target(AArch64CodeGen
AArch64FastISel.cpp		AArch64FastISel.cpp
AArch64A53Fix835769.cpp		AArch64A53Fix835769.cpp
AArch64FrameLowering.cpp		AArch64FrameLowering.cpp
AArch64ConditionOptimizer.cpp		AArch64ConditionOptimizer.cpp
AArch64ISelDAGToDAG.cpp		AArch64ISelDAGToDAG.cpp
AArch64ISelLowering.cpp		AArch64ISelLowering.cpp
AArch64InstrInfo.cpp		AArch64InstrInfo.cpp
AArch64LoadStoreOptimizer.cpp		AArch64LoadStoreOptimizer.cpp
		AArch64LoadStoreInterleave.cpp
AArch64MCInstLower.cpp		AArch64MCInstLower.cpp
AArch64PromoteConstant.cpp		AArch64PromoteConstant.cpp
AArch64PBQPRegAlloc.cpp		AArch64PBQPRegAlloc.cpp
AArch64RegisterInfo.cpp		AArch64RegisterInfo.cpp
AArch64SelectionDAGInfo.cpp		AArch64SelectionDAGInfo.cpp
AArch64StorePairSuppress.cpp		AArch64StorePairSuppress.cpp
AArch64Subtarget.cpp		AArch64Subtarget.cpp
AArch64TargetMachine.cpp		AArch64TargetMachine.cpp
Show All 12 Lines

test/CodeGen/AArch64/arm64-variadic-aapcs.ll

	; RUN: llc -verify-machineinstrs -mtriple=arm64-linux-gnu -pre-RA-sched=linearize -enable-misched=false < %s | FileCheck %s			; RUN: llc -verify-machineinstrs -mtriple=arm64-linux-gnu -pre-RA-sched=linearize -enable-misched=false < %s -mcpu=cyclone | FileCheck %s

	%va_list = type {i8*, i8*, i8*, i32, i32}			%va_list = type {i8*, i8*, i8*, i32, i32}

	@var = global %va_list zeroinitializer, align 8			@var = global %va_list zeroinitializer, align 8

	declare void @llvm.va_start(i8*)			declare void @llvm.va_start(i8*)

	define void @test_simple(i32 %n, ...) {			define void @test_simple(i32 %n, ...) {
	▲ Show 20 Lines • Show All 134 Lines • Show Last 20 Lines

test/CodeGen/AArch64/arm64-virtual_base.ll

	; RUN: llc < %s -O3 -march arm64 | FileCheck %s			; RUN: llc < %s -O3 -march arm64 -mcpu=cyclone | FileCheck %s
	; <rdar://13463602>			; <rdar://13463602>

	%struct.Counter_Struct = type { i64, i64 }			%struct.Counter_Struct = type { i64, i64 }
	%struct.Bicubic_Patch_Struct = type { %struct.Method_Struct, i32, %struct.Object_Struct, %struct.Texture_Struct, %struct.Interior_Struct, %struct.Object_Struct, %struct.Object_Struct, %struct.Bounding_Box_Struct, i64, i32, i32, i32, [4 x [4 x [3 x double]]], [3 x double], double, double, %struct.Bezier_Node_Struct* }			%struct.Bicubic_Patch_Struct = type { %struct.Method_Struct, i32, %struct.Object_Struct, %struct.Texture_Struct, %struct.Interior_Struct, %struct.Object_Struct, %struct.Object_Struct, %struct.Bounding_Box_Struct, i64, i32, i32, i32, [4 x [4 x [3 x double]]], [3 x double], double, double, %struct.Bezier_Node_Struct* }
	%struct.Method_Struct = type { i32 (%struct.Object_Struct, %struct.Ray_Struct, %struct.istack_struct), i32 (double, %struct.Object_Struct), void (double, %struct.Object_Struct, %struct.istk_entry), i8 (%struct.Object_Struct), void (%struct.Object_Struct, double, %struct.Transform_Struct), void (%struct.Object_Struct, double, %struct.Transform_Struct), void (%struct.Object_Struct, double, %struct.Transform_Struct), void (%struct.Object_Struct, %struct.Transform_Struct), void (%struct.Object_Struct), void (%struct.Object_Struct)* }			%struct.Method_Struct = type { i32 (%struct.Object_Struct, %struct.Ray_Struct, %struct.istack_struct), i32 (double, %struct.Object_Struct), void (double, %struct.Object_Struct, %struct.istk_entry), i8 (%struct.Object_Struct), void (%struct.Object_Struct, double, %struct.Transform_Struct), void (%struct.Object_Struct, double, %struct.Transform_Struct), void (%struct.Object_Struct, double, %struct.Transform_Struct), void (%struct.Object_Struct, %struct.Transform_Struct), void (%struct.Object_Struct), void (%struct.Object_Struct)* }
	%struct.Object_Struct = type { %struct.Method_Struct, i32, %struct.Object_Struct, %struct.Texture_Struct, %struct.Interior_Struct, %struct.Object_Struct, %struct.Object_Struct, %struct.Bounding_Box_Struct, i64 }			%struct.Object_Struct = type { %struct.Method_Struct, i32, %struct.Object_Struct, %struct.Texture_Struct, %struct.Interior_Struct, %struct.Object_Struct, %struct.Object_Struct, %struct.Bounding_Box_Struct, i64 }
	%struct.Texture_Struct = type { i16, i16, i16, i32, float, float, float, %struct.Warps_Struct, %struct.Pattern_Struct, %struct.Blend_Map_Struct, %union.anon.9, %struct.Texture_Struct, %struct.Pigment_Struct, %struct.Tnormal_Struct, %struct.Finish_Struct, %struct.Texture_Struct, i32 }			%struct.Texture_Struct = type { i16, i16, i16, i32, float, float, float, %struct.Warps_Struct, %struct.Pattern_Struct, %struct.Blend_Map_Struct, %union.anon.9, %struct.Texture_Struct, %struct.Pigment_Struct, %struct.Tnormal_Struct, %struct.Finish_Struct, %struct.Texture_Struct, i32 }
	%struct.Warps_Struct = type { i16, %struct.Warps_Struct* }			%struct.Warps_Struct = type { i16, %struct.Warps_Struct* }
	▲ Show 20 Lines • Show All 42 Lines • Show Last 20 Lines

test/CodeGen/AArch64/func-calls.ll

	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s --check-prefix=CHECK			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mcpu=cyclone | FileCheck %s --check-prefix=CHECK
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-neon | FileCheck --check-prefix=CHECK-NONEON %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-neon -mcpu=cyclone | FileCheck --check-prefix=CHECK-NONEON %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 | FileCheck --check-prefix=CHECK-NOFP %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-none-linux-gnu -mattr=-fp-armv8 -mcpu=cyclone | FileCheck --check-prefix=CHECK-NOFP %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu | FileCheck --check-prefix=CHECK-BE %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-none-linux-gnu -mcpu=cyclone | FileCheck --check-prefix=CHECK-BE %s

	%myStruct = type { i64 , i8, i32 }			%myStruct = type { i64 , i8, i32 }

	@var8 = global i8 0			@var8 = global i8 0
	@var8_2 = global i8 0			@var8_2 = global i8 0
	@var32 = global i32 0			@var32 = global i32 0
	@var64 = global i64 0			@var64 = global i64 0
	@var128 = global i128 0			@var128 = global i128 0
	▲ Show 20 Lines • Show All 124 Lines • ▼ Show 20 Lines
	; CHECK-NONEON: stp [[I128LO]], [[I128HI]], [sp, #16]			; CHECK-NONEON: stp [[I128LO]], [[I128HI]], [sp, #16]
	; CHECK: bl check_i128_stackalign			; CHECK: bl check_i128_stackalign

	call void @check_i128_regalign(i32 0, i128 42)			call void @check_i128_regalign(i32 0, i128 42)
	; CHECK-NOT: mov x1			; CHECK-NOT: mov x1
	; CHECK-LE: movz x2, #{{0x2a|42}}			; CHECK-LE: movz x2, #{{0x2a|42}}
	; CHECK-LE: mov x3, xzr			; CHECK-LE: mov x3, xzr
	; CHECK-BE: movz {{x|w}}3, #{{0x2a|42}}			; CHECK-BE: movz {{x|w}}3, #{{0x2a|42}}
	; CHECK-BE: mov x2, xzr			; CHECK-BE: mov{{z?}} x2, {{xzr|#0}}
	; CHECK: bl check_i128_regalign			; CHECK: bl check_i128_regalign

	ret void			ret void
	}			}

	@fptr = global void()* null			@fptr = global void()* null

	define void @check_indirect_call() {			define void @check_indirect_call() {
	; CHECK-LABEL: check_indirect_call:			; CHECK-LABEL: check_indirect_call:
	%func = load void()** @fptr			%func = load void()** @fptr
	call void %func()			call void %func()
	; CHECK: ldr [[FPTR:x[0-9]+]], [{{x[0-9]+}}, {{#?}}:lo12:fptr]			; CHECK: ldr [[FPTR:x[0-9]+]], [{{x[0-9]+}}, {{#?}}:lo12:fptr]
	; CHECK: blr [[FPTR]]			; CHECK: blr [[FPTR]]

	ret void			ret void
	}			}

test/CodeGen/AArch64/memcpy-f128.ll

	; RUN: llc < %s -march=aarch64 -mtriple=aarch64-linux-gnu | FileCheck %s			; RUN: llc < %s -march=aarch64 -mtriple=aarch64-linux-gnu -mcpu=cyclone | FileCheck %s

	%structA = type { i128 }			%structA = type { i128 }
	@stubA = internal unnamed_addr constant %structA zeroinitializer, align 8			@stubA = internal unnamed_addr constant %structA zeroinitializer, align 8

	; Make sure we don't hit llvm_unreachable.			; Make sure we don't hit llvm_unreachable.

	define void @test1() {			define void @test1() {
	; CHECK-LABEL: @test1			; CHECK-LABEL: @test1
	Show All 10 Lines

test/CodeGen/AArch64/optimal-load-store-pairs.ll

This file was added.

				; RUN: llc < %s -mcpu=cortex-a53 -march=aarch64 -mtriple=aarch64-linux-gnu -aarch64-interleaved-ldstp=1 | FileCheck %s
				; RUN: llc < %s -mcpu=cortex-a57 -march=aarch64 -mtriple=aarch64-linux-gnu -aarch64-interleaved-ldstp=1 | FileCheck %s

				; Here "optimal" means:
				; - use of 64-bit registers (no floating point 128-bit registers);
				; - interleaving loads and stores without any instructions in the middle.

				; marked as external to prevent possible optimizations
				@a = external global [4 x i32]
				@b = external global [4 x i32]

				define void @copy-16-bytes-with-8-byte-registers() {
				; CHECK-LABEL: @copy-16-bytes-with-8-byte-registers
				; CHECK: adrp
				; CHECK: add
				; CHECK: adrp
				; CHECK: add
				; CHECK: ldp [[v1:x[0-9]+]], [[v2:x[0-9]+]]
				; CHECK: stp [[v1]], [[v2]]
				; CHECK: ret
				entry:
				tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* bitcast ([4 x i32]* @a to i8*), i8* bitcast ([4 x i32]* @b to i8*), i64 16, i32 8, i1 false)
				ret void
				}

				define void @copy-56-bytes-with-8-byte-registers() {
				; CHECK-LABEL: @copy-56-bytes-with-8-byte-registers
				; CHECK: adrp
				; CHECK: add
				; CHECK: adrp
				; CHECK: add
				; CHECK: ld{{[rp]}} {{x[0-9]+}}
				; CHECK: st{{[rp]}} {{x[0-9]+}}
				; CHECK: ld{{[rp]}} {{x[0-9]+}}
				; CHECK: st{{[rp]}} {{x[0-9]+}}
				; CHECK: ld{{[rp]}} {{x[0-9]+}}
				; CHECK: st{{[rp]}} {{x[0-9]+}}
				; CHECK: ld{{[rp]}} {{x[0-9]+}}
				; CHECK: st{{[rp]}} {{x[0-9]+}}
				; CHECK: ret
				entry:
				tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* bitcast ([4 x i32]* @a to i8*), i8* bitcast ([4 x i32]* @b to i8*), i64 56, i32 8, i1 false)
				ret void
				}

				define void @copy-64-bytes-with-8-byte-registers() {
				; CHECK-LABEL: @copy-64-bytes-with-8-byte-registers
				; CHECK: adrp
				; CHECK: add
				; CHECK: adrp
				; CHECK: add
				; CHECK: ldp [[v1:x[0-9]+]], [[v2:x[0-9]+]]
				; CHECK: stp [[v1]], [[v2]]
				; CHECK: ldp [[v3:x[0-9]+]], [[v4:x[0-9]+]]
				; CHECK: stp [[v3]], [[v4]]
				; CHECK: ldp [[v5:x[0-9]+]], [[v6:x[0-9]+]]
				; CHECK: stp [[v5]], [[v6]]
				; CHECK: ldp [[v7:x[0-9]+]], [[v8:x[0-9]+]]
				; CHECK: stp [[v7]], [[v8]]
				; CHECK: ret
				entry:
				tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* bitcast ([4 x i32]* @a to i8*), i8* bitcast ([4 x i32]* @b to i8*), i64 64, i32 8, i1 false)
				ret void
				}

				declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture readonly, i64, i32, i1)
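
The CHECK lines in this test expect each load pair to be followed directly by the store pair that consumes it. As a rough illustration of that ordering only, here is a small standalone sketch that interleaves two instruction lists. It is a toy model on strings, not the pass itself: the `interleave` name is hypothetical, and it assumes that `loads[i]` feeds `stores[i]` and that no other dependencies exist.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model: each "instruction" is just its assembly text.
// Assumption: loads[i] produces exactly the registers stores[i] consumes.
std::vector<std::string> interleave(const std::vector<std::string> &loads,
                                    const std::vector<std::string> &stores) {
  assert(loads.size() == stores.size() && "each load needs a matching store");
  std::vector<std::string> out;
  out.reserve(loads.size() * 2);
  for (std::size_t i = 0; i < loads.size(); ++i) {
    out.push_back(loads[i]);   // load one register pair
    out.push_back(stores[i]);  // store that pair straight away
  }
  return out;
}
```

Given two `ldp` strings and two matching `stp` strings, the result alternates `ldp`, `stp`, `ldp`, `stp`, which is the shape the 64-byte test above checks for.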

Revision Contents

Diff 15691
