This is an archive of the discontinued LLVM Phabricator instance.

Paths

lib/Target/ARM/
    ARM.h
    ARMISelDAGToDAG.cpp
    ARMInstrThumb2.td
    ARMLowOverheadLoops.cpp
    ARMTargetMachine.cpp
    CMakeLists.txt
test/CodeGen/ARM/
    O3-pipeline.ll
test/Transforms/HardwareLoops/ARM/
    calls.ll
    cond-mov.mir
    massive.mir
    revert-after-call.mir
    revert-after-spill.mir
    simple-do.ll
    size-limit.mir
    structure.ll
    switch.mir

Differential D63476

[ARM] DLS/LE low-overhead loop code generation
ClosedPublic

Authored by samparker on Jun 18 2019, 12:36 AM.

Details

Reviewers

dmgreen
SjoerdMeijer
efriedma
t.p.northover

Commits

rGa6fd919cb3f5: [ARM] DLS/LE low-overhead loop code generation
rL364288: [ARM] DLS/LE low-overhead loop code generation

Summary

Introduce three pseudo instructions to be used during DAG ISel to represent v8.1-m low-overhead loops. One maps to set_loop_iterations, while loop_decrement_reg is lowered to two so that the decrement and branching operations can be kept separate. The pseudo instructions are expanded pre-emission, where we can still decide whether we actually want to generate a low-overhead loop. The pass currently bails, reverting to a sub, icmp and br, in the cases where a call or stack spill/restore happens between the decrement and branching instructions.
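
The bail-out rule in the summary can be illustrated with a small standalone sketch. It is a toy model only: the `Kind` enum, the `ToyInstr` struct and `mustRevert` are hypothetical names invented here, whereas the pass in this diff inspects `MachineInstr` objects and their operands.

```cpp
#include <vector>

// Toy classification of the instructions the pass cares about. These names
// are illustrative and do not appear in the patch.
enum class Kind { LoopDec, LoopEnd, Call, SpillLR, ReloadLR, Other };

struct ToyInstr {
  Kind K;
};

// Returns true if a call or an LR spill/reload appears after the LoopDec and
// before the LoopEnd, i.e. the case where the loop would fall back to a
// sub, cmp and branch sequence.
bool mustRevert(const std::vector<ToyInstr> &Body) {
  bool SeenDec = false;
  for (const ToyInstr &I : Body) {
    if (I.K == Kind::LoopDec) {
      SeenDec = true;
      continue;
    }
    if (!SeenDec)
      continue; // Anything before the decrement is harmless in this model.
    if (I.K == Kind::LoopEnd)
      return false; // Reached the branch without seeing a problem.
    if (I.K == Kind::Call || I.K == Kind::SpillLR || I.K == Kind::ReloadLR)
      return true;
  }
  return false;
}
```

In this model a call before the decrement does not force a revert, while a call or LR spill between the decrement and the branch does.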

Event Timeline

samparker created this revision. Jun 18 2019, 12:36 AM

Herald added a project: Restricted Project. Jun 18 2019, 12:36 AM

Herald added subscribers: kristof.beyls, javed.absar, mgorny, qcolombet.

SjoerdMeijer added inline comments. Jun 20 2019, 6:59 AM

lib/Target/ARM/ARMFinalizeLoops.cpp
139 ↗	(On Diff #205264)	if `Revert` is true here at some point, can we stop iterating over the rest of the blocks/instructions?
lib/Target/ARM/ARMISelDAGToDAG.cpp
2995	nit: can this be simplified using `getIntrinsicID()`?
3004	another nit: indentation a bit off?

forgot to say: went through this for the first time, some first nits in my previous comment. Will continue looking tomorrow.

SjoerdMeijer added inline comments. Jun 21 2019, 8:37 AM

lib/Target/ARM/ARMFinalizeLoops.cpp
25 ↗	(On Diff #205264)	Nit, perhaps: "ARM low-overhead loop..."
175 ↗	(On Diff #205264)	This looks fine for now. It might be that WLS/DLS is an expensive MOV instruction, that we possibly don't even need. But I think that's an optimisation that we can worry about later.
lib/Target/ARM/ARMInstrThumb2.td
5194	nit, perhaps `} // isNotDuplicable = 1`
test/Transforms/HardwareLoops/ARM/massive.mir
2	yes, it is massive! :-) But I think we can simplify this a lot by using intrinsic `@llvm.arm.space`:
	// A space-consuming intrinsic primarily for testing ARMConstantIslands. The
	// first argument is the number of bytes this "instruction" takes up, the second
	// and return value are essentially chains, used to force ordering during ISel.
	def int_arm_space : Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty], []>;
	As mentioned, constant island tests are using this; I think you want to do something similar here.

How confident are you that this is always valid? From what I understand it creates an assumption that the code will not grow past this point in the pipeline (technically that the LE will not move further from the loop start). What would stop, for example, constant island pass from putting a constant pool into the loop?

And as Sjoerd says, I found out that DLS doesn't really do anything and can be ignored (all the "smarts" are in the LE). WLS obviously does more, and a DLSTP is important for tail predication. But if the count is already in lr, we needn't emit the DLS.

> How confident are you that this is always valid?

What are our options here?

I guess one of them is making sure literal pools won't get placed inside loops, or reordering this:

addPass(createARMFinalizeLoopsPass());
addPass(createARMConstantIslandPass());

or is there a reason we can't do this?

samparker marked 2 inline comments as done. Jun 24 2019, 12:58 AM

samparker added inline comments.

lib/Target/ARM/ARMFinalizeLoops.cpp
139 ↗	(On Diff #205264)	Maybe... We'd still need to find 'End' and that would be the terminator, but I guess it could help prevent visiting other blocks. I reorder the block iterator and that should work.
test/Transforms/HardwareLoops/ARM/massive.mir
2	Bingo! Thanks

I will try to reorder the final passes. I hope that I can change the size of the pseudo instructions to be pessimistically big enough to be a cmp and br. I imagine that TTI will have to try to calculate the size, or at least the number of live variables, so that these loops don't cause unnecessary register pressure and actually slow things down because of spills.

Renamed all the things to low-overhead loops.
Used the arm.space intrinsic and added another test for the edge case.
Reversed the order of the search for LoopEnd and LoopDec, breaking early if possible.
Switched the order of constant island and low-overhead loops.

I don't know of any reason why this way round wouldn't work, so long as we tell constant island pass that the size of the instructions is large enough. It might be a little inefficient at times? I think the version of this on the branch put the handling _into_ constant island pass, but that version worked a little differently.

This looks good to me. I.e., expanding the pseudos as late as possible, together with size estimation in TTI for these pseudos, seems the right approach to me. I hope that overestimating the size, where necessary, will always be safe.

So I am happy with this, if Dave is happy with this too.

test/Transforms/HardwareLoops/ARM/massive.mir
47	a proper nit: just a minor clean up, but we don't need these comments on the functions
57	and this stack protector stuff
63	and these attributes

This revision is now accepted and ready to land. Jun 24 2019, 1:12 PM

This looks pretty neat to me. I have some comments that could be done in followups (as in, if it's broken, we can fix it later).

It may be better to put this into constant island pass instead, to use its iterative nature and have it remain as the only place that deals with sizes.
Can you put a nice long comment somewhere like the start of ARMFinalizeHardwareLoops explaining things like what the pseudos do and why they are needed (they are for register allocation, essentially?), and what assumptions this is making about them remaining in specific blocks?

lib/Target/ARM/ARMInstrThumb2.td
5193	12 bytes for something that should usually take 4 seems overly pessimistic to me, and may pessimise other optimisations. Can we get this smaller somehow? I would expect the fallback to probably be (subs; bne) ideally.
lib/Target/ARM/ARMLowOverheadLoops.cpp
76	How expensive is this? We might as well not do it for cores that won't have low overhead loops.
96	These don't really seem to add much to me :)
135	Can you explain this a little? Is it for correctness or performance? Because we can't merge the two instructions?
148	I think this should give a better error message in release builds. report_fatal_error or similar. Otherwise it may just miscompile. It should never happen, as far as I understand?
165	Formatting
173	Format
201	This is just the same call, with one parameter different?

I will add some comments, but I really don't think this belongs in constant islands. This doesn't have to worry about iterative changes, the loop size may only vary by 8 bytes, which is nothing compared to the 4KB that we need to concern ourselves with. Plus this is a very specific pass, especially once we start having to handle the tail predicated loops!
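
The 8-byte figure can be cross-checked against the pseudo sizes declared in ARMInstrThumb2.td in this diff (4 bytes for t2LoopDec, 8 for t2LoopEnd). The sketch below is only arithmetic, assuming that LE and each instruction of the sub, cmp, branch fallback use a 4-byte Thumb-2 encoding; `InLoopSizes` and the helper functions are made-up names, not part of the patch.

```cpp
// Byte sizes attributed to the two pseudos that live inside the loop. The
// declared sizes come from the t2PseudoInst definitions in this diff; the
// expanded sizes assume 4-byte encodings for LE, sub, cmp and the branch.
struct InLoopSizes {
  unsigned LoopDec;
  unsigned LoopEnd;
  constexpr unsigned total() const { return LoopDec + LoopEnd; }
};

constexpr InLoopSizes Declared{4, 8};   // what size-driven passes are told
constexpr InLoopSizes AsLE{0, 4};       // LoopDec folds into a single LE
constexpr InLoopSizes AsFallback{4, 8}; // sub, then cmp + conditional branch

// How much smaller the loop can be than the estimate once expanded.
constexpr unsigned maxOverestimate() {
  return Declared.total() - AsLE.total();
}

// The declared size is only a safe upper bound if no expansion exceeds it.
constexpr bool declaredIsUpperBound() {
  return AsLE.total() <= Declared.total() &&
         AsFallback.total() <= Declared.total();
}
```

Under these assumptions the estimate is exact for the fallback and 8 bytes too large when an LE is emitted, which is small next to the 4KB reach discussed here.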

lib/Target/ARM/ARMInstrThumb2.td
5193	12 is what this implementation will produce though in the fallback case. When we use subs, we can reduce it.
lib/Target/ARM/ARMLowOverheadLoops.cpp
76	Good point! I'll add a check and an exit.
96	All in good time... while and the vector ones will follow. But yes, LoopDec can go.
135	Sure, I'll add a comment. To summarise though, it's because the decremented value of LR has probably been stored to the stack, but this decrement isn't really going to happen until the end of the block, where we can't spill. If we want the value of LR to be on the stack, we'd have to perform a manual sub. Added to this, since we're probably reloading LR at the bottom of the loop for LE, we'd have to either perform an add before LE or use the other form of LE that doesn't perform the decrement.
148	cheers!
201	Yes, I know it looks awkward... I'll try to see if I can make it nicer.
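
The spill problem described for line 135 can be made concrete with a toy simulation. This is a sketch under stated assumptions: `leUpdate` is a hypothetical stand-in for the decrementing form of LE (decrement LR and branch back while more than one iteration remains), and the spill and reload are modelled as plain copies through a local variable. None of these functions exist in the patch.

```cpp
// Toy stand-in for the decrementing form of LE: if more than one iteration
// remains, decrement LR and branch back; otherwise fall through.
static bool leUpdate(unsigned &LR) {
  if (LR <= 1)
    return false;
  --LR;
  return true;
}

// Baseline: nothing touches LR between the decrement and the branch, so the
// decrement can be left entirely to LE. Assumes TripCount >= 1.
unsigned iterationsFolded(unsigned TripCount) {
  unsigned LR = TripCount, Iters = 0;
  do {
    ++Iters;
  } while (leUpdate(LR));
  return Iters;
}

// A manual sub keeps the decremented value on the stack, but LE then
// decrements the reloaded value a second time.
unsigned iterationsSpilledNaive(unsigned TripCount) {
  unsigned LR = TripCount, Slot = 0, Iters = 0;
  do {
    ++Iters;
    LR -= 1;   // manual sub so the stack slot holds the decremented count
    Slot = LR; // spill
    LR = Slot; // reload at the bottom of the loop
  } while (leUpdate(LR));
  return Iters;
}

// Same as above, but with an add before LE to undo the manual sub.
unsigned iterationsSpilledWithAdd(unsigned TripCount) {
  unsigned LR = TripCount, Slot = 0, Iters = 0;
  do {
    ++Iters;
    LR -= 1;
    Slot = LR;
    LR = Slot;
    LR += 1;   // compensate so LE's own decrement is the only net change
  } while (leUpdate(LR));
  return Iters;
}
```

In this model the naive variant runs too few iterations, while the variant with the compensating add matches the baseline at the cost of an extra instruction, which is the trade-off the reply alludes to.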

In D63476#1557021, @samparker wrote:

I will add some comments, but I really don't think this belongs in constant islands. This doesn't have to worry about iterative changes, the loop size may only vary by 8 bytes, which is nothing compared to the 4KB that we need to concern ourselves with. Plus this is a very specific pass, especially once we start having to handle the tail predicated loops!

It's not really about the 4KB range of a LE instruction, I agree that's not super important, but the much smaller range of a cbz for example. If we are over-estimating the size of the loop in constant island pass, we may lose out on other optimisations we would otherwise have performed.
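
A rough sketch of the concern, assuming a short compare-and-branch with a forward reach of 126 bytes (treat that figure, and the function names, as assumptions of this example rather than something taken from the patch): a branch that really fits can be rejected when the bytes between it and its target are overestimated.

```cpp
// Assumed forward reach, in bytes, of a short compare-and-branch such as cbz.
constexpr unsigned ShortBranchRange = 126;

constexpr bool fitsShortBranch(unsigned DistanceBytes) {
  return DistanceBytes <= ShortBranchRange;
}

// Distance as seen by a size-driven pass when Overestimate extra bytes are
// attributed to pseudos lying between the branch and its target.
constexpr unsigned estimatedDistance(unsigned RealBytes,
                                     unsigned Overestimate) {
  return RealBytes + Overestimate;
}
```

With a real distance of 120 bytes and an 8-byte overestimate, the estimated distance no longer fits even though the final code would.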

lib/Target/ARM/ARMLowOverheadLoops.cpp
76	Also, I'm not sure it's doing everything it should, and might not be calculating offsets. Can you add a test with multiple loop bbs that together go over the limit?
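
The situation the requested test targets can be sketched as follows: each block is comfortably small on its own, but the distance from the loop end back to the header is the sum over every block in between. The function names, the block-index representation and the 4096-byte default are assumptions for illustration (the limit mirrors the value passed to `isBBInRange` in this diff).

```cpp
#include <cstddef>
#include <vector>

// Bytes between the start of the header block and the loop-end instruction.
// BlockSizes holds block sizes in layout order; Header and Latch are indices
// into it, and EndOffset is the offset of the loop end within the latch.
unsigned distanceToHeader(const std::vector<unsigned> &BlockSizes,
                          std::size_t Header, std::size_t Latch,
                          unsigned EndOffset) {
  unsigned Distance = EndOffset;
  for (std::size_t I = Header; I < Latch; ++I)
    Distance += BlockSizes[I];
  return Distance;
}

// True if the backwards branch would stay within MaxBytes of the header.
bool loopEndInRange(const std::vector<unsigned> &BlockSizes,
                    std::size_t Header, std::size_t Latch, unsigned EndOffset,
                    unsigned MaxBytes = 4096) {
  return distanceToHeader(BlockSizes, Header, Latch, EndOffset) <= MaxBytes;
}
```

For example, three 2000-byte blocks are each within the limit, but a loop spanning all three is not.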

samparker marked an inline comment as done. Jun 25 2019, 2:43 AM

samparker added inline comments.

lib/Target/ARM/ARMLowOverheadLoops.cpp
76	sounds good.

Closed by commit rL364288: [ARM] DLS/LE low-overhead loop code generation (authored by sam_parker). Jun 25 2019, 3:46 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

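Path                                                          Size
lib/Target/ARM/ARM.h                                          3 lines
lib/Target/ARM/ARMISelDAGToDAG.cpp                            30 lines
lib/Target/ARM/ARMInstrThumb2.td                              16 lines
lib/Target/ARM/ARMLowOverheadLoops.cpp                        280 lines
lib/Target/ARM/ARMTargetMachine.cpp                           5 lines
lib/Target/ARM/CMakeLists.txt                                 1 line
test/CodeGen/ARM/O3-pipeline.ll                               7 lines
test/Transforms/HardwareLoops/ARM/calls.ll                    22 lines
test/Transforms/HardwareLoops/ARM/cond-mov.mir                115 lines
test/Transforms/HardwareLoops/ARM/massive.mir                 154 lines
test/Transforms/HardwareLoops/ARM/revert-after-call.mir       141 lines
test/Transforms/HardwareLoops/ARM/revert-after-spill.mir      139 lines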

37 lines

155 lines

177 lines

198 lines

Diff 206171

lib/Target/ARM/ARM.h

[... 29 lines not shown ...]
  class FunctionPass;
  class InstructionSelector;
  class MachineBasicBlock;
  class MachineFunction;
  class MachineInstr;
  class MCInst;
  class PassRegistry;

+ FunctionPass *createARMLowOverheadLoopsPass();
  Pass *createARMParallelDSPPass();
  FunctionPass *createARMISelDag(ARMBaseTargetMachine &TM,
      CodeGenOpt::Level OptLevel);
  FunctionPass *createA15SDOptimizerPass();
  FunctionPass *createARMLoadStoreOptimizationPass(bool PreAlloc = false);
  FunctionPass *createARMExpandPseudoPass();
  FunctionPass *createARMCodeGenPreparePass();
  FunctionPass *createARMConstantIslandPass();
[... 14 lines not shown ...]
  void initializeARMLoadStoreOptPass(PassRegistry &);
  void initializeARMPreAllocLoadStoreOptPass(PassRegistry &);
  void initializeARMCodeGenPreparePass(PassRegistry &);
  void initializeARMConstantIslandsPass(PassRegistry &);
  void initializeARMExpandPseudoPass(PassRegistry &);
  void initializeThumb2SizeReducePass(PassRegistry &);
  void initializeThumb2ITBlockPass(PassRegistry &);
  void initializeMVEVPTBlockPass(PassRegistry &);
+ void initializeARMLowOverheadLoopsPass(PassRegistry &);

  } // end namespace llvm

  #endif // LLVM_LIB_TARGET_ARM_ARM_H

lib/Target/ARM/ARMISelDAGToDAG.cpp

[... 992 lines not shown ...]
  SDValue InFlag = N->getOperand(4);
  assert(N1.getOpcode() == ISD::BasicBlock);
  assert(N2.getOpcode() == ISD::Constant);
  assert(N3.getOpcode() == ISD::Register);

  unsigned CC = (unsigned) cast<ConstantSDNode>(N2)->getZExtValue();

  if (InFlag.getOpcode() == ARMISD::CMPZ) {
+   if (InFlag.getOperand(0).getOpcode() == ISD::INTRINSIC_W_CHAIN) {
+     SDValue Int = InFlag.getOperand(0);
+     uint64_t ID = cast<ConstantSDNode>(Int->getOperand(1))->getZExtValue();
+
+     // Handle low-overhead loops.
+     if (ID == Intrinsic::loop_decrement_reg) {
+       SDValue Elements = Int.getOperand(2);
        [Inline comment] SjoerdMeijer: nit: can this be simplified using `getIntrinsicID()`?
+       SDValue Size = CurDAG->getTargetConstant(
+           cast<ConstantSDNode>(Int.getOperand(3))->getZExtValue(), dl,
+           MVT::i32);
+
+       SDValue Args[] = { Elements, Size, Int.getOperand(0) };
+       SDNode *LoopDec =
+           CurDAG->getMachineNode(ARM::t2LoopDec, dl,
+           CurDAG->getVTList(MVT::i32, MVT::Other),
+           Args);
        [Inline comment] SjoerdMeijer: another nit: indentation a bit off?
+       ReplaceUses(Int.getNode(), LoopDec);
+
+       SDValue EndArgs[] = { SDValue(LoopDec, 0), N1, Chain };
+       SDNode *LoopEnd =
+           CurDAG->getMachineNode(ARM::t2LoopEnd, dl, MVT::Other, EndArgs);
+
+       ReplaceUses(N, LoopEnd);
+       CurDAG->RemoveDeadNode(N);
+       CurDAG->RemoveDeadNode(InFlag.getNode());
+       CurDAG->RemoveDeadNode(Int.getNode());
+       return;
+     }
+   }
+
    bool SwitchEQNEToPLMI;
    SelectCMPZ(InFlag.getNode(), SwitchEQNEToPLMI);
    InFlag = N->getOperand(4);

    if (SwitchEQNEToPLMI) {
      switch ((ARMCC::CondCodes)CC) {
      default: llvm_unreachable("CMPZ must be either NE or EQ!");
      case ARMCC::NE:
[... 992 lines not shown ...]

lib/Target/ARM/ARMInstrThumb2.td

[... 992 lines not shown ...]
    let Inst{15-14} = 0b11;
    let Inst{0} = 0b1;
    let isBranch = 1;
    let isTerminator = 1;
    let DecoderMethod = "DecodeLOLoop";
    let Predicates = [IsThumb2, HasV8_1MMainline, HasLOB];
  }

+ let isNotDuplicable = 1 in {
  def t2WLS : t2LOL<(outs GPRlr:$LR),
      (ins rGPR:$Rn, wlslabel_u11:$label),
      "wls", "$LR, $Rn, $label"> {
    bits<4> Rn;
    bits<11> label;
    let Inst{22-20} = 0b100;
    let Inst{19-16} = Rn{3-0};
    let Inst{13-12} = 0b00;
[... 27 lines not shown ...]
  def t2LE : t2LOL<(outs ), (ins lelabel_u11:$label), "le", "$label"> {
    bits<11> label;
    let Inst{22-16} = 0b0101111;
    let Inst{13-12} = 0b00;
    let Inst{11} = label{0};
    let Inst{10-1} = label{10-1};
  }

+ def t2DoLoopStart :
+     t2PseudoInst<(outs), (ins rGPR:$elts), 4, IIC_Br,
+     [(int_set_loop_iterations rGPR:$elts)]>, Sched<[WriteBr]>;
+
+ def t2LoopDec :
+     t2PseudoInst<(outs GPRlr:$Rm), (ins GPRlr:$Rn, imm0_7:$size),
+     4, IIC_Br, []>, Sched<[WriteBr]>;
+
+ let isBranch = 1, isTerminator = 1, hasSideEffects = 1 in
+ def t2LoopEnd :
+     t2PseudoInst<(outs), (ins GPRlr:$elts, brtarget:$target),
+     8, IIC_Br, []>, Sched<[WriteBr]>;
  [Inline comment] dmgreen: 12 bytes for something that should usually take 4 seems overly pessimistic to me, and may pessimise other optimisations. Can we get this smaller somehow? I would expect the fallback to probably be (subs; bne) ideally.
  [Inline reply] samparker (author): 12 is what this implementation will produce though in the fallback case. When we use subs, we can reduce it.
+
  [Inline comment] SjoerdMeijer: nit, perhaps `} // isNotDuplicable = 1`
+ } // end isNotDuplicable

  class CS<string iname, bits<4> opcode, list<dag> pattern=[]>
      : V8_1MI<(outs rGPR:$Rd), (ins GPRwithZR:$Rn, GPRwithZR:$Rm, pred_noal:$fcond),
      AddrModeNone, NoItinerary, iname, "$Rd, $Rn, $Rm, $fcond", "", pattern> {
    bits<4> Rd;
    bits<4> Rm;
    bits<4> Rn;
    bits<4> fcond;

[... 33 lines not shown ...]

lib/Target/ARM/ARMLowOverheadLoops.cpp

This file was added.

				//===-- ARMFinalizeHardwareLoops.cpp - Low-overhead Loops ------*- C++ -*--===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				/// Finalize v8.1-m low-overhead loops by converting the associated pseudo
				/// instructions into machine operations.
				///
				//===----------------------------------------------------------------------===//

				#include "ARM.h"
				#include "ARMBaseInstrInfo.h"
				#include "ARMBaseRegisterInfo.h"
				#include "ARMBasicBlockInfo.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineLoopInfo.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"

				using namespace llvm;

				#define DEBUG_TYPE "arm-low-overhead-loops"
				#define ARM_LOW_OVERHEAD_LOOPS_NAME "ARM Low Overhead Loops pass"

				namespace {

				class ARMLowOverheadLoops : public MachineFunctionPass {
				const ARMBaseInstrInfo *TII = nullptr;
				MachineRegisterInfo *MRI = nullptr;
				std::unique_ptr<ARMBasicBlockUtils> BBUtils = nullptr;

				public:
				static char ID;

				ARMLowOverheadLoops() : MachineFunctionPass(ID) { }

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesCFG();
				AU.addRequired<MachineLoopInfo>();
				MachineFunctionPass::getAnalysisUsage(AU);
				}

				bool runOnMachineFunction(MachineFunction &MF) override;

				bool ProcessLoop(MachineLoop *ML);

				void Expand(MachineLoop *ML, MachineInstr *Start,
				MachineInstr *Dec, MachineInstr *End, bool Revert);

				MachineFunctionProperties getRequiredProperties() const override {
				return MachineFunctionProperties().set(
				MachineFunctionProperties::Property::NoVRegs);
				}

				StringRef getPassName() const override {
				return ARM_LOW_OVERHEAD_LOOPS_NAME;
				}
				};
				}

				char ARMLowOverheadLoops::ID = 0;

				INITIALIZE_PASS(ARMLowOverheadLoops, DEBUG_TYPE, ARM_LOW_OVERHEAD_LOOPS_NAME,
				false, false)

				bool ARMLowOverheadLoops::runOnMachineFunction(MachineFunction &MF) {
				LLVM_DEBUG(dbgs() << "ARM Loops on " << MF.getName() << " ------------- \n");

				auto &MLI = getAnalysis<MachineLoopInfo>();
				MRI = &MF.getRegInfo();
				TII = static_cast<const ARMBaseInstrInfo*>(
				MF.getSubtarget().getInstrInfo());
				BBUtils = std::unique_ptr<ARMBasicBlockUtils>(new ARMBasicBlockUtils(MF));
				BBUtils->computeAllBlockSizes();
				[Inline comment] dmgreen: How expensive is this? We might as well not do it for cores that won't have low overhead loops.
				[Inline reply] samparker (author): Good point! I'll add a check and an exit.
				[Inline comment] dmgreen: Also, I'm not sure it's doing everything it should, and might not be calculating offsets. Can you add a test with multiple loop bbs that together go over the limit?
				[Inline reply] samparker (author): sounds good.

				bool Changed = false;
				for (auto ML : MLI) {
				if (!ML->getParentLoop())
				Changed |= ProcessLoop(ML);
				}
				return Changed;
				}

				bool ARMLowOverheadLoops::ProcessLoop(MachineLoop *ML) {

				bool Changed = false;

				// Process inner loops first.
				for (auto I = ML->begin(), E = ML->end(); I != E; ++I)
				Changed |= ProcessLoop(*I);

				LLVM_DEBUG(dbgs() << "ARM Loops: Processing " << *ML);

				auto IsLoopStart = [](MachineInstr &MI) {
				[Inline comment] dmgreen: These don't really seem to add much to me :)
				[Inline reply] samparker (author): All in good time... while and the vector ones will follow. But yes, LoopDec can go.
				return MI.getOpcode() == ARM::t2DoLoopStart;
				};

				auto IsLoopDec = [](MachineInstr &MI) {
				return MI.getOpcode() == ARM::t2LoopDec;
				};

				auto IsLoopEnd = [](MachineInstr &MI) {
				return MI.getOpcode() == ARM::t2LoopEnd;
				};

				auto SearchForStart = [&IsLoopStart](MachineBasicBlock *MBB) -> MachineInstr* {
				for (auto &MI : *MBB) {
				if (IsLoopStart(MI))
				return &MI;
				}
				return nullptr;
				};

				MachineInstr *Start = nullptr;
				MachineInstr *Dec = nullptr;
				MachineInstr *End = nullptr;
				bool Revert = false;

				if (auto *Preheader = ML->getLoopPreheader())
				Start = SearchForStart(Preheader);

				for (auto *MBB : reverse(ML->getBlocks())) {
				for (auto &MI : *MBB) {
				if (IsLoopDec(MI))
				Dec = &MI;
				else if (IsLoopEnd(MI))
				End = &MI;

				if (!Dec)
				continue;

				// If we find that we load/store LR between LoopDec and LoopEnd, revert
				// back to a 'normal' loop.
				[Inline comment] dmgreen: Can you explain this a little? Is it for correctness or performance? Because we can't merge the two instructions?
				[Inline reply] samparker (author): Sure, I'll add a comment. To summarise though, its because the decremented value of LR has probably been stored to the stack - but this decrement isn't really going to happen until the end of the block, where we can't spill. If we want the value of LR to be on the stack, we'd have to perform a manual sub. Added to this that we're probably reloading LR at the bottom of the loop for LE, we'd have to either perform an add before LE or use the other form of LE that doesn't perform the decrement.
				if (MI.mayLoad() || MI.mayStore())
				Revert =
				MI.getOperand(0).isReg() && MI.getOperand(0).getReg() == ARM::LR;
				if (MI.getDesc().isCall())
				Revert = true;
				}

				if (Dec && End && Revert)
				break;
				}

				if (Start || Dec || End)
				assert((Start && Dec && End) && "Failed to find all loop components");
				[Inline comment] dmgreen: I think this should give a better error message in release builds. report_fatal_error or similar. Otherwise it may just miscompile. It should never happen, as far as I understand?
				[Inline reply] samparker (author): cheers!
				else {
				LLVM_DEBUG(dbgs() << "ARM Loops: Not a low-overhead loop.\n");
				return Changed;
				}

				assert((End->getOperand(1).isMBB() &&
				End->getOperand(1).getMBB() == ML->getHeader()) &&
				"Expected LoopEnd to target Loop Header");

				// The LE instructions has 12-bits for the label offset.
				if (!BBUtils->isBBInRange(End, ML->getHeader(), 4096)) {
				LLVM_DEBUG(dbgs() << "ARM Loops: Too large for a low-overhead loop!\n");
				Revert = true;
				}

				LLVM_DEBUG(dbgs() << "ARM Loops:\n - Found Loop Start: " << *Start
				<< " - Found Loop Dec: " << *Dec
				[Inline comment] dmgreen: Formatting
				<< " - Found Loop End: " << *End);

				Expand(ML, Start, Dec, End, Revert);
				return true;
				}

				void ARMLowOverheadLoops::Expand(MachineLoop *ML, MachineInstr *Start,
				MachineInstr *Dec, MachineInstr *End,
				[Inline comment] dmgreen: Format
				bool Revert) {

				auto ExpandLoopStart = [this](MachineLoop *ML, MachineInstr *Start) {
				// The trip count should already been held in LR since the instructions
				// within the loop can only read and write to LR. So, there should be a
				// mov to setup the count. WLS/DLS perform this move, so find the original
				// and delete it - inserting WLS/DLS in its place.
				MachineBasicBlock *MBB = Start->getParent();
				MachineInstr *InsertPt = nullptr;
				for (auto &I : MRI->def_instructions(ARM::LR)) {
				if (I.getParent() != MBB)
				continue;

				// Always execute.
				if (!I.getOperand(2).isImm() || I.getOperand(2).getImm() != ARMCC::AL)
				continue;

				// Only handle move reg, if the trip count it will need moving into a reg
				// before the setup instruction anyway.
				if (!I.getDesc().isMoveReg() ||
				!I.getOperand(1).isIdenticalTo(Start->getOperand(0)))
				continue;
				InsertPt = &I;
				break;
				}

				MachineInstrBuilder MIB = InsertPt ?
				BuildMI(*MBB, InsertPt, Start->getDebugLoc(), TII->get(ARM::t2DLS)) :
				[Inline comment] dmgreen: This is just the same call, with one parameter different?
				[Inline reply] samparker (author): Yes, I know it looks awkward... I'll try to see if I can make it nicer.
				BuildMI(*MBB, Start, Start->getDebugLoc(), TII->get(ARM::t2DLS));
				if (InsertPt)
				InsertPt->eraseFromParent();

				MIB.addDef(ARM::LR);
				MIB.add(Start->getOperand(0));
				LLVM_DEBUG(dbgs() << "ARM Loops: Inserted DLS: " << *MIB);
				Start->eraseFromParent();
				};

				// Combine the LoopDec and LoopEnd instructions into LE(TP).
				auto ExpandLoopEnd = [this](MachineLoop *ML, MachineInstr *Dec,
				MachineInstr *End) {
				MachineBasicBlock *MBB = End->getParent();
				MachineInstrBuilder MIB = BuildMI(*MBB, End, End->getDebugLoc(),
				TII->get(ARM::t2LEUpdate));
				MIB.addDef(ARM::LR);
				MIB.add(End->getOperand(0));
				MIB.add(End->getOperand(1));
				LLVM_DEBUG(dbgs() << "ARM Loops: Inserted LE: " << *MIB);

				// If there is a branch after loop end, which branches to the fallthrough
				// block, remove the branch.
				MachineBasicBlock *Latch = End->getParent();
				MachineInstr *Terminator = &Latch->instr_back();
				if (End != Terminator) {
				MachineBasicBlock *Exit = ML->getExitBlock();
				if (Latch->isLayoutSuccessor(Exit)) {
				LLVM_DEBUG(dbgs() << "ARM Loops: Removing loop exit branch: "
				<< *Terminator);
				Terminator->eraseFromParent();
				}
				}
				End->eraseFromParent();
				Dec->eraseFromParent();
				};

				// Generate a subs, or sub and cmp, and a branch instead of an LE.
				// TODO: Check flags so that we can possibly generate a subs.
				auto ExpandBranch = [this](MachineInstr *Dec, MachineInstr *End) {
				LLVM_DEBUG(dbgs() << "ARM Loops: Reverting to sub, cmp, br.\n");
				// Create sub
				MachineBasicBlock *MBB = Dec->getParent();
				MachineInstrBuilder MIB = BuildMI(*MBB, Dec, Dec->getDebugLoc(),
				TII->get(ARM::t2SUBri));
				MIB.addDef(ARM::LR);
				MIB.add(Dec->getOperand(1));
				MIB.add(Dec->getOperand(2));
				MIB.addImm(ARMCC::AL);
				MIB.addReg(0);
				MIB.addReg(0);

				// Create cmp
				MBB = End->getParent();
				MIB = BuildMI(*MBB, End, End->getDebugLoc(), TII->get(ARM::t2CMPri));
				MIB.addReg(ARM::LR);
				MIB.addImm(0);
				MIB.addImm(ARMCC::AL);

				// Create bne
				MIB = BuildMI(*MBB, End, End->getDebugLoc(), TII->get(ARM::t2Bcc));
				MIB.add(End->getOperand(1)); // branch target
				MIB.addImm(ARMCC::NE); // condition code
				End->eraseFromParent();
				Dec->eraseFromParent();
				};

				if (Revert) {
				Start->eraseFromParent();
				ExpandBranch(Dec, End);
				} else {
				ExpandLoopStart(ML, Start);
				ExpandLoopEnd(ML, Dec, End);
				}
				}

				FunctionPass *llvm::createARMLowOverheadLoopsPass() {
				return new ARMLowOverheadLoops();
				}

lib/Target/ARM/ARMTargetMachine.cpp

Show First 20 Lines • Show All 90 Lines • ▼ Show 20 Lines	extern "C" void LLVMInitializeARMTarget() {
initializeARMPreAllocLoadStoreOptPass(Registry);		initializeARMPreAllocLoadStoreOptPass(Registry);
initializeARMParallelDSPPass(Registry);		initializeARMParallelDSPPass(Registry);
initializeARMCodeGenPreparePass(Registry);		initializeARMCodeGenPreparePass(Registry);
initializeARMConstantIslandsPass(Registry);		initializeARMConstantIslandsPass(Registry);
initializeARMExecutionDomainFixPass(Registry);		initializeARMExecutionDomainFixPass(Registry);
initializeARMExpandPseudoPass(Registry);		initializeARMExpandPseudoPass(Registry);
initializeThumb2SizeReducePass(Registry);		initializeThumb2SizeReducePass(Registry);
initializeMVEVPTBlockPass(Registry);		initializeMVEVPTBlockPass(Registry);
		initializeARMLowOverheadLoopsPass(Registry);
}		}

static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {		static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {
if (TT.isOSBinFormatMachO())		if (TT.isOSBinFormatMachO())
return llvm::make_unique<TargetLoweringObjectFileMachO>();		return llvm::make_unique<TargetLoweringObjectFileMachO>();
if (TT.isOSWindows())		if (TT.isOSWindows())
return llvm::make_unique<TargetLoweringObjectFileCOFF>();		return llvm::make_unique<TargetLoweringObjectFileCOFF>();
return llvm::make_unique<ARMElfTargetObjectFile>();		return llvm::make_unique<ARMElfTargetObjectFile>();
▲ Show 20 Lines • Show All 334 Lines • ▼ Show 20 Lines	if ((TM->getOptLevel() != CodeGenOpt::None &&
// expect it to be generally either beneficial or harmless. On Mach-O it		// expect it to be generally either beneficial or harmless. On Mach-O it
// is disabled as we emit the .subsections_via_symbols directive which		// is disabled as we emit the .subsections_via_symbols directive which
// means that merging extern globals is not safe.		// means that merging extern globals is not safe.
bool MergeExternalByDefault = !TM->getTargetTriple().isOSBinFormatMachO();		bool MergeExternalByDefault = !TM->getTargetTriple().isOSBinFormatMachO();
addPass(createGlobalMergePass(TM, 127, OnlyOptimizeForSize,		addPass(createGlobalMergePass(TM, 127, OnlyOptimizeForSize,
MergeExternalByDefault));		MergeExternalByDefault));
}		}

		if (TM->getOptLevel() != CodeGenOpt::None)
		addPass(createHardwareLoopsPass());

return false;		return false;
}		}

bool ARMPassConfig::addInstSelector() {		bool ARMPassConfig::addInstSelector() {
addPass(createARMISelDag(getARMTargetMachine(), getOptLevel()));		addPass(createARMISelDag(getARMTargetMachine(), getOptLevel()));
return false;		return false;
}		}

[64 lines not shown; context: addPass(createUnpackMachineBundles([](const MachineFunction &MF) {]
return MF.getSubtarget<ARMSubtarget>().isThumb2();		return MF.getSubtarget<ARMSubtarget>().isThumb2();
}));		}));

// Don't optimize barriers at -O0.		// Don't optimize barriers at -O0.
if (getOptLevel() != CodeGenOpt::None)		if (getOptLevel() != CodeGenOpt::None)
addPass(createARMOptimizeBarriersPass());		addPass(createARMOptimizeBarriersPass());

addPass(createARMConstantIslandPass());		addPass(createARMConstantIslandPass());
		addPass(createARMLowOverheadLoopsPass());
}		}

lib/Target/ARM/CMakeLists.txt

[33 lines not shown; context: add_llvm_target(ARMCodeGen]
ARMHazardRecognizer.cpp		ARMHazardRecognizer.cpp
ARMInstructionSelector.cpp		ARMInstructionSelector.cpp
ARMISelDAGToDAG.cpp		ARMISelDAGToDAG.cpp
ARMISelLowering.cpp		ARMISelLowering.cpp
ARMInstrInfo.cpp		ARMInstrInfo.cpp
ARMLegalizerInfo.cpp		ARMLegalizerInfo.cpp
ARMParallelDSP.cpp		ARMParallelDSP.cpp
ARMLoadStoreOptimizer.cpp		ARMLoadStoreOptimizer.cpp
		ARMLowOverheadLoops.cpp
ARMMCInstLower.cpp		ARMMCInstLower.cpp
ARMMachineFunctionInfo.cpp		ARMMachineFunctionInfo.cpp
ARMMacroFusion.cpp		ARMMacroFusion.cpp
ARMRegisterInfo.cpp		ARMRegisterInfo.cpp
ARMOptimizeBarriersPass.cpp		ARMOptimizeBarriersPass.cpp
ARMRegisterBankInfo.cpp		ARMRegisterBankInfo.cpp
ARMSelectionDAGInfo.cpp		ARMSelectionDAGInfo.cpp
ARMSubtarget.cpp		ARMSubtarget.cpp
[17 lines not shown]

test/CodeGen/ARM/O3-pipeline.ll

	[43 lines not shown]
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: CodeGen Prepare			; CHECK-NEXT: CodeGen Prepare
	; CHECK-NEXT: Rewrite Symbols			; CHECK-NEXT: Rewrite Symbols
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Exception handling preparation			; CHECK-NEXT: Exception handling preparation
	; CHECK-NEXT: Merge internal globals			; CHECK-NEXT: Merge internal globals
				; CHECK-NEXT: Dominator Tree Construction
				; CHECK-NEXT: Natural Loop Information
				; CHECK-NEXT: Scalar Evolution Analysis
				; CHECK-NEXT: Hardware Loop Insertion
	; CHECK-NEXT: Safe Stack instrumentation pass			; CHECK-NEXT: Safe Stack instrumentation pass
	; CHECK-NEXT: Insert stack protectors			; CHECK-NEXT: Insert stack protectors
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Module Verifier
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Branch Probability Analysis			; CHECK-NEXT: Branch Probability Analysis
	[73 lines not shown]
	; CHECK-NEXT: Analyze Machine Code For Garbage Collection			; CHECK-NEXT: Analyze Machine Code For Garbage Collection
	; CHECK-NEXT: Machine Block Frequency Analysis			; CHECK-NEXT: Machine Block Frequency Analysis
	; CHECK-NEXT: MachinePostDominator Tree Construction			; CHECK-NEXT: MachinePostDominator Tree Construction
	; CHECK-NEXT: Branch Probability Basic Block Placement			; CHECK-NEXT: Branch Probability Basic Block Placement
	; CHECK-NEXT: Thumb2 instruction size reduce pass			; CHECK-NEXT: Thumb2 instruction size reduce pass
	; CHECK-NEXT: Unpack machine instruction bundles			; CHECK-NEXT: Unpack machine instruction bundles
	; CHECK-NEXT: optimise barriers pass			; CHECK-NEXT: optimise barriers pass
	; CHECK-NEXT: ARM constant island placement and branch shortening pass			; CHECK-NEXT: ARM constant island placement and branch shortening pass
				; CHECK-NEXT: MachineDominator Tree Construction
				; CHECK-NEXT: Machine Natural Loop Construction
				; CHECK-NEXT: ARM Low Overhead Loops pass
	; CHECK-NEXT: Contiguously Lay Out Funclets			; CHECK-NEXT: Contiguously Lay Out Funclets
	; CHECK-NEXT: StackMap Liveness Analysis			; CHECK-NEXT: StackMap Liveness Analysis
	; CHECK-NEXT: Live DEBUG_VALUE analysis			; CHECK-NEXT: Live DEBUG_VALUE analysis
	; CHECK-NEXT: Insert fentry calls			; CHECK-NEXT: Insert fentry calls
	; CHECK-NEXT: Insert XRay ops			; CHECK-NEXT: Insert XRay ops
	; CHECK-NEXT: Implement the 'patchable-function' attribute			; CHECK-NEXT: Implement the 'patchable-function' attribute
	; CHECK-NEXT: Lazy Machine Block Frequency Analysis			; CHECK-NEXT: Lazy Machine Block Frequency Analysis
	; CHECK-NEXT: Machine Optimization Remark Emitter			; CHECK-NEXT: Machine Optimization Remark Emitter
	; CHECK-NEXT: ARM Assembly Printer			; CHECK-NEXT: ARM Assembly Printer
	; CHECK-NEXT: Free MachineFunction			; CHECK-NEXT: Free MachineFunction

test/Transforms/HardwareLoops/ARM/calls.ll

; RUN: opt -mtriple=thumbv8.1m.main-arm-none-eabi -hardware-loops -disable-arm-loloops=false %s -S -o - | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-MAIN		; RUN: opt -mtriple=thumbv8.1m.main-arm-none-eabi -hardware-loops -disable-arm-loloops=false %s -S -o - | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-MAIN
; RUN: opt -mtriple=thumbv8.1m.main-arm-none-eabi -mattr=+fullfp16 -hardware-loops -disable-arm-loloops=false %s -S -o - | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-FP		; RUN: opt -mtriple=thumbv8.1m.main-arm-none-eabi -mattr=+fullfp16 -hardware-loops -disable-arm-loloops=false %s -S -o - | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-FP
; RUN: opt -mtriple=thumbv8.1m.main-arm-none-eabi -mattr=+fp-armv8,+fullfp16 -hardware-loops -disable-arm-loloops=false %s -S -o - | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-FP64		; RUN: opt -mtriple=thumbv8.1m.main-arm-none-eabi -mattr=+fp-armv8,+fullfp16 -hardware-loops -disable-arm-loloops=false %s -S -o - | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-FP64
; RUN: opt -mtriple=thumbv8.1m.main-arm-none-eabi -mattr=+mve -hardware-loops -disable-arm-loloops=false %s -S -o - | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-MVE		; RUN: opt -mtriple=thumbv8.1m.main-arm-none-eabi -mattr=+mve -hardware-loops -disable-arm-loloops=false %s -S -o - | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-MVE
; RUN: opt -mtriple=thumbv8.1m.main-arm-none-eabi -mattr=+mve.fp -hardware-loops -disable-arm-loloops=false %s -S -o - | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-MVEFP		; RUN: opt -mtriple=thumbv8.1m.main-arm-none-eabi -mattr=+mve.fp -hardware-loops -disable-arm-loloops=false %s -S -o - | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-MVEFP
		; RUN: llc -mtriple=thumbv8.1m.main-arm-none-eabi -mattr=+mve.fp -disable-arm-loloops=false %s -o - | FileCheck %s --check-prefix=CHECK-LLC

; CHECK-LABEL: skip_call		; CHECK-LABEL: skip_call
; CHECK-NOT: call void @llvm.set.loop.iterations		; CHECK-NOT: call void @llvm.set.loop.iterations
; CHECK-NOT: call i32 @llvm.loop.decrement		; CHECK-NOT: call i32 @llvm.loop.decrement

define i32 @skip_call(i32 %n) {		define i32 @skip_call(i32 %n) {
entry:		entry:
%cmp6 = icmp eq i32 %n, 0		%cmp6 = icmp eq i32 %n, 0
[21 lines not shown]

; CHECK-LABEL: test_target_specific		; CHECK-LABEL: test_target_specific
; CHECK: call void @llvm.set.loop.iterations.i32(i32 50)		; CHECK: call void @llvm.set.loop.iterations.i32(i32 50)
; CHECK: [[COUNT:%[^ ]+]] = phi i32 [ 50, %entry ], [ [[LOOP_DEC:%[^ ]+]], %loop ]		; CHECK: [[COUNT:%[^ ]+]] = phi i32 [ 50, %entry ], [ [[LOOP_DEC:%[^ ]+]], %loop ]
; CHECK: [[LOOP_DEC]] = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 [[COUNT]], i32 1)		; CHECK: [[LOOP_DEC]] = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 [[COUNT]], i32 1)
; CHECK: [[CMP:%[^ ]+]] = icmp ne i32 [[LOOP_DEC]], 0		; CHECK: [[CMP:%[^ ]+]] = icmp ne i32 [[LOOP_DEC]], 0
; CHECK: br i1 [[CMP]], label %loop, label %exit		; CHECK: br i1 [[CMP]], label %loop, label %exit

		; CHECK-LLC-LABEL: test_target_specific:
		; CHECK-LLC: mov.w lr, #50
		; CHECK-LLC: dls lr, lr
		; CHECK-LLC-NOT: mov lr,
		; CHECK-LLC: [[LOOP_HEADER:\.LBB[0-9_]+]]:
		; CHECK-LLC: le lr, [[LOOP_HEADER]]
		; CHECK-LLC-NOT: b .
		; CHECK-LLC: @ %exit

define i32 @test_target_specific(i32* %a, i32* %b) {		define i32 @test_target_specific(i32* %a, i32* %b) {
entry:		entry:
br label %loop		br label %loop
loop:		loop:
%acc = phi i32 [ 0, %entry ], [ %res, %loop ]		%acc = phi i32 [ 0, %entry ], [ %res, %loop ]
%count = phi i32 [ 0, %entry ], [ %count.next, %loop ]		%count = phi i32 [ 0, %entry ], [ %count.next, %loop ]
%addr.a = getelementptr i32, i32* %a, i32 %count		%addr.a = getelementptr i32, i32* %a, i32 %count
%addr.b = getelementptr i32, i32* %b, i32 %count		%addr.b = getelementptr i32, i32* %b, i32 %count
[29 lines not shown; context: exit:]
ret void		ret void
}		}

; CHECK-LABEL: test_fabs		; CHECK-LABEL: test_fabs
; CHECK-MAIN-NOT: call void @llvm.set.loop.iterations		; CHECK-MAIN-NOT: call void @llvm.set.loop.iterations
; CHECK-MVE-NOT: call void @llvm.set.loop.iterations		; CHECK-MVE-NOT: call void @llvm.set.loop.iterations
; CHECK-FP: call void @llvm.set.loop.iterations.i32(i32 100)		; CHECK-FP: call void @llvm.set.loop.iterations.i32(i32 100)
; CHECK-MVEFP: call void @llvm.set.loop.iterations.i32(i32 100)		; CHECK-MVEFP: call void @llvm.set.loop.iterations.i32(i32 100)

		; CHECK-LLC-LABEL: test_fabs:
		; CHECK-LLC: mov.w lr, #100
		; CHECK-LLC: dls lr, lr
		; CHECK-LLC-NOT: mov lr,
		; CHECK-LLC: [[LOOP_HEADER:\.LBB[0-9_]+]]:
		; CHECK-LLC-NOT: bl
		; CHECK-LLC: le lr, [[LOOP_HEADER]]
		; CHECK-LLC-NOT: b .
		; CHECK-LLC: @ %exit

define float @test_fabs(float* %a) {		define float @test_fabs(float* %a) {
entry:		entry:
br label %loop		br label %loop
loop:		loop:
%acc = phi float [ 0.0, %entry ], [ %res, %loop ]		%acc = phi float [ 0.0, %entry ], [ %res, %loop ]
%count = phi i32 [ 0, %entry ], [ %count.next, %loop ]		%count = phi i32 [ 0, %entry ], [ %count.next, %loop ]
%addr.a = getelementptr float, float* %a, i32 %count		%addr.a = getelementptr float, float* %a, i32 %count
%load.a = load float, float* %addr.a		%load.a = load float, float* %addr.a
[308 lines not shown]

test/Transforms/HardwareLoops/ARM/cond-mov.mir

This file was added.

				# RUN: llc -mtriple=thumbv8.1m.main -run-pass=arm-low-overhead-loops %s -o - | FileCheck %s
				# CHECK: $lr = tMOVr $r0, 13, $noreg
				# CHECK: $lr = t2DLS killed $r0
				# CHECK: $lr = t2LEUpdate renamable $lr, %bb.1

				--- |
				target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "thumbv8.1m.main"

				define i32 @do_copy(i32 %n, i32* nocapture %p, i32* nocapture readonly %q) {
				entry:
				%scevgep = getelementptr i32, i32* %q, i32 -1
				%scevgep3 = getelementptr i32, i32* %p, i32 -1
				call void @llvm.set.loop.iterations.i32(i32 %n)
				br label %while.body

				while.body:
				%lsr.iv4 = phi i32* [ %scevgep5, %while.body ], [ %scevgep3, %entry ]
				%lsr.iv = phi i32* [ %scevgep1, %while.body ], [ %scevgep, %entry ]
				%0 = phi i32 [ %n, %entry ], [ %2, %while.body ]
				%scevgep2 = getelementptr i32, i32* %lsr.iv, i32 1
				%scevgep6 = getelementptr i32, i32* %lsr.iv4, i32 1
				%1 = load i32, i32* %scevgep2, align 4
				store i32 %1, i32* %scevgep6, align 4
				%scevgep1 = getelementptr i32, i32* %lsr.iv, i32 1
				%scevgep5 = getelementptr i32, i32* %lsr.iv4, i32 1
				%2 = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 %0, i32 1)
				%3 = icmp ne i32 %2, 0
				br i1 %3, label %while.body, label %while.end

				while.end:
				ret i32 0
				}

				declare void @llvm.set.loop.iterations.i32(i32) #0
				declare i32 @llvm.loop.decrement.reg.i32.i32.i32(i32, i32) #0
				declare void @llvm.stackprotector(i8, i8*) #1

				attributes #0 = { noduplicate nounwind }
				attributes #1 = { nounwind }

				...
				---
				name: do_copy
				alignment: 1
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers: []
				liveins:
				- { reg: '$r0', virtual-reg: '' }
				- { reg: '$r1', virtual-reg: '' }
				- { reg: '$r2', virtual-reg: '' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 8
				offsetAdjustment: 0
				maxAlignment: 4
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack:
				- { id: 0, name: '', type: spill-slot, offset: -4, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$lr', callee-saved-restored: false,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 1, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r7', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				constants: []
				machineFunctionInfo: {}
				body: |
				bb.0.entry:
				successors: %bb.1(0x80000000)
				liveins: $r0, $r1, $r2, $r7, $lr

				$sp = frame-setup t2STMDB_UPD $sp, 14, $noreg, killed $r7, killed $lr
				frame-setup CFI_INSTRUCTION def_cfa_offset 8
				frame-setup CFI_INSTRUCTION offset $lr, -4
				frame-setup CFI_INSTRUCTION offset $r7, -8
				$lr = tMOVr $r0, 13, $noreg
				t2DoLoopStart killed $r0
				renamable $r0 = t2SUBri killed renamable $r1, 4, 14, $noreg, $noreg
				renamable $r1 = t2SUBri killed renamable $r2, 4, 14, $noreg, $noreg

				bb.1.while.body:
				successors: %bb.1(0x7c000000), %bb.2(0x04000000)
				liveins: $lr, $r0, $r1

				renamable $r2, renamable $r1 = t2LDR_PRE killed renamable $r1, 4, 14, $noreg :: (load 4 from %ir.scevgep2)
				early-clobber renamable $r0 = t2STR_PRE killed renamable $r2, killed renamable $r0, 4, 14, $noreg :: (store 4 into %ir.scevgep6)
				renamable $lr = t2LoopDec killed renamable $lr, 1
				t2LoopEnd renamable $lr, %bb.1
				t2B %bb.2, 14, $noreg

				bb.2.while.end:
				$r0 = t2MOVi 0, 14, $noreg, $noreg
				$sp = t2LDMIA_RET $sp, 14, $noreg, def $r7, def $pc, implicit killed $r0

				...

test/Transforms/HardwareLoops/ARM/massive.mir

This file was added.

				# RUN: llc -mtriple=armv8.1m.main -run-pass=arm-low-overhead-loops %s -o - | FileCheck %s
				# CHECK: for.body:
				[Inline comment] SjoerdMeijer: Yes, it is massive! :-) But I think we can simplify this a lot by using the intrinsic `@llvm.arm.space`:
				    // A space-consuming intrinsic primarily for testing ARMConstantIslands. The
				    // first argument is the number of bytes this "instruction" takes up, the second
				    // and return value are essentially chains, used to force ordering during ISel.
				    def int_arm_space : Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty], []>;
				As mentioned, the constant island tests use this; I think you want to do something similar here.
				[Inline comment] samparker (author): Bingo! Thanks
				# CHECK-NOT: t2DLS
				# CHECK-NOT: t2LEUpdate

				--- |
				target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "thumbv8.1m.main-unknown-unknown"

				; Function Attrs: norecurse nounwind
				define dso_local arm_aapcscc void @massive(i32* nocapture %a, i32* nocapture readonly %b, i32* nocapture readonly %c, i32 %N) local_unnamed_addr #0 {
				entry:
				%cmp8 = icmp eq i32 %N, 0
				br i1 %cmp8, label %for.cond.cleanup, label %for.body.preheader

				for.body.preheader: ; preds = %entry
				%scevgep = getelementptr i32, i32* %a, i32 -1
				%scevgep4 = getelementptr i32, i32* %c, i32 -1
				%scevgep8 = getelementptr i32, i32* %b, i32 -1
				call void @llvm.set.loop.iterations.i32(i32 %N)
				br label %for.body

				for.cond.cleanup: ; preds = %for.body, %entry
				ret void

				for.body: ; preds = %for.body, %for.body.preheader
				%lsr.iv9 = phi i32* [ %scevgep8, %for.body.preheader ], [ %scevgep10, %for.body ]
				%lsr.iv5 = phi i32* [ %scevgep4, %for.body.preheader ], [ %scevgep6, %for.body ]
				%lsr.iv1 = phi i32* [ %scevgep, %for.body.preheader ], [ %scevgep2, %for.body ]
				%0 = phi i32 [ %N, %for.body.preheader ], [ %3, %for.body ]
				%size = call i32 @llvm.arm.space(i32 4096, i32 undef)
				%scevgep11 = getelementptr i32, i32* %lsr.iv9, i32 1
				%1 = load i32, i32* %scevgep11, align 4, !tbaa !3
				%scevgep7 = getelementptr i32, i32* %lsr.iv5, i32 1
				%2 = load i32, i32* %scevgep7, align 4, !tbaa !3
				%mul = mul nsw i32 %2, %1
				%scevgep3 = getelementptr i32, i32* %lsr.iv1, i32 1
				store i32 %mul, i32* %scevgep3, align 4, !tbaa !3
				%scevgep2 = getelementptr i32, i32* %lsr.iv1, i32 1
				%scevgep6 = getelementptr i32, i32* %lsr.iv5, i32 1
				%scevgep10 = getelementptr i32, i32* %lsr.iv9, i32 1
				%3 = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 %0, i32 1)
				%4 = icmp ne i32 %3, 0
				br i1 %4, label %for.body, label %for.cond.cleanup
				}

				; Function Attrs: nounwind
				[Inline comment] SjoerdMeijer: A proper nit: just a minor clean-up, but we don't need these comments on the functions.
				declare i32 @llvm.arm.space(i32, i32) #1

				; Function Attrs: noduplicate nounwind
				declare void @llvm.set.loop.iterations.i32(i32) #2

				; Function Attrs: noduplicate nounwind
				declare i32 @llvm.loop.decrement.reg.i32.i32.i32(i32, i32) #2

				; Function Attrs: nounwind
				declare void @llvm.stackprotector(i8, i8*) #1
				[Inline comment] SjoerdMeijer: And this stack protector stuff.

				attributes #0 = { norecurse nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+armv8.1-m.main,+hwdiv,+ras,+soft-float,+strict-align,+thumb-mode,-crypto,-d32,-dotprod,-fp-armv8,-fp-armv8d16,-fp-armv8d16sp,-fp-armv8sp,-fp16,-fp16fml,-fp64,-fpregs,-fullfp16,-neon,-vfp2,-vfp2d16,-vfp2d16sp,-vfp2sp,-vfp3,-vfp3d16,-vfp3d16sp,-vfp3sp,-vfp4,-vfp4d16,-vfp4d16sp,-vfp4sp" "unsafe-fp-math"="false" "use-soft-float"="true" }
				attributes #1 = { nounwind }
				attributes #2 = { noduplicate nounwind }

				!llvm.module.flags = !{!0, !1}
				[Inline comment] SjoerdMeijer: And these attributes.
				!llvm.ident = !{!2}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{i32 1, !"min_enum_size", i32 4}
				!2 = !{!"clang version 9.0.0 (http://llvm.org/git/clang.git a9c7c0fc5d468f3d18a5c6beb697ab0d5be2ff4c) (http://llvm.org/git/llvm.git f34bff0c141a04a5182d57e2cfb1e4bc582c81b0)"}
				!3 = !{!4, !4, i64 0}
				!4 = !{!"int", !5, i64 0}
				!5 = !{!"omnipotent char", !6, i64 0}
				!6 = !{!"Simple C/C++ TBAA"}

				...
				---
				name: massive
				alignment: 1
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: false
				hasWinCFI: false
				registers: []
				liveins:
				- { reg: '$r0', virtual-reg: '' }
				- { reg: '$r1', virtual-reg: '' }
				- { reg: '$r2', virtual-reg: '' }
				- { reg: '$r3', virtual-reg: '' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 8
				offsetAdjustment: 0
				maxAlignment: 4
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack:
				- { id: 0, name: '', type: spill-slot, offset: -4, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$lr', callee-saved-restored: false,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 1, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r7', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				constants: []
				machineFunctionInfo: {}
				body: |
				bb.0.entry:
				successors: %bb.1(0x80000000)

				frame-setup tPUSH 14, $noreg, $r7, killed $lr, implicit-def $sp, implicit $sp
				frame-setup CFI_INSTRUCTION def_cfa_offset 8
				frame-setup CFI_INSTRUCTION offset $lr, -4
				frame-setup CFI_INSTRUCTION offset $r7, -8
				$r7 = frame-setup tMOVr $sp, 14, $noreg
				frame-setup CFI_INSTRUCTION def_cfa_register $r7
				tCMPi8 $r3, 0, 14, $noreg, implicit-def $cpsr
				t2IT 0, 8, implicit-def $itstate
				tPOP_RET 0, killed $cpsr, def $r7, def $pc, implicit killed $itstate
				renamable $r1, dead $cpsr = tSUBi8 killed renamable $r1, 4, 14, $noreg
				renamable $r2, dead $cpsr = tSUBi8 killed renamable $r2, 4, 14, $noreg
				renamable $r0, dead $cpsr = tSUBi8 killed renamable $r0, 4, 14, $noreg
				$lr = tMOVr $r3, 14, $noreg
				t2DoLoopStart killed $r3

				bb.1.for.body:
				successors: %bb.1(0x7c000000), %bb.2(0x04000000)

				dead renamable $r3 = SPACE 4096, undef renamable $r0
				renamable $r12, renamable $r1 = t2LDR_PRE killed renamable $r1, 4, 14, $noreg :: (load 4 from %ir.scevgep11, !tbaa !3)
				renamable $r3, renamable $r2 = t2LDR_PRE killed renamable $r2, 4, 14, $noreg :: (load 4 from %ir.scevgep7, !tbaa !3)
				renamable $r3 = nsw t2MUL killed renamable $r3, killed renamable $r12, 14, $noreg
				early-clobber renamable $r0 = t2STR_PRE killed renamable $r3, killed renamable $r0, 4, 14, $noreg :: (store 4 into %ir.scevgep3, !tbaa !3)
				renamable $lr = t2LoopDec killed renamable $lr, 1
				t2LoopEnd renamable $lr, %bb.1
				tB %bb.2, 14, $noreg

				bb.2.for.cond.cleanup:
				tPOP_RET 14, $noreg, def $r7, def $pc

				...

test/Transforms/HardwareLoops/ARM/revert-after-call.mir

This file was added.

				# RUN: llc -mtriple=thumbv8.1m.main %s -o - | FileCheck %s

				# CHECK: .LBB0_2:
				# CHECK: sub.w lr, lr, #1
				# CHECK: mov [[TMP:r[0-9]+]], lr
				# CHECK: bl bar
				# CHECK: mov lr, [[TMP]]
				# CHECK: cmp.w lr, #0
				# CHECK: bne{{.*}} .LBB0_2

				--- |
				target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "thumbv8.1m.main-arm-none-eabi"

				define i32 @skip_call(i32 %n) #0 {
				entry:
				%cmp6 = icmp eq i32 %n, 0
				br i1 %cmp6, label %while.end, label %while.body.preheader

				while.body.preheader: ; preds = %entry
				call void @llvm.set.loop.iterations.i32(i32 %n)
				br label %while.body

				while.body: ; preds = %while.body, %while.body.preheader
				%res.07 = phi i32 [ %add, %while.body ], [ 0, %while.body.preheader ]
				%0 = phi i32 [ %n, %while.body.preheader ], [ %1, %while.body ]
				%call = tail call i32 bitcast (i32 (...)* @bar to i32 ()*)()
				%add = add nsw i32 %call, %res.07
				%1 = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 %0, i32 1)
				%2 = icmp ne i32 %1, 0
				br i1 %2, label %while.body, label %while.end

				while.end: ; preds = %while.body, %entry
				%res.0.lcssa = phi i32 [ 0, %entry ], [ %add, %while.body ]
				ret i32 %res.0.lcssa
				}

				declare i32 @bar(...) local_unnamed_addr #0
				declare void @llvm.set.loop.iterations.i32(i32) #1
				declare i32 @llvm.loop.decrement.reg.i32.i32.i32(i32, i32) #1
				declare void @llvm.stackprotector(i8, i8*) #2

				attributes #0 = { "target-features"="+mve.fp" }
				attributes #1 = { noduplicate nounwind }
				attributes #2 = { nounwind }

				...
				---
				name: skip_call
				alignment: 1
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers: []
				liveins:
				- { reg: '$r0', virtual-reg: '' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 16
				offsetAdjustment: 0
				maxAlignment: 4
				adjustsStack: true
				hasCalls: true
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack:
				- { id: 0, name: '', type: spill-slot, offset: -4, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$lr', callee-saved-restored: false,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 1, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r7', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 2, name: '', type: spill-slot, offset: -12, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r5', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 3, name: '', type: spill-slot, offset: -16, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r4', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				constants: []
				machineFunctionInfo: {}
				body: |
				bb.0.entry:
				successors: %bb.1(0x30000000), %bb.3(0x50000000)
				liveins: $r0, $r4, $r5, $r7, $lr

				$sp = frame-setup t2STMDB_UPD $sp, 14, $noreg, killed $r4, killed $r5, killed $r7, killed $lr
				frame-setup CFI_INSTRUCTION def_cfa_offset 16
				frame-setup CFI_INSTRUCTION offset $lr, -4
				frame-setup CFI_INSTRUCTION offset $r7, -8
				frame-setup CFI_INSTRUCTION offset $r5, -12
				frame-setup CFI_INSTRUCTION offset $r4, -16
				t2CMPri $r0, 0, 14, $noreg, implicit-def $cpsr
				t2Bcc %bb.1, 0, killed $cpsr

				bb.3.while.body.preheader:
				successors: %bb.4(0x80000000)
				liveins: $r0

				$lr = tMOVr $r0, 14, $noreg
				renamable $r4 = t2MOVi 0, 14, $noreg, $noreg
				t2DoLoopStart killed $r0

				bb.4.while.body:
				successors: %bb.4(0x7c000000), %bb.2(0x04000000)
				liveins: $lr, $r4

				renamable $lr = t2LoopDec killed renamable $lr, 1
				$r5 = tMOVr killed $lr, 14, $noreg
				tBL 14, $noreg, @bar, csr_aapcs, implicit-def dead $lr, implicit $sp, implicit-def $sp, implicit-def $r0
				$lr = tMOVr killed $r5, 14, $noreg
				renamable $r4 = nsw t2ADDrr killed renamable $r0, killed renamable $r4, 14, $noreg, $noreg
				t2LoopEnd renamable $lr, %bb.4
				t2B %bb.2, 14, $noreg

				bb.2.while.end:
				liveins: $r4

				$r0 = tMOVr killed $r4, 14, $noreg
				$sp = t2LDMIA_RET $sp, 14, $noreg, def $r4, def $r5, def $r7, def $pc, implicit killed $r0

				bb.1:
				renamable $r4 = t2MOVi 0, 14, $noreg, $noreg
				$r0 = tMOVr killed $r4, 14, $noreg
				$sp = t2LDMIA_RET $sp, 14, $noreg, def $r4, def $r5, def $r7, def $pc, implicit killed $r0

				...

test/Transforms/HardwareLoops/ARM/revert-after-spill.mir

This file was added.

				# RUN: llc -mtriple=thumbv8.1m.main %s -o - | FileCheck %s

				# CHECK: .LBB0_2:
				# CHECK: sub.w lr, lr, #1
				# CHECK: str.w lr, [sp, #12]
				# CHECK: ldr.w lr, [sp, #12]
				# CHECK: cmp.w lr, #0
				# CHECK: bne{{.*}} .LBB0_2

				--- |
				target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "thumbv8.1m.main-arm-none-eabi"

				define i32 @skip_spill(i32 %n) #0 {
				entry:
				%cmp6 = icmp eq i32 %n, 0
				br i1 %cmp6, label %while.end, label %while.body.preheader

				while.body.preheader: ; preds = %entry
				call void @llvm.set.loop.iterations.i32(i32 %n)
				br label %while.body

				while.body: ; preds = %while.body, %while.body.preheader
				%res.07 = phi i32 [ %add, %while.body ], [ 0, %while.body.preheader ]
				%0 = phi i32 [ %n, %while.body.preheader ], [ %1, %while.body ]
				%call = tail call i32 bitcast (i32 (...)* @bar to i32 ()*)()
				%add = add nsw i32 %call, %res.07
				%1 = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 %0, i32 1)
				%2 = icmp ne i32 %1, 0
				br i1 %2, label %while.body, label %while.end

				while.end: ; preds = %while.body, %entry
				%res.0.lcssa = phi i32 [ 0, %entry ], [ %add, %while.body ]
				ret i32 %res.0.lcssa
				}

				declare i32 @bar(...) local_unnamed_addr #0
				declare void @llvm.set.loop.iterations.i32(i32) #1
				declare i32 @llvm.loop.decrement.reg.i32.i32.i32(i32, i32) #1
				declare void @llvm.stackprotector(i8, i8*) #2

				attributes #0 = { "target-features"="+mve.fp" }
				attributes #1 = { noduplicate nounwind }
				attributes #2 = { nounwind }

				...
				---
				name: skip_spill
				alignment: 1
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers: []
				liveins:
				- { reg: '$r0', virtual-reg: '' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 16
				offsetAdjustment: 0
				maxAlignment: 4
				adjustsStack: true
				hasCalls: true
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack:
				- { id: 0, name: '', type: spill-slot, offset: -4, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$lr', callee-saved-restored: false,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 1, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r7', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 2, name: '', type: spill-slot, offset: -12, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r5', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 3, name: '', type: spill-slot, offset: -16, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r4', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				constants: []
				machineFunctionInfo: {}
				body: |
				bb.0.entry:
				successors: %bb.1(0x30000000), %bb.3(0x50000000)
				liveins: $r0, $r4, $r5, $r7, $lr

				$sp = frame-setup t2STMDB_UPD $sp, 14, $noreg, killed $r4, killed $r5, killed $r7, killed $lr
				frame-setup CFI_INSTRUCTION def_cfa_offset 16
				frame-setup CFI_INSTRUCTION offset $lr, -4
				frame-setup CFI_INSTRUCTION offset $r7, -8
				frame-setup CFI_INSTRUCTION offset $r5, -12
				frame-setup CFI_INSTRUCTION offset $r4, -16
				t2CMPri $r0, 0, 14, $noreg, implicit-def $cpsr
				t2Bcc %bb.1, 0, killed $cpsr

				bb.3.while.body.preheader:
				successors: %bb.4(0x80000000)
				liveins: $r0

				$lr = tMOVr $r0, 14, $noreg
				renamable $r4 = t2MOVi 0, 14, $noreg, $noreg
				t2DoLoopStart killed $r0

				bb.4.while.body:
				successors: %bb.4(0x7c000000), %bb.2(0x04000000)
				liveins: $lr, $r4

				renamable $lr = t2LoopDec killed renamable $lr, 1
				t2STRi12 $lr, %stack.0, 0, 14, $noreg :: (store 4)
				$lr = t2LDRi12 %stack.0, 0, 14, $noreg :: (load 4)
				renamable $r4 = nsw t2ADDrr renamable $lr, killed renamable $r4, 14, $noreg, $noreg
				t2LoopEnd renamable $lr, %bb.4
				t2B %bb.2, 14, $noreg

				bb.2.while.end:
				liveins: $r4

				$r0 = tMOVr killed $r4, 14, $noreg
				$sp = t2LDMIA_RET $sp, 14, $noreg, def $r4, def $r5, def $r7, def $pc, implicit killed $r0

				bb.1:
				renamable $r4 = t2MOVi 0, 14, $noreg, $noreg
				$r0 = tMOVr killed $r4, 14, $noreg
				$sp = t2LDMIA_RET $sp, 14, $noreg, def $r4, def $r5, def $r7, def $pc, implicit killed $r0

				...

test/Transforms/HardwareLoops/ARM/simple-do.ll

	; RUN: opt -mtriple=thumbv8.1m.main-arm-none-eabi -hardware-loops -disable-arm-loloops=false %s -S -o - | FileCheck %s
	; RUN: opt -mtriple=thumbv8.1m.main-arm-none-eabi -hardware-loops -disable-arm-loloops=true %s -S -o - | FileCheck %s --check-prefix=DISABLED
	; RUN: opt -mtriple=thumbv8.1m.main-arm-none-eabi -mattr=-lob -hardware-loops %s -S -o - | FileCheck %s --check-prefix=DISABLED
				; RUN: llc -mtriple=thumbv8.1m.main-arm-none-eabi -disable-arm-loloops=false %s -o - | FileCheck %s --check-prefix=CHECK-LLC

	; DISABLED-NOT: llvm.set.loop.iterations
	; DISABLED-NOT: llvm.loop.decrement

	@g = common local_unnamed_addr global i32* null, align 4

	; CHECK-LABEL: do_copy
	; CHECK: call void @llvm.set.loop.iterations.i32(i32 %n)
	; CHECK: br label %while.body

	; CHECK: [[REM:%[^ ]+]] = phi i32 [ %n, %entry ], [ [[LOOP_DEC:%[^ ]+]], %while.body ]
	; CHECK: [[LOOP_DEC]] = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 [[REM]], i32 1)
	; CHECK: [[CMP:%[^ ]+]] = icmp ne i32 [[LOOP_DEC]], 0
	; CHECK: br i1 [[CMP]], label %while.body, label %while.end

				; CHECK-LLC-LABEL:do_copy:
				; CHECK-LLC-NOT: mov lr, r0
				; CHECK-LLC: dls lr, r0
				; CHECK-LLC-NOT: mov lr, r0
				; CHECK-LLC: [[LOOP_HEADER:\.LBB[0-9_]+]]:
				; CHECK-LLC: le lr, [[LOOP_HEADER]]
				; CHECK-LLC-NOT: b [[LOOP_EXIT:\.LBB[0-9._]+]]
				; CHECK-LLC: @ %while.end
	define i32 @do_copy(i32 %n, i32* nocapture %p, i32* nocapture readonly %q) {
	entry:
	br label %while.body

	while.body:
	%q.addr.05 = phi i32* [ %incdec.ptr, %while.body ], [ %q, %entry ]
	%p.addr.04 = phi i32* [ %incdec.ptr1, %while.body ], [ %p, %entry ]
	%x.addr.03 = phi i32 [ %dec, %while.body ], [ %n, %entry ]
	; CHECK: call void @llvm.set.loop.iterations.i32(i32 %n)
	; CHECK-NEXT: br label %while.body

	; CHECK: [[REM:%[^ ]+]] = phi i32 [ %n, %while.body.lr.ph ], [ [[LOOP_DEC:%[^ ]+]], %while.body ]
	; CHECK: [[LOOP_DEC]] = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 [[REM]], i32 1)
	; CHECK: [[CMP:%[^ ]+]] = icmp ne i32 [[LOOP_DEC]], 0
	; CHECK: br i1 [[CMP]], label %while.body, label %while.end.loopexit

				; CHECK-LLC-LABEL:do_inc1:
				; CHECK-LLC: dls lr,
				; CHECK-LLC-NOT: mov lr,
				; CHECK-LLC: [[LOOP_HEADER:\.LBB[0-9_]+]]:
				; CHECK-LLC: le lr, [[LOOP_HEADER]]
				; CHECK-LLC-NOT: b [[LOOP_EXIT:\.LBB[0-9_]+]]
				; CHECK-LLC: [[LOOP_EXIT:\.LBB[0-9_]+]]:

	define i32 @do_inc1(i32 %n) {
	entry:
	%cmp7 = icmp eq i32 %n, 0
	br i1 %cmp7, label %while.end, label %while.body.lr.ph

	while.body.lr.ph:
	%0 = load i32, i32* @g, align 4
	br label %while.body
	; CHECK: [[COUNT:%[^ ]+]] = add nuw i32 [[HALVE]], 1
	; CHECK: call void @llvm.set.loop.iterations.i32(i32 [[COUNT]])
	; CHECK-NEXT: br label %while.body

	; CHECK: [[REM:%[^ ]+]] = phi i32 [ [[COUNT]], %while.body.lr.ph ], [ [[LOOP_DEC:%[^ ]+]], %while.body ]
	; CHECK: [[LOOP_DEC]] = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 [[REM]], i32 1)
	; CHECK: [[CMP:%[^ ]+]] = icmp ne i32 [[LOOP_DEC]], 0
	; CHECK: br i1 [[CMP]], label %while.body, label %while.end.loopexit

				; CHECK-LLC: do_inc2:
				; CHECK-LLC-NOT: mov lr,
				; CHECK-LLC: dls lr,
				; CHECK-LLC-NOT: mov lr,
				; CHECK-LLC: [[LOOP_HEADER:\.LBB[0-9._]+]]:
				; CHECK-LLC: le lr, [[LOOP_HEADER]]
				; CHECK-LLC-NOT: b [[LOOP_EXIT:\.LBB[0-9._]+]]
				; CHECK-LLC: [[LOOP_EXIT:\.LBB[0-9_]+]]:

	define i32 @do_inc2(i32 %n) {
	entry:
	%cmp7 = icmp sgt i32 %n, 0
	br i1 %cmp7, label %while.body.lr.ph, label %while.end

	while.body.lr.ph:
	%0 = load i32, i32* @g, align 4
	br label %while.body
	; CHECK: [[COUNT:%[^ ]+]] = add nuw i32 [[HALVE]], 1
	; CHECK: call void @llvm.set.loop.iterations.i32(i32 [[COUNT]])
	; CHECK-NEXT: br label %while.body

	; CHECK: [[REM:%[^ ]+]] = phi i32 [ [[COUNT]], %while.body.lr.ph ], [ [[LOOP_DEC:%[^ ]+]], %while.body ]
	; CHECK: [[LOOP_DEC]] = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 [[REM]], i32 1)
	; CHECK: [[CMP:%[^ ]+]] = icmp ne i32 [[LOOP_DEC]], 0
	; CHECK: br i1 [[CMP]], label %while.body, label %while.end.loopexit

				; CHECK-LLC: do_dec2
				; CHECK-LLC-NOT: mov lr,
				; CHECK-LLC: dls lr,
				; CHECK-LLC-NOT: mov lr,
				; CHECK-LLC: [[LOOP_HEADER:\.LBB[0-9_]+]]:
				; CHECK-LLC: le lr, [[LOOP_HEADER]]
				; CHECK-LLC-NOT: b .
				; CHECK-LLC: @ %while.end
	define i32 @do_dec2(i32 %n) {
	entry:
	%cmp6 = icmp sgt i32 %n, 0
	br i1 %cmp6, label %while.body.lr.ph, label %while.end

	while.body.lr.ph:
	%0 = load i32, i32* @g, align 4
	br label %while.body

test/Transforms/HardwareLoops/ARM/size-limit.mir

This file was added.

				# RUN: llc -mtriple=armv8.1m.main -run-pass=arm-low-overhead-loops %s -o - | FileCheck %s
				# CHECK: entry:
				# CHECK: $lr = t2DLS
				# CHECK: for.body:
				# CHECK: $lr = t2LEUpdate renamable $lr

				--- |
				target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "thumbv8.1m.main-unknown-unknown"

				; Function Attrs: norecurse nounwind
				define dso_local arm_aapcscc void @size_limit(i32* nocapture %a, i32* nocapture readonly %b, i32* nocapture readonly %c, i32 %N) local_unnamed_addr #0 {
				entry:
				%cmp8 = icmp eq i32 %N, 0
				br i1 %cmp8, label %for.cond.cleanup, label %for.body.preheader

				for.body.preheader: ; preds = %entry
				%scevgep = getelementptr i32, i32* %a, i32 -1
				%scevgep4 = getelementptr i32, i32* %c, i32 -1
				%scevgep8 = getelementptr i32, i32* %b, i32 -1
				call void @llvm.set.loop.iterations.i32(i32 %N)
				br label %for.body

				for.cond.cleanup: ; preds = %for.body, %entry
				ret void

				for.body: ; preds = %for.body, %for.body.preheader
				%lsr.iv9 = phi i32* [ %scevgep8, %for.body.preheader ], [ %scevgep10, %for.body ]
				%lsr.iv5 = phi i32* [ %scevgep4, %for.body.preheader ], [ %scevgep6, %for.body ]
				%lsr.iv1 = phi i32* [ %scevgep, %for.body.preheader ], [ %scevgep2, %for.body ]
				%0 = phi i32 [ %N, %for.body.preheader ], [ %3, %for.body ]
				%size = call i32 @llvm.arm.space(i32 4072, i32 undef)
				%scevgep11 = getelementptr i32, i32* %lsr.iv9, i32 1
				%1 = load i32, i32* %scevgep11, align 4, !tbaa !3
				%scevgep7 = getelementptr i32, i32* %lsr.iv5, i32 1
				%2 = load i32, i32* %scevgep7, align 4, !tbaa !3
				%mul = mul nsw i32 %2, %1
				%scevgep3 = getelementptr i32, i32* %lsr.iv1, i32 1
				store i32 %mul, i32* %scevgep3, align 4, !tbaa !3
				%scevgep2 = getelementptr i32, i32* %lsr.iv1, i32 1
				%scevgep6 = getelementptr i32, i32* %lsr.iv5, i32 1
				%scevgep10 = getelementptr i32, i32* %lsr.iv9, i32 1
				%3 = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 %0, i32 1)
				%4 = icmp ne i32 %3, 0
				br i1 %4, label %for.body, label %for.cond.cleanup
				}

				; Function Attrs: nounwind
				declare i32 @llvm.arm.space(i32, i32) #1

				; Function Attrs: noduplicate nounwind
				declare void @llvm.set.loop.iterations.i32(i32) #2

				; Function Attrs: noduplicate nounwind
				declare i32 @llvm.loop.decrement.reg.i32.i32.i32(i32, i32) #2

				; Function Attrs: nounwind
				declare void @llvm.stackprotector(i8, i8*) #1

				attributes #0 = { norecurse nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+armv8.1-m.main,+hwdiv,+ras,+soft-float,+strict-align,+thumb-mode,-crypto,-d32,-dotprod,-fp-armv8,-fp-armv8d16,-fp-armv8d16sp,-fp-armv8sp,-fp16,-fp16fml,-fp64,-fpregs,-fullfp16,-neon,-vfp2,-vfp2d16,-vfp2d16sp,-vfp2sp,-vfp3,-vfp3d16,-vfp3d16sp,-vfp3sp,-vfp4,-vfp4d16,-vfp4d16sp,-vfp4sp" "unsafe-fp-math"="false" "use-soft-float"="true" }
				attributes #1 = { nounwind }
				attributes #2 = { noduplicate nounwind }

				!llvm.module.flags = !{!0, !1}
				!llvm.ident = !{!2}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{i32 1, !"min_enum_size", i32 4}
				!2 = !{!"clang version 9.0.0 (http://llvm.org/git/clang.git a9c7c0fc5d468f3d18a5c6beb697ab0d5be2ff4c) (http://llvm.org/git/llvm.git f34bff0c141a04a5182d57e2cfb1e4bc582c81b0)"}
				!3 = !{!4, !4, i64 0}
				!4 = !{!"int", !5, i64 0}
				!5 = !{!"omnipotent char", !6, i64 0}
				!6 = !{!"Simple C/C++ TBAA"}

				...
				---
				name: size_limit
				alignment: 1
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: false
				hasWinCFI: false
				registers: []
				liveins:
				- { reg: '$r0', virtual-reg: '' }
				- { reg: '$r1', virtual-reg: '' }
				- { reg: '$r2', virtual-reg: '' }
				- { reg: '$r3', virtual-reg: '' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 8
				offsetAdjustment: 0
				maxAlignment: 4
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack:
				- { id: 0, name: '', type: spill-slot, offset: -4, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$lr', callee-saved-restored: false,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 1, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r7', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				constants: []
				machineFunctionInfo: {}
				body: |
				bb.0.entry:
				successors: %bb.1(0x80000000)

				frame-setup tPUSH 14, $noreg, $r7, killed $lr, implicit-def $sp, implicit $sp
				frame-setup CFI_INSTRUCTION def_cfa_offset 8
				frame-setup CFI_INSTRUCTION offset $lr, -4
				frame-setup CFI_INSTRUCTION offset $r7, -8
				$r7 = frame-setup tMOVr $sp, 14, $noreg
				frame-setup CFI_INSTRUCTION def_cfa_register $r7
				tCMPi8 $r3, 0, 14, $noreg, implicit-def $cpsr
				t2IT 0, 8, implicit-def $itstate
				tPOP_RET 0, killed $cpsr, def $r7, def $pc, implicit killed $itstate
				renamable $r1, dead $cpsr = tSUBi8 killed renamable $r1, 4, 14, $noreg
				renamable $r2, dead $cpsr = tSUBi8 killed renamable $r2, 4, 14, $noreg
				renamable $r0, dead $cpsr = tSUBi8 killed renamable $r0, 4, 14, $noreg
				$lr = tMOVr $r3, 14, $noreg
				t2DoLoopStart killed $r3

				bb.1.for.body:
				successors: %bb.1(0x7c000000), %bb.2(0x04000000)

				dead renamable $r3 = SPACE 4072, undef renamable $r0
				renamable $r12, renamable $r1 = t2LDR_PRE killed renamable $r1, 4, 14, $noreg :: (load 4 from %ir.scevgep11, !tbaa !3)
				renamable $r3, renamable $r2 = t2LDR_PRE killed renamable $r2, 4, 14, $noreg :: (load 4 from %ir.scevgep7, !tbaa !3)
				renamable $r3 = nsw t2MUL killed renamable $r3, killed renamable $r12, 14, $noreg
				early-clobber renamable $r0 = t2STR_PRE killed renamable $r3, killed renamable $r0, 4, 14, $noreg :: (store 4 into %ir.scevgep3, !tbaa !3)
				renamable $lr = t2LoopDec killed renamable $lr, 1
				t2LoopEnd renamable $lr, %bb.1
				tB %bb.2, 14, $noreg

				bb.2.for.cond.cleanup:
				tPOP_RET 14, $noreg, def $r7, def $pc

				...

test/Transforms/HardwareLoops/ARM/structure.ll

; RUN: opt -mtriple=thumbv8.1m.main-arm-none-eabi -hardware-loops -disable-arm-loloops=false %s -S -o - | FileCheck %s
		; RUN: llc -mtriple=thumbv8.1m.main-arm-none-eabi -disable-arm-loloops=false %s -o - | FileCheck %s --check-prefix=CHECK-LLC
		; RUN: opt -mtriple=thumbv8.1m.main -loop-unroll -unroll-remainder=false -S < %s | llc -mtriple=thumbv8.1m.main -disable-arm-loloops=false | FileCheck %s --check-prefix=CHECK-UNROLL

; CHECK-LABEL: early_exit
; CHECK-NOT: llvm.set.loop.iterations
; CHECK-NOT: llvm.loop.decrement
define i32 @early_exit(i32* nocapture readonly %a, i32 %max, i32 %n) {
entry:
br label %do.body


; CHECK: [[REM:%[^ ]+]] = phi i32 [ %N, %while.cond1.preheader.us ], [ [[LOOP_DEC:%[^ ]+]], %while.body3.us ]
; CHECK: [[LOOP_DEC]] = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 [[REM]], i32 1)
; CHECK: [[CMP:%[^ ]+]] = icmp ne i32 [[LOOP_DEC]], 0
; CHECK: br i1 [[CMP]], label %while.body3.us, label %while.cond1.while.end_crit_edge.us

; CHECK-NOT: [[LOOP_DEC1:%[^ ]+]] = call i1 @llvm.loop.decrement.i32(i32 1)
; CHECK-NOT: br i1 [[LOOP_DEC1]], label %while.cond1.preheader.us, label %while.end7

		; CHECK-LLC: nested:
		; CHECK-LLC-NOT: mov lr, r1
		; CHECK-LLC: dls lr, r1
		; CHECK-LLC-NOT: mov lr, r1
		; CHECK-LLC: [[LOOP_HEADER:\.LBB[0-9._]+]]:
		; CHECK-LLC: le lr, [[LOOP_HEADER]]
		; CHECK-LLC-NOT: b [[LOOP_EXIT:\.LBB[0-9._]+]]
		; CHECK-LLC: [[LOOP_EXIT:\.LBB[0-9._]+]]:

define void @nested(i32* nocapture %A, i32 %N) {
entry:
%cmp20 = icmp eq i32 %N, 0
br i1 %cmp20, label %while.end7, label %while.cond1.preheader.us

while.cond1.preheader.us:
%i.021.us = phi i32 [ %inc6.us, %while.cond1.while.end_crit_edge.us ], [ 0, %entry ]
%mul.us = mul i32 %i.021.us, %N
store i32 %ld, i32* %addr.b
%cmp.2 = icmp ult i32 %count.next, %N
br i1 %cmp.2, label %header, label %latch.1

exit:
ret void
}

		; CHECK-LABEL: search
		; CHECK: for.body.preheader:
		; CHECK: call void @llvm.set.loop.iterations.i32(i32 %N)
		; CHECK: br label %for.body
		; CHECK: for.body:
		; CHECK: for.inc:
		; CHECK: [[LOOP_DEC:%[^ ]+]] = call i32 @llvm.loop.decrement.reg.i32.i32.i32
		; CHECK: [[CMP:%[^ ]+]] = icmp ne i32 [[LOOP_DEC]], 0
		; CHECK: br i1 [[CMP]], label %for.body, label %for.cond.cleanup
		define i32 @search(i8* nocapture readonly %c, i32 %N) {
		entry:
		%cmp11 = icmp eq i32 %N, 0
		br i1 %cmp11, label %for.cond.cleanup, label %for.body

		for.cond.cleanup:
		%found.0.lcssa = phi i32 [ 0, %entry ], [ %found.1, %for.inc ]
		%spaces.0.lcssa = phi i32 [ 0, %entry ], [ %spaces.1, %for.inc ]
		%sub = sub nsw i32 %found.0.lcssa, %spaces.0.lcssa
		ret i32 %sub

		for.body:
		%i.014 = phi i32 [ %inc3, %for.inc ], [ 0, %entry ]
		%spaces.013 = phi i32 [ %spaces.1, %for.inc ], [ 0, %entry ]
		%found.012 = phi i32 [ %found.1, %for.inc ], [ 0, %entry ]
		%arrayidx = getelementptr inbounds i8, i8* %c, i32 %i.014
		%0 = load i8, i8* %arrayidx, align 1
		switch i8 %0, label %for.inc [
		i8 108, label %sw.bb
		i8 111, label %sw.bb
		i8 112, label %sw.bb
		i8 32, label %sw.bb1
		]

		sw.bb: ; preds = %for.body, %for.body, %for.body
		%inc = add nsw i32 %found.012, 1
		br label %for.inc

		sw.bb1: ; preds = %for.body
		%inc2 = add nsw i32 %spaces.013, 1
		br label %for.inc

		for.inc: ; preds = %sw.bb, %sw.bb1, %for.body
		%found.1 = phi i32 [ %found.012, %for.body ], [ %found.012, %sw.bb1 ], [ %inc, %sw.bb ]
		%spaces.1 = phi i32 [ %spaces.013, %for.body ], [ %inc2, %sw.bb1 ], [ %spaces.013, %sw.bb ]
		%inc3 = add nuw i32 %i.014, 1
		%exitcond = icmp eq i32 %inc3, %N
		br i1 %exitcond, label %for.cond.cleanup, label %for.body
		}

		; CHECK-LABEL: unroll_inc_int
		; CHECK: call void @llvm.set.loop.iterations.i32(i32 %N)
		; CHECK: call i32 @llvm.loop.decrement.reg.i32.i32.i32(

		; TODO: We should be able to support the unrolled loop body.
		; CHECK-UNROLL-LABEL: unroll_inc_int:
		; CHECK-UNROLL: [[PREHEADER:.LBB[0-9_]+]]: @ %for.body.preheader
		; CHECK-UNROLL-NOT: dls
		; CHECK-UNROLL: [[LOOP:.LBB[0-9_]+]]: @ %for.body
		; CHECK-UNROLL-NOT: le lr, [[LOOP]]
		; CHECK-UNROLL: bne [[LOOP]]
		; CHECK-UNROLL: %for.body.epil.preheader
		; CHECK-UNROLL: dls
		; CHECK-UNROLL: %for.body.epil
		; CHECK-UNROLL: le

		define void @unroll_inc_int(i32* nocapture %a, i32* nocapture readonly %b, i32* nocapture readonly %c, i32 %N) {
		entry:
		%cmp8 = icmp sgt i32 %N, 0
		br i1 %cmp8, label %for.body, label %for.cond.cleanup

		for.cond.cleanup:
		ret void

		for.body:
		%i.09 = phi i32 [ %inc, %for.body ], [ 0, %entry ]
		%arrayidx = getelementptr inbounds i32, i32* %b, i32 %i.09
		%0 = load i32, i32* %arrayidx, align 4
		%arrayidx1 = getelementptr inbounds i32, i32* %c, i32 %i.09
		%1 = load i32, i32* %arrayidx1, align 4
		%mul = mul nsw i32 %1, %0
		%arrayidx2 = getelementptr inbounds i32, i32* %a, i32 %i.09
		store i32 %mul, i32* %arrayidx2, align 4
		%inc = add nuw nsw i32 %i.09, 1
		%exitcond = icmp eq i32 %inc, %N
		br i1 %exitcond, label %for.cond.cleanup, label %for.body
		}

		; CHECK-LABEL: unroll_inc_unsigned
		; CHECK: call void @llvm.set.loop.iterations.i32(i32 %N)
		; CHECK: call i32 @llvm.loop.decrement.reg.i32.i32.i32(

		; CHECK-LLC-LABEL: unroll_inc_unsigned:
		; CHECK-LLC: dls lr, [[COUNT:r[0-9]+]]
		; CHECK-LLC: le lr

		; TODO: We should be able to support the unrolled loop body.
		; CHECK-UNROLL-LABEL: unroll_inc_unsigned:
		; CHECK-UNROLL: [[PREHEADER:.LBB[0-9_]+]]: @ %for.body.preheader
		; CHECK-UNROLL-NOT: dls
		; CHECK-UNROLL: [[LOOP:.LBB[0-9_]+]]: @ %for.body
		; CHECK-UNROLL-NOT: le lr, [[LOOP]]
		; CHECK-UNROLL: bne [[LOOP]]
		; CHECK-UNROLL: %for.body.epil.preheader
		; CHECK-UNROLL: dls
		; CHECK-UNROLL: %for.body.epil
		; CHECK-UNROLL: le
		define void @unroll_inc_unsigned(i32* nocapture %a, i32* nocapture readonly %b, i32* nocapture readonly %c, i32 %N) {
		entry:
		%cmp8 = icmp eq i32 %N, 0
		br i1 %cmp8, label %for.cond.cleanup, label %for.body

		for.cond.cleanup:
		ret void

		for.body:
		%i.09 = phi i32 [ %inc, %for.body ], [ 0, %entry ]
		%arrayidx = getelementptr inbounds i32, i32* %b, i32 %i.09
		%0 = load i32, i32* %arrayidx, align 4
		%arrayidx1 = getelementptr inbounds i32, i32* %c, i32 %i.09
		%1 = load i32, i32* %arrayidx1, align 4
		%mul = mul nsw i32 %1, %0
		%arrayidx2 = getelementptr inbounds i32, i32* %a, i32 %i.09
		store i32 %mul, i32* %arrayidx2, align 4
		%inc = add nuw i32 %i.09, 1
		%exitcond = icmp eq i32 %inc, %N
		br i1 %exitcond, label %for.cond.cleanup, label %for.body
		}

		; CHECK-LABEL: unroll_dec_int
		; CHECK: call void @llvm.set.loop.iterations.i32(i32 %N)
		; CHECK: call i32 @llvm.loop.decrement.reg.i32.i32.i32(

		; TODO: An unnecessary register is being held to hold COUNT, lr should just
		; be used instead.
		; CHECK-LLC-LABEL: unroll_dec_int:
		; CHECK-LLC: dls lr, [[COUNT:r[0-9]+]]
		; CHECK-LLC: subs [[COUNT]], #1
		; CHECK-LLC: le lr

		; CHECK-UNROLL-LABEL: unroll_dec_int
		; CHECK-UNROLL: dls lr
		; CHECK-UNROLL: le lr
		; CHECK-UNROLL: dls lr
		; CHECK-UNROLL: le lr
		define void @unroll_dec_int(i32* nocapture %a, i32* nocapture readonly %b, i32* nocapture readonly %c, i32 %N) {
		entry:
		%cmp8 = icmp sgt i32 %N, 0
		br i1 %cmp8, label %for.body, label %for.cond.cleanup

		for.cond.cleanup:
		ret void

		for.body:
		%i.09 = phi i32 [ %dec, %for.body ], [ %N, %entry ]
		%arrayidx = getelementptr inbounds i32, i32* %b, i32 %i.09
		%0 = load i32, i32* %arrayidx, align 4
		%arrayidx1 = getelementptr inbounds i32, i32* %c, i32 %i.09
		%1 = load i32, i32* %arrayidx1, align 4
		%mul = mul nsw i32 %1, %0
		%arrayidx2 = getelementptr inbounds i32, i32* %a, i32 %i.09
		store i32 %mul, i32* %arrayidx2, align 4
		%dec = add nsw i32 %i.09, -1
		%cmp = icmp sgt i32 %dec, 0
		br i1 %cmp, label %for.body, label %for.cond.cleanup
		}

declare void @llvm.set.loop.iterations.i32(i32) #0
declare i32 @llvm.loop.decrement.reg.i32.i32.i32(i32, i32) #0

test/Transforms/HardwareLoops/ARM/switch.mir

This file was added.

				# RUN: llc -mtriple=thumbv8.1m.main %s -run-pass=arm-low-overhead-loops -o -
				# CHECK: bb.1.for.body.preheader:
				# CHECK: $lr = t2DLS
				# CHECK-NOT: t2LoopDec
				# CHECK: bb.6.for.inc:
				# CHECK: $lr = t2LEUpdate renamable $lr, %bb.2

				--- |
				target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "thumbv8.1m.main-unknown-unknown"

				; Function Attrs: norecurse nounwind readonly
				define dso_local arm_aapcscc i32 @search(i8* nocapture readonly %c, i32 %N) local_unnamed_addr #0 {
				entry:
				%cmp11 = icmp eq i32 %N, 0
				br i1 %cmp11, label %for.cond.cleanup, label %for.body.preheader

				for.body.preheader:
				call void @llvm.set.loop.iterations.i32(i32 %N)
				br label %for.body

				for.cond.cleanup:
				%found.0.lcssa = phi i32 [ 0, %entry ], [ %found.1, %for.inc ]
				%spaces.0.lcssa = phi i32 [ 0, %entry ], [ %spaces.1, %for.inc ]
				%sub = sub nsw i32 %found.0.lcssa, %spaces.0.lcssa
				ret i32 %sub

				for.body:
				%lsr.iv1 = phi i8* [ %c, %for.body.preheader ], [ %scevgep, %for.inc ]
				%spaces.013 = phi i32 [ %spaces.1, %for.inc ], [ 0, %for.body.preheader ]
				%found.012 = phi i32 [ %found.1, %for.inc ], [ 0, %for.body.preheader ]
				%0 = phi i32 [ %N, %for.body.preheader ], [ %3, %for.inc ]
				%1 = load i8, i8* %lsr.iv1, align 1
				%2 = zext i8 %1 to i32
				switch i32 %2, label %for.inc [
				i32 108, label %sw.bb
				i32 111, label %sw.bb
				i32 112, label %sw.bb
				i32 32, label %sw.bb1
				]

				sw.bb:
				%inc = add nsw i32 %found.012, 1
				br label %for.inc

				sw.bb1:
				%inc2 = add nsw i32 %spaces.013, 1
				br label %for.inc

				for.inc:
				%found.1 = phi i32 [ %found.012, %for.body ], [ %found.012, %sw.bb1 ], [ %inc, %sw.bb ]
				%spaces.1 = phi i32 [ %spaces.013, %for.body ], [ %inc2, %sw.bb1 ], [ %spaces.013, %sw.bb ]
				%scevgep = getelementptr i8, i8* %lsr.iv1, i32 1
				%3 = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 %0, i32 1)
				%4 = icmp ne i32 %3, 0
				br i1 %4, label %for.body, label %for.cond.cleanup
				}

				declare void @llvm.set.loop.iterations.i32(i32) #1
				declare i32 @llvm.loop.decrement.reg.i32.i32.i32(i32, i32) #1
				declare void @llvm.stackprotector(i8, i8*) #2

				attributes #0 = { norecurse nounwind readonly "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+armv8.1-m.main,+hwdiv,+ras,+soft-float,+strict-align,+thumb-mode,-crypto,-d32,-dotprod,-fp-armv8,-fp-armv8d16,-fp-armv8d16sp,-fp-armv8sp,-fp16,-fp16fml,-fp64,-fpregs,-fullfp16,-neon,-vfp2,-vfp2d16,-vfp2d16sp,-vfp2sp,-vfp3,-vfp3d16,-vfp3d16sp,-vfp3sp,-vfp4,-vfp4d16,-vfp4d16sp,-vfp4sp" "unsafe-fp-math"="false" "use-soft-float"="true" }
				attributes #1 = { noduplicate nounwind }
				attributes #2 = { nounwind }

				...
				---
				name: search
				alignment: 1
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers: []
				liveins:
				- { reg: '$r0', virtual-reg: '' }
				- { reg: '$r1', virtual-reg: '' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 16
				offsetAdjustment: -8
				maxAlignment: 4
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack:
				- { id: 0, name: '', type: spill-slot, offset: -4, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$lr', callee-saved-restored: false,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 1, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r7', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 2, name: '', type: spill-slot, offset: -12, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r6', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				- { id: 3, name: '', type: spill-slot, offset: -16, size: 4, alignment: 4,
				stack-id: default, callee-saved-register: '$r4', callee-saved-restored: true,
				debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
				constants: []
				machineFunctionInfo: {}
				body: |
				bb.0.entry:
				successors: %bb.1(0x30000000), %bb.3(0x50000000)
				liveins: $r0, $r1, $r4, $r6, $lr

				$sp = frame-setup t2STMDB_UPD $sp, 14, $noreg, killed $r4, killed $r6, $r7, killed $lr
				frame-setup CFI_INSTRUCTION def_cfa_offset 16
				frame-setup CFI_INSTRUCTION offset $lr, -4
				frame-setup CFI_INSTRUCTION offset $r7, -8
				frame-setup CFI_INSTRUCTION offset $r6, -12
				frame-setup CFI_INSTRUCTION offset $r4, -16
				$r7 = frame-setup t2ADDri $sp, 8, 14, $noreg, $noreg
				frame-setup CFI_INSTRUCTION def_cfa $r7, 8
				t2CMPri $r1, 0, 14, $noreg, implicit-def $cpsr
				t2Bcc %bb.1, 0, killed $cpsr

				bb.3.for.body.preheader:
				successors: %bb.4(0x80000000)
				liveins: $r0, $r1

				$lr = tMOVr $r1, 14, $noreg
				t2DoLoopStart killed $r1
				renamable $r1 = t2MOVi 0, 14, $noreg, $noreg
				renamable $r12 = t2MOVi 1, 14, $noreg, $noreg
				renamable $r2 = t2MOVi 0, 14, $noreg, $noreg

				bb.4.for.body:
				successors: %bb.5(0x26666665), %bb.6(0x5999999b)
				liveins: $lr, $r0, $r1, $r2, $r12

				renamable $r3 = t2LDRBi12 renamable $r0, 0, 14, $noreg :: (load 1 from %ir.lsr.iv1)
				renamable $r4 = t2SUBri renamable $r3, 108, 14, $noreg, $noreg
				renamable $lr = t2LoopDec killed renamable $lr, 1
				t2CMPri renamable $r4, 4, 14, $noreg, implicit-def $cpsr
				t2Bcc %bb.5, 8, killed $cpsr

				bb.6.for.body:
				successors: %bb.7(0x6db6db6e), %bb.5(0x12492492)
				liveins: $lr, $r0, $r1, $r2, $r3, $r4, $r12

				renamable $r4 = t2LSLrr renamable $r12, killed renamable $r4, 14, $noreg, $noreg
				t2TSTri killed renamable $r4, 25, 14, $noreg, implicit-def $cpsr
				t2Bcc %bb.5, 0, killed $cpsr

				bb.7.sw.bb:
				successors: %bb.8(0x80000000)
				liveins: $lr, $r0, $r1, $r2, $r12

				renamable $r2 = nsw t2ADDri killed renamable $r2, 1, 14, $noreg, $noreg
				t2B %bb.8, 14, $noreg

				bb.5.for.body:
				successors: %bb.8(0x80000000)
				liveins: $lr, $r0, $r1, $r2, $r3, $r12

				t2CMPri killed renamable $r3, 32, 14, $noreg, implicit-def $cpsr
				BUNDLE implicit-def dead $itstate, implicit-def $r1, implicit killed $r1, implicit killed $cpsr {
				t2IT 0, 8, implicit-def $itstate
				renamable $r1 = nsw t2ADDri killed renamable $r1, 1, 0, killed $cpsr, $noreg, implicit $r1, implicit internal killed $itstate
				}

				bb.8.for.inc:
				successors: %bb.4(0x7c000000), %bb.2(0x04000000)
				liveins: $lr, $r0, $r1, $r2, $r12

				renamable $r0 = t2ADDri killed renamable $r0, 1, 14, $noreg, $noreg
				t2LoopEnd renamable $lr, %bb.4
				t2B %bb.2, 14, $noreg

				bb.2.for.cond.cleanup:
				liveins: $r1, $r2

				renamable $r0 = nsw t2SUBrr killed renamable $r2, killed renamable $r1, 14, $noreg, $noreg
				$sp = t2LDMIA_RET $sp, 14, $noreg, def $r4, def $r6, def $r7, def $pc, implicit killed $r0

				bb.1:
				renamable $r2 = t2MOVi 0, 14, $noreg, $noreg
				renamable $r1 = t2MOVi 0, 14, $noreg, $noreg
				renamable $r0 = nsw t2SUBrr killed renamable $r2, killed renamable $r1, 14, $noreg, $noreg
				$sp = t2LDMIA_RET $sp, 14, $noreg, def $r4, def $r6, def $r7, def $pc, implicit killed $r0

				...
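As a reading aid, the control flow of the MIR body above can be modelled in a few lines of Python. This is only a sketch of how the test appears to behave, not code from the patch. The function name `count_switch_hits` is hypothetical. It assumes the standard ARM condition-code encoding, in which predicate operand 8 is HI (unsigned greater-than) and 0 is EQ. It also assumes that `t2LoopEnd` branches back to the loop header while the counter in `$lr` is non-zero.

```python
def count_switch_hits(data: bytes) -> int:
    """Hypothetical model of the MIR loop above (not part of the patch)."""
    n = len(data)
    if n == 0:                # bb.0: t2CMPri $r1, 0 ; t2Bcc (EQ) -> bb.1
        return 0              # bb.1: r2 = 0, r1 = 0, return r2 - r1

    lr = n                    # bb.3: t2DoLoopStart, iteration count lives in $lr
    r1 = 0                    # incremented when the byte equals 32
    r2 = 0                    # incremented in bb.7.sw.bb
    i = 0                     # stands in for the pointer held in $r0

    while True:
        r3 = data[i]                        # t2LDRBi12: load one byte
        r4 = (r3 - 108) & 0xFFFFFFFF        # t2SUBri with 32-bit wraparound
        lr -= 1                             # t2LoopDec: decrement only
        # bb.4 falls through to bb.6 unless r4 > 4 (unsigned, HI);
        # bb.6 falls through to bb.7 unless (1 << r4) & 25 == 0 (EQ).
        if r4 <= 4 and ((1 << r4) & 25) != 0:
            r2 += 1                         # bb.7.sw.bb
        elif r3 == 32:                      # bb.5: IT EQ ; ADD r1, r1, #1
            r1 += 1
        i += 1                              # bb.8: advance the pointer
        if lr == 0:                         # t2LoopEnd: loop while lr != 0
            break

    return r2 - r1                          # bb.2: t2SUBrr r2, r1
```

The point of interest for this review is that the decrement (`t2LoopDec`) sits in `bb.4` while the branch (`t2LoopEnd`) sits in `bb.8`, with the switch blocks in between. The model keeps the two steps separate for the same reason.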

Revision Contents

Diff 206171

lib/Target/ARM/ARM.h

lib/Target/ARM/ARMISelDAGToDAG.cpp

lib/Target/ARM/ARMInstrThumb2.td

lib/Target/ARM/ARMLowOverheadLoops.cpp

lib/Target/ARM/ARMTargetMachine.cpp

lib/Target/ARM/CMakeLists.txt

test/CodeGen/ARM/O3-pipeline.ll

test/Transforms/HardwareLoops/ARM/calls.ll

test/Transforms/HardwareLoops/ARM/cond-mov.mir

test/Transforms/HardwareLoops/ARM/massive.mir

test/Transforms/HardwareLoops/ARM/revert-after-call.mir

test/Transforms/HardwareLoops/ARM/revert-after-spill.mir

test/Transforms/HardwareLoops/ARM/simple-do.ll

test/Transforms/HardwareLoops/ARM/size-limit.mir

test/Transforms/HardwareLoops/ARM/structure.ll

test/Transforms/HardwareLoops/ARM/switch.mir
