This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

- llvm/include/llvm/CodeGen/
  - TargetInstrInfo.h
- llvm/lib/CodeGen/
  - CalcSpillWeights.cpp
  - MachineVerifier.cpp
  - PHIElimination.cpp
- llvm/lib/Target/ARM/
  - ARMBaseInstrInfo.h
  - ARMBaseInstrInfo.cpp
  - ARMInstrThumb2.td
  - ARMLowOverheadLoops.cpp
  - MVEVPTOptimisationsPass.cpp
- llvm/test/CodeGen/Thumb2/LowOverheadLoops/
  - count_dominates_start.mir
  - fast-fp-loops.ll
  - minloop.ll
  - mve-float-loops.ll
  - mve-float32regloops.ll
  - mve-postinc-dct.ll
  - mve-postinc-lsr.ll
  - mve-satmul-loops.ll
  - mve-vldshuffle.ll

Differential D91358

[ARM][RegAlloc] Add t2LoopEndDec
ClosedPublic

Authored by dmgreen on Nov 12 2020, 8:04 AM.

Details

Reviewers

efriedma
MatzeB
qcolombet
SjoerdMeijer
samparker

Commits

rG0447f3508f02: [ARM][RegAlloc] Add t2LoopEndDec

Summary

We currently have problems with the way that low overhead loops are specified: LR is sometimes spilled between the t2LoopDec and the t2LoopEnd, forcing the entire loop to be reverted late in the backend. As the two will eventually become a single instruction, this patch introduces t2LoopEndDec, the combination of the two, formed before register allocation so that this spill cannot happen.

Unfortunately this instruction is a terminator that produces a value (and also branches; it only produces the value around the branching edge). So this needs some adjustments to PHI elimination and the register allocator to make sure that we do not spill this LR def around the loop (which would require putting a spill after the terminator). We treat the loop very carefully, making sure that there is nothing else, such as calls, that would break its ability to use LR. For that, this adds an isUnspillableTerminator hook to opt in to the new behaviour.
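The PHI-elimination adjustment can be illustrated with a toy model. Everything below (ToyInstr, ToyBlock, lowerPhiIncoming) is hypothetical and much simpler than LLVM's MachineInstr and PHIElimination; in particular, the real pass chooses the copy position with findPHICopyInsertPoint. The sketch only shows the idea: an ordinary incoming value gets a COPY before the predecessor's terminators, but a value defined by an unspillable terminator cannot be copied after that terminator, so the terminator is rewritten to define the PHI's incoming register directly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy machine instruction (hypothetical, not LLVM's MachineInstr).
struct ToyInstr {
  std::string Opcode;
  int Def = -1;               // virtual register defined, or -1 for none
  std::vector<int> Uses;      // virtual registers read
  bool IsTerminator = false;
  bool IsUnspillableTerminator = false;
};

using ToyBlock = std::vector<ToyInstr>;

// Lowers one PHI incoming value: PredBB must leave SrcReg's value in
// IncomingReg on the edge into the PHI's block.
void lowerPhiIncoming(ToyBlock &PredBB, int SrcReg, int IncomingReg) {
  for (ToyInstr &MI : PredBB) {
    if (MI.Def != SrcReg || !MI.IsUnspillableTerminator)
      continue;
    // The PHI must have been SrcReg's only use (this toy only looks
    // inside the predecessor block).
    for (const ToyInstr &Other : PredBB)
      for (int U : Other.Uses)
        assert(U != SrcReg && "expected the PHI to be SrcReg's only use");
    // A COPY would have to go after the terminator, so retarget its def.
    MI.Def = IncomingReg;
    return;
  }
  // Ordinary case: insert a COPY before the first terminator (or at end).
  auto InsertPos = PredBB.begin();
  while (InsertPos != PredBB.end() && !InsertPos->IsTerminator)
    ++InsertPos;
  PredBB.insert(InsertPos, ToyInstr{"COPY", IncomingReg, {SrcReg}});
}
```

In the first case no instruction is added after the terminator, which is the property the patch relies on to keep LR from being spilled around the loop.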

This might obviously be a little contentious, so I would like to get opinions from people who know what they are talking about. There is a chance that this could cause problems, so I have added an escape option just in case. But I have not seen any problems in the testing that I've tried, and not reverting low overhead loops is important for our performance. If this does work, then we can hopefully do the same for the t2WhileLoopStart and t2DoLoopStart instructions.

This patch also contains the code needed to convert or revert the t2LoopEndDec in the backend (reverting just needs a subs; bne), and the pre-RA code to create them.
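As a rough illustration of the reverted form, the following C++ model treats the subs; bne pair as "decrement the counter, set Z if it reached zero, branch back while Z is clear". The decrement of 1 per iteration and a trip count of at least one are assumptions of this sketch, and Flags, subsOne, bne and runRevertedLoop are hypothetical names, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the reverted t2LoopEndDec:
//   subs lr, lr, #1   ; decrement (assumed to be by 1) and set flags
//   bne  loop_header  ; branch back while the result is non-zero
struct Flags {
  bool Z = false; // zero flag written by subs
};

// The "subs" half: decrement the counter and update Z.
inline void subsOne(std::uint32_t &LR, Flags &F) {
  LR = LR - 1;
  F.Z = (LR == 0);
}

// The "bne" half: the branch is taken when Z is clear.
inline bool bne(const Flags &F) { return !F.Z; }

// Runs a loop whose latch is the reverted t2LoopEndDec and returns how
// many times the body executed.
inline std::uint32_t runRevertedLoop(std::uint32_t TripCount) {
  assert(TripCount >= 1 && "sketch assumes at least one iteration");
  std::uint32_t LR = TripCount; // counter held in LR
  std::uint32_t BodyRuns = 0;
  Flags F;
  do {
    ++BodyRuns;     // loop body
    subsOne(LR, F); // subs lr, lr, #1
  } while (bne(F)); // bne loop_header
  return BodyRuns;
}
```

Because the decrement and the branch read and write the same LR value back to back, nothing needs to live between them, which is the property the combined pseudo-instruction preserves through register allocation.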

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision. Nov 12 2020, 8:04 AM

Herald added a project: Restricted Project. Nov 12 2020, 8:04 AM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls and 2 others.

dmgreen requested review of this revision. Nov 12 2020, 8:04 AM

dmgreen added reviewers: MatzeB, qcolombet, SjoerdMeijer, samparker.

dmgreen added a parent revision: D91267: [ARM] Remove copies from low overhead phi inductions..

dmgreen added a child revision: D91663: [ARM] Disable WLSTP loops. Nov 17 2020, 3:06 PM

dmgreen mentioned this in D91267: [ARM] Remove copies from low overhead phi inductions. Nov 17 2020, 3:15 PM

This might obviously be a little contentious, so I would like to get opinions from people who know what they are talking about.

The ARM changes look very reasonable to me. To be a bit more precise: yes, this is definitely something we want, so I agree with the motivation and the (ARM) implementation, which shows nice codegen changes.

But value-producing terminators in regalloc are not my forte, so it is indeed best if someone else approves that part.

Find some inline comments below, mostly nits.

llvm/include/llvm/CodeGen/TargetInstrInfo.h
351	typo: trerminator
llvm/lib/CodeGen/CalcSpillWeights.cpp
227	Nit: guess you don't really need to check isTerminator as that will be checked in isUnspillableTerminator?
llvm/lib/Target/ARM/ARMInstrThumb2.td
5451	bikeshedding names: how about `t2LoopDecEnd` to more keep the semantics this is a decrement + end, in that order?
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1302–1308	It wasn't immediately clear to me why we need this now.
1365	nit: perhaps an assert that MI is a loop end?
1632	nit: how about `RevertLoopEnd(LoLoop.End, RevertLoopDec(LoLoop.Dec));` to get rid of the curly brackets around the if-else here?
llvm/lib/Target/ARM/MVETPAndVPTOptimisationsPass.cpp
39 ↗	(On Diff #304823)	nit: `arm-enable-mergeenddec` is a bit difficult to read. Since it is relatively long already, how about making it even a bit longer, something like `arm-enable-merge-loopenddec`?
40 ↗	(On Diff #304823)	typo? mergeing?
243 ↗	(On Diff #304823)	nit: Replace the loop dec into the loop end as a single instruction -> Replace the loop dec and loop end as a single instruction or something similar...

Thanks for taking a look.

llvm/lib/CodeGen/CalcSpillWeights.cpp
227	I was trying to make it explicit, that it needed to be a terminator that defined the value and this shouldn't be used for anything else. I can change that if you think it's better to remove it. It is currently just trying to be conservative.
llvm/lib/Target/ARM/ARMInstrThumb2.td
5451	Yeah, it does both together. I have no strong opinion on the order in the name. I guess I chose t2LoopEndDec as I felt that the major thing it did was behave as a loop end (like an LE instruction), with the Dec somewhat of a side effect of that. I can change it if you think it's better the other way. Let me know.
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1302–1308	I updated the comment.
llvm/lib/Target/ARM/MVETPAndVPTOptimisationsPass.cpp
39 ↗	(On Diff #304823)	Thanks. This did feel like a particularly ugly option name.

Adding @mtrofin, @bjope, @arsenm for the unspillable terminator change.

llvm/lib/CodeGen/CalcSpillWeights.cpp
227	Not a big deal, whatever you prefer.
llvm/lib/Target/ARM/ARMInstrThumb2.td
5451	No strong opinion on this, I was bikeshedding anyway.

mtrofin added inline comments. Nov 25 2020, 9:46 AM

llvm/lib/CodeGen/CalcSpillWeights.cpp
227	from a readability perspective, isUnspillableTerminator suggests "isTerminator && isUnspillable". How about having isUnspillableTerminator not virtual and doing this: return isTerminator() && isUnspillableTerminatorImpl() where isUnspillableTerminatorImpl is protected and virtual?
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1375	Nit: addReg(MCRegister::NoRegister) is probably more readable.
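mtrofin's suggestion is the non-virtual interface pattern: a public, non-virtual isUnspillableTerminator that always checks isTerminator() and then defers to a protected, virtual isUnspillableTerminatorImpl. The self-contained sketch below shows that shape; ToyMI, ToyInstrInfoBase, ToyARMInstrInfo and the opcode value 42 are placeholders, not LLVM types or values.

```cpp
#include <cassert>

// Hypothetical stand-in for MachineInstr.
struct ToyMI {
  unsigned Opcode;
  bool Terminator;
  bool isTerminator() const { return Terminator; }
};

class ToyInstrInfoBase {
public:
  virtual ~ToyInstrInfoBase() = default;

  // Non-virtual entry point: targets cannot skip the terminator check.
  bool isUnspillableTerminator(const ToyMI *MI) const {
    assert(MI && "expected an instruction");
    return MI->isTerminator() && isUnspillableTerminatorImpl(MI);
  }

protected:
  // Target hook; the default opts out of the special handling.
  virtual bool isUnspillableTerminatorImpl(const ToyMI *) const {
    return false;
  }
};

// Hypothetical target that opts in exactly one opcode.
class ToyARMInstrInfo : public ToyInstrInfoBase {
public:
  static constexpr unsigned LoopEndDecOpc = 42; // placeholder opcode

protected:
  bool isUnspillableTerminatorImpl(const ToyMI *MI) const override {
    return MI->Opcode == LoopEndDecOpc;
  }
};
```

With this split, the name reads as "is a terminator and is unspillable" at every call site, and a target override only has to answer the second half.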

bjope added inline comments. Nov 25 2020, 1:05 PM

llvm/lib/CodeGen/PHIElimination.cpp
452	This is only valid if there are no other uses of SrcReg (not anywhere else, and not in the current MBB that we are eliminating PHI nodes inside, such as another PHI node in MBB). I haven't analyzed it more closely than that. But I don't know if it can be generally guaranteed that the def in an UnspillableTerminator only has a single use. Is such a rule guaranteed here (or even feasible)?

Thanks for the comments. Added an isUnspillableTerminatorImpl.

llvm/lib/CodeGen/CalcSpillWeights.cpp
227	Sure, sounds good to me.
llvm/lib/CodeGen/PHIElimination.cpp
452	Yeah. We are being very careful! We are only using these for hardware loop instructions that we control the insertion of and are careful to only do the transform to produce the instruction (t2LoopEndDec in this case) when it has a single remaining user. CheckUsers in MVETPAndVPTOptimisations::MergeLoopEnd does that. That's why this is very opt-in and it's trying to be careful to say "Use with care".

bjope added inline comments. Nov 27 2020, 6:05 AM

llvm/lib/CodeGen/PHIElimination.cpp
452	My concern was that this is not inside the target-specific part of the backend. So if some other target wants to use the concept of "unspillable terminators", there must be some clear rules about what to expect. I think an assert that there is only one use would be appropriate here (to at least catch any violation of such a rule at the point where things would otherwise go wrong). But maybe the MachineVerifier should also help detect problems if an unspillable terminator def has more than one use (to catch such problems already in the pass breaking the rule)? And maybe we need to document somewhere what is expected of "unspillable terminators" with respect to this restriction (not sure exactly where). I guess there could be lots of generic passes, but also target-specific passes, that must adhere to such a rule if an unspillable terminator may be inserted as early as ISel. So I still don't see how to ensure that all passes between ISel and PHI-elimination are aware that they aren't allowed to introduce new uses of the unspillable terminator def. Or how close to PHI-elimination must a pass introducing "unspillable terminators" be?

Add a machine verifier check and an extra assert.

llvm/lib/CodeGen/PHIElimination.cpp
452	I agree. I have added a check to the verifier and an assert in here. In our case we are adding these for hardware loop instructions shortly before register allocation, after all the other MIR-level optimizations have happened. There are also some details about the instruction in isUnspillableTerminatorImpl.

Ping

Rebase

bjope added inline comments. Dec 7 2020, 12:24 PM

llvm/lib/CodeGen/MachineVerifier.cpp
1553	typ: /Checl/Check/
1559	Use lists in MRI also contain dbg-uses. So I think we want to use use_instr_nodbg here to only count non-dbg-uses. Also a bit unclear if zero uses should be valid or not (code is currently checking "at most one user", but code comments and the report says "should have a single user"). There is a hasOneDBGUser helper in MRI that could be used if there has to be one user.
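The rule under discussion (an unspillable terminator's def may have at most one non-debug use) can be sketched as a standalone check. The committed verifier code counts MRI->use_nodbg_begin(Def) to use_nodbg_end(); the toy below uses a hypothetical ToyUse list instead and counts use operands rather than using instructions, which is a simplification. It also picks "at most one" over "exactly one", the open question raised above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical use-list entry: one operand that reads a virtual register.
struct ToyUse {
  int Reg;
  bool IsDebug; // DBG_VALUE-style uses must not count
};

// Counts the non-debug uses of Reg (operands, not distinct instructions).
inline unsigned countNonDebugUses(const std::vector<ToyUse> &UseList,
                                  int Reg) {
  unsigned N = 0;
  for (const ToyUse &U : UseList)
    if (U.Reg == Reg && !U.IsDebug)
      ++N;
  return N;
}

// Returns an error message, or an empty string if the def is acceptable.
inline std::string
checkUnspillableTerminatorDef(const std::vector<ToyUse> &UseList,
                              int DefReg) {
  assert(DefReg >= 0 && "expected a (toy) virtual register number");
  if (countNonDebugUses(UseList, DefReg) > 1)
    return "Unspillable Terminator expected to have at most one use!";
  return "";
}
```

Ignoring debug uses matters here: without that filter, adding a DBG_VALUE of the loop counter would change whether the verifier accepts the function.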

Thanks. I think hasOneDBGUse would have probably worked in our use case, but checking use_instr_nodbg is a little more general.

@SjoerdMeijer : I've reviewed the non-target-specific changes now. And that part looks good to me now.

I don't know if the target-specific parts have already been reviewed (or if anything has changed lately). Can someone with ARM/Thumb2 knowledge acknowledge that part and set "accept revision"?

In D91358#2442675, @bjope wrote:

@SjoerdMeijer : I've reviewed the non-target-specific changes now. And that part looks good to me now.

I don't know if the target-specific parts have already been reviewed (or if anything has changed lately). Can someone with ARM/Thumb2 knowledge acknowledge that part and set "accept revision"?

Thanks for helping and reviewing!

I've commented on the ARM specific part, and that has been addressed. With your comments addressed too, this then LGTM.

This revision is now accepted and ready to land. Dec 9 2020, 7:34 AM

Thanks folks.

Closed by commit rG0447f3508f02: [ARM][RegAlloc] Add t2LoopEndDec (authored by dmgreen). Dec 10 2020, 4:14 AM

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rG0447f3508f02: [ARM][RegAlloc] Add t2LoopEndDec.

foad mentioned this in D110939: [PHIElimination] Update LiveVariables after handling an unspillable terminator. Oct 1 2021, 7:51 AM

foad mentioned this in rGf65458df32f7: [PHIElimination] Update LiveVariables after handling an unspillable terminator. Oct 5 2021, 6:26 AM

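Revision Contents

- llvm/include/llvm/CodeGen/TargetInstrInfo.h: 17 lines
- llvm/lib/CodeGen/CalcSpillWeights.cpp: 8 lines
- llvm/lib/CodeGen/MachineVerifier.cpp: 10 lines
- llvm/lib/CodeGen/PHIElimination.cpp: 13 lines
- llvm/lib/Target/ARM/ARMBaseInstrInfo.h: 4 lines
- llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp: 3 lines
- llvm/lib/Target/ARM/ARMInstrThumb2.td: 4 lines
- llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp: 74 lines
- llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp: 40 lines
- llvm/test/CodeGen/Thumb2/LowOverheadLoops/count_dominates_start.mir: 3 lines
- llvm/test/CodeGen/Thumb2/LowOverheadLoops/fast-fp-loops.ll: 2 lines
- llvm/test/CodeGen/Thumb2/LowOverheadLoops/minloop.ll: 94 lines
- llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-float-loops.ll: 3 lines
- llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-float32regloops.ll: 2 lines
- llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-postinc-dct.ll: 12 lines
- llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-postinc-lsr.ll: 2 lines
- llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-satmul-loops.ll: 269 lines
- llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-vldshuffle.ll: 3 lines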

Diff 310836

llvm/include/llvm/CodeGen/TargetInstrInfo.h

@@ (342 lines not shown) @@ public:
   /// subregisters has no single offset.
   ///
   /// Targets with nontrivial bigendian implementations may need to override
   /// this, particularly to support spilled vector registers.
   virtual bool getStackSlotRange(const TargetRegisterClass *RC, unsigned SubIdx,
                                  unsigned &Size, unsigned &Offset,
                                  const MachineFunction &MF) const;

+  /// Return true if the given instruction is terminator that is unspillable,
+  /// according to isUnspillableTerminatorImpl.
+  bool isUnspillableTerminator(const MachineInstr *MI) const {
+    return MI->isTerminator() && isUnspillableTerminatorImpl(MI);
+  }
+
   /// Returns the size in bytes of the specified MachineInstr, or ~0U
   /// when this function is not implemented by a target.
   virtual unsigned getInstSizeInBytes(const MachineInstr &MI) const {
     return ~0U;
   }

   /// Return true if the instruction is as cheap as a move instruction.
   ///
@@ (590 lines not shown) @@ protected:
   /// If the specific machine instruction is a instruction that moves/copies
   /// value from one register to another register return destination and source
   /// registers as machine operands.
   virtual Optional<DestSourcePair>
   isCopyInstrImpl(const MachineInstr &MI) const {
     return None;
   }

+  /// Return true if the given terminator MI is not expected to spill. This
+  /// sets the live interval as not spillable and adjusts phi node lowering to
+  /// not introduce copies after the terminator. Use with care, these are
+  /// currently used for hardware loop intrinsics in very controlled situations,
+  /// created prior to registry allocation in loops that only have single phi
+  /// users for the terminators value. They may run out of registers if not used
+  /// carefully.
+  virtual bool isUnspillableTerminatorImpl(const MachineInstr *MI) const {
+    return false;
+  }
+
 public:
   /// If the specific machine instruction is a instruction that moves/copies
   /// value from one register to another register return destination and source
   /// registers as machine operands.
   /// For COPY-instruction the method naturally returns destination and source
   /// registers as machine operands, for all other instructions the method calls
   /// target-dependent implementation.
   Optional<DestSourcePair> isCopyInstr(const MachineInstr &MI) const {
@@ (980 lines not shown) @@

llvm/lib/CodeGen/CalcSpillWeights.cpp

@@ (136 lines not shown) @@ float VirtRegAuxInfo::futureWeight(LiveInterval &LI, SlotIndex Start,
                                    SlotIndex End) {
   return weightCalcHelper(LI, &Start, &End);
 }

 float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
                                        SlotIndex *End) {
   MachineRegisterInfo &MRI = MF.getRegInfo();
   const TargetRegisterInfo &TRI = *MF.getSubtarget().getRegisterInfo();
+  const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
   MachineBasicBlock *MBB = nullptr;
   MachineLoop *Loop = nullptr;
   bool IsExiting = false;
   float TotalWeight = 0;
   unsigned NumInstr = 0; // Number of instructions using LI
   SmallPtrSet<MachineInstr *, 8> Visited;

   std::pair<Register, Register> TargetHint = MRI.getRegAllocationHint(LI.reg());
@@ (63 lines not shown) @@ if (IsLocalSplitArtifact && ((SI < *Start) || (SI > *End)))
       continue;

     NumInstr++;
     if (MI->isIdentityCopy() || MI->isImplicitDef())
       continue;
     if (!Visited.insert(MI).second)
       continue;

+    // For terminators that produce values, ask the backend if the register is
+    // not spillable.
+    if (TII.isUnspillableTerminator(MI) && MI->definesRegister(LI.reg())) {
+      LI.markNotSpillable();
+      return -1.0f;
+    }
+
     float Weight = 1.0f;
     if (IsSpillable) {
       // Get loop info for mi.
       if (MI->getParent() != MBB) {
         MBB = MI->getParent();
         Loop = Loops.getLoopFor(MBB);
         IsExiting = Loop ? Loop->isLoopExiting(MBB) : false;
       }
@@ (71 lines not shown) @@

llvm/lib/CodeGen/MachineVerifier.cpp

@@ (1,544 lines not shown) @@ if (FirstNonPHI)
       report("Found PHI instruction after non-PHI", MI);
   } else if (FirstNonPHI == nullptr)
     FirstNonPHI = MI;

   // Check the tied operands.
   if (MI->isInlineAsm())
     verifyInlineAsm(MI);

+  // Check that unspillable terminators define a reg and have at most one use.
+  if (TII->isUnspillableTerminator(MI)) {
+    if (!MI->getOperand(0).isReg() || !MI->getOperand(0).isDef())
+      report("Unspillable Terminator does not define a reg", MI);
+    Register Def = MI->getOperand(0).getReg();
+    if (Def.isVirtual() &&
+        std::distance(MRI->use_nodbg_begin(Def), MRI->use_nodbg_end()) > 1)
+      report("Unspillable Terminator expected to have at most one use!", MI);
+  }
+
   // A fully-formed DBG_VALUE must have a location. Ignore partially formed
   // DBG_VALUEs: these are convenient to use in tests, but should never get
   // generated.
   if (MI->isDebugValue() && MI->getNumOperands() == 4)
     if (!MI->getDebugLoc())
       report("Missing DebugLoc for debug instruction", MI);

   // Meta instructions should never be the subject of debug value tracking,
@@ (1,519 lines not shown) @@

llvm/lib/CodeGen/PHIElimination.cpp

@@ (436 lines not shown) @@ for (int i = NumSrcs - 1; i >= 0; --i) {
     MachineBasicBlock &opBlock = *MPhi->getOperand(i*2+2).getMBB();

     // Check to make sure we haven't already emitted the copy for this block.
     // This can happen because PHI nodes may have multiple entries for the same
     // basic block.
     if (!MBBsInsertedInto.insert(&opBlock).second)
       continue; // If the copy has already been emitted, we're done.

+    MachineInstr *SrcRegDef = MRI->getVRegDef(SrcReg);
+    if (SrcRegDef && TII->isUnspillableTerminator(SrcRegDef)) {
+      assert(SrcRegDef->getOperand(0).isReg() &&
+             SrcRegDef->getOperand(0).isDef() &&
+             "Expected operand 0 to be a reg def!");
+      // Now that the PHI's use has been removed (as the instruction was
+      // removed) there should be no other uses of the SrcReg.
+      assert(MRI->use_empty(SrcReg) &&
+             "Expected a single use from UnspillableTerminator");
+      SrcRegDef->getOperand(0).setReg(IncomingReg);
+      continue;
+    }
+
     // Find a safe location to insert the copy, this may be the first terminator
     // in the block (or end()).
     MachineBasicBlock::iterator InsertPos =
       findPHICopyInsertPoint(&opBlock, &MBB, SrcReg);

     // Insert the copy.
     MachineInstr *NewSrcInstr = nullptr;
     if (!reusedIncoming && IncomingReg) {
@@ (270 lines not shown) @@

llvm/lib/Target/ARM/ARMBaseInstrInfo.h

@@ (354 lines not shown) @@ public:
   MachineBasicBlock::iterator
   insertOutlinedCall(Module &M, MachineBasicBlock &MBB,
                      MachineBasicBlock::iterator &It, MachineFunction &MF,
                      const outliner::Candidate &C) const override;

   /// Enable outlining by default at -Oz.
   bool shouldOutlineFromFunctionByDefault(MachineFunction &MF) const override;

+  bool isUnspillableTerminatorImpl(const MachineInstr *MI) const override {
+    return MI->getOpcode() == ARM::t2LoopEndDec;
+  }
+
 private:
   /// Returns an unused general-purpose register which can be used for
   /// constructing an outlined call if one exists. Returns 0 otherwise.
   unsigned findRegisterToSaveLRTo(const outliner::Candidate &C) const;

   // Adds an instruction which saves the link register on top of the stack into
   /// the MachineBasicBlock \p MBB at position \p It.
   void saveLROnStack(MachineBasicBlock &MBB,
@@ (480 lines not shown) @@

llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp

@@ (5,945 lines not shown) @@ if (Opc == ARM::tPICADD || Opc == ARM::PICADD || Opc == ARM::PICSTR ||
       Opc == ARM::PICLDRSH || Opc == ARM::t2LDRpci_pic ||
       Opc == ARM::t2MOVi16_ga_pcrel || Opc == ARM::t2MOVTi16_ga_pcrel ||
       Opc == ARM::t2MOV_ga_pcrel)
     return outliner::InstrType::Illegal;

   // Be conservative with ARMv8.1 MVE instructions.
   if (Opc == ARM::t2BF_LabelPseudo || Opc == ARM::t2DoLoopStart ||
       Opc == ARM::t2DoLoopStartTP || Opc == ARM::t2WhileLoopStart ||
-      Opc == ARM::t2LoopDec || Opc == ARM::t2LoopEnd)
+      Opc == ARM::t2LoopDec || Opc == ARM::t2LoopEnd ||
+      Opc == ARM::t2LoopEndDec)
     return outliner::InstrType::Illegal;

   const MCInstrDesc &MCID = MI.getDesc();
   uint64_t MIFlags = MCID.TSFlags;
   if ((MIFlags & ARMII::DomainMask) == ARMII::DomainMVE)
     return outliner::InstrType::Illegal;

   // Is this a terminator for a basic block?
@@ (340 lines not shown) @@

llvm/lib/Target/ARM/ARMInstrThumb2.td

Show First 20 Lines • Show All 5,442 Lines • ▼ Show 20 Lines	t2PseudoInst<(outs),
(ins rGPR:$elts, brtarget:$target),		(ins rGPR:$elts, brtarget:$target),
8, IIC_Br, []>,		8, IIC_Br, []>,
Sched<[WriteBr]>;		Sched<[WriteBr]>;

def t2LoopEnd :		def t2LoopEnd :
t2PseudoInst<(outs), (ins GPRlr:$elts, brtarget:$target),		t2PseudoInst<(outs), (ins GPRlr:$elts, brtarget:$target),
8, IIC_Br, []>, Sched<[WriteBr]>;		8, IIC_Br, []>, Sched<[WriteBr]>;

		def t2LoopEndDec :
		SjoerdMeijerUnsubmitted Not Done Reply Inline Actions bikeshedding names: how about `t2LoopDecEnd` to more keep the semantics this is a decrement + end, in that order? SjoerdMeijer: bikeshedding names: how about `t2LoopDecEnd` to more keep the semantics this is a decrement +…
		dmgreenAuthorUnsubmitted Done Reply Inline Actions Yeah, It does both together. I have no strong opinion on the order in the name. I guess I chose t2LoopEndDec as I felt that the major thing it did was behave as a loop end (like an LE instruction). The Dec somewhat of a side effect of that. Can change it if you think it's better the other way. Let me know. dmgreen: Yeah, It does both together. I have no strong opinion on the order in the name. I guess I chose…
		SjoerdMeijerUnsubmitted Not Done Reply Inline Actions No strong opinion on this, I was bikeshedding anyway. SjoerdMeijer: No strong opinion on this, I was bikeshedding anyway.
		t2PseudoInst<(outs GPRlr:$Rm), (ins GPRlr:$elts, brtarget:$target),
		8, IIC_Br, []>, Sched<[WriteBr]>;

} // end isBranch, isTerminator, hasSideEffects		} // end isBranch, isTerminator, hasSideEffects

}		}

} // end isNotDuplicable		} // end isNotDuplicable

class CS<string iname, bits<4> opcode, list<dag> pattern=[]>		class CS<string iname, bits<4> opcode, list<dag> pattern=[]>
: V8_1MI<(outs rGPR:$Rd), (ins GPRwithZRnosp:$Rn, GPRwithZRnosp:$Rm, pred_noal:$fcond),		: V8_1MI<(outs rGPR:$Rd), (ins GPRwithZRnosp:$Rn, GPRwithZRnosp:$Rm, pred_noal:$fcond),

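As a rough illustration of what merging buys, the hypothetical C++ model below (not LLVM code) treats LR as a plain counter. It assumes, following the `t2SUBri` + `bne` sequence that `RevertLoopEndDec` emits in ARMLowOverheadLoops.cpp, that t2LoopEndDec decrements LR by one and takes the back edge while the result is non-zero, and that a separate t2LoopDec followed by t2LoopEnd does the same in two steps. `LoopState`, `loopDec`, `loopEnd`, `loopEndDec` and `countIterations` are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the LR-based loop pseudos; not LLVM code.
struct LoopState {
  uint32_t lr;  // models the loop counter held in LR
};

// Models t2LoopDec: lr = lr - amount.
inline void loopDec(LoopState &s, uint32_t amount) { s.lr -= amount; }

// Models t2LoopEnd: take the back edge while the counter is non-zero.
inline bool loopEnd(const LoopState &s) { return s.lr != 0; }

// Models t2LoopEndDec: both steps in one terminator, like the
// "subs lr, lr, #1; bne target" sequence used when reverting.
inline bool loopEndDec(LoopState &s) {
  s.lr -= 1;
  return s.lr != 0;
}

// Counts how many times the body runs for a start count greater than zero,
// using either the separate Dec/End pair or the merged form.
inline unsigned countIterations(uint32_t start, bool merged) {
  LoopState s{start};
  unsigned iterations = 0;
  bool again = true;
  while (again) {
    ++iterations;  // the loop body executes once
    if (merged) {
      again = loopEndDec(s);
    } else {
      loopDec(s, 1);
      again = loopEnd(s);
    }
  }
  return iterations;
}
```

Both forms run the body the same number of times; per the patch summary, the benefit of the merged pseudo is that there is no longer a gap between the decrement and the branch in which the register allocator could spill LR.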
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp

private:

void RevertWhile(MachineInstr *MI) const;		void RevertWhile(MachineInstr *MI) const;
void RevertDo(MachineInstr *MI) const;		void RevertDo(MachineInstr *MI) const;

bool RevertLoopDec(MachineInstr *MI) const;		bool RevertLoopDec(MachineInstr *MI) const;

void RevertLoopEnd(MachineInstr *MI, bool SkipCmp = false) const;		void RevertLoopEnd(MachineInstr *MI, bool SkipCmp = false) const;

		void RevertLoopEndDec(MachineInstr *MI) const;

void ConvertVPTBlocks(LowOverheadLoop &LoLoop);		void ConvertVPTBlocks(LowOverheadLoop &LoLoop);

MachineInstr *ExpandLoopStart(LowOverheadLoop &LoLoop);		MachineInstr *ExpandLoopStart(LowOverheadLoop &LoLoop);

void Expand(LowOverheadLoop &LoLoop);		void Expand(LowOverheadLoop &LoLoop);

void IterationCountDCE(LowOverheadLoop &LoLoop);		void IterationCountDCE(LowOverheadLoop &LoLoop);
};		};
void LowOverheadLoop::Validate(ARMBasicBlockUtils *BBUtils) {		void LowOverheadLoop::Validate(ARMBasicBlockUtils *BBUtils) {
if (Revert)		if (Revert)
return;		return;

// Check branch target ranges: WLS[TP] can only branch forwards and LE[TP]		// Check branch target ranges: WLS[TP] can only branch forwards and LE[TP]
// can only jump back.		// can only jump back.
auto ValidateRanges = [](MachineInstr *Start, MachineInstr *End,		auto ValidateRanges = [](MachineInstr *Start, MachineInstr *End,
ARMBasicBlockUtils *BBUtils, MachineLoop &ML) {		ARMBasicBlockUtils *BBUtils, MachineLoop &ML) {
assert(End->getOperand(1).isMBB() &&		MachineBasicBlock *TgtBB = End->getOpcode() == ARM::t2LoopEnd
"Expected LoopEnd to target basic block!");		? End->getOperand(1).getMBB()
		: End->getOperand(2).getMBB();
// TODO Maybe there's cases where the target doesn't have to be the header,		// TODO Maybe there's cases where the target doesn't have to be the header,
// but for now be safe and revert.		// but for now be safe and revert.
if (End->getOperand(1).getMBB() != ML.getHeader()) {		if (TgtBB != ML.getHeader()) {
LLVM_DEBUG(dbgs() << "ARM Loops: LoopEnd is not targeting header.\n");		LLVM_DEBUG(dbgs() << "ARM Loops: LoopEnd is not targeting header.\n");
return false;		return false;
}		}

// The WLS and LE instructions have 12-bits for the label offset. WLS		// The WLS and LE instructions have 12-bits for the label offset. WLS
// requires a positive offset, while LE uses negative.		// requires a positive offset, while LE uses negative.
if (BBUtils->getOffsetOf(End) < BBUtils->getOffsetOf(ML.getHeader()) ||		if (BBUtils->getOffsetOf(End) < BBUtils->getOffsetOf(ML.getHeader()) ||
!BBUtils->isBBInRange(End, ML.getHeader(), 4094)) {		!BBUtils->isBBInRange(End, ML.getHeader(), 4094)) {
bool ARMLowOverheadLoops::ProcessLoop(MachineLoop *ML) {
for (auto *MBB : reverse(ML->getBlocks())) {		for (auto *MBB : reverse(ML->getBlocks())) {
for (auto &MI : *MBB) {		for (auto &MI : *MBB) {
if (MI.isDebugValue())		if (MI.isDebugValue())
continue;		continue;
else if (MI.getOpcode() == ARM::t2LoopDec)		else if (MI.getOpcode() == ARM::t2LoopDec)
LoLoop.Dec = &MI;		LoLoop.Dec = &MI;
else if (MI.getOpcode() == ARM::t2LoopEnd)		else if (MI.getOpcode() == ARM::t2LoopEnd)
LoLoop.End = &MI;		LoLoop.End = &MI;
		else if (MI.getOpcode() == ARM::t2LoopEndDec)
		LoLoop.End = LoLoop.Dec = &MI;
else if (isLoopStart(MI))		else if (isLoopStart(MI))
LoLoop.Start = &MI;		LoLoop.Start = &MI;
else if (MI.getDesc().isCall()) {		else if (MI.getDesc().isCall()) {
// TODO: Though the call will require LE to execute again, does this		// TODO: Though the call will require LE to execute again, does this
// mean we should revert? Always executing LE hopefully should be		// mean we should revert? Always executing LE hopefully should be
// faster than performing a sub,cmp,br or even subs,br.		// faster than performing a sub,cmp,br or even subs,br.
LoLoop.Revert = true;		LoLoop.Revert = true;
LLVM_DEBUG(dbgs() << "ARM Loops: Found call.\n");		LLVM_DEBUG(dbgs() << "ARM Loops: Found call.\n");
} else {		} else {
// Record VPR defs and build up their corresponding vpt blocks.		// Record VPR defs and build up their corresponding vpt blocks.
// Check we know how to tail predicate any mve instructions.		// Check we know how to tail predicate any mve instructions.
LoLoop.AnalyseMVEInst(&MI);		LoLoop.AnalyseMVEInst(&MI);
}		}
}		}
}		}

LLVM_DEBUG(LoLoop.dump());		LLVM_DEBUG(LoLoop.dump());
if (!LoLoop.FoundAllComponents()) {		if (!LoLoop.FoundAllComponents()) {
LLVM_DEBUG(dbgs() << "ARM Loops: Didn't find loop start, update, end\n");		LLVM_DEBUG(dbgs() << "ARM Loops: Didn't find loop start, update, end\n");
return false;		return false;
}		}

// Check that the only instruction using LoopDec is LoopEnd.		// Check that the only instruction using LoopDec is LoopEnd. This can only
		// happen when the Dec and End are separate, not a single t2LoopEndDec.
// TODO: Check for copy chains that really have no effect.		// TODO: Check for copy chains that really have no effect.
		if (LoLoop.Dec != LoLoop.End) {
SmallPtrSet<MachineInstr*, 2> Uses;		SmallPtrSet<MachineInstr *, 2> Uses;
RDA->getReachingLocalUses(LoLoop.Dec, MCRegister::from(ARM::LR), Uses);		RDA->getReachingLocalUses(LoLoop.Dec, MCRegister::from(ARM::LR), Uses);
if (Uses.size() > 1 || !Uses.count(LoLoop.End)) {		if (Uses.size() > 1 || !Uses.count(LoLoop.End)) {
LLVM_DEBUG(dbgs() << "ARM Loops: Unable to remove LoopDec.\n");		LLVM_DEBUG(dbgs() << "ARM Loops: Unable to remove LoopDec.\n");
LoLoop.Revert = true;		LoLoop.Revert = true;
}		}
		SjoerdMeijer: It wasn't immediately clear to me why we need this now.
		dmgreen: I updated the comment.
		}
LoLoop.Validate(BBUtils.get());		LoLoop.Validate(BBUtils.get());
Expand(LoLoop);		Expand(LoLoop);
return true;		return true;
}		}

// WhileLoopStart holds the exit block, so produce a cmp lr, 0 and then a		// WhileLoopStart holds the exit block, so produce a cmp lr, 0 and then a
// beq that branches to the exit branch.		// beq that branches to the exit branch.
// TODO: We could also try to generate a cbz if the value in LR is also in		// TODO: We could also try to generate a cbz if the value in LR is also in
void ARMLowOverheadLoops::RevertLoopEnd(MachineInstr *MI, bool SkipCmp) const {

MachineBasicBlock *DestBB = MI->getOperand(1).getMBB();		MachineBasicBlock *DestBB = MI->getOperand(1).getMBB();
unsigned BrOpc = BBUtils->isBBInRange(MI, DestBB, 254) ?		unsigned BrOpc = BBUtils->isBBInRange(MI, DestBB, 254) ?
ARM::tBcc : ARM::t2Bcc;		ARM::tBcc : ARM::t2Bcc;

llvm::RevertLoopEnd(MI, TII, BrOpc, SkipCmp);		llvm::RevertLoopEnd(MI, TII, BrOpc, SkipCmp);
}		}

		// Generate a subs, or sub and cmp, and a branch instead of an LE.
		void ARMLowOverheadLoops::RevertLoopEndDec(MachineInstr *MI) const {
		LLVM_DEBUG(dbgs() << "ARM Loops: Reverting to subs, br: " << *MI);
		SjoerdMeijer: nit: perhaps an assert that MI is a loop end?
		assert(MI->getOpcode() == ARM::t2LoopEndDec && "Expected a t2LoopEndDec!");
		MachineBasicBlock *MBB = MI->getParent();

		MachineInstrBuilder MIB =
		BuildMI(*MBB, MI, MI->getDebugLoc(), TII->get(ARM::t2SUBri));
		MIB.addDef(ARM::LR);
		MIB.add(MI->getOperand(1));
		MIB.addImm(1);
		MIB.addImm(ARMCC::AL);
		MIB.addReg(ARM::NoRegister);
		mtrofin: Nit: addReg(MCRegister::NoRegister) is probably more readable.
		MIB.addReg(ARM::CPSR);
		MIB->getOperand(5).setIsDef(true);

		MachineBasicBlock *DestBB = MI->getOperand(2).getMBB();
		unsigned BrOpc =
		BBUtils->isBBInRange(MI, DestBB, 254) ? ARM::tBcc : ARM::t2Bcc;

		// Create bne
		MIB = BuildMI(*MBB, MI, MI->getDebugLoc(), TII->get(BrOpc));
		MIB.add(MI->getOperand(2)); // branch target
		MIB.addImm(ARMCC::NE); // condition code
		MIB.addReg(ARM::CPSR);

		MI->eraseFromParent();
		}

// Perform dead code elimation on the loop iteration count setup expression.		// Perform dead code elimation on the loop iteration count setup expression.
// If we are tail-predicating, the number of elements to be processed is the		// If we are tail-predicating, the number of elements to be processed is the
// operand of the VCTP instruction in the vector body, see getCount(), which is		// operand of the VCTP instruction in the vector body, see getCount(), which is
// register $r3 in this example:		// register $r3 in this example:
//		//
// $lr = big-itercount-expression		// $lr = big-itercount-expression
// ..		// ..
// $lr = t2DoLoopStart renamable $lr		// $lr = t2DoLoopStart renamable $lr
void ARMLowOverheadLoops::Expand(LowOverheadLoop &LoLoop) {
auto ExpandLoopEnd = [this](LowOverheadLoop &LoLoop) {		auto ExpandLoopEnd = [this](LowOverheadLoop &LoLoop) {
MachineInstr *End = LoLoop.End;		MachineInstr *End = LoLoop.End;
MachineBasicBlock *MBB = End->getParent();		MachineBasicBlock *MBB = End->getParent();
unsigned Opc = LoLoop.IsTailPredicationLegal() ?		unsigned Opc = LoLoop.IsTailPredicationLegal() ?
ARM::MVE_LETP : ARM::t2LEUpdate;		ARM::MVE_LETP : ARM::t2LEUpdate;
MachineInstrBuilder MIB = BuildMI(*MBB, End, End->getDebugLoc(),		MachineInstrBuilder MIB = BuildMI(*MBB, End, End->getDebugLoc(),
TII->get(Opc));		TII->get(Opc));
MIB.addDef(ARM::LR);		MIB.addDef(ARM::LR);
MIB.add(End->getOperand(0));		unsigned Off = LoLoop.Dec == LoLoop.End ? 1 : 0;
MIB.add(End->getOperand(1));		MIB.add(End->getOperand(Off + 0));
		MIB.add(End->getOperand(Off + 1));
LLVM_DEBUG(dbgs() << "ARM Loops: Inserted LE: " << *MIB);		LLVM_DEBUG(dbgs() << "ARM Loops: Inserted LE: " << *MIB);
LoLoop.ToRemove.insert(LoLoop.Dec);		LoLoop.ToRemove.insert(LoLoop.Dec);
LoLoop.ToRemove.insert(End);		LoLoop.ToRemove.insert(End);
return &*MIB;		return &*MIB;
};		};

// TODO: We should be able to automatically remove these branches before we		// TODO: We should be able to automatically remove these branches before we
// get here - probably by teaching analyzeBranch about the pseudo		// get here - probably by teaching analyzeBranch about the pseudo
auto RemoveDeadBranch = [](MachineInstr *I) {
}		}
};		};

if (LoLoop.Revert) {		if (LoLoop.Revert) {
if (LoLoop.Start->getOpcode() == ARM::t2WhileLoopStart)		if (LoLoop.Start->getOpcode() == ARM::t2WhileLoopStart)
RevertWhile(LoLoop.Start);		RevertWhile(LoLoop.Start);
else		else
RevertDo(LoLoop.Start);		RevertDo(LoLoop.Start);
bool FlagsAlreadySet = RevertLoopDec(LoLoop.Dec);		if (LoLoop.Dec == LoLoop.End)
RevertLoopEnd(LoLoop.End, FlagsAlreadySet);		RevertLoopEndDec(LoLoop.End);
		else
		RevertLoopEnd(LoLoop.End, RevertLoopDec(LoLoop.Dec));
} else {		} else {
		SjoerdMeijer: nit: how about `RevertLoopEnd(LoLoop.End, RevertLoopDec(LoLoop.Dec));` to get rid of the curly brackets around the if-else here?
LoLoop.Start = ExpandLoopStart(LoLoop);		LoLoop.Start = ExpandLoopStart(LoLoop);
RemoveDeadBranch(LoLoop.Start);		RemoveDeadBranch(LoLoop.Start);
LoLoop.End = ExpandLoopEnd(LoLoop);		LoLoop.End = ExpandLoopEnd(LoLoop);
RemoveDeadBranch(LoLoop.End);		RemoveDeadBranch(LoLoop.End);
if (LoLoop.IsTailPredicationLegal())		if (LoLoop.IsTailPredicationLegal())
ConvertVPTBlocks(LoLoop);		ConvertVPTBlocks(LoLoop);
for (auto *I : LoLoop.ToRemove) {		for (auto *I : LoLoop.ToRemove) {
LLVM_DEBUG(dbgs() << "ARM Loops: Erasing " << *I);		LLVM_DEBUG(dbgs() << "ARM Loops: Erasing " << *I);
bool ARMLowOverheadLoops::RevertNonLoops() {		bool ARMLowOverheadLoops::RevertNonLoops() {
LLVM_DEBUG(dbgs() << "ARM Loops: Reverting any remaining pseudos...\n");		LLVM_DEBUG(dbgs() << "ARM Loops: Reverting any remaining pseudos...\n");
bool Changed = false;		bool Changed = false;

for (auto &MBB : *MF) {		for (auto &MBB : *MF) {
SmallVector<MachineInstr*, 4> Starts;		SmallVector<MachineInstr*, 4> Starts;
SmallVector<MachineInstr*, 4> Decs;		SmallVector<MachineInstr*, 4> Decs;
SmallVector<MachineInstr*, 4> Ends;		SmallVector<MachineInstr*, 4> Ends;
		SmallVector<MachineInstr *, 4> EndDecs;

for (auto &I : MBB) {		for (auto &I : MBB) {
if (isLoopStart(I))		if (isLoopStart(I))
Starts.push_back(&I);		Starts.push_back(&I);
else if (I.getOpcode() == ARM::t2LoopDec)		else if (I.getOpcode() == ARM::t2LoopDec)
Decs.push_back(&I);		Decs.push_back(&I);
else if (I.getOpcode() == ARM::t2LoopEnd)		else if (I.getOpcode() == ARM::t2LoopEnd)
Ends.push_back(&I);		Ends.push_back(&I);
		else if (I.getOpcode() == ARM::t2LoopEndDec)
		EndDecs.push_back(&I);
}		}

if (Starts.empty() && Decs.empty() && Ends.empty())		if (Starts.empty() && Decs.empty() && Ends.empty() && EndDecs.empty())
continue;		continue;

Changed = true;		Changed = true;

for (auto *Start : Starts) {		for (auto *Start : Starts) {
if (Start->getOpcode() == ARM::t2WhileLoopStart)		if (Start->getOpcode() == ARM::t2WhileLoopStart)
RevertWhile(Start);		RevertWhile(Start);
else		else
RevertDo(Start);		RevertDo(Start);
}		}
for (auto *Dec : Decs)		for (auto *Dec : Decs)
RevertLoopDec(Dec);		RevertLoopDec(Dec);

for (auto *End : Ends)		for (auto *End : Ends)
RevertLoopEnd(End);		RevertLoopEnd(End);
		for (auto *End : EndDecs)
		RevertLoopEndDec(End);
}		}
return Changed;		return Changed;
}		}

FunctionPass *llvm::createARMLowOverheadLoopsPass() {		FunctionPass *llvm::createARMLowOverheadLoopsPass() {
return new ARMLowOverheadLoops();		return new ARMLowOverheadLoops();
}		}

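The Validate, ExpandLoopEnd and RevertLoopEndDec hunks above index operands differently for the two opcodes because t2LoopEndDec carries an extra def. Going by the .td definitions in this patch, t2LoopEnd is `(ins GPRlr:$elts, brtarget:$target)` while t2LoopEndDec is `(outs GPRlr:$Rm), (ins GPRlr:$elts, brtarget:$target)`, and machine operands list defs before uses. The standalone sketch below captures that index arithmetic; the enum and helper names are hypothetical and do not exist in LLVM.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two opcodes; not the real ARM:: enums.
enum class LoopEndKind { LoopEnd, LoopEndDec };

// Operand order, following the .td definitions in this patch:
//   t2LoopEnd    : $elts, $target
//   t2LoopEndDec : $Rm (def), $elts, $target
// The def comes first, so every input of the merged form shifts by one
// (this is the "Off" used in ExpandLoopEnd).
inline unsigned inputOffset(LoopEndKind k) {
  return k == LoopEndKind::LoopEndDec ? 1u : 0u;
}

// Index of the LR input (the count being tested or decremented).
inline unsigned countOperandIdx(LoopEndKind k) { return inputOffset(k) + 0; }

// Index of the branch target basic block (what Validate reads as TgtBB).
inline unsigned targetOperandIdx(LoopEndKind k) { return inputOffset(k) + 1; }
```

Centralising the offset like this is one way to avoid scattering `getOperand(1)` versus `getOperand(2)` choices across several functions; the patch itself inlines the checks instead.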
llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp

#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include <cassert>		#include <cassert>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "arm-mve-vpt-opts"		#define DEBUG_TYPE "arm-mve-vpt-opts"

		static cl::opt<bool>
		MergeEndDec("arm-enable-merge-loopenddec", cl::Hidden,
		cl::desc("Enable merging Loop End and Dec instructions."),
		cl::init(true));

namespace {		namespace {
class MVEVPTOptimisations : public MachineFunctionPass {		class MVEVPTOptimisations : public MachineFunctionPass {
public:		public:
static char ID;		static char ID;
const Thumb2InstrInfo *TII;		const Thumb2InstrInfo *TII;
MachineRegisterInfo *MRI;		MachineRegisterInfo *MRI;

MVEVPTOptimisations() : MachineFunctionPass(ID) {}		MVEVPTOptimisations() : MachineFunctionPass(ID) {}
static bool findLoopComponents(MachineLoop *ML, MachineRegisterInfo *MRI,

// Find the loop end from the terminators.		// Find the loop end from the terminators.
LoopEnd = nullptr;		LoopEnd = nullptr;
for (auto &T : Latch->terminators()) {		for (auto &T : Latch->terminators()) {
if (T.getOpcode() == ARM::t2LoopEnd && T.getOperand(1).getMBB() == Header) {		if (T.getOpcode() == ARM::t2LoopEnd && T.getOperand(1).getMBB() == Header) {
LoopEnd = &T;		LoopEnd = &T;
break;		break;
}		}
		if (T.getOpcode() == ARM::t2LoopEndDec &&
		T.getOperand(2).getMBB() == Header) {
		LoopEnd = &T;
		break;
		}
}		}
if (!LoopEnd) {		if (!LoopEnd) {
LLVM_DEBUG(dbgs() << " no LoopEnd\n");		LLVM_DEBUG(dbgs() << " no LoopEnd\n");
return false;		return false;
}		}
LLVM_DEBUG(dbgs() << " found loop end: " << *LoopEnd);		LLVM_DEBUG(dbgs() << " found loop end: " << *LoopEnd);

// Find the dec from the use of the end. There may be copies between		// Find the dec from the use of the end. There may be copies between
// instructions. We expect the loop to loop like:		// instructions. We expect the loop to loop like:
// $vs = t2DoLoopStart ...		// $vs = t2DoLoopStart ...
// loop:		// loop:
// $vp = phi [ $vs ], [ $vd ]		// $vp = phi [ $vs ], [ $vd ]
// ...		// ...
// $vd = t2LoopDec $vp		// $vd = t2LoopDec $vp
// ...		// ...
// t2LoopEnd $vd, loop		// t2LoopEnd $vd, loop
		if (LoopEnd->getOpcode() == ARM::t2LoopEndDec)
		LoopDec = LoopEnd;
		else {
LoopDec =		LoopDec =
LookThroughCOPY(MRI->getVRegDef(LoopEnd->getOperand(0).getReg()), MRI);		LookThroughCOPY(MRI->getVRegDef(LoopEnd->getOperand(0).getReg()), MRI);
if (!LoopDec || LoopDec->getOpcode() != ARM::t2LoopDec) {		if (!LoopDec || LoopDec->getOpcode() != ARM::t2LoopDec) {
LLVM_DEBUG(dbgs() << " didn't find LoopDec where we expected!\n");		LLVM_DEBUG(dbgs() << " didn't find LoopDec where we expected!\n");
return false;		return false;
}		}
		}
LLVM_DEBUG(dbgs() << " found loop dec: " << *LoopDec);		LLVM_DEBUG(dbgs() << " found loop dec: " << *LoopDec);

LoopPhi =		LoopPhi =
LookThroughCOPY(MRI->getVRegDef(LoopDec->getOperand(1).getReg()), MRI);		LookThroughCOPY(MRI->getVRegDef(LoopDec->getOperand(1).getReg()), MRI);
if (!LoopPhi || LoopPhi->getOpcode() != TargetOpcode::PHI ||		if (!LoopPhi || LoopPhi->getOpcode() != TargetOpcode::PHI ||
LoopPhi->getNumOperands() != 5 ||		LoopPhi->getNumOperands() != 5 ||
(LoopPhi->getOperand(2).getMBB() != Latch &&		(LoopPhi->getOperand(2).getMBB() != Latch &&
LoopPhi->getOperand(4).getMBB() != Latch)) {		LoopPhi->getOperand(4).getMBB() != Latch)) {
// This function converts loops with t2LoopEnd and t2LoopEnd instructions into		// This function converts loops with t2LoopEnd and t2LoopEnd instructions into
// a single t2LoopEndDec instruction. To do that it needs to make sure that LR		// a single t2LoopEndDec instruction. To do that it needs to make sure that LR
// will be valid to be used for the low overhead loop, which means nothing else		// will be valid to be used for the low overhead loop, which means nothing else
// is using LR (especially calls) and there are no superfluous copies in the		// is using LR (especially calls) and there are no superfluous copies in the
// loop. The t2LoopEndDec is a branching terminator that produces a value (the		// loop. The t2LoopEndDec is a branching terminator that produces a value (the
// decrement) around the loop edge, which means we need to be careful that they		// decrement) around the loop edge, which means we need to be careful that they
// will be valid to allocate without any spilling.		// will be valid to allocate without any spilling.
bool MVEVPTOptimisations::MergeLoopEnd(MachineLoop *ML) {		bool MVEVPTOptimisations::MergeLoopEnd(MachineLoop *ML) {
		if (!MergeEndDec)
		return false;

LLVM_DEBUG(dbgs() << "MergeLoopEnd on loop " << ML->getHeader()->getName()		LLVM_DEBUG(dbgs() << "MergeLoopEnd on loop " << ML->getHeader()->getName()
<< "\n");		<< "\n");

MachineInstr *LoopEnd, *LoopPhi, *LoopStart, *LoopDec;		MachineInstr *LoopEnd, *LoopPhi, *LoopStart, *LoopDec;
if (!findLoopComponents(ML, MRI, LoopStart, LoopPhi, LoopDec, LoopEnd))		if (!findLoopComponents(ML, MRI, LoopStart, LoopPhi, LoopDec, LoopEnd))
return false;		return false;

// Check if there is an illegal instruction (a call) in the low overhead loop		// Check if there is an illegal instruction (a call) in the low overhead loop
bool MVEVPTOptimisations::MergeLoopEnd(MachineLoop *ML) {
if (LoopPhi->getOperand(2).getMBB() == ML->getLoopLatch()) {		if (LoopPhi->getOperand(2).getMBB() == ML->getLoopLatch()) {
LoopPhi->getOperand(3).setReg(StartReg);		LoopPhi->getOperand(3).setReg(StartReg);
LoopPhi->getOperand(1).setReg(DecReg);		LoopPhi->getOperand(1).setReg(DecReg);
} else {		} else {
LoopPhi->getOperand(1).setReg(StartReg);		LoopPhi->getOperand(1).setReg(StartReg);
LoopPhi->getOperand(3).setReg(DecReg);		LoopPhi->getOperand(3).setReg(DecReg);
}		}

LoopDec->getOperand(1).setReg(PhiReg);		// Replace the loop dec and loop end as a single instruction.
LoopEnd->getOperand(0).setReg(DecReg);		MachineInstrBuilder MI =
		BuildMI(LoopEnd->getParent(), LoopEnd, LoopEnd->getDebugLoc(),
		TII->get(ARM::t2LoopEndDec), DecReg)
		.addReg(PhiReg)
		.add(LoopEnd->getOperand(1));
		LLVM_DEBUG(dbgs() << "Merged LoopDec and End into: " << *MI.getInstr());

		LoopDec->eraseFromParent();
		LoopEnd->eraseFromParent();
for (auto *MI : Copies)		for (auto *MI : Copies)
MI->eraseFromParent();		MI->eraseFromParent();
return true;		return true;
}		}

// Convert t2DoLoopStart to t2DoLoopStartTP if the loop contains VCTP		// Convert t2DoLoopStart to t2DoLoopStartTP if the loop contains VCTP
// instructions. This keeps the VCTP count reg operand on the t2DoLoopStartTP		// instructions. This keeps the VCTP count reg operand on the t2DoLoopStartTP
// instruction, making the backend ARMLowOverheadLoops passes job of finding the		// instruction, making the backend ARMLowOverheadLoops passes job of finding the
// VCTP operand much simpler.		// VCTP operand much simpler.
bool MVEVPTOptimisations::ConvertTailPredLoop(MachineLoop *ML,		bool MVEVPTOptimisations::ConvertTailPredLoop(MachineLoop *ML,
MachineDominatorTree *DT) {		MachineDominatorTree *DT) {
LLVM_DEBUG(dbgs() << "ConvertTailPredLoop on loop "		LLVM_DEBUG(dbgs() << "ConvertTailPredLoop on loop "
<< ML->getHeader()->getName() << "\n");		<< ML->getHeader()->getName() << "\n");

// Find some loop components including the LoopEnd/Dec/Start, and any VCTP's		// Find some loop components including the LoopEnd/Dec/Start, and any VCTP's
// in the loop.		// in the loop.
MachineInstr *LoopEnd, *LoopPhi, *LoopStart, *LoopDec;		MachineInstr *LoopEnd, *LoopPhi, *LoopStart, *LoopDec;
if (!findLoopComponents(ML, MRI, LoopStart, LoopPhi, LoopDec, LoopEnd))		if (!findLoopComponents(ML, MRI, LoopStart, LoopPhi, LoopDec, LoopEnd))
return false;		return false;
		if (LoopDec != LoopEnd)
		return false;

SmallVector<MachineInstr *, 4> VCTPs;		SmallVector<MachineInstr *, 4> VCTPs;
for (MachineBasicBlock *BB : ML->blocks())		for (MachineBasicBlock *BB : ML->blocks())
for (MachineInstr &MI : *BB)		for (MachineInstr &MI : *BB)
if (isVCTP(&MI))		if (isVCTP(&MI))
VCTPs.push_back(&MI);		VCTPs.push_back(&MI);

if (VCTPs.empty()) {		if (VCTPs.empty()) {

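The PHI update at the end of MergeLoopEnd relies on the two-input machine PHI layout `def, value, block, value, block` (hence the `getNumOperands() != 5` check in findLoopComponents): the incoming value paired with the latch block must become the merged instruction's decremented count, and the other incoming value the loop start count. Below is a hypothetical, LLVM-free sketch of that pairing; `Phi`, `rewirePhi` and the integer block and register ids are assumptions for illustration. Separately, the hidden `arm-enable-merge-loopenddec` option added above should allow the merge to be switched off (for example `-arm-enable-merge-loopenddec=false` on the llc command line) if it causes problems.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of a two-input machine PHI:
//   ops[0] = def, ops[1] = value, ops[2] = block, ops[3] = value, ops[4] = block
struct Phi {
  std::array<int, 5> ops;
};

// Mirrors the rewiring in MergeLoopEnd: the value arriving from the latch
// becomes DecReg (defined by the new t2LoopEndDec), and the value arriving
// from the other predecessor becomes StartReg (from the loop start).
inline void rewirePhi(Phi &phi, int latchBlock, int startReg, int decReg) {
  if (phi.ops[2] == latchBlock) {
    phi.ops[1] = decReg;
    phi.ops[3] = startReg;
  } else {
    phi.ops[1] = startReg;
    phi.ops[3] = decReg;
  }
}
```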
llvm/test/CodeGen/Thumb2/LowOverheadLoops/count_dominates_start.mir

body: |
; CHECK: [[t2SUBri1:%[0-9]+]]:rgpr = t2SUBri [[PHI4]], 8, 14 /* CC::al */, $noreg, $noreg		; CHECK: [[t2SUBri1:%[0-9]+]]:rgpr = t2SUBri [[PHI4]], 8, 14 /* CC::al */, $noreg, $noreg
; CHECK: [[COPY7:%[0-9]+]]:gpr = COPY [[t2SUBri1]]		; CHECK: [[COPY7:%[0-9]+]]:gpr = COPY [[t2SUBri1]]
; CHECK: [[MVE_VLDRHU16_post:%[0-9]+]]:rgpr, [[MVE_VLDRHU16_post1:%[0-9]+]]:mqpr = MVE_VLDRHU16_post [[PHI]], 16, 1, [[MVE_VCTP16_]] :: (load 16 from %ir.lsr.iv35, align 2)		; CHECK: [[MVE_VLDRHU16_post:%[0-9]+]]:rgpr, [[MVE_VLDRHU16_post1:%[0-9]+]]:mqpr = MVE_VLDRHU16_post [[PHI]], 16, 1, [[MVE_VCTP16_]] :: (load 16 from %ir.lsr.iv35, align 2)
; CHECK: [[MVE_VLDRHU16_post2:%[0-9]+]]:rgpr, [[MVE_VLDRHU16_post3:%[0-9]+]]:mqpr = MVE_VLDRHU16_post [[PHI1]], 16, 1, [[MVE_VCTP16_]] :: (load 16 from %ir.lsr.iv12, align 2)		; CHECK: [[MVE_VLDRHU16_post2:%[0-9]+]]:rgpr, [[MVE_VLDRHU16_post3:%[0-9]+]]:mqpr = MVE_VLDRHU16_post [[PHI1]], 16, 1, [[MVE_VCTP16_]] :: (load 16 from %ir.lsr.iv12, align 2)
; CHECK: [[MVE_VMLADAVas16_:%[0-9]+]]:tgpreven = MVE_VMLADAVas16 [[PHI2]], killed [[MVE_VLDRHU16_post3]], killed [[MVE_VLDRHU16_post1]], 1, [[MVE_VCTP16_]]		; CHECK: [[MVE_VMLADAVas16_:%[0-9]+]]:tgpreven = MVE_VMLADAVas16 [[PHI2]], killed [[MVE_VLDRHU16_post3]], killed [[MVE_VLDRHU16_post1]], 1, [[MVE_VCTP16_]]
; CHECK: [[COPY8:%[0-9]+]]:gpr = COPY [[MVE_VMLADAVas16_]]		; CHECK: [[COPY8:%[0-9]+]]:gpr = COPY [[MVE_VMLADAVas16_]]
; CHECK: [[COPY9:%[0-9]+]]:gpr = COPY [[MVE_VLDRHU16_post2]]		; CHECK: [[COPY9:%[0-9]+]]:gpr = COPY [[MVE_VLDRHU16_post2]]
; CHECK: [[COPY10:%[0-9]+]]:gpr = COPY [[MVE_VLDRHU16_post]]		; CHECK: [[COPY10:%[0-9]+]]:gpr = COPY [[MVE_VLDRHU16_post]]
; CHECK: [[t2LoopDec:%[0-9]+]]:gprlr = t2LoopDec [[PHI3]], 1		; CHECK: [[t2LoopEndDec:%[0-9]+]]:gprlr = t2LoopEndDec [[PHI3]], %bb.3, implicit-def $cpsr
; CHECK: t2LoopEnd [[t2LoopDec]], %bb.3, implicit-def dead $cpsr
; CHECK: t2B %bb.4, 14 /* CC::al */, $noreg		; CHECK: t2B %bb.4, 14 /* CC::al */, $noreg
; CHECK: bb.4.for.cond.cleanup:		; CHECK: bb.4.for.cond.cleanup:
; CHECK: [[PHI5:%[0-9]+]]:gpr = PHI [[COPY3]], %bb.1, [[COPY8]], %bb.3		; CHECK: [[PHI5:%[0-9]+]]:gpr = PHI [[COPY3]], %bb.1, [[COPY8]], %bb.3
; CHECK: $r0 = COPY [[PHI5]]		; CHECK: $r0 = COPY [[PHI5]]
; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit $r0		; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit $r0
bb.0.entry:		bb.0.entry:
successors: %bb.1(0x50000000), %bb.4(0x30000000)		successors: %bb.1(0x50000000), %bb.4(0x30000000)
liveins: $r0, $r1, $r2		liveins: $r0, $r1, $r2

llvm/test/CodeGen/Thumb2/LowOverheadLoops/fast-fp-loops.ll

	; CHECK-NEXT: @ in Loop: Header=BB2_4 Depth=1			; CHECK-NEXT: @ in Loop: Header=BB2_4 Depth=1
	; CHECK-NEXT: vmul.f16 q0, q6, q5			; CHECK-NEXT: vmul.f16 q0, q6, q5
	; CHECK-NEXT: adds r0, #8			; CHECK-NEXT: adds r0, #8
	; CHECK-NEXT: vcvtt.f32.f16 s23, s1			; CHECK-NEXT: vcvtt.f32.f16 s23, s1
	; CHECK-NEXT: adds r1, #8			; CHECK-NEXT: adds r1, #8
	; CHECK-NEXT: vcvtb.f32.f16 s22, s1			; CHECK-NEXT: vcvtb.f32.f16 s22, s1
	; CHECK-NEXT: adds r3, #4			; CHECK-NEXT: adds r3, #4
	; CHECK-NEXT: vcvtt.f32.f16 s21, s0			; CHECK-NEXT: vcvtt.f32.f16 s21, s0
	; CHECK-NEXT: subs.w lr, lr, #1
	; CHECK-NEXT: vcvtb.f32.f16 s20, s0			; CHECK-NEXT: vcvtb.f32.f16 s20, s0
	; CHECK-NEXT: vadd.f32 q5, q3, q5			; CHECK-NEXT: vadd.f32 q5, q3, q5
				; CHECK-NEXT: subs.w lr, lr, #1
	; CHECK-NEXT: bne .LBB2_4			; CHECK-NEXT: bne .LBB2_4
	; CHECK-NEXT: b .LBB2_21			; CHECK-NEXT: b .LBB2_21
	; CHECK-NEXT: .LBB2_4: @ %vector.body			; CHECK-NEXT: .LBB2_4: @ %vector.body
	; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1			; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: vldrw.u32 q0, [sp] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q0, [sp] @ 16-byte Reload
	; CHECK-NEXT: vmov q3, q5			; CHECK-NEXT: vmov q3, q5
	; CHECK-NEXT: @ implicit-def: $q6			; CHECK-NEXT: @ implicit-def: $q6
	; CHECK-NEXT: vadd.i32 q4, q0, r3			; CHECK-NEXT: vadd.i32 q4, q0, r3

llvm/test/CodeGen/Thumb2/LowOverheadLoops/minloop.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve -verify-machineinstrs %s -o - | FileCheck %s			; RUN: llc -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve -verify-machineinstrs %s -o - | FileCheck %s

	define void @arm_min_q31(i32* nocapture readonly %pSrc, i32 %blockSize, i32* nocapture %pResult, i32* nocapture %pIndex) {			define void @arm_min_q31(i32* nocapture readonly %pSrc, i32 %blockSize, i32* nocapture %pResult, i32* nocapture %pIndex) {
	; CHECK-LABEL: arm_min_q31:			; CHECK-LABEL: arm_min_q31:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: .save {r4, r5, r6, r7, r8, r9, r10, r11, lr}			; CHECK-NEXT: .save {r4, r5, r6, r7, r8, r9, r10, r11, lr}
	; CHECK-NEXT: push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr}			; CHECK-NEXT: push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr}
	; CHECK-NEXT: .pad #4
	; CHECK-NEXT: sub sp, #4
	; CHECK-NEXT: ldr.w r12, [r0]			; CHECK-NEXT: ldr.w r12, [r0]
	; CHECK-NEXT: subs.w r9, r1, #1			; CHECK-NEXT: subs.w r9, r1, #1
	; CHECK-NEXT: beq .LBB0_3			; CHECK-NEXT: beq .LBB0_3
	; CHECK-NEXT: @ %bb.1: @ %while.body.preheader			; CHECK-NEXT: @ %bb.1: @ %while.body.preheader
	; CHECK-NEXT: subs r6, r1, #2			; CHECK-NEXT: subs r7, r1, #2
	; CHECK-NEXT: and r7, r9, #3			; CHECK-NEXT: and r8, r9, #3
	; CHECK-NEXT: cmp r6, #3			; CHECK-NEXT: cmp r7, #3
	; CHECK-NEXT: str r7, [sp] @ 4-byte Spill
	; CHECK-NEXT: bhs .LBB0_4			; CHECK-NEXT: bhs .LBB0_4
	; CHECK-NEXT: @ %bb.2:			; CHECK-NEXT: @ %bb.2:
	; CHECK-NEXT: mov.w r8, #0			; CHECK-NEXT: movs r6, #0
	; CHECK-NEXT: b .LBB0_6			; CHECK-NEXT: b .LBB0_6
	; CHECK-NEXT: .LBB0_3:			; CHECK-NEXT: .LBB0_3:
	; CHECK-NEXT: mov.w r8, #0			; CHECK-NEXT: movs r6, #0
	; CHECK-NEXT: b .LBB0_10			; CHECK-NEXT: b .LBB0_10
	; CHECK-NEXT: .LBB0_4: @ %while.body.preheader.new			; CHECK-NEXT: .LBB0_4: @ %while.body.preheader.new
	; CHECK-NEXT: bic r6, r9, #3			; CHECK-NEXT: bic r7, r9, #3
	; CHECK-NEXT: movs r4, #1			; CHECK-NEXT: movs r6, #1
	; CHECK-NEXT: subs r6, #4			; CHECK-NEXT: subs r7, #4
	; CHECK-NEXT: mov.w r8, #0			; CHECK-NEXT: add.w lr, r6, r7, lsr #2
	; CHECK-NEXT: add.w lr, r4, r6, lsr #2			; CHECK-NEXT: movs r6, #0
	; CHECK-NEXT: movs r6, #4			; CHECK-NEXT: dls lr, lr
	; CHECK-NEXT: mov lr, lr			; CHECK-NEXT: movs r7, #4
	; CHECK-NEXT: mov r11, lr
	; CHECK-NEXT: .LBB0_5: @ %while.body			; CHECK-NEXT: .LBB0_5: @ %while.body
	; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1			; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldr r10, [r0, #16]!			; CHECK-NEXT: ldr r10, [r0, #16]!
	; CHECK-NEXT: mov lr, r11
	; CHECK-NEXT: sub.w lr, lr, #1
	; CHECK-NEXT: sub.w r9, r9, #4			; CHECK-NEXT: sub.w r9, r9, #4
	; CHECK-NEXT: ldrd r7, r5, [r0, #-12]			; CHECK-NEXT: ldrd r5, r4, [r0, #-12]
	; CHECK-NEXT: mov r11, lr			; CHECK-NEXT: ldr r11, [r0, #-4]
	; CHECK-NEXT: ldr r4, [r0, #-4]			; CHECK-NEXT: cmp r12, r5
	; CHECK-NEXT: cmp r12, r7
	; CHECK-NEXT: it gt			; CHECK-NEXT: it gt
	; CHECK-NEXT: subgt.w r8, r6, #3			; CHECK-NEXT: subgt r6, r7, #3
	; CHECK-NEXT: csel r7, r7, r12, gt			; CHECK-NEXT: csel r5, r5, r12, gt
	; CHECK-NEXT: cmp r7, r5			; CHECK-NEXT: cmp r5, r4
	; CHECK-NEXT: it gt			; CHECK-NEXT: it gt
	; CHECK-NEXT: subgt.w r8, r6, #2			; CHECK-NEXT: subgt r6, r7, #2
	; CHECK-NEXT: csel r7, r5, r7, gt			; CHECK-NEXT: csel r5, r4, r5, gt
	; CHECK-NEXT: cmp r7, r4			; CHECK-NEXT: cmp r5, r11
	; CHECK-NEXT: it gt			; CHECK-NEXT: it gt
	; CHECK-NEXT: subgt.w r8, r6, #1			; CHECK-NEXT: subgt r6, r7, #1
	; CHECK-NEXT: csel r7, r4, r7, gt			; CHECK-NEXT: csel r5, r11, r5, gt
	; CHECK-NEXT: cmp r7, r10			; CHECK-NEXT: cmp r5, r10
	; CHECK-NEXT: csel r8, r6, r8, gt			; CHECK-NEXT: csel r6, r7, r6, gt
	; CHECK-NEXT: add.w r6, r6, #4			; CHECK-NEXT: add.w r7, r7, #4
	; CHECK-NEXT: csel r12, r10, r7, gt			; CHECK-NEXT: csel r12, r10, r5, gt
	; CHECK-NEXT: cmp.w lr, #0			; CHECK-NEXT: le lr, .LBB0_5
	; CHECK-NEXT: bne .LBB0_5
	; CHECK-NEXT: b .LBB0_6
	; CHECK-NEXT: .LBB0_6: @ %while.end.loopexit.unr-lcssa			; CHECK-NEXT: .LBB0_6: @ %while.end.loopexit.unr-lcssa
	; CHECK-NEXT: ldr r7, [sp] @ 4-byte Reload			; CHECK-NEXT: cmp.w r8, #0
	; CHECK-NEXT: cbz r7, .LBB0_10			; CHECK-NEXT: beq .LBB0_10
	; CHECK-NEXT: @ %bb.7: @ %while.body.epil			; CHECK-NEXT: @ %bb.7: @ %while.body.epil
	; CHECK-NEXT: ldr r4, [r0, #4]			; CHECK-NEXT: ldr r7, [r0, #4]
	; CHECK-NEXT: sub.w r1, r1, r9			; CHECK-NEXT: sub.w r1, r1, r9
	; CHECK-NEXT: cmp r12, r4			; CHECK-NEXT: cmp r12, r7
	; CHECK-NEXT: csel r8, r1, r8, gt			; CHECK-NEXT: csel r6, r1, r6, gt
	; CHECK-NEXT: csel r12, r4, r12, gt			; CHECK-NEXT: csel r12, r7, r12, gt
	; CHECK-NEXT: cmp r7, #1			; CHECK-NEXT: cmp.w r8, #1
	; CHECK-NEXT: beq .LBB0_10			; CHECK-NEXT: beq .LBB0_10
	; CHECK-NEXT: @ %bb.8: @ %while.body.epil.1			; CHECK-NEXT: @ %bb.8: @ %while.body.epil.1
	; CHECK-NEXT: ldr r4, [r0, #8]			; CHECK-NEXT: ldr r7, [r0, #8]
	; CHECK-NEXT: cmp r12, r4			; CHECK-NEXT: cmp r12, r7
	; CHECK-NEXT: csinc r8, r8, r1, le			; CHECK-NEXT: csinc r6, r6, r1, le
	; CHECK-NEXT: csel r12, r4, r12, gt			; CHECK-NEXT: csel r12, r7, r12, gt
	; CHECK-NEXT: cmp r7, #2			; CHECK-NEXT: cmp.w r8, #2
	; CHECK-NEXT: beq .LBB0_10			; CHECK-NEXT: beq .LBB0_10
	; CHECK-NEXT: @ %bb.9: @ %while.body.epil.2			; CHECK-NEXT: @ %bb.9: @ %while.body.epil.2
	; CHECK-NEXT: ldr r0, [r0, #12]			; CHECK-NEXT: ldr r0, [r0, #12]
	; CHECK-NEXT: cmp r12, r0			; CHECK-NEXT: cmp r12, r0
	; CHECK-NEXT: it gt			; CHECK-NEXT: it gt
	; CHECK-NEXT: addgt.w r8, r1, #2			; CHECK-NEXT: addgt r6, r1, #2
	; CHECK-NEXT: csel r12, r0, r12, gt			; CHECK-NEXT: csel r12, r0, r12, gt
	; CHECK-NEXT: .LBB0_10: @ %while.end			; CHECK-NEXT: .LBB0_10: @ %while.end
	; CHECK-NEXT: str.w r12, [r2]			; CHECK-NEXT: str.w r12, [r2]
	; CHECK-NEXT: str.w r8, [r3]			; CHECK-NEXT: str r6, [r3]
	; CHECK-NEXT: add sp, #4
	; CHECK-NEXT: pop.w {r4, r5, r6, r7, r8, r9, r10, r11, pc}			; CHECK-NEXT: pop.w {r4, r5, r6, r7, r8, r9, r10, r11, pc}
	entry:			entry:
	%0 = load i32, i32* %pSrc, align 4			%0 = load i32, i32* %pSrc, align 4
	%blkCnt.015 = add i32 %blockSize, -1			%blkCnt.015 = add i32 %blockSize, -1
	%cmp.not17 = icmp eq i32 %blkCnt.015, 0			%cmp.not17 = icmp eq i32 %blkCnt.015, 0
	br i1 %cmp.not17, label %while.end, label %while.body.preheader			br i1 %cmp.not17, label %while.end, label %while.body.preheader

	while.body.preheader: ; preds = %entry			while.body.preheader: ; preds = %entry
	[95 lines not shown]
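
In the hunk above, the old column closes the loop with `cmp.w lr, #0` followed by `bne .LBB0_5`, while the new column uses a single `le lr, .LBB0_5`. As a rough illustration only, the hypothetical Python model below treats t2LoopEndDec as a fused decrement-and-branch. It decrements the loop count held in LR and branches back to the header while the result is non-zero, assuming (as the old `cmp`/`bne` pair suggests) that this is the exit condition. The names `loop_end_dec` and `run_loop` are made up for this sketch. It is not LLVM code and ignores how the real `le` instruction is encoded.

```python
def loop_end_dec(lr):
    """Hypothetical model of a fused decrement-and-branch loop terminator.

    Returns the decremented loop count (the value the terminator defines)
    and whether the branch back to the loop header is taken.
    """
    lr -= 1
    return lr, lr != 0


def run_loop(trip_count):
    """Run a counted loop whose latch is loop_end_dec; return iterations executed."""
    assert trip_count >= 1, "the loop body always executes at least once"
    lr = trip_count
    iterations = 0
    while True:
        iterations += 1                 # stands in for the loop body
        lr, taken = loop_end_dec(lr)    # decrement and test in one step
        if not taken:                   # fall through to the exit block
            break
    return iterations


print(run_loop(4))  # prints 4
```

In this model the new `lr` is only consumed by the next iteration, that is, along the back-edge. This matches the summary's description of an instruction that produces its value only around the branching edge.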

llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-float-loops.ll

[870 lines not shown]	for.body: ; preds = %for.body.prol.loopexit, %for.body
%exitcond.3 = icmp eq i32 %inc.3, %N		%exitcond.3 = icmp eq i32 %inc.3, %N
br i1 %exitcond.3, label %for.cond.cleanup, label %for.body		br i1 %exitcond.3, label %for.cond.cleanup, label %for.body
}		}

define arm_aapcs_vfpcc void @float_int_int_mul(i32* nocapture readonly %a, i32* nocapture readonly %b, float* nocapture %c, i32 %N) {		define arm_aapcs_vfpcc void @float_int_int_mul(i32* nocapture readonly %a, i32* nocapture readonly %b, float* nocapture %c, i32 %N) {
; CHECK-LABEL: float_int_int_mul:		; CHECK-LABEL: float_int_int_mul:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: push {r4, r5, r6, lr}		; CHECK-NEXT: push {r4, r5, r6, lr}
; CHECK-NEXT: cmp r3, #0		; CHECK-NEXT: cbz r3, .LBB4_8
; CHECK-NEXT: beq .LBB4_8
; CHECK-NEXT: @ %bb.1: @ %for.body.preheader		; CHECK-NEXT: @ %bb.1: @ %for.body.preheader
; CHECK-NEXT: cmp r3, #3		; CHECK-NEXT: cmp r3, #3
; CHECK-NEXT: bhi .LBB4_3		; CHECK-NEXT: bhi .LBB4_3
; CHECK-NEXT: @ %bb.2:		; CHECK-NEXT: @ %bb.2:
; CHECK-NEXT: mov.w r12, #0		; CHECK-NEXT: mov.w r12, #0
; CHECK-NEXT: b .LBB4_6		; CHECK-NEXT: b .LBB4_6
; CHECK-NEXT: .LBB4_3: @ %vector.ph		; CHECK-NEXT: .LBB4_3: @ %vector.ph
; CHECK-NEXT: bic r12, r3, #3		; CHECK-NEXT: bic r12, r3, #3
[1,051 lines not shown]

llvm/test/CodeGen/Thumb2/mve-float32regloops.ll

	[1,074 lines not shown]
	; CHECK-NEXT: b .LBB16_4			; CHECK-NEXT: b .LBB16_4
	; CHECK-NEXT: .LBB16_3: @ %while.end			; CHECK-NEXT: .LBB16_3: @ %while.end
	; CHECK-NEXT: @ in Loop: Header=BB16_4 Depth=1			; CHECK-NEXT: @ in Loop: Header=BB16_4 Depth=1
	; CHECK-NEXT: ldr r0, [sp, #16] @ 4-byte Reload			; CHECK-NEXT: ldr r0, [sp, #16] @ 4-byte Reload
	; CHECK-NEXT: subs.w r12, r12, #1			; CHECK-NEXT: subs.w r12, r12, #1
	; CHECK-NEXT: vstrb.8 q0, [r2], #16			; CHECK-NEXT: vstrb.8 q0, [r2], #16
	; CHECK-NEXT: add.w r0, r5, r0, lsl #2			; CHECK-NEXT: add.w r0, r5, r0, lsl #2
	; CHECK-NEXT: add.w r5, r0, #16			; CHECK-NEXT: add.w r5, r0, #16
	; CHECK-NEXT: beq.w .LBB16_12			; CHECK-NEXT: beq .LBB16_12
	; CHECK-NEXT: .LBB16_4: @ %while.body			; CHECK-NEXT: .LBB16_4: @ %while.body
	; CHECK-NEXT: @ =>This Loop Header: Depth=1			; CHECK-NEXT: @ =>This Loop Header: Depth=1
	; CHECK-NEXT: @ Child Loop BB16_6 Depth 2			; CHECK-NEXT: @ Child Loop BB16_6 Depth 2
	; CHECK-NEXT: @ Child Loop BB16_10 Depth 2			; CHECK-NEXT: @ Child Loop BB16_10 Depth 2
	; CHECK-NEXT: add.w lr, r10, #8			; CHECK-NEXT: add.w lr, r10, #8
	; CHECK-NEXT: vldrw.u32 q0, [r1], #16			; CHECK-NEXT: vldrw.u32 q0, [r1], #16
	; CHECK-NEXT: ldrd r3, r7, [r10]			; CHECK-NEXT: ldrd r3, r7, [r10]
	; CHECK-NEXT: ldm.w lr, {r0, r4, r6, lr}			; CHECK-NEXT: ldm.w lr, {r0, r4, r6, lr}
	[939 lines not shown]

llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll

	[1,415 lines not shown]
	; CHECK-NEXT: @ Parent Loop BB7_2 Depth=1			; CHECK-NEXT: @ Parent Loop BB7_2 Depth=1
	; CHECK-NEXT: @ => This Inner Loop Header: Depth=2			; CHECK-NEXT: @ => This Inner Loop Header: Depth=2
	; CHECK-NEXT: add.w r11, r3, r5			; CHECK-NEXT: add.w r11, r3, r5
	; CHECK-NEXT: vctp.32 r10			; CHECK-NEXT: vctp.32 r10
	; CHECK-NEXT: vpsttt			; CHECK-NEXT: vpsttt
	; CHECK-NEXT: vldrwt.u32 q0, [r12], #16			; CHECK-NEXT: vldrwt.u32 q0, [r12], #16
	; CHECK-NEXT: vldrwt.u32 q1, [r3], #16			; CHECK-NEXT: vldrwt.u32 q1, [r3], #16
	; CHECK-NEXT: vfmat.f32 q6, q1, q0			; CHECK-NEXT: vfmat.f32 q6, q1, q0
	; CHECK-NEXT: add.w r6, r11, r5			; CHECK-NEXT: vstrw.32 q6, [sp, #48] @ 16-byte Spill
	; CHECK-NEXT: vpstt			; CHECK-NEXT: vpstt
	; CHECK-NEXT: vldrwt.u32 q1, [r11]			; CHECK-NEXT: vldrwt.u32 q1, [r11]
	; CHECK-NEXT: vfmat.f32 q7, q1, q0			; CHECK-NEXT: vfmat.f32 q7, q1, q0
	; CHECK-NEXT: vstrw.32 q7, [sp, #48] @ 16-byte Spill			; CHECK-NEXT: add.w r6, r11, r5
	; CHECK-NEXT: vmov q7, q6
	; CHECK-NEXT: vmov q6, q5			; CHECK-NEXT: vmov q6, q5
	; CHECK-NEXT: vmov q5, q3			; CHECK-NEXT: vmov q5, q3
	; CHECK-NEXT: vmov q3, q4			; CHECK-NEXT: vmov q3, q4
	; CHECK-NEXT: vpst			; CHECK-NEXT: vpst
	; CHECK-NEXT: vldrwt.u32 q1, [r6]			; CHECK-NEXT: vldrwt.u32 q1, [r6]
	; CHECK-NEXT: vmov q4, q2			; CHECK-NEXT: vmov q4, q2
	; CHECK-NEXT: vldrw.u32 q2, [sp, #64] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q2, [sp, #64] @ 16-byte Reload
	; CHECK-NEXT: adds r7, r6, r5
	; CHECK-NEXT: vpst			; CHECK-NEXT: vpst
	; CHECK-NEXT: vfmat.f32 q2, q1, q0			; CHECK-NEXT: vfmat.f32 q2, q1, q0
	; CHECK-NEXT: vstrw.32 q2, [sp, #64] @ 16-byte Spill			; CHECK-NEXT: vstrw.32 q2, [sp, #64] @ 16-byte Spill
				; CHECK-NEXT: adds r7, r6, r5
	; CHECK-NEXT: vpst			; CHECK-NEXT: vpst
	; CHECK-NEXT: vldrwt.u32 q1, [r7]			; CHECK-NEXT: vldrwt.u32 q1, [r7]
	; CHECK-NEXT: vldrw.u32 q2, [sp, #80] @ 16-byte Reload			; CHECK-NEXT: vldrw.u32 q2, [sp, #80] @ 16-byte Reload
				; CHECK-NEXT: adds r6, r7, r5
	; CHECK-NEXT: vpst			; CHECK-NEXT: vpst
	; CHECK-NEXT: vfmat.f32 q2, q1, q0			; CHECK-NEXT: vfmat.f32 q2, q1, q0
	; CHECK-NEXT: adds r6, r7, r5
	; CHECK-NEXT: vstrw.32 q2, [sp, #80] @ 16-byte Spill			; CHECK-NEXT: vstrw.32 q2, [sp, #80] @ 16-byte Spill
	; CHECK-NEXT: vmov q2, q4			; CHECK-NEXT: vmov q2, q4
	; CHECK-NEXT: vmov q4, q3			; CHECK-NEXT: vmov q4, q3
	; CHECK-NEXT: vmov q3, q5			; CHECK-NEXT: vmov q3, q5
	; CHECK-NEXT: vmov q5, q6			; CHECK-NEXT: vmov q5, q6
	; CHECK-NEXT: vmov q6, q7			; CHECK-NEXT: vldrw.u32 q6, [sp, #48] @ 16-byte Reload
	; CHECK-NEXT: vldrw.u32 q7, [sp, #48] @ 16-byte Reload
	; CHECK-NEXT: adds r7, r6, r5			; CHECK-NEXT: adds r7, r6, r5
	; CHECK-NEXT: vpstt			; CHECK-NEXT: vpstt
	; CHECK-NEXT: vldrwt.u32 q1, [r6]			; CHECK-NEXT: vldrwt.u32 q1, [r6]
	; CHECK-NEXT: vfmat.f32 q2, q1, q0			; CHECK-NEXT: vfmat.f32 q2, q1, q0
	; CHECK-NEXT: sub.w r10, r10, #4			; CHECK-NEXT: sub.w r10, r10, #4
	; CHECK-NEXT: adds r6, r7, r5			; CHECK-NEXT: adds r6, r7, r5
	; CHECK-NEXT: vpstttt			; CHECK-NEXT: vpstttt
	; CHECK-NEXT: vldrwt.u32 q1, [r7]			; CHECK-NEXT: vldrwt.u32 q1, [r7]
	[215 lines not shown]

llvm/test/CodeGen/Thumb2/mve-postinc-lsr.ll

	[1,062 lines not shown]
	; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}			; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
	; CHECK-NEXT: .pad #56			; CHECK-NEXT: .pad #56
	; CHECK-NEXT: sub sp, #56			; CHECK-NEXT: sub sp, #56
	; CHECK-NEXT: cmp r2, #8			; CHECK-NEXT: cmp r2, #8
	; CHECK-NEXT: str r1, [sp, #20] @ 4-byte Spill			; CHECK-NEXT: str r1, [sp, #20] @ 4-byte Spill
	; CHECK-NEXT: vstr s0, [sp, #4] @ 4-byte Spill			; CHECK-NEXT: vstr s0, [sp, #4] @ 4-byte Spill
	; CHECK-NEXT: mov r1, r2			; CHECK-NEXT: mov r1, r2
	; CHECK-NEXT: str r2, [sp, #8] @ 4-byte Spill			; CHECK-NEXT: str r2, [sp, #8] @ 4-byte Spill
	; CHECK-NEXT: blo.w .LBB7_9			; CHECK-NEXT: blo .LBB7_9
	; CHECK-NEXT: @ %bb.1:			; CHECK-NEXT: @ %bb.1:
	; CHECK-NEXT: ldr r2, [sp, #8] @ 4-byte Reload			; CHECK-NEXT: ldr r2, [sp, #8] @ 4-byte Reload
	; CHECK-NEXT: movs r3, #1			; CHECK-NEXT: movs r3, #1
	; CHECK-NEXT: mov.w r10, #0			; CHECK-NEXT: mov.w r10, #0
	; CHECK-NEXT: str r2, [sp, #16] @ 4-byte Spill			; CHECK-NEXT: str r2, [sp, #16] @ 4-byte Spill
	; CHECK-NEXT: lsrs r1, r2, #2			; CHECK-NEXT: lsrs r1, r2, #2
	; CHECK-NEXT: b .LBB7_3			; CHECK-NEXT: b .LBB7_3
	; CHECK-NEXT: .LBB7_2: @ in Loop: Header=BB7_3 Depth=1			; CHECK-NEXT: .LBB7_2: @ in Loop: Header=BB7_3 Depth=1
	[323 lines not shown]

llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll

[14 lines not shown]
; CHECK-NEXT: cmp r3, #0		; CHECK-NEXT: cmp r3, #0
; CHECK-NEXT: beq.w .LBB0_8		; CHECK-NEXT: beq.w .LBB0_8
; CHECK-NEXT: @ %bb.1: @ %entry		; CHECK-NEXT: @ %bb.1: @ %entry
; CHECK-NEXT: cmp r3, #1		; CHECK-NEXT: cmp r3, #1
; CHECK-NEXT: bne .LBB0_3		; CHECK-NEXT: bne .LBB0_3
; CHECK-NEXT: @ %bb.2:		; CHECK-NEXT: @ %bb.2:
; CHECK-NEXT: movs r7, #0		; CHECK-NEXT: movs r7, #0
; CHECK-NEXT: mov r12, r0		; CHECK-NEXT: mov r12, r0
; CHECK-NEXT: mov r9, r1		; CHECK-NEXT: mov r6, r1
; CHECK-NEXT: mov r11, r2		; CHECK-NEXT: mov r11, r2
; CHECK-NEXT: b .LBB0_6		; CHECK-NEXT: b .LBB0_6
; CHECK-NEXT: .LBB0_3: @ %vector.ph		; CHECK-NEXT: .LBB0_3: @ %vector.ph
; CHECK-NEXT: str r3, [sp, #4] @ 4-byte Spill		; CHECK-NEXT: str r3, [sp, #4] @ 4-byte Spill
; CHECK-NEXT: bic r3, r3, #1		; CHECK-NEXT: bic r3, r3, #1
; CHECK-NEXT: subs r7, r3, #2		; CHECK-NEXT: subs r7, r3, #2
; CHECK-NEXT: movs r6, #1		; CHECK-NEXT: movs r6, #1
; CHECK-NEXT: adr r4, .LCPI0_0		; CHECK-NEXT: adr r4, .LCPI0_0
; CHECK-NEXT: str r3, [sp] @ 4-byte Spill		; CHECK-NEXT: str r3, [sp] @ 4-byte Spill
; CHECK-NEXT: add.w lr, r6, r7, lsr #1		; CHECK-NEXT: add.w lr, r6, r7, lsr #1
; CHECK-NEXT: add.w r11, r2, r3, lsl #2		; CHECK-NEXT: add.w r11, r2, r3, lsl #2
; CHECK-NEXT: add.w r9, r1, r3, lsl #2		; CHECK-NEXT: add.w r6, r1, r3, lsl #2
; CHECK-NEXT: add.w r12, r0, r3, lsl #2		; CHECK-NEXT: add.w r12, r0, r3, lsl #2
; CHECK-NEXT: dls lr, lr		; CHECK-NEXT: dls lr, lr
; CHECK-NEXT: vldrw.u32 q0, [r4]		; CHECK-NEXT: vldrw.u32 q0, [r4]
		; CHECK-NEXT: mvn r10, #-2147483648
; CHECK-NEXT: vmvn.i32 q1, #0x80000000		; CHECK-NEXT: vmvn.i32 q1, #0x80000000
; CHECK-NEXT: mov.w r10, #-1
; CHECK-NEXT: .LBB0_4: @ %vector.body		; CHECK-NEXT: .LBB0_4: @ %vector.body
; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1		; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
; CHECK-NEXT: ldrd r4, r5, [r0]		; CHECK-NEXT: ldrd r4, r8, [r0]
; CHECK-NEXT: adds r0, #8		; CHECK-NEXT: adds r0, #8
; CHECK-NEXT: ldrd r7, r6, [r1]		; CHECK-NEXT: ldrd r7, r5, [r1]
; CHECK-NEXT: adds r1, #8		; CHECK-NEXT: adds r1, #8
; CHECK-NEXT: smull r8, r5, r6, r5		; CHECK-NEXT: smull r8, r5, r5, r8
; CHECK-NEXT: smull r4, r7, r7, r4		; CHECK-NEXT: smull r4, r7, r7, r4
; CHECK-NEXT: asrl r8, r5, #31		; CHECK-NEXT: asrl r8, r5, #31
; CHECK-NEXT: asrl r4, r7, #31		; CHECK-NEXT: asrl r4, r7, #31
; CHECK-NEXT: rsbs.w r3, r4, #-2147483648		; CHECK-NEXT: rsbs.w r9, r4, #-2147483648
; CHECK-NEXT: vmov.32 q4[0], r4		; CHECK-NEXT: vmov.32 q4[0], r4
; CHECK-NEXT: sbcs.w r3, r10, r7		; CHECK-NEXT: mov.w r9, #-1
; CHECK-NEXT: vmov.32 q4[1], r7		; CHECK-NEXT: sbcs.w r3, r9, r7
; CHECK-NEXT: mov.w r3, #0		; CHECK-NEXT: mov.w r3, #0
; CHECK-NEXT: vmov.32 q4[2], r8		; CHECK-NEXT: vmov.32 q4[1], r7
; CHECK-NEXT: it lt		; CHECK-NEXT: it lt
; CHECK-NEXT: movlt r3, #1		; CHECK-NEXT: movlt r3, #1
; CHECK-NEXT: cmp r3, #0		; CHECK-NEXT: cmp r3, #0
; CHECK-NEXT: csetm r3, ne		; CHECK-NEXT: csetm r3, ne
; CHECK-NEXT: vmov.32 q4[3], r5		; CHECK-NEXT: vmov.32 q4[2], r8
; CHECK-NEXT: vmov.32 q2[0], r3		; CHECK-NEXT: vmov.32 q2[0], r3
		; CHECK-NEXT: vmov.32 q4[3], r5
; CHECK-NEXT: vmov.32 q2[1], r3		; CHECK-NEXT: vmov.32 q2[1], r3
; CHECK-NEXT: rsbs.w r3, r8, #-2147483648		; CHECK-NEXT: rsbs.w r3, r8, #-2147483648
; CHECK-NEXT: sbcs.w r3, r10, r5		; CHECK-NEXT: sbcs.w r3, r9, r5
; CHECK-NEXT: mvn r5, #-2147483648
; CHECK-NEXT: mov.w r3, #0		; CHECK-NEXT: mov.w r3, #0
; CHECK-NEXT: it lt		; CHECK-NEXT: it lt
; CHECK-NEXT: movlt r3, #1		; CHECK-NEXT: movlt r3, #1
; CHECK-NEXT: cmp r3, #0		; CHECK-NEXT: cmp r3, #0
; CHECK-NEXT: csetm r3, ne		; CHECK-NEXT: csetm r3, ne
; CHECK-NEXT: vmov.32 q2[2], r3		; CHECK-NEXT: vmov.32 q2[2], r3
; CHECK-NEXT: vmov.32 q2[3], r3		; CHECK-NEXT: vmov.32 q2[3], r3
; CHECK-NEXT: vbic q3, q0, q2		; CHECK-NEXT: vbic q3, q0, q2
; CHECK-NEXT: vand q2, q4, q2		; CHECK-NEXT: vand q2, q4, q2
; CHECK-NEXT: vorr q2, q2, q3		; CHECK-NEXT: vorr q2, q2, q3
; CHECK-NEXT: vmov r4, s8		; CHECK-NEXT: vmov r4, s8
; CHECK-NEXT: vmov r3, s9		; CHECK-NEXT: vmov r3, s9
; CHECK-NEXT: subs r4, r4, r5		; CHECK-NEXT: subs.w r4, r4, r10
; CHECK-NEXT: sbcs r3, r3, #0		; CHECK-NEXT: sbcs r3, r3, #0
; CHECK-NEXT: vmov r4, s10		; CHECK-NEXT: vmov r4, s10
; CHECK-NEXT: mov.w r3, #0		; CHECK-NEXT: mov.w r3, #0
; CHECK-NEXT: it lt		; CHECK-NEXT: it lt
; CHECK-NEXT: movlt r3, #1		; CHECK-NEXT: movlt r3, #1
; CHECK-NEXT: cmp r3, #0		; CHECK-NEXT: cmp r3, #0
; CHECK-NEXT: csetm r3, ne		; CHECK-NEXT: csetm r3, ne
; CHECK-NEXT: vmov.32 q3[0], r3		; CHECK-NEXT: vmov.32 q3[0], r3
; CHECK-NEXT: vmov.32 q3[1], r3		; CHECK-NEXT: vmov.32 q3[1], r3
; CHECK-NEXT: vmov r3, s11		; CHECK-NEXT: vmov r3, s11
; CHECK-NEXT: subs r4, r4, r5		; CHECK-NEXT: subs.w r4, r4, r10
; CHECK-NEXT: sbcs r3, r3, #0		; CHECK-NEXT: sbcs r3, r3, #0
; CHECK-NEXT: mov.w r3, #0		; CHECK-NEXT: mov.w r3, #0
; CHECK-NEXT: it lt		; CHECK-NEXT: it lt
; CHECK-NEXT: movlt r3, #1		; CHECK-NEXT: movlt r3, #1
; CHECK-NEXT: cmp r3, #0		; CHECK-NEXT: cmp r3, #0
; CHECK-NEXT: csetm r3, ne		; CHECK-NEXT: csetm r3, ne
; CHECK-NEXT: vmov.32 q3[2], r3		; CHECK-NEXT: vmov.32 q3[2], r3
; CHECK-NEXT: vbic q4, q1, q3		; CHECK-NEXT: vbic q4, q1, q3
[12 lines not shown]
; CHECK-NEXT: sub.w lr, r3, r7		; CHECK-NEXT: sub.w lr, r3, r7
; CHECK-NEXT: mov.w r0, #-1		; CHECK-NEXT: mov.w r0, #-1
; CHECK-NEXT: dls lr, lr		; CHECK-NEXT: dls lr, lr
; CHECK-NEXT: mov.w r1, #-2147483648		; CHECK-NEXT: mov.w r1, #-2147483648
; CHECK-NEXT: mvn r2, #-2147483648		; CHECK-NEXT: mvn r2, #-2147483648
; CHECK-NEXT: .LBB0_7: @ %for.body		; CHECK-NEXT: .LBB0_7: @ %for.body
; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1		; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
; CHECK-NEXT: ldr r3, [r12], #4		; CHECK-NEXT: ldr r3, [r12], #4
; CHECK-NEXT: ldr r4, [r9], #4		; CHECK-NEXT: ldr r4, [r6], #4
; CHECK-NEXT: smull r4, r3, r4, r3		; CHECK-NEXT: smull r4, r3, r4, r3
; CHECK-NEXT: asrl r4, r3, #31		; CHECK-NEXT: asrl r4, r3, #31
; CHECK-NEXT: subs r5, r1, r4		; CHECK-NEXT: subs r5, r1, r4
; CHECK-NEXT: sbcs.w r5, r0, r3		; CHECK-NEXT: sbcs.w r5, r0, r3
; CHECK-NEXT: mov.w r5, #0		; CHECK-NEXT: mov.w r5, #0
; CHECK-NEXT: it lt		; CHECK-NEXT: it lt
; CHECK-NEXT: movlt r5, #1		; CHECK-NEXT: movlt r5, #1
; CHECK-NEXT: cmp r5, #0		; CHECK-NEXT: cmp r5, #0
[101 lines not shown]
; CHECK-NEXT: sub sp, #4		; CHECK-NEXT: sub sp, #4
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13}		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13}		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13}
; CHECK-NEXT: .pad #8		; CHECK-NEXT: .pad #8
; CHECK-NEXT: sub sp, #8		; CHECK-NEXT: sub sp, #8
; CHECK-NEXT: cmp r3, #0		; CHECK-NEXT: cmp r3, #0
; CHECK-NEXT: beq.w .LBB1_8		; CHECK-NEXT: beq.w .LBB1_8
; CHECK-NEXT: @ %bb.1: @ %for.body.preheader		; CHECK-NEXT: @ %bb.1: @ %for.body.preheader
; CHECK-NEXT: mov r8, r2		; CHECK-NEXT: mov r9, r1
; CHECK-NEXT: movs r2, #0		; CHECK-NEXT: movs r7, #0
; CHECK-NEXT: cmp r3, #3		; CHECK-NEXT: cmp r3, #3
; CHECK-NEXT: bhi .LBB1_3		; CHECK-NEXT: bhi .LBB1_3
; CHECK-NEXT: @ %bb.2:		; CHECK-NEXT: @ %bb.2:
; CHECK-NEXT: mov r12, r0		; CHECK-NEXT: mov r12, r0
; CHECK-NEXT: mov r10, r1		; CHECK-NEXT: mov r1, r9
; CHECK-NEXT: mov r11, r8		; CHECK-NEXT: mov r11, r2
; CHECK-NEXT: b .LBB1_6		; CHECK-NEXT: b .LBB1_6
; CHECK-NEXT: .LBB1_3: @ %vector.ph		; CHECK-NEXT: .LBB1_3: @ %vector.ph
		; CHECK-NEXT: bic r7, r3, #3
		; CHECK-NEXT: adr r4, .LCPI1_0
		; CHECK-NEXT: subs r1, r7, #4
; CHECK-NEXT: str r3, [sp, #4] @ 4-byte Spill		; CHECK-NEXT: str r3, [sp, #4] @ 4-byte Spill
; CHECK-NEXT: bic r3, r3, #3		; CHECK-NEXT: movs r3, #1
; CHECK-NEXT: subs r2, r3, #4		; CHECK-NEXT: vldrw.u32 q0, [r4]
; CHECK-NEXT: movs r7, #1		; CHECK-NEXT: add.w lr, r3, r1, lsr #2
; CHECK-NEXT: str r3, [sp] @ 4-byte Spill		; CHECK-NEXT: str r7, [sp] @ 4-byte Spill
; CHECK-NEXT: add.w r11, r8, r3, lsl #2		; CHECK-NEXT: adr r4, .LCPI1_1
; CHECK-NEXT: add.w lr, r7, r2, lsr #2		; CHECK-NEXT: add.w r11, r2, r7, lsl #2
; CHECK-NEXT: adr r7, .LCPI1_0		; CHECK-NEXT: add.w r1, r9, r7, lsl #2
; CHECK-NEXT: vldrw.u32 q0, [r7]		; CHECK-NEXT: add.w r12, r0, r7, lsl #2
; CHECK-NEXT: adr r7, .LCPI1_1
; CHECK-NEXT: add.w r10, r1, r3, lsl #2
; CHECK-NEXT: add.w r12, r0, r3, lsl #2
; CHECK-NEXT: dls lr, lr		; CHECK-NEXT: dls lr, lr
; CHECK-NEXT: vldrw.u32 q1, [r7]		; CHECK-NEXT: vldrw.u32 q1, [r4]
; CHECK-NEXT: mov.w r3, #-1		; CHECK-NEXT: mov.w r10, #-1
; CHECK-NEXT: mvn r9, #-2147483648
; CHECK-NEXT: .LBB1_4: @ %vector.body		; CHECK-NEXT: .LBB1_4: @ %vector.body
; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1		; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
; CHECK-NEXT: vldrw.u32 q2, [r0], #16		; CHECK-NEXT: vldrw.u32 q2, [r0], #16
; CHECK-NEXT: vldrw.u32 q3, [r1], #16		; CHECK-NEXT: vldrw.u32 q3, [r9], #16
; CHECK-NEXT: mov r2, lr
; CHECK-NEXT: vmov.f32 s16, s10		; CHECK-NEXT: vmov.f32 s16, s10
; CHECK-NEXT: vmov.f32 s20, s14		; CHECK-NEXT: vmov.f32 s20, s14
; CHECK-NEXT: vmov.f32 s18, s11		; CHECK-NEXT: vmov.f32 s18, s11
; CHECK-NEXT: vmov.f32 s22, s15		; CHECK-NEXT: vmov.f32 s22, s15
; CHECK-NEXT: vmullb.s32 q6, q5, q4		; CHECK-NEXT: vmullb.s32 q6, q5, q4
; CHECK-NEXT: vmov.f32 s10, s9		; CHECK-NEXT: vmov.f32 s10, s9
; CHECK-NEXT: vmov r7, s25		; CHECK-NEXT: vmov r7, s25
; CHECK-NEXT: vmov r6, s24		; CHECK-NEXT: vmov r4, s24
; CHECK-NEXT: asrl r6, r7, #31		; CHECK-NEXT: asrl r4, r7, #31
; CHECK-NEXT: vmov lr, s26		; CHECK-NEXT: vmov r8, s26
; CHECK-NEXT: rsbs.w r5, r6, #-2147483648		; CHECK-NEXT: rsbs.w r5, r4, #-2147483648
; CHECK-NEXT: vmov.f32 s14, s13		; CHECK-NEXT: vmov.f32 s14, s13
; CHECK-NEXT: sbcs.w r5, r3, r7		; CHECK-NEXT: sbcs.w r5, r10, r7
; CHECK-NEXT: mov.w r5, #0		; CHECK-NEXT: mov.w r5, #0
; CHECK-NEXT: it lt		; CHECK-NEXT: it lt
; CHECK-NEXT: movlt r5, #1		; CHECK-NEXT: movlt r5, #1
; CHECK-NEXT: cmp r5, #0		; CHECK-NEXT: cmp r5, #0
; CHECK-NEXT: csetm r5, ne		; CHECK-NEXT: csetm r5, ne
; CHECK-NEXT: vmov.32 q4[0], r5		; CHECK-NEXT: vmov.32 q4[0], r5
; CHECK-NEXT: vmov.32 q4[1], r5		; CHECK-NEXT: vmov.32 q4[1], r5
; CHECK-NEXT: vmov r5, s27		; CHECK-NEXT: vmov r5, s27
; CHECK-NEXT: asrl lr, r5, #31		; CHECK-NEXT: asrl r8, r5, #31
; CHECK-NEXT: vmov.32 q6[0], r6		; CHECK-NEXT: vmov.32 q6[0], r4
; CHECK-NEXT: rsbs.w r4, lr, #-2147483648		; CHECK-NEXT: rsbs.w r6, r8, #-2147483648
; CHECK-NEXT: vmov.32 q6[1], r7		; CHECK-NEXT: vmov.32 q6[1], r7
; CHECK-NEXT: sbcs.w r4, r3, r5		; CHECK-NEXT: sbcs.w r6, r10, r5
; CHECK-NEXT: vmov.32 q6[2], lr		; CHECK-NEXT: vmov.32 q6[2], r8
; CHECK-NEXT: mov.w r4, #0		; CHECK-NEXT: mov.w r6, #0
; CHECK-NEXT: vmov.32 q6[3], r5		; CHECK-NEXT: vmov.32 q6[3], r5
; CHECK-NEXT: it lt		; CHECK-NEXT: it lt
; CHECK-NEXT: movlt r4, #1		; CHECK-NEXT: movlt r6, #1
; CHECK-NEXT: cmp r4, #0		; CHECK-NEXT: cmp r6, #0
; CHECK-NEXT: csetm r4, ne		; CHECK-NEXT: csetm r6, ne
; CHECK-NEXT: mov lr, r2		; CHECK-NEXT: mvn r8, #-2147483648
; CHECK-NEXT: vmov.32 q4[2], r4		; CHECK-NEXT: vmov.32 q4[2], r6
; CHECK-NEXT: vmov.32 q4[3], r4		; CHECK-NEXT: vmov.32 q4[3], r6
; CHECK-NEXT: vmov r4, s14		; CHECK-NEXT: vmov r6, s14
; CHECK-NEXT: vbic q5, q0, q4		; CHECK-NEXT: vbic q5, q0, q4
; CHECK-NEXT: vand q4, q6, q4		; CHECK-NEXT: vand q4, q6, q4
; CHECK-NEXT: vorr q4, q4, q5		; CHECK-NEXT: vorr q4, q4, q5
; CHECK-NEXT: vmov r6, s16		; CHECK-NEXT: vmov r5, s16
; CHECK-NEXT: vmov r7, s17		; CHECK-NEXT: vmov r4, s17
; CHECK-NEXT: subs.w r6, r6, r9		; CHECK-NEXT: subs.w r5, r5, r8
; CHECK-NEXT: sbcs r7, r7, #0		; CHECK-NEXT: sbcs r4, r4, #0
; CHECK-NEXT: vmov r6, s18		; CHECK-NEXT: vmov r5, s18
; CHECK-NEXT: mov.w r7, #0		; CHECK-NEXT: mov.w r4, #0
		; CHECK-NEXT: it lt
		; CHECK-NEXT: movlt r4, #1
		; CHECK-NEXT: cmp r4, #0
		; CHECK-NEXT: csetm r4, ne
		; CHECK-NEXT: vmov.32 q5[0], r4
		; CHECK-NEXT: vmov.32 q5[1], r4
		; CHECK-NEXT: vmov r4, s19
		; CHECK-NEXT: subs.w r5, r5, r8
		; CHECK-NEXT: vmov r5, s12
		; CHECK-NEXT: sbcs r4, r4, #0
		; CHECK-NEXT: mov.w r4, #0
; CHECK-NEXT: it lt		; CHECK-NEXT: it lt
; CHECK-NEXT: movlt r7, #1		; CHECK-NEXT: movlt r4, #1
; CHECK-NEXT: cmp r7, #0		; CHECK-NEXT: cmp r4, #0
; CHECK-NEXT: csetm r7, ne		; CHECK-NEXT: csetm r4, ne
; CHECK-NEXT: vmov.32 q5[0], r7		; CHECK-NEXT: vmov.32 q5[2], r4
; CHECK-NEXT: vmov.32 q5[1], r7		; CHECK-NEXT: vmov r4, s8
; CHECK-NEXT: vmov r7, s19
; CHECK-NEXT: subs.w r6, r6, r9
; CHECK-NEXT: vmov r6, s12
; CHECK-NEXT: sbcs r7, r7, #0
; CHECK-NEXT: mov.w r7, #0
; CHECK-NEXT: it lt
; CHECK-NEXT: movlt r7, #1
; CHECK-NEXT: cmp r7, #0
; CHECK-NEXT: csetm r7, ne
; CHECK-NEXT: vmov.32 q5[2], r7
; CHECK-NEXT: vmov r7, s8
; CHECK-NEXT: vbic q6, q1, q5		; CHECK-NEXT: vbic q6, q1, q5
; CHECK-NEXT: vand q4, q4, q5		; CHECK-NEXT: vand q4, q4, q5
; CHECK-NEXT: vorr q4, q4, q6		; CHECK-NEXT: vorr q4, q4, q6
; CHECK-NEXT: smull r6, r7, r6, r7		; CHECK-NEXT: smull r4, r7, r5, r4
; CHECK-NEXT: asrl r6, r7, #31		; CHECK-NEXT: asrl r4, r7, #31
; CHECK-NEXT: rsbs.w r5, r6, #-2147483648		; CHECK-NEXT: rsbs.w r5, r4, #-2147483648
; CHECK-NEXT: vmov.32 q3[0], r6		; CHECK-NEXT: vmov.32 q3[0], r4
; CHECK-NEXT: sbcs.w r5, r3, r7		; CHECK-NEXT: sbcs.w r5, r10, r7
; CHECK-NEXT: vmov.32 q3[1], r7		; CHECK-NEXT: vmov.32 q3[1], r7
; CHECK-NEXT: mov.w r5, #0		; CHECK-NEXT: mov.w r5, #0
; CHECK-NEXT: it lt		; CHECK-NEXT: it lt
; CHECK-NEXT: movlt r5, #1		; CHECK-NEXT: movlt r5, #1
; CHECK-NEXT: cmp r5, #0		; CHECK-NEXT: cmp r5, #0
; CHECK-NEXT: csetm r5, ne		; CHECK-NEXT: csetm r5, ne
; CHECK-NEXT: vmov.32 q5[0], r5		; CHECK-NEXT: vmov.32 q5[0], r5
; CHECK-NEXT: vmov.32 q5[1], r5		; CHECK-NEXT: vmov.32 q5[1], r5
; CHECK-NEXT: vmov r5, s10		; CHECK-NEXT: vmov r5, s10
; CHECK-NEXT: smull r4, r5, r4, r5		; CHECK-NEXT: smull r6, r5, r6, r5
; CHECK-NEXT: asrl r4, r5, #31		; CHECK-NEXT: asrl r6, r5, #31
; CHECK-NEXT: rsbs.w r2, r4, #-2147483648		; CHECK-NEXT: rsbs.w r3, r6, #-2147483648
; CHECK-NEXT: vmov.32 q3[2], r4		; CHECK-NEXT: vmov.32 q3[2], r6
; CHECK-NEXT: sbcs.w r2, r3, r5		; CHECK-NEXT: sbcs.w r3, r10, r5
; CHECK-NEXT: vmov.32 q3[3], r5		; CHECK-NEXT: vmov.32 q3[3], r5
; CHECK-NEXT: mov.w r2, #0		; CHECK-NEXT: mov.w r3, #0
; CHECK-NEXT: it lt		; CHECK-NEXT: it lt
; CHECK-NEXT: movlt r2, #1		; CHECK-NEXT: movlt r3, #1
; CHECK-NEXT: cmp r2, #0		; CHECK-NEXT: cmp r3, #0
; CHECK-NEXT: csetm r2, ne		; CHECK-NEXT: csetm r3, ne
; CHECK-NEXT: vmov.32 q5[2], r2		; CHECK-NEXT: vmov.32 q5[2], r3
; CHECK-NEXT: vmov.32 q5[3], r2		; CHECK-NEXT: vmov.32 q5[3], r3
; CHECK-NEXT: vbic q2, q0, q5		; CHECK-NEXT: vbic q2, q0, q5
; CHECK-NEXT: vand q3, q3, q5		; CHECK-NEXT: vand q3, q3, q5
; CHECK-NEXT: vorr q2, q3, q2		; CHECK-NEXT: vorr q2, q3, q2
; CHECK-NEXT: vmov r7, s8		; CHECK-NEXT: vmov r4, s8
; CHECK-NEXT: vmov r2, s9		; CHECK-NEXT: vmov r3, s9
; CHECK-NEXT: subs.w r7, r7, r9		; CHECK-NEXT: subs.w r4, r4, r8
; CHECK-NEXT: sbcs r2, r2, #0		; CHECK-NEXT: sbcs r3, r3, #0
; CHECK-NEXT: vmov r7, s10		; CHECK-NEXT: vmov r4, s10
; CHECK-NEXT: mov.w r2, #0		; CHECK-NEXT: mov.w r3, #0
; CHECK-NEXT: it lt		; CHECK-NEXT: it lt
; CHECK-NEXT: movlt r2, #1		; CHECK-NEXT: movlt r3, #1
; CHECK-NEXT: cmp r2, #0		; CHECK-NEXT: cmp r3, #0
; CHECK-NEXT: csetm r2, ne		; CHECK-NEXT: csetm r3, ne
; CHECK-NEXT: vmov.32 q3[0], r2		; CHECK-NEXT: vmov.32 q3[0], r3
; CHECK-NEXT: vmov.32 q3[1], r2		; CHECK-NEXT: vmov.32 q3[1], r3
; CHECK-NEXT: vmov r2, s11		; CHECK-NEXT: vmov r3, s11
; CHECK-NEXT: subs.w r7, r7, r9		; CHECK-NEXT: subs.w r4, r4, r8
; CHECK-NEXT: sbcs r2, r2, #0		; CHECK-NEXT: sbcs r3, r3, #0
; CHECK-NEXT: mov.w r2, #0		; CHECK-NEXT: mov.w r3, #0
; CHECK-NEXT: it lt		; CHECK-NEXT: it lt
; CHECK-NEXT: movlt r2, #1		; CHECK-NEXT: movlt r3, #1
; CHECK-NEXT: cmp r2, #0		; CHECK-NEXT: cmp r3, #0
; CHECK-NEXT: csetm r2, ne		; CHECK-NEXT: csetm r3, ne
; CHECK-NEXT: vmov.32 q3[2], r2		; CHECK-NEXT: vmov.32 q3[2], r3
; CHECK-NEXT: vbic q5, q1, q3		; CHECK-NEXT: vbic q5, q1, q3
; CHECK-NEXT: vand q2, q2, q3		; CHECK-NEXT: vand q2, q2, q3
; CHECK-NEXT: vorr q2, q2, q5		; CHECK-NEXT: vorr q2, q2, q5
; CHECK-NEXT: vmov.f32 s9, s10		; CHECK-NEXT: vmov.f32 s9, s10
; CHECK-NEXT: vmov.f32 s10, s16		; CHECK-NEXT: vmov.f32 s10, s16
; CHECK-NEXT: vmov.f32 s11, s18		; CHECK-NEXT: vmov.f32 s11, s18
; CHECK-NEXT: vstrb.8 q2, [r8], #16		; CHECK-NEXT: vstrb.8 q2, [r2], #16
; CHECK-NEXT: le lr, .LBB1_4		; CHECK-NEXT: le lr, .LBB1_4
; CHECK-NEXT: @ %bb.5: @ %middle.block		; CHECK-NEXT: @ %bb.5: @ %middle.block
; CHECK-NEXT: ldrd r2, r3, [sp] @ 8-byte Folded Reload		; CHECK-NEXT: ldrd r7, r3, [sp] @ 8-byte Folded Reload
; CHECK-NEXT: cmp r2, r3		; CHECK-NEXT: cmp r7, r3
; CHECK-NEXT: beq .LBB1_8		; CHECK-NEXT: beq .LBB1_8
; CHECK-NEXT: .LBB1_6: @ %for.body.preheader21		; CHECK-NEXT: .LBB1_6: @ %for.body.preheader21
; CHECK-NEXT: sub.w lr, r3, r2		; CHECK-NEXT: sub.w lr, r3, r7
; CHECK-NEXT: mov.w r0, #-1		; CHECK-NEXT: mov.w r0, #-1
; CHECK-NEXT: dls lr, lr		; CHECK-NEXT: dls lr, lr
; CHECK-NEXT: mov.w r1, #-2147483648		; CHECK-NEXT: mov.w r3, #-2147483648
; CHECK-NEXT: mvn r3, #-2147483648		; CHECK-NEXT: mvn r2, #-2147483648
; CHECK-NEXT: .LBB1_7: @ %for.body		; CHECK-NEXT: .LBB1_7: @ %for.body
; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1		; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
; CHECK-NEXT: ldr r2, [r12], #4		; CHECK-NEXT: ldr r4, [r12], #4
; CHECK-NEXT: ldr r4, [r10], #4		; CHECK-NEXT: ldr r5, [r1], #4
; CHECK-NEXT: smull r2, r5, r4, r2		; CHECK-NEXT: smull r4, r5, r5, r4
; CHECK-NEXT: asrl r2, r5, #31		; CHECK-NEXT: asrl r4, r5, #31
; CHECK-NEXT: subs r4, r1, r2		; CHECK-NEXT: subs r6, r3, r4
; CHECK-NEXT: sbcs.w r4, r0, r5		; CHECK-NEXT: sbcs.w r6, r0, r5
; CHECK-NEXT: mov.w r4, #0		; CHECK-NEXT: mov.w r6, #0
; CHECK-NEXT: it lt		; CHECK-NEXT: it lt
; CHECK-NEXT: movlt r4, #1		; CHECK-NEXT: movlt r6, #1
; CHECK-NEXT: cmp r4, #0		; CHECK-NEXT: cmp r6, #0
; CHECK-NEXT: csel r2, r2, r1, ne		; CHECK-NEXT: csel r4, r4, r3, ne
; CHECK-NEXT: csel r4, r5, r0, ne		; CHECK-NEXT: csel r5, r5, r0, ne
; CHECK-NEXT: subs r5, r2, r3		; CHECK-NEXT: subs r6, r4, r2
; CHECK-NEXT: sbcs r4, r4, #0		; CHECK-NEXT: sbcs r5, r5, #0
; CHECK-NEXT: csel r2, r2, r3, lt		; CHECK-NEXT: csel r4, r4, r2, lt
; CHECK-NEXT: str r2, [r11], #4		; CHECK-NEXT: str r4, [r11], #4
; CHECK-NEXT: le lr, .LBB1_7		; CHECK-NEXT: le lr, .LBB1_7
; CHECK-NEXT: .LBB1_8: @ %for.cond.cleanup		; CHECK-NEXT: .LBB1_8: @ %for.cond.cleanup
; CHECK-NEXT: add sp, #8		; CHECK-NEXT: add sp, #8
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13}		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13}
; CHECK-NEXT: add sp, #4		; CHECK-NEXT: add sp, #4
; CHECK-NEXT: pop.w {r4, r5, r6, r7, r8, r9, r10, r11, pc}		; CHECK-NEXT: pop.w {r4, r5, r6, r7, r8, r9, r10, r11, pc}
; CHECK-NEXT: .p2align 4		; CHECK-NEXT: .p2align 4
; CHECK-NEXT: @ %bb.9:		; CHECK-NEXT: @ %bb.9:
[932 lines not shown]	for.body: ; preds = %for.body.preheader21, %for.body
br i1 %exitcond, label %for.cond.cleanup, label %for.body		br i1 %exitcond, label %for.cond.cleanup, label %for.body
}		}

define arm_aapcs_vfpcc void @ssatmul_8i_q15(i16* nocapture readonly %pSrcA, i16* nocapture readonly %pSrcB, i16* noalias nocapture %pDst, i32 %N) {		define arm_aapcs_vfpcc void @ssatmul_8i_q15(i16* nocapture readonly %pSrcA, i16* nocapture readonly %pSrcB, i16* noalias nocapture %pDst, i32 %N) {
; CHECK-LABEL: ssatmul_8i_q15:		; CHECK-LABEL: ssatmul_8i_q15:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .save {r4, r5, r6, lr}		; CHECK-NEXT: .save {r4, r5, r6, lr}
; CHECK-NEXT: push {r4, r5, r6, lr}		; CHECK-NEXT: push {r4, r5, r6, lr}
; CHECK-NEXT: cmp r3, #0		; CHECK-NEXT: cbz r3, .LBB7_8
; CHECK-NEXT: beq .LBB7_8
; CHECK-NEXT: @ %bb.1: @ %for.body.preheader		; CHECK-NEXT: @ %bb.1: @ %for.body.preheader
; CHECK-NEXT: cmp r3, #7		; CHECK-NEXT: cmp r3, #7
; CHECK-NEXT: bhi .LBB7_3		; CHECK-NEXT: bhi .LBB7_3
; CHECK-NEXT: @ %bb.2:		; CHECK-NEXT: @ %bb.2:
; CHECK-NEXT: movs r5, #0		; CHECK-NEXT: movs r5, #0
; CHECK-NEXT: mov r12, r0		; CHECK-NEXT: mov r12, r0
; CHECK-NEXT: mov r6, r1		; CHECK-NEXT: mov r6, r1
; CHECK-NEXT: mov r4, r2		; CHECK-NEXT: mov r4, r2
[1,179 lines not shown]	for.body: ; preds = %for.body.preheader23, %for.body
br i1 %exitcond, label %for.cond.cleanup, label %for.body		br i1 %exitcond, label %for.cond.cleanup, label %for.body
}		}

define arm_aapcs_vfpcc void @ssatmul_16i_q7(i8* nocapture readonly %pSrcA, i8* nocapture readonly %pSrcB, i8* noalias nocapture %pDst, i32 %N) {		define arm_aapcs_vfpcc void @ssatmul_16i_q7(i8* nocapture readonly %pSrcA, i8* nocapture readonly %pSrcB, i8* noalias nocapture %pDst, i32 %N) {
; CHECK-LABEL: ssatmul_16i_q7:		; CHECK-LABEL: ssatmul_16i_q7:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .save {r4, r5, r6, lr}		; CHECK-NEXT: .save {r4, r5, r6, lr}
; CHECK-NEXT: push {r4, r5, r6, lr}		; CHECK-NEXT: push {r4, r5, r6, lr}
; CHECK-NEXT: cmp r3, #0		; CHECK-NEXT: cbz r3, .LBB16_8
; CHECK-NEXT: beq .LBB16_8
; CHECK-NEXT: @ %bb.1: @ %for.body.preheader		; CHECK-NEXT: @ %bb.1: @ %for.body.preheader
; CHECK-NEXT: cmp r3, #15		; CHECK-NEXT: cmp r3, #15
; CHECK-NEXT: bhi .LBB16_3		; CHECK-NEXT: bhi .LBB16_3
; CHECK-NEXT: @ %bb.2:		; CHECK-NEXT: @ %bb.2:
; CHECK-NEXT: movs r5, #0		; CHECK-NEXT: movs r5, #0
; CHECK-NEXT: mov r12, r0		; CHECK-NEXT: mov r12, r0
; CHECK-NEXT: mov r6, r1		; CHECK-NEXT: mov r6, r1
; CHECK-NEXT: mov r4, r2		; CHECK-NEXT: mov r4, r2
[758 lines not shown]	for.cond.cleanup: ; preds = %vector.body, %entry
ret void		ret void
}		}

define arm_aapcs_vfpcc void @usatmul_8_q7(i8* nocapture readonly %pSrcA, i8* nocapture readonly %pSrcB, i8* noalias nocapture %pDst, i32 %N) {		define arm_aapcs_vfpcc void @usatmul_8_q7(i8* nocapture readonly %pSrcA, i8* nocapture readonly %pSrcB, i8* noalias nocapture %pDst, i32 %N) {
; CHECK-LABEL: usatmul_8_q7:		; CHECK-LABEL: usatmul_8_q7:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .save {r4, r5, r6, lr}		; CHECK-NEXT: .save {r4, r5, r6, lr}
; CHECK-NEXT: push {r4, r5, r6, lr}		; CHECK-NEXT: push {r4, r5, r6, lr}
; CHECK-NEXT: cmp r3, #0		; CHECK-NEXT: cbz r3, .LBB20_8
; CHECK-NEXT: beq .LBB20_8
; CHECK-NEXT: @ %bb.1: @ %for.body.preheader		; CHECK-NEXT: @ %bb.1: @ %for.body.preheader
; CHECK-NEXT: cmp r3, #7		; CHECK-NEXT: cmp r3, #7
; CHECK-NEXT: bhi .LBB20_3		; CHECK-NEXT: bhi .LBB20_3
; CHECK-NEXT: @ %bb.2:		; CHECK-NEXT: @ %bb.2:
; CHECK-NEXT: movs r5, #0		; CHECK-NEXT: movs r5, #0
; CHECK-NEXT: mov r12, r0		; CHECK-NEXT: mov r12, r0
; CHECK-NEXT: mov r6, r1		; CHECK-NEXT: mov r6, r1
; CHECK-NEXT: mov r4, r2		; CHECK-NEXT: mov r4, r2
[258 lines not shown]

llvm/test/CodeGen/Thumb2/mve-vldshuffle.ll

[131 lines not shown]	while.end: ; preds = %while.body, %middle.block, %entry
ret void		ret void
}		}

define void @arm_cmplx_mag_squared_f32(float* nocapture readonly %pSrc, float* nocapture %pDst, i32 %numSamples) {		define void @arm_cmplx_mag_squared_f32(float* nocapture readonly %pSrc, float* nocapture %pDst, i32 %numSamples) {
; CHECK-LABEL: arm_cmplx_mag_squared_f32:		; CHECK-LABEL: arm_cmplx_mag_squared_f32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .save {r4, r5, r7, lr}		; CHECK-NEXT: .save {r4, r5, r7, lr}
; CHECK-NEXT: push {r4, r5, r7, lr}		; CHECK-NEXT: push {r4, r5, r7, lr}
; CHECK-NEXT: cmp r2, #0		; CHECK-NEXT: cbz r2, .LBB1_8
; CHECK-NEXT: beq .LBB1_8
; CHECK-NEXT: @ %bb.1: @ %while.body.preheader		; CHECK-NEXT: @ %bb.1: @ %while.body.preheader
; CHECK-NEXT: cmp r2, #4		; CHECK-NEXT: cmp r2, #4
; CHECK-NEXT: blo .LBB1_9		; CHECK-NEXT: blo .LBB1_9
; CHECK-NEXT: @ %bb.2: @ %vector.memcheck		; CHECK-NEXT: @ %bb.2: @ %vector.memcheck
; CHECK-NEXT: add.w r3, r0, r2, lsl #3		; CHECK-NEXT: add.w r3, r0, r2, lsl #3
; CHECK-NEXT: cmp r3, r1		; CHECK-NEXT: cmp r3, r1
; CHECK-NEXT: itt hi		; CHECK-NEXT: itt hi
; CHECK-NEXT: addhi.w r3, r1, r2, lsl #2		; CHECK-NEXT: addhi.w r3, r1, r2, lsl #2
[115 lines not shown]


Revision Contents

Diff 310836

llvm/include/llvm/CodeGen/TargetInstrInfo.h

llvm/lib/CodeGen/CalcSpillWeights.cpp

llvm/lib/CodeGen/MachineVerifier.cpp

llvm/lib/CodeGen/PHIElimination.cpp

llvm/lib/Target/ARM/ARMBaseInstrInfo.h

llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp

llvm/lib/Target/ARM/ARMInstrThumb2.td

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp

llvm/lib/Target/ARM/MVEVPTOptimisationsPass.cpp

llvm/test/CodeGen/Thumb2/LowOverheadLoops/count_dominates_start.mir

llvm/test/CodeGen/Thumb2/LowOverheadLoops/fast-fp-loops.ll

llvm/test/CodeGen/Thumb2/LowOverheadLoops/minloop.ll

llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-float-loops.ll

llvm/test/CodeGen/Thumb2/mve-float32regloops.ll

llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll

llvm/test/CodeGen/Thumb2/mve-postinc-lsr.ll

llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll

llvm/test/CodeGen/Thumb2/mve-vldshuffle.ll
