This is an archive of the discontinued LLVM Phabricator instance.


Differential D75533

[ARM][LowOverheadLoops] Handle reductions
ClosedPublic

Authored by samparker on Mar 3 2020, 8:34 AM.


Details

Reviewers

SjoerdMeijer
dmgreen

Commits

rG3ee580d0176f: [ARM][LowOverheadLoops] Handle reductions

Summary

While validating live-out values, record instructions that look like a reduction. Such a sequence comprises a vector op (for now only vadd), a vorr (vmov) which stores the previous value of the vadd, and a vpsel in the exit block which is predicated upon a vctp. The vpsel combines the last two iterations, using the vmov and vadd, into a vector which can then be consumed by a vaddv.
Once we have determined that it's safe to perform tail-predication, we need to change this sequence of instructions so that the predication doesn't produce incorrect code. This involves changing the register allocation of the vadd so that it updates itself, meaning the predication on the final iteration will not update the falsely predicated lanes. This mimics what the vmov, vctp and vpsel do, so none of those instructions are needed any more.
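
The equivalence described in the summary can be illustrated with a small scalar model. This is only a sketch under stated assumptions, not the pass itself: the 4-lane `Vec`, the helper names (`vctp`, `loadChunk`, `vaddv`, `reduceWithVpsel`, `reduceTailPredicated`) and the sentinel placed in inactive lanes are all hypothetical and chosen purely for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int Lanes = 4;
using Vec = std::array<int32_t, Lanes>;
using Mask = std::array<bool, Lanes>;

// Model of vctp: the first min(Remaining, Lanes) lanes are active.
Mask vctp(int Remaining) {
  Mask M{};
  for (int I = 0; I < Lanes; ++I)
    M[I] = I < Remaining;
  return M;
}

// Stand-in for the loop body input. Active lanes hold data; inactive lanes
// hold an arbitrary sentinel (an assumption of this model) to represent
// values that must not end up in the final sum.
Vec loadChunk(const std::vector<int32_t> &Data, int Base, const Mask &M) {
  assert(Base >= 0 && "Base must be non-negative");
  Vec V{};
  for (int I = 0; I < Lanes; ++I)
    V[I] = M[I] ? Data[Base + I] : 12345;
  return V;
}

// Model of vaddv: sum across all lanes.
int32_t vaddv(const Vec &V) {
  int32_t Sum = 0;
  for (int32_t X : V)
    Sum += X;
  return Sum;
}

// Original form: unpredicated vadd, a vmov keeping the previous value,
// and a vpsel in the exit block that discards the inactive lanes.
int32_t reduceWithVpsel(const std::vector<int32_t> &Data) {
  Vec Acc{}, Prev{};
  Mask Last{};
  const int N = static_cast<int>(Data.size());
  for (int Base = 0; Base < N; Base += Lanes) {
    Last = vctp(N - Base);
    Vec In = loadChunk(Data, Base, Last);
    Prev = Acc;                           // vmov (vorr)
    for (int I = 0; I < Lanes; ++I)
      Acc[I] += In[I];                    // vadd on every lane
  }
  Vec Sel{};
  for (int I = 0; I < Lanes; ++I)
    Sel[I] = Last[I] ? Acc[I] : Prev[I];  // vpsel
  return vaddv(Sel);
}

// Tail-predicated form: the vadd updates itself and inactive lanes are
// simply left untouched, so no vmov or vpsel is required.
int32_t reduceTailPredicated(const std::vector<int32_t> &Data) {
  Vec Acc{};
  const int N = static_cast<int>(Data.size());
  for (int Base = 0; Base < N; Base += Lanes) {
    Mask M = vctp(N - Base);
    Vec In = loadChunk(Data, Base, M);
    for (int I = 0; I < Lanes; ++I)
      if (M[I])
        Acc[I] += In[I];                  // predicated vadd
  }
  return vaddv(Acc);
}
```

In this model both functions return the plain sum of the input for any length, including lengths that are not a multiple of the lane count, which is the property the fixup relies on.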

Diff Detail

Event Timeline

samparker created this revision. Mar 3 2020, 8:34 AM

Herald added a project: Restricted Project. Mar 3 2020, 8:34 AM

Herald added subscribers: hiraditya, kristof.beyls.

samparker added a parent revision: D75452: [ARM][MVE] Validate tail predication values. Mar 3 2020, 8:35 AM

Harbormaster failed remote builds in B47916: Diff 247913! Mar 3 2020, 8:53 AM

Big patch: this is just a first scan of the code, and a first round of nits. Now going to look again, to let things sink in.

llvm/include/llvm/CodeGen/ReachingDefAnalysis.h
191–192	you renamed this `Defs` to `Incoming`...
193	So rename this one too?
271	and here too?
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
205	Can you comment what these members are? I agree that most are self-explanatory, but I am e.g. interested in `Init`, and if it e.g. can be null (there is a check in the fixup function), and what the meaning is of that.
574	nit: perhaps an assert that VPSEL is a vpsel would be good.
594	nit, I think nicer to read is: MO.getReg() == 0 -> !MO.getReg().isValid()
602	Nit: just a bit shorter would be: for (auto &MO : MI->uses()) { if (MO.isImm() && MO.getImm() == Imm) return true; return false;
650	nit: was just curious why we expect the first item in the set to be the vpsel. Can we rely on that with a set?
664	Could this be a good candidate for a helper function in ARMBaseInstrInfo.h?
681	I guess you mean VMOV can be an alias for VORR, which you're checking here?
708	Can or should this not be checked much earlier?
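
The shorter loop suggested in the comment on line 602 could look roughly like the sketch below. `Operand` is a hypothetical stand-in for `MachineOperand` so that the example is self-contained, and `hasImmOperand` is an illustrative name rather than a function from the patch.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, minimal stand-in for llvm::MachineOperand.
struct Operand {
  bool IsImm;
  int64_t Imm;
  bool isImm() const { return IsImm; }
  int64_t getImm() const { return Imm; }
};

// Return true if any operand is an immediate equal to Imm.
bool hasImmOperand(const std::vector<Operand> &Uses, int64_t Imm) {
  for (const Operand &MO : Uses)
    if (MO.isImm() && MO.getImm() == Imm)
      return true;
  return false;
}
```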

samparker marked 5 inline comments as done. Mar 16 2020, 8:28 AM

samparker added inline comments.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
205	'Init' is the instruction, if any, that may be initialising our result register, such as a mov #0, but we won't necessarily have an instruction doing this.
650	Because the set only has one member if we get here.
664	I'm not sure how relevant it is to the rest of the backend, but it would be more readable here as a local helper - especially once we add more supported opcodes.
681	Yes, I don't think we actually have a MVE VMOV instruction.
708	Yes, at some point in an unknown patch. I've moved it into one of the legality helpers.
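
The replies on lines 205, 650 and 681 can be summarised in a small self-contained sketch. The types below (`Opcode`, `Instr`, and this simplified `Reduction`) are hypothetical stand-ins rather than the LLVM classes, and `isVectorCopy` and `getSingleUser` are illustrative names only.

```cpp
#include <set>

// Hypothetical, simplified stand-ins for the LLVM types in the patch.
enum Opcode { VORR, VADD, VPSEL, MOV };

struct Instr {
  Opcode Opc;
  unsigned Dst;
  unsigned Src1;
  unsigned Src2;
};

// A VORR whose two source registers are identical simply copies that
// register, which is how a vector move is recognised in this model.
bool isVectorCopy(const Instr &MI) {
  return MI.Opc == VORR && MI.Src1 == MI.Src2;
}

// The initialising instruction is optional, so it is held as a pointer that
// may be null; the other members are always present and held by reference.
struct Reduction {
  const Instr *Init;
  const Instr &Copy;
  const Instr &Reduce;
  const Instr &Sel;
};

// Taking the first element of a set is only unambiguous when the set is
// known to hold exactly one member.
const Instr *getSingleUser(const std::set<const Instr *> &Users) {
  return Users.size() == 1 ? *Users.begin() : nullptr;
}
```

Using a pointer for `Init` and references for the rest makes the "may be absent" case visible in the type, which is the distinction the reviewer asked to have documented.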

Rebased, which has made checking the VPSEL predicate a much simpler task.

Herald added a subscriber: danielkiss. Jun 29 2020, 5:33 AM

SjoerdMeijer mentioned this in D82773: [ARM][MVE] Tail-predication: clean-up removing unused code. Jun 29 2020, 8:04 AM

SjoerdMeijer added inline comments. Jun 29 2020, 8:26 AM

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
684	typo: that
808	Perhaps it's good to add a comment here, or in the description of the algorithm on lines 733-753, that reductions need special treatment as they define values that are not used by predicated instructions inside the loop.
1336	Do we have a test case with more than 1 reduction?

Rebased after adding tests in reductions.ll.
Re-added support for VADDi8 and VADDi16.
Added some extra comments and TODOs.

Thanks, nice optimisation.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
277	is it not const anymore?
693	Do we have a (negative) test case with a float reduction?
737	nit: perhaps move this comment down a bit...
739	...and check this earlier.

This revision is now accepted and ready to land. Jun 30 2020, 8:39 AM

SjoerdMeijer mentioned this in rGaf45907653fd: [ARM][MVE] Tail-predication: clean-up of unused code. Jun 30 2020, 9:13 AM

samparker marked an inline comment as done. Jul 1 2020, 12:24 AM

samparker added inline comments.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
693	The vectorizer doesn't seem to want to produce vector float reduction loops, so I've left them out of testing.

Closed by commit rG3ee580d0176f: [ARM][LowOverheadLoops] Handle reductions (authored by samparker). Jul 1 2020, 1:02 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/ReachingDefAnalysis.h	1 line
llvm/lib/CodeGen/ReachingDefAnalysis.cpp	6 lines
llvm/lib/Target/ARM/ARMBaseInstrInfo.h	1 line
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp	231 lines
llvm/test/CodeGen/Thumb2/LowOverheadLoops/vector-arith-codegen.ll	65 lines

Diff 274072

llvm/include/llvm/CodeGen/ReachingDefAnalysis.h

Show First 20 Lines • Show All 182 Lines • ▼ Show 20 Lines	public:
/// Provides the uses, in the same block as MI, of register that MI defines.		/// Provides the uses, in the same block as MI, of register that MI defines.
/// This does not consider live-outs.		/// This does not consider live-outs.
void getReachingLocalUses(MachineInstr *MI, int PhysReg,		void getReachingLocalUses(MachineInstr *MI, int PhysReg,
InstSet &Uses) const;		InstSet &Uses) const;

/// Search MBB for a definition of PhysReg and insert it into Defs. If no		/// Search MBB for a definition of PhysReg and insert it into Defs. If no
/// definition is found, recursively search the predecessor blocks for them.		/// definition is found, recursively search the predecessor blocks for them.
void getLiveOuts(MachineBasicBlock *MBB, int PhysReg, InstSet &Defs,		void getLiveOuts(MachineBasicBlock *MBB, int PhysReg, InstSet &Defs,
BlockSet &VisitedBBs) const;		BlockSet &VisitedBBs) const;
		void getLiveOuts(MachineBasicBlock *MBB, int PhysReg, InstSet &Defs) const;

/// For the given block, collect the instructions that use the live-in		/// For the given block, collect the instructions that use the live-in
/// value of the provided register. Return whether the value is still		/// value of the provided register. Return whether the value is still
/// live on exit.		/// live on exit.
bool getLiveInUses(MachineBasicBlock *MBB, int PhysReg,		bool getLiveInUses(MachineBasicBlock *MBB, int PhysReg,
InstSet &Uses) const;		InstSet &Uses) const;

/// Collect the users of the value stored in PhysReg, which is defined		/// Collect the users of the value stored in PhysReg, which is defined
/// by MI.		/// by MI.
▲ Show 20 Lines • Show All 61 Lines • ▼ Show 20 Lines	private:
MachineInstr *getInstFromId(MachineBasicBlock *MBB, int InstId) const;		MachineInstr *getInstFromId(MachineBasicBlock *MBB, int InstId) const;

/// Provides the instruction of the closest reaching def instruction of		/// Provides the instruction of the closest reaching def instruction of
/// PhysReg that reaches MI, relative to the begining of MI's basic block.		/// PhysReg that reaches MI, relative to the begining of MI's basic block.
MachineInstr *getReachingLocalMIDef(MachineInstr *MI, int PhysReg) const;		MachineInstr *getReachingLocalMIDef(MachineInstr *MI, int PhysReg) const;
};		};

} // namespace llvm		} // namespace llvm

#endif // LLVM_CODEGEN_REACHINGDEFSANALYSIS_H		#endif // LLVM_CODEGEN_REACHINGDEFSANALYSIS_H

llvm/lib/CodeGen/ReachingDefAnalysis.cpp

Show First 20 Lines • Show All 383 Lines • ▼ Show 20 Lines	while (!ToVisit.empty()) {
if (getLiveInUses(MBB, PhysReg, Uses))		if (getLiveInUses(MBB, PhysReg, Uses))
ToVisit.insert(ToVisit.end(), MBB->successors().begin(),		ToVisit.insert(ToVisit.end(), MBB->successors().begin(),
MBB->successors().end());		MBB->successors().end());
Visited.insert(MBB);		Visited.insert(MBB);
}		}
}		}
}		}

		void ReachingDefAnalysis::getLiveOuts(MachineBasicBlock *MBB, int PhysReg,
		InstSet &Defs) const {
		SmallPtrSet<MachineBasicBlock*, 2> VisitedBBs;
		getLiveOuts(MBB, PhysReg, Defs, VisitedBBs);
		}

void		void
ReachingDefAnalysis::getLiveOuts(MachineBasicBlock *MBB, int PhysReg,		ReachingDefAnalysis::getLiveOuts(MachineBasicBlock *MBB, int PhysReg,
InstSet &Defs, BlockSet &VisitedBBs) const {		InstSet &Defs, BlockSet &VisitedBBs) const {
if (VisitedBBs.count(MBB))		if (VisitedBBs.count(MBB))
return;		return;

VisitedBBs.insert(MBB);		VisitedBBs.insert(MBB);
LivePhysRegs LiveRegs(*TRI);		LivePhysRegs LiveRegs(*TRI);
▲ Show 20 Lines • Show All 272 Lines • Show Last 20 Lines
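
The new `getLiveOuts` overload above follows a common pattern: a convenience entry point that owns the visited set and forwards to the recursive worker. A minimal sketch of that pattern, using a hypothetical `Block` graph instead of `MachineBasicBlock` and an illustrative function name `collectDefBlocks`, might look like this:

```cpp
#include <set>
#include <vector>

// Hypothetical stand-in for a basic block in a control-flow graph.
struct Block {
  int Id;
  bool DefinesReg;
  std::vector<const Block *> Preds;
};

// Recursive worker: record this block if it defines the register,
// otherwise keep searching through its predecessors. Visited guards
// against cycles such as loop back-edges.
void collectDefBlocks(const Block *B, std::vector<int> &Defs,
                      std::set<const Block *> &Visited) {
  if (!Visited.insert(B).second)
    return;
  if (B->DefinesReg) {
    Defs.push_back(B->Id);
    return;
  }
  for (const Block *Pred : B->Preds)
    collectDefBlocks(Pred, Defs, Visited);
}

// Convenience overload: callers that do not care about the visited set
// need not construct one themselves.
void collectDefBlocks(const Block *B, std::vector<int> &Defs) {
  std::set<const Block *> Visited;
  collectDefBlocks(B, Defs, Visited);
}
```

Keeping the visited set out of the public signature avoids callers accidentally reusing a stale set across unrelated queries.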

llvm/lib/Target/ARM/ARMBaseInstrInfo.h

	Show First 20 Lines • Show All 492 Lines • ▼ Show 20 Lines
	Opc == ARM::tSUBi3 || Opc == ARM::tSUBi8 ||			Opc == ARM::tSUBi3 || Opc == ARM::tSUBi8 ||
	Opc == ARM::tSUBSi3 || Opc == ARM::tSUBSi8 ||			Opc == ARM::tSUBSi3 || Opc == ARM::tSUBSi8 ||
	Opc == ARM::t2SUBri || Opc == ARM::t2SUBri12 || Opc == ARM::t2SUBSri;			Opc == ARM::t2SUBri || Opc == ARM::t2SUBri12 || Opc == ARM::t2SUBSri;
	}			}

	static inline bool isMovRegOpcode(int Opc) {			static inline bool isMovRegOpcode(int Opc) {
	return Opc == ARM::MOVr || Opc == ARM::tMOVr || Opc == ARM::t2MOVr;			return Opc == ARM::MOVr || Opc == ARM::tMOVr || Opc == ARM::t2MOVr;
	}			}

	/// isValidCoprocessorNumber - decide whether an explicit coprocessor			/// isValidCoprocessorNumber - decide whether an explicit coprocessor
	/// number is legal in generic instructions like CDP. The answer can			/// number is legal in generic instructions like CDP. The answer can
	/// vary with the subtarget.			/// vary with the subtarget.
	static inline bool isValidCoprocessorNumber(unsigned Num,			static inline bool isValidCoprocessorNumber(unsigned Num,
	const FeatureBitset& featureBits) {			const FeatureBitset& featureBits) {
	// In Armv7 and Armv8-M CP10 and CP11 clash with VFP/NEON, however, the			// In Armv7 and Armv8-M CP10 and CP11 clash with VFP/NEON, however, the
	// coprocessor is still valid for CDP/MCR/MRC and friends. Allowing it is			// coprocessor is still valid for CDP/MCR/MRC and friends. Allowing it is
	// useful for code which is shared with older architectures which do not know			// useful for code which is shared with older architectures which do not know
	▲ Show 20 Lines • Show All 171 Lines • Show Last 20 Lines

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp

Show First 20 Lines • Show All 195 Lines • ▼ Show 20 Lines	public:
}		}

unsigned size() const { return Insts.size(); }		unsigned size() const { return Insts.size(); }
SmallVectorImpl<PredicatedMI> &getInsts() { return Insts; }		SmallVectorImpl<PredicatedMI> &getInsts() { return Insts; }
MachineInstr *getPredicateThen() const { return PredicateThen->MI; }		MachineInstr *getPredicateThen() const { return PredicateThen->MI; }
PredicatedMI *getDivergent() const { return Divergent; }		PredicatedMI *getDivergent() const { return Divergent; }
};		};

		struct Reduction {
		MachineInstr *Init;
		MachineInstr &Copy;
		MachineInstr &Reduce;
		MachineInstr &VPSEL;

		Reduction(MachineInstr *Init, MachineInstr *Mov, MachineInstr *Add,
		MachineInstr *Sel)
		: Init(Init), Copy(*Mov), Reduce(*Add), VPSEL(*Sel) { }
		};

struct LowOverheadLoop {		struct LowOverheadLoop {

MachineLoop &ML;		MachineLoop &ML;
		MachineBasicBlock *Preheader = nullptr;
MachineLoopInfo &MLI;		MachineLoopInfo &MLI;
ReachingDefAnalysis &RDA;		ReachingDefAnalysis &RDA;
const TargetRegisterInfo &TRI;		const TargetRegisterInfo &TRI;
		const ARMBaseInstrInfo &TII;
MachineFunction *MF = nullptr;		MachineFunction *MF = nullptr;
MachineInstr *InsertPt = nullptr;		MachineInstr *InsertPt = nullptr;
MachineInstr *Start = nullptr;		MachineInstr *Start = nullptr;
MachineInstr *Dec = nullptr;		MachineInstr *Dec = nullptr;
MachineInstr *End = nullptr;		MachineInstr *End = nullptr;
MachineInstr *VCTP = nullptr;		MachineInstr *VCTP = nullptr;
SmallPtrSet<MachineInstr*, 4> SecondaryVCTPs;		SmallPtrSet<MachineInstr*, 4> SecondaryVCTPs;
VPTBlock *CurrentBlock = nullptr;		VPTBlock *CurrentBlock = nullptr;
SetVector<MachineInstr*> CurrentPredicate;		SetVector<MachineInstr*> CurrentPredicate;
SmallVector<VPTBlock, 4> VPTBlocks;		SmallVector<VPTBlock, 4> VPTBlocks;
SmallPtrSet<MachineInstr*, 4> ToRemove;		SmallPtrSet<MachineInstr*, 4> ToRemove;
		SmallVector<std::unique_ptr<Reduction>, 1> Reductions;
SmallPtrSet<MachineInstr*, 4> BlockMasksToRecompute;		SmallPtrSet<MachineInstr*, 4> BlockMasksToRecompute;
bool Revert = false;		bool Revert = false;
bool CannotTailPredicate = false;		bool CannotTailPredicate = false;

LowOverheadLoop(MachineLoop &ML, MachineLoopInfo &MLI,		LowOverheadLoop(MachineLoop &ML, MachineLoopInfo &MLI,
ReachingDefAnalysis &RDA, const TargetRegisterInfo &TRI)		ReachingDefAnalysis &RDA, const TargetRegisterInfo &TRI,
: ML(ML), MLI(MLI), RDA(RDA), TRI(TRI) {		const ARMBaseInstrInfo &TII)
		: ML(ML), MLI(MLI), RDA(RDA), TRI(TRI), TII(TII) {
MF = ML.getHeader()->getParent();		MF = ML.getHeader()->getParent();
		if (auto *MBB = ML.getLoopPreheader())
		Preheader = MBB;
		else if (auto *MBB = MLI.findLoopPreheader(&ML, true))
		Preheader = MBB;
}		}

// If this is an MVE instruction, check that we know how to use tail		// If this is an MVE instruction, check that we know how to use tail
// predication with it. Record VPT blocks and return whether the		// predication with it. Record VPT blocks and return whether the
// instruction is valid for tail predication.		// instruction is valid for tail predication.
bool ValidateMVEInst(MachineInstr *MI);		bool ValidateMVEInst(MachineInstr *MI);

void AnalyseMVEInst(MachineInstr *MI) {		void AnalyseMVEInst(MachineInstr *MI) {
CannotTailPredicate = !ValidateMVEInst(MI);		CannotTailPredicate = !ValidateMVEInst(MI);
}		}

bool IsTailPredicationLegal() const {		bool IsTailPredicationLegal() const {
// For now, let's keep things really simple and only support a single		// For now, let's keep things really simple and only support a single
// block for tail predication.		// block for tail predication.
return !Revert && FoundAllComponents() && VCTP &&		return !Revert && FoundAllComponents() && VCTP &&
!CannotTailPredicate && ML.getNumBlocks() == 1;		!CannotTailPredicate && ML.getNumBlocks() == 1;
}		}

// Check that the predication in the loop will be equivalent once we		// Check that the predication in the loop will be equivalent once we
// perform the conversion. Also ensure that we can provide the number		// perform the conversion. Also ensure that we can provide the number
// of elements to the loop start instruction.		// of elements to the loop start instruction.
bool ValidateTailPredicate(MachineInstr *StartInsertPt);		bool ValidateTailPredicate(MachineInstr *StartInsertPt);

		// See whether the live-out instructions are a reduction that we can fixup
		// later.
		bool FindValidReduction(InstSet &LiveMIs, InstSet &LiveOutUsers);

// Check that any values available outside of the loop will be the same		// Check that any values available outside of the loop will be the same
// after tail predication conversion.		// after tail predication conversion.
bool ValidateLiveOuts() const;		bool ValidateLiveOuts();

// Is it safe to define LR with DLS/WLS?		// Is it safe to define LR with DLS/WLS?
// LR can be defined if it is the operand to start, because it's the same		// LR can be defined if it is the operand to start, because it's the same
// value, or if it's going to be equivalent to the operand to Start.		// value, or if it's going to be equivalent to the operand to Start.
MachineInstr *isSafeToDefineLR();		MachineInstr *isSafeToDefineLR();

// Check the branch targets are within range and we satisfy our		// Check the branch targets are within range and we satisfy our
// restrictions.		// restrictions.
▲ Show 20 Lines • Show All 73 Lines • ▼ Show 20 Lines	private:
void RevertWhile(MachineInstr *MI) const;		void RevertWhile(MachineInstr *MI) const;

bool RevertLoopDec(MachineInstr *MI) const;		bool RevertLoopDec(MachineInstr *MI) const;

void RevertLoopEnd(MachineInstr *MI, bool SkipCmp = false) const;		void RevertLoopEnd(MachineInstr *MI, bool SkipCmp = false) const;

void ConvertVPTBlocks(LowOverheadLoop &LoLoop);		void ConvertVPTBlocks(LowOverheadLoop &LoLoop);

		void FixupReductions(LowOverheadLoop &LoLoop) const;

MachineInstr *ExpandLoopStart(LowOverheadLoop &LoLoop);		MachineInstr *ExpandLoopStart(LowOverheadLoop &LoLoop);

void Expand(LowOverheadLoop &LoLoop);		void Expand(LowOverheadLoop &LoLoop);

void IterationCountDCE(LowOverheadLoop &LoLoop);		void IterationCountDCE(LowOverheadLoop &LoLoop);
};		};
}		}

▲ Show 20 Lines • Show All 124 Lines • ▼ Show 20 Lines	auto CannotProvideElements = [this](MachineBasicBlock *MBB,
// Don't continue searching up through multiple predecessors.		// Don't continue searching up through multiple predecessors.
if (MBB->pred_size() > 1)		if (MBB->pred_size() > 1)
return true;		return true;

return false;		return false;
};		};

// First, find the block that looks like the preheader.		// First, find the block that looks like the preheader.
MachineBasicBlock *MBB = MLI.findLoopPreheader(&ML, true);		MachineBasicBlock *MBB = Preheader;
if (!MBB) {		if (!MBB) {
LLVM_DEBUG(dbgs() << "ARM Loops: Didn't find preheader.\n");		LLVM_DEBUG(dbgs() << "ARM Loops: Didn't find preheader.\n");
return false;		return false;
}		}

// Then search backwards for a def, until we get to InsertBB.		// Then search backwards for a def, until we get to InsertBB.
while (MBB != InsertBB) {		while (MBB != InsertBB) {
if (CannotProvideElements(MBB, NumElements)) {		if (CannotProvideElements(MBB, NumElements)) {
▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	static bool isVectorPredicated(MachineInstr *MI) {
return PIdx != -1 && MI->getOperand(PIdx + 1).getReg() == ARM::VPR;		return PIdx != -1 && MI->getOperand(PIdx + 1).getReg() == ARM::VPR;
}		}

static bool isRegInClass(const MachineOperand &MO,		static bool isRegInClass(const MachineOperand &MO,
const TargetRegisterClass *Class) {		const TargetRegisterClass *Class) {
return MO.isReg() && MO.getReg() && Class->contains(MO.getReg());		return MO.isReg() && MO.getReg() && Class->contains(MO.getReg());
}		}

// MVE 'narrowing' operate on half a lane, reading from half and writing		// MVE 'narrowing' operate on half a lane, reading from half and writing
// to half, which are referred to has the top and bottom half. The other		// to half, which are referred to has the top and bottom half. The other
// half retains its previous value.		// half retains its previous value.
static bool retainsPreviousHalfElement(const MachineInstr &MI) {		static bool retainsPreviousHalfElement(const MachineInstr &MI) {
const MCInstrDesc &MCID = MI.getDesc();		const MCInstrDesc &MCID = MI.getDesc();
uint64_t Flags = MCID.TSFlags;		uint64_t Flags = MCID.TSFlags;
return (Flags & ARMII::RetainsPreviousHalfElement) != 0;		return (Flags & ARMII::RetainsPreviousHalfElement) != 0;
}		}

// Some MVE instructions read from the top/bottom halves of their operand(s)		// Some MVE instructions read from the top/bottom halves of their operand(s)
// and generate a vector result with result elements that are double the		// and generate a vector result with result elements that are double the
// width of the input.		// width of the input.
static bool producesDoubleWidthResult(const MachineInstr &MI) {		static bool producesDoubleWidthResult(const MachineInstr &MI) {
const MCInstrDesc &MCID = MI.getDesc();		const MCInstrDesc &MCID = MI.getDesc();
uint64_t Flags = MCID.TSFlags;		uint64_t Flags = MCID.TSFlags;
return (Flags & ARMII::DoubleWidthResult) != 0;		return (Flags & ARMII::DoubleWidthResult) != 0;
}		}

static bool isHorizontalReduction(const MachineInstr &MI) {		static bool isHorizontalReduction(const MachineInstr &MI) {
const MCInstrDesc &MCID = MI.getDesc();		const MCInstrDesc &MCID = MI.getDesc();
uint64_t Flags = MCID.TSFlags;		uint64_t Flags = MCID.TSFlags;
return (Flags & ARMII::HorizontalReduction) != 0;		return (Flags & ARMII::HorizontalReduction) != 0;
}		}

// Can this instruction generate a non-zero result when given only zeroed		// Can this instruction generate a non-zero result when given only zeroed
// operands? This allows us to know that, given operands with false bytes		// operands? This allows us to know that, given operands with false bytes
// zeroed by masked loads, that the result will also contain zeros in those		// zeroed by masked loads, that the result will also contain zeros in those
// bytes.		// bytes.
static bool canGenerateNonZeros(const MachineInstr &MI) {		static bool canGenerateNonZeros(const MachineInstr &MI) {

// Check for instructions which can write into a larger element size,		// Check for instructions which can write into a larger element size,
// possibly writing into a previous zero'd lane.		// possibly writing into a previous zero'd lane.
if (producesDoubleWidthResult(MI))		if (producesDoubleWidthResult(MI))
return true;		return true;

switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
default:		default:
Show All 31 Lines	if (!MO.isReg() || !MO.getReg())
continue;		continue;
if (!isRegInClass(MO, QPRs) && AllowScalars)		if (!isRegInClass(MO, QPRs) && AllowScalars)
continue;		continue;
if (auto *OpDef = RDA.getMIOperand(&MI, MO))		if (auto *OpDef = RDA.getMIOperand(&MI, MO))
if (FalseLanesZero.count(OpDef))		if (FalseLanesZero.count(OpDef))
continue;		continue;
return false;		return false;
}		}
LLVM_DEBUG(dbgs() << "ARM Loops: Always False Zeros: " << MI);		LLVM_DEBUG(dbgs() << "ARM Loops: Always False Zeros: " << MI);
return true;		return true;
}		}

bool LowOverheadLoop::ValidateLiveOuts() const {		bool
		LowOverheadLoop::FindValidReduction(InstSet &LiveMIs, InstSet &LiveOutUsers) {
		// Also check for reductions where the operation needs to be merging values
		// from the last and previous loop iterations. This means an instruction
		// producing a value and a vmov storing the value calculated in the previous
		// iteration. So we can have two live-out regs, one produced by a vmov and
		// both being consumed by a vpsel.
		LLVM_DEBUG(dbgs() << "ARM Loops: Looking for reduction live-outs:\n";
		for (auto *MI : LiveMIs)
		dbgs() << " - " << *MI);

		// Expect a vmov, a vadd and a single vpsel user.
		if (LiveMIs.size() != 2 || LiveOutUsers.size() != 1)
		return false;

		MachineInstr *VPSEL = *LiveOutUsers.begin();
		if (VPSEL->getOpcode() != ARM::MVE_VPSEL)
		return false;

		unsigned VPRIdx = llvm::findFirstVPTPredOperandIdx(*VPSEL) + 1;
		MachineInstr *Pred = RDA.getMIOperand(VPSEL, VPRIdx);
		if (!Pred || Pred != VCTP) {
		LLVM_DEBUG(dbgs() << "ARM Loops: Not using equivalent predicate.\n");
		return false;
		}

		MachineInstr *Reduce = RDA.getMIOperand(VPSEL, 1);
		if (!Reduce)
		return false;

		// TODO: Support more operations that VADD.
		if (Reduce->getOpcode() != ARM::MVE_VADDi32)
		return false;

		// Check that the VORR is actually a VMOV.
		MachineInstr *Copy = RDA.getMIOperand(VPSEL, 2);
		if (!Copy || Copy->getOpcode() != ARM::MVE_VORR ||
		!Copy->getOperand(1).isReg() || !Copy->getOperand(2).isReg() ||
		Copy->getOperand(1).getReg() != Copy->getOperand(2).getReg())
		return false;

		assert((LiveMIs.count(Reduce) && LiveMIs.count(Copy)) &&
		"Expected live outs to be consumed by vpsel");

		assert((Reduce->getOperand(0).getReg() == Reduce->getOperand(1).getReg() ||
		Reduce->getOperand(0).getReg() == Reduce->getOperand(2).getReg()) &&
		"Expected VADD to be overwriting one of its operands");

		// Check that the vadd and vmov are only used by each other and the vpsel.
		SmallPtrSet<MachineInstr*, 2> CopyUsers;
		RDA.getGlobalUses(Copy, Copy->getOperand(0).getReg(), CopyUsers);
		if (CopyUsers.size() > 2 || !CopyUsers.count(Reduce))
		return false;

		SmallPtrSet<MachineInstr*, 2> ReduceUsers;
		RDA.getGlobalUses(Reduce, Reduce->getOperand(0).getReg(), ReduceUsers);
		if (ReduceUsers.size() > 2 || !ReduceUsers.count(Copy))
		return false;

		// Then find whether there's an instruction initialising the register that
		// is storing the reduction.
		if (!Preheader)
		return false;

		SmallPtrSet<MachineInstr*, 2> Incoming;
		RDA.getLiveOuts(Preheader, Copy->getOperand(1).getReg(), Incoming);
		if (Incoming.size() > 1)
		return false;

		MachineInstr *Init = Incoming.empty() ? nullptr : *Incoming.begin();
		LLVM_DEBUG(dbgs() << "ARM Loops: Found a reduction:\n"
		<< " - " << *Copy
		<< " - " << *Reduce
		<< " - " << *VPSEL);
		Reductions.push_back(std::make_unique<Reduction>(Init, Copy, Reduce, VPSEL));
		return true;
		}

		bool LowOverheadLoop::ValidateLiveOuts() {
// We want to find out if the tail-predicated version of this loop will		// We want to find out if the tail-predicated version of this loop will
// produce the same values as the loop in its original form. For this to		// produce the same values as the loop in its original form. For this to
// be true, the newly inserted implicit predication must not change the		// be true, the newly inserted implicit predication must not change the
// the (observable) results.		// the (observable) results.
// We're doing this because many instructions in the loop will not be		// We're doing this because many instructions in the loop will not be
// predicated and so the conversion from VPT predication to tail-predication		// predicated and so the conversion from VPT predication to tail-predication
// can result in different values being produced; due to the tail-predication		// can result in different values being produced; due to the tail-predication
// preventing many instructions from updating their falsely predicated		// preventing many instructions from updating their falsely predicated
// lanes. This analysis assumes that all the instructions perform lane-wise		// lanes. This analysis assumes that all the instructions perform lane-wise
// operations and don't perform any exchanges.		// operations and don't perform any exchanges.
// A masked load, whether through VPT or tail predication, will write zeros		// A masked load, whether through VPT or tail predication, will write zeros
// to any of the falsely predicated bytes. So, from the loads, we know that		// to any of the falsely predicated bytes. So, from the loads, we know that
// the false lanes are zeroed and here we're trying to track that those false		// the false lanes are zeroed and here we're trying to track that those false
// lanes remain zero, or where they change, the differences are masked away		// lanes remain zero, or where they change, the differences are masked away
// by their user(s).		// by their user(s).
// All MVE loads and stores have to be predicated, so we know that any load		// All MVE loads and stores have to be predicated, so we know that any load
// operands, or stored results are equivalent already. Other explicitly		// operands, or stored results are equivalent already. Other explicitly
// predicated instructions will perform the same operation in the original		// predicated instructions will perform the same operation in the original
// loop and the tail-predicated form too. Because of this, we can insert		// loop and the tail-predicated form too. Because of this, we can insert
// loads, stores and other predicated instructions into our Predicated		// loads, stores and other predicated instructions into our Predicated
// set and build from there.		// set and build from there.
const TargetRegisterClass *QPRs = TRI.getRegClass(ARM::MQPRRegClassID);		const TargetRegisterClass *QPRs = TRI.getRegClass(ARM::MQPRRegClassID);
SetVector<MachineInstr *> FalseLanesUnknown;		SetVector<MachineInstr *> FalseLanesUnknown;
SmallPtrSet<MachineInstr *, 4> FalseLanesZero;		SmallPtrSet<MachineInstr *, 4> FalseLanesZero;
SmallPtrSet<MachineInstr *, 4> Predicated;		SmallPtrSet<MachineInstr *, 4> Predicated;
MachineBasicBlock *MBB = ML.getHeader();		MachineBasicBlock *Header = ML.getHeader();

for (auto &MI : *MBB) {		for (auto &MI : *Header) {
const MCInstrDesc &MCID = MI.getDesc();		const MCInstrDesc &MCID = MI.getDesc();
uint64_t Flags = MCID.TSFlags;		uint64_t Flags = MCID.TSFlags;
if ((Flags & ARMII::DomainMask) != ARMII::DomainMVE)		if ((Flags & ARMII::DomainMask) != ARMII::DomainMVE)
continue;		continue;

if (isVCTP(&MI) \|\| isVPTOpcode(MI.getOpcode()))		if (isVCTP(&MI) \|\| isVPTOpcode(MI.getOpcode()))
continue;		continue;

[... 31 lines not shown ...]	auto HasPredicatedUsers = [this](MachineInstr *MI, const MachineOperand &MO,
}		}
return true;		return true;
};		};

// Visit the unknowns in reverse so that we can start at the values being		// Visit the unknowns in reverse so that we can start at the values being
// stored and then we can work towards the leaves, hopefully adding more		// stored and then we can work towards the leaves, hopefully adding more
// instructions to Predicated. Successfully terminating the loop means that		// instructions to Predicated. Successfully terminating the loop means that
// all the unknown values have been found to be masked by predicated user(s).		// all the unknown values have been found to be masked by predicated user(s).
		SmallPtrSet<MachineInstr*, 2> NonPredicated;
		SjoerdMeijer: Perhaps it's good to add a comment here, or in the description of the algorithm on lines 733 - 753, that reductions need special treatment as they define values that are not used by predicated instructions inside the loop.
for (auto *MI : reverse(FalseLanesUnknown)) {		for (auto *MI : reverse(FalseLanesUnknown)) {
for (auto &MO : MI->operands()) {		for (auto &MO : MI->operands()) {
if (!isRegInClass(MO, QPRs) \|\| !MO.isDef())		if (!isRegInClass(MO, QPRs) \|\| !MO.isDef())
continue;		continue;
if (!HasPredicatedUsers(MI, MO, Predicated)) {		if (!HasPredicatedUsers(MI, MO, Predicated)) {
LLVM_DEBUG(dbgs() << "ARM Loops: Found an unknown def of : "		LLVM_DEBUG(dbgs() << "ARM Loops: Found an unknown def of : "
<< TRI.getRegAsmName(MO.getReg()) << " at " << *MI);		<< TRI.getRegAsmName(MO.getReg()) << " at " << *MI);
return false;		NonPredicated.insert(MI);
		continue;
}		}
}		}
// Any unknown false lanes have been masked away by the user(s).		// Any unknown false lanes have been masked away by the user(s).
Predicated.insert(MI);		Predicated.insert(MI);
}		}

// Collect Q-regs that are live in the exit blocks. We don't collect scalars		SmallPtrSet<MachineInstr *, 2> LiveOutMIs;
// because they won't be affected by lane predication.		SmallPtrSet<MachineInstr*, 2> LiveOutUsers;
SmallSet<Register, 2> LiveOuts;
SmallVector<MachineBasicBlock *, 2> ExitBlocks;		SmallVector<MachineBasicBlock *, 2> ExitBlocks;
ML.getExitBlocks(ExitBlocks);		ML.getExitBlocks(ExitBlocks);
for (auto *MBB : ExitBlocks)
for (const MachineBasicBlock::RegisterMaskPair &RegMask : MBB->liveins())
if (QPRs->contains(RegMask.PhysReg))
LiveOuts.insert(RegMask.PhysReg);

// Collect the instructions in the loop body that define the live-out values.
SmallPtrSet<MachineInstr *, 2> LiveMIs;
assert(ML.getNumBlocks() == 1 && "Expected single block loop!");		assert(ML.getNumBlocks() == 1 && "Expected single block loop!");
for (auto Reg : LiveOuts)		assert(ExitBlocks.size() == 1 && "Expected a single exit block");
if (auto *MI = RDA.getLocalLiveOutMIDef(MBB, Reg))		MachineBasicBlock *ExitBB = ExitBlocks.front();
LiveMIs.insert(MI);		for (const MachineBasicBlock::RegisterMaskPair &RegMask : ExitBB->liveins()) {
		// Check Q-regs that are live in the exit blocks. We don't collect scalars
		// because they won't be affected by lane predication.
		if (QPRs->contains(RegMask.PhysReg)) {
		if (auto *MI = RDA.getLocalLiveOutMIDef(Header, RegMask.PhysReg))
		LiveOutMIs.insert(MI);
		RDA.getLiveInUses(ExitBB, RegMask.PhysReg, LiveOutUsers);
		}
		}

		// If we have any non-predicated live-outs, they need to be part of a
		// reduction that we can fixup later.
		if (!NonPredicated.empty() &&
		!FindValidReduction(NonPredicated, LiveOutUsers))
		return false;

LLVM_DEBUG(dbgs() << "ARM Loops: Found loop live-outs:\n";
for (auto *MI : LiveMIs)
dbgs() << " - " << *MI);
// We've already validated that any VPT predication within the loop will be		// We've already validated that any VPT predication within the loop will be
// equivalent when we perform the predication transformation; so we know that		// equivalent when we perform the predication transformation; so we know that
// any VPT predicated instruction is predicated upon VCTP. Any live-out		// any VPT predicated instruction is predicated upon VCTP. Any live-out
// instruction needs to be predicated, so check this here.		// instruction needs to be predicated, so check this here. The instructions
for (auto *MI : LiveMIs)		// in NonPredicated have been found to form a reduction whose legality
if (!isVectorPredicated(MI))		// we can ensure.
		for (auto *MI : LiveOutMIs)
		if (!isVectorPredicated(MI) && !NonPredicated.count(MI))
return false;		return false;

return true;		return true;
}		}
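
A simplified, standalone model of the backward walk above may help when reviewing it. It is only a sketch: `Inst`, `findNonPredicated` and the user-index representation are hypothetical and not part of the patch. The body of `HasPredicatedUsers` is not shown in this diff, so the sketch assumes it amounts to "every in-loop user of the def is already in Predicated", and it treats a def with no in-loop users as not masked.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a MachineInstr in a single-block loop.
struct Inst {
  bool ExplicitlyPredicated;      // loads, stores, VPT-predicated ops
  std::vector<std::size_t> Users; // indices of in-loop instructions reading the def
};

// Walks the unpredicated instructions bottom-up. An instruction joins
// Predicated once all of its users are known to mask its false lanes;
// anything else is returned as a candidate for the reduction check.
std::set<std::size_t> findNonPredicated(const std::vector<Inst> &Block) {
  std::set<std::size_t> Predicated, NonPredicated;
  for (std::size_t I = 0; I < Block.size(); ++I)
    if (Block[I].ExplicitlyPredicated)
      Predicated.insert(I);

  for (std::size_t I = Block.size(); I-- > 0;) {
    if (Block[I].ExplicitlyPredicated)
      continue;
    // Assumption of this sketch: no in-loop users means nothing masks the def.
    bool Masked = !Block[I].Users.empty();
    for (std::size_t U : Block[I].Users) {
      assert(U < Block.size() && "user index out of range");
      if (!Predicated.count(U))
        Masked = false;
    }
    if (Masked)
      Predicated.insert(I);
    else
      NonPredicated.insert(I);
  }
  return NonPredicated;
}
```

In this model a load, load, multiply, store chain yields an empty set, while an accumulator whose value is only consumed outside the loop ends up in the returned set. That is the case the patch hands to `FindValidReduction` instead of rejecting outright.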

void LowOverheadLoop::CheckLegality(ARMBasicBlockUtils *BBUtils) {		void LowOverheadLoop::CheckLegality(ARMBasicBlockUtils *BBUtils) {
if (Revert)		if (Revert)
return;		return;
[... 191 lines not shown ...]	for (auto &MI : *MBB) {
if (isLoopStart(MI))		if (isLoopStart(MI))
return &MI;		return &MI;
}		}
if (MBB->pred_size() == 1)		if (MBB->pred_size() == 1)
return SearchForStart(*MBB->pred_begin());		return SearchForStart(*MBB->pred_begin());
return nullptr;		return nullptr;
};		};

LowOverheadLoop LoLoop(ML, MLI, RDA, TRI);		LowOverheadLoop LoLoop(ML, MLI, RDA, TRI, *TII);
// Search the preheader for the start intrinsic.		// Search the preheader for the start intrinsic.
// FIXME: I don't see why we shouldn't be supporting multiple predecessors		// FIXME: I don't see why we shouldn't be supporting multiple predecessors
// with potentially multiple set.loop.iterations, so we need to enable this.		// with potentially multiple set.loop.iterations, so we need to enable this.
if (auto *Preheader = ML->getLoopPreheader())		if (LoLoop.Preheader)
LoLoop.Start = SearchForStart(Preheader);		LoLoop.Start = SearchForStart(LoLoop.Preheader);
else if (auto *Preheader = MLI->findLoopPreheader(ML, true))
LoLoop.Start = SearchForStart(Preheader);
else		else
return false;		return false;

// Find the low-overhead loop components and decide whether or not to fall		// Find the low-overhead loop components and decide whether or not to fall
// back to a normal loop. Also look for vctp instructions and decide		// back to a normal loop. Also look for vctp instructions and decide
// whether we can convert that predicate using tail predication.		// whether we can convert that predicate using tail predication.
for (auto *MBB : reverse(ML->getBlocks())) {		for (auto *MBB : reverse(ML->getBlocks())) {
for (auto &MI : *MBB) {		for (auto &MI : *MBB) {
[... 237 lines not shown ...]	MachineInstr* ARMLowOverheadLoops::ExpandLoopStart(LowOverheadLoop &LoLoop) {
// If we're inserting at a mov lr, then remove it as it's redundant.		// If we're inserting at a mov lr, then remove it as it's redundant.
if (InsertPt != Start)		if (InsertPt != Start)
LoLoop.ToRemove.insert(InsertPt);		LoLoop.ToRemove.insert(InsertPt);
LoLoop.ToRemove.insert(Start);		LoLoop.ToRemove.insert(Start);
LLVM_DEBUG(dbgs() << "ARM Loops: Inserted start: " << *MIB);		LLVM_DEBUG(dbgs() << "ARM Loops: Inserted start: " << *MIB);
return &*MIB;		return &*MIB;
}		}

		void ARMLowOverheadLoops::FixupReductions(LowOverheadLoop &LoLoop) const {
		LLVM_DEBUG(dbgs() << "ARM Loops: Fixing up reduction(s).\n");
		auto BuildMov = [this](MachineInstr &InsertPt, Register To, Register From) {
		MachineBasicBlock *MBB = InsertPt.getParent();
		MachineInstrBuilder MIB =
		BuildMI(*MBB, &InsertPt, InsertPt.getDebugLoc(), TII->get(ARM::MVE_VORR));
		MIB.addDef(To);
		MIB.addReg(From);
		MIB.addReg(From);
		MIB.addImm(0);
		MIB.addReg(0);
		MIB.addReg(To);
		LLVM_DEBUG(dbgs() << "ARM Loops: Inserted VMOV: " << *MIB);
		};

		for (auto &Reduction : LoLoop.Reductions) {
		SjoerdMeijer: Do we have a test case with more than 1 reduction?
		MachineInstr &Copy = Reduction->Copy;
		MachineInstr &Reduce = Reduction->Reduce;
		Register DestReg = Copy.getOperand(0).getReg();

		// Change the initialiser if present
		if (Reduction->Init) {
		MachineInstr *Init = Reduction->Init;

		for (unsigned i = 0; i < Init->getNumOperands(); ++i) {
		MachineOperand &MO = Init->getOperand(i);
		if (MO.isReg() && MO.isUse() && MO.isTied() &&
		Init->findTiedOperandIdx(i) == 0)
		Init->getOperand(i).setReg(DestReg);
		}
		Init->getOperand(0).setReg(DestReg);
		LLVM_DEBUG(dbgs() << "ARM Loops: Changed init regs: " << *Init);
		} else
		BuildMov(LoLoop.Preheader->instr_back(), DestReg, Copy.getOperand(1).getReg());

		// Change the reducing op to write to the register that is used to copy
		// its value on the next iteration. Also update the tied-def operand.
		Reduce.getOperand(0).setReg(DestReg);
		Reduce.getOperand(5).setReg(DestReg);
		LLVM_DEBUG(dbgs() << "ARM Loops: Changed reduction regs: " << Reduce);

		// Instead of a vpsel, just copy the register into the necessary one.
		MachineInstr &VPSEL = Reduction->VPSEL;
		if (VPSEL.getOperand(0).getReg() != DestReg)
		BuildMov(VPSEL, VPSEL.getOperand(0).getReg(), DestReg);

		// Remove the unnecessary instructions.
		LLVM_DEBUG(dbgs() << "ARM Loops: Removing:\n"
		<< " - " << Copy
		<< " - " << VPSEL << "\n");
		Copy.eraseFromParent();
		VPSEL.eraseFromParent();
		}
		}
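
To see why the fixup preserves the result, the two loop shapes can be modelled lane by lane. The following is a minimal sketch under stated assumptions, not code from the patch: four 32-bit lanes, and hypothetical helpers `vctp`, `maskedLoad`, `addAcross`, `reduceWithVpsel` and `reduceTailPredicated` that only imitate the instructions named in the summary.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t Lanes = 4;
using Vec = std::array<std::uint32_t, Lanes>;
using Mask = std::array<bool, Lanes>;

// Models vctp.32: lane L is active while L < Remaining.
Mask vctp(std::size_t Remaining) {
  Mask P{};
  for (std::size_t L = 0; L < Lanes; ++L)
    P[L] = L < Remaining;
  return P;
}

// Models a predicated vldrw: falsely predicated lanes read as zero.
Vec maskedLoad(const std::vector<std::uint32_t> &Data, std::size_t Offset,
               const Mask &P) {
  Vec Out{};
  for (std::size_t L = 0; L < Lanes; ++L)
    Out[L] = P[L] ? Data[Offset + L] : 0;
  return Out;
}

// Models vaddv: sums all lanes into a scalar.
std::uint32_t addAcross(const Vec &V) {
  std::uint32_t Sum = 0;
  for (std::uint32_t X : V)
    Sum += X;
  return Sum;
}

// Original shape: unpredicated vadd, vmov keeps the previous accumulator,
// and a vpsel in the exit block selects per lane using the last vctp.
std::uint32_t reduceWithVpsel(const std::vector<std::uint32_t> &A,
                              const std::vector<std::uint32_t> &B) {
  assert(A.size() == B.size() && "inputs must have equal length");
  Vec Acc{}, Prev{};
  Mask P{};
  for (std::size_t I = 0; I < A.size(); I += Lanes) {
    P = vctp(A.size() - I);
    Prev = Acc; // vmov q1, q0
    Vec X = maskedLoad(A, I, P), Y = maskedLoad(B, I, P);
    for (std::size_t L = 0; L < Lanes; ++L)
      Acc[L] = X[L] * Y[L] + Prev[L]; // vmul + vadd on all lanes
  }
  Vec Out{};
  for (std::size_t L = 0; L < Lanes; ++L)
    Out[L] = P[L] ? Acc[L] : Prev[L]; // vpsel
  return addAcross(Out);
}

// Fixed-up shape: the vadd updates its own register and the implicit
// predication leaves the falsely predicated lanes untouched.
std::uint32_t reduceTailPredicated(const std::vector<std::uint32_t> &A,
                                   const std::vector<std::uint32_t> &B) {
  assert(A.size() == B.size() && "inputs must have equal length");
  Vec Acc{};
  for (std::size_t I = 0; I < A.size(); I += Lanes) {
    Mask P = vctp(A.size() - I);
    Vec X = maskedLoad(A, I, P), Y = maskedLoad(B, I, P);
    for (std::size_t L = 0; L < Lanes; ++L)
      if (P[L])
        Acc[L] += X[L] * Y[L];
  }
  return addAcross(Acc);
}
```

For inputs whose length is not a multiple of four, both functions return the same dot product. This mirrors the claim in the summary that the self-updating vadd makes the vmov, vctp and vpsel unnecessary.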

void ARMLowOverheadLoops::ConvertVPTBlocks(LowOverheadLoop &LoLoop) {		void ARMLowOverheadLoops::ConvertVPTBlocks(LowOverheadLoop &LoLoop) {
auto RemovePredicate = [](MachineInstr *MI) {		auto RemovePredicate = [](MachineInstr *MI) {
LLVM_DEBUG(dbgs() << "ARM Loops: Removing predicate from: " << *MI);		LLVM_DEBUG(dbgs() << "ARM Loops: Removing predicate from: " << *MI);
if (int PIdx = llvm::findFirstVPTPredOperandIdx(*MI)) {		if (int PIdx = llvm::findFirstVPTPredOperandIdx(*MI)) {
assert(MI->getOperand(PIdx).getImm() == ARMVCC::Then &&		assert(MI->getOperand(PIdx).getImm() == ARMVCC::Then &&
"Expected Then predicate!");		"Expected Then predicate!");
MI->getOperand(PIdx).setImm(ARMVCC::None);		MI->getOperand(PIdx).setImm(ARMVCC::None);
MI->getOperand(PIdx+1).setReg(0);		MI->getOperand(PIdx+1).setReg(0);
[... 137 lines not shown ...]	else
LoLoop.Start->eraseFromParent();		LoLoop.Start->eraseFromParent();
bool FlagsAlreadySet = RevertLoopDec(LoLoop.Dec);		bool FlagsAlreadySet = RevertLoopDec(LoLoop.Dec);
RevertLoopEnd(LoLoop.End, FlagsAlreadySet);		RevertLoopEnd(LoLoop.End, FlagsAlreadySet);
} else {		} else {
LoLoop.Start = ExpandLoopStart(LoLoop);		LoLoop.Start = ExpandLoopStart(LoLoop);
RemoveDeadBranch(LoLoop.Start);		RemoveDeadBranch(LoLoop.Start);
LoLoop.End = ExpandLoopEnd(LoLoop);		LoLoop.End = ExpandLoopEnd(LoLoop);
RemoveDeadBranch(LoLoop.End);		RemoveDeadBranch(LoLoop.End);
if (LoLoop.IsTailPredicationLegal())		if (LoLoop.IsTailPredicationLegal()) {
ConvertVPTBlocks(LoLoop);		ConvertVPTBlocks(LoLoop);
		FixupReductions(LoLoop);
		}
for (auto *I : LoLoop.ToRemove) {		for (auto *I : LoLoop.ToRemove) {
LLVM_DEBUG(dbgs() << "ARM Loops: Erasing " << *I);		LLVM_DEBUG(dbgs() << "ARM Loops: Erasing " << *I);
I->eraseFromParent();		I->eraseFromParent();
}		}
for (auto *I : LoLoop.BlockMasksToRecompute) {		for (auto *I : LoLoop.BlockMasksToRecompute) {
LLVM_DEBUG(dbgs() << "ARM Loops: Recomputing VPT/VPST Block Mask: " << *I);		LLVM_DEBUG(dbgs() << "ARM Loops: Recomputing VPT/VPST Block Mask: " << *I);
recomputeVPTBlockMask(*I);		recomputeVPTBlockMask(*I);
LLVM_DEBUG(dbgs() << " ... done: " << *I);		LLVM_DEBUG(dbgs() << " ... done: " << *I);
[... 61 lines not shown ...]

llvm/test/CodeGen/Thumb2/LowOverheadLoops/vector-arith-codegen.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=armv8.1m.main -mattr=+mve -disable-mve-tail-predication=false --verify-machineinstrs %s -o - \| FileCheck %s			; RUN: llc -mtriple=armv8.1m.main -mattr=+mve -disable-mve-tail-predication=false --verify-machineinstrs %s -o - \| FileCheck %s

	define dso_local i32 @mul_reduce_add(i32* noalias nocapture readonly %a, i32* noalias nocapture readonly %b, i32 %N) {			define dso_local i32 @mul_reduce_add(i32* noalias nocapture readonly %a, i32* noalias nocapture readonly %b, i32 %N) {
	; CHECK-LABEL: mul_reduce_add:			; CHECK-LABEL: mul_reduce_add:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: cmp r2, #0			; CHECK-NEXT: cmp r2, #0
	; CHECK-NEXT: itt eq			; CHECK-NEXT: itt eq
	; CHECK-NEXT: moveq r0, #0			; CHECK-NEXT: moveq r0, #0
	; CHECK-NEXT: bxeq lr			; CHECK-NEXT: bxeq lr
	; CHECK-NEXT: push {r7, lr}			; CHECK-NEXT: push {r7, lr}
	; CHECK-NEXT: adds r3, r2, #3			; CHECK-NEXT: vmov.i32 q1, #0x0
	; CHECK-NEXT: vmov.i32 q0, #0x0
	; CHECK-NEXT: bic r3, r3, #3
	; CHECK-NEXT: sub.w r12, r3, #4
	; CHECK-NEXT: movs r3, #1
	; CHECK-NEXT: add.w lr, r3, r12, lsr #2
	; CHECK-NEXT: movs r3, #0			; CHECK-NEXT: movs r3, #0
	; CHECK-NEXT: dls lr, lr			; CHECK-NEXT: dlstp.32 lr, r2
	; CHECK-NEXT: .LBB0_1: @ %vector.body			; CHECK-NEXT: .LBB0_1: @ %vector.body
	; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1			; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: vctp.32 r2			; CHECK-NEXT: vldrw.u32 q0, [r0], #16
	; CHECK-NEXT: vmov q1, q0			; CHECK-NEXT: vldrw.u32 q2, [r1], #16
	; CHECK-NEXT: vpstt
	; CHECK-NEXT: vldrwt.u32 q0, [r0], #16
	; CHECK-NEXT: vldrwt.u32 q2, [r1], #16
	; CHECK-NEXT: adds r3, #4			; CHECK-NEXT: adds r3, #4
	; CHECK-NEXT: vmul.i32 q0, q2, q0			; CHECK-NEXT: vmul.i32 q0, q2, q0
	; CHECK-NEXT: subs r2, #4			; CHECK-NEXT: vadd.i32 q1, q0, q1
	; CHECK-NEXT: vadd.i32 q0, q0, q1			; CHECK-NEXT: letp lr, .LBB0_1
	; CHECK-NEXT: le lr, .LBB0_1
	; CHECK-NEXT: @ %bb.2: @ %middle.block			; CHECK-NEXT: @ %bb.2: @ %middle.block
	; CHECK-NEXT: vpsel q0, q0, q1			; CHECK-NEXT: vmov q0, q1
	; CHECK-NEXT: vaddv.u32 r0, q0			; CHECK-NEXT: vaddv.u32 r0, q0
	; CHECK-NEXT: pop {r7, pc}			; CHECK-NEXT: pop {r7, pc}
	entry:			entry:
	%cmp8 = icmp eq i32 %N, 0			%cmp8 = icmp eq i32 %N, 0
	br i1 %cmp8, label %for.cond.cleanup, label %vector.ph			br i1 %cmp8, label %for.cond.cleanup, label %vector.ph

	vector.ph: ; preds = %entry			vector.ph: ; preds = %entry
	%n.rnd.up = add i32 %N, 3			%n.rnd.up = add i32 %N, 3
	[... 38 lines not shown ...]
	define dso_local i32 @mul_reduce_add_const(i32* noalias nocapture readonly %a, i32 %b, i32 %N) {			define dso_local i32 @mul_reduce_add_const(i32* noalias nocapture readonly %a, i32 %b, i32 %N) {
	; CHECK-LABEL: mul_reduce_add_const:			; CHECK-LABEL: mul_reduce_add_const:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: cmp r2, #0			; CHECK-NEXT: cmp r2, #0
	; CHECK-NEXT: itt eq			; CHECK-NEXT: itt eq
	; CHECK-NEXT: moveq r0, #0			; CHECK-NEXT: moveq r0, #0
	; CHECK-NEXT: bxeq lr			; CHECK-NEXT: bxeq lr
	; CHECK-NEXT: push {r7, lr}			; CHECK-NEXT: push {r7, lr}
	; CHECK-NEXT: adds r1, r2, #3			; CHECK-NEXT: vmov.i32 q1, #0x0
	; CHECK-NEXT: movs r3, #1
	; CHECK-NEXT: bic r1, r1, #3
	; CHECK-NEXT: vmov.i32 q0, #0x0
	; CHECK-NEXT: subs r1, #4
	; CHECK-NEXT: add.w lr, r3, r1, lsr #2
	; CHECK-NEXT: movs r1, #0			; CHECK-NEXT: movs r1, #0
	; CHECK-NEXT: dls lr, lr			; CHECK-NEXT: dlstp.32 lr, r2
	; CHECK-NEXT: .LBB1_1: @ %vector.body			; CHECK-NEXT: .LBB1_1: @ %vector.body
	; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1			; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: vctp.32 r2			; CHECK-NEXT: vldrw.u32 q0, [r0], #16
	; CHECK-NEXT: vmov q1, q0
	; CHECK-NEXT: vpst
	; CHECK-NEXT: vldrwt.u32 q0, [r0], #16
	; CHECK-NEXT: adds r1, #4			; CHECK-NEXT: adds r1, #4
	; CHECK-NEXT: subs r2, #4			; CHECK-NEXT: vadd.i32 q1, q0, q1
	; CHECK-NEXT: vadd.i32 q0, q0, q1			; CHECK-NEXT: letp lr, .LBB1_1
	; CHECK-NEXT: le lr, .LBB1_1
	; CHECK-NEXT: @ %bb.2: @ %middle.block			; CHECK-NEXT: @ %bb.2: @ %middle.block
	; CHECK-NEXT: vpsel q0, q0, q1			; CHECK-NEXT: vmov q0, q1
	; CHECK-NEXT: vaddv.u32 r0, q0			; CHECK-NEXT: vaddv.u32 r0, q0
	; CHECK-NEXT: pop {r7, pc}			; CHECK-NEXT: pop {r7, pc}
	entry:			entry:
	%cmp6 = icmp eq i32 %N, 0			%cmp6 = icmp eq i32 %N, 0
	br i1 %cmp6, label %for.cond.cleanup, label %vector.ph			br i1 %cmp6, label %for.cond.cleanup, label %vector.ph

	vector.ph: ; preds = %entry			vector.ph: ; preds = %entry
	%n.rnd.up = add i32 %N, 3			%n.rnd.up = add i32 %N, 3
	[... 34 lines not shown ...]
	define dso_local i32 @add_reduce_add_const(i32* noalias nocapture readonly %a, i32 %b, i32 %N) {			define dso_local i32 @add_reduce_add_const(i32* noalias nocapture readonly %a, i32 %b, i32 %N) {
	; CHECK-LABEL: add_reduce_add_const:			; CHECK-LABEL: add_reduce_add_const:
	; CHECK: @ %bb.0: @ %entry			; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: cmp r2, #0			; CHECK-NEXT: cmp r2, #0
	; CHECK-NEXT: itt eq			; CHECK-NEXT: itt eq
	; CHECK-NEXT: moveq r0, #0			; CHECK-NEXT: moveq r0, #0
	; CHECK-NEXT: bxeq lr			; CHECK-NEXT: bxeq lr
	; CHECK-NEXT: push {r7, lr}			; CHECK-NEXT: push {r7, lr}
	; CHECK-NEXT: adds r1, r2, #3			; CHECK-NEXT: vmov.i32 q1, #0x0
	; CHECK-NEXT: movs r3, #1
	; CHECK-NEXT: bic r1, r1, #3
	; CHECK-NEXT: vmov.i32 q0, #0x0
	; CHECK-NEXT: subs r1, #4
	; CHECK-NEXT: add.w lr, r3, r1, lsr #2
	; CHECK-NEXT: movs r1, #0			; CHECK-NEXT: movs r1, #0
	; CHECK-NEXT: dls lr, lr			; CHECK-NEXT: dlstp.32 lr, r2
	; CHECK-NEXT: .LBB2_1: @ %vector.body			; CHECK-NEXT: .LBB2_1: @ %vector.body
	; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1			; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: vctp.32 r2			; CHECK-NEXT: vldrw.u32 q0, [r0], #16
	; CHECK-NEXT: vmov q1, q0
	; CHECK-NEXT: vpst
	; CHECK-NEXT: vldrwt.u32 q0, [r0], #16
	; CHECK-NEXT: adds r1, #4			; CHECK-NEXT: adds r1, #4
	; CHECK-NEXT: subs r2, #4			; CHECK-NEXT: vadd.i32 q1, q0, q1
	; CHECK-NEXT: vadd.i32 q0, q0, q1			; CHECK-NEXT: letp lr, .LBB2_1
	; CHECK-NEXT: le lr, .LBB2_1
	; CHECK-NEXT: @ %bb.2: @ %middle.block			; CHECK-NEXT: @ %bb.2: @ %middle.block
	; CHECK-NEXT: vpsel q0, q0, q1			; CHECK-NEXT: vmov q0, q1
	; CHECK-NEXT: vaddv.u32 r0, q0			; CHECK-NEXT: vaddv.u32 r0, q0
	; CHECK-NEXT: pop {r7, pc}			; CHECK-NEXT: pop {r7, pc}
	entry:			entry:
	%cmp6 = icmp eq i32 %N, 0			%cmp6 = icmp eq i32 %N, 0
	br i1 %cmp6, label %for.cond.cleanup, label %vector.ph			br i1 %cmp6, label %for.cond.cleanup, label %vector.ph

	vector.ph: ; preds = %entry			vector.ph: ; preds = %entry
	%n.rnd.up = add i32 %N, 3			%n.rnd.up = add i32 %N, 3
	[... 273 lines not shown ...]


Revision Contents

Diff 274072

llvm/include/llvm/CodeGen/ReachingDefAnalysis.h

llvm/lib/CodeGen/ReachingDefAnalysis.cpp

llvm/lib/Target/ARM/ARMBaseInstrInfo.h

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp

llvm/test/CodeGen/Thumb2/LowOverheadLoops/vector-arith-codegen.ll
