This is an archive of the discontinued LLVM Phabricator instance.

[ARM][ParallelDSP] Relax alias checks
ClosedPublic

Authored by samparker on Apr 23 2019, 8:25 AM.


Details

Reviewers

SjoerdMeijer
dmgreen

Summary

When deciding the safety of generating smlad, we checked for any writes within the block that may alias with any of the loads that need to be widened. This is overly conservative because it only matters when there's a potential aliasing write to a location accessed by a pair of loads.

Now we check for aliasing writes only once, during setup. If two loads are found to have an aliasing write between them, we don't add these loads to LoadPairs. This means that later, during the transform, we can safely widen a pair without worrying about aliasing.

However, to maintain correctness, we also need to change the way that wide loads are inserted because the order is now important.
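As a rough illustration of what a widened load has to reproduce, the sketch below models a 32-bit little-endian load of two adjacent 16-bit values, plus the trunc and lshr + trunc steps that recover the original base and offset values. It is a standalone model rather than code from the patch; `wideLoadLE`, `bottomHalf` and `topHalf` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Models the 32-bit value that a little-endian wide load would produce when
// BaseVal sits at the lower address and OffsetVal at the next halfword.
// (Hypothetical helper, not part of the pass.)
uint32_t wideLoadLE(int16_t BaseVal, int16_t OffsetVal) {
  return static_cast<uint32_t>(static_cast<uint16_t>(BaseVal)) |
         (static_cast<uint32_t>(static_cast<uint16_t>(OffsetVal)) << 16);
}

// Equivalent of "trunc": the bottom halfword is the original base load.
int16_t bottomHalf(uint32_t Wide) {
  return static_cast<int16_t>(static_cast<uint16_t>(Wide));
}

// Equivalent of "lshr 16" followed by "trunc": the top halfword is the
// original offset load.
int16_t topHalf(uint32_t Wide) {
  return static_cast<int16_t>(static_cast<uint16_t>(Wide >> 16));
}
```

The model builds the wide value arithmetically instead of reading memory, so it does not depend on the endianness of the machine it runs on.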

The MatchSMLAD method has also been changed, absorbing MatchReductions and AddMACCandidate to hopefully improve readability.
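For readers less familiar with the target instruction, here is a minimal reference model of the operation that MatchSMLAD is trying to form, based on the description in the pass's own comment: two signed 16x16-bit multiplications added to a 32-bit accumulator, with an optional halfword exchange of the second operand. The function names are hypothetical, and overflow simply wraps in this model, which is an assumption of the sketch rather than a statement about the hardware.

```cpp
#include <cassert>
#include <cstdint>

// Extract the low and high halfwords of a packed 32-bit operand as signed
// 16-bit values. (Hypothetical helpers for this sketch.)
static int16_t lo16(uint32_t V) {
  return static_cast<int16_t>(static_cast<uint16_t>(V));
}
static int16_t hi16(uint32_t V) {
  return static_cast<int16_t>(static_cast<uint16_t>(V >> 16));
}

// Reference model of smlad / smladx on packed operands A and B.
// Exchange == true swaps the halfwords of B first (the "x" variant).
int32_t smladModel(uint32_t A, uint32_t B, int32_t Acc, bool Exchange) {
  if (Exchange)
    B = (B >> 16) | (B << 16);
  int64_t Sum = static_cast<int64_t>(lo16(A)) * lo16(B) +
                static_cast<int64_t>(hi16(A)) * hi16(B) + Acc;
  // Wrap to 32 bits; overflow handling is not modelled faithfully here.
  return static_cast<int32_t>(static_cast<uint32_t>(Sum));
}
```

With A packing (lo = 2, hi = 3) and B packing (lo = 4, hi = 5), the plain form computes 2*4 + 3*5 plus the accumulator, while the exchanged form computes 2*5 + 3*4 plus the accumulator.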


Event Timeline

samparker created this revision. Apr 23 2019, 8:25 AM

Herald added subscribers: kristof.beyls, javed.absar. Apr 23 2019, 8:25 AM

SjoerdMeijer added inline comments. May 3 2019, 1:52 AM

lib/Target/ARM/ARMParallelDSP.cpp
351	Functions CheckRAWDeps and CheckWARDeps are the same, just the order of Base and Offset are different, and whether WarDeps or RawDeps are queried, but this could be passed in as an argument.
379	Can you elaborate on what this function is doing? My first reaction would be: given that we are looking at two loads, why are we looking for RAW dependencies? Perhaps the function name is a bit misleading?

samparker marked 2 inline comments as done. May 3 2019, 2:13 AM

samparker added inline comments.

lib/Target/ARM/ARMParallelDSP.cpp
351	I will have a look into changing these.
379	Yeah, it is... even looking at it myself I'm getting confused - not a great sign! First we check if the dependency sets are the same, which means there's no write between the loads. The next checks allow the sets not to be equal as long as the dominating load is the only one with a RAW dependency. This means we can safely schedule the write and then the combined base + offset pair.

Greatly simplified the alias checks after they didn't seem to make much sense...

SjoerdMeijer added inline comments. May 8 2019, 2:06 AM

lib/Target/ARM/ARMParallelDSP.cpp
369	I think I am struggling with the algorithm here. Here, and in the line below, we get the "write sets" of the 2 loads. We then start iterating over them, and if a write does not occur in both sets then we say it is safe to pair. But is this enough?

samparker marked an inline comment as done. May 8 2019, 4:49 AM

samparker added inline comments.

lib/Target/ARM/ARMParallelDSP.cpp
369	Good point! No... we also need to check that it's possible to either move the first load forward or the second load backward.

When we create a wide load, we pull the second load up to the first. So for aliasing worries, we only need to check whether there's a write between the two loads that would prevent us from hoisting the second load.
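The rule described here can be sketched with a toy model in which every instruction in the block is identified by its position, and each write carries the set of loads it may alias. This is only an illustration of the stated rule, not the pass's implementation: the real code uses dominance and alias analysis queries, and `ToyWrite` and `safeToPair` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy stand-in for a memory write in a basic block.
struct ToyWrite {
  std::size_t Pos;                // position of the write in the block
  std::set<std::size_t> MayAlias; // positions of the loads it may alias
};

// Returns true if the later of the two loads can be hoisted up to the
// earlier one, i.e. no write strictly between them may alias the later load.
bool safeToPair(std::size_t LoadA, std::size_t LoadB,
                const std::vector<ToyWrite> &Writes) {
  std::size_t First = LoadA < LoadB ? LoadA : LoadB;
  std::size_t Second = LoadA < LoadB ? LoadB : LoadA;
  for (const ToyWrite &W : Writes)
    if (W.Pos > First && W.Pos < Second && W.MayAlias.count(Second))
      return false;
  return true;
}
```

In this model a write that sits between the loads but only aliases some unrelated load does not block pairing, which is the relaxation the summary describes.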

samparker added a child revision: D61780: [ARM][ParallelDSP] Change the search for smlads. May 10 2019, 2:11 AM

This looks okay to me; just some nits inline.

lib/Target/ARM/ARMParallelDSP.cpp
323	nit: instruction -> instructions
344	nit: I don't think this condition can be true. The loads/reads are Simple loads (not atomics), and the only writes that can be loads are loads with atomic ordering.
368	nit: newline
619	Not even sure if it is possible, but looking at this condition I am wondering about i128s.
676	Don't think we need to check for hasOneUse again here; we only add loads with one use to the list. Asserts for BaseSExt and OffsetSExt might be useful.

This revision is now accepted and ready to land. May 10 2019, 6:05 AM

samparker marked 3 inline comments as done. May 13 2019, 2:01 AM

samparker added inline comments.

lib/Target/ARM/ARMParallelDSP.cpp
344	Makes sense, good catch!
619	The IR supports arbitrary bit widths, but we only support 32- and 64-bit macs.
676	Cheers, I'll remove the check - and we are asserting on SExt!

Committed in rL360567.

Revision Contents

Path	Size
lib/Target/ARM/ARMParallelDSP.cpp	356 lines
test/CodeGen/ARM/ParallelDSP/ (ten files, names not preserved)	452, 6, 6, 10, 23, 4, 4, 4, 22 and 21 lines

Diff 198450

lib/Target/ARM/ARMParallelDSP.cpp

[57 lines not shown]	namespace {
using PMACPairList = SmallVector<PMACPair, 8>;		using PMACPairList = SmallVector<PMACPair, 8>;
using Instructions = SmallVector<Instruction*,16>;		using Instructions = SmallVector<Instruction*,16>;
using MemLocList = SmallVector<MemoryLocation, 4>;		using MemLocList = SmallVector<MemoryLocation, 4>;

struct OpChain {		struct OpChain {
Instruction *Root;		Instruction *Root;
ValueList AllValues;		ValueList AllValues;
MemInstList VecLd; // List of all load instructions.		MemInstList VecLd; // List of all load instructions.
MemLocList MemLocs; // All memory locations read by this tree.		MemInstList Loads;
bool ReadOnly = true;		bool ReadOnly = true;

OpChain(Instruction *I, ValueList &vl) : Root(I), AllValues(vl) { }		OpChain(Instruction *I, ValueList &vl) : Root(I), AllValues(vl) { }
virtual ~OpChain() = default;		virtual ~OpChain() = default;

void SetMemoryLocations() {		void PopulateLoads() {
const auto Size = LocationSize::unknown();
for (auto *V : AllValues) {		for (auto *V : AllValues) {
if (auto *I = dyn_cast<Instruction>(V)) {
if (I->mayWriteToMemory())
ReadOnly = false;
if (auto *Ld = dyn_cast<LoadInst>(V))		if (auto *Ld = dyn_cast<LoadInst>(V))
MemLocs.push_back(MemoryLocation(Ld->getPointerOperand(), Size));		Loads.push_back(Ld);
}
}		}
}		}

unsigned size() const { return AllValues.size(); }		unsigned size() const { return AllValues.size(); }
};		};

// 'BinOpChain' and 'Reduction' are just some bookkeeping data structures.		// 'BinOpChain' and 'Reduction' are just some bookkeeping data structures.
// 'Reduction' contains the phi-node and accumulator statement from where we		// 'Reduction' contains the phi-node and accumulator statement from where we
[46 lines not shown]	class ARMParallelDSP : public LoopPass {
DominatorTree *DT;		DominatorTree *DT;
LoopInfo *LI;		LoopInfo *LI;
Loop *L;		Loop *L;
const DataLayout *DL;		const DataLayout *DL;
Module *M;		Module *M;
std::map<LoadInst*, LoadInst*> LoadPairs;		std::map<LoadInst*, LoadInst*> LoadPairs;
std::map<LoadInst*, std::unique_ptr<WidenedLoad>> WideLoads;		std::map<LoadInst*, std::unique_ptr<WidenedLoad>> WideLoads;

bool RecordSequentialLoads(BasicBlock *BB);		bool RecordMemoryOps(BasicBlock *BB);
bool InsertParallelMACs(Reduction &Reduction);		bool InsertParallelMACs(Reduction &Reduction);
bool AreSequentialLoads(LoadInst *Ld0, LoadInst *Ld1, MemInstList &VecMem);		bool AreSequentialLoads(LoadInst *Ld0, LoadInst *Ld1, MemInstList &VecMem);
LoadInst* CreateLoadIns(IRBuilder<NoFolder> &IRB,		LoadInst* CreateWideLoad(SmallVectorImpl<LoadInst*> &Loads,
SmallVectorImpl<LoadInst*> &Loads,
IntegerType *LoadTy);		IntegerType *LoadTy);
void CreateParallelMACPairs(Reduction &R);		void CreateParallelMACPairs(Reduction &R);
Instruction* CreateSMLADCall(SmallVectorImpl<LoadInst*> &VecLd0,		Instruction* CreateSMLADCall(SmallVectorImpl<LoadInst*> &VecLd0,
SmallVectorImpl<LoadInst*> &VecLd1,		SmallVectorImpl<LoadInst*> &VecLd1,
Instruction *Acc, bool Exchange,		Instruction *Acc, bool Exchange,
Instruction *InsertAfter);		Instruction *InsertAfter);

/// Try to match and generate: SMLAD, SMLADX - Signed Multiply Accumulate		/// Try to match and generate: SMLAD, SMLADX - Signed Multiply Accumulate
/// Dual performs two signed 16x16-bit multiplications. It adds the		/// Dual performs two signed 16x16-bit multiplications. It adds the
/// products to a 32-bit accumulate operand. Optionally, the instruction can		/// products to a 32-bit accumulate operand. Optionally, the instruction can
/// exchange the halfwords of the second operand before performing the		/// exchange the halfwords of the second operand before performing the
/// arithmetic.		/// arithmetic.
bool MatchSMLAD(Function &F);		bool MatchSMLAD(Function &F);

public:		public:
static char ID;		static char ID;

ARMParallelDSP() : LoopPass(ID) { }		ARMParallelDSP() : LoopPass(ID) { }

		bool doInitialization(Loop *L, LPPassManager &LPM) override {
		LoadPairs.clear();
		WideLoads.clear();
		return true;
		}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
LoopPass::getAnalysisUsage(AU);		LoopPass::getAnalysisUsage(AU);
AU.addRequired<AssumptionCacheTracker>();		AU.addRequired<AssumptionCacheTracker>();
AU.addRequired<ScalarEvolutionWrapperPass>();		AU.addRequired<ScalarEvolutionWrapperPass>();
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
AU.addRequired<LoopInfoWrapperPass>();		AU.addRequired<LoopInfoWrapperPass>();
AU.addRequired<DominatorTreeWrapperPass>();		AU.addRequired<DominatorTreeWrapperPass>();
[48 lines not shown]	bool runOnLoop(Loop *TheLoop, LPPassManager &) override {
if (!ST->hasDSP()) {		if (!ST->hasDSP()) {
LLVM_DEBUG(dbgs() << "DSP extension not enabled: not running pass "		LLVM_DEBUG(dbgs() << "DSP extension not enabled: not running pass "
"ARMParallelDSP\n");		"ARMParallelDSP\n");
return false;		return false;
}		}

if (!ST->isLittle()) {		if (!ST->isLittle()) {
LLVM_DEBUG(dbgs() << "Only supporting little endian: not running pass "		LLVM_DEBUG(dbgs() << "Only supporting little endian: not running pass "
"ARMParallelDSP\n");		<< "ARMParallelDSP\n");
return false;		return false;
}		}

LoopAccessInfo LAI(L, SE, TLI, AA, DT, LI);		LoopAccessInfo LAI(L, SE, TLI, AA, DT, LI);

LLVM_DEBUG(dbgs() << "\n== Parallel DSP pass ==\n");		LLVM_DEBUG(dbgs() << "\n== Parallel DSP pass ==\n");
LLVM_DEBUG(dbgs() << " - " << F.getName() << "\n\n");		LLVM_DEBUG(dbgs() << " - " << F.getName() << "\n\n");

if (!RecordSequentialLoads(Header)) {		if (!RecordMemoryOps(Header)) {
LLVM_DEBUG(dbgs() << " - No sequential loads found.\n");		LLVM_DEBUG(dbgs() << " - No sequential loads found.\n");
return false;		return false;
}		}

bool Changes = MatchSMLAD(F);		bool Changes = MatchSMLAD(F);
return Changes;		return Changes;
}		}
};		};
[60 lines not shown]	bool ARMParallelDSP::AreSequentialLoads(LoadInst *Ld0, LoadInst *Ld1,
);		);

VecMem.clear();		VecMem.clear();
VecMem.push_back(Ld0);		VecMem.push_back(Ld0);
VecMem.push_back(Ld1);		VecMem.push_back(Ld1);
return true;		return true;
}		}

/// Iterate through the block and record base, offset pairs of loads as well as		/// Iterate through the block and record base, offset pairs of loads which can
/// maximal sequences of sequential loads.		/// be widened into a single load.
bool ARMParallelDSP::RecordSequentialLoads(BasicBlock *BB) {		bool ARMParallelDSP::RecordMemoryOps(BasicBlock *BB) {
SmallVector<LoadInst*, 8> Loads;		SmallVector<LoadInst*, 8> Loads;
		SmallVector<Instruction*, 8> Writes;

		// Collect loads and instruction that may write to memory. For now we only
		// record loads which are simple, sign-extended and have a single user.
		// TODO: Allow zero-extended loads.
for (auto &I : *BB) {		for (auto &I : *BB) {
		if (I.mayWriteToMemory())
		Writes.push_back(&I);
auto *Ld = dyn_cast<LoadInst>(&I);		auto *Ld = dyn_cast<LoadInst>(&I);
if (!Ld || !Ld->isSimple() ||		if (!Ld || !Ld->isSimple() ||
!Ld->hasOneUse() || !isa<SExtInst>(Ld->user_back()))		!Ld->hasOneUse() || !isa<SExtInst>(Ld->user_back()))
continue;		continue;
Loads.push_back(Ld);		Loads.push_back(Ld);
}		}

for (auto *Ld0 : Loads) {		using InstSet = std::set<Instruction*>;
for (auto *Ld1 : Loads) {		using DepMap = std::map<Instruction*, InstSet>;
if (Ld0 == Ld1)		DepMap RAWDeps;
		DepMap WARDeps;

		// Record any writes that may alias a load.
		const auto Size = LocationSize::unknown();
		for (auto Read : Loads) {
		for (auto Write : Writes) {
		if (Read == Write)
continue;		continue;

if (AreSequentialAccesses<LoadInst>(Ld0, Ld1, DL, SE)) {		MemoryLocation ReadLoc =
LoadPairs[Ld0] = Ld1;		MemoryLocation(Read->getPointerOperand(), Size);

		if (!isModOrRefSet(intersectModRef(AA->getModRefInfo(Write, ReadLoc),
		ModRefInfo::ModRef)))
		continue;
		if (DT->dominates(Write, Read))
		RAWDeps[Read].insert(Write);
		else
		WARDeps[Read].insert(Write);
		}
		}

		// Check there's not a write inbetween the two loads that may alias both of
		// them. If there's a write after the dominating load and before the dominated
		// load, then we can't merge the loads.
		auto SafeToPair = [&](LoadInst *Base, LoadInst *Offset) {
		LoadInst *Dominator = DT->dominates(Base, Offset) ? Base : Offset;
		LoadInst *Dominated = DT->dominates(Base, Offset) ? Offset : Base;

		if (WARDeps.count(Dominator) && RAWDeps.count(Dominated)) {
		InstSet &WARs = WARDeps[Dominator];
		InstSet &RAWs = RAWDeps[Dominated];
		for (auto WAR : WARs)
		for (auto RAW : RAWs)
		if (WAR == RAW)
		return false;
		}
		return true;
		};

		// Record base, offset load pairs.
		for (auto *Base : Loads) {
		for (auto *Offset : Loads) {
		if (Base == Offset)
		continue;

		if (AreSequentialAccesses<LoadInst>(Base, Offset, DL, SE) &&
		SafeToPair(Base, Offset)) {
		LoadPairs[Base] = Offset;
break;		break;
}		}
}		}
}		}

LLVM_DEBUG(if (!LoadPairs.empty()) {		LLVM_DEBUG(if (!LoadPairs.empty()) {
dbgs() << "Consecutive load pairs:\n";		dbgs() << "Consecutive load pairs:\n";
for (auto &MapIt : LoadPairs) {		for (auto &MapIt : LoadPairs) {
[110 lines not shown]	bool ARMParallelDSP::InsertParallelMACs(Reduction &Reduction) {
if (Acc != Reduction.Phi) {		if (Acc != Reduction.Phi) {
LLVM_DEBUG(dbgs() << "Replace Accumulate: "; Acc->dump());		LLVM_DEBUG(dbgs() << "Replace Accumulate: "; Acc->dump());
Reduction.AccIntAdd->replaceAllUsesWith(Acc);		Reduction.AccIntAdd->replaceAllUsesWith(Acc);
return true;		return true;
}		}
return false;		return false;
}		}

static void MatchReductions(Function &F, Loop *TheLoop, BasicBlock *Header,
ReductionList &Reductions) {
RecurrenceDescriptor RecDesc;
const bool HasFnNoNaNAttr =
F.getFnAttribute("no-nans-fp-math").getValueAsString() == "true";
const BasicBlock *Latch = TheLoop->getLoopLatch();

for (PHINode &Phi : Header->phis()) {
const auto *Ty = Phi.getType();
if (!Ty->isIntegerTy(32) && !Ty->isIntegerTy(64))
continue;

const bool IsReduction =
RecurrenceDescriptor::AddReductionVar(&Phi,
RecurrenceDescriptor::RK_IntegerAdd,
TheLoop, HasFnNoNaNAttr, RecDesc);
if (!IsReduction)
continue;

Instruction *Acc = dyn_cast<Instruction>(Phi.getIncomingValueForBlock(Latch));
if (!Acc)
continue;

Reductions.push_back(Reduction(&Phi, Acc));
}

LLVM_DEBUG(
dbgs() << "\nAccumulating integer additions (reductions) found:\n";
for (auto &R : Reductions) {
dbgs() << "- "; R.Phi->dump();
dbgs() << "-> "; R.AccIntAdd->dump();
}
);
}

static void AddMACCandidate(OpChainList &Candidates,
Instruction *Mul,
Value *MulOp0, Value *MulOp1) {
assert(Mul->getOpcode() == Instruction::Mul &&
"expected mul instruction");
ValueList LHS;
ValueList RHS;
if (IsNarrowSequence<16>(MulOp0, LHS) &&
IsNarrowSequence<16>(MulOp1, RHS)) {
Candidates.push_back(make_unique<BinOpChain>(Mul, LHS, RHS));
}
}

static void MatchParallelMACSequences(Reduction &R,		static void MatchParallelMACSequences(Reduction &R,
OpChainList &Candidates) {		OpChainList &Candidates) {
Instruction *Acc = R.AccIntAdd;		Instruction *Acc = R.AccIntAdd;
LLVM_DEBUG(dbgs() << "\n- Analysing:\t" << *Acc << "\n");		LLVM_DEBUG(dbgs() << "\n- Analysing:\t" << *Acc << "\n");

// Returns false to signal the search should be stopped.		// Returns false to signal the search should be stopped.
std::function<bool(Value*)> Match =		std::function<bool(Value*)> Match =
[&Candidates, &Match](Value *V) -> bool {		[&Candidates, &Match](Value *V) -> bool {

auto *I = dyn_cast<Instruction>(V);		auto *I = dyn_cast<Instruction>(V);
if (!I)		if (!I)
return false;		return false;

switch (I->getOpcode()) {		switch (I->getOpcode()) {
case Instruction::Add:		case Instruction::Add:
if (Match(I->getOperand(0)) || (Match(I->getOperand(1))))		if (Match(I->getOperand(0)) || (Match(I->getOperand(1))))
return true;		return true;
break;		break;
case Instruction::Mul: {		case Instruction::Mul: {
Value *MulOp0 = I->getOperand(0);		Value *MulOp0 = I->getOperand(0);
Value *MulOp1 = I->getOperand(1);		Value *MulOp1 = I->getOperand(1);
if (isa<SExtInst>(MulOp0) && isa<SExtInst>(MulOp1))		if (isa<SExtInst>(MulOp0) && isa<SExtInst>(MulOp1)) {
AddMACCandidate(Candidates, I, MulOp0, MulOp1);		ValueList LHS;
		ValueList RHS;
		if (IsNarrowSequence<16>(MulOp0, LHS) &&
		IsNarrowSequence<16>(MulOp1, RHS)) {
		Candidates.push_back(make_unique<BinOpChain>(I, LHS, RHS));
		}
		}
return false;		return false;
}		}
case Instruction::SExt:		case Instruction::SExt:
return Match(I->getOperand(0));		return Match(I->getOperand(0));
}		}
return false;		return false;
};		};

while (Match (Acc));		while (Match (Acc));
LLVM_DEBUG(dbgs() << "Finished matching MAC sequences, found "		LLVM_DEBUG(dbgs() << "Finished matching MAC sequences, found "
<< Candidates.size() << " candidates.\n");		<< Candidates.size() << " candidates.\n");
}		}

// Collects all instructions that are not part of the MAC chains, which is the
// set of instructions that can potentially alias with the MAC operands.
static void AliasCandidates(BasicBlock *Header, Instructions &Reads,
Instructions &Writes) {
for (auto &I : *Header) {
if (I.mayReadFromMemory())
Reads.push_back(&I);
if (I.mayWriteToMemory())
Writes.push_back(&I);
}
}

// Check whether statements in the basic block that write to memory alias with
// the memory locations accessed by the MAC-chains.
// TODO: we need the read statements when we accept more complicated chains.
static bool AreAliased(AliasAnalysis *AA, Instructions &Reads,
Instructions &Writes, OpChainList &MACCandidates) {
LLVM_DEBUG(dbgs() << "Alias checks:\n");
for (auto &MAC : MACCandidates) {
LLVM_DEBUG(dbgs() << "mul: "; MAC->Root->dump());

// At the moment, we allow only simple chains that only consist of reads,
// accumulate their result with an integer add, and thus that don't write
// memory, and simply bail if they do.
if (!MAC->ReadOnly)
return true;

// Now for all writes in the basic block, check that they don't alias with
// the memory locations accessed by our MAC-chain:
for (auto *I : Writes) {
LLVM_DEBUG(dbgs() << "- "; I->dump());
assert(MAC->MemLocs.size() >= 2 && "expecting at least 2 memlocs");
for (auto &MemLoc : MAC->MemLocs) {
if (isModOrRefSet(intersectModRef(AA->getModRefInfo(I, MemLoc),
ModRefInfo::ModRef))) {
LLVM_DEBUG(dbgs() << "Yes, aliases found\n");
return true;
}
}
}
}

LLVM_DEBUG(dbgs() << "OK: no aliases found!\n");
return false;
}

static bool CheckMACMemory(OpChainList &Candidates) {		static bool CheckMACMemory(OpChainList &Candidates) {
for (auto &C : Candidates) {		for (auto &C : Candidates) {
// A mul has 2 operands, and a narrow op consist of sext and a load; thus		// A mul has 2 operands, and a narrow op consist of sext and a load; thus
// we expect at least 4 items in this operand value list.		// we expect at least 4 items in this operand value list.
if (C->size() < 4) {		if (C->size() < 4) {
LLVM_DEBUG(dbgs() << "Operand list too short.\n");		LLVM_DEBUG(dbgs() << "Operand list too short.\n");
return false;		return false;
}		}
C->SetMemoryLocations();		C->PopulateLoads();
ValueList &LHS = static_cast<BinOpChain*>(C.get())->LHS;		ValueList &LHS = static_cast<BinOpChain*>(C.get())->LHS;
ValueList &RHS = static_cast<BinOpChain*>(C.get())->RHS;		ValueList &RHS = static_cast<BinOpChain*>(C.get())->RHS;

// Use +=2 to skip over the expected extend instructions.		// Use +=2 to skip over the expected extend instructions.
for (unsigned i = 0, e = LHS.size(); i < e; i += 2) {		for (unsigned i = 0, e = LHS.size(); i < e; i += 2) {
if (!isa<LoadInst>(LHS[i]) || !isa<LoadInst>(RHS[i]))		if (!isa<LoadInst>(LHS[i]) || !isa<LoadInst>(RHS[i]))
return false;		return false;
}		}
[29 lines not shown]
//		//
// If constants are used instead of loads, these will need to be hoisted		// If constants are used instead of loads, these will need to be hoisted
// out and into a register.		// out and into a register.
//		//
// If loop invariants are used instead of loads, these need to be packed		// If loop invariants are used instead of loads, these need to be packed
// before the loop begins.		// before the loop begins.
//		//
bool ARMParallelDSP::MatchSMLAD(Function &F) {		bool ARMParallelDSP::MatchSMLAD(Function &F) {
BasicBlock *Header = L->getHeader();
LLVM_DEBUG(dbgs() << "= Matching SMLAD =\n";
dbgs() << "Header block:\n"; Header->dump();
dbgs() << "Loop info:\n\n"; L->dump());

bool Changed = false;		auto FindReductions = [&](ReductionList &Reductions) {
		RecurrenceDescriptor RecDesc;
		const bool HasFnNoNaNAttr =
		F.getFnAttribute("no-nans-fp-math").getValueAsString() == "true";
		BasicBlock *Latch = L->getLoopLatch();

		for (PHINode &Phi : L->getHeader()->phis()) {
		const auto *Ty = Phi.getType();
		if (!Ty->isIntegerTy(32) && !Ty->isIntegerTy(64))
		continue;

		const bool IsReduction = RecurrenceDescriptor::AddReductionVar(
		&Phi, RecurrenceDescriptor::RK_IntegerAdd, L, HasFnNoNaNAttr, RecDesc);

		if (!IsReduction)
		continue;

		Instruction *Acc = dyn_cast<Instruction>(Phi.getIncomingValueForBlock(Latch));
		if (!Acc)
		continue;

		Reductions.push_back(Reduction(&Phi, Acc));
		}
		return !Reductions.empty();
		};

ReductionList Reductions;		ReductionList Reductions;
MatchReductions(F, L, Header, Reductions);		if (!FindReductions(Reductions))
		return false;

for (auto &R : Reductions) {		for (auto &R : Reductions) {
OpChainList MACCandidates;		OpChainList MACCandidates;
MatchParallelMACSequences(R, MACCandidates);		MatchParallelMACSequences(R, MACCandidates);
if (!CheckMACMemory(MACCandidates))		if (!CheckMACMemory(MACCandidates))
continue;		continue;

R.MACCandidates = std::move(MACCandidates);		R.MACCandidates = std::move(MACCandidates);

LLVM_DEBUG(dbgs() << "MAC candidates:\n";		LLVM_DEBUG(dbgs() << "MAC candidates:\n";
for (auto &M : R.MACCandidates)		for (auto &M : R.MACCandidates)
M->Root->dump();		M->Root->dump();
dbgs() << "\n";);		dbgs() << "\n";);
}		}

// Collect all instructions that may read or write memory. Our alias		bool Changed = false;
// analysis checks bail out if any of these instructions aliases with an		// Check whether statements in the basic block that write to memory alias
// instruction from the MAC-chain.		// with the memory locations accessed by the MAC-chains.
Instructions Reads, Writes;
AliasCandidates(Header, Reads, Writes);

for (auto &R : Reductions) {		for (auto &R : Reductions) {
if (AreAliased(AA, Reads, Writes, R.MACCandidates))
return false;
CreateParallelMACPairs(R);		CreateParallelMACPairs(R);
Changed |= InsertParallelMACs(R);		Changed |= InsertParallelMACs(R);
}		}

LLVM_DEBUG(if (Changed) dbgs() << "Header block:\n"; Header->dump(););
return Changed;		return Changed;
}		}

LoadInst* ARMParallelDSP::CreateLoadIns(IRBuilder<NoFolder> &IRB,		LoadInst* ARMParallelDSP::CreateWideLoad(SmallVectorImpl<LoadInst*> &Loads,
SmallVectorImpl<LoadInst*> &Loads,
IntegerType *LoadTy) {		IntegerType *LoadTy) {
assert(Loads.size() == 2 && "currently only support widening two loads");		assert(Loads.size() == 2 && "currently only support widening two loads");

const unsigned AddrSpace = Loads[0]->getPointerAddressSpace();		LoadInst *Base = Loads[0];
Value *VecPtr = IRB.CreateBitCast(Loads[0]->getPointerOperand(),		LoadInst *Offset = Loads[1];
LoadTy->getPointerTo(AddrSpace));
LoadInst *WideLoad = IRB.CreateAlignedLoad(LoadTy, VecPtr,		Instruction *BaseSExt = dyn_cast<SExtInst>(Base->user_back());
Loads[0]->getAlignment());		Instruction *OffsetSExt = dyn_cast<SExtInst>(Offset->user_back());
// Fix up users, Loads[0] needs trunc while Loads[1] needs a lshr and trunc.
Instruction *SExt0 = dyn_cast<SExtInst>(Loads[0]->user_back());		assert((Base->hasOneUse() && Offset->hasOneUse() && BaseSExt && OffsetSExt)
Instruction *SExt1 = dyn_cast<SExtInst>(Loads[1]->user_back());		&& "Loads should have a single, extending, user");

assert((Loads[0]->hasOneUse() && Loads[1]->hasOneUse() && SExt0 && SExt1) &&		std::function<void(Value*, Value*)> MoveBefore =
"Loads should have a single, extending, user");		[&](Value *A, Value *B) -> void {
		if (!isa<Instruction>(A) || !isa<Instruction>(B))
		return;

		auto *Source = cast<Instruction>(A);
		auto *Sink = cast<Instruction>(B);

std::function<void(Instruction*, Instruction*)> MoveAfter =
[&](Instruction* Source, Instruction* Sink) -> void {
if (DT->dominates(Source, Sink) ||		if (DT->dominates(Source, Sink) ||
Source->getParent() != Sink->getParent() ||		Source->getParent() != Sink->getParent() ||
isa<PHINode>(Source) || isa<PHINode>(Sink))		isa<PHINode>(Source) || isa<PHINode>(Sink))
return;		return;

Sink->moveAfter(Source);		Source->moveBefore(Sink);
for (auto &U : Sink->uses())		for (auto &U : Source->uses())
MoveAfter(Sink, cast<Instruction>(U.getUser()));		MoveBefore(Source, U.getUser());
};		};

		// Insert the load at the point of the original dominating load.
		LoadInst *DomLoad = DT->dominates(Base, Offset) ? Base : Offset;
		IRBuilder<NoFolder> IRB(DomLoad->getParent(),
		++BasicBlock::iterator(DomLoad));

		// Bitcast the pointer to a wider type and create the wide load, while making
		// sure to maintain the original alignment as this prevents ldrd from being
		// generated when it could be illegal due to memory alignment.
		const unsigned AddrSpace = DomLoad->getPointerAddressSpace();
		Value *VecPtr = IRB.CreateBitCast(Base->getPointerOperand(),
		LoadTy->getPointerTo(AddrSpace));
		LoadInst *WideLoad = IRB.CreateAlignedLoad(LoadTy, VecPtr,
		Base->getAlignment());

		// Make sure everything is in the correct order in the basic block.
		MoveBefore(Base->getPointerOperand(), VecPtr);
		MoveBefore(VecPtr, WideLoad);

// From the wide load, create two values that equal the original two loads.		// From the wide load, create two values that equal the original two loads.
Value *Bottom = IRB.CreateTrunc(WideLoad, Loads[0]->getType());		// Loads[0] needs trunc while Loads[1] needs a lshr and trunc.
SExt0->setOperand(0, Bottom);		// TODO: Support big-endian as well.
if (auto *I = dyn_cast<Instruction>(Bottom)) {		Value *Bottom = IRB.CreateTrunc(WideLoad, Base->getType());
I->moveAfter(WideLoad);		BaseSExt->setOperand(0, Bottom);
MoveAfter(I, SExt0);
}

IntegerType *Ld1Ty = cast<IntegerType>(Loads[1]->getType());		IntegerType *OffsetTy = cast<IntegerType>(Offset->getType());
Value *ShiftVal = ConstantInt::get(LoadTy, Ld1Ty->getBitWidth());		Value *ShiftVal = ConstantInt::get(LoadTy, OffsetTy->getBitWidth());
Value *Top = IRB.CreateLShr(WideLoad, ShiftVal);		Value *Top = IRB.CreateLShr(WideLoad, ShiftVal);
if (auto *I = dyn_cast<Instruction>(Top))		Value *Trunc = IRB.CreateTrunc(Top, OffsetTy);
MoveAfter(WideLoad, I);		OffsetSExt->setOperand(0, Trunc);

Value *Trunc = IRB.CreateTrunc(Top, Ld1Ty);
SExt1->setOperand(0, Trunc);
if (auto *I = dyn_cast<Instruction>(Trunc))
MoveAfter(I, SExt1);

WideLoads.emplace(std::make_pair(Loads[0],		WideLoads.emplace(std::make_pair(Base,
make_unique<WidenedLoad>(Loads, WideLoad)));		make_unique<WidenedLoad>(Loads, WideLoad)));
return WideLoad;		return WideLoad;
}		}

Instruction* ARMParallelDSP::CreateSMLADCall(SmallVectorImpl<LoadInst*> &VecLd0,		Instruction* ARMParallelDSP::CreateSMLADCall(SmallVectorImpl<LoadInst*> &VecLd0,
SmallVectorImpl<LoadInst*> &VecLd1,		SmallVectorImpl<LoadInst*> &VecLd1,
Instruction *Acc, bool Exchange,		Instruction *Acc, bool Exchange,
Instruction *InsertAfter) {		Instruction *InsertAfter) {
LLVM_DEBUG(dbgs() << "Create SMLAD intrinsic using:\n"		LLVM_DEBUG(dbgs() << "Create SMLAD intrinsic using:\n"
<< "- " << *VecLd0[0] << "\n"		<< "- " << *VecLd0[0] << "\n"
<< "- " << *VecLd0[1] << "\n"		<< "- " << *VecLd0[1] << "\n"
<< "- " << *VecLd1[0] << "\n"		<< "- " << *VecLd1[0] << "\n"
<< "- " << *VecLd1[1] << "\n"		<< "- " << *VecLd1[1] << "\n"
<< "- " << *Acc << "\n"		<< "- " << *Acc << "\n"
<< "- Exchange: " << Exchange << "\n");		<< "- Exchange: " << Exchange << "\n");

IRBuilder<NoFolder> Builder(InsertAfter->getParent(),
++BasicBlock::iterator(InsertAfter));

// Replace the reduction chain with an intrinsic call		// Replace the reduction chain with an intrinsic call
IntegerType *Ty = IntegerType::get(M->getContext(), 32);		IntegerType *Ty = IntegerType::get(M->getContext(), 32);
LoadInst *WideLd0 = WideLoads.count(VecLd0[0]) ?		LoadInst *WideLd0 = WideLoads.count(VecLd0[0]) ?
WideLoads[VecLd0[0]]->getLoad() : CreateLoadIns(Builder, VecLd0, Ty);		WideLoads[VecLd0[0]]->getLoad() : CreateWideLoad(VecLd0, Ty);
LoadInst *WideLd1 = WideLoads.count(VecLd1[0]) ?		LoadInst *WideLd1 = WideLoads.count(VecLd1[0]) ?
WideLoads[VecLd1[0]]->getLoad() : CreateLoadIns(Builder, VecLd1, Ty);		WideLoads[VecLd1[0]]->getLoad() : CreateWideLoad(VecLd1, Ty);

Value* Args[] = { WideLd0, WideLd1, Acc };		Value* Args[] = { WideLd0, WideLd1, Acc };
Function *SMLAD = nullptr;		Function *SMLAD = nullptr;
if (Exchange)		if (Exchange)
SMLAD = Acc->getType()->isIntegerTy(32) ?		SMLAD = Acc->getType()->isIntegerTy(32) ?
Intrinsic::getDeclaration(M, Intrinsic::arm_smladx) :		Intrinsic::getDeclaration(M, Intrinsic::arm_smladx) :
Intrinsic::getDeclaration(M, Intrinsic::arm_smlaldx);		Intrinsic::getDeclaration(M, Intrinsic::arm_smlaldx);
else		else
SMLAD = Acc->getType()->isIntegerTy(32) ?		SMLAD = Acc->getType()->isIntegerTy(32) ?
Intrinsic::getDeclaration(M, Intrinsic::arm_smlad) :		Intrinsic::getDeclaration(M, Intrinsic::arm_smlad) :
Intrinsic::getDeclaration(M, Intrinsic::arm_smlald);		Intrinsic::getDeclaration(M, Intrinsic::arm_smlald);

		IRBuilder<NoFolder> Builder(InsertAfter->getParent(),
		++BasicBlock::iterator(InsertAfter));
CallInst *Call = Builder.CreateCall(SMLAD, Args);		CallInst *Call = Builder.CreateCall(SMLAD, Args);
NumSMLAD++;		NumSMLAD++;
return Call;		return Call;
}		}

// Compare the value lists in Other to this chain.		// Compare the value lists in Other to this chain.
bool BinOpChain::AreSymmetrical(BinOpChain *Other) {		bool BinOpChain::AreSymmetrical(BinOpChain *Other) {
// Element-by-element comparison of Value lists returning true if they are		// Element-by-element comparison of Value lists returning true if they are
(45 lines not shown)
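
The comments in the wide-load code above state that, from the wide load, Loads[0] needs a trunc while Loads[1] needs a lshr and a trunc, and that only little-endian is handled. The arithmetic can be sketched in isolation as follows. This is an illustration only, not code from the patch: the helper names packLittleEndian, bottomHalf and topHalf are hypothetical, and plain integers stand in for the IR values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: models the i32 value a wide load would produce when it
// covers two adjacent i16 elements on a little-endian target. The element at
// the lower address (Lo) ends up in the low 16 bits.
static uint32_t packLittleEndian(int16_t Lo, int16_t Hi) {
  return static_cast<uint32_t>(static_cast<uint16_t>(Lo)) |
         (static_cast<uint32_t>(static_cast<uint16_t>(Hi)) << 16);
}

// Models "trunc i32 to i16": recovers the value of the first (base) load.
// The conversion to int16_t wraps modulo 2^16 (guaranteed since C++20).
static int16_t bottomHalf(uint32_t Wide) {
  return static_cast<int16_t>(static_cast<uint16_t>(Wide));
}

// Models "lshr i32 by 16, then trunc to i16": recovers the second (offset) load.
static int16_t topHalf(uint32_t Wide) {
  return static_cast<int16_t>(static_cast<uint16_t>(Wide >> 16));
}
```

With these definitions, bottomHalf(packLittleEndian(A, B)) gives back A and topHalf(packLittleEndian(A, B)) gives back B for any pair of 16-bit values. On a big-endian target the two halves would be swapped, which is presumably why the TODO in the patch singles that case out.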

test/CodeGen/ARM/ParallelDSP/aliasing.ll

This file was added.

				; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 < %s -arm-parallel-dsp -S | FileCheck %s
				;
				; Alias check: check that the rewrite isn't triggered when there's a store
				; instruction possibly aliasing any mul load operands; arguments are passed
				; without 'restrict' enabled.
				;
				; CHECK-NOT: call i32 @llvm.arm.smlad
				;
				define dso_local i32 @no_restrict(i32 %arg, i32* nocapture %arg1, i16* nocapture readonly %arg2, i16* nocapture readonly %arg3) {
				entry:
				%cmp24 = icmp sgt i32 %arg, 0
				br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader:
				%.pre = load i16, i16* %arg3, align 2
				%.pre27 = load i16, i16* %arg2, align 2
				br label %for.body

				for.cond.cleanup:
				%mac1.0.lcssa = phi i32 [ 0, %entry ], [ %add11, %for.body ]
				ret i32 %mac1.0.lcssa

				for.body:
				%mac1.026 = phi i32 [ %add11, %for.body ], [ 0, %for.body.preheader ]
				%i.025 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
				%arrayidx = getelementptr inbounds i16, i16* %arg3, i32 %i.025
				%0 = load i16, i16* %arrayidx, align 2

				; Store inserted here, aliasing with arrayidx, arrayidx1, arrayidx3
				store i16 42, i16* %arrayidx, align 2

				%add = add nuw nsw i32 %i.025, 1
				%arrayidx1 = getelementptr inbounds i16, i16* %arg3, i32 %add
				%1 = load i16, i16* %arrayidx1, align 2
				%arrayidx3 = getelementptr inbounds i16, i16* %arg2, i32 %i.025
				%2 = load i16, i16* %arrayidx3, align 2
				%conv = sext i16 %2 to i32
				%conv4 = sext i16 %0 to i32
				%mul = mul nsw i32 %conv, %conv4
				%arrayidx6 = getelementptr inbounds i16, i16* %arg2, i32 %add
				%3 = load i16, i16* %arrayidx6, align 2
				%conv7 = sext i16 %3 to i32
				%conv8 = sext i16 %1 to i32
				%mul9 = mul nsw i32 %conv7, %conv8
				%add10 = add i32 %mul, %mac1.026
				%add11 = add i32 %mul9, %add10
				%exitcond = icmp ne i32 %add, %arg
				br i1 %exitcond, label %for.body, label %for.cond.cleanup
				}

				; Alias check: check that the rewrite isn't triggered when there's a store
				; aliasing one of the mul load operands. Arguments are now annotated with
				; 'noalias'.
				;
				; CHECK-NOT: call i32 @llvm.arm.smlad
				;
				define dso_local i32 @restrict(i32 %arg, i32* noalias %arg1, i16* noalias readonly %arg2, i16* noalias readonly %arg3) {
				entry:
				%cmp24 = icmp sgt i32 %arg, 0
				br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader:
				%.pre = load i16, i16* %arg3, align 2
				%.pre27 = load i16, i16* %arg2, align 2
				br label %for.body

				for.cond.cleanup:
				%mac1.0.lcssa = phi i32 [ 0, %entry ], [ %add11, %for.body ]
				ret i32 %mac1.0.lcssa

				for.body:
				%mac1.026 = phi i32 [ %add11, %for.body ], [ 0, %for.body.preheader ]
				%i.025 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
				%arrayidx = getelementptr inbounds i16, i16* %arg3, i32 %i.025
				%0 = load i16, i16* %arrayidx, align 2

				; Store inserted here, aliasing only with loads from 'arrayidx'.
				store i16 42, i16* %arrayidx, align 2

				%add = add nuw nsw i32 %i.025, 1
				%arrayidx1 = getelementptr inbounds i16, i16* %arg3, i32 %add
				%1 = load i16, i16* %arrayidx1, align 2
				%arrayidx3 = getelementptr inbounds i16, i16* %arg2, i32 %i.025
				%2 = load i16, i16* %arrayidx3, align 2
				%conv = sext i16 %2 to i32
				%conv4 = sext i16 %0 to i32
				%mul = mul nsw i32 %conv, %conv4
				%arrayidx6 = getelementptr inbounds i16, i16* %arg2, i32 %add
				%3 = load i16, i16* %arrayidx6, align 2
				%conv7 = sext i16 %3 to i32
				%conv8 = sext i16 %1 to i32
				%mul9 = mul nsw i32 %conv7, %conv8
				%add10 = add i32 %mul, %mac1.026

				; Here the Mul is the LHS, and the Add the RHS.
				%add11 = add i32 %mul9, %add10

				%exitcond = icmp ne i32 %add, %arg
				br i1 %exitcond, label %for.body, label %for.cond.cleanup
				}

				; CHECK-LABEL: store_dominates_all
				; CHECK: store
				; CHECK: load
				; CHECK: load
				; CHECK: load
				; CHECK: load
				; CHECK: smlad
				define dso_local i32 @store_dominates_all(i32 %arg, i32* nocapture %arg1, i16* nocapture readonly %arg2, i16* nocapture readonly %arg3) {
				entry:
				%cmp24 = icmp sgt i32 %arg, 0
				br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader:
				%.pre = load i16, i16* %arg3, align 2
				%.pre27 = load i16, i16* %arg2, align 2
				br label %for.body

				for.cond.cleanup:
				%mac1.0.lcssa = phi i32 [ 0, %entry ], [ %add11, %for.body ]
				ret i32 %mac1.0.lcssa

				for.body:
				%mac1.026 = phi i32 [ %add11, %for.body ], [ 0, %for.body.preheader ]
				%i.025 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
				%arrayidx = getelementptr inbounds i16, i16* %arg3, i32 %i.025
				store i16 42, i16* %arrayidx, align 2
				%0 = load i16, i16* %arrayidx, align 2
				%add = add nuw nsw i32 %i.025, 1
				%arrayidx1 = getelementptr inbounds i16, i16* %arg3, i32 %add
				%1 = load i16, i16* %arrayidx1, align 2
				%arrayidx3 = getelementptr inbounds i16, i16* %arg2, i32 %i.025
				%2 = load i16, i16* %arrayidx3, align 2
				%conv = sext i16 %2 to i32
				%conv4 = sext i16 %0 to i32
				%mul = mul nsw i32 %conv, %conv4
				%arrayidx6 = getelementptr inbounds i16, i16* %arg2, i32 %add
				%3 = load i16, i16* %arrayidx6, align 2
				%conv7 = sext i16 %3 to i32
				%conv8 = sext i16 %1 to i32
				%mul9 = mul nsw i32 %conv7, %conv8
				%add10 = add i32 %mul, %mac1.026
				%add11 = add i32 %mul9, %add10
				%exitcond = icmp ne i32 %add, %arg
				br i1 %exitcond, label %for.body, label %for.cond.cleanup
				}

				; CHECK-LABEL: loads_dominate
				; CHECK-NOT: store
				; CHECK: load i32
				; CHECK-NOT: store
				; CHECK: load i32
				; CHECK: store
				define dso_local i32 @loads_dominate(i32 %arg, i32* nocapture %arg1, i16* nocapture readonly %arg2, i16* nocapture readonly %arg3) {
				entry:
				%cmp24 = icmp sgt i32 %arg, 0
				br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader:
				%.pre = load i16, i16* %arg3, align 2
				%.pre27 = load i16, i16* %arg2, align 2
				br label %for.body

				for.cond.cleanup:
				%mac1.0.lcssa = phi i32 [ 0, %entry ], [ %add11, %for.body ]
				ret i32 %mac1.0.lcssa

				for.body:
				%mac1.026 = phi i32 [ %add11, %for.body ], [ 0, %for.body.preheader ]
				%i.025 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
				%arrayidx = getelementptr inbounds i16, i16* %arg3, i32 %i.025
				%0 = load i16, i16* %arrayidx, align 2
				%add = add nuw nsw i32 %i.025, 1
				%arrayidx1 = getelementptr inbounds i16, i16* %arg3, i32 %add
				%1 = load i16, i16* %arrayidx1, align 2
				%arrayidx3 = getelementptr inbounds i16, i16* %arg2, i32 %i.025
				%2 = load i16, i16* %arrayidx3, align 2
				%conv = sext i16 %2 to i32
				%conv4 = sext i16 %0 to i32
				%mul = mul nsw i32 %conv, %conv4
				%arrayidx6 = getelementptr inbounds i16, i16* %arg2, i32 %add
				%3 = load i16, i16* %arrayidx6, align 2
				%conv7 = sext i16 %3 to i32
				%conv8 = sext i16 %1 to i32
				%mul9 = mul nsw i32 %conv7, %conv8
				%add10 = add i32 %mul, %mac1.026
				%add11 = add i32 %mul9, %add10
				store i16 42, i16* %arrayidx, align 2
				%exitcond = icmp ne i32 %add, %arg
				br i1 %exitcond, label %for.body, label %for.cond.cleanup
				}

				; CHECK-LABEL: store_alias_arg3_legal_1
				; CHECK-NOT: store
				; CHECK: phi i32
				; CHECK: [[IV:%[^ ]+]] = phi i32 [ %add
				; CHECK: [[ARG3_GEP:%[^ ]+]] = getelementptr inbounds i16, i16* %arg3, i32 [[IV]]
				; CHECK: [[ARG3:%[^ ]+]] = bitcast i16* [[ARG3_GEP]] to i32*
				; CHECK: load i32, i32* [[ARG3]]
				; CHECK: [[ARG2_GEP:%[^ ]+]] = getelementptr inbounds i16, i16* %arg2, i32 [[IV]]
				; CHECK: [[ARG2:%[^ ]+]] = bitcast i16* [[ARG2_GEP]] to i32*
				; CHECK: load i32, i32* [[ARG2]]
				; CHECK: store
				define dso_local i32 @store_alias_arg3_legal_1(i32 %arg, i32* nocapture %arg1, i16* noalias nocapture readonly %arg2, i16* nocapture readonly %arg3) {
				entry:
				%cmp24 = icmp sgt i32 %arg, 0
				br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader:
				%.pre = load i16, i16* %arg3, align 2
				%.pre27 = load i16, i16* %arg2, align 2
				br label %for.body

				for.cond.cleanup:
				%mac1.0.lcssa = phi i32 [ 0, %entry ], [ %add11, %for.body ]
				ret i32 %mac1.0.lcssa

				for.body:
				%mac1.026 = phi i32 [ %add11, %for.body ], [ 0, %for.body.preheader ]
				%i.025 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
				%arrayidx = getelementptr inbounds i16, i16* %arg3, i32 %i.025
				%0 = load i16, i16* %arrayidx, align 2
				%add = add nuw nsw i32 %i.025, 1
				%arrayidx1 = getelementptr inbounds i16, i16* %arg3, i32 %add
				%1 = load i16, i16* %arrayidx1, align 2
				%arrayidx3 = getelementptr inbounds i16, i16* %arg2, i32 %i.025
				%2 = load i16, i16* %arrayidx3, align 2
				%conv = sext i16 %2 to i32
				%conv4 = sext i16 %0 to i32
				%mul = mul nsw i32 %conv, %conv4
				store i16 42, i16* %arrayidx, align 2
				%arrayidx6 = getelementptr inbounds i16, i16* %arg2, i32 %add
				%3 = load i16, i16* %arrayidx6, align 2
				%conv7 = sext i16 %3 to i32
				%conv8 = sext i16 %1 to i32
				%mul9 = mul nsw i32 %conv7, %conv8
				%add10 = add i32 %mul, %mac1.026
				%add11 = add i32 %mul9, %add10
				%exitcond = icmp ne i32 %add, %arg
				br i1 %exitcond, label %for.body, label %for.cond.cleanup
				}

				; CHECK-LABEL: store_alias_arg3_legal_2
				; CHECK-NOT: store
				; CHECK: [[BITCAST:[^ ]+]] = bitcast i16* %arrayidx to i32*
				; CHECK: load i32, i32* [[BITCAST]]
				; CHECK: store i16 42, i16* %arrayidx
				; CHECK: [[BITCAST3:[^ ]+]] = bitcast i16* %arrayidx3 to i32*
				; CHECK: load i32, i32* [[BITCAST3]]
				; CHECK: smlad
				define dso_local i32 @store_alias_arg3_legal_2(i32 %arg, i32* nocapture %arg1, i16* noalias nocapture readonly %arg2, i16* nocapture readonly %arg3) {
				entry:
				%cmp24 = icmp sgt i32 %arg, 0
				br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader:
				%.pre = load i16, i16* %arg3, align 2
				%.pre27 = load i16, i16* %arg2, align 2
				br label %for.body

				for.cond.cleanup:
				%mac1.0.lcssa = phi i32 [ 0, %entry ], [ %add11, %for.body ]
				ret i32 %mac1.0.lcssa

				for.body:
				%mac1.026 = phi i32 [ %add11, %for.body ], [ 0, %for.body.preheader ]
				%i.025 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
				%arrayidx = getelementptr inbounds i16, i16* %arg3, i32 %i.025
				%0 = load i16, i16* %arrayidx, align 2
				%add = add nuw nsw i32 %i.025, 1
				%arrayidx1 = getelementptr inbounds i16, i16* %arg3, i32 %add
				%1 = load i16, i16* %arrayidx1, align 2
				%arrayidx3 = getelementptr inbounds i16, i16* %arg2, i32 %i.025
				store i16 42, i16* %arrayidx, align 2
				%2 = load i16, i16* %arrayidx3, align 2
				%conv = sext i16 %2 to i32
				%conv4 = sext i16 %0 to i32
				%mul = mul nsw i32 %conv, %conv4
				%arrayidx6 = getelementptr inbounds i16, i16* %arg2, i32 %add
				%3 = load i16, i16* %arrayidx6, align 2
				%conv7 = sext i16 %3 to i32
				%conv8 = sext i16 %1 to i32
				%mul9 = mul nsw i32 %conv7, %conv8
				%add10 = add i32 %mul, %mac1.026
				%add11 = add i32 %mul9, %add10
				%exitcond = icmp ne i32 %add, %arg
				br i1 %exitcond, label %for.body, label %for.cond.cleanup
				}

				; CHECK-LABEL: store_alias_arg3_illegal_1
				; CHECK-NOT: load i32
				define dso_local i32 @store_alias_arg3_illegal_1(i32 %arg, i32* nocapture %arg1, i16* noalias nocapture readonly %arg2, i16* noalias nocapture readonly %arg3) {
				entry:
				%cmp24 = icmp sgt i32 %arg, 0
				br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader:
				%.pre = load i16, i16* %arg3, align 2
				%.pre27 = load i16, i16* %arg2, align 2
				br label %for.body

				for.cond.cleanup:
				%mac1.0.lcssa = phi i32 [ 0, %entry ], [ %add11, %for.body ]
				ret i32 %mac1.0.lcssa

				for.body:
				%mac1.026 = phi i32 [ %add11, %for.body ], [ 0, %for.body.preheader ]
				%i.025 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
				%arrayidx = getelementptr inbounds i16, i16* %arg3, i32 %i.025
				%0 = load i16, i16* %arrayidx, align 2
				%add = add nuw nsw i32 %i.025, 1
				%arrayidx1 = getelementptr inbounds i16, i16* %arg3, i32 %add
				store i16 42, i16* %arrayidx1, align 2
				%1 = load i16, i16* %arrayidx1, align 2
				%arrayidx3 = getelementptr inbounds i16, i16* %arg2, i32 %i.025
				%2 = load i16, i16* %arrayidx3, align 2
				%conv = sext i16 %2 to i32
				%conv4 = sext i16 %0 to i32
				%mul = mul nsw i32 %conv, %conv4
				%arrayidx6 = getelementptr inbounds i16, i16* %arg2, i32 %add
				%3 = load i16, i16* %arrayidx6, align 2
				%conv7 = sext i16 %3 to i32
				%conv8 = sext i16 %1 to i32
				%mul9 = mul nsw i32 %conv7, %conv8
				%add10 = add i32 %mul, %mac1.026
				%add11 = add i32 %mul9, %add10
				%exitcond = icmp ne i32 %add, %arg
				br i1 %exitcond, label %for.body, label %for.cond.cleanup
				}

				; CHECK-LABEL: store_alias_arg3_illegal_2
				; CHECK-NOT: load i32
				define dso_local i32 @store_alias_arg3_illegal_2(i32 %arg, i32* nocapture %arg1, i16* noalias nocapture readonly %arg2, i16* noalias nocapture readonly %arg3) {
				entry:
				%cmp24 = icmp sgt i32 %arg, 0
				br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader:
				%.pre = load i16, i16* %arg3, align 2
				%.pre27 = load i16, i16* %arg2, align 2
				br label %for.body

				for.cond.cleanup:
				%mac1.0.lcssa = phi i32 [ 0, %entry ], [ %add11, %for.body ]
				ret i32 %mac1.0.lcssa

				for.body:
				%mac1.026 = phi i32 [ %add11, %for.body ], [ 0, %for.body.preheader ]
				%i.025 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
				%arrayidx = getelementptr inbounds i16, i16* %arg3, i32 %i.025
				%0 = load i16, i16* %arrayidx, align 2
				%add = add nuw nsw i32 %i.025, 1
				%arrayidx1 = getelementptr inbounds i16, i16* %arg3, i32 %add
				store i16 42, i16* %arrayidx, align 2
				%1 = load i16, i16* %arrayidx1, align 2
				%arrayidx3 = getelementptr inbounds i16, i16* %arg2, i32 %i.025
				%2 = load i16, i16* %arrayidx3, align 2
				%conv = sext i16 %2 to i32
				%conv4 = sext i16 %0 to i32
				%mul = mul nsw i32 %conv, %conv4
				%arrayidx6 = getelementptr inbounds i16, i16* %arg2, i32 %add
				%3 = load i16, i16* %arrayidx6, align 2
				%conv7 = sext i16 %3 to i32
				%conv8 = sext i16 %1 to i32
				%mul9 = mul nsw i32 %conv7, %conv8
				%add10 = add i32 %mul, %mac1.026
				%add11 = add i32 %mul9, %add10
				%exitcond = icmp ne i32 %add, %arg
				br i1 %exitcond, label %for.body, label %for.cond.cleanup
				}

				; CHECK-LABEL: store_alias_arg2_illegal_1
				; CHECK-NOT: load i32
				define dso_local i32 @store_alias_arg2_illegal_1(i32 %arg, i32* nocapture %arg1, i16* nocapture readonly %arg2, i16* nocapture readonly %arg3) {
				entry:
				%cmp24 = icmp sgt i32 %arg, 0
				br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader:
				%.pre = load i16, i16* %arg3, align 2
				%.pre27 = load i16, i16* %arg2, align 2
				br label %for.body

				for.cond.cleanup:
				%mac1.0.lcssa = phi i32 [ 0, %entry ], [ %add11, %for.body ]
				ret i32 %mac1.0.lcssa

				for.body:
				%mac1.026 = phi i32 [ %add11, %for.body ], [ 0, %for.body.preheader ]
				%i.025 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
				%arrayidx = getelementptr inbounds i16, i16* %arg3, i32 %i.025
				%0 = load i16, i16* %arrayidx, align 2
				%add = add nuw nsw i32 %i.025, 1
				%arrayidx1 = getelementptr inbounds i16, i16* %arg3, i32 %add
				%1 = load i16, i16* %arrayidx1, align 2
				%arrayidx3 = getelementptr inbounds i16, i16* %arg2, i32 %i.025
				%2 = load i16, i16* %arrayidx3, align 2
				%conv = sext i16 %2 to i32
				%conv4 = sext i16 %0 to i32
				%mul = mul nsw i32 %conv, %conv4
				%arrayidx6 = getelementptr inbounds i16, i16* %arg2, i32 %add
				store i16 42, i16* %arrayidx6, align 2
				%3 = load i16, i16* %arrayidx6, align 2
				%conv7 = sext i16 %3 to i32
				%conv8 = sext i16 %1 to i32
				%mul9 = mul nsw i32 %conv7, %conv8
				%add10 = add i32 %mul, %mac1.026
				%add11 = add i32 %mul9, %add10
				%exitcond = icmp ne i32 %add, %arg
				br i1 %exitcond, label %for.body, label %for.cond.cleanup
				}

				; CHECK-LABEL: store_alias_arg2_illegal_2
				; CHECK-NOT: load i32
				define dso_local i32 @store_alias_arg2_illegal_2(i32 %arg, i32* nocapture %arg1, i16* nocapture readonly %arg2, i16* nocapture readonly %arg3) {
				entry:
				%cmp24 = icmp sgt i32 %arg, 0
				br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader:
				%.pre = load i16, i16* %arg3, align 2
				%.pre27 = load i16, i16* %arg2, align 2
				br label %for.body

				for.cond.cleanup:
				%mac1.0.lcssa = phi i32 [ 0, %entry ], [ %add11, %for.body ]
				ret i32 %mac1.0.lcssa

				for.body:
				%mac1.026 = phi i32 [ %add11, %for.body ], [ 0, %for.body.preheader ]
				%i.025 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
				%arrayidx = getelementptr inbounds i16, i16* %arg3, i32 %i.025
				%0 = load i16, i16* %arrayidx, align 2
				%add = add nuw nsw i32 %i.025, 1
				%arrayidx1 = getelementptr inbounds i16, i16* %arg3, i32 %add
				%1 = load i16, i16* %arrayidx1, align 2
				%arrayidx3 = getelementptr inbounds i16, i16* %arg2, i32 %i.025
				%2 = load i16, i16* %arrayidx3, align 2
				%conv = sext i16 %2 to i32
				%conv4 = sext i16 %0 to i32
				%mul = mul nsw i32 %conv, %conv4
				%arrayidx6 = getelementptr inbounds i16, i16* %arg2, i32 %add
				store i16 42, i16* %arrayidx3, align 2
				%3 = load i16, i16* %arrayidx6, align 2
				%conv7 = sext i16 %3 to i32
				%conv8 = sext i16 %1 to i32
				%mul9 = mul nsw i32 %conv7, %conv8
				%add10 = add i32 %mul, %mac1.026
				%add11 = add i32 %mul9, %add10
				%exitcond = icmp ne i32 %add, %arg
				br i1 %exitcond, label %for.body, label %for.cond.cleanup
				}
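
The tests in aliasing.ll above differ mainly in where the store sits relative to the two loads that would be widened together: a store before both loads or after both loads still allows the transform, while a possibly-aliasing store between them blocks it. A deliberately simplified model of that ordering rule is sketched below. It is not the pass's dependency check: the Kind enum and the canPairLoads helper are hypothetical, and alias analysis is skipped entirely by assuming the caller has already classified each store.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of one basic block: instruction kinds in program order.
enum class Kind { Load, MayAliasStore, Other };

// Hypothetical helper: returns true when the loads at positions First and
// Second have no possibly-aliasing store strictly between them.
static bool canPairLoads(const std::vector<Kind> &Block, std::size_t First,
                         std::size_t Second) {
  // Reject malformed queries: out of range, wrong order, or not loads.
  if (First >= Second || Second >= Block.size())
    return false;
  if (Block[First] != Kind::Load || Block[Second] != Kind::Load)
    return false;
  // Scan only the instructions between the two loads.
  for (std::size_t I = First + 1; I < Second; ++I)
    if (Block[I] == Kind::MayAliasStore)
      return false;
  return true;
}
```

In this model a block shaped like store_dominates_all (store, load, load) and one shaped like loads_dominate (load, load, store) both allow pairing, while one shaped like store_alias_arg3_illegal_2 (load, store, load) does not.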

test/CodeGen/ARM/ParallelDSP/smlad0.ll

	; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 < %s -arm-parallel-dsp -S | FileCheck %s			; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 < %s -arm-parallel-dsp -S | FileCheck %s
	; RUN: opt -mtriple=armeb-arm-eabi -mcpu=cortex-m0 < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED			; RUN: opt -mtriple=armeb-arm-eabi -mcpu=cortex-m0 < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED
	;			;
	; The Cortex-M0 does not support unaligned accesses:			; The Cortex-M0 does not support unaligned accesses:
	; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m0 < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED			; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m0 < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED
	;			;
	; Check DSP extension:			; Check DSP extension:
	; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 -mattr=-dsp < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED			; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 -mattr=-dsp < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED

	define dso_local i32 @OneReduction(i32 %arg, i32* nocapture readnone %arg1, i16* nocapture readonly %arg2, i16* nocapture readonly %arg3) {			define dso_local i32 @OneReduction(i32 %arg, i32* nocapture readnone %arg1, i16* nocapture readonly %arg2, i16* nocapture readonly %arg3) {
	;			;
	; CHECK-LABEL: @OneReduction			; CHECK-LABEL: @OneReduction
	; CHECK: %mac1{{\.}}026 = phi i32 [ [[V8:%[0-9]+]], %for.body ], [ 0, %for.body.preheader ]			; CHECK: %mac1{{\.}}026 = phi i32 [ [[V8:%[0-9]+]], %for.body ], [ 0, %for.body.preheader ]
	; CHECK: [[V4:%[0-9]+]] = bitcast i16* %arrayidx3 to i32*			; CHECK: [[V4:%[0-9]+]] = bitcast i16* %arrayidx to i32*
	; CHECK: [[V5:%[0-9]+]] = load i32, i32* [[V4]], align 2			; CHECK: [[V5:%[0-9]+]] = load i32, i32* [[V4]], align 2
	; CHECK: [[V6:%[0-9]+]] = bitcast i16* %arrayidx to i32*			; CHECK: [[V6:%[0-9]+]] = bitcast i16* %arrayidx3 to i32*
	; CHECK: [[V7:%[0-9]+]] = load i32, i32* [[V6]], align 2			; CHECK: [[V7:%[0-9]+]] = load i32, i32* [[V6]], align 2
	; CHECK: [[V8]] = call i32 @llvm.arm.smlad(i32 [[V5]], i32 [[V7]], i32 %mac1{{\.}}026)			; CHECK: [[V8]] = call i32 @llvm.arm.smlad(i32 [[V7]], i32 [[V5]], i32 %mac1{{\.}}026)
	; CHECK-NOT: call i32 @llvm.arm.smlad			; CHECK-NOT: call i32 @llvm.arm.smlad
	;			;
	; CHECK-UNSUPPORTED-NOT: call i32 @llvm.arm.smlad			; CHECK-UNSUPPORTED-NOT: call i32 @llvm.arm.smlad
	;			;
	entry:			entry:
	%cmp24 = icmp sgt i32 %arg, 0			%cmp24 = icmp sgt i32 %arg, 0
	br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup			br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup

	(188 lines not shown)

test/CodeGen/ARM/ParallelDSP/smlad1.ll

	; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 < %s -arm-parallel-dsp -S | FileCheck %s			; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 < %s -arm-parallel-dsp -S | FileCheck %s

	; CHECK-LABEL: @test1			; CHECK-LABEL: @test1
	; CHECK: %mac1{{\.}}026 = phi i32 [ [[V8:%[0-9]+]], %for.body ], [ 0, %for.body.preheader ]			; CHECK: %mac1{{\.}}026 = phi i32 [ [[V8:%[0-9]+]], %for.body ], [ 0, %for.body.preheader ]
	; CHECK: [[V4:%[0-9]+]] = bitcast i16* %arrayidx3 to i32*			; CHECK: [[V4:%[0-9]+]] = bitcast i16* %arrayidx to i32*
	; CHECK: [[V5:%[0-9]+]] = load i32, i32* [[V4]], align 2			; CHECK: [[V5:%[0-9]+]] = load i32, i32* [[V4]], align 2
	; CHECK: [[V6:%[0-9]+]] = bitcast i16* %arrayidx to i32*			; CHECK: [[V6:%[0-9]+]] = bitcast i16* %arrayidx3 to i32*
	; CHECK: [[V7:%[0-9]+]] = load i32, i32* [[V6]], align 2			; CHECK: [[V7:%[0-9]+]] = load i32, i32* [[V6]], align 2
	; CHECK: [[V8]] = call i32 @llvm.arm.smlad(i32 [[V5]], i32 [[V7]], i32 %mac1{{\.}}026)			; CHECK: [[V8]] = call i32 @llvm.arm.smlad(i32 [[V7]], i32 [[V5]], i32 %mac1{{\.}}026)

	define dso_local i32 @test1(i32 %arg, i32* nocapture readnone %arg1, i16* nocapture readonly %arg2, i16* nocapture readonly %arg3) {			define dso_local i32 @test1(i32 %arg, i32* nocapture readnone %arg1, i16* nocapture readonly %arg2, i16* nocapture readonly %arg3) {
	entry:			entry:
	%cmp24 = icmp sgt i32 %arg, 0			%cmp24 = icmp sgt i32 %arg, 0
	br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup			br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup

	for.body.preheader:			for.body.preheader:
	%.pre = load i16, i16* %arg3, align 2			%.pre = load i16, i16* %arg3, align 2
	(78 lines not shown)

test/CodeGen/ARM/ParallelDSP/smlad11.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 < %s -arm-parallel-dsp -S -stats 2>&1 | FileCheck %s			; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 < %s -arm-parallel-dsp -S -stats 2>&1 | FileCheck %s
	;			;
	; A more complicated chain: 4 mul operations, so we expect 2 smlad calls.			; A more complicated chain: 4 mul operations, so we expect 2 smlad calls.
	;			;
	; CHECK: %mac1{{\.}}054 = phi i32 [ [[V17:%[0-9]+]], %for.body ], [ 0, %for.body.preheader ]			; CHECK: %mac1{{\.}}054 = phi i32 [ [[V17:%[0-9]+]], %for.body ], [ 0, %for.body.preheader ]
	; CHECK: [[V8:%[0-9]+]] = bitcast i16* %arrayidx8 to i32*
	; CHECK: [[V9:%[0-9]+]] = load i32, i32* [[V8]], align 2
	; CHECK: [[V10:%[0-9]+]] = bitcast i16* %arrayidx to i32*			; CHECK: [[V10:%[0-9]+]] = bitcast i16* %arrayidx to i32*
	; CHECK: [[V11:%[0-9]+]] = load i32, i32* [[V10]], align 2			; CHECK: [[V11:%[0-9]+]] = load i32, i32* [[V10]], align 2
	; CHECK: [[V12:%[0-9]+]] = call i32 @llvm.arm.smlad(i32 [[V9]], i32 [[V11]], i32 %mac1{{\.}}054)
	; CHECK: [[V13:%[0-9]+]] = bitcast i16* %arrayidx17 to i32*
	; CHECK: [[V14:%[0-9]+]] = load i32, i32* [[V13]], align 2
	; CHECK: [[V15:%[0-9]+]] = bitcast i16* %arrayidx4 to i32*			; CHECK: [[V15:%[0-9]+]] = bitcast i16* %arrayidx4 to i32*
	; CHECK: [[V16:%[0-9]+]] = load i32, i32* [[V15]], align 2			; CHECK: [[V16:%[0-9]+]] = load i32, i32* [[V15]], align 2
				; CHECK: [[V8:%[0-9]+]] = bitcast i16* %arrayidx8 to i32*
				; CHECK: [[V9:%[0-9]+]] = load i32, i32* [[V8]], align 2
				; CHECK: [[V13:%[0-9]+]] = bitcast i16* %arrayidx17 to i32*
				; CHECK: [[V14:%[0-9]+]] = load i32, i32* [[V13]], align 2
				; CHECK: [[V12:%[0-9]+]] = call i32 @llvm.arm.smlad(i32 [[V9]], i32 [[V11]], i32 %mac1{{\.}}054)
	; CHECK: [[V17:%[0-9]+]] = call i32 @llvm.arm.smlad(i32 [[V14]], i32 [[V16]], i32 [[V12]])			; CHECK: [[V17:%[0-9]+]] = call i32 @llvm.arm.smlad(i32 [[V14]], i32 [[V16]], i32 [[V12]])
	;			;
	; And we don't want to see a 3rd smlad:			; And we don't want to see a 3rd smlad:
	; CHECK-NOT: call i32 @llvm.arm.smlad			; CHECK-NOT: call i32 @llvm.arm.smlad
	;			;
	; CHECK: 2 arm-parallel-dsp - Number of smlad instructions generated			; CHECK: 2 arm-parallel-dsp - Number of smlad instructions generated
	;			;
	define dso_local i32 @test(i32 %arg, i32* nocapture readnone %arg1, i16* nocapture readonly %arg2, i16* nocapture readonly %arg3) {			define dso_local i32 @test(i32 %arg, i32* nocapture readnone %arg1, i16* nocapture readonly %arg2, i16* nocapture readonly %arg3) {
	(53 lines not shown)

test/CodeGen/ARM/ParallelDSP/smlad6.ll

This file was deleted.

	; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 < %s -arm-parallel-dsp -S | FileCheck %s
	;
	; Alias check: check that the rewrite isn't triggered when there's a store
	; instruction possibly aliasing any mul load operands; arguments are passed
	; without 'restrict' enabled.
	;
	; CHECK-NOT: call i32 @llvm.arm.smlad
	;
	define dso_local i32 @test(i32 %arg, i32* nocapture %arg1, i16* nocapture readonly %arg2, i16* nocapture readonly %arg3) {
	entry:
	%cmp24 = icmp sgt i32 %arg, 0
	br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup

	for.body.preheader:
	%.pre = load i16, i16* %arg3, align 2
	%.pre27 = load i16, i16* %arg2, align 2
	br label %for.body

	for.cond.cleanup:
	%mac1.0.lcssa = phi i32 [ 0, %entry ], [ %add11, %for.body ]
	ret i32 %mac1.0.lcssa

	for.body:
	%mac1.026 = phi i32 [ %add11, %for.body ], [ 0, %for.body.preheader ]
	%i.025 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
	%arrayidx = getelementptr inbounds i16, i16* %arg3, i32 %i.025
	%0 = load i16, i16* %arrayidx, align 2

	; Store inserted here, aliasing with arrayidx, arrayidx1, arrayidx3
	store i16 42, i16* %arrayidx, align 2

	%add = add nuw nsw i32 %i.025, 1
	%arrayidx1 = getelementptr inbounds i16, i16* %arg3, i32 %add
	%1 = load i16, i16* %arrayidx1, align 2
	%arrayidx3 = getelementptr inbounds i16, i16* %arg2, i32 %i.025
	%2 = load i16, i16* %arrayidx3, align 2
	%conv = sext i16 %2 to i32
	%conv4 = sext i16 %0 to i32
	%mul = mul nsw i32 %conv, %conv4
	%arrayidx6 = getelementptr inbounds i16, i16* %arg2, i32 %add
	%3 = load i16, i16* %arrayidx6, align 2
	%conv7 = sext i16 %3 to i32
	%conv8 = sext i16 %1 to i32
	%mul9 = mul nsw i32 %conv7, %conv8
	%add10 = add i32 %mul, %mac1.026
	%add11 = add i32 %mul9, %add10
	%exitcond = icmp ne i32 %add, %arg
	br i1 %exitcond, label %for.body, label %for.cond.cleanup
	}

test/CodeGen/ARM/ParallelDSP/smlad7.ll

This file was deleted.

	; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 < %s -arm-parallel-dsp -S | FileCheck %s
	;
	; Alias check: check that the rewrite isn't triggered when there's a store
	; aliasing one of the mul load operands. Arguments are now annotated with
	; 'noalias'.
	;
	; CHECK-NOT: call i32 @llvm.arm.smlad
	;
	define dso_local i32 @test(i32 %arg, i32* noalias %arg1, i16* noalias readonly %arg2, i16* noalias readonly %arg3) {
	entry:
	%cmp24 = icmp sgt i32 %arg, 0
	br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup

	for.body.preheader:
	%.pre = load i16, i16* %arg3, align 2
	%.pre27 = load i16, i16* %arg2, align 2
	br label %for.body

	for.cond.cleanup:
	%mac1.0.lcssa = phi i32 [ 0, %entry ], [ %add11, %for.body ]
	ret i32 %mac1.0.lcssa

	for.body:
	%mac1.026 = phi i32 [ %add11, %for.body ], [ 0, %for.body.preheader ]
	%i.025 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
	%arrayidx = getelementptr inbounds i16, i16* %arg3, i32 %i.025
	%0 = load i16, i16* %arrayidx, align 2

	; Store inserted here, aliasing only with loads from 'arrayidx'.
	store i16 42, i16* %arrayidx, align 2

	%add = add nuw nsw i32 %i.025, 1
	%arrayidx1 = getelementptr inbounds i16, i16* %arg3, i32 %add
	%1 = load i16, i16* %arrayidx1, align 2
	%arrayidx3 = getelementptr inbounds i16, i16* %arg2, i32 %i.025
	%2 = load i16, i16* %arrayidx3, align 2
	%conv = sext i16 %2 to i32
	%conv4 = sext i16 %0 to i32
	%mul = mul nsw i32 %conv, %conv4
	%arrayidx6 = getelementptr inbounds i16, i16* %arg2, i32 %add
	%3 = load i16, i16* %arrayidx6, align 2
	%conv7 = sext i16 %3 to i32
	%conv8 = sext i16 %1 to i32
	%mul9 = mul nsw i32 %conv7, %conv8
	%add10 = add i32 %mul, %mac1.026

	; Here the Mul is the LHS, and the Add the RHS.
	%add11 = add i32 %mul9, %add10

	%exitcond = icmp ne i32 %add, %arg
	br i1 %exitcond, label %for.body, label %for.cond.cleanup
	}

test/CodeGen/ARM/ParallelDSP/smladx-1.ll

	; RUN: opt -mtriple=thumbv8m.main -mcpu=cortex-m33 -arm-parallel-dsp %s -S -o - | FileCheck %s			; RUN: opt -mtriple=thumbv8m.main -mcpu=cortex-m33 -arm-parallel-dsp %s -S -o - | FileCheck %s
	; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m0 < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED			; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m0 < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED
	; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 -mattr=-dsp < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED			; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 -mattr=-dsp < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED
	; RUN: opt -mtriple=armeb-arm-eabi -mcpu=cortex-m33 < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED			; RUN: opt -mtriple=armeb-arm-eabi -mcpu=cortex-m33 < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED

	define i32 @smladx(i16* nocapture readonly %pIn1, i16* nocapture readonly %pIn2, i32 %j, i32 %limit) {			define i32 @smladx(i16* nocapture readonly %pIn1, i16* nocapture readonly %pIn2, i32 %j, i32 %limit) {

	; CHECK-LABEL: smladx			; CHECK-LABEL: smladx
	; CHECK: = phi i32 [ 0, %for.body.preheader.new ],			; CHECK: = phi i32 [ 0, %for.body.preheader.new ],
	; CHECK: [[ACC0:%[^ ]+]] = phi i32 [ 0, %for.body.preheader.new ], [ [[ACC2:%[^ ]+]], %for.body ]			; CHECK: [[ACC0:%[^ ]+]] = phi i32 [ 0, %for.body.preheader.new ], [ [[ACC2:%[^ ]+]], %for.body ]
				; CHECK: [[PIN21:%[^ ]+]] = bitcast i16* %pIn2.1 to i32*
				; CHECK: [[IN21:%[^ ]+]] = load i32, i32* [[PIN21]], align 2
				; CHECK: [[PIN10:%[^ ]+]] = bitcast i16* %pIn1.0 to i32*
				; CHECK: [[IN10:%[^ ]+]] = load i32, i32* [[PIN10]], align 2
	; CHECK: [[PIN23:%[^ ]+]] = bitcast i16* %pIn2.3 to i32*			; CHECK: [[PIN23:%[^ ]+]] = bitcast i16* %pIn2.3 to i32*
	; CHECK: [[IN23:%[^ ]+]] = load i32, i32* [[PIN23]], align 2			; CHECK: [[IN23:%[^ ]+]] = load i32, i32* [[PIN23]], align 2
	; CHECK: [[PIN12:%[^ ]+]] = bitcast i16* %pIn1.2 to i32*			; CHECK: [[PIN12:%[^ ]+]] = bitcast i16* %pIn1.2 to i32*
	; CHECK: [[IN12:%[^ ]+]] = load i32, i32* [[PIN12]], align 2			; CHECK: [[IN12:%[^ ]+]] = load i32, i32* [[PIN12]], align 2
	; CHECK: [[ACC1:%[^ ]+]] = call i32 @llvm.arm.smladx(i32 [[IN23]], i32 [[IN12]], i32 [[ACC0]])			; CHECK: [[ACC1:%[^ ]+]] = call i32 @llvm.arm.smladx(i32 [[IN23]], i32 [[IN12]], i32 [[ACC0]])
	; CHECK: [[PIN21:%[^ ]+]] = bitcast i16* %pIn2.1 to i32*
	; CHECK: [[IN21:%[^ ]+]] = load i32, i32* [[PIN21]], align 2
	; CHECK: [[PIN10:%[^ ]+]] = bitcast i16* %pIn1.0 to i32*
	; CHECK: [[IN10:%[^ ]+]] = load i32, i32* [[PIN10]], align 2
	; CHECK: [[ACC2]] = call i32 @llvm.arm.smladx(i32 [[IN21]], i32 [[IN10]], i32 [[ACC1]])			; CHECK: [[ACC2]] = call i32 @llvm.arm.smladx(i32 [[IN21]], i32 [[IN10]], i32 [[ACC1]])
	; CHECK-NOT: call i32 @llvm.arm.smlad			; CHECK-NOT: call i32 @llvm.arm.smlad
	; CHECK-UNSUPPORTED-NOT: call i32 @llvm.arm.smlad			; CHECK-UNSUPPORTED-NOT: call i32 @llvm.arm.smlad

	entry:			entry:
	%cmp9 = icmp eq i32 %limit, 0			%cmp9 = icmp eq i32 %limit, 0
	br i1 %cmp9, label %for.cond.cleanup, label %for.body.preheader			br i1 %cmp9, label %for.cond.cleanup, label %for.body.preheader

	[... 91 unchanged lines not shown ...]
	  ; CHECK: [[PIN2Base:[^ ]+]] = getelementptr i16, i16* %pIn2

	  ; CHECK: for.body:
	  ; CHECK: [[PIN2:%[^ ]+]] = phi i16* [ [[PIN2_NEXT:%[^ ]+]], %for.body ], [ [[PIN2Base]], %for.body.preheader.new ]
	  ; CHECK: [[PIN1:%[^ ]+]] = phi i16* [ [[PIN1_NEXT:%[^ ]+]], %for.body ], [ [[PIN1Base]], %for.body.preheader.new ]
	  ; CHECK: [[IV:%[^ ]+]] = phi i32
	  ; CHECK: [[ACC0:%[^ ]+]] = phi i32 [ 0, %for.body.preheader.new ], [ [[ACC2:%[^ ]+]], %for.body ]

	- ; CHECK: [[PIN1_2:%[^ ]+]] = getelementptr i16, i16* [[PIN1]], i32 -2
	- ; CHECK: [[PIN2_2:%[^ ]+]] = getelementptr i16, i16* [[PIN2]], i32 -2
	+ ; CHECK: [[PIN2_CAST:%[^ ]+]] = bitcast i16* [[PIN2]] to i32*
	+ ; CHECK: [[IN2:%[^ ]+]] = load i32, i32* [[PIN2_CAST]], align 2

	+ ; CHECK: [[PIN1_2:%[^ ]+]] = getelementptr i16, i16* [[PIN1]], i32 -2
	+ ; CHECK: [[PIN1_2_CAST:%[^ ]+]] = bitcast i16* [[PIN1_2]] to i32*
	+ ; CHECK: [[IN1_2:%[^ ]+]] = load i32, i32* [[PIN1_2_CAST]], align 2

	+ ; CHECK: [[PIN2_2:%[^ ]+]] = getelementptr i16, i16* [[PIN2]], i32 -2
	  ; CHECK: [[PIN2_2_CAST:%[^ ]+]] = bitcast i16* [[PIN2_2]] to i32*
	  ; CHECK: [[IN2_2:%[^ ]+]] = load i32, i32* [[PIN2_2_CAST]], align 2

	  ; CHECK: [[PIN1_CAST:%[^ ]+]] = bitcast i16* [[PIN1]] to i32*
	  ; CHECK: [[IN1:%[^ ]+]] = load i32, i32* [[PIN1_CAST]], align 2
	- ; CHECK: [[ACC1:%[^ ]+]] = call i32 @llvm.arm.smladx(i32 [[IN2_2]], i32 [[IN1]], i32 [[ACC0]])

	- ; CHECK: [[PIN2_CAST:%[^ ]+]] = bitcast i16* [[PIN2]] to i32*
	- ; CHECK: [[IN2:%[^ ]+]] = load i32, i32* [[PIN2_CAST]], align 2
	- ; CHECK: [[PIN1_2_CAST:%[^ ]+]] = bitcast i16* [[PIN1_2]] to i32*
	- ; CHECK: [[IN1_2:%[^ ]+]] = load i32, i32* [[PIN1_2_CAST]], align 2
	+ ; CHECK: [[ACC1:%[^ ]+]] = call i32 @llvm.arm.smladx(i32 [[IN2_2]], i32 [[IN1]], i32 [[ACC0]])
	  ; CHECK: [[ACC2]] = call i32 @llvm.arm.smladx(i32 [[IN2]], i32 [[IN1_2]], i32 [[ACC1]])

	  ; CHECK: [[PIN1_NEXT]] = getelementptr i16, i16* [[PIN1]], i32 4
	  ; CHECK: [[PIN2_NEXT]] = getelementptr i16, i16* [[PIN2]], i32 -4

	  ; CHECK-NOT: call i32 @llvm.arm.smlad
	  ; CHECK-UNSUPPORTED-NOT: call i32 @llvm.arm.smlad

	[... 93 unchanged lines not shown ...]

test/CodeGen/ARM/ParallelDSP/smlald0.ll

	  ; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 < %s -arm-parallel-dsp -S | FileCheck %s
	  ; RUN: opt -mtriple=armeb-arm-eabi -mcpu=cortex-m33 < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED
	  ;
	  ; The Cortex-M0 does not support unaligned accesses:
	  ; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m0 < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED
	  ;
	  ; Check DSP extension:
	  ; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 -mattr=-dsp < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED

	  define dso_local i64 @OneReduction(i32 %arg, i32* nocapture readnone %arg1, i16* nocapture readonly %arg2, i16* nocapture readonly %arg3) {
	  ;
	  ; CHECK-LABEL: @OneReduction
	  ; CHECK: %mac1{{\.}}026 = phi i64 [ [[V8:%[0-9]+]], %for.body ], [ 0, %for.body.preheader ]
	- ; CHECK: [[V4:%[0-9]+]] = bitcast i16* %arrayidx3 to i32*
	- ; CHECK: [[V5:%[0-9]+]] = load i32, i32* [[V4]], align 2
	  ; CHECK: [[V6:%[0-9]+]] = bitcast i16* %arrayidx to i32*
	  ; CHECK: [[V7:%[0-9]+]] = load i32, i32* [[V6]], align 2
	+ ; CHECK: [[V4:%[0-9]+]] = bitcast i16* %arrayidx3 to i32*
	+ ; CHECK: [[V5:%[0-9]+]] = load i32, i32* [[V4]], align 2
	  ; CHECK: [[V8]] = call i64 @llvm.arm.smlald(i32 [[V5]], i32 [[V7]], i64 %mac1{{\.}}026)
	  ; CHECK-NOT: call i64 @llvm.arm.smlald
	  ;
	  ; CHECK-UNSUPPORTED-NOT: call i64 @llvm.arm.smlald
	  ;
	  entry:
	  %cmp24 = icmp sgt i32 %arg, 0
	  br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup
	[... 149 unchanged lines not shown ...]

test/CodeGen/ARM/ParallelDSP/smlald1.ll

	  ; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 < %s -arm-parallel-dsp -S | FileCheck %s

	  ; CHECK-LABEL: @test1
	  ; CHECK: %mac1{{\.}}026 = phi i64 [ [[V8:%[0-9]+]], %for.body ], [ 0, %for.body.preheader ]
	- ; CHECK: [[V4:%[0-9]+]] = bitcast i16* %arrayidx3 to i32*
	- ; CHECK: [[V5:%[0-9]+]] = load i32, i32* [[V4]], align 2
	  ; CHECK: [[V6:%[0-9]+]] = bitcast i16* %arrayidx to i32*
	  ; CHECK: [[V7:%[0-9]+]] = load i32, i32* [[V6]], align 2
	+ ; CHECK: [[V4:%[0-9]+]] = bitcast i16* %arrayidx3 to i32*
	+ ; CHECK: [[V5:%[0-9]+]] = load i32, i32* [[V4]], align 2
	  ; CHECK: [[V8]] = call i64 @llvm.arm.smlald(i32 [[V5]], i32 [[V7]], i64 %mac1{{\.}}026)

	  define dso_local i64 @test1(i32 %arg, i32* nocapture readnone %arg1, i16* nocapture readonly %arg2, i16* nocapture readonly %arg3) {
	  entry:
	  %cmp24 = icmp sgt i32 %arg, 0
	  br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup

	  for.body.preheader:
	[... 78 unchanged lines not shown ...]

test/CodeGen/ARM/ParallelDSP/smlald2.ll

	  ; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 < %s -arm-parallel-dsp -S | FileCheck %s
	  ;
	  ; The Cortex-M0 does not support unaligned accesses:
	  ; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m0 < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED
	  ;
	  ; Check DSP extension:
	  ; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 -mattr=-dsp < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED

	  define dso_local i64 @OneReduction(i32 %arg, i32* nocapture readnone %arg1, i16* nocapture readonly %arg2, i16* nocapture readonly %arg3) {
	  ;
	  ; CHECK-LABEL: @OneReduction
	  ; CHECK: %mac1{{\.}}026 = phi i64 [ [[V8:%[0-9]+]], %for.body ], [ 0, %for.body.preheader ]
	- ; CHECK: [[V4:%[0-9]+]] = bitcast i16* %arrayidx3 to i32*
	- ; CHECK: [[V5:%[0-9]+]] = load i32, i32* [[V4]], align 2
	  ; CHECK: [[V6:%[0-9]+]] = bitcast i16* %arrayidx to i32*
	  ; CHECK: [[V7:%[0-9]+]] = load i32, i32* [[V6]], align 2
	+ ; CHECK: [[V4:%[0-9]+]] = bitcast i16* %arrayidx3 to i32*
	+ ; CHECK: [[V5:%[0-9]+]] = load i32, i32* [[V4]], align 2
	  ; CHECK: [[V8]] = call i64 @llvm.arm.smlald(i32 [[V5]], i32 [[V7]], i64 %mac1{{\.}}026)
	  ; CHECK-NOT: call i64 @llvm.arm.smlald
	  ;
	  ; CHECK-UNSUPPORTED-NOT: call i64 @llvm.arm.smlald
	  ;
	  entry:
	  %cmp24 = icmp sgt i32 %arg, 0
	  br i1 %cmp24, label %for.body.preheader, label %for.cond.cleanup
	[... 200 unchanged lines not shown ...]

test/CodeGen/ARM/ParallelDSP/smlaldx-1.ll

	  ; RUN: opt -mtriple=thumbv8m.main -mcpu=cortex-m33 -arm-parallel-dsp %s -S -o - | FileCheck %s
	  ; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m0 < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED
	  ; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 -mattr=-dsp < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED

	  define i64 @smlaldx(i16* nocapture readonly %pIn1, i16* nocapture readonly %pIn2, i32 %j, i32 %limit) {

	  ; CHECK-LABEL: smlaldx
	  ; CHECK: = phi i32 [ 0, %for.body.preheader.new ],
	  ; CHECK: [[ACC0:%[^ ]+]] = phi i64 [ 0, %for.body.preheader.new ], [ [[ACC2:%[^ ]+]], %for.body ]
	+ ; CHECK: [[PIN21:%[^ ]+]] = bitcast i16* %pIn2.1 to i32*
	+ ; CHECK: [[IN21:%[^ ]+]] = load i32, i32* [[PIN21]], align 2
	+ ; CHECK: [[PIN10:%[^ ]+]] = bitcast i16* %pIn1.0 to i32*
	+ ; CHECK: [[IN10:%[^ ]+]] = load i32, i32* [[PIN10]], align 2
	  ; CHECK: [[PIN23:%[^ ]+]] = bitcast i16* %pIn2.3 to i32*
	  ; CHECK: [[IN23:%[^ ]+]] = load i32, i32* [[PIN23]], align 2
	  ; CHECK: [[PIN12:%[^ ]+]] = bitcast i16* %pIn1.2 to i32*
	  ; CHECK: [[IN12:%[^ ]+]] = load i32, i32* [[PIN12]], align 2
	  ; CHECK: [[ACC1:%[^ ]+]] = call i64 @llvm.arm.smlaldx(i32 [[IN23]], i32 [[IN12]], i64 [[ACC0]])
	- ; CHECK: [[PIN21:%[^ ]+]] = bitcast i16* %pIn2.1 to i32*
	- ; CHECK: [[IN21:%[^ ]+]] = load i32, i32* [[PIN21]], align 2
	- ; CHECK: [[PIN10:%[^ ]+]] = bitcast i16* %pIn1.0 to i32*
	- ; CHECK: [[IN10:%[^ ]+]] = load i32, i32* [[PIN10]], align 2
	  ; CHECK: [[ACC2]] = call i64 @llvm.arm.smlaldx(i32 [[IN21]], i32 [[IN10]], i64 [[ACC1]])
	  ; CHECK-NOT: call i64 @llvm.arm.smlad
	  ; CHECK-UNSUPPORTED-NOT: call i64 @llvm.arm.smlad

	  entry:
	  %cmp9 = icmp eq i32 %limit, 0
	  br i1 %cmp9, label %for.cond.cleanup, label %for.body.preheader

	[... 149 unchanged lines not shown ...]
	  ; CHECK: [[PIN2Base:[^ ]+]] = getelementptr i16, i16* %pIn2

	  ; CHECK: for.body:
	  ; CHECK: [[PIN2:%[^ ]+]] = phi i16* [ [[PIN2_NEXT:%[^ ]+]], %for.body ], [ [[PIN2Base]], %for.body.preheader.new ]
	  ; CHECK: [[PIN1:%[^ ]+]] = phi i16* [ [[PIN1_NEXT:%[^ ]+]], %for.body ], [ [[PIN1Base]], %for.body.preheader.new ]
	  ; CHECK: [[IV:%[^ ]+]] = phi i32
	  ; CHECK: [[ACC0:%[^ ]+]] = phi i64 [ 0, %for.body.preheader.new ], [ [[ACC2:%[^ ]+]], %for.body ]

	+ ; CHECK: [[PIN2_CAST:%[^ ]+]] = bitcast i16* [[PIN2]] to i32*
	+ ; CHECK: [[IN2:%[^ ]+]] = load i32, i32* [[PIN2_CAST]], align 2

	  ; CHECK: [[PIN1_2:%[^ ]+]] = getelementptr i16, i16* [[PIN1]], i32 -2
	- ; CHECK: [[PIN2_2:%[^ ]+]] = getelementptr i16, i16* [[PIN2]], i32 -2
	+ ; CHECK: [[PIN1_2_CAST:%[^ ]+]] = bitcast i16* [[PIN1_2]] to i32*
	+ ; CHECK: [[IN1_2:%[^ ]+]] = load i32, i32* [[PIN1_2_CAST]], align 2

	+ ; CHECK: [[PIN2_2:%[^ ]+]] = getelementptr i16, i16* [[PIN2]], i32 -2
	  ; CHECK: [[PIN2_2_CAST:%[^ ]+]] = bitcast i16* [[PIN2_2]] to i32*
	  ; CHECK: [[IN2_2:%[^ ]+]] = load i32, i32* [[PIN2_2_CAST]], align 2

	  ; CHECK: [[PIN1_CAST:%[^ ]+]] = bitcast i16* [[PIN1]] to i32*
	  ; CHECK: [[IN1:%[^ ]+]] = load i32, i32* [[PIN1_CAST]], align 2
	- ; CHECK: [[ACC1:%[^ ]+]] = call i64 @llvm.arm.smlaldx(i32 [[IN2_2]], i32 [[IN1]], i64 [[ACC0]])

	- ; CHECK: [[PIN2_CAST:%[^ ]+]] = bitcast i16* [[PIN2]] to i32*
	- ; CHECK: [[IN2:%[^ ]+]] = load i32, i32* [[PIN2_CAST]], align 2
	- ; CHECK: [[PIN1_2_CAST:%[^ ]+]] = bitcast i16* [[PIN1_2]] to i32*
	- ; CHECK: [[IN1_2:%[^ ]+]] = load i32, i32* [[PIN1_2_CAST]], align 2
	+ ; CHECK: [[ACC1:%[^ ]+]] = call i64 @llvm.arm.smlaldx(i32 [[IN2_2]], i32 [[IN1]], i64 [[ACC0]])
	  ; CHECK: [[ACC2]] = call i64 @llvm.arm.smlaldx(i32 [[IN2]], i32 [[IN1_2]], i64 [[ACC1]])

	  ; CHECK: [[PIN1_NEXT]] = getelementptr i16, i16* [[PIN1]], i32 4
	  ; CHECK: [[PIN2_NEXT]] = getelementptr i16, i16* [[PIN2]], i32 -4

	  ; CHECK-NOT: call i64 @llvm.arm.smlad
	  ; CHECK-UNSUPPORTED-NOT: call i64 @llvm.arm.smlad

	[... 45 unchanged lines not shown ...]

test/CodeGen/ARM/ParallelDSP/smlaldx-2.ll

	  ; RUN: opt -mtriple=thumbv8m.main -mcpu=cortex-m33 -arm-parallel-dsp %s -S -o - | FileCheck %s
	  ; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m0 < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED
	  ; RUN: opt -mtriple=arm-arm-eabi -mcpu=cortex-m33 -mattr=-dsp < %s -arm-parallel-dsp -S | FileCheck %s --check-prefix=CHECK-UNSUPPORTED

	  define i64 @smlaldx(i16* nocapture readonly %pIn1, i16* nocapture readonly %pIn2, i32 %j, i32 %limit) {

	  ; CHECK-LABEL: smlaldx
	  ; CHECK: = phi i32 [ 0, %for.body.preheader.new ],
	  ; CHECK: [[ACC0:%[^ ]+]] = phi i64 [ 0, %for.body.preheader.new ], [ [[ACC2:%[^ ]+]], %for.body ]
	+ ; CHECK: [[PIN21:%[^ ]+]] = bitcast i16* %pIn2.1 to i32*
	+ ; CHECK: [[IN21:%[^ ]+]] = load i32, i32* [[PIN21]], align 2
	+ ; CHECK: [[PIN10:%[^ ]+]] = bitcast i16* %pIn1.0 to i32*
	+ ; CHECK: [[IN10:%[^ ]+]] = load i32, i32* [[PIN10]], align 2
	  ; CHECK: [[PIN23:%[^ ]+]] = bitcast i16* %pIn2.3 to i32*
	  ; CHECK: [[IN23:%[^ ]+]] = load i32, i32* [[PIN23]], align 2
	  ; CHECK: [[PIN12:%[^ ]+]] = bitcast i16* %pIn1.2 to i32*
	  ; CHECK: [[IN12:%[^ ]+]] = load i32, i32* [[PIN12]], align 2
	  ; CHECK: [[ACC1:%[^ ]+]] = call i64 @llvm.arm.smlaldx(i32 [[IN23]], i32 [[IN12]], i64 [[ACC0]])
	- ; CHECK: [[PIN21:%[^ ]+]] = bitcast i16* %pIn2.1 to i32*
	- ; CHECK: [[IN21:%[^ ]+]] = load i32, i32* [[PIN21]], align 2
	- ; CHECK: [[PIN10:%[^ ]+]] = bitcast i16* %pIn1.0 to i32*
	- ; CHECK: [[IN10:%[^ ]+]] = load i32, i32* [[PIN10]], align 2
	  ; CHECK: [[ACC2]] = call i64 @llvm.arm.smlaldx(i32 [[IN21]], i32 [[IN10]], i64 [[ACC1]])
	  ; CHECK-NOT: call i64 @llvm.arm.smlad
	  ; CHECK-UNSUPPORTED-NOT: call i64 @llvm.arm.smlad

	  entry:
	  %cmp9 = icmp eq i32 %limit, 0
	  br i1 %cmp9, label %for.cond.cleanup, label %for.body.preheader

	[... 148 unchanged lines not shown ...]
	  ; CHECK: [[PIN1Base:[^ ]+]] = getelementptr i16, i16* %pIn1
	  ; CHECK: [[PIN2Base:[^ ]+]] = getelementptr i16, i16* %pIn2

	  ; CHECK: for.body:
	  ; CHECK: [[PIN2:%[^ ]+]] = phi i16* [ [[PIN2_NEXT:%[^ ]+]], %for.body ], [ [[PIN2Base]], %for.body.preheader.new ]
	  ; CHECK: [[PIN1:%[^ ]+]] = phi i16* [ [[PIN1_NEXT:%[^ ]+]], %for.body ], [ [[PIN1Base]], %for.body.preheader.new ]
	  ; CHECK: [[IV:%[^ ]+]] = phi i32
	  ; CHECK: [[ACC0:%[^ ]+]] = phi i64 [ 0, %for.body.preheader.new ], [ [[ACC2:%[^ ]+]], %for.body ]
	- ; CHECK: [[PIN1_2:%[^ ]+]] = getelementptr i16, i16* [[PIN1]], i32 -2
	- ; CHECK: [[PIN2_2:%[^ ]+]] = getelementptr i16, i16* [[PIN2]], i32 -2

	  ; CHECK: [[PIN2_CAST:%[^ ]+]] = bitcast i16* [[PIN2]] to i32*
	  ; CHECK: [[IN2:%[^ ]+]] = load i32, i32* [[PIN2_CAST]], align 2

	+ ; CHECK: [[PIN1_2:%[^ ]+]] = getelementptr i16, i16* [[PIN1]], i32 -2
	  ; CHECK: [[PIN1_2_CAST:%[^ ]+]] = bitcast i16* [[PIN1_2]] to i32*
	  ; CHECK: [[IN1_2:%[^ ]+]] = load i32, i32* [[PIN1_2_CAST]], align 2
	- ; CHECK: [[ACC1:%[^ ]+]] = call i64 @llvm.arm.smlaldx(i32 [[IN2]], i32 [[IN1_2]], i64 [[ACC0]])

	- ; CHECK: [[PIN1_CAST:%[^ ]+]] = bitcast i16* [[PIN1]] to i32*
	- ; CHECK: [[IN1:%[^ ]+]] = load i32, i32* [[PIN1_CAST]], align 2
	+ ; CHECK: [[PIN2_2:%[^ ]+]] = getelementptr i16, i16* [[PIN2]], i32 -2
	  ; CHECK: [[PIN2_2_CAST:%[^ ]+]] = bitcast i16* [[PIN2_2]] to i32*
	  ; CHECK: [[IN2_2:%[^ ]+]] = load i32, i32* [[PIN2_2_CAST]], align 2

	+ ; CHECK: [[PIN1_CAST:%[^ ]+]] = bitcast i16* [[PIN1]] to i32*
	+ ; CHECK: [[IN1:%[^ ]+]] = load i32, i32* [[PIN1_CAST]], align 2

	+ ; CHECK: [[ACC1:%[^ ]+]] = call i64 @llvm.arm.smlaldx(i32 [[IN2]], i32 [[IN1_2]], i64 [[ACC0]])
	  ; CHECK: [[ACC2]] = call i64 @llvm.arm.smlaldx(i32 [[IN1]], i32 [[IN2_2]], i64 [[ACC1]])

	  ; CHECK: [[PIN1_NEXT]] = getelementptr i16, i16* [[PIN1]], i32 4
	  ; CHECK: [[PIN2_NEXT]] = getelementptr i16, i16* [[PIN2]], i32 -4

	  ; CHECK-NOT: call i64 @llvm.arm.smlad
	  ; CHECK-UNSUPPORTED-NOT: call i64 @llvm.arm.smlad

	[... 45 unchanged lines not shown ...]

Revision Contents

Diff 198450

lib/Target/ARM/ARMParallelDSP.cpp

test/CodeGen/ARM/ParallelDSP/aliasing.ll

test/CodeGen/ARM/ParallelDSP/smlad0.ll

test/CodeGen/ARM/ParallelDSP/smlad1.ll

test/CodeGen/ARM/ParallelDSP/smlad11.ll

test/CodeGen/ARM/ParallelDSP/smlad6.ll

test/CodeGen/ARM/ParallelDSP/smlad7.ll

test/CodeGen/ARM/ParallelDSP/smladx-1.ll

test/CodeGen/ARM/ParallelDSP/smlald0.ll

test/CodeGen/ARM/ParallelDSP/smlald1.ll

test/CodeGen/ARM/ParallelDSP/smlald2.ll

test/CodeGen/ARM/ParallelDSP/smlaldx-1.ll

test/CodeGen/ARM/ParallelDSP/smlaldx-2.ll
