This is an archive of the discontinued LLVM Phabricator instance.

Detect vector reduction operations just before instruction selection.
Closed, Public

Authored by congh on Dec 4 2015, 3:43 PM.

Details

Reviewers

spatel
davidxl
hfinkel

Commits

rG4ce0280a416c: Detecte vector reduction operations just before instruction selection.
rGbbd4e3b4003f: Detecte vector reduction operations just before instruction selection.
rL261804: Detecte vector reduction operations just before instruction selection.
rL261070: Detecte vector reduction operations just before instruction selection.

Summary

This patch detects vector reductions before instruction selection. Vector reductions are vectorized reduction operations; for such operations we are free to reorganize the elements of the result as long as their reduction stays unchanged. This enables reduction pattern recognition during instruction combine, such as SAD/dot-product on X86. A flag is added to SDNodeFlags to mark vector reduction nodes so that they can be checked during instruction combine.
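As a rough illustration of that freedom (a standalone sketch, not code from this patch), the example below accumulates the same inputs in two ways. `accumulatePairwise` is a hypothetical scheme that folds adjacent pairs into the low lanes, loosely in the spirit of a pairwise-add instruction. The accumulator lanes end up different, but the reduced value is the same, which is all a reduction's consumer can observe.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>

using Vec4 = std::array<int, 4>;

// Horizontal reduction: the only value the final consumer observes.
int reduceAdd(const Vec4 &V) { return std::accumulate(V.begin(), V.end(), 0); }

// Lane-wise accumulation, as a plain vectorized loop body would do it.
Vec4 accumulateLaneWise(Vec4 Acc, const Vec4 &In) {
  for (std::size_t I = 0; I < Acc.size(); ++I)
    Acc[I] += In[I];
  return Acc;
}

// Hypothetical alternative: fold adjacent input pairs into the two low lanes
// and leave the two high lanes untouched.
Vec4 accumulatePairwise(Vec4 Acc, const Vec4 &In) {
  Acc[0] += In[0] + In[1];
  Acc[1] += In[2] + In[3];
  return Acc;
}

// Runs both schemes over the same inputs and checks that the lanes differ
// while the reduced value is identical.
void demoReorganizedLanes() {
  const Vec4 Inputs[] = {{1, 2, 3, 4}, {5, 6, 7, 8}};
  Vec4 A{}, B{};
  for (const Vec4 &In : Inputs) {
    A = accumulateLaneWise(A, In);
    B = accumulatePairwise(B, In);
  }
  assert(A != B);
  assert(reduceAdd(A) == reduceAdd(B));
}
```

Integers are used so that the equality holds exactly; for floating point the same argument needs the fast-math assumption discussed in the review.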

To detect those vector reductions, we search def-use chains starting from the given instruction, and check if all uses fall into two categories:

1. Reduction with another vector.
2. Reduction on all elements.

Case 2 is detected by recognizing the pattern that the loop vectorizer generates to reduce all elements in the vector outside of the loop, which includes several ShuffleVector instructions and one ExtractElement instruction.
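The shuffle part of that pattern can be modelled without any LLVM types. The sketch below is an assumption-laden stand-in: masks are plain `std::vector<int>`, `-1` marks an undefined lane (matching the `-1` comparison in the patch), and the names `isHalvingMask` and `reducesAllElements` are made up for illustration. Each mask must move the upper half of the still-live elements into the lower half, and the chain must end with exactly one live element.

```cpp
#include <cassert>
#include <vector>

// Stand-in for an undefined shuffle-mask lane.
constexpr int UndefLane = -1;

// True if Mask moves elements [Half, 2*Half) of the live range into lanes
// [0, Half) and leaves every other lane undefined.
bool isHalvingMask(const std::vector<int> &Mask, unsigned ElemNumToReduce) {
  if (ElemNumToReduce <= 1 || ElemNumToReduce > Mask.size())
    return false;
  const unsigned Half = ElemNumToReduce / 2;
  for (unsigned I = 0; I < Half; ++I)
    if (Mask[I] != static_cast<int>(I + Half))
      return false;
  for (unsigned I = Half; I < Mask.size(); ++I)
    if (Mask[I] != UndefLane)
      return false;
  return true;
}

// Walks a chain of shuffle masks, halving the live element count each step.
// Succeeds only if the chain leaves a single live element (lane 0).
bool reducesAllElements(const std::vector<std::vector<int>> &Masks,
                        unsigned ElemNum) {
  unsigned Live = ElemNum;
  for (const auto &Mask : Masks) {
    if (Mask.size() != ElemNum || !isHalvingMask(Mask, Live))
      return false;
    Live /= 2;
  }
  return Live == 1;
}

// The two masks used by the <4 x float> test case in this revision.
void demoVectorizerMasks() {
  assert(reducesAllElements({{2, 3, -1, -1}, {1, -1, -1, -1}}, 4));
  assert(!reducesAllElements({{2, 3, -1, -1}}, 4)); // stops at two elements
}
```

The real check additionally requires each shuffle to have a single user that is a reduction operation combining the shuffle with its own input; that part is omitted here.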

Please check out http://lists.llvm.org/pipermail/llvm-dev/2015-November/092379.html for the discussion on this topic.

Diff Detail

Repository: rL LLVM

Event Timeline

Fix a small comment format.

hfinkel added inline comments. Dec 10 2015, 11:35 AM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2333 ↗	(On Diff #41956)	We need to also grab FAdd and FMul when appropriate fast-math flags are present.
2377 ↗	(On Diff #41956)	For PHIs, don't we need to check that the initial value is the operation's identity value (0 for add, 1 for multiply, etc.)? And that there are only two unique incoming blocks?
2410 ↗	(On Diff #41956)	We should also catch the way that this is often programmed "by hand":

typedef float v4f __attribute__((vector_size(16)));
v4f foo(void);
float bar() {
  v4f phi = { 0, 0, 0, 0 };
  for (int i = 0; i < 1600; ++i)
    phi += foo();
  return phi[0] + phi[1] + phi[2] + phi[3];
}

Thus, we have multiple extracts instead of shuffles. One might argue that we should canonicalize this to the shuffle form, but we don't have anything that does this currently (AFAIK). What do you think?
2457 ↗	(On Diff #41956)	You need to set this flag on the PHI node too (meaning the associated CopyFromReg, etc. instructions), right?
test/CodeGen/Generic/vector-redux.ll
2 ↗	(On Diff #41956)	You need to add: ; REQUIRES: asserts here because you're checking debug output which is available only in +Asserts builds.

congh marked an inline comment as done. Dec 10 2015, 2:53 PM

congh added inline comments.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2377 ↗	(On Diff #41956)	I think this may not be necessary. We only care about how the values in the vector are used or how they flow to other defs. They can flow to phi nodes, but we need to check all uses of those phi nodes. Otherwise before reduction on elements, they can only flow to other defs of the same associative operation (e.g. add). As long as the reduction on elements is the only def where they are flowing to, it is safe to assume the operation is a vector-reduction one.
2410 ↗	(On Diff #41956)	OK, as long as this pattern often appears. After all, programs made by hand can have many different forms. For example, we can use target specific intrinsics (like movehl or hadd on X86) in the final reduction. As we don't have canonicalization now, I agree that we should make this function open to support more patterns.
2457 ↗	(On Diff #41956)	Yes, we can do this, but I am wondering how we use the flag on PHI node during instruction combine (in my case I only check flags on operation nodes, but I think there may be other cases in which reduction PHI node can help?).
test/CodeGen/Generic/vector-redux.ll
2 ↗	(On Diff #41956)	OK. Thanks for pointing it out!

hfinkel added inline comments. Dec 10 2015, 3:21 PM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2377 ↗	(On Diff #41956)	Sounds good.
2410 ↗	(On Diff #41956)	Thinking about it, I think it is better to canonicalize these in instcombine. The shuffle form will use fewer instructions in all cases (except the two-element case). Don't bother matching it here, we should add canonicalization to instcombine in a separate patch. FWIW, however, yes, I've seen code like this from several users at our facility.
2457 ↗	(On Diff #41956)	Okay; don't bother then. I was thinking you needed it for isel, but if not, wait until we have a concrete use case.
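The instruction-count argument for the shuffle form (comment on line 2410 above) can be checked with a small calculation. This is only a sketch that counts IR-level instructions for a power-of-two element count; `ReductionCost`, `shuffleForm` and `extractForm` are illustrative names, and real target costs may differ.

```cpp
#include <cassert>

// IR instruction counts for reducing an N-element vector to one scalar.
struct ReductionCost {
  unsigned Shuffles;
  unsigned Extracts;
  unsigned Ops; // add/mul/etc. instructions
  unsigned total() const { return Shuffles + Extracts + Ops; }
};

// Shuffle form: log2(N) shuffle + vector-op pairs, then one extract of lane 0.
ReductionCost shuffleForm(unsigned N) {
  unsigned Steps = 0;
  for (unsigned Live = N; Live > 1; Live /= 2)
    ++Steps;
  return {Steps, 1, Steps};
}

// Extract form: N extracts followed by N - 1 scalar ops.
ReductionCost extractForm(unsigned N) { return {0, N, N - 1}; }

// Two elements tie; four or more elements favor the shuffle form.
void demoReductionCost() {
  assert(shuffleForm(2).total() == extractForm(2).total());
  assert(shuffleForm(4).total() < extractForm(4).total());
}
```

For four elements this gives 5 instructions against 7, and for two elements both forms need 3, which matches the exception noted in the comment.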

Update the patch according to Hal's comments.

FAdd and FMul are now supported when fast-math is used.
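The fast-math requirement exists because floating-point addition is not associative, so reordering a reduction can change its result. A minimal sketch of the effect follows, assuming IEEE-754 single precision and a build without fast-math options; `sumLeft` and `sumRight` are illustrative helpers, and `volatile` is only there to keep the intermediate value rounded to `float`.

```cpp
#include <cassert>

// Evaluates (A + B) + C with the intermediate rounded to float.
float sumLeft(float A, float B, float C) {
  volatile float T = A + B;
  return T + C;
}

// Evaluates A + (B + C) with the intermediate rounded to float.
float sumRight(float A, float B, float C) {
  volatile float T = B + C;
  return A + T;
}

// The same three values give different sums depending on association:
// 1.0f is absorbed when it is first added to -1e20f.
void demoNonAssociative() {
  assert(sumLeft(1e20f, -1e20f, 1.0f) == 1.0f);
  assert(sumRight(1e20f, -1e20f, 1.0f) == 0.0f);
}
```

Integer Add, Mul, And, Or and Xor do not have this problem, which is why the patch accepts them unconditionally.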

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2410 ↗	(On Diff #41956)	I agree. Then I will not modify this part now.

spatel added inline comments. Dec 11 2015, 9:13 AM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2410 ↗	(On Diff #42479)	For reference, I filed this as PR25808 so I could link it to some other reduction bugs: https://llvm.org/bugs/show_bug.cgi?id=25808

congh added inline comments. Dec 11 2015, 10:45 AM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2410 ↗	(On Diff #42479)	Thanks for filing the bug!

suyog added a subscriber: suyog. Dec 14 2015, 12:15 AM

suyog added inline comments.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2410 ↗	(On Diff #42479)	This may also be related to https://llvm.org/bugs/show_bug.cgi?id=20035

Ping?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2410 ↗	(On Diff #42479)	Yes, this is a special case of reduction-on-elements operations.

Ping again?

Ping?

One minor thing I noticed, but overall I don't know this code well enough to review, sorry.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2338 ↗	(On Diff #42479)	return false here or add a 'fall through' comment here to make it clear that it's intentional

Added a fall-through comment as suggested by Simon.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2338 ↗	(On Diff #42479)	OK, thanks!

In D15250#319690, @RKSimon wrote:

One minor thing I noticed, but overall I don't know this code well enough to review, sorry.

Thanks for the review, Simon! I will let Hal or others approve this patch.

hfinkel added inline comments. Jan 15 2016, 6:16 AM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2304 ↗	(On Diff #44149)	'reorganize' sounds a bit weak here. How about reorganize -> alter
2305 ↗	(On Diff #44149)	stay -> stays
2336 ↗	(On Diff #44149)	'same arithmetic' sounds odd. Maybe just say 'same opcode'.
2341 ↗	(On Diff #44149)	do -> does a
2347 ↗	(On Diff #44149)	reduction -> a reduction
2349 ↗	(On Diff #44149)	insturctions (typo)
2349 ↗	(On Diff #44149)	Remove 'supposed to be'
2367 ↗	(On Diff #44149)	We also need to check the fast-math flags on fadd/fmul here.
2367 ↗	(On Diff #44149)	We can allow selects here for the same reason we can allow phis, right? If so, we should.
2375 ↗	(On Diff #44149)	Do we need to check that we don't have ElemNumToReduce == 1 here?
2398 ↗	(On Diff #44149)	When ElemNumToReduce == 1 here, do we need to check that the only user is an ExtractElementInst? I'm concerned that it could be another reduction operation before the extract.

hfinkel added inline comments. Jan 15 2016, 6:21 AM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2367 ↗	(On Diff #44149)	(Specifically, I mean selects with a scalar condition; I assume that selects with a vector condition would not work here).

RKSimon resigned from this revision. Jan 16 2016, 2:08 PM

RKSimon removed a reviewer: RKSimon.

congh marked 8 inline comments as done. Jan 19 2016, 12:56 PM

congh added inline comments.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2367 ↗	(On Diff #44149)	When I wrote this pattern recognition, I was more concerned about compiler-vectorized code, which normally won't have selects. We detect phis because they can appear at the beginning of the loop, which is not the case for selects. Do you think we should make this pattern recognition complicated enough to catch all cases that are manually composed?
2375 ↗	(On Diff #44149)	I am not sure if it could happen, but I added such a check just in case.
2398 ↗	(On Diff #44149)	I think even if there is another reduction operation just before the extract, it is still OK. We only care about whether all values in the vector are reduced to one element, and this is still the case.

Update the patch according to Hal's comments.

Ping?

Please make sure the select case is handled (autovectorization test case provided below); otherwise, LGTM. Thanks!

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

2367 ↗	(On Diff #45294)	I understand, but this comes up in autovectorized code as well. Here's a quick example:

$ cat /tmp/v.c 
int foo(int * restrict a1, int * restrict a2, int * restrict a3, int * restrict a4, int * restrict a5,
        int * restrict a6, int * restrict a7, int * restrict a8, int * restrict a9, int * restrict a10,
        int * restrict a11, int * restrict a12, int * restrict a13, int * restrict a14, int * restrict a15,
        int * restrict a16, int * restrict a17, int * restrict a18, int * restrict a19, int * restrict a20,
        int * restrict a21, int * restrict a22, int * restrict a23, int * restrict a24, int * restrict a25,
        int * restrict a26, int * restrict a27, int * restrict a28, int * restrict a29, int * restrict a30,
        int * restrict b, int * restrict c, int x) {
  int r = 0;
  for (int i = 0; i < 1600; ++i)
    // Lots of other stuff to prevent loop unswitching from kicking in.
    r += a1[i] + a2[i] + a3[i] + a4[i] + a5[i] +
         a6[i] + a7[i] + a8[i] + a9[i] + a10[i] +
         a11[i] + a12[i] + a13[i] + a14[i] + a15[i] +
         a16[i] + a17[i] + a18[i] + a19[i] + a20[i] +
         a21[i] + a22[i] + a23[i] + a24[i] + a25[i] +
         a26[i] + a27[i] + a28[i] + a29[i] + a30[i] +
         b[i] + c[i] + (x > 5 ? b[i] : c[i]);

  return r;
}

Look at the IR from:

$ clang -target powerpc64 -mcpu=pwr7 -O3 -S -emit-llvm -fno-unroll-loops -o - /tmp/v.c

and you'll see:

  %64 = select i1 %cmp93, <4 x i32> %wide.load170, <4 x i32> %wide.load171
  ...
  %93 = add <4 x i32> %92, %wide.load168
  %94 = add <4 x i32> %93, %wide.load169
  %95 = add <4 x i32> %94, %wide.load170
  %96 = add <4 x i32> %95, %wide.load171
  %97 = add <4 x i32> %96, %64
  ...

And we really should handle this case.

This revision is now accepted and ready to land. Jan 26 2016, 3:18 PM

In D15250#336665, @hfinkel wrote:

Please make sure the select case is handled (autovectorization test case provided below); otherwise, LGTM. Thanks!

Sorry for replying to your comments so late; I just got back from a long vacation. In your proposed case, the definitions of the reduction operations are never used by the select instruction, so the select instruction is not an obstacle when detecting reduction operations (note that we only care how the definition of each reduction operation is used in the def-use chain).
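To make the def-use argument concrete, here is a toy model (not LLVM code; `ToyInst`, `ToyFunction` and `transitiveUsers` are hypothetical names) built from three of the instructions in the IR above. Walking forward from a reduction add only follows its users, so the select, which is merely an operand of `%97`, is never visited.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy instruction: an opcode name plus the names of its operands.
struct ToyInst {
  std::string Opcode;
  std::vector<std::string> Operands;
};
using ToyFunction = std::map<std::string, ToyInst>;

// Collects every instruction that transitively uses Start (def-use walk).
std::set<std::string> transitiveUsers(const ToyFunction &F,
                                      const std::string &Start) {
  std::set<std::string> Seen;
  std::vector<std::string> Worklist{Start};
  while (!Worklist.empty()) {
    const std::string Cur = Worklist.back();
    Worklist.pop_back();
    for (const auto &Entry : F)
      for (const auto &Op : Entry.second.Operands)
        if (Op == Cur && Seen.insert(Entry.first).second)
          Worklist.push_back(Entry.first);
  }
  return Seen;
}

// A fragment of the IR quoted in the review: the select feeds %97 as an
// operand but does not use any of the adds.
ToyFunction makeSelectExample() {
  return {
      {"%64", {"select", {"%cmp93", "%wide.load170", "%wide.load171"}}},
      {"%96", {"add", {"%95", "%wide.load171"}}},
      {"%97", {"add", {"%96", "%64"}}},
  };
}

void demoSelectIsNotAUser() {
  const std::set<std::string> Users =
      transitiveUsers(makeSelectExample(), "%96");
  assert(Users.count("%97") == 1); // the next add in the chain is a user
  assert(Users.count("%64") == 0); // the select is never reached
}
```

A select that consumed one of the adds would show up in this walk, and the patch as written would then reject the reduction.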

Closed by commit rL261070: Detecte vector reduction operations just before instruction selection. (authored by conghou). Feb 16 2016, 10:41 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/include/llvm/CodeGen/SelectionDAGNodes.h	4 lines
llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	126 lines
llvm/trunk/test/CodeGen/Generic/vector-redux.ll	85 lines

Diff 48152

llvm/trunk/include/llvm/CodeGen/SelectionDAGNodes.h

 private:
   bool NoUnsignedWrap : 1;
   bool NoSignedWrap : 1;
   bool Exact : 1;
   bool UnsafeAlgebra : 1;
   bool NoNaNs : 1;
   bool NoInfs : 1;
   bool NoSignedZeros : 1;
   bool AllowReciprocal : 1;
+  bool VectorReduction : 1;

 public:
   /// Default constructor turns off all optimization flags.
   SDNodeFlags() {
     NoUnsignedWrap = false;
     NoSignedWrap = false;
     Exact = false;
     UnsafeAlgebra = false;
     NoNaNs = false;
     NoInfs = false;
     NoSignedZeros = false;
     AllowReciprocal = false;
+    VectorReduction = false;
   }

   // These are mutators for each flag.
   void setNoUnsignedWrap(bool b) { NoUnsignedWrap = b; }
   void setNoSignedWrap(bool b) { NoSignedWrap = b; }
   void setExact(bool b) { Exact = b; }
   void setUnsafeAlgebra(bool b) { UnsafeAlgebra = b; }
   void setNoNaNs(bool b) { NoNaNs = b; }
   void setNoInfs(bool b) { NoInfs = b; }
   void setNoSignedZeros(bool b) { NoSignedZeros = b; }
   void setAllowReciprocal(bool b) { AllowReciprocal = b; }
+  void setVectorReduction(bool b) { VectorReduction = b; }

   // These are accessors for each flag.
   bool hasNoUnsignedWrap() const { return NoUnsignedWrap; }
   bool hasNoSignedWrap() const { return NoSignedWrap; }
   bool hasExact() const { return Exact; }
   bool hasUnsafeAlgebra() const { return UnsafeAlgebra; }
   bool hasNoNaNs() const { return NoNaNs; }
   bool hasNoInfs() const { return NoInfs; }
   bool hasNoSignedZeros() const { return NoSignedZeros; }
   bool hasAllowReciprocal() const { return AllowReciprocal; }
+  bool hasVectorReduction() const { return VectorReduction; }

   /// Return a raw encoding of the flags.
   /// This function should only be used to add data to the NodeID value.
   unsigned getRawFlags() const {
     return (NoUnsignedWrap << 0) | (NoSignedWrap << 1) | (Exact << 2) |
            (UnsafeAlgebra << 3) | (NoNaNs << 4) | (NoInfs << 5) |
            (NoSignedZeros << 6) | (AllowReciprocal << 7);
   }

llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

     setValue(&I, DAG.getNode(ISD::FNEG, getCurSDLoc(),
                              Op2.getValueType(), Op2));
     return;
   }

   visitBinary(I, ISD::FSUB);
 }

+/// Checks if the given instruction performs a vector reduction, in which case
+/// we have the freedom to alter the elements in the result as long as the
+/// reduction of them stays unchanged.
+static bool isVectorReductionOp(const User *I) {
+  const Instruction *Inst = dyn_cast<Instruction>(I);
+  if (!Inst || !Inst->getType()->isVectorTy())
+    return false;
+
+  auto OpCode = Inst->getOpcode();
+  switch (OpCode) {
+  case Instruction::Add:
+  case Instruction::Mul:
+  case Instruction::And:
+  case Instruction::Or:
+  case Instruction::Xor:
+    break;
+  case Instruction::FAdd:
+  case Instruction::FMul:
+    if (const FPMathOperator *FPOp = dyn_cast<const FPMathOperator>(Inst))
+      if (FPOp->getFastMathFlags().unsafeAlgebra())
+        break;
+    // Fall through.
+  default:
+    return false;
+  }
+
+  unsigned ElemNum = Inst->getType()->getVectorNumElements();
+  unsigned ElemNumToReduce = ElemNum;
+
+  // Do DFS search on the def-use chain from the given instruction. We only
+  // allow four kinds of operations during the search until we reach the
+  // instruction that extracts the first element from the vector:
+  //
+  //   1. The reduction operation of the same opcode as the given instruction.
+  //
+  //   2. PHI node.
+  //
+  //   3. ShuffleVector instruction together with a reduction operation that
+  //      does a partial reduction.
+  //
+  //   4. ExtractElement that extracts the first element from the vector, and we
+  //      stop searching the def-use chain here.
+  //
+  // 3 & 4 above perform a reduction on all elements of the vector. We push defs
+  // from 1-3 to the stack to continue the DFS. The given instruction is not
+  // a reduction operation if we meet any other instructions other than those
+  // listed above.
+
+  SmallVector<const User *, 16> UsersToVisit{Inst};
+  SmallPtrSet<const User *, 16> Visited;
+  bool ReduxExtracted = false;
+
+  while (!UsersToVisit.empty()) {
+    auto User = UsersToVisit.back();
+    UsersToVisit.pop_back();
+    if (!Visited.insert(User).second)
+      continue;
+
+    for (const auto &U : User->users()) {
+      auto Inst = dyn_cast<Instruction>(U);
+      if (!Inst)
+        return false;
+
+      if (Inst->getOpcode() == OpCode || isa<PHINode>(U)) {
+        if (const FPMathOperator *FPOp = dyn_cast<const FPMathOperator>(Inst))
+          if (!isa<PHINode>(FPOp) && !FPOp->getFastMathFlags().unsafeAlgebra())
+            return false;
+        UsersToVisit.push_back(U);
+      } else if (const ShuffleVectorInst *ShufInst =
+                     dyn_cast<ShuffleVectorInst>(U)) {
+        // Detect the following pattern: A ShuffleVector instruction together
+        // with a reduction that do partial reduction on the first and second
+        // ElemNumToReduce / 2 elements, and store the result in
+        // ElemNumToReduce / 2 elements in another vector.
+
+        if (ElemNumToReduce == 1)
+          return false;
+        if (!isa<UndefValue>(U->getOperand(1)))
+          return false;
+        for (unsigned i = 0; i < ElemNumToReduce / 2; ++i)
+          if (ShufInst->getMaskValue(i) != int(i + ElemNumToReduce / 2))
+            return false;
+        for (unsigned i = ElemNumToReduce / 2; i < ElemNum; ++i)
+          if (ShufInst->getMaskValue(i) != -1)
+            return false;
+
+        // There is only one user of this ShuffleVector instruction, which must
+        // be a reduction operation.
+        if (!U->hasOneUse())
+          return false;
+
+        auto U2 = dyn_cast<Instruction>(*U->user_begin());
+        if (!U2 || U2->getOpcode() != OpCode)
+          return false;
+
+        // Check operands of the reduction operation.
+        if ((U2->getOperand(0) == U->getOperand(0) && U2->getOperand(1) == U) ||
+            (U2->getOperand(1) == U->getOperand(0) && U2->getOperand(0) == U)) {
+          UsersToVisit.push_back(U2);
+          ElemNumToReduce /= 2;
+        } else
+          return false;
+      } else if (isa<ExtractElementInst>(U)) {
+        // At this moment we should have reduced all elements in the vector.
+        if (ElemNumToReduce != 1)
+          return false;
+
+        const ConstantInt *Val = dyn_cast<ConstantInt>(U->getOperand(1));
+        if (!Val || Val->getZExtValue() != 0)
+          return false;
+
+        ReduxExtracted = true;
+      } else
+        return false;
+    }
+  }
+  return ReduxExtracted;
+}
+
 void SelectionDAGBuilder::visitBinary(const User &I, unsigned OpCode) {
   SDValue Op1 = getValue(I.getOperand(0));
   SDValue Op2 = getValue(I.getOperand(1));

   bool nuw = false;
   bool nsw = false;
   bool exact = false;
+  bool vec_redux = false;
   FastMathFlags FMF;

   if (const OverflowingBinaryOperator *OFBinOp =
           dyn_cast<const OverflowingBinaryOperator>(&I)) {
     nuw = OFBinOp->hasNoUnsignedWrap();
     nsw = OFBinOp->hasNoSignedWrap();
   }
   if (const PossiblyExactOperator *ExactOp =
           dyn_cast<const PossiblyExactOperator>(&I))
     exact = ExactOp->isExact();
   if (const FPMathOperator *FPOp = dyn_cast<const FPMathOperator>(&I))
     FMF = FPOp->getFastMathFlags();

+  if (isVectorReductionOp(&I)) {
+    vec_redux = true;
+    DEBUG(dbgs() << "Detected a reduction operation:" << I << "\n");
+  }
+
   SDNodeFlags Flags;
   Flags.setExact(exact);
   Flags.setNoSignedWrap(nsw);
   Flags.setNoUnsignedWrap(nuw);
+  Flags.setVectorReduction(vec_redux);
   if (EnableFMFInDAG) {
     Flags.setAllowReciprocal(FMF.allowReciprocal());
     Flags.setNoInfs(FMF.noInfs());
     Flags.setNoNaNs(FMF.noNaNs());
     Flags.setNoSignedZeros(FMF.noSignedZeros());
     Flags.setUnsafeAlgebra(FMF.unsafeAlgebra());
   }
   SDValue BinNodeValue = DAG.getNode(OpCode, getCurSDLoc(), Op1.getValueType(),

llvm/trunk/test/CodeGen/Generic/vector-redux.ll

				; RUN: llc < %s -debug-only=isel -o /dev/null 2>&1 | FileCheck %s
				; REQUIRES: asserts

				@a = global [1024 x i32] zeroinitializer, align 16

				define float @reduce_add_float(float* nocapture readonly %a) {
				; CHECK-LABEL: reduce_add_float
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				;
				entry:
				br label %vector.body

				vector.body:
				%index = phi i64 [ 0, %entry ], [ %index.next.4, %vector.body ]
				%vec.phi = phi <4 x float> [ zeroinitializer, %entry ], [ %28, %vector.body ]
				%vec.phi9 = phi <4 x float> [ zeroinitializer, %entry ], [ %29, %vector.body ]
				%0 = getelementptr inbounds float, float* %a, i64 %index
				%1 = bitcast float* %0 to <4 x float>*
				%wide.load = load <4 x float>, <4 x float>* %1, align 4
				%2 = getelementptr float, float* %0, i64 4
				%3 = bitcast float* %2 to <4 x float>*
				%wide.load10 = load <4 x float>, <4 x float>* %3, align 4
				%4 = fadd fast <4 x float> %wide.load, %vec.phi
				%5 = fadd fast <4 x float> %wide.load10, %vec.phi9
				%index.next = add nuw nsw i64 %index, 8
				%6 = getelementptr inbounds float, float* %a, i64 %index.next
				%7 = bitcast float* %6 to <4 x float>*
				%wide.load.1 = load <4 x float>, <4 x float>* %7, align 4
				%8 = getelementptr float, float* %6, i64 4
				%9 = bitcast float* %8 to <4 x float>*
				%wide.load10.1 = load <4 x float>, <4 x float>* %9, align 4
				%10 = fadd fast <4 x float> %wide.load.1, %4
				%11 = fadd fast <4 x float> %wide.load10.1, %5
				%index.next.1 = add nsw i64 %index, 16
				%12 = getelementptr inbounds float, float* %a, i64 %index.next.1
				%13 = bitcast float* %12 to <4 x float>*
				%wide.load.2 = load <4 x float>, <4 x float>* %13, align 4
				%14 = getelementptr float, float* %12, i64 4
				%15 = bitcast float* %14 to <4 x float>*
				%wide.load10.2 = load <4 x float>, <4 x float>* %15, align 4
				%16 = fadd fast <4 x float> %wide.load.2, %10
				%17 = fadd fast <4 x float> %wide.load10.2, %11
				%index.next.2 = add nsw i64 %index, 24
				%18 = getelementptr inbounds float, float* %a, i64 %index.next.2
				%19 = bitcast float* %18 to <4 x float>*
				%wide.load.3 = load <4 x float>, <4 x float>* %19, align 4
				%20 = getelementptr float, float* %18, i64 4
				%21 = bitcast float* %20 to <4 x float>*
				%wide.load10.3 = load <4 x float>, <4 x float>* %21, align 4
				%22 = fadd fast <4 x float> %wide.load.3, %16
				%23 = fadd fast <4 x float> %wide.load10.3, %17
				%index.next.3 = add nsw i64 %index, 32
				%24 = getelementptr inbounds float, float* %a, i64 %index.next.3
				%25 = bitcast float* %24 to <4 x float>*
				%wide.load.4 = load <4 x float>, <4 x float>* %25, align 4
				%26 = getelementptr float, float* %24, i64 4
				%27 = bitcast float* %26 to <4 x float>*
				%wide.load10.4 = load <4 x float>, <4 x float>* %27, align 4
				%28 = fadd fast <4 x float> %wide.load.4, %22
				%29 = fadd fast <4 x float> %wide.load10.4, %23
				%index.next.4 = add nsw i64 %index, 40
				%30 = icmp eq i64 %index.next.4, 1000
				br i1 %30, label %middle.block, label %vector.body

				middle.block:
				%.lcssa15 = phi <4 x float> [ %29, %vector.body ]
				%.lcssa = phi <4 x float> [ %28, %vector.body ]
				%bin.rdx = fadd fast <4 x float> %.lcssa15, %.lcssa
				%rdx.shuf = shufflevector <4 x float> %bin.rdx, <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
				%bin.rdx11 = fadd fast <4 x float> %bin.rdx, %rdx.shuf
				%rdx.shuf12 = shufflevector <4 x float> %bin.rdx11, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
				%bin.rdx13 = fadd fast <4 x float> %bin.rdx11, %rdx.shuf12
				%31 = extractelement <4 x float> %bin.rdx13, i32 0
				ret float %31
				}
