This is an archive of the discontinued LLVM Phabricator instance.

Detect vector reduction operations just before instruction selection.
ClosedPublic

Authored by congh on Dec 4 2015, 3:43 PM.

Details

Reviewers

spatel
davidxl
hfinkel

Commits

rG4ce0280a416c: Detecte vector reduction operations just before instruction selection.
rGbbd4e3b4003f: Detecte vector reduction operations just before instruction selection.
rL261804: Detecte vector reduction operations just before instruction selection.
rL261070: Detecte vector reduction operations just before instruction selection.

Summary

This patch detects vector reductions before instruction selection. Vector reductions are vectorized reduction operations; for such operations we are free to reorganize the elements of the result as long as their reduction stays unchanged. This enables some reduction pattern recognition during instruction combine, such as SAD/dot-product on X86. A flag is added to SDNodeFlags to mark vector reduction nodes so that they can be checked during instruction combine.
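
To illustrate the freedom described above, here is a minimal sketch on plain integers rather than LLVM IR. The names `horizontalSum`, `accumulateLaneWise`, `accumulatePacked`, and `checkInvariant` are hypothetical and not part of the patch; the sketch only shows that two accumulators with different lane layouts reduce to the same scalar.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

using Vec4 = std::array<int, 4>;

// Final reduction over all lanes (what the shuffle/extract sequence computes).
int horizontalSum(const Vec4 &v) {
  return std::accumulate(v.begin(), v.end(), 0);
}

// Lane-wise accumulation, similar to what a loop vectorizer would emit.
Vec4 accumulateLaneWise(const std::vector<Vec4> &chunks) {
  Vec4 acc{}; // zero-initialized accumulator
  for (const Vec4 &c : chunks)
    for (std::size_t i = 0; i < acc.size(); ++i)
      acc[i] += c[i];
  return acc;
}

// A reorganized accumulation: each chunk is summed into lane 0 only.
Vec4 accumulatePacked(const std::vector<Vec4> &chunks) {
  Vec4 acc{};
  for (const Vec4 &c : chunks)
    acc[0] += horizontalSum(c);
  return acc;
}

// The lane contents differ, but the reduction of the lanes does not.
void checkInvariant(const std::vector<Vec4> &chunks) {
  assert(horizontalSum(accumulateLaneWise(chunks)) ==
         horizontalSum(accumulatePacked(chunks)));
}
```

A target-specific combine could exploit this by choosing whichever lane layout maps best onto its instructions, provided the node is known to feed only a reduction.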

To detect those vector reductions, we search def-use chains starting from the given instruction, and check if all uses fall into two categories:

1. Reduction with another vector.
2. Reduction on all elements.

Category 2 is detected by recognizing the pattern that the loop vectorizer generates to reduce all elements of the vector outside the loop, which consists of several ShuffleVector instructions and one ExtractElement instruction.
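
The shuffle pattern mentioned above can be sketched outside of LLVM. The helpers `isHalvingMask` and `reduceByHalving` below are hypothetical names, operate on plain `std::vector<int>` values, and assume a power-of-two lane count; they mirror the mask check in the patch (upper half of the live lanes moved onto the lower half, all remaining lanes undefined, written as -1) but are not the patch's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when `mask` moves the upper half of the `live` still-unreduced
// lanes onto the lower half and leaves every other lane undefined (-1).
bool isHalvingMask(const std::vector<int> &mask, unsigned live) {
  assert(live >= 2 && live <= mask.size());
  unsigned half = live / 2;
  for (unsigned i = 0; i < half; ++i)
    if (mask[i] != static_cast<int>(i + half))
      return false;
  for (std::size_t i = half; i < mask.size(); ++i)
    if (mask[i] != -1)
      return false;
  return true;
}

// Simulates the shuffle + add + extract sequence on plain integers.
int reduceByHalving(std::vector<int> v) {
  assert(!v.empty() && (v.size() & (v.size() - 1)) == 0 &&
         "power-of-two lane count assumed");
  for (std::size_t live = v.size(); live > 1; live /= 2)
    for (std::size_t i = 0; i < live / 2; ++i)
      v[i] += v[i + live / 2]; // add(v, shuffle(v, undef, mask))
  return v[0];                 // extractelement v, 0
}
```

For a four-lane vector the expected masks are `{2, 3, -1, -1}` followed by `{1, -1, -1, -1}`, after which lane 0 holds the full reduction.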

Please check out http://lists.llvm.org/pipermail/llvm-dev/2015-November/092379.html for the discussion on this topic.

Diff Detail

Event Timeline

Fix a small comment format.

hfinkel added inline comments.Dec 10 2015, 11:35 AM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2318	We need to also grab FAdd and FMul when appropriate fast-math flags are present.
2362	For PHIs, don't we need to check that the initial value is the operation's identity value (0 for add, 1 for multiply, etc.)? And that there are only two unique incoming blocks?
2395	We should also catch the way that this is often programmed "by hand":

typedef float v4f __attribute__((vector_size(16)));
v4f foo(void);
float bar() {
  v4f phi = { 0, 0, 0, 0 };
  for (int i = 0; i < 1600; ++i)
    phi += foo();
  return phi[0] + phi[1] + phi[2] + phi[3];
}

Thus, we have multiple extracts instead of shuffles. One might argue that we should canonicalize this to the shuffle form, but we don't have anything that does this currently (AFAIK). What do you think?
2448	You need to set this flag on the PHI node too (meaning the associated CopyFromReg, etc. instructions), right?
test/CodeGen/Generic/vector-redux.ll
3	You need to add: ; REQUIRES: asserts here because you're checking debug output which is available only in +Asserts builds.

congh marked an inline comment as done.Dec 10 2015, 2:53 PM

congh added inline comments.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2362	I think this may not be necessary. We only care about how the values in the vector are used or how they flow to other defs. They can flow to phi nodes, but we need to check all uses of those phi nodes. Otherwise before reduction on elements, they can only flow to other defs of the same associative operation (e.g. add). As long as the reduction on elements is the only def where they are flowing to, it is safe to assume the operation is a vector-reduction one.
2395	OK, as long as this pattern often appears. After all, programs made by hand can have many different forms. For example, we can use target specific intrinsics (like movehl or hadd on X86) in the final reduction. As we don't have canonicalization now, I agree that we should make this function open to support more patterns.
2448	Yes, we can do this, but I am wondering how we would use the flag on the PHI node during instruction combine (in my case I only check flags on operation nodes, but there may be other cases in which a reduction PHI node can help?).
test/CodeGen/Generic/vector-redux.ll
3	OK. Thanks for pointing it out!

hfinkel added inline comments.Dec 10 2015, 3:21 PM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2362	Sounds good.
2395	Thinking about it, I think it is better to canonicalize these in instcombine. The shuffle form will use fewer instructions in all cases (except the two-element case). Don't bother matching it here, we should add canonicalization to instcombine in a separate patch. FWIW, however, yes, I've seen code like this from several users at our facility.
2448	Okay; don't bother then. I was thinking you needed it for isel, but if not, wait until we have a concrete use case.
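
The instruction-count claim in the comment on line 2395 above can be checked with a little arithmetic. This sketch counts IR-level instructions only, assumes a power-of-two lane count N, and uses hypothetical helper names; actual machine code depends on the target. The extract form needs N extracts plus N-1 scalar operations, while the shuffle form needs log2(N) shuffle/operation pairs plus one final extract.

```cpp
#include <cassert>

// Extract form: N extractelement instructions plus N-1 scalar operations.
unsigned extractFormCount(unsigned n) {
  assert(n >= 2 && (n & (n - 1)) == 0 && "power-of-two lane count assumed");
  return n + (n - 1);
}

// Shuffle form: log2(N) shuffle/operation pairs plus one extractelement.
unsigned shuffleFormCount(unsigned n) {
  assert(n >= 2 && (n & (n - 1)) == 0 && "power-of-two lane count assumed");
  unsigned steps = 0;
  for (unsigned live = n; live > 1; live /= 2)
    ++steps;
  return 2 * steps + 1;
}
```

Under these assumptions both forms need 3 instructions for two lanes, and the shuffle form is smaller from four lanes onward (5 versus 7, then 7 versus 15 for eight lanes).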

Update the patch according to Hal's comments.

FAdd and FMul are now supported when fast-math is used.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2395	I agree. Then I will not modify this part now.

spatel added inline comments.Dec 11 2015, 9:13 AM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2395	For reference, I filed this as PR25808 so I could link it to some other reduction bugs: https://llvm.org/bugs/show_bug.cgi?id=25808

congh added inline comments.Dec 11 2015, 10:45 AM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2395	Thanks for filing the bug!

suyog added a subscriber: suyog.Dec 14 2015, 12:15 AM

suyog added inline comments.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2395	This may also be related to https://llvm.org/bugs/show_bug.cgi?id=20035

Ping?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2395	Yes, this is a special case of reduction-on-elements operations.

Ping again?

Ping?

One minor thing I noticed, but overall I don't know this code well enough to review, sorry.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2323	return false here or add a 'fall through' comment here to make it clear that it's intentional

Added a fall-through comment as suggested by Simon.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2323	OK, thanks!

In D15250#319690, @RKSimon wrote:

One minor thing I noticed, but overall I don't know this code well enough to review, sorry.

Thanks for the review, Simon! I will let Hal or others approve this patch.

hfinkel added inline comments.Jan 15 2016, 6:16 AM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2304	'reorganize' sounds a bit weak here. How about reorganize -> alter
2305	stay -> stays
2336	'same arithmetic' sounds odd. Maybe just say 'same opcode'.
2341	do -> does a
2347	reduction -> a reduction
2349	insturctions (typo)
2349	Remove 'supposed to be'
2367	We also need to check the fast-math flags on fadd/fmul here.
2367	We can allow selects here for the same reason we can allow phis, right? If so, we should.
2375	Do we need to check that we don't have ElemNumToReduce == 1 here?
2398	When ElemNumToReduce == 1 here, do we need to check that the only user is an ExtractElementInst? I'm concerned that it could be another reduction operation before the extract.

hfinkel added inline comments.Jan 15 2016, 6:21 AM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2367	(Specifically, I mean selects with a scalar condition; I assume that selects with a vector condition would not work here).

RKSimon resigned from this revision.Jan 16 2016, 2:08 PM

RKSimon removed a reviewer: RKSimon.

congh marked 8 inline comments as done.Jan 19 2016, 12:56 PM

congh added inline comments.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2367	When I wrote this pattern recognition, I was more concerned about compiler-vectorized code, which normally won't have selects. We detect phis because they can appear at the beginning of the loop, which is not the case for selects. Do you think we should make this pattern recognition complicated enough to catch all cases that are manually composed?
2375	I am not sure if it could happen, but I added such a check just in case.
2398	I think that even if there is another reduction operation just before the extract, it is still OK. We only care whether all values in the vector are reduced to one element, and this is still the case.

Update the patch according to Hal's comments.

Ping?

Please make sure the select case is handled (autovectorization test case provided below); otherwise, LGTM. Thanks!

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

2367

I understand, but this comes up in autovectorized code as well. Here's a quick example:

$ cat /tmp/v.c 
int foo(int * restrict a1, int * restrict a2, int * restrict a3, int * restrict a4, int * restrict a5,
        int * restrict a6, int * restrict a7, int * restrict a8, int * restrict a9, int * restrict a10,
        int * restrict a11, int * restrict a12, int * restrict a13, int * restrict a14, int * restrict a15,
        int * restrict a16, int * restrict a17, int * restrict a18, int * restrict a19, int * restrict a20,
        int * restrict a21, int * restrict a22, int * restrict a23, int * restrict a24, int * restrict a25,
        int * restrict a26, int * restrict a27, int * restrict a28, int * restrict a29, int * restrict a30,
        int * restrict b, int * restrict c, int x) {
  int r = 0;
  for (int i = 0; i < 1600; ++i)
    // Lots of other stuff to prevent loop unswitching from kicking in.
    r += a1[i] + a2[i] + a3[i] + a4[i] + a5[i] +
         a6[i] + a7[i] + a8[i] + a9[i] + a10[i] +
         a11[i] + a12[i] + a13[i] + a14[i] + a15[i] +
         a16[i] + a17[i] + a18[i] + a19[i] + a20[i] +
         a21[i] + a22[i] + a23[i] + a24[i] + a25[i] +
         a26[i] + a27[i] + a28[i] + a29[i] + a30[i] +
         b[i] + c[i] + (x > 5 ? b[i] : c[i]);

  return r;
}

Look at the IR from:

$ clang -target powerpc64 -mcpu=pwr7 -O3 -S -emit-llvm -fno-unroll-loops -o - /tmp/v.c

and you'll see:

  %64 = select i1 %cmp93, <4 x i32> %wide.load170, <4 x i32> %wide.load171
  ...
  %93 = add <4 x i32> %92, %wide.load168
  %94 = add <4 x i32> %93, %wide.load169
  %95 = add <4 x i32> %94, %wide.load170
  %96 = add <4 x i32> %95, %wide.load171
  %97 = add <4 x i32> %96, %64
  ...

And we really should handle this case.

This revision is now accepted and ready to land.Jan 26 2016, 3:18 PM

In D15250#336665, @hfinkel wrote:

Please make sure the select case is handled (autovectorization test case provided below); otherwise, LGTM. Thanks!

Sorry for replying to your comments so late; I just got back from a long vacation. In your proposed case, the definitions of the reduction operations are never used by the select instruction, so the select instruction is not an obstacle when detecting reduction operations (note that we only care how the definition of each reduction operation is used in the def-use chain).
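
The argument above rests on the direction of the walk: it follows the users of the reduction operations, never their operands. Here is a toy sketch of that idea, using a hypothetical string-keyed user map rather than LLVM's `Value`/`User` classes, with value names loosely modeled on the IR excerpt quoted earlier in this thread (the graph itself is made up for illustration).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy def-use graph: maps each value name to the names of its users.
using UserMap = std::map<std::string, std::vector<std::string>>;

// Returns every value reachable from `start` by following users only,
// mirroring the direction of the worklist walk in the patch.
std::set<std::string> reachableUsers(const UserMap &users,
                                     const std::string &start) {
  assert(!start.empty());
  std::set<std::string> visited;
  std::vector<std::string> worklist{start};
  while (!worklist.empty()) {
    std::string v = worklist.back();
    worklist.pop_back();
    if (!visited.insert(v).second)
      continue; // already handled
    auto it = users.find(v);
    if (it == users.end())
      continue; // value has no users
    for (const std::string &u : it->second)
      worklist.push_back(u);
  }
  return visited;
}

// Hypothetical miniature of the example: the select feeds an add in the
// reduction chain, but nothing in the chain feeds the select.
UserMap exampleGraph() {
  UserMap g;
  g["vec.phi"] = {"add96"};
  g["add96"] = {"add97"};
  g["select64"] = {"add97"};
  g["add97"] = {"vec.phi", "rdx.shuf", "rdx.add"};
  g["rdx.shuf"] = {"rdx.add"};
  g["rdx.add"] = {"extract"};
  return g;
}
```

Starting the walk at `add96` reaches the phi, the later add, the shuffle, and the extract, but never `select64`, because the select appears only as an operand of `add97`.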

Closed by commit rL261070: Detecte vector reduction operations just before instruction selection. (authored by conghou). Feb 16 2016, 10:41 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
include/llvm/CodeGen/SelectionDAGNodes.h	4 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	122 lines
test/CodeGen/Generic/vector-redux.ll	237 lines

Diff 44149

include/llvm/CodeGen/SelectionDAGNodes.h

 private:
   bool NoUnsignedWrap : 1;
   bool NoSignedWrap : 1;
   bool Exact : 1;
   bool UnsafeAlgebra : 1;
   bool NoNaNs : 1;
   bool NoInfs : 1;
   bool NoSignedZeros : 1;
   bool AllowReciprocal : 1;
+  bool VectorReduction : 1;

 public:
   /// Default constructor turns off all optimization flags.
   SDNodeFlags() {
     NoUnsignedWrap = false;
     NoSignedWrap = false;
     Exact = false;
     UnsafeAlgebra = false;
     NoNaNs = false;
     NoInfs = false;
     NoSignedZeros = false;
     AllowReciprocal = false;
+    VectorReduction = false;
   }

   // These are mutators for each flag.
   void setNoUnsignedWrap(bool b) { NoUnsignedWrap = b; }
   void setNoSignedWrap(bool b) { NoSignedWrap = b; }
   void setExact(bool b) { Exact = b; }
   void setUnsafeAlgebra(bool b) { UnsafeAlgebra = b; }
   void setNoNaNs(bool b) { NoNaNs = b; }
   void setNoInfs(bool b) { NoInfs = b; }
   void setNoSignedZeros(bool b) { NoSignedZeros = b; }
   void setAllowReciprocal(bool b) { AllowReciprocal = b; }
+  void setVectorReduction(bool b) { VectorReduction = b; }

   // These are accessors for each flag.
   bool hasNoUnsignedWrap() const { return NoUnsignedWrap; }
   bool hasNoSignedWrap() const { return NoSignedWrap; }
   bool hasExact() const { return Exact; }
   bool hasUnsafeAlgebra() const { return UnsafeAlgebra; }
   bool hasNoNaNs() const { return NoNaNs; }
   bool hasNoInfs() const { return NoInfs; }
   bool hasNoSignedZeros() const { return NoSignedZeros; }
   bool hasAllowReciprocal() const { return AllowReciprocal; }
+  bool hasVectorReduction() const { return VectorReduction; }

   /// Return a raw encoding of the flags.
   /// This function should only be used to add data to the NodeID value.
   unsigned getRawFlags() const {
     return (NoUnsignedWrap << 0) | (NoSignedWrap << 1) | (Exact << 2) |
            (UnsafeAlgebra << 3) | (NoNaNs << 4) | (NoInfs << 5) |
            (NoSignedZeros << 6) | (AllowReciprocal << 7);
   }

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

   if (isa<Constant>(I.getOperand(0)) &&
 ...
     setValue(&I, DAG.getNode(ISD::FNEG, getCurSDLoc(),
                              Op2.getValueType(), Op2));
     return;
   }

   visitBinary(I, ISD::FSUB);
 }

+/// Checks if the given instruction performs a vector reduction, in which case
+/// we have the freedom to reorganize the elements in the result as long as the
+/// reduction of them stay unchanged.
+static bool isVectorReductionOp(const User *I) {
+  const Instruction *Inst = dyn_cast<Instruction>(I);
+  if (!Inst || !Inst->getType()->isVectorTy())
+    return false;
+
+  auto OpCode = Inst->getOpcode();
+  switch (OpCode) {
+  case Instruction::Add:
+  case Instruction::Mul:
+  case Instruction::And:
+  case Instruction::Or:
+  case Instruction::Xor:
+    break;
+  case Instruction::FAdd:
+  case Instruction::FMul:
+    if (const FPMathOperator *FPOp = dyn_cast<const FPMathOperator>(Inst))
+      if (FPOp->getFastMathFlags().unsafeAlgebra())
+        break;
+    // Fall through.
+  default:
+    return false;
+  }
+
+  unsigned ElemNum = Inst->getType()->getVectorNumElements();
+  unsigned ElemNumToReduce = ElemNum;
+
+  // Do DFS search on the def-use chain from the given instruction. We only
+  // allow four kinds of operations during the search until we reach the
+  // instruction that extracts the first element from the vector:
+  //
+  //   1. The reduction operation of the same arithmetic as the given
+  //   instruction.
+  //
+  //   2. PHI node.
+  //
+  //   3. ShuffleVector instruction together with a reduction operation that do
+  //   partial reduction.
+  //
+  //   4. ExtractElement that extracts the first element from the vector, and we
+  //   stop searching the def-use chain here.
+  //
+  // 3 & 4 above perform reduction on all elements of the vector. We push defs
+  // from 1-3 to the stack to continue the DFS. The given instruction is not
+  // supposed to be a reduction operation if we meet any other insturctions
+  // other than those listed above.
+
+  SmallVector<const User *, 16> UsersToVisit{Inst};
+  SmallPtrSet<const User *, 16> Visited;
+  bool ReduxExtracted = false;
+
+  while (!UsersToVisit.empty()) {
+    auto User = UsersToVisit.back();
+    UsersToVisit.pop_back();
+    if (!Visited.insert(User).second)
+      continue;
+
+    for (const auto &U : User->users()) {
+      auto Inst = dyn_cast<Instruction>(U);
+      if (!Inst)
+        return false;
+
+      if (Inst->getOpcode() == OpCode || isa<PHINode>(U)) {
+        UsersToVisit.push_back(U);
+      } else if (const ShuffleVectorInst *ShufInst =
+                     dyn_cast<ShuffleVectorInst>(U)) {
+        // Detect the following pattern: A ShuffleVector instruction together
+        // with a reduction that do partial reduction on the first and second
+        // ElemNumToReduce / 2 elements, and store the result in
+        // ElemNumToReduce / 2 elements in another vector.
+
+        if (!isa<UndefValue>(U->getOperand(1)))
+          return false;
+        for (unsigned i = 0; i < ElemNumToReduce / 2; ++i)
+          if (ShufInst->getMaskValue(i) != int(i + ElemNumToReduce / 2))
+            return false;
+        for (unsigned i = ElemNumToReduce / 2; i < ElemNum; ++i)
+          if (ShufInst->getMaskValue(i) != -1)
+            return false;
+
+        // There is only one user of this ShuffleVector instruction, which must
+        // be a reduction operation.
+        if (!U->hasOneUse())
+          return false;
+
+        auto U2 = dyn_cast<Instruction>(*U->user_begin());
+        if (!U2 || U2->getOpcode() != OpCode)
+          return false;
+
+        // Check operands of the reduction operation.
+        if ((U2->getOperand(0) == U->getOperand(0) && U2->getOperand(1) == U) ||
+            (U2->getOperand(1) == U->getOperand(0) && U2->getOperand(0) == U)) {
+          UsersToVisit.push_back(U2);
+          ElemNumToReduce /= 2;
+        } else
+          return false;
+      } else if (isa<ExtractElementInst>(U)) {
+        // At this moment we should have reduced all elements in the vector.
+        if (ElemNumToReduce != 1)
+          return false;
+
+        const ConstantInt *Val = dyn_cast<ConstantInt>(U->getOperand(1));
+        if (!Val || Val->getZExtValue() != 0)
+          return false;
+
+        ReduxExtracted = true;
+      } else
+        return false;
+    }
+  }
+  return ReduxExtracted;
+}
+
 void SelectionDAGBuilder::visitBinary(const User &I, unsigned OpCode) {
   SDValue Op1 = getValue(I.getOperand(0));
   SDValue Op2 = getValue(I.getOperand(1));

   bool nuw = false;
   bool nsw = false;
   bool exact = false;
+  bool vec_redux = false;
   FastMathFlags FMF;

   if (const OverflowingBinaryOperator *OFBinOp =
           dyn_cast<const OverflowingBinaryOperator>(&I)) {
     nuw = OFBinOp->hasNoUnsignedWrap();
     nsw = OFBinOp->hasNoSignedWrap();
   }
   if (const PossiblyExactOperator *ExactOp =
           dyn_cast<const PossiblyExactOperator>(&I))
     exact = ExactOp->isExact();
   if (const FPMathOperator *FPOp = dyn_cast<const FPMathOperator>(&I))
     FMF = FPOp->getFastMathFlags();

+  if (isVectorReductionOp(&I)) {
+    vec_redux = true;
+    DEBUG(dbgs() << "Detected a reduction operation:" << I << "\n");
+  }
+
   SDNodeFlags Flags;
   Flags.setExact(exact);
   Flags.setNoSignedWrap(nsw);
   Flags.setNoUnsignedWrap(nuw);
+  Flags.setVectorReduction(vec_redux);
   if (EnableFMFInDAG) {
     Flags.setAllowReciprocal(FMF.allowReciprocal());
     Flags.setNoInfs(FMF.noInfs());
     Flags.setNoNaNs(FMF.noNaNs());
     Flags.setNoSignedZeros(FMF.noSignedZeros());
     Flags.setUnsafeAlgebra(FMF.unsafeAlgebra());
   }
   SDValue BinNodeValue = DAG.getNode(OpCode, getCurSDLoc(), Op1.getValueType(),

test/CodeGen/Generic/vector-redux.ll

This file was added.

				; RUN: llc < %s -debug-only=isel -o /dev/null 2>&1 | FileCheck %s
				; REQUIRES: asserts

				@a = global [1024 x i32] zeroinitializer, align 16

				define i32 @reduce_add() {
				; CHECK-LABEL: reduce_add
				; CHECK: Detected a reduction operation: {{.*}} add
				; CHECK: Detected a reduction operation: {{.*}} add
				; CHECK: Detected a reduction operation: {{.*}} add
				; CHECK: Detected a reduction operation: {{.*}} add
				; CHECK: Detected a reduction operation: {{.*}} add
				; CHECK: Detected a reduction operation: {{.*}} add
				; CHECK: Detected a reduction operation: {{.*}} add
				; CHECK: Detected a reduction operation: {{.*}} add
				; CHECK: Detected a reduction operation: {{.*}} add
				; CHECK: Detected a reduction operation: {{.*}} add
				; CHECK: Detected a reduction operation: {{.*}} add

				min.iters.checked:
				br label %vector.body

				vector.body:
				%index = phi i64 [ 0, %min.iters.checked ], [ %index.next.4, %vector.body ]
				%vec.phi = phi <4 x i32> [ zeroinitializer, %min.iters.checked ], [ %28, %vector.body ]
				%vec.phi4 = phi <4 x i32> [ zeroinitializer, %min.iters.checked ], [ %29, %vector.body ]
				%0 = getelementptr inbounds [1024 x i32], [1024 x i32]* @a, i64 0, i64 %index
				%1 = bitcast i32* %0 to <4 x i32>*
				%wide.load = load <4 x i32>, <4 x i32>* %1, align 16
				%2 = getelementptr i32, i32* %0, i64 4
				%3 = bitcast i32* %2 to <4 x i32>*
				%wide.load5 = load <4 x i32>, <4 x i32>* %3, align 16
				%4 = add nsw <4 x i32> %wide.load, %vec.phi
				%5 = add nsw <4 x i32> %wide.load5, %vec.phi4
				%index.next = add nuw nsw i64 %index, 8
				%6 = getelementptr inbounds [1024 x i32], [1024 x i32]* @a, i64 0, i64 %index.next
				%7 = bitcast i32* %6 to <4 x i32>*
				%wide.load.1 = load <4 x i32>, <4 x i32>* %7, align 16
				%8 = getelementptr i32, i32* %6, i64 4
				%9 = bitcast i32* %8 to <4 x i32>*
				%wide.load5.1 = load <4 x i32>, <4 x i32>* %9, align 16
				%10 = add nsw <4 x i32> %wide.load.1, %4
				%11 = add nsw <4 x i32> %wide.load5.1, %5
				%index.next.1 = add nsw i64 %index, 16
				%12 = getelementptr inbounds [1024 x i32], [1024 x i32]* @a, i64 0, i64 %index.next.1
				%13 = bitcast i32* %12 to <4 x i32>*
				%wide.load.2 = load <4 x i32>, <4 x i32>* %13, align 16
				%14 = getelementptr i32, i32* %12, i64 4
				%15 = bitcast i32* %14 to <4 x i32>*
				%wide.load5.2 = load <4 x i32>, <4 x i32>* %15, align 16
				%16 = add nsw <4 x i32> %wide.load.2, %10
				%17 = add nsw <4 x i32> %wide.load5.2, %11
				%index.next.2 = add nsw i64 %index, 24
				%18 = getelementptr inbounds [1024 x i32], [1024 x i32]* @a, i64 0, i64 %index.next.2
				%19 = bitcast i32* %18 to <4 x i32>*
				%wide.load.3 = load <4 x i32>, <4 x i32>* %19, align 16
				%20 = getelementptr i32, i32* %18, i64 4
				%21 = bitcast i32* %20 to <4 x i32>*
				%wide.load5.3 = load <4 x i32>, <4 x i32>* %21, align 16
				%22 = add nsw <4 x i32> %wide.load.3, %16
				%23 = add nsw <4 x i32> %wide.load5.3, %17
				%index.next.3 = add nsw i64 %index, 32
				%24 = getelementptr inbounds [1024 x i32], [1024 x i32]* @a, i64 0, i64 %index.next.3
				%25 = bitcast i32* %24 to <4 x i32>*
				%wide.load.4 = load <4 x i32>, <4 x i32>* %25, align 16
				%26 = getelementptr i32, i32* %24, i64 4
				%27 = bitcast i32* %26 to <4 x i32>*
				%wide.load5.4 = load <4 x i32>, <4 x i32>* %27, align 16
				%28 = add nsw <4 x i32> %wide.load.4, %22
				%29 = add nsw <4 x i32> %wide.load5.4, %23
				%index.next.4 = add nsw i64 %index, 40
				%30 = icmp eq i64 %index.next.4, 1000
				br i1 %30, label %middle.block, label %vector.body

				middle.block:
				%.lcssa10 = phi <4 x i32> [ %29, %vector.body ]
				%.lcssa = phi <4 x i32> [ %28, %vector.body ]
				%bin.rdx = add <4 x i32> %.lcssa10, %.lcssa
				%rdx.shuf = shufflevector <4 x i32> %bin.rdx, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
				%bin.rdx6 = add <4 x i32> %bin.rdx, %rdx.shuf
				%rdx.shuf7 = shufflevector <4 x i32> %bin.rdx6, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
				%bin.rdx8 = add <4 x i32> %bin.rdx6, %rdx.shuf7
				%31 = extractelement <4 x i32> %bin.rdx8, i32 0
				ret i32 %31
				}

				define i32 @reduce_and() {
				; CHECK-LABEL: reduce_and
				; CHECK: Detected a reduction operation: {{.*}} and
				; CHECK: Detected a reduction operation: {{.*}} and
				; CHECK: Detected a reduction operation: {{.*}} and
				; CHECK: Detected a reduction operation: {{.*}} and
				; CHECK: Detected a reduction operation: {{.*}} and
				; CHECK: Detected a reduction operation: {{.*}} and
				; CHECK: Detected a reduction operation: {{.*}} and
				; CHECK: Detected a reduction operation: {{.*}} and
				; CHECK: Detected a reduction operation: {{.*}} and

				entry:
				br label %vector.body

				vector.body:
				%lsr.iv = phi i64 [ %lsr.iv.next, %vector.body ], [ -4096, %entry ]
				%vec.phi = phi <4 x i32> [ <i32 -1, i32 -1, i32 -1, i32 -1>, %entry ], [ %6, %vector.body ]
				%vec.phi9 = phi <4 x i32> [ <i32 -1, i32 -1, i32 -1, i32 -1>, %entry ], [ %7, %vector.body ]
				%uglygep33 = getelementptr i8, i8* bitcast ([1024 x i32]* @a to i8*), i64 %lsr.iv
				%uglygep3334 = bitcast i8* %uglygep33 to <4 x i32>*
				%scevgep35 = getelementptr <4 x i32>, <4 x i32>* %uglygep3334, i64 256
				%wide.load = load <4 x i32>, <4 x i32>* %scevgep35, align 16
				%scevgep36 = getelementptr <4 x i32>, <4 x i32>* %uglygep3334, i64 257
				%wide.load10 = load <4 x i32>, <4 x i32>* %scevgep36, align 16
				%0 = and <4 x i32> %wide.load, %vec.phi
				%1 = and <4 x i32> %wide.load10, %vec.phi9
				%uglygep30 = getelementptr i8, i8* bitcast ([1024 x i32]* @a to i8*), i64 %lsr.iv
				%uglygep3031 = bitcast i8* %uglygep30 to <4 x i32>*
				%scevgep32 = getelementptr <4 x i32>, <4 x i32>* %uglygep3031, i64 258
				%wide.load.1 = load <4 x i32>, <4 x i32>* %scevgep32, align 16
				%uglygep27 = getelementptr i8, i8* bitcast ([1024 x i32]* @a to i8*), i64 %lsr.iv
				%uglygep2728 = bitcast i8* %uglygep27 to <4 x i32>*
				%scevgep29 = getelementptr <4 x i32>, <4 x i32>* %uglygep2728, i64 259
				%wide.load10.1 = load <4 x i32>, <4 x i32>* %scevgep29, align 16
				%2 = and <4 x i32> %wide.load.1, %0
				%3 = and <4 x i32> %wide.load10.1, %1
				%uglygep24 = getelementptr i8, i8* bitcast ([1024 x i32]* @a to i8*), i64 %lsr.iv
				%uglygep2425 = bitcast i8* %uglygep24 to <4 x i32>*
				%scevgep26 = getelementptr <4 x i32>, <4 x i32>* %uglygep2425, i64 260
				%wide.load.2 = load <4 x i32>, <4 x i32>* %scevgep26, align 16
				%uglygep21 = getelementptr i8, i8* bitcast ([1024 x i32]* @a to i8*), i64 %lsr.iv
				%uglygep2122 = bitcast i8* %uglygep21 to <4 x i32>*
				%scevgep23 = getelementptr <4 x i32>, <4 x i32>* %uglygep2122, i64 261
				%wide.load10.2 = load <4 x i32>, <4 x i32>* %scevgep23, align 16
				%4 = and <4 x i32> %wide.load.2, %2
				%5 = and <4 x i32> %wide.load10.2, %3
				%uglygep18 = getelementptr i8, i8* bitcast ([1024 x i32]* @a to i8*), i64 %lsr.iv
				%uglygep1819 = bitcast i8* %uglygep18 to <4 x i32>*
				%scevgep20 = getelementptr <4 x i32>, <4 x i32>* %uglygep1819, i64 262
				%wide.load.3 = load <4 x i32>, <4 x i32>* %scevgep20, align 16
				%uglygep = getelementptr i8, i8* bitcast ([1024 x i32]* @a to i8*), i64 %lsr.iv
				%uglygep17 = bitcast i8* %uglygep to <4 x i32>*
				%scevgep = getelementptr <4 x i32>, <4 x i32>* %uglygep17, i64 263
				%wide.load10.3 = load <4 x i32>, <4 x i32>* %scevgep, align 16
				%6 = and <4 x i32> %wide.load.3, %4
				%7 = and <4 x i32> %wide.load10.3, %5
				%lsr.iv.next = add nsw i64 %lsr.iv, 128
				%8 = icmp eq i64 %lsr.iv.next, 0
				br i1 %8, label %middle.block, label %vector.body

				middle.block:
				%bin.rdx = and <4 x i32> %7, %6
				%rdx.shuf = shufflevector <4 x i32> %bin.rdx, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
				%bin.rdx11 = and <4 x i32> %bin.rdx, %rdx.shuf
				%rdx.shuf12 = shufflevector <4 x i32> %bin.rdx11, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
				%bin.rdx13 = and <4 x i32> %bin.rdx11, %rdx.shuf12
				%9 = extractelement <4 x i32> %bin.rdx13, i32 0
				ret i32 %9
				}

				define float @reduce_add_float(float* nocapture readonly %a) {
				; CHECK-LABEL: reduce_add_float
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				; CHECK: Detected a reduction operation: {{.*}} fadd fast
				;
				entry:
				br label %vector.body

				vector.body:
				%index = phi i64 [ 0, %entry ], [ %index.next.4, %vector.body ]
				%vec.phi = phi <4 x float> [ zeroinitializer, %entry ], [ %28, %vector.body ]
				%vec.phi9 = phi <4 x float> [ zeroinitializer, %entry ], [ %29, %vector.body ]
				%0 = getelementptr inbounds float, float* %a, i64 %index
				%1 = bitcast float* %0 to <4 x float>*
				%wide.load = load <4 x float>, <4 x float>* %1, align 4
				%2 = getelementptr float, float* %0, i64 4
				%3 = bitcast float* %2 to <4 x float>*
				%wide.load10 = load <4 x float>, <4 x float>* %3, align 4
				%4 = fadd fast <4 x float> %wide.load, %vec.phi
				%5 = fadd fast <4 x float> %wide.load10, %vec.phi9
				%index.next = add nuw nsw i64 %index, 8
				%6 = getelementptr inbounds float, float* %a, i64 %index.next
				%7 = bitcast float* %6 to <4 x float>*
				%wide.load.1 = load <4 x float>, <4 x float>* %7, align 4
				%8 = getelementptr float, float* %6, i64 4
				%9 = bitcast float* %8 to <4 x float>*
				%wide.load10.1 = load <4 x float>, <4 x float>* %9, align 4
				%10 = fadd fast <4 x float> %wide.load.1, %4
				%11 = fadd fast <4 x float> %wide.load10.1, %5
				%index.next.1 = add nsw i64 %index, 16
				%12 = getelementptr inbounds float, float* %a, i64 %index.next.1
				%13 = bitcast float* %12 to <4 x float>*
				%wide.load.2 = load <4 x float>, <4 x float>* %13, align 4
				%14 = getelementptr float, float* %12, i64 4
				%15 = bitcast float* %14 to <4 x float>*
				%wide.load10.2 = load <4 x float>, <4 x float>* %15, align 4
				%16 = fadd fast <4 x float> %wide.load.2, %10
				%17 = fadd fast <4 x float> %wide.load10.2, %11
				%index.next.2 = add nsw i64 %index, 24
				%18 = getelementptr inbounds float, float* %a, i64 %index.next.2
				%19 = bitcast float* %18 to <4 x float>*
				%wide.load.3 = load <4 x float>, <4 x float>* %19, align 4
				%20 = getelementptr float, float* %18, i64 4
				%21 = bitcast float* %20 to <4 x float>*
				%wide.load10.3 = load <4 x float>, <4 x float>* %21, align 4
				%22 = fadd fast <4 x float> %wide.load.3, %16
				%23 = fadd fast <4 x float> %wide.load10.3, %17
				%index.next.3 = add nsw i64 %index, 32
				%24 = getelementptr inbounds float, float* %a, i64 %index.next.3
				%25 = bitcast float* %24 to <4 x float>*
				%wide.load.4 = load <4 x float>, <4 x float>* %25, align 4
				%26 = getelementptr float, float* %24, i64 4
				%27 = bitcast float* %26 to <4 x float>*
				%wide.load10.4 = load <4 x float>, <4 x float>* %27, align 4
				%28 = fadd fast <4 x float> %wide.load.4, %22
				%29 = fadd fast <4 x float> %wide.load10.4, %23
				%index.next.4 = add nsw i64 %index, 40
				%30 = icmp eq i64 %index.next.4, 1000
				br i1 %30, label %middle.block, label %vector.body

				middle.block:
				%.lcssa15 = phi <4 x float> [ %29, %vector.body ]
				%.lcssa = phi <4 x float> [ %28, %vector.body ]
				%bin.rdx = fadd fast <4 x float> %.lcssa15, %.lcssa
				%rdx.shuf = shufflevector <4 x float> %bin.rdx, <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
				%bin.rdx11 = fadd fast <4 x float> %bin.rdx, %rdx.shuf
				%rdx.shuf12 = shufflevector <4 x float> %bin.rdx11, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
				%bin.rdx13 = fadd fast <4 x float> %bin.rdx11, %rdx.shuf12
				%31 = extractelement <4 x float> %bin.rdx13, i32 0
				ret float %31
				}
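
The middle.block sequences in the tests above all use the same shape: two shufflevector instructions that fold the upper lanes onto the lower ones, each followed by the reduction operation, and a final extractelement of lane 0. The following is a minimal C++ sketch of that sequence for the integer add case. It assumes 4 lanes of int32_t and models undef lanes as 0, which is a simplification made here (in IR those lanes have no defined value, and lane 0 never depends on them). The names Vec4, shuffle, add, and reduceAllLanes are hypothetical helpers for illustration, not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<int32_t, 4>;

// Models a single-source shufflevector. A mask entry of -1 stands for an
// undef lane; this sketch fills such lanes with 0 (an assumption).
Vec4 shuffle(const Vec4 &v, const std::array<int, 4> &mask) {
  Vec4 out{};
  for (int i = 0; i < 4; ++i) {
    assert(mask[i] >= -1 && mask[i] < 4 && "mask index out of range");
    out[i] = mask[i] < 0 ? 0 : v[mask[i]];
  }
  return out;
}

// Lane-wise add, standing in for "add <4 x i32>".
Vec4 add(const Vec4 &a, const Vec4 &b) {
  Vec4 out{};
  for (int i = 0; i < 4; ++i)
    out[i] = a[i] + b[i];
  return out;
}

// Mirrors middle.block of @reduce_add: combine the two accumulators, fold
// lanes 2,3 onto lanes 0,1, fold lane 1 onto lane 0, then extract lane 0.
int32_t reduceAllLanes(const Vec4 &acc0, const Vec4 &acc1) {
  Vec4 binRdx = add(acc0, acc1);
  Vec4 rdxShuf = shuffle(binRdx, {2, 3, -1, -1});
  Vec4 binRdx6 = add(binRdx, rdxShuf);
  Vec4 rdxShuf7 = shuffle(binRdx6, {1, -1, -1, -1});
  Vec4 binRdx8 = add(binRdx6, rdxShuf7);
  return binRdx8[0];
}
```

Calling reduceAllLanes with the lanes of either accumulator permuted yields the same scalar, since only the sum over all lanes reaches lane 0. That is the freedom to reorganize elements that the patch summary describes for operations marked as vector reductions.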

Revision Contents

Diff 44149

include/llvm/CodeGen/SelectionDAGNodes.h

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

test/CodeGen/Generic/vector-redux.ll