This is an archive of the discontinued LLVM Phabricator instance.

[SLP] fix miscompile on min/max reductions with extra uses (PR43948)
Closed, Public

Authored by spatel on Nov 12 2019, 2:48 PM.

Details

Summary

While working on the recent 2-way enhancements, I noticed that we could miscompile reductions.

The problem appears to be limited to cases where a min/max reduction has extra uses of the compare operand to the select.
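
For illustration, a hypothetical source-level shape that produces this pattern might look like the following (this example is mine, not taken from the test suite): the chain of selects forms a max reduction, and the compare feeding the final select has a use of its own.

// Hypothetical example (not from the test suite): a max reduction where the
// compare operand of the last select also has an extra use.
int max_with_extra_use(const int *a, bool *last_cmp) {
  int m = a[0] > a[1] ? a[0] : a[1];
  m = a[2] > m ? a[2] : m;
  bool c = a[3] > m;  // this compare feeds the final select...
  *last_cmp = c;      // ...and is also used here
  return c ? a[3] : m;
}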

I assume that the existing test in used-reduced-op.ll also shows a miscompile, but nobody noticed in such a large test?

Diff Detail

Event Timeline

spatel created this revision. Nov 12 2019, 2:48 PM
Herald added a project: Restricted Project. Nov 12 2019, 2:48 PM
spatel marked an inline comment as done. Nov 13 2019, 6:23 AM
spatel added inline comments.
llvm/test/Transforms/SLPVectorizer/X86/reduction.ll
117–118

For reference (and I should improve the variable names or test comments): %t14 is the final step of what we recognize as the min/max reduction, and %EXTRA_USE is the compare part of that min/max op.

ABataev added inline comments. Nov 13 2019, 7:04 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6804–6813

I don't think you need the matchers here. You can rely on ReductionData.getKind() and then just do something like this:

switch (ReductionData.getKind()) {
case RK_Min:
case RK_Max:
case RK_UMin:
case RK_UMax:
  cast<SelectInst>(ReductionRoot)->getCondition()->replaceAllUsesWith(
      cast<SelectInst>(VectorizedTree)->getCondition());
  break;
case RK_Arithmetic:
  break;
}

And even better, create a new member function in OperationData that replaces all uses for the vectorized instruction like this code does, plus the final replacement.
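
Roughly, the member-function idea might look like this (a sketch only; the names and OperationData's interface are assumptions, and the cast of VectorizedTree presumes the vectorized code still ends in a select):

// Inside OperationData (sketch with assumed names; not the code that landed):
// fold the min/max-specific compare RAUW and the final scalar-root
// replacement into one member function.
void replaceVectorizedUses(Instruction *ScalarRoot, Value *VectorizedTree) const {
  if (Kind == RK_Min || Kind == RK_Max || Kind == RK_UMin || Kind == RK_UMax)
    cast<SelectInst>(ScalarRoot)->getCondition()->replaceAllUsesWith(
        cast<SelectInst>(VectorizedTree)->getCondition());
  // The "final replacement": redirect uses of the scalar reduction root to
  // the vectorized value.
  ScalarRoot->replaceAllUsesWith(VectorizedTree);
}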

spatel marked 2 inline comments as done. Nov 13 2019, 10:10 AM
spatel added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6804–6813

This doesn't work because we don't know that the vectorized code ends in a select at this point (it might end in an extractelement of a vector instruction instead).

But I agree that we can simplify the logic a bit, so I added a helper:
rGe9bf7a60a036

I also looked at extending the OperationData class as suggested, but I don't see how to do that cleanly because the final element of the reduction (the thing that we want to substitute for the scalar op) isn't part of OperationData currently. Let's make that a follow-up cleanup step.
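
To illustrate the point above about the shape of the vectorized tree, a guarded lookup might look like this (names are assumed for illustration; this is not the helper from rGe9bf7a60a036):

// Illustration only: the vectorized tree may end in a select or in an
// extractelement, so an unconditional cast<SelectInst>(VectorizedTree) would
// assert; guard the lookup instead.
static CmpInst *getCmpIfSelectRoot(Value *TreeRoot) {
  if (auto *Sel = dyn_cast<SelectInst>(TreeRoot))
    return dyn_cast<CmpInst>(Sel->getCondition());
  return nullptr; // e.g. the tree ends in an extractelement instead
}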

spatel updated this revision to Diff 229133. Nov 13 2019, 10:12 AM
spatel marked an inline comment as done.

Patch updated:

  1. Reduced code by using the min/max helper (rGe9bf7a60a036).
  2. Tried to make the test clearer (rG142cbe73e9fe).
This revision is now accepted and ready to land. Nov 13 2019, 10:15 AM
This revision was automatically updated to reflect the committed changes.
spatel reopened this revision. Nov 18 2019, 3:38 PM

Reopening - reverted here:
rG6f1cc4151a5a

Looks like we need to adjust the IR insert point for the cmp (the whole reduction?) because we may create invalid IR otherwise ("Instruction does not dominate all uses!").

This revision is now accepted and ready to land. Nov 18 2019, 3:38 PM

For reference (this made it to the mailing list, but not Phab), the code example cited for the revert was likely already miscompiling:

// clang -c -O2 -msse4 repro.cc

using a = void (*)(const void *, long, int *, int *);
int b(int);
template <int, typename> void c(const void *, long, int *d, int *) {
  int a[8];
  int e[1];
  int f;
  for (int g = 0; g < 8; g += 2) {
    for (int h = 0; h < 5; ++h)
      a[3] += e;
    for (int h = 0; h < 3; ++h)
      f = b(f) * 2;
  }
  int i = *d = 0;
  for (int g = 0; g < 8; ++g)
    if (a[g] > i) {
      i = a[g];
      *d = g;
    }
}
a j;
void k() { j = c<8, char>; }

I reduced the IR using bugpoint and then cleaned it up manually a bit so that it still makes some sense (bugpoint introduces undefs that would eventually allow reducing the code to nothing):

define i1 @bad_insertpoint_rdx([8 x i32]* %p) #0 {
; CHECK-LABEL: @bad_insertpoint_rdx(
; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* [[P:%.*]], i64 0, i64 0
; CHECK-NEXT:    [[TMP1:%.*]] = bitcast i32* [[ARRAYIDX22]] to <2 x i32>*
; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 16
; CHECK-NEXT:    [[SPEC_STORE_SELECT87:%.*]] = zext i1 undef to i32
; CHECK-NEXT:    [[RDX_SHUF:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> undef, <2 x i32> <i32 1, i32 undef>
; CHECK-NEXT:    [[RDX_MINMAX_CMP:%.*]] = icmp sgt <2 x i32> [[TMP2]], [[RDX_SHUF]]
; CHECK-NEXT:    [[RDX_MINMAX_SELECT:%.*]] = select <2 x i1> [[RDX_MINMAX_CMP]], <2 x i32> [[TMP2]], <2 x i32> [[RDX_SHUF]]
; CHECK-NEXT:    [[TMP3:%.*]] = extractelement <2 x i32> [[RDX_MINMAX_SELECT]], i32 0
; CHECK-NEXT:    [[TMP4:%.*]] = icmp sgt i32 [[TMP3]], 0
; CHECK-NEXT:    [[OP_EXTRA:%.*]] = select i1 [[TMP4]], i32 [[TMP3]], i32 0
; CHECK-NEXT:    [[CMP23_2:%.*]] = icmp sgt i32 [[SPEC_STORE_SELECT87]], [[OP_EXTRA]]
; CHECK-NEXT:    ret i1 [[CMP23_2]]
;
  %arrayidx22 = getelementptr inbounds [8 x i32], [8 x i32]* %p, i64 0, i64 0
  %t0 = load i32, i32* %arrayidx22, align 16
  %cmp23 = icmp sgt i32 %t0, 0
  %spec.select = select i1 %cmp23, i32 %t0, i32 0
  %arrayidx22.1 = getelementptr inbounds [8 x i32], [8 x i32]* %p, i64 0, i64 1
  %t1 = load i32, i32* %arrayidx22.1, align 4
  %cmp23.1 = icmp sgt i32 %t1, %spec.select
  %spec.store.select87 = zext i1 %cmp23.1 to i32
  %spec.select88 = select i1 %cmp23.1, i32 %t1, i32 %spec.select
  %cmp23.2 = icmp sgt i32 %spec.store.select87, %spec.select88
  ret i1 %cmp23.2
}

The CHECK lines are based on trunk today (this RAUW patch is not in play). So this example miscompiles independently of this patch - see this line:

; CHECK-NEXT:    [[SPEC_STORE_SELECT87:%.*]] = zext i1 undef to i32

The result of the function is based on that value, so we can get the whole thing wrong depending on what we choose for 'undef'.

In case that's not clear because SLP leaves dead code everywhere - if you pass the output of SLP to instcombine:

define i1 @bad_insertpoint_rdx([8 x i32]* %p) #0 {
  ret i1 false
}

> Reopening - reverted here:
> rG6f1cc4151a5a
>
> Looks like we need to adjust the IR insert point for the cmp (the whole reduction?) because we may create invalid IR otherwise ("Instruction does not dominate all uses!").

Seems to me, we need to fix the code at line 6745: instead of Builder.SetInsertPoint(cast<Instruction>(ReductionRoot));, we need to set the insertion point to the CmpInst if ReductionRoot is a SelectInst.
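
A sketch of that suggestion (context and names are assumed; this is not necessarily how the re-land implements it):

// Around the existing SetInsertPoint call: if the reduction root is the
// select of a min/max, insert new instructions before its compare so that
// they dominate any extra uses of that compare.
Instruction *RdxRootInst = cast<Instruction>(ReductionRoot);
if (auto *Sel = dyn_cast<SelectInst>(RdxRootInst))
  if (auto *Cmp = dyn_cast<Instruction>(Sel->getCondition()))
    RdxRootInst = Cmp;
Builder.SetInsertPoint(RdxRootInst);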

This revision was automatically updated to reflect the committed changes.