This is an archive of the discontinued LLVM Phabricator instance.


Differential D117480

[IR] Extend llvm.vector.reduce.fadd
Abandoned · Public

Authored by junaire on Jan 17 2022, 6:42 AM.


Details

Reviewers

fhahn
craig.topper
RKSimon
kpn
spatel
nikic
lebedev.ri

Summary

This patch extends @llvm.vector.reduce.fadd to take an additional integer
argument that indicates the order in which the reduction is applied.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	1,350 ms	x64 debian > Clang.CodeGen/X86::avx512-reduceIntrin.c
	1,680 ms	x64 debian > Clang.CodeGen/X86::avx512fp16-builtins.c
	1,640 ms	x64 debian > Clang.CodeGen/X86::avx512vlfp16-builtins.c
	1,340 ms	x64 debian > Clang.utils/update_cc_test_checks::check-globals.test
	600 ms	x64 debian > Clang.utils/update_cc_test_checks::global-hex-value-regex.test
		516 tests failed in total

Event Timeline

junaire created this revision. Jan 17 2022, 6:42 AM

Herald added subscribers: dexonsmith, hiraditya. Jan 17 2022, 6:42 AM

junaire requested review of this revision. Jan 17 2022, 6:42 AM

Herald added a project: Restricted Project. Jan 17 2022, 6:42 AM

Herald added subscribers: llvm-commits, jdoerfert.

I don't get this change. We already use reassoc FMF to allow a non-ordered reduction, what's the purpose of the new flag?

This also needs a LangRef update.

Please note that this patch is WIP.

After we add a new argument to the intrinsic, how can we update the existing lit tests automatically? I haven't tried updating all the tests by hand, because it looks like a huge amount of work. I believe there should be an easier way, but I don't know what it is.

In D117480#3248421, @nikic wrote:

I don't get this change. We already use reassoc FMF to allow a non-ordered reduction, what's the purpose of the new flag?

This also needs a LangRef update.

Thanks for having a look at this!
The relevant context is here: https://reviews.llvm.org/D116736

I don't get this change. We already use reassoc FMF to allow a non-ordered reduction, what's the purpose of the new flag?

Well, maybe I didn't get the reviewer's ideas right? Any suggestions are appreciated!

In D117480#3248421, @nikic wrote:

I don't get this change. We already use reassoc FMF to allow a non-ordered reduction, what's the purpose of the new flag?

This also needs a LangRef update.

reassoc implies 'any order', but in some cases it is desirable to specify a specific order, e.g. for the vector reduction builtin provided by Clang.

In D117480#3248511, @fhahn wrote:

In D117480#3248421, @nikic wrote:

I don't get this change. We already use reassoc FMF to allow a non-ordered reduction, what's the purpose of the new flag?

This also needs a LangRef update.

reassoc implies 'any order', but in some cases it is desirable to specify a specific order, e.g. for the vector reduction builtin provided by Clang.

I'm not convinced about all this implied complexity.
I think clang will just have to emit expanded form of the reduction in that case.

This revision now requires changes to proceed. Jan 17 2022, 7:07 AM

In D117480#3248511, @fhahn wrote:

In D117480#3248421, @nikic wrote:

I don't get this change. We already use reassoc FMF to allow a non-ordered reduction, what's the purpose of the new flag?

This also needs a LangRef update.

reassoc implies 'any order', but in some cases it is desirable to specify a specific order, e.g. for the vector reduction builtin provided by Clang.

Are we following someone else's standard here, or can we specify the behavior? If a specific reduction order is required, I would very much expect that to be "ordered reduction".

I'm not a fan of introducing a third order here if we can avoid it. The value of having something other than ordered/unordered is not clear to me.

Harbormaster completed remote builds in B143785: Diff 400527. Jan 17 2022, 7:11 AM

In D117480#3248549, @nikic wrote:

In D117480#3248511, @fhahn wrote:

reassoc implies 'any order', but in some cases it is desirable to specify a specific order, e.g. for the vector reduction builtin provided by Clang.

Are we following someone else's standard here, or can we specify the behavior? If a specific reduction order is required, I would very much expect that to be "ordered reduction".

I'm not a fan of introducing a third order here if we can avoid it. The value of having something other than ordered/unordered is not clear to me.

I assume ordered here means sequential. The clang builtin specifies a tree-wise reduction order. The main motivation for that order is to enable faster execution and consistent results on different HW architectures. At the moment the existing intrinsic is either slow & consistent or fast & inconsistent.

I'm not convinced about all this implied complexity.
I think clang will just have to emit expanded form of the reduction in that case.

Sure it could, but it seems unfortunate that a common reduction pattern cannot be expressed with the reduction intrinsic. The main benefit of using the reduction intrinsic here is to make instruction selection substantially easier. AFAICT the main complexity will be in the backends, which will now have to be able to deal with the new order. But given that at the moment only AArch64 and RISCV, it shouldn't be too bad?
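
To make the distinction in this comment concrete, here is a minimal standalone C++ sketch. It is not LLVM code: the names `reduceSequential` and `reduceTree` are hypothetical, and the low-half plus high-half pairing is only one possible tree shape, not necessarily the one the Clang builtin specifies. It assumes IEEE-754 single precision with round-to-nearest and a power-of-two element count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential ("ordered") reduction:
//   (((acc + v[0]) + v[1]) + ...) + v[n-1]
float reduceSequential(float acc, const std::vector<float> &v) {
  for (float x : v)
    acc += x;
  return acc;
}

// One fixed tree order (hypothetical choice): at every level the high half
// is added onto the low half, and the start value is added once at the end.
// Takes the vector by value because it is used as scratch space.
float reduceTree(float acc, std::vector<float> v) {
  assert(!v.empty() && (v.size() & (v.size() - 1)) == 0 &&
         "sketch assumes a power-of-two element count");
  for (std::size_t half = v.size() / 2; half >= 1; half /= 2)
    for (std::size_t i = 0; i < half; ++i)
      v[i] += v[i + half];
  return acc + v[0];
}
```

Under the stated assumptions, the input `{1.0e8f, 1.0f, -1.0e8f, 1.0f}` with a start value of `0.0f` reduces to 1 sequentially and to 2 with this tree order. Both results are deterministic, but they are not interchangeable, which is why a plain `reassoc` flag ("any order") cannot express either of them.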

In D117480#3248644, @fhahn wrote:

I assume ordered here means sequential. The clang builtin specifies a tree-wise reduction order.

It's worth noting that "sequential" is only desirable for matching scalar code (which is important, but isn't everything). Explicit vector code essentially always prefers a tree reduction (either even + odd or low + high, depending on arch/uarch, but either is better than sequential). Either normal tree reduction delivers faster results with smaller average errors, and they are still reproducible. Having a defined tree reduction is a great tool for explicit vectorizers.
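
The two tree shapes named in this comment can be sketched in plain C++ as follows. This is an illustration only, not how any backend lowers the intrinsic: `reduceEvenOdd` and `reduceLowHigh` are hypothetical names, and the sketch assumes a power-of-two element count and IEEE-754 single precision.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if n is a non-zero power of two.
static bool isPowerOfTwo(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

// "Even + odd": each level adds adjacent pairs, v[2i] + v[2i+1].
float reduceEvenOdd(std::vector<float> v) {
  assert(isPowerOfTwo(v.size()) && "sketch assumes a power-of-two length");
  for (std::size_t n = v.size(); n > 1; n /= 2)
    for (std::size_t i = 0; i < n / 2; ++i)
      v[i] = v[2 * i] + v[2 * i + 1];
  return v[0];
}

// "Low + high": each level adds the upper half onto the lower half.
float reduceLowHigh(std::vector<float> v) {
  assert(isPowerOfTwo(v.size()) && "sketch assumes a power-of-two length");
  for (std::size_t half = v.size() / 2; half >= 1; half /= 2)
    for (std::size_t i = 0; i < half; ++i)
      v[i] += v[i + half];
  return v[0];
}
```

Each function always produces the same result for the same input, so both are reproducible. They need not agree with each other, though: for `{1.0e8f, 1.0f, -1.0e8f, 1.0f}` the even + odd shape gives 0 and the low + high shape gives 2 under the stated assumptions. A defined tree reduction would therefore have to pin down one specific shape.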

dexonsmith removed a subscriber: dexonsmith. Jan 18 2022, 12:26 PM

fhahn mentioned this in D117829: [Clang] Add integer mul reduction builtin. Jan 21 2022, 2:09 AM

Abandoning this since other folks now seem to have better ways to solve the issue.

Herald added a project: Restricted Project. Jun 30 2022, 11:52 PM

Herald added a subscriber: StephenFan.

Revision Contents

Path	Size
llvm/include/llvm/IR/IRBuilder.h	7 lines
llvm/include/llvm/IR/Intrinsics.td	2 lines
llvm/lib/CodeGen/ExpandReductions.cpp	21 lines
llvm/lib/IR/AutoUpgrade.cpp	11 lines
llvm/lib/IR/IRBuilder.cpp	14 lines
llvm/test/Verifier/reduction-intrinsics.ll	8 lines

Diff 400527

llvm/include/llvm/IR/IRBuilder.h

Show First 20 Lines • Show All 1,414 Lines • ▼ Show 20 Lines	Value *CreateXor(Value *LHS, const APInt &RHS, const Twine &Name = "") {
return CreateXor(LHS, ConstantInt::get(LHS->getType(), RHS), Name);		return CreateXor(LHS, ConstantInt::get(LHS->getType(), RHS), Name);
}		}

Value *CreateXor(Value *LHS, uint64_t RHS, const Twine &Name = "") {		Value *CreateXor(Value *LHS, uint64_t RHS, const Twine &Name = "") {
return CreateXor(LHS, ConstantInt::get(LHS->getType(), RHS), Name);		return CreateXor(LHS, ConstantInt::get(LHS->getType(), RHS), Name);
}		}

Value *CreateFAdd(Value *L, Value *R, const Twine &Name = "",		Value *CreateFAdd(Value *L, Value *R, const Twine &Name = "",
MDNode *FPMD = nullptr) {		MDNode *FPMD = nullptr, Value *Flag = nullptr) {
if (IsFPConstrained)		if (IsFPConstrained)
return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_fadd,		return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_fadd,
L, R, nullptr, Name, FPMD);		L, R, nullptr, Name, FPMD, llvm::None,
		llvm::None, Flag);

if (Value *V = foldConstant(Instruction::FAdd, L, R, Name)) return V;		if (Value *V = foldConstant(Instruction::FAdd, L, R, Name)) return V;
Instruction *I = setFPAttrs(BinaryOperator::CreateFAdd(L, R), FPMD, FMF);		Instruction *I = setFPAttrs(BinaryOperator::CreateFAdd(L, R), FPMD, FMF);
return Insert(I, Name);		return Insert(I, Name);
}		}

/// Copy fast-math-flags from an instruction rather than using the builder's		/// Copy fast-math-flags from an instruction rather than using the builder's
/// default FMF.		/// default FMF.
▲ Show 20 Lines • Show All 139 Lines • ▼ Show 20 Lines	for (unsigned i = 1; i < Ops.size(); i++)
Accum = CreateLogicalOr(Accum, Ops[i]);		Accum = CreateLogicalOr(Accum, Ops[i]);
return Accum;		return Accum;
}		}

CallInst *CreateConstrainedFPBinOp(		CallInst *CreateConstrainedFPBinOp(
Intrinsic::ID ID, Value *L, Value *R, Instruction *FMFSource = nullptr,		Intrinsic::ID ID, Value *L, Value *R, Instruction *FMFSource = nullptr,
const Twine &Name = "", MDNode *FPMathTag = nullptr,		const Twine &Name = "", MDNode *FPMathTag = nullptr,
Optional<RoundingMode> Rounding = None,		Optional<RoundingMode> Rounding = None,
Optional<fp::ExceptionBehavior> Except = None);		Optional<fp::ExceptionBehavior> Except = None, Value *Flag = nullptr);

Value *CreateNeg(Value *V, const Twine &Name = "",		Value *CreateNeg(Value *V, const Twine &Name = "",
bool HasNUW = false, bool HasNSW = false) {		bool HasNUW = false, bool HasNSW = false) {
if (auto *VC = dyn_cast<Constant>(V))		if (auto *VC = dyn_cast<Constant>(V))
return Insert(Folder.CreateNeg(VC, HasNUW, HasNSW), Name);		return Insert(Folder.CreateNeg(VC, HasNUW, HasNSW), Name);
BinaryOperator *BO = Insert(BinaryOperator::CreateNeg(V), Name);		BinaryOperator *BO = Insert(BinaryOperator::CreateNeg(V), Name);
if (HasNUW) BO->setHasNoUnsignedWrap();		if (HasNUW) BO->setHasNoUnsignedWrap();
if (HasNSW) BO->setHasNoSignedWrap();		if (HasNSW) BO->setHasNoSignedWrap();
▲ Show 20 Lines • Show All 994 Lines • Show Last 20 Lines

llvm/include/llvm/IR/Intrinsics.td

Show First 20 Lines • Show All 1,703 Lines • ▼ Show 20 Lines	: Intrinsic<[], [llvm_anyptr_ty, llvm_i8_ty, llvm_anyint_ty, llvm_i32_ty],
ImmArg<ArgIndex<3>>]>;		ImmArg<ArgIndex<3>>]>;

//===------------------------ Reduction Intrinsics ------------------------===//		//===------------------------ Reduction Intrinsics ------------------------===//
//		//
let IntrProperties = [IntrNoMem] in {		let IntrProperties = [IntrNoMem] in {

def int_vector_reduce_fadd : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],		def int_vector_reduce_fadd : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
[LLVMVectorElementType<0>,		[LLVMVectorElementType<0>,
llvm_anyvector_ty]>;		llvm_anyvector_ty, llvm_i32_ty]>;
def int_vector_reduce_fmul : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],		def int_vector_reduce_fmul : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
[LLVMVectorElementType<0>,		[LLVMVectorElementType<0>,
llvm_anyvector_ty]>;		llvm_anyvector_ty]>;
def int_vector_reduce_add : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],		def int_vector_reduce_add : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
[llvm_anyvector_ty]>;		[llvm_anyvector_ty]>;
def int_vector_reduce_mul : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],		def int_vector_reduce_mul : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
[llvm_anyvector_ty]>;		[llvm_anyvector_ty]>;
def int_vector_reduce_and : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],		def int_vector_reduce_and : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
▲ Show 20 Lines • Show All 214 Lines • Show Last 20 Lines

llvm/lib/CodeGen/ExpandReductions.cpp

Show First 20 Lines • Show All 110 Lines • ▼ Show 20 Lines	for (auto *II : Worklist) {
RecurKind RK = getRK(ID);		RecurKind RK = getRK(ID);

Value *Rdx = nullptr;		Value *Rdx = nullptr;
IRBuilder<> Builder(II);		IRBuilder<> Builder(II);
IRBuilder<>::FastMathFlagGuard FMFGuard(Builder);		IRBuilder<>::FastMathFlagGuard FMFGuard(Builder);
Builder.setFastMathFlags(FMF);		Builder.setFastMathFlags(FMF);
switch (ID) {		switch (ID) {
default: llvm_unreachable("Unexpected intrinsic!");		default: llvm_unreachable("Unexpected intrinsic!");
case Intrinsic::vector_reduce_fadd:		case Intrinsic::vector_reduce_fadd: {
		// FMFs must be attached to the call, otherwise it's an ordered reduction
		// and it can't be handled by generating a shuffle sequence.
		Value *Acc = II->getArgOperand(0);
		Value *Vec = II->getArgOperand(1);
		Value *Flag = II->getArgOperand(2);

		if (!FMF.allowReassoc() && !Flag)
		Rdx = getOrderedReduction(Builder, Acc, Vec, getOpcode(ID), RK);
		else {
		if (!isPowerOf2_32(
		cast<FixedVectorType>(Vec->getType())->getNumElements()))
		continue;

		Rdx = getShuffleReduction(Builder, Vec, getOpcode(ID), RK);
		Rdx = Builder.CreateBinOp((Instruction::BinaryOps)getOpcode(ID), Acc,
		Rdx, "bin.rdx");
		}
		break;
		}
case Intrinsic::vector_reduce_fmul: {		case Intrinsic::vector_reduce_fmul: {
// FMFs must be attached to the call, otherwise it's an ordered reduction		// FMFs must be attached to the call, otherwise it's an ordered reduction
// and it can't be handled by generating a shuffle sequence.		// and it can't be handled by generating a shuffle sequence.
Value *Acc = II->getArgOperand(0);		Value *Acc = II->getArgOperand(0);
Value *Vec = II->getArgOperand(1);		Value *Vec = II->getArgOperand(1);
if (!FMF.allowReassoc())		if (!FMF.allowReassoc())
Rdx = getOrderedReduction(Builder, Acc, Vec, getOpcode(ID), RK);		Rdx = getOrderedReduction(Builder, Acc, Vec, getOpcode(ID), RK);
else {		else {
▲ Show 20 Lines • Show All 87 Lines • Show Last 20 Lines

llvm/lib/IR/AutoUpgrade.cpp

Show First 20 Lines • Show All 2,185 Lines • ▼ Show 20 Lines	if (IsX86 && (Name.startswith("sse2.pcmp") ||
Name == "sse.mul.ss" || Name == "sse2.mul.sd" ||		Name == "sse.mul.ss" || Name == "sse2.mul.sd" ||
Name == "sse.div.ss" || Name == "sse2.div.sd")) {		Name == "sse.div.ss" || Name == "sse2.div.sd")) {
Type *I32Ty = Type::getInt32Ty(C);		Type *I32Ty = Type::getInt32Ty(C);
Value *Elt0 = Builder.CreateExtractElement(CI->getArgOperand(0),		Value *Elt0 = Builder.CreateExtractElement(CI->getArgOperand(0),
ConstantInt::get(I32Ty, 0));		ConstantInt::get(I32Ty, 0));
Value *Elt1 = Builder.CreateExtractElement(CI->getArgOperand(1),		Value *Elt1 = Builder.CreateExtractElement(CI->getArgOperand(1),
ConstantInt::get(I32Ty, 0));		ConstantInt::get(I32Ty, 0));
Value *EltOp;		Value *EltOp;
if (Name.contains(".add."))		if (Name.contains(".add.")) {
EltOp = Builder.CreateFAdd(Elt0, Elt1);		Value *Elt2 = Builder.CreateExtractElement(CI->getArgOperand(2),
else if (Name.contains(".sub."))		ConstantInt::get(I32Ty, 0));
		EltOp = Builder.CreateFAdd(Elt0, Elt1, "", nullptr, Elt2);
		} else if (Name.contains(".sub."))
EltOp = Builder.CreateFSub(Elt0, Elt1);		EltOp = Builder.CreateFSub(Elt0, Elt1);
else if (Name.contains(".mul."))		else if (Name.contains(".mul."))
EltOp = Builder.CreateFMul(Elt0, Elt1);		EltOp = Builder.CreateFMul(Elt0, Elt1);
else		else
EltOp = Builder.CreateFDiv(Elt0, Elt1);		EltOp = Builder.CreateFDiv(Elt0, Elt1);
Rep = Builder.CreateInsertElement(CI->getArgOperand(0), EltOp,		Rep = Builder.CreateInsertElement(CI->getArgOperand(0), EltOp,
ConstantInt::get(I32Ty, 0));		ConstantInt::get(I32Ty, 0));
} else if (IsX86 && Name.startswith("avx512.mask.pcmp")) {		} else if (IsX86 && Name.startswith("avx512.mask.pcmp")) {
▲ Show 20 Lines • Show All 826 Lines • ▼ Show 20 Lines	if (IsX86 && (Name.startswith("sse2.pcmp") ||
IID = Intrinsic::x86_avx512_add_ps_512;		IID = Intrinsic::x86_avx512_add_ps_512;
else		else
IID = Intrinsic::x86_avx512_add_pd_512;		IID = Intrinsic::x86_avx512_add_pd_512;

Rep = Builder.CreateCall(Intrinsic::getDeclaration(F->getParent(), IID),		Rep = Builder.CreateCall(Intrinsic::getDeclaration(F->getParent(), IID),
{ CI->getArgOperand(0), CI->getArgOperand(1),		{ CI->getArgOperand(0), CI->getArgOperand(1),
CI->getArgOperand(4) });		CI->getArgOperand(4) });
} else {		} else {
Rep = Builder.CreateFAdd(CI->getArgOperand(0), CI->getArgOperand(1));		Rep = Builder.CreateFAdd(CI->getArgOperand(0), CI->getArgOperand(1), "",
		nullptr, CI->getArgOperand(2));
}		}
Rep = EmitX86Select(Builder, CI->getArgOperand(3), Rep,		Rep = EmitX86Select(Builder, CI->getArgOperand(3), Rep,
CI->getArgOperand(2));		CI->getArgOperand(2));
} else if (IsX86 && Name.startswith("avx512.mask.div.p")) {		} else if (IsX86 && Name.startswith("avx512.mask.div.p")) {
if (Name.endswith(".512")) {		if (Name.endswith(".512")) {
Intrinsic::ID IID;		Intrinsic::ID IID;
if (Name[17] == 's')		if (Name[17] == 's')
IID = Intrinsic::x86_avx512_div_ps_512;		IID = Intrinsic::x86_avx512_div_ps_512;
▲ Show 20 Lines • Show All 1,578 Lines • Show Last 20 Lines

llvm/lib/IR/IRBuilder.cpp

Show First 20 Lines • Show All 854 Lines • ▼ Show 20 Lines	CallInst *IRBuilderBase::CreateIntrinsic(Intrinsic::ID ID,
const Twine &Name) {		const Twine &Name) {
Module *M = BB->getModule();		Module *M = BB->getModule();
Function *Fn = Intrinsic::getDeclaration(M, ID, Types);		Function *Fn = Intrinsic::getDeclaration(M, ID, Types);
return createCallHelper(Fn, Args, this, Name, FMFSource);		return createCallHelper(Fn, Args, this, Name, FMFSource);
}		}

CallInst *IRBuilderBase::CreateConstrainedFPBinOp(		CallInst *IRBuilderBase::CreateConstrainedFPBinOp(
Intrinsic::ID ID, Value *L, Value *R, Instruction *FMFSource,		Intrinsic::ID ID, Value *L, Value *R, Instruction *FMFSource,
const Twine &Name, MDNode *FPMathTag,		const Twine &Name, MDNode *FPMathTag, Optional<RoundingMode> Rounding,
Optional<RoundingMode> Rounding,		Optional<fp::ExceptionBehavior> Except, Value *Flag) {
Optional<fp::ExceptionBehavior> Except) {
Value *RoundingV = getConstrainedFPRounding(Rounding);		Value *RoundingV = getConstrainedFPRounding(Rounding);
Value *ExceptV = getConstrainedFPExcept(Except);		Value *ExceptV = getConstrainedFPExcept(Except);

FastMathFlags UseFMF = FMF;		FastMathFlags UseFMF = FMF;
if (FMFSource)		if (FMFSource)
UseFMF = FMFSource->getFastMathFlags();		UseFMF = FMFSource->getFastMathFlags();

CallInst *C = CreateIntrinsic(ID, {L->getType()},		auto Args = {L, R, RoundingV, ExceptV};
{L, R, RoundingV, ExceptV}, nullptr, Name);
		if (Flag)
		Args = {L, R, Flag, RoundingV, ExceptV};

		CallInst *C =
		CreateIntrinsic(ID, {L->getType()}, std::move(Args), nullptr, Name);
setConstrainedFPCallAttr(C);		setConstrainedFPCallAttr(C);
setFPAttrs(C, FPMathTag, UseFMF);		setFPAttrs(C, FPMathTag, UseFMF);
return C;		return C;
}		}

Value *IRBuilderBase::CreateNAryOp(unsigned Opc, ArrayRef<Value *> Ops,		Value *IRBuilderBase::CreateNAryOp(unsigned Opc, ArrayRef<Value *> Ops,
const Twine &Name, MDNode *FPMathTag) {		const Twine &Name, MDNode *FPMathTag) {
if (Instruction::isBinaryOp(Opc)) {		if (Instruction::isBinaryOp(Opc)) {
▲ Show 20 Lines • Show All 389 Lines • Show Last 20 Lines

llvm/test/Verifier/reduction-intrinsics.ll

Show All 12 Lines	; CHECK: Intrinsic has incorrect argument type!
%r0 = call i32 @llvm.vector.reduce.smax.i32(i32 %x)		%r0 = call i32 @llvm.vector.reduce.smax.i32(i32 %x)
ret i32 %r0		ret i32 %r0
}		}

; Type mismatch for start value.		; Type mismatch for start value.

define float @fadd_match_arg_types(<4 x float> %x) {		define float @fadd_match_arg_types(<4 x float> %x) {
; CHECK: Intrinsic has incorrect argument type!		; CHECK: Intrinsic has incorrect argument type!
%r = call float @llvm.vector.reduce.fadd.v4f32(double 0.0, <4 x float> %x)		%r = call float @llvm.vector.reduce.fadd.v4f32(double 0.0, <4 x float> %x, i32 0)
ret float %r		ret float %r
}		}

; Wrong result type.		; Wrong result type.

define i64 @result_too_wide(<4 x i32> %x) {		define i64 @result_too_wide(<4 x i32> %x) {
; CHECK: Intrinsic has incorrect return type!		; CHECK: Intrinsic has incorrect return type!
%r = call i64 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)		%r = call i64 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)
Show All 12 Lines
define i32* @not_pointer_reduce(<4 x i32*> %x) {		define i32* @not_pointer_reduce(<4 x i32*> %x) {
; CHECK: Intrinsic has incorrect argument type!		; CHECK: Intrinsic has incorrect argument type!
%r = call i32* @llvm.vector.reduce.or.v4p0i32(<4 x i32*> %x)		%r = call i32* @llvm.vector.reduce.or.v4p0i32(<4 x i32*> %x)
ret i32* %r		ret i32* %r
}		}

define i32 @not_integer_reduce(<4 x i32> %x) {		define i32 @not_integer_reduce(<4 x i32> %x) {
; CHECK: Intrinsic has incorrect argument type!		; CHECK: Intrinsic has incorrect argument type!
%r = call i32 @llvm.vector.reduce.fadd.v4i32(i32 0, <4 x i32> %x)		%r = call i32 @llvm.vector.reduce.fadd.v4i32(i32 0, <4 x i32> %x, i32 0)
ret i32 %r		ret i32 %r
}		}

define i32* @not_pointer_reduce2(<4 x i32*> %x) {		define i32* @not_pointer_reduce2(<4 x i32*> %x) {
; CHECK: Intrinsic has incorrect argument type!		; CHECK: Intrinsic has incorrect argument type!
%r = call i32* @llvm.vector.reduce.fmin.v4p0i32(<4 x i32*> %x)		%r = call i32* @llvm.vector.reduce.fmin.v4p0i32(<4 x i32*> %x)
ret i32* %r		ret i32* %r
}		}

declare float @llvm.vector.reduce.umin.v4f32(<4 x float>)		declare float @llvm.vector.reduce.umin.v4f32(<4 x float>)
declare i32* @llvm.vector.reduce.or.v4p0i32(<4 x i32*>)		declare i32* @llvm.vector.reduce.or.v4p0i32(<4 x i32*>)
declare i32 @llvm.vector.reduce.fadd.v4i32(i32, <4 x i32>)		declare i32 @llvm.vector.reduce.fadd.v4i32(i32, <4 x i32>, i32)
declare float @llvm.vector.reduce.fadd.v4f32(double, <4 x float>)		declare float @llvm.vector.reduce.fadd.v4f32(double, <4 x float>, i32)
declare i32* @llvm.vector.reduce.fmin.v4p0i32(<4 x i32*>)		declare i32* @llvm.vector.reduce.fmin.v4p0i32(<4 x i32*>)
declare float @llvm.vector.reduce.fmax.f32(float)		declare float @llvm.vector.reduce.fmax.f32(float)
declare i32 @llvm.vector.reduce.smax.i32(i32)		declare i32 @llvm.vector.reduce.smax.i32(i32)
declare i64 @llvm.vector.reduce.add.v4i32(<4 x i32>)		declare i64 @llvm.vector.reduce.add.v4i32(<4 x i32>)