Details

Reviewers

nadav
aschwaighofer
karthikthecool
nicholas
hfinkel

Summary

Hi Nadav, Arnold, Hal,
This patch adds support to recognize and vectorize the LLVM intrinsics ctlz, cttz and powi. These intrinsics differ from the ones handled so far in SLPVectorizer because their second argument must remain a scalar, so we can vectorize them only if the second argument is the same in every call of the bundle.
Does this look good to commit?
Thanks
Karthik Bhat
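
To make the constraint in the summary concrete, here is a minimal standalone sketch of the check. It is not the code in the patch: `ToyIntrinsic`, `ToyCall`, `hasScalarSecondOperand` and `canVectorizeBundle` are hypothetical stand-ins invented for illustration, and the second operand is modelled as an opaque pointer that is compared by identity.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for LLVM types; none of these are the real LLVM classes.
enum class ToyIntrinsic { Sqrt, Ctlz, Cttz, Powi };

struct ToyCall {
  ToyIntrinsic ID;
  int VectorArg;         // operand that would become one vector lane
  const void *ScalarArg; // identity of the second operand
};

// True for the intrinsics whose second operand must stay scalar.
bool hasScalarSecondOperand(ToyIntrinsic ID) {
  return ID == ToyIntrinsic::Ctlz || ID == ToyIntrinsic::Cttz ||
         ID == ToyIntrinsic::Powi;
}

// A bundle is accepted only if every call uses the same intrinsic and,
// for ctlz/cttz/powi, the very same second operand.
bool canVectorizeBundle(const std::vector<ToyCall> &Bundle) {
  assert(!Bundle.empty() && "bundle must contain at least one call");
  const ToyCall &First = Bundle.front();
  for (const ToyCall &C : Bundle) {
    if (C.ID != First.ID)
      return false;
    if (hasScalarSecondOperand(First.ID) && C.ScalarArg != First.ScalarArg)
      return false;
  }
  return true;
}
```

With two powi calls that share the same exponent object the bundle is accepted; if one of them uses a different exponent it is rejected, which is the behaviour the negative tests in intrinsic.ll check for.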

Diff Detail

Event Timeline

karthikthecool updated this revision to Diff 9638. May 20 2014, 11:52 AM

karthikthecool retitled this revision from to Add support to vectorize ctlz,cttz and powi intrinsics in SLPVectorizer.

karthikthecool updated this object.

karthikthecool edited the test plan for this revision.

karthikthecool added reviewers: nadav, hfinkel, aschwaighofer.

karthikthecool added a subscriber: Unknown Object (MLST).

Hi Karthik,

Thanks for working on it. It would be great to vectorize powi, ctlz and cttz. Why did you decide to use SCEV and not simply check the last argument for equality?

Thanks,
Nadav

Hi Nadav,
Thanks for the review. We need to use SCEV because it detects cases where the Value* pointers differ but the underlying value is the same.

For example, I tried the following:

declare float @llvm.powi.f32(float, i32)
define void @vec_powi_f32(float* %a, float* %b, float* %c, i32 %A, i32 %B) {
entry:
%0 = alloca i32, align 4
%1 = alloca i32, align 4
%C = alloca i32, align 4
%D = alloca i32, align 4
store i32 %A, i32* %0, align 4
store i32 %B, i32* %1, align 4
%2 = load i32* %0, align 4
%3 = load i32* %1, align 4
%4 = add nsw i32 %2, %3
%5 = add nsw i32 %2, %3
store i32 %4, i32* %C, align 4
store i32 %5, i32* %D, align 4

%i0 = load float* %a, align 4
%i1 = load float* %b, align 4
%add1 = fadd float %i0, %i1
%call1 = tail call float @llvm.powi.f32(float %add1,i32 %4) nounwind readnone

%arrayidx2 = getelementptr inbounds float* %a, i32 1
%i2 = load float* %arrayidx2, align 4
%arrayidx3 = getelementptr inbounds float* %b, i32 1
%i3 = load float* %arrayidx3, align 4
%add2 = fadd float %i2, %i3
%call2 = tail call float @llvm.powi.f32(float %add2,i32 %5) nounwind readnone

%arrayidx4 = getelementptr inbounds float* %a, i32 2
%i4 = load float* %arrayidx4, align 4
%arrayidx5 = getelementptr inbounds float* %b, i32 2
%i5 = load float* %arrayidx5, align 4
%add3 = fadd float %i4, %i5
%call3 = tail call float @llvm.powi.f32(float %add3,i32 %5) nounwind readnone

%arrayidx6 = getelementptr inbounds float* %a, i32 3
%i6 = load float* %arrayidx6, align 4
%arrayidx7 = getelementptr inbounds float* %b, i32 3
%i7 = load float* %arrayidx7, align 4
%add4 = fadd float %i6, %i7
%call4 = tail call float @llvm.powi.f32(float %add4,i32 %4) nounwind readnone

store float %call1, float* %c, align 4
%arrayidx8 = getelementptr inbounds float* %c, i32 1
store float %call2, float* %arrayidx8, align 4
%arrayidx9 = getelementptr inbounds float* %c, i32 2
store float %call3, float* %arrayidx9, align 4
%arrayidx10 = getelementptr inbounds float* %c, i32 3
store float %call4, float* %arrayidx10, align 4
ret void
}

Here %4 and %5 refer to the same value. If we just compare the Value* pointers for equality, we will not be able to vectorize the powi calls in the above code. With a SCEV comparison, however, we can conclude that %4 is actually the same as %5 and hence vectorize the powi intrinsic.

The same approach is used in BBVectorizer to detect if arguments are equal for these intrinsics.
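
The difference between the two comparisons can be shown with a small standalone model. This is not the ScalarEvolution API: `ToyValue` and `ToyExprTable` are hypothetical names, and the table only imitates the idea of mapping structurally identical expressions to one canonical id, which is roughly what makes %4 and %5 compare equal in the example above.

```cpp
#include <cassert>
#include <map>
#include <tuple>

// Toy IR value: a leaf (Op == 0) or a binary node over two operands.
struct ToyValue {
  char Op;             // '+' for an add, 0 for a leaf such as a load result
  const ToyValue *LHS;
  const ToyValue *RHS;
};

// Hypothetical uniquing table: structurally identical expressions get
// the same id, even when they are distinct ToyValue objects.
class ToyExprTable {
  using Key = std::tuple<char, int, int>;
  std::map<Key, int> NodeIds;
  std::map<const ToyValue *, int> LeafIds;
  int NextId = 0;

public:
  int getId(const ToyValue *V) {
    assert(V && "null value");
    if (V->Op == 0) {
      // Leaves are opaque, so they are identified by pointer.
      auto It = LeafIds.find(V);
      if (It != LeafIds.end())
        return It->second;
      int Id = NextId++;
      LeafIds[V] = Id;
      return Id;
    }
    // Inner nodes are identified by operator plus operand ids.
    int L = getId(V->LHS);
    int R = getId(V->RHS);
    Key K{V->Op, L, R};
    auto It = NodeIds.find(K);
    if (It != NodeIds.end())
      return It->second;
    int Id = NextId++;
    NodeIds[K] = Id;
    return Id;
  }
};
```

Two separate add nodes over the same pair of leaves are different objects, so a pointer comparison says "different", while `getId` returns the same id for both.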

Thanks

Fix a compilation error in debug mode. A1I should be declared outside the if block.

Hi All,
First off, sorry: my last reply seems to have introduced some junk characters into the mail chain.

I have updated the patch to directly compare Value* instead of comparing SCEV, as per the comments.
As Nick mentioned, the IR example above may not be generated when compiling with optimizations, since GVN, CSE, BasicAA and DCE would have removed the redundant code.
That IR was hand-written to highlight the benefit of using SCEV, but because basic transforms such as GVN and CSE run before vectorization, SCEV may not be required.

I have updated the patch accordingly to directly compare arguments instead of SCEV. Does this look good to commit?

Thanks
Karthik Bhat

Karthik,

Please add a test case for one of the functions where the last argument differs and the SLPVectorizer is unable to vectorize the function.

Thanks,
Nadav

aschwaighofer added inline comments. May 22 2014, 9:39 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
982	The code is repeated three times in this file and will likely make it into the loop vectorizer, too. Can this be refactored into a utility function 'bool hasVectorInstrinsicScalarOpd(ID, unsigned &ScalarOpdIdx)' in VectorUtils.h? (Once we have instructions with more than one such operand we can change this to be a vector of operands, but there is no need now, I think.)
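
A rough standalone sketch of what such a shared helper buys a caller. It is modelled on the shape of the suggestion, not on the actual LLVM code: `ToyIntrinsic`, `hasScalarOperandAt` and `buildWideOperands` are hypothetical names, and operands are plain strings so the example compiles without LLVM headers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for Intrinsic::ID.
enum class ToyIntrinsic { Bswap, Ctlz, Cttz, Powi };

// One predicate answers "must operand OpdIdx of this intrinsic stay scalar?".
bool hasScalarOperandAt(ToyIntrinsic ID, unsigned OpdIdx) {
  switch (ID) {
  case ToyIntrinsic::Ctlz:
  case ToyIntrinsic::Cttz:
  case ToyIntrinsic::Powi:
    return OpdIdx == 1;
  default:
    return false;
  }
}

using OperandList = std::vector<std::string>;

// Builds the operand list of the widened call: scalar operands are taken
// once from the first call, all other operands are gathered lane by lane.
OperandList buildWideOperands(ToyIntrinsic ID,
                              const std::vector<OperandList> &Calls) {
  assert(!Calls.empty() && "need at least one call");
  OperandList Wide;
  for (unsigned J = 0; J < Calls.front().size(); ++J) {
    if (hasScalarOperandAt(ID, J)) {
      Wide.push_back(Calls.front()[J]);
      continue;
    }
    std::string Lanes = "<";
    for (std::size_t I = 0; I < Calls.size(); ++I) {
      if (I != 0)
        Lanes += ", ";
      Lanes += Calls[I][J];
    }
    Lanes += ">";
    Wide.push_back(Lanes);
  }
  return Wide;
}
```

For two powi calls with operands (%add1, %P) and (%add2, %P), the first operand is gathered across the calls while %P is passed through once, unwidened. Every place that needs the rule calls the same predicate instead of repeating the list of intrinsics.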

Hi Nadav, Arnold,
Updated the patch as per the review comments. Added test cases for the negative cases where these intrinsics should not be vectorized.
Thanks
Karthik Bhat

Hi Arnold, Nadav,
Any more inputs on this patch? Does this look good to commit?
Thanks
Karthik Bhat

Fix an 80-character column-width formatting issue in VectorUtils.h.

With those little nitpicks fixed, LGTM.

lib/Transforms/Vectorize/SLPVectorizer.cpp
966	You could remove the braces.
978	whose second argument should be the same ...
1678	whose second argument is a scalar. This argument should ...

Hi Arnold, Thanks for the review.
I will update the patch.
Can you also have a look at http://reviews.llvm.org/D3937, which implements vectorization of these intrinsics in LoopVectorizer?

I feel we should commit these two as a single revision, or at least one right after the other.
The isTriviallyVectorizable function in VectorUtils is a common function used by both the SLP and Loop vectorizers. Committing just this patch without handling the intrinsics in LoopVectorizer will result in miscompilation when we try to vectorize these intrinsics inside a loop.

Thanks for your time and inputs.
Regards
Karthik Bhat

Thanks Nadav and Arnold for the review.
Committed as r209873 along with the code to vectorize these intrinsics in LoopVectorizer.

This revision is now accepted and ready to land. May 29 2014, 9:41 PM

karthikthecool closed this revision. Jun 1 2014, 9:51 PM

Diff 9867

include/llvm/Transforms/Utils/VectorUtils.h

[...] static inline bool isTriviallyVectorizable(Intrinsic::ID ID) {
   case Intrinsic::rint:
   case Intrinsic::nearbyint:
   case Intrinsic::round:
   case Intrinsic::bswap:
   case Intrinsic::ctpop:
   case Intrinsic::pow:
   case Intrinsic::fma:
   case Intrinsic::fmuladd:
+  case Intrinsic::ctlz:
+  case Intrinsic::cttz:
+  case Intrinsic::powi:
     return true;
   default:
     return false;
   }
 }

+static bool hasVectorInstrinsicScalarOpd(Intrinsic::ID ID,
+                                         unsigned ScalarOpdIdx) {
+  switch (ID) {
+  case Intrinsic::ctlz:
+  case Intrinsic::cttz:
+  case Intrinsic::powi:
+    return (ScalarOpdIdx == 1);
+  default:
+    return false;
+  }
+}
+
 static Intrinsic::ID checkUnaryFloatSignature(const CallInst &I,
                                               Intrinsic::ID ValidIntrinsicID) {
   if (I.getNumArgOperands() != 1 ||
       !I.getArgOperand(0)->getType()->isFloatingPointTy() ||
       I.getType() != I.getArgOperand(0)->getType() ||
       !I.onlyReadsMemory())
     return Intrinsic::not_intrinsic;

[...]

lib/Transforms/Vectorize/SLPVectorizer.cpp

[...] case Instruction::Call: {
   // Check if this is an Intrinsic call or something that can be
   // represented by an intrinsic call
   Intrinsic::ID ID = getIntrinsicIDForCall(CI, TLI);
   if (!isTriviallyVectorizable(ID)) {
     newTreeEntry(VL, false);
     DEBUG(dbgs() << "SLP: Non-vectorizable call.\n");
     return;
   }

   Function *Int = CI->getCalledFunction();
+  Value *A1I = nullptr;
+  if (hasVectorInstrinsicScalarOpd(ID, 1)) {
+    A1I = CI->getArgOperand(1);
+  }
   for (unsigned i = 1, e = VL.size(); i != e; ++i) {
     CallInst *CI2 = dyn_cast<CallInst>(VL[i]);
     if (!CI2 || CI2->getCalledFunction() != Int ||
         getIntrinsicIDForCall(CI2, TLI) != ID) {
       newTreeEntry(VL, false);
       DEBUG(dbgs() << "SLP: mismatched calls:" << CI << "!=" << VL[i]
                    << "\n");
       return;
     }
+    // ctlz,cttz and powi are special intrinsics whose 2nd argument
+    // should be same in order for them to be vectorized.
+    if (hasVectorInstrinsicScalarOpd(ID, 1)) {
+      Value *A1J = CI2->getArgOperand(1);
+      if (A1I != A1J) {
+        newTreeEntry(VL, false);
+        DEBUG(dbgs() << "SLP: mismatched arguments in call:" << *CI
+                     << " argument "<< A1I<<"!=" << A1J
+                     << "\n");
+        return;
+      }
+    }
   }

   newTreeEntry(VL, true);
   for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i) {
     ValueList Operands;
     // Prepare the operand vector.
     for (unsigned j = 0; j < VL.size(); ++j) {
       CallInst *CI2 = dyn_cast<CallInst>(VL[j]);
[...] case Instruction::Store: {
   Alignment = DL->getABITypeAlignment(SI->getPointerOperand()->getType());
   S->setAlignment(Alignment);
   E->VectorizedValue = S;
   return propagateMetadata(S, E->Scalars);
 }
 case Instruction::Call: {
   CallInst *CI = cast<CallInst>(VL0);
   setInsertPointAfterBundle(E->Scalars);
+  Function *FI;
+  Intrinsic::ID IID = Intrinsic::not_intrinsic;
+  if (CI && (FI = CI->getCalledFunction())) {
+    IID = (Intrinsic::ID) FI->getIntrinsicID();
+  }
   std::vector<Value *> OpVecs;
   for (int j = 0, e = CI->getNumArgOperands(); j < e; ++j) {
     ValueList OpVL;
+    // ctlz,cttz and powi are special intrinsics whose 2nd argument is a
+    // scalar this argument should not be vectorized.
+    if (hasVectorInstrinsicScalarOpd(IID, 1) && j == 1) {
+      CallInst *CEI = cast<CallInst>(E->Scalars[0]);
+      OpVecs.push_back(CEI->getArgOperand(j));
+      continue;
+    }
     for (int i = 0, e = E->Scalars.size(); i < e; ++i) {
       CallInst *CEI = cast<CallInst>(E->Scalars[i]);
       OpVL.push_back(CEI->getArgOperand(j));
     }

     Value *OpVec = vectorizeTree(OpVL);
     DEBUG(dbgs() << "SLP: OpVec[" << j << "]: " << *OpVec << "\n");
     OpVecs.push_back(OpVec);
[...]

test/Transforms/SLPVectorizer/X86/intrinsic.ll

	[...]

	; CHECK-LABEL: @vec_bswap_i32(
	; CHECK: load <4 x i32>
	; CHECK: load <4 x i32>
	; CHECK: call <4 x i32> @llvm.bswap.v4i32
	; CHECK: store <4 x i32>
	; CHECK: ret
	}

				declare i32 @llvm.ctlz.i32(i32,i1) nounwind readnone

				define void @vec_ctlz_i32(i32* %a, i32* %b, i32* %c, i1) {
				entry:
				%i0 = load i32* %a, align 4
				%i1 = load i32* %b, align 4
				%add1 = add i32 %i0, %i1
				%call1 = tail call i32 @llvm.ctlz.i32(i32 %add1,i1 true) nounwind readnone

				%arrayidx2 = getelementptr inbounds i32* %a, i32 1
				%i2 = load i32* %arrayidx2, align 4
				%arrayidx3 = getelementptr inbounds i32* %b, i32 1
				%i3 = load i32* %arrayidx3, align 4
				%add2 = add i32 %i2, %i3
				%call2 = tail call i32 @llvm.ctlz.i32(i32 %add2,i1 true) nounwind readnone

				%arrayidx4 = getelementptr inbounds i32* %a, i32 2
				%i4 = load i32* %arrayidx4, align 4
				%arrayidx5 = getelementptr inbounds i32* %b, i32 2
				%i5 = load i32* %arrayidx5, align 4
				%add3 = add i32 %i4, %i5
				%call3 = tail call i32 @llvm.ctlz.i32(i32 %add3,i1 true) nounwind readnone

				%arrayidx6 = getelementptr inbounds i32* %a, i32 3
				%i6 = load i32* %arrayidx6, align 4
				%arrayidx7 = getelementptr inbounds i32* %b, i32 3
				%i7 = load i32* %arrayidx7, align 4
				%add4 = add i32 %i6, %i7
				%call4 = tail call i32 @llvm.ctlz.i32(i32 %add4,i1 true) nounwind readnone

				store i32 %call1, i32* %c, align 4
				%arrayidx8 = getelementptr inbounds i32* %c, i32 1
				store i32 %call2, i32* %arrayidx8, align 4
				%arrayidx9 = getelementptr inbounds i32* %c, i32 2
				store i32 %call3, i32* %arrayidx9, align 4
				%arrayidx10 = getelementptr inbounds i32* %c, i32 3
				store i32 %call4, i32* %arrayidx10, align 4
				ret void

				; CHECK-LABEL: @vec_ctlz_i32(
				; CHECK: load <4 x i32>
				; CHECK: load <4 x i32>
				; CHECK: call <4 x i32> @llvm.ctlz.v4i32
				; CHECK: store <4 x i32>
				; CHECK: ret
				}

				define void @vec_ctlz_i32_neg(i32* %a, i32* %b, i32* %c, i1) {
				entry:
				%i0 = load i32* %a, align 4
				%i1 = load i32* %b, align 4
				%add1 = add i32 %i0, %i1
				%call1 = tail call i32 @llvm.ctlz.i32(i32 %add1,i1 true) nounwind readnone

				%arrayidx2 = getelementptr inbounds i32* %a, i32 1
				%i2 = load i32* %arrayidx2, align 4
				%arrayidx3 = getelementptr inbounds i32* %b, i32 1
				%i3 = load i32* %arrayidx3, align 4
				%add2 = add i32 %i2, %i3
				%call2 = tail call i32 @llvm.ctlz.i32(i32 %add2,i1 false) nounwind readnone

				%arrayidx4 = getelementptr inbounds i32* %a, i32 2
				%i4 = load i32* %arrayidx4, align 4
				%arrayidx5 = getelementptr inbounds i32* %b, i32 2
				%i5 = load i32* %arrayidx5, align 4
				%add3 = add i32 %i4, %i5
				%call3 = tail call i32 @llvm.ctlz.i32(i32 %add3,i1 true) nounwind readnone

				%arrayidx6 = getelementptr inbounds i32* %a, i32 3
				%i6 = load i32* %arrayidx6, align 4
				%arrayidx7 = getelementptr inbounds i32* %b, i32 3
				%i7 = load i32* %arrayidx7, align 4
				%add4 = add i32 %i6, %i7
				%call4 = tail call i32 @llvm.ctlz.i32(i32 %add4,i1 false) nounwind readnone

				store i32 %call1, i32* %c, align 4
				%arrayidx8 = getelementptr inbounds i32* %c, i32 1
				store i32 %call2, i32* %arrayidx8, align 4
				%arrayidx9 = getelementptr inbounds i32* %c, i32 2
				store i32 %call3, i32* %arrayidx9, align 4
				%arrayidx10 = getelementptr inbounds i32* %c, i32 3
				store i32 %call4, i32* %arrayidx10, align 4
				ret void

				; CHECK-LABEL: @vec_ctlz_i32_neg(
				; CHECK-NOT: call <4 x i32> @llvm.ctlz.v4i32

				}


				declare i32 @llvm.cttz.i32(i32,i1) nounwind readnone

				define void @vec_cttz_i32(i32* %a, i32* %b, i32* %c, i1) {
				entry:
				%i0 = load i32* %a, align 4
				%i1 = load i32* %b, align 4
				%add1 = add i32 %i0, %i1
				%call1 = tail call i32 @llvm.cttz.i32(i32 %add1,i1 true) nounwind readnone

				%arrayidx2 = getelementptr inbounds i32* %a, i32 1
				%i2 = load i32* %arrayidx2, align 4
				%arrayidx3 = getelementptr inbounds i32* %b, i32 1
				%i3 = load i32* %arrayidx3, align 4
				%add2 = add i32 %i2, %i3
				%call2 = tail call i32 @llvm.cttz.i32(i32 %add2,i1 true) nounwind readnone

				%arrayidx4 = getelementptr inbounds i32* %a, i32 2
				%i4 = load i32* %arrayidx4, align 4
				%arrayidx5 = getelementptr inbounds i32* %b, i32 2
				%i5 = load i32* %arrayidx5, align 4
				%add3 = add i32 %i4, %i5
				%call3 = tail call i32 @llvm.cttz.i32(i32 %add3,i1 true) nounwind readnone

				%arrayidx6 = getelementptr inbounds i32* %a, i32 3
				%i6 = load i32* %arrayidx6, align 4
				%arrayidx7 = getelementptr inbounds i32* %b, i32 3
				%i7 = load i32* %arrayidx7, align 4
				%add4 = add i32 %i6, %i7
				%call4 = tail call i32 @llvm.cttz.i32(i32 %add4,i1 true) nounwind readnone

				store i32 %call1, i32* %c, align 4
				%arrayidx8 = getelementptr inbounds i32* %c, i32 1
				store i32 %call2, i32* %arrayidx8, align 4
				%arrayidx9 = getelementptr inbounds i32* %c, i32 2
				store i32 %call3, i32* %arrayidx9, align 4
				%arrayidx10 = getelementptr inbounds i32* %c, i32 3
				store i32 %call4, i32* %arrayidx10, align 4
				ret void

				; CHECK-LABEL: @vec_cttz_i32(
				; CHECK: load <4 x i32>
				; CHECK: load <4 x i32>
				; CHECK: call <4 x i32> @llvm.cttz.v4i32
				; CHECK: store <4 x i32>
				; CHECK: ret
				}

				define void @vec_cttz_i32_neg(i32* %a, i32* %b, i32* %c, i1) {
				entry:
				%i0 = load i32* %a, align 4
				%i1 = load i32* %b, align 4
				%add1 = add i32 %i0, %i1
				%call1 = tail call i32 @llvm.cttz.i32(i32 %add1,i1 true) nounwind readnone

				%arrayidx2 = getelementptr inbounds i32* %a, i32 1
				%i2 = load i32* %arrayidx2, align 4
				%arrayidx3 = getelementptr inbounds i32* %b, i32 1
				%i3 = load i32* %arrayidx3, align 4
				%add2 = add i32 %i2, %i3
				%call2 = tail call i32 @llvm.cttz.i32(i32 %add2,i1 false) nounwind readnone

				%arrayidx4 = getelementptr inbounds i32* %a, i32 2
				%i4 = load i32* %arrayidx4, align 4
				%arrayidx5 = getelementptr inbounds i32* %b, i32 2
				%i5 = load i32* %arrayidx5, align 4
				%add3 = add i32 %i4, %i5
				%call3 = tail call i32 @llvm.cttz.i32(i32 %add3,i1 true) nounwind readnone

				%arrayidx6 = getelementptr inbounds i32* %a, i32 3
				%i6 = load i32* %arrayidx6, align 4
				%arrayidx7 = getelementptr inbounds i32* %b, i32 3
				%i7 = load i32* %arrayidx7, align 4
				%add4 = add i32 %i6, %i7
				%call4 = tail call i32 @llvm.cttz.i32(i32 %add4,i1 false) nounwind readnone

				store i32 %call1, i32* %c, align 4
				%arrayidx8 = getelementptr inbounds i32* %c, i32 1
				store i32 %call2, i32* %arrayidx8, align 4
				%arrayidx9 = getelementptr inbounds i32* %c, i32 2
				store i32 %call3, i32* %arrayidx9, align 4
				%arrayidx10 = getelementptr inbounds i32* %c, i32 3
				store i32 %call4, i32* %arrayidx10, align 4
				ret void

				; CHECK-LABEL: @vec_cttz_i32_neg(
				; CHECK-NOT: call <4 x i32> @llvm.cttz.v4i32
				}


				declare float @llvm.powi.f32(float, i32)
				define void @vec_powi_f32(float* %a, float* %b, float* %c, i32 %P) {
				entry:
				%i0 = load float* %a, align 4
				%i1 = load float* %b, align 4
				%add1 = fadd float %i0, %i1
				%call1 = tail call float @llvm.powi.f32(float %add1,i32 %P) nounwind readnone

				%arrayidx2 = getelementptr inbounds float* %a, i32 1
				%i2 = load float* %arrayidx2, align 4
				%arrayidx3 = getelementptr inbounds float* %b, i32 1
				%i3 = load float* %arrayidx3, align 4
				%add2 = fadd float %i2, %i3
				%call2 = tail call float @llvm.powi.f32(float %add2,i32 %P) nounwind readnone

				%arrayidx4 = getelementptr inbounds float* %a, i32 2
				%i4 = load float* %arrayidx4, align 4
				%arrayidx5 = getelementptr inbounds float* %b, i32 2
				%i5 = load float* %arrayidx5, align 4
				%add3 = fadd float %i4, %i5
				%call3 = tail call float @llvm.powi.f32(float %add3,i32 %P) nounwind readnone

				%arrayidx6 = getelementptr inbounds float* %a, i32 3
				%i6 = load float* %arrayidx6, align 4
				%arrayidx7 = getelementptr inbounds float* %b, i32 3
				%i7 = load float* %arrayidx7, align 4
				%add4 = fadd float %i6, %i7
				%call4 = tail call float @llvm.powi.f32(float %add4,i32 %P) nounwind readnone

				store float %call1, float* %c, align 4
				%arrayidx8 = getelementptr inbounds float* %c, i32 1
				store float %call2, float* %arrayidx8, align 4
				%arrayidx9 = getelementptr inbounds float* %c, i32 2
				store float %call3, float* %arrayidx9, align 4
				%arrayidx10 = getelementptr inbounds float* %c, i32 3
				store float %call4, float* %arrayidx10, align 4
				ret void

				; CHECK-LABEL: @vec_powi_f32(
				; CHECK: load <4 x float>
				; CHECK: load <4 x float>
				; CHECK: call <4 x float> @llvm.powi.v4f32
				; CHECK: store <4 x float>
				; CHECK: ret
				}


				define void @vec_powi_f32_neg(float* %a, float* %b, float* %c, i32 %P, i32 %Q) {
				entry:
				%i0 = load float* %a, align 4
				%i1 = load float* %b, align 4
				%add1 = fadd float %i0, %i1
				%call1 = tail call float @llvm.powi.f32(float %add1,i32 %P) nounwind readnone

				%arrayidx2 = getelementptr inbounds float* %a, i32 1
				%i2 = load float* %arrayidx2, align 4
				%arrayidx3 = getelementptr inbounds float* %b, i32 1
				%i3 = load float* %arrayidx3, align 4
				%add2 = fadd float %i2, %i3
				%call2 = tail call float @llvm.powi.f32(float %add2,i32 %Q) nounwind readnone

				%arrayidx4 = getelementptr inbounds float* %a, i32 2
				%i4 = load float* %arrayidx4, align 4
				%arrayidx5 = getelementptr inbounds float* %b, i32 2
				%i5 = load float* %arrayidx5, align 4
				%add3 = fadd float %i4, %i5
				%call3 = tail call float @llvm.powi.f32(float %add3,i32 %P) nounwind readnone

				%arrayidx6 = getelementptr inbounds float* %a, i32 3
				%i6 = load float* %arrayidx6, align 4
				%arrayidx7 = getelementptr inbounds float* %b, i32 3
				%i7 = load float* %arrayidx7, align 4
				%add4 = fadd float %i6, %i7
				%call4 = tail call float @llvm.powi.f32(float %add4,i32 %Q) nounwind readnone

				store float %call1, float* %c, align 4
				%arrayidx8 = getelementptr inbounds float* %c, i32 1
				store float %call2, float* %arrayidx8, align 4
				%arrayidx9 = getelementptr inbounds float* %c, i32 2
				store float %call3, float* %arrayidx9, align 4
				%arrayidx10 = getelementptr inbounds float* %c, i32 3
				store float %call4, float* %arrayidx10, align 4
				ret void

				; CHECK-LABEL: @vec_powi_f32_neg(
				; CHECK-NOT: call <4 x float> @llvm.powi.v4f32
				}

This is an archive of the discontinued LLVM Phabricator instance.

Add support to vectorize ctlz,cttz and powi intrinsics in SLPVectorizer
ClosedPublic
