This is an archive of the discontinued LLVM Phabricator instance.

Add support to vectorize ctlz,cttz and powi intrinsics in SLPVectorizer
ClosedPublic

Authored by karthikthecool on May 20 2014, 11:52 AM.

Details

Summary

Hi Nadav, Arnold, Hal,
This patch adds support to recognize and vectorize llvm intrinsics ctlz,cttz and powi. These intrinsics are different from other intrinsics handled so far in SLPVectorizer as 2nd argument of these intrinsics should be a scalar and we can vectorize these intrinsics only if the second argument is same.
Does this look good to commit?
Thanks
Karthik Bhat

Diff Detail

Event Timeline

karthikthecool retitled this revision from to Add support to vectorize ctlz,cttz and powi intrinsics in SLPVectorizer.
karthikthecool updated this object.
karthikthecool edited the test plan for this revision. (Show Details)
karthikthecool added a subscriber: Unknown Object (MLST).
nadav edited edge metadata.May 20 2014, 11:58 AM

Hi Karthik,

Thanks for working on it. It would be great to vectorize powi, ctlz and cttz. Why did you decide to use SCEV and not simply check the last argument for equality?

Thanks,
Nadav

Hi Nadav,
Thanks for the review. We need to use SCEV as it will detect cases were the Value* may be different but underlying value may be same.

For e.g. i tried out the following example -

declare float @llvm.powi.f32(float, i32)
define void @vec_powi_f32(float* %a, float* %b, float* %c, i32 %A, i32 %B) {
entry:
%0 = alloca i32, align 4
%1 = alloca i32, align 4
%C = alloca i32, align 4
%D = alloca i32, align 4
store i32 %A, i32* %0, align 4
store i32 %B, i32* %1, align 4
%2 = load i32* %0, align 4
%3 = load i32* %1, align 4
%4 = add nsw i32 %2, %3
%5 = add nsw i32 %2, %3
store i32 %4, i32* %C, align 4
store i32 %5, i32* %D, align 4

%i0 = load float* %a, align 4
%i1 = load float* %b, align 4
%add1 = fadd float %i0, %i1
%call1 = tail call float @llvm.powi.f32(float %add1,i32 %4) nounwind readnone

%arrayidx2 = getelementptr inbounds float* %a, i32 1
%i2 = load float* %arrayidx2, align 4
%arrayidx3 = getelementptr inbounds float* %b, i32 1
%i3 = load float* %arrayidx3, align 4
%add2 = fadd float %i2, %i3
%call2 = tail call float @llvm.powi.f32(float %add2,i32 %5) nounwind readnone

%arrayidx4 = getelementptr inbounds float* %a, i32 2
%i4 = load float* %arrayidx4, align 4
%arrayidx5 = getelementptr inbounds float* %b, i32 2
%i5 = load float* %arrayidx5, align 4
%add3 = fadd float %i4, %i5
%call3 = tail call float @llvm.powi.f32(float %add3,i32 %5) nounwind readnone

%arrayidx6 = getelementptr inbounds float* %a, i32 3
%i6 = load float* %arrayidx6, align 4
%arrayidx7 = getelementptr inbounds float* %b, i32 3
%i7 = load float* %arrayidx7, align 4
%add4 = fadd float %i6, %i7
%call4 = tail call float @llvm.powi.f32(float %add4,i32 %4) nounwind readnone

store float %call1, float* %c, align 4
%arrayidx8 = getelementptr inbounds float* %c, i32 1
store float %call2, float* %arrayidx8, align 4
%arrayidx9 = getelementptr inbounds float* %c, i32 2
store float %call3, float* %arrayidx9, align 4
%arrayidx10 = getelementptr inbounds float* %c, i32 3
store float %call4, float* %arrayidx10, align 4
ret void
}

Here %4 and %5 are referring to same value. If we just compare (Value*) for equality it will not be able to vectorize the powi in the above code. But if we use SCEV compare it is able to conclude that %4 is actually same as %5 and hence vectorizes the powi intrinsic.

The same approach is used in BBVectorizer to detect if arguments are equal for these intrinsics.

Thanks

karthikthecool edited edge metadata.

Fix a compilation error in debug mode. A1I should be declared outside if loop.


llvm-commits mailing list
llvm-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

karthikthecool added a reviewer: nicholas.

Hi All,
First of sorry my last reply from my id seems to have introduced some junk char in the mail chain.

I have updated the patch to directly compare Value* instead of comparing SCEV as per comments.
As Nick mentioned the above mentioned IR example may not be generated when compiling with optimizations as gvn,cse,basicaa and dce would have removed these redundant code.
The above mentioned IR was handcoded to highlight the benifit of using SCEV but as the basic transforms such as gvn,cse etc runs before vectorization this may not be required.

I have updated the patch accordingly to directly compare arguments instead of SCEV. Does this look good to commit?

Thanks
Karthik Bhat

nadav added a comment.May 22 2014, 9:02 AM

Karthik,

Please add a testcase for one of the functions where the last argument is different and the SLPvectorizer is unable to vectorize the function.

Thanks,
Nadav

aschwaighofer added inline comments.May 22 2014, 9:39 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
982

The code is repeated three times in this file and likely will make it into the loop vectorizer, too. Can this be refactor into a utility function 'bool hasVectorInstrinsicScalarOpd(ID, unsigned &ScalarOpdIdx)' in VectorUtils.h? (once we have instructions with more than one operand we can change this to be a vector of operands but there is no need now I think)

Hi Nadav, Arnold,
Updated the patch as per review comments. Added test cases to check negative cases were these intrinsics should not be vectorized.
Thanks
Karthik Bhat

Hi Arnold, Nadav,
Any more inputs on this patch? Does this look good to commit?
Thanks
Karthik Bhat

Fix a 80 char column width formatting issue in VectorUtils.h.

With those little nitpicks fixed. LGTM.

lib/Transforms/Vectorize/SLPVectorizer.cpp
966

You could remove the braces.

978

whose second argument should be the same ...

1678

whose second argument is a scalar. This argument should ...

Hi Arnold, Thanks for the review.
I will update the patch.
Can you also have a look at http://reviews.llvm.org/D3937 which implements vectorization of these intrinsics in Loopvectorizer?

I feel we should commit these 2 as a single revision or alteast one after another.
isTriviallyVectorizable function in VectorUtils is a common function and used by both SLP and Loop Vectorizer commiting just this patch without handling the intrinsics in LoopVectorizer will result in miscompilation when we try to vectorize these intrinsics inside a loop.

Thanks for your time and inputs.
Regards
Karthik Bhat

karthikthecool accepted this revision.May 29 2014, 9:41 PM
karthikthecool added a reviewer: karthikthecool.

Thanks Nadav and Arnold for the review.
Committed as r209873 along with code to vectorize these intrinsics in Loopvectorizer.

This revision is now accepted and ready to land.May 29 2014, 9:41 PM
karthikthecool closed this revision.Jun 1 2014, 9:51 PM