This is an archive of the discontinued LLVM Phabricator instance.

[SLP] PR32078: convert scalar operations to vector.
Needs ReviewPublic

Authored by ABataev on Mar 7 2017, 1:38 AM.

Details

Summary

If we have a code like:

%xn = extractelement <2 x i32> %x, i32 %n
%yn = extractelement <2 x i32> %y, i32 %n
%cmpn = icmp eq i32 %xn, %yn

we can convert it to something like this:

%cmp = icmp eq <2 x i32> %x, %y
%cmpn = extractelement <2 x i1> %cmp0, i32 %n

if the cost of the second is less than the cost of the original code.

Event Timeline

ABataev created this revision.Mar 7 2017, 1:38 AM
mkuper edited edge metadata.Mar 7 2017, 11:36 AM

I haven't looked at the patch in detail yet, but I don't understand the rationale for putting it in the SLP vectorizer.
If transforming this pattern is a win, it's a win regardless of whether it originated in the SLP vectorizer, or just happened to appear in the IR.

Would this be something InstCombine could do? Or, if it's necessarily cost-dependent, then some later cleanup pass? Or DAGCombine?
If the reason you're not doing it in a later pass is because you don't have the cost model, then it may be a good idea to split out the change you have in D30649 into a utility, instead of having it live in the SLP vectorizer. We arleady have lib/Analysis/CostModel, but it doesn't really provide any useful interface.

ABataev abandoned this revision.Apr 14 2017, 11:48 AM
ABataev reclaimed this revision.May 18 2017, 8:21 AM

Michael, tried to implement it in InstCombiner, but we need cost analysis. Later passes are not suitable for this, because we need transformed code in InstCombiner. So, reopened it.

For reference, the instcombine patch proposal was D32093.

@ABataev Would it be worth resurrecting this or starting again? We have a similar test case in https://bugs.llvm.org/show_bug.cgi?id=44008

current:

%L0 = extractelement <2 x float> %1, i32 0
%L1 = extractelement <2 x float> %1, i32 1
%Mul0 = fmul float %L0, 2.000000e+00
%Mul1 = fmul float %L1, 2.000000e+00

better:

%Mul = fmul <2 x float> %1, <2.000000e+00, 2.000000e+00>
%L0 = extractelement <2 x float> %Mul, i32 0
%L1 = extractelement <2 x float> %Mul, i32 1

@ABataev Would it be worth resurrecting this or starting again? We have a similar test case in https://bugs.llvm.org/show_bug.cgi?id=44008

current:

%L0 = extractelement <2 x float> %1, i32 0
%L1 = extractelement <2 x float> %1, i32 1
%Mul0 = fmul float %L0, 2.000000e+00
%Mul1 = fmul float %L1, 2.000000e+00

better:

%Mul = fmul <2 x float> %1, <2.000000e+00, 2.000000e+00>
%L0 = extractelement <2 x float> %Mul, i32 0
%L1 = extractelement <2 x float> %Mul, i32 1

Will try to rework it.

Michael, tried to implement it in InstCombiner, but we need cost analysis. Later passes are not suitable for this, because we need transformed code in InstCombiner. So, reopened it.

Use vectorcombine now?

@spatel

Michael, tried to implement it in InstCombiner, but we need cost analysis. Later passes are not suitable for this, because we need transformed code in InstCombiner. So, reopened it.

Use vectorcombine now?

@spatel

I think so: https://godbolt.org/z/YW6rMc