Page MenuHomePhabricator

Improve ISel across lane float min/max reduction
ClosedPublic

Authored by junbuml on Sep 23 2015, 3:40 PM.

Details

Reviewers
jmolloy
mcrosier
Summary

In vectorized float min/max reduction code, the final "reduce" step
is sub-optimal. In AArch64, this change wll combine :

svn0 = vector_shuffle t0, undef<2,3,u,u>
fmin = fminnum t0,svn0
svn1 = vector_shuffle fmin, undef<1,u,u,u>
cc = setcc fmin, svn1, ole
n0 = extract_vector_elt cc, #0
n1 = extract_vector_elt fmin, #0
n2 = extract_vector_elt fmin, #1
result = select n0, n1,n2

becomes:

result = llvm.aarch64.neon.fminnmv t0

This change extends r247575.

Diff Detail

Event Timeline

junbuml updated this revision to Diff 35564.Sep 23 2015, 3:40 PM
junbuml retitled this revision from to Improve ISel across lane float min/max reduction.
junbuml updated this object.
junbuml added a subscriber: llvm-commits.
junbuml updated this revision to Diff 35667.Sep 24 2015, 12:45 PM
jmolloy added inline comments.Sep 25 2015, 8:04 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
8621–8627

What about FMAXNAN and FMINNAN (-> FMAXV, FMINV)?

8684

Why are you making this change? What's the rationale?

junbuml added inline comments.Sep 25 2015, 10:50 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
8621–8627

I also thought the same, but I wasn't able to generate FMAXNAN with a vector input. It appears that matchSelectPattern() cannot return SPNB_RETURNS_NAN with fcmp fast. And even without fast math flag, SPNB_RETURNS_NAN cannot be returned because both LHSSafe and RHSSafe in matchSelectPattern() are false for vector inputs. Please let me know if I miss something.

8684

The only reason that I change to use intrinsics here is just to be consistent in the way I handle nodes because I could see only intrinsic of FMAXNMV, no SDNode for FMAXNMV. If SDNode need to be used I will add SDNode definition in td file for FMAXNMV. Please let me know.

jmolloy accepted this revision.Oct 7 2015, 10:14 AM
jmolloy edited edge metadata.

Hi Jun,

This looks fine, if you address my last comment.

Thanks,

James

lib/Target/AArch64/AArch64ISelLowering.cpp
8621–8627

Yes, it does appear you're right. I'll have to go fix that.

8684

Given that we have these cross-lane nodes, it makes sense to use them in preference to the intrinsic version.

As there are no such nodes for FMAXNMV and friends, the intrinsics make sense. I don't think there is any need to create ISDNodes for FMAXNMV and friends.

This revision is now accepted and ready to land.Oct 7 2015, 10:14 AM

Thanks James for the review. I will address your last comment.

junbuml updated this revision to Diff 36788.Oct 7 2015, 1:34 PM
junbuml edited edge metadata.

Just final minor changes.
I will commit this tomorrow unless I get any comment for this minor changes.

lib/Target/AArch64/AArch64ISelLowering.cpp
8684

Use intrinsic for F[MAX|MIN]NMV, but use SDNode for addv and [s|u][min|max]v.

8866

Add check for minimum size in case of the input vector is too small.

Committed in r249834.

junbuml closed this revision.Oct 9 2015, 7:15 AM

Hi Jun, I've just got around to looking at this, and you need to update the test names, please see below.

test/CodeGen/AArch64/aarch64-minmaxv.ll
289–291

Need to rename this test, the function body can be anything and this test will pass. I've been caught out before naming a function the same as an instruction I'm looking for. The CHECK: fmaxmv will actually match the label in this case.

305–307

Same as above, please change the test name.

Thanks Charlie for finding this bug. I will fix it immediately!

Rename the function name in r250052.