This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Match select(icmp(x,y),sub(x,y),sub(y,x)) -> abd(x,y) patterns
ClosedPublic

Authored by RKSimon on Feb 25 2023, 1:19 PM.

Details

Summary

Pulled out of PowerPC, and added ABDS support as well (hence the additional v4i32 PPC matches)

Diff Detail

Event Timeline

RKSimon created this revision.Feb 25 2023, 1:19 PM
RKSimon requested review of this revision.Feb 25 2023, 1:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 25 2023, 1:19 PM
RKSimon edited the summary of this revision. (Show Details)Feb 25 2023, 1:20 PM
goldstein.w.n added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11807

Maybe there should be one-use checks? Is abd more efficient if the sub operations are already needed?

RKSimon added inline comments.Feb 26 2023, 2:27 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11807

X86 is probably the worst case for this as it usually folds to sub(max(x,y),min(x,y)) - most other cases currently have a ABD instruction, or would gain from better ILP (see PPC's ABDS v4i32 variant).

I'll add more extensive x86 test coverage and see what it looks like.

RKSimon updated this revision to Diff 500593.Feb 26 2023, 9:09 AM

Add X86 test coverage (still need to add multiuse coverage)

RKSimon updated this revision to Diff 500600.Feb 26 2023, 10:07 AM

Rebase showing multiuse test case changes

goldstein.w.n added inline comments.Feb 26 2023, 10:24 AM
llvm/test/CodeGen/X86/abds-vector-128.ll
1106

so seems multiuse cmp is slight regression. Think it could be fixed if

1108

the:

pcmpgtq %xmm2, %xmm1
pcmpeqd %xmm0, %xmm0
pxor %xmm1, %xmm0

is just a duplicate of pcmpgtq %xmm1, %xmm0?

Likewise for the avx/avx2 tests.

Know whats going on there?

But guess at the end of the day, this can be fixed in LowerABD by looking for existing dag nodes and will be probably be easier to do after this patch.

Maybe add a TODO in x86 LowerABD to fixup missed optimizations. (Likewise we always blendv on cmp result, but should be doable on the sub, and for avx512 should be vpternlogd instead of blendv)

RKSimon added inline comments.Feb 26 2023, 10:35 AM
llvm/test/CodeGen/X86/abds-vector-128.ll
1108

Yes, IIRC we have an existing problem with other problems with patterns using min/max pairs - we don't do enough to share SETCC nodes.

RKSimon added inline comments.Feb 26 2023, 10:43 AM
llvm/test/CodeGen/X86/abds-vector-128.ll
1108

We last looked at this for rG813459ed2b0b but I wonder if really we should be doing more generically with SETCC nodes before we get this far.

ping - apart from the x86 multiuse cmp issues does anyone have anymore comments?

llvm/test/CodeGen/X86/abds-vector-128.ll
1108

We're also being hit by the freeze nodes making it tricky to match setcc(x,y) and setcc(freeze(y),freeze(x)) - I think I'd prefer to add this to the list of existing issues we're having with duplicate equivalent compares.

nemanjai accepted this revision.Mar 13 2023, 12:38 PM

LGTM as far as PPC is concerned.

This revision is now accepted and ready to land.Mar 13 2023, 12:38 PM
dmgreen accepted this revision.Mar 14 2023, 1:23 AM

I checked the AArch64 test and they all seem to be valid. LGTM, thanks.

This revision was landed with ongoing or failed builds.Mar 14 2023, 8:10 AM
This revision was automatically updated to reflect the committed changes.