Page MenuHomePhabricator

[X86][BMI] Fold cmpeq/ne(or(X,Y),X) --> cmpeq/ne(and(~X,Y),0) (PR44136)
ClosedPublic

Authored by RKSimon on Apr 9 2021, 4:26 AM.

Details

Summary

I've initially just enabled this for BMI which has the ANDN instruction for i32/i64 - the i16/i8 cases give an idea of what'd we get if we enabled it in all cases.

Additionally, the i16/i8 cases could be freely promoted to i32 (as the args are already zeroext) and we could then make use of ANDN + the free cmp0 there as well - this has come up in PR48768 and PR49028 so I'm going to look at this soon.

https://alive2.llvm.org/ce/z/QVWHP_
https://alive2.llvm.org/ce/z/pLngT-

Vector cases do not appear to benefit from this as we end up with having to generate the zero vector as well - this is one of the reasons I didn't try to tie this into hasAndNot/hasAndNotCompare.

Diff Detail

Event Timeline

RKSimon created this revision.Apr 9 2021, 4:26 AM
RKSimon requested review of this revision.Apr 9 2021, 4:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 9 2021, 4:26 AM

Just noticed this works for cmpne as well - https://alive2.llvm.org/ce/z/pLngT- - I'll update the tests/patch

RKSimon updated this revision to Diff 336420.Apr 9 2021, 5:57 AM
RKSimon retitled this revision from [X86][BMI] Fold cmpeq(or(X,Y),X) --> cmpeq(and(~X,Y),0) (PR44136) to [X86][BMI] Fold cmpeq/ne(or(X,Y),X) --> cmpeq/ne(and(~X,Y),0) (PR44136).
RKSimon edited the summary of this revision. (Show Details)

Added cmpne handling

spatel added inline comments.Apr 9 2021, 6:06 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
48170

Is isOnlyUserOf() safer/better than hasOneUse() in this pattern (or in general)?

RKSimon added inline comments.Apr 9 2021, 6:32 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
48170

Nice catch - its a force of habit from shuffle combining!

RKSimon updated this revision to Diff 336445.Apr 9 2021, 7:10 AM

Fixed oneuse check

spatel accepted this revision.Apr 9 2021, 7:21 AM

LGTM - unless there's a danger of reversal (infinite looping), it should be fine to remove the BMI predicate to make things uniform.
Every x86 has test, and I don't see any implementations where it would perform at lower perf than other basic ALU instructions, so that ensures we're no worse off using not+test.

This revision is now accepted and ready to land.Apr 9 2021, 7:21 AM

LGTM - unless there's a danger of reversal (infinite looping), it should be fine to remove the BMI predicate to make things uniform.
Every x86 has test, and I don't see any implementations where it would perform at lower perf than other basic ALU instructions, so that ensures we're no worse off using not+test.

Cheers, I'll remove the BMI limit - the plan is to add the reverse fold in InstCombine (see PR44136) but there isn't an equivalent in DAG afaict.

lebedev.ri accepted this revision.Apr 9 2021, 7:27 AM
This revision was landed with ongoing or failed builds.Apr 9 2021, 7:52 AM
This revision was automatically updated to reflect the committed changes.