Page MenuHomePhabricator

[x86] fold vector (X > -1) & Y to shift+andn
ClosedPublic

Authored by spatel on Wed, Nov 10, 12:50 PM.

Details

Summary

and (pcmpgt X, -1), Y --> pandn (sra X, BitWidth-1), Y

This avoids the -1 constant vector in favor of an arithmetic shift instruction if it exists (the ISA is still not complete after all these years...).
We catch this pattern late in combining by matching PCMPGT, so it should not interfere with more general folds.

The test diffs are against trunk. This should eliminate lumps in D113426 (if we want to see the cumulative results, I can rebase on top of that).

Diff Detail

Event Timeline

spatel created this revision.Wed, Nov 10, 12:50 PM
spatel requested review of this revision.Wed, Nov 10, 12:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptWed, Nov 10, 12:50 PM
RKSimon added inline comments.Wed, Nov 10, 1:53 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
45821

I know this shouldn't technically fall through, but its still a little scary to be commuting Op0/Op1 like this - maybe just duplicate fold?

45827

Use getTargetVShiftByConstNode ?

pengfei added inline comments.Wed, Nov 10, 7:35 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
45816

What's the BW mean? Byte and word? But the tests show for i16 and i32.

craig.topper added inline comments.Wed, Nov 10, 7:37 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
45816

I assume it’s BitWidth

spatel marked 4 inline comments as done.Thu, Nov 11, 9:24 AM
spatel added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
45816

Yes, that was an abbreviation for BitWidth that I've used in other places, but that's less clear here with x86's lingo in play. I'll spell that out to be less confusing.

45821

I can make those block-local-variables with better names to avoid risk.

45827

Sure, looks like we can go directly to an x86 node at this point without changing any results.

spatel updated this revision to Diff 386544.Thu, Nov 11, 9:32 AM
spatel marked 3 inline comments as done.
spatel edited the summary of this revision. (Show Details)

Patch updated:

  1. Spell out "BitWidth" in code comment and patch description for less confusion.
  2. Use local SDValue names for matching to avoid swap risk in subsequent folds.
  3. Create x86 shift node directly.
  4. Rebase after 11522cfcad6b / D113426 - so all of the results in the "vselect-zero.ll" file are ideal now if I'm seeing it properly.
spatel updated this revision to Diff 386548.Thu, Nov 11, 9:38 AM

Minor update from previous rev - use "X" and "Y" names for captured values since that matches the code comment.

Is there a test along the lines of

define <4 x i1> @is_positive_mask_v4i1(<4 x i32> %x, <4 x i1> %y) {
  %cmp = icmp sgt <4 x i32> %x, <i32 -1, i32 -1, i32 -1, i32 -1>
  %and = and <4 x i1> %y, %cmp
  ret <4 x i1> %and
}

?

llvm/test/CodeGen/X86/vector-pcmp.ll
2–7

Worth adding some avx512 run lines, like in the next test file?

spatel updated this revision to Diff 386604.Thu, Nov 11, 11:25 AM
spatel marked an inline comment as done.

Patch updated:
Added tests with i1 types and runs to exercise AVX512 targets. I'm not very familiar with those patterns, but it looks like that could use some follow-ups? Pre-AVX512 looks ideal to me.

This revision is now accepted and ready to land.Thu, Nov 11, 11:32 AM
RKSimon accepted this revision.Thu, Nov 11, 12:13 PM

LGTM - cheers

This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.Mon, Nov 15, 3:36 AM

This caused assertions in Chromium builds targeting Windows. See https://bugs.chromium.org/p/chromium/issues/detail?id=1270222#c1 for a reproducer.

I've reverted in 5be64d416481c59dba5faae5e8b8a6fecb578082 until it can be fixed.

This caused assertions in Chromium builds targeting Windows. See https://bugs.chromium.org/p/chromium/issues/detail?id=1270222#c1 for a reproducer.

I've reverted in 5be64d416481c59dba5faae5e8b8a6fecb578082 until it can be fixed.

I think https://bugs.llvm.org/show_bug.cgi?id=52504 is the same issue

This caused assertions in Chromium builds targeting Windows. See https://bugs.chromium.org/p/chromium/issues/detail?id=1270222#c1 for a reproducer.

I've reverted in 5be64d416481c59dba5faae5e8b8a6fecb578082 until it can be fixed.

I think https://bugs.llvm.org/show_bug.cgi?id=52504 is the same issue

Thanks for revert and reproducers - taking a look now.

creduce finished, posted the reduced input here: https://bugs.chromium.org/p/chromium/issues/detail?id=1270222#c5

Thanks! It's the same root cause as https://llvm.org/PR52504 - and I can manually shrink that to a ~3 line IR function. I'll push another try at this with the fix and a crasher regression test shortly.