This is an archive of the discontinued LLVM Phabricator instance.

[X86] Support ANDNP combine through vector_shuffle
ClosedPublic

Authored by e-kud on Nov 22 2022, 1:45 PM.

Details

Summary

Combine

and (vector_shuffle<Z,...,Z>
         (insert_vector_elt undef, (xor X, -1), Z), undef), Y
->
andnp (vector_shuffle<Z,...,Z>
           (insert_vector_elt undef, X, Z), undef), Y

Diff Detail

Event Timeline

e-kud created this revision.Nov 22 2022, 1:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 22 2022, 1:45 PM
e-kud updated this revision to Diff 477323.Nov 22 2022, 3:26 PM

Guard bitcasts and pshufd with single use

e-kud retitled this revision from [X86] Support ANDN combine through broadcast instructions to [X86] Support ANDNP combine through broadcast instructions with scalar input.Nov 23 2022, 8:45 AM
e-kud edited the summary of this revision. (Show Details)
e-kud published this revision for review.Nov 23 2022, 9:04 AM
e-kud added reviewers: RKSimon, pengfei.
Herald added a project: Restricted Project. · View Herald TranscriptNov 23 2022, 9:04 AM
pengfei added inline comments.Nov 24 2022, 1:29 AM
llvm/test/CodeGen/X86/combine-bitselect.ll
568 ↗(On Diff #477323)

The name seems a bit confusing. It is not just a v4i32 version of bitselect_v4i64_broadcast_rrr. Better give a meaningful name. Besides, why don't use i64 type like others? And it's better to pre-commit the test to just show the diff in this patch.

RKSimon added inline comments.Nov 24 2022, 2:38 AM
llvm/test/CodeGen/X86/combine-bitselect.ll
615 ↗(On Diff #477323)

This test would probably be better inside combine-and.ll - I've improved test coverage in rG6f11c395f5899a10cbcc823cfd6ef7b08525a17e

e-kud added inline comments.Nov 24 2022, 2:28 PM
llvm/test/CodeGen/X86/combine-bitselect.ll
615 ↗(On Diff #477323)

Yes, thank you! I'll move the test to combine-and.ll. I'd like to check with both AVX512 (broadcast) and SSE (pshufd). So I'm going to add another RUN with AVX512F prefix. Should I rename existing CHECK to something like SSE42 or leave it as is?

e-kud updated this revision to Diff 478023.Nov 25 2022, 2:03 PM

Address to comments and put tests into https://reviews.llvm.org/D138734

e-kud updated this revision to Diff 478118.Nov 27 2022, 3:19 PM

Add peeking through bitcasts for BROADCAST as well.

e-kud updated this revision to Diff 478405.Nov 28 2022, 3:47 PM

Rebased on precommit tests

pengfei accepted this revision.Nov 28 2022, 11:27 PM

LGTM.

This revision is now accepted and ready to land.Nov 28 2022, 11:27 PM
RKSimon added inline comments.Nov 29 2022, 3:14 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
48173

Did you look at matching a generic ShuffleSDNode with a broadcast mask instead of PSHUFD?

e-kud updated this revision to Diff 480294.Dec 5 2022, 5:35 PM

Propose more generic approach using vector_shuffle

RKSimon added inline comments.Dec 6 2022, 6:16 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
48201

(style) Missing assert message

48205

256-bit vectors should be OK on AVX1 - but OK to just add a TODO if there's no current test coverage.

48212

(style) Use early-out instead of nested ifs (and move oneuse to the end as its the most expensive call)

if (!SVN || !SVN->isSplat() || !SVN->getOperand(1).isUndef() || !SVN->hasOneUse())
  return SDValue();
48237

(style) Missing assert message

Do we have any 256/512 bit test coverage?

e-kud added a comment.Dec 6 2022, 1:14 PM

Do we have any 256/512 bit test coverage?

Unfortunately, no. I only tested locally with 256 bits.

I found out that the generic approach has a problem. If we have a 256 or 512 bit vector, then we can't generate ANDNP with SSE because there is no type for 256 bits. But after type-legalizer we have several ISD::AND that can be combined into ANDNP.
It works with BROADCAST/PSHUFD approach but performing the generic-combine before type-legalization prevents it.

I think about and (shuffle(not)) -> and(not(shuffle)) reordering before type-legalization and then after legalization we combine them into ANDNP. However I'm not sure about such two-step combine. What do you think?

SplitOpsAndApply might be able to help here

RKSimon requested changes to this revision.Dec 10 2022, 11:09 AM
This revision now requires changes to proceed.Dec 10 2022, 11:09 AM
e-kud updated this revision to Diff 482692.Dec 13 2022, 7:32 PM

Added vector splitting

RKSimon accepted this revision.Dec 14 2022, 5:32 AM

LGTM with one minor comment - cheers

llvm/lib/Target/X86/X86ISelLowering.cpp
48209

You might need to bitcast this back to Src.getValueType() - its less likely for scalars but IsNOT can peek through bitcasts

This revision is now accepted and ready to land.Dec 14 2022, 5:32 AM
e-kud updated this revision to Diff 483070.Dec 14 2022, 7:05 PM

Rebased and added a bitcast of IsNOT.

e-kud marked an inline comment as done.Dec 14 2022, 7:10 PM
e-kud added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
48209

Indeed, I added a test where a result of xor is bitcasted and got a failed assertion. Thank you!

e-kud retitled this revision from [X86] Support ANDNP combine through broadcast instructions with scalar input to [X86] Support ANDNP combine through vector_shuffle.Dec 14 2022, 7:12 PM
e-kud edited the summary of this revision. (Show Details)
pengfei accepted this revision.Dec 21 2022, 3:50 AM
e-kud updated this revision to Diff 484536.Dec 21 2022, 4:44 AM
e-kud marked an inline comment as done.

Rebased.

RKSimon accepted this revision.Dec 21 2022, 1:39 PM

LGTM

This revision was automatically updated to reflect the committed changes.