This is an archive of the discontinued LLVM Phabricator instance.

[X86][XOP] Support for VPPERM byte shuffle instruction
ClosedPublic

Authored by RKSimon on Mar 15 2016, 9:21 AM.

Details

Summary

This patch begins adding support for lowering to the XOP VPPERM instruction - adding the X86ISD::VPPERM opcode and shuffle mask decoding (for the more basic shuffle operations that the instruction supports).

The mask decoding required the existing MCInstrLowering code to be updated to support binary shuffles - the implementation now matches what is done in X86InstrCOmments.cpp. This should be useful for some AVX512 binary shuffles (VPERMT2 etc.) as well.

A followup patch will enable VPPERM as a target shuffle for combining.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 50744.Mar 15 2016, 9:21 AM
RKSimon retitled this revision from to [X86][XOP] Support for VPPERM byte shuffle instruction.
RKSimon updated this object.
RKSimon added reviewers: delena, spatel, andreadb, silvas.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.

ping - I'm primarily interested in whether anybody has any concerns about the changes to X86MCInstLower.cpp - I realise not many people are up on the workings of XOP ;-)

spatel edited edge metadata.Mar 23 2016, 9:47 AM

ping - I'm primarily interested in whether anybody has any concerns about the changes to X86MCInstLower.cpp - I realise not many people are up on the workings of XOP ;-)

I'm not a good reviewer for XOP or AVX512... :)
But can this patch be split up as:

  1. Add support for VPPERM without changing getShuffleComment()
  2. Enhance getShuffleComment() to support VPPERM (and VPERMT2?)

ping - I'm primarily interested in whether anybody has any concerns about the changes to X86MCInstLower.cpp - I realise not many people are up on the workings of XOP ;-)

I'm not a good reviewer for XOP or AVX512... :)
But can this patch be split up as:

  1. Add support for VPPERM without changing getShuffleComment()
  2. Enhance getShuffleComment() to support VPPERM (and VPERMT2?)

Yes thats pretty trivial - I was really trying to avoid overloading the list with patches about instruction sets that not many people are interested in ;-) The split is very clean already:

1 - Add support for VPPERM:
X86ISelLowering.h
X86ISelLowering.cpp
X86InstrFragmentsSIMD.td
X86InstrXOP.td
X86IntrinsicsInfo.h

2 - Add support for constant pool decoding of 2 input shuffles:
X86MCInstLower.cpp

3 - Enable VPPERM constant pool decoding:
X86ShuffleDecodeConstantPool.h
X86ShuffleDecodeConstantPool.cpp
vector-shuffle-combining-xop.ll

Whether we merge 2 + 3 depends on if we want immediate test cases for (2) - without them all we show is that it doesn't break existing unary shuffle decodes.

Yes thats pretty trivial - I was really trying to avoid overloading the list with patches about instruction sets that not many people are interested in ;-) The split is very clean already:

1 - Add support for VPPERM:
X86ISelLowering.h
X86ISelLowering.cpp
X86InstrFragmentsSIMD.td
X86InstrXOP.td
X86IntrinsicsInfo.h

Can you limit this patch to only this part? I have my AMD Vol. 4 open to the vpperm page, so I can review that part first. AMD managed to redo Altivec vperm and add extra magic via the unused control bits...who knew? :)

RKSimon updated this revision to Diff 51493.Mar 23 2016, 5:04 PM
RKSimon edited edge metadata.

Reduced the patch to just the plumbing for the X86ISD::VPPERM opcode.

Yeah, VPPERM is pretty nifty - I'll probably try to make use of at least some of its features (bitreverse comes to mind) in future patches, but for now I'm just interested in binary shuffle combining.

spatel accepted this revision.Mar 23 2016, 5:59 PM
spatel edited edge metadata.

LGTM. See inline comments for a couple of small changes.

lib/Target/X86/X86InstrXOP.td
225–244 ↗(On Diff #51493)

The reg/mem suffixes here confused me at first. I realize this is copying existing code, but I'd prefer if these were more accurate for the 3 input operands: "rrr", "rrm", "rmr". It's fine if that's a separate commit for that NFC change.

test/CodeGen/X86/vector-shuffle-combining-xop.ll
33–40 ↗(On Diff #51493)

Add 2 tests for the load folding variants?

This revision is now accepted and ready to land.Mar 23 2016, 5:59 PM
RKSimon added inline comments.Mar 24 2016, 2:08 AM
test/CodeGen/X86/vector-shuffle-combining-xop.ll
33–40 ↗(On Diff #51493)

Those are already tested in xop-intrinsics-x86_64.ll

This revision was automatically updated to reflect the committed changes.