Page MenuHomePhabricator

[Power9] Add codegen for VSX word insert/extract instructions
ClosedPublic

Authored by nemanjai on May 13 2016, 8:22 AM.

Details

Summary

The new instructions are not quite a match for the corresponding IR instructions (extractelement/insertelement) because both the sources the results of these instructions are VSX registers. However, there are patterns where these are useful:

  • Insert integer element can be accomplished with a direct move followed by in insert
  • Insert floating element can be accomplished with a conversion (from 64-bit single to 32-bit single) followed by an insert
  • Extract element on its own likely isn't profitable using these instructions
  • Extracting an integer element when the result will be converted to float (i.e. stay in one of the bottom 32 VSX registers) can be accomplished by an extract and a convert, thereby saving 2 direct moves (VSR <-> GPR)

Diff Detail

Repository
rL LLVM

Event Timeline

nemanjai updated this revision to Diff 57189.May 13 2016, 8:22 AM
nemanjai retitled this revision from to [Power9] Add codegen for VSX word insert/extract instructions.
nemanjai updated this object.
nemanjai set the repository for this revision to rL LLVM.
nemanjai added subscribers: llvm-commits, echristo.
amehsan added inline comments.May 20 2016, 11:40 AM
lib/Target/PowerPC/PPCISelLowering.cpp
1507–1508

I would rename this two variables: Op2Shift and Op1Byte, in a way that are consistent with each other. Because they are named very similarly, reader thinks that they should be interpreted in the same way. Op2Shift is easy to understand (Operand 2 of Shift instruction, may be Op2OfShift?) but the other one is hard to understand from the name. This should be what we want to choose as UIM in xxinsertw. is that correct? I think it will be easier to understand if the name reflects that.

1527–1528

A comment here to say what le and be means, or changing the name to LittleEndinanShifts and BigEndianShifts, could help people who do not have enough context about this, if they happen to look at the code.

1530–1531

This is a cryptic comment. What is H, what is L?

1532–1562

I think if the two vector operands of shuffle vector are the same, then you can allow some other combinations of M1 and M2.

amehsan added inline comments.May 20 2016, 12:55 PM
lib/Target/PowerPC/PPCISelLowering.cpp
1500–1508

I think people usually write comments for a member function in the header files and do not repeat the same comment in the implementation. It is natural to expect reader to go to header file if they are looking for comments, because for member variables and class, that is the only reasonable place to add a comment.

My personal preference is not to replicate the comment twice. I'd like to see what others think about this.

1507–1508

It seems that Op1Byte and Op2Shift refer the vector operand of shuffle_vector. In that case shouldn't it be Op0 and Op1? Maybe you don't need to even mention Operand index in the variable name. Something like ShiftSize and InsertIndex, might work....

nemanjai added inline comments.May 31 2016, 6:03 AM
lib/Target/PowerPC/PPCISelLowering.cpp
1500–1508

In terms of comments in a single place, I agree that this makes sense (in case we want to change the comment, we don't have to change it in two places). However, all the other functions have the comments duplicated. It could be so that Doxygen picks them up in both files, but I don't really know.

nemanjai updated this revision to Diff 59049.May 31 2016, 6:06 AM

Addressed the comments regarding naming, comments and situations where the vector operands to the shuffle are the same vector.
Provided test cases for the additional cases.

amehsan added inline comments.Jun 8 2016, 2:25 PM
lib/Target/PowerPC/PPCISelLowering.cpp
1565–1566

How is this guaranteed?

nemanjai added inline comments.Jun 13 2016, 10:51 AM
lib/Target/PowerPC/PPCISelLowering.cpp
1565–1566

It is not guaranteed. It is just what Clang does (and presumably other transformations don't break it). Sure, we could provide code here that handles bizarre cases where a shuffle is performed and both vector operands are the same. However, I don't believe such code will be the output of any sane transformation as absolutely no further information is gained by having two of the same operands vs. having one operand be undef. So adding special code to handle this is probably overkill - but if I'm wrong about this, I'm happy to add code to handle this.

nemanjai added inline comments.Jun 13 2016, 10:55 AM
lib/Target/PowerPC/PPCISelLowering.cpp
1565–1566

I should correct that statement. I don't know whether this is guaranteed. Certainly, when I feed a test case with a repeated vector operand to llc, the initial selection DAG already contains that vector as the first operand and undef as the second operand. Consider this:

%vecins = shufflevector <4 x i32> %a, <4 x i32> %a, <4 x i32> <i32 0, i32 6, i32 2, i32 3>

becomes:

t4: v4i32 = vector_shuffle<0,2,2,3> t2, undef:v4i32
hfinkel added inline comments.Jun 13 2016, 1:19 PM
lib/Target/PowerPC/PPCISelLowering.cpp
1565–1566

We should be able to assume that the mid-level optimizer, DAGCombine, etc. canonicalize vector_shuffle mask, a, a to vector_shuffle mask' a, undef. Please add the above as a regression test so we can make sure this continues to happen as expected.

amehsan edited edge metadata.Jun 14 2016, 11:55 AM

As we discussed, before you commit the change, please add -verify-machineinstrs to your regression tests. No need to upload the patch again. Thanks.

nemanjai updated this revision to Diff 61336.Jun 20 2016, 10:39 PM
nemanjai edited edge metadata.

The regression test that Hal requested is already part of the test case (i.e. the testSameVec* variations). This patch just adds -verify-machineinstrs to the test case.

nemanjai added inline comments.Jun 21 2016, 2:18 AM
lib/Target/PowerPC/PPCInstrVSX.td
2171

This needs HasP9Vector as well since the Predicates don't nest (i.e. this statement overrides what is set for the block).

2198

Same as above.

nemanjai updated this revision to Diff 61968.Jun 27 2016, 8:27 AM

Tested the patch on a P9 LE simulator. This prompted one change - I had a missing swap when the second operand is undef. Updated the code and test case accordingly.

kbarton accepted this revision.Jul 11 2016, 11:52 AM
kbarton edited edge metadata.

Aside from minor comments, this LGTM.

lib/Target/PowerPC/PPCISelLowering.cpp
1500–1508

From the coding guidelines:
"Don’t duplicate the documentation comment in the header file and in the implementation file. Put the documentation comments for public APIs into the header file. Documentation comments for private APIs can go to the implementation file. In any case, implementation files can include additional comments (not necessarily in Doxygen markup) to explain implementation details as needed."

I think you should remove these comments, and leave them in the header file.

1508

Can this method be marked as const?

This revision is now accepted and ready to land.Jul 11 2016, 11:52 AM
nemanjai added inline comments.Jul 11 2016, 12:02 PM
lib/Target/PowerPC/PPCISelLowering.cpp
1500–1508

OK, I'll remove them but leave the other ones intact.

1508

This isn't a member function - just a free function in namespace PPC.

nemanjai closed this revision.Jul 12 2016, 2:08 PM

Committed revision 275215.