Page MenuHomePhabricator

[PowerPC] Fix a performance bug for PPC::XXPERMDI.
ClosedPublic

Authored by jtony on May 22 2017, 4:34 AM.

Details

Summary

There are some VectorShuffle Nodes in SDAG which can be selected to XXPERMDI Instruction, this patch recognizes them and does the selection to improve the PPC performance.

Diff Detail

Repository
rL LLVM

Event Timeline

jtony created this revision.May 22 2017, 4:34 AM
nemanjai added inline comments.May 23 2017, 1:43 PM
lib/Target/PowerPC/PPCISelLowering.cpp
1669 ↗(On Diff #99739)

Seems like a bit of code duplication with isWordShuffleMask(). Perhaps it would be more consistent to just implement something like isNByteElemShuffleMask(unsigned Width) which can be called for width 1, 2, 4, 8, 16 as needed.

1670 ↗(On Diff #99739)

Nit: formatting. This is indented more than the rest of the function for some reason.

1671 ↗(On Diff #99739)

I'm not a fan of loops with 2 iterations. It's probably clearer if this is just straight-line code.

1743 ↗(On Diff #99739)

Some people really don't like boolean parameters :). I really don't think this function is needed - just do the computation inline.
Especially when I look at the call sites - it looks like there are 3 and you've already checked the endianness on 2 of them.

1748 ↗(On Diff #99739)

Nit: indentation.

1749 ↗(On Diff #99739)

Is there a reason for this check (i.e. vs. an assert)? It seems we should be safe to assert this is the case.

1758 ↗(On Diff #99739)

I think maybe an assert is in order here... Perhaps:
assert((M0 | M1 < 4) && "A mask element out of bounds?")

1763 ↗(On Diff #99739)

Isn't this just (M0 | M1) < 2? Namely, neither is greater than 1.

1771 ↗(On Diff #99739)

This entire if statement makes my head hurt. At the very least, it should be documented. But I'm sure when you document it, it'll be obvious how it can be rewritten to be much simpler. Comment it with something along the lines of:

// If element 0 of the result comes from the first input (LE) or second input (BE)
// the inputs need to be swapped and elements adjusted accordingly.

I've suggested simplifications for the LE conditions, but you should be able to simplify the BE conditions accordingly.

1772 ↗(On Diff #99739)

If we ignore the fact that the second half of this can never be satisfied (M1 can't be 0 and 1), I assume this should be just:
if (M0 > 1 && M1 < 2)

1774 ↗(On Diff #99739)

Similarly, this one seems like it should just be:
if (M0 < 2 && M1 > 1)

kbarton added inline comments.May 25 2017, 11:23 AM
lib/Target/PowerPC/PPCISelLowering.cpp
1671 ↗(On Diff #99739)

I agree - this should be just straight line code, with a short comment explaining the logic.

1747 ↗(On Diff #99739)

Please add a comment explaining the semantics, and what what conditions the parameters will be modified.

1782 ↗(On Diff #99739)

This is a bit hard to follow, but I don't think Swap is modified on this path. Is that intentional? If so, it needs to be documented clearly in the comments above.

1794 ↗(On Diff #99739)

Same comment for Swap.

lib/Target/PowerPC/PPCISelLowering.h
457 ↗(On Diff #99739)

Please add a comment above to be consistent with all of the other declarations here.

jtony marked 16 inline comments as done.May 25 2017, 1:08 PM
jtony added inline comments.
lib/Target/PowerPC/PPCISelLowering.cpp
1669 ↗(On Diff #99739)

Good suggestion, I have added the isNByteElemShuffleMask function you mentioned here to replace the isWordShuffleMask and isDoubleWordShuffleMask function, but only for 2,4,8,16 bytes since there is no need to call this function to check it is 1 byte element shuffle mask, it is always true.

1671 ↗(On Diff #99739)

This function have been refactored to isNByteElemShuffleMask mentioned by Nemanja above.

jtony updated this revision to Diff 100300.May 25 2017, 2:32 PM
jtony marked 2 inline comments as done.

Address comments from Nemanja and Kit.

echristo added inline comments.May 25 2017, 5:57 PM
lib/Target/PowerPC/PPCISelLowering.cpp
1772 ↗(On Diff #100300)

Since this is basically two functions concatenated on each other with a boolean parameter it seems reasonable to just remove the parameter and split it into two functions and dispatch in the caller.

Or do it as a followup with the rest of them, but either way it'd be good to get the endianness stuff cleaned up.

nemanjai added inline comments.May 27 2017, 10:27 AM
lib/Target/PowerPC/PPCISelLowering.cpp
1606 ↗(On Diff #100300)

Is there a compelling reason to manually allocate memory for this over using efficient containers (something like SmallVector or similar)?

I'd much rather avoid manual memory management of this sort if we can. And I don't see anything in this function to indicate we can't.

hfinkel added inline comments.May 27 2017, 12:23 PM
lib/Target/PowerPC/PPCISelLowering.cpp
1606 ↗(On Diff #100300)

We should definitely be using a SmallVector here, if we need to keep track of the size easily. Otherwise, given that Width is never greater than 16, we can just use:

unsigned MaskVal[16];
jtony updated this revision to Diff 100564.May 28 2017, 12:56 PM
jtony marked an inline comment as done.

Address comments from Hal and Nemanja.

jtony marked 2 inline comments as done.May 28 2017, 12:56 PM
jtony added inline comments.
lib/Target/PowerPC/PPCISelLowering.cpp
1772 ↗(On Diff #100300)

This can be done together with the same issue created for the xxsldwi patch.

jtony marked an inline comment as done.May 28 2017, 12:57 PM
nemanjai accepted this revision.May 28 2017, 9:13 PM

In addition to the inline nit, I think it'd be more natural to use the v2i64 type for this PPCISD node (and the instruction) since the instruction inherently operates on doublewords. You should be able to just change the type you bitcast to in the C++ code as well as the type you match in the TblGen code.

Otherwise, LGTM.

lib/Target/PowerPC/PPCISelLowering.cpp
1761 ↗(On Diff #100564)

A couple of nits that can just be addressed on the commit (no need for another revision):

  • This appears to use doxygen-frendly comments with parameter tags, etc. I think doxygen will only use comments that start with three slashes (///).
  • I find it confusing for the comment to jump into the implementation like this. A short sentence explaining what the function does, then some details as needed. For example:
/// Can node \p N be lowered to an XXPERMDI instruction? If so, set \p Swap
/// if the inputs to the instruction should be swapped and set \p DM to the
/// value for the immediate.
This revision is now accepted and ready to land.May 28 2017, 9:13 PM
jtony marked an inline comment as done.May 29 2017, 7:21 PM
This revision was automatically updated to reflect the committed changes.