There are some VectorShuffle Nodes in SDAG which can be selected to XXPERMDI Instruction, this patch recognizes them and does the selection to improve the PPC performance.
Details
Diff Detail
Event Timeline
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
1700 | Seems like a bit of code duplication with isWordShuffleMask(). Perhaps it would be more consistent to just implement something like isNByteElemShuffleMask(unsigned Width) which can be called for width 1, 2, 4, 8, 16 as needed. | |
1701 | Nit: formatting. This is indented more than the rest of the function for some reason. | |
1702 | I'm not a fan of loops with 2 iterations. It's probably clearer if this is just straight-line code. | |
1762 | Some people really don't like boolean parameters :). I really don't think this function is needed - just do the computation inline. | |
1767 | Nit: indentation. | |
1768 | Is there a reason for this check (i.e. vs. an assert)? It seems we should be safe to assert this is the case. | |
1777 | I think maybe an assert is in order here... Perhaps: | |
1782 | Isn't this just (M0 | M1) < 2? Namely, neither is greater than 1. | |
1790 | This entire if statement makes my head hurt. At the very least, it should be documented. But I'm sure when you document it, it'll be obvious how it can be rewritten to be much simpler. Comment it with something along the lines of: // If element 0 of the result comes from the first input (LE) or second input (BE) // the inputs need to be swapped and elements adjusted accordingly. I've suggested simplifications for the LE conditions, but you should be able to simplify the BE conditions accordingly. | |
1791 | If we ignore the fact that the second half of this can never be satisfied (M1 can't be 0 and 1), I assume this should be just: | |
1793 | Similarly, this one seems like it should just be: |
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
1702 | I agree - this should be just straight line code, with a short comment explaining the logic. | |
1766 | Please add a comment explaining the semantics, and what what conditions the parameters will be modified. | |
1801 | This is a bit hard to follow, but I don't think Swap is modified on this path. Is that intentional? If so, it needs to be documented clearly in the comments above. | |
1813 | Same comment for Swap. | |
lib/Target/PowerPC/PPCISelLowering.h | ||
461 | Please add a comment above to be consistent with all of the other declarations here. |
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
1700 | Good suggestion, I have added the isNByteElemShuffleMask function you mentioned here to replace the isWordShuffleMask and isDoubleWordShuffleMask function, but only for 2,4,8,16 bytes since there is no need to call this function to check it is 1 byte element shuffle mask, it is always true. | |
1702 | This function have been refactored to isNByteElemShuffleMask mentioned by Nemanja above. |
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
1769 | Since this is basically two functions concatenated on each other with a boolean parameter it seems reasonable to just remove the parameter and split it into two functions and dispatch in the caller. Or do it as a followup with the rest of them, but either way it'd be good to get the endianness stuff cleaned up. |
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
1606 | Is there a compelling reason to manually allocate memory for this over using efficient containers (something like SmallVector or similar)? I'd much rather avoid manual memory management of this sort if we can. And I don't see anything in this function to indicate we can't. |
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
1606 | We should definitely be using a SmallVector here, if we need to keep track of the size easily. Otherwise, given that Width is never greater than 16, we can just use: unsigned MaskVal[16]; |
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
1769 | This can be done together with the same issue created for the xxsldwi patch. |
In addition to the inline nit, I think it'd be more natural to use the v2i64 type for this PPCISD node (and the instruction) since the instruction inherently operates on doublewords. You should be able to just change the type you bitcast to in the C++ code as well as the type you match in the TblGen code.
Otherwise, LGTM.
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
1761 | A couple of nits that can just be addressed on the commit (no need for another revision):
/// Can node \p N be lowered to an XXPERMDI instruction? If so, set \p Swap /// if the inputs to the instruction should be swapped and set \p DM to the /// value for the immediate. |
Please add a comment above to be consistent with all of the other declarations here.