This patch adds on to the exploitation added by https://reviews.llvm.org/D33510.
This now catches build vector nodes where the inputs are coming from sign extended vector extract elements where the indices used by the vector extract are not correct. We can still use the new hardware instructions by adding a shuffle to move the elements to the correct indices. I introduced a new PPCISD node here because adding a vector_shuffle and changing the elements of the vector_extracts was getting undone by another DAG combine.
Details
Diff Detail
Event Timeline
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
11226 | I'm afraid that having such a generically named function in such a large file may get confusing down the road. Perhaps it would be clearer to fold this logic into combineBVOfVecExtend() and just use getScalarSizeInBits(). Perhaps something like: if (InputSize + OutputSize == 5) TgtElemArrayIdx = 0; // ... In any case, you don't want to use a valid array index for an invalid combination as you do here. For example, I don't really see anything in this patch that will prevent you from doing something weird for the v16i8 -> v8i16 case (which there isn't an instruction for as far as I can tell)*. Also, please add that as a test case. *I realize there isn't a pattern for this in the .td file, but I imagine you'll end up with some weird result in this case. | |
11255 | I think it would be much cleaner to make this array local to combineBVOfVecExtend (you can then pass the target encoding to addShuffleForVecExtend rather than indexing into it again). | |
11271 | Just initialize this at construction time and remove the loop. | |
11288 | Do we really need this? I thought there were overloads of SelectionDAG::getNode() that take two SDValue's. | |
11313 | I don't really see a reason for this to return an int rather than a bool. The reader might assume that the lambda returns values from a larger space if the return type is int. | |
11335 | Just a nit: it'd probably be more concise and readable to just use the ternary operator here. | |
11337 | I don't actually understand how this works. Won't the nibbles that correspond to the target elements for the opposite endianness always just be zero? And if that's the case, won't the comparison below always fail? (see below) Perhaps the unnecessary shuffle produced will just be optimized out, but it's probably better not to emit it to begin with. | |
11352 | Shouldn't both operands of this comparison just have the nibbles corresponding to the opposite endianness masked out? | |
11355 | Just a comment along the lines of | |
lib/Target/PowerPC/PPCISelLowering.h | ||
70 | Nit: line length. | |
lib/Target/PowerPC/PPCInstrVSX.td | ||
2721 | Weren't these added with the previous patch? Or maybe it was only the LE ones? If so, can you please rebase this patch so that it would apply cleanly? | |
3051 | The versions with the new node are not endianness specific are they? If not, please move them out of the blocks and have a single copy of each. | |
test/CodeGen/PowerPC/vec_int_ext.ll | ||
4 | Please add a test that checks that there are no shuffles when the input elements are correct. |
test/CodeGen/PowerPC/vec_int_ext.ll | ||
---|---|---|
4 | Each of these tests is already checking both correct and incorrect elements. The *LE tests are using correct elements for LE and so the BE pattern match should have the shuffle. The *BE tests are using correct elements for BE and so the LE pattern match should have the shuffle. |
test/CodeGen/PowerPC/vec_int_ext.ll | ||
---|---|---|
4 | Ah I see. Not sure how I missed that. Sorry. |
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
11332 | I realize that you are trying to check for the |
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
11332 | Actually, I think it's okay since things like word -> byte would fail the isSExtOfVecExtract pattern matching so we wouldn't reach this far. |
LGTM. My comments are only about, well, comments. So feel free to fix them on the commit.
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
11235 | Maybe just a quick note about the algorithm to make it easier to understand at first glance. Something like: // Knowing the element indices being extracted from the original // vector and the order in which they're being inserted, just put // them at element indices required for the instruction. | |
11255 | Small nit: get rid of the use of new in the comment because these things won't be new for very long :). | |
11258 | All of this can just go away. You're repeating it in a clean and concise way at the definition of the TargetElems array. | |
11268 | It's not just any extend, right? |
Nit: line length.