This is an archive of the discontinued LLVM Phabricator instance.

[Power9] Exploit vector integer extend instructions when indices aren't correct
ClosedPublic

Authored by syzaara on Jun 7 2017, 12:49 PM.

Details

Summary

This patch adds on to the exploitation added by https://reviews.llvm.org/D33510.
This now catches build vector nodes where the inputs are coming from sign extended vector extract elements where the indices used by the vector extract are not correct. We can still use the new hardware instructions by adding a shuffle to move the elements to the correct indices. I introduced a new PPCISD node here because adding a vector_shuffle and changing the elements of the vector_extracts was getting undone by another DAG combine.

Diff Detail

Repository
rL LLVM

Event Timeline

syzaara created this revision.Jun 7 2017, 12:49 PM
syzaara removed a subscriber: echristo.
nemanjai added inline comments.Jun 12 2017, 3:50 AM
lib/Target/PowerPC/PPCISelLowering.cpp
11226 ↗(On Diff #101797)

I'm afraid that having such a generically named function in such a large file may get confusing down the road. Perhaps it would be clearer to fold this logic into combineBVOfVecExtend() and just use getScalarSizeInBits(). Perhaps something like:

if (InputSize + OutputSize == 5)
  TgtElemArrayIdx = 0;
// ...

In any case, you don't want to use a valid array index for an invalid combination as you do here. For example, I don't really see anything in this patch that will prevent you from doing something weird for the v16i8 -> v8i16 case (which there isn't an instruction for as far as I can tell)*. Also, please add that as a test case.

*I realize there isn't a pattern for this in the .td file, but I imagine you'll end up with some weird result in this case.

11255 ↗(On Diff #101797)

I think it would be much cleaner to make this array local to combineBVOfVecExtend (you can then pass the target encoding to addShuffleForVecExtend rather than indexing into it again).

11271 ↗(On Diff #101797)

Just initialize this at construction time and remove the loop.

11288 ↗(On Diff #101797)

Do we really need this? I thought there were overloads of SelectionDAG::getNode() that take two SDValue's.

11313 ↗(On Diff #101797)

I don't really see a reason for this to return an int rather than a bool. The reader might assume that the lambda returns values from a larger space if the return type is int.

11335 ↗(On Diff #101797)

Just a nit: it'd probably be more concise and readable to just use the ternary operator here.

11337 ↗(On Diff #101797)

I don't actually understand how this works. Won't the nibbles that correspond to the target elements for the opposite endianness always just be zero? And if that's the case, won't the comparison below always fail? (see below)

Perhaps the unnecessary shuffle produced will just be optimized out, but it's probably better not to emit it to begin with.

11352 ↗(On Diff #101797)

Shouldn't both operands of this comparison just have the nibbles corresponding to the opposite endianness masked out?
i.e. both should be and-ed with 0x0F0F0F0F0F0F0F0F of 0xF0F0F0F0F0F0F0F0

11355 ↗(On Diff #101797)

Just a comment along the lines of
// Regular lowering will catch cases where a shuffle is not needed.

lib/Target/PowerPC/PPCISelLowering.h
70 ↗(On Diff #101797)

Nit: line length.

lib/Target/PowerPC/PPCInstrVSX.td
2721 ↗(On Diff #101797)

Weren't these added with the previous patch? Or maybe it was only the LE ones? If so, can you please rebase this patch so that it would apply cleanly?

3043 ↗(On Diff #101797)

The versions with the new node are not endianness specific are they? If not, please move them out of the blocks and have a single copy of each.

test/CodeGen/PowerPC/vec_int_ext.ll
4 ↗(On Diff #101797)

Please add a test that checks that there are no shuffles when the input elements are correct.

syzaara updated this revision to Diff 102253.Jun 12 2017, 3:41 PM
syzaara added inline comments.Jun 12 2017, 3:44 PM
test/CodeGen/PowerPC/vec_int_ext.ll
4 ↗(On Diff #101797)

Each of these tests is already checking both correct and incorrect elements. The *LE tests are using correct elements for LE and so the BE pattern match should have the shuffle. The *BE tests are using correct elements for BE and so the LE pattern match should have the shuffle.

nemanjai added inline comments.Jun 12 2017, 9:51 PM
test/CodeGen/PowerPC/vec_int_ext.ll
4 ↗(On Diff #101797)

Ah I see. Not sure how I missed that. Sorry.

stefanp added inline comments.Jun 13 2017, 6:10 AM
lib/Target/PowerPC/PPCISelLowering.cpp
11332 ↗(On Diff #102253)

I realize that you are trying to check for the
byte -> word
case with this if statement.
However, is it possible to also catch word -> byte too?
You may have thought of this already and filtered out unwanted cases earlier up... I don't know. I just wanted to bring this to your attention in case it might be a problem.

syzaara added inline comments.Jun 13 2017, 12:49 PM
lib/Target/PowerPC/PPCISelLowering.cpp
11332 ↗(On Diff #102253)

Actually, I think it's okay since things like word -> byte would fail the isSExtOfVecExtract pattern matching so we wouldn't reach this far.

nemanjai accepted this revision.Jun 28 2017, 2:48 PM

LGTM. My comments are only about, well, comments. So feel free to fix them on the commit.

lib/Target/PowerPC/PPCISelLowering.cpp
11235 ↗(On Diff #102253)

Maybe just a quick note about the algorithm to make it easier to understand at first glance. Something like:

// Knowing the element indices being extracted from the original
// vector and the order in which they're being inserted, just put
// them at element indices required for the instruction.
11255 ↗(On Diff #102253)

Small nit: get rid of the use of new in the comment because these things won't be new for very long :).

11258 ↗(On Diff #102253)

All of this can just go away. You're repeating it in a clean and concise way at the definition of the TargetElems array.

11268 ↗(On Diff #102253)

It's not just any extend, right?
Probably combineBVOfVecSExt() then.

This revision is now accepted and ready to land.Jun 28 2017, 2:48 PM
This revision was automatically updated to reflect the committed changes.