Page MenuHomePhabricator

CodeGen: Power: Add lowering for shifts of v1i128.
ClosedPublic

Authored by iteratee on May 2 2017, 4:50 PM.

Details

Summary

When legalizing vector operations on vNi128, they will be split to v1i128
because that is a legal type on ppc64, but then the compiler will crash in
selection dag because it fails to select for these operations. This patch fixes
shift operations. Logical shift right and left shift can be performed in the
vector unit, but algebraic shift right requires being split. We only perform
them in the vector unit for Power 9, as it is too costly on other platforms.

Diff Detail

Repository
rL LLVM

Event Timeline

iteratee created this revision.May 2 2017, 4:50 PM
iteratee updated this revision to Diff 97528.May 2 2017, 5:18 PM

Update test to check for the power9 sequence.

Note that we should eventually produce the power9 sequence for the non-vector case as well.

nemanjai added inline comments.May 2 2017, 7:56 PM
lib/Target/PowerPC/PPCInstrAltivec.td
991 ↗(On Diff #97528)

I think these patterns are missing something. I understand that the shift amount is split up between the VSLO and VSL (i.e. shifting by 9 is meant to shift $vA by one byte and one bit). However, for the VSL, every byte in $vB needs to have the same value in the low order 3 bits or the result is undefined. So I think we need to VSPLTB 15 the $vB.

kbarton added inline comments.May 4 2017, 7:55 PM
test/CodeGen/PowerPC/shift128.ll
4 ↗(On Diff #97528)

I don't recognize this triple. We typically use powerpc64le-unknown-linux-gnu.

echristo added inline comments.May 4 2017, 8:52 PM
test/CodeGen/PowerPC/shift128.ll
4 ↗(On Diff #97528)

Yeah, the vendor part is a nop here, but should be changed.

iteratee updated this revision to Diff 97980.May 5 2017, 10:09 AM
iteratee set the repository for this revision to rL LLVM.

Change vendor to unknown

lib/Target/PowerPC/PPCInstrAltivec.td
991 ↗(On Diff #97528)

The documentation says that for VSL it just uses bits 125:127 from VRB.

nemanjai added inline comments.May 5 2017, 12:36 PM
lib/Target/PowerPC/PPCInstrAltivec.td
991 ↗(On Diff #97528)

From Power ISA 3.0:

The result is place into VR[VRT], except if, for any byte
element in register VR[VRB], the low-order 3 bits are not
equal to the shift amount, then VR[VRT] is undefined.

And there have apparently been cores in the past (I don't know which ones) that would not produce the correct results if the bottom 3 bits were not the same in all the byte elements. So I'm afraid we'll need this VSPLTB.

iteratee added inline comments.May 5 2017, 1:30 PM
lib/Target/PowerPC/PPCInstrAltivec.td
991 ↗(On Diff #97528)

OK. But we only want to generate this sequence on Power 9 anyway. Is this well defined on power9?

Ping? Do we know if this vector operation requires the byte splat on power9?

Ping? Do we know if this vector operation requires the byte splat on power9?

The unfortunate situation is that the ISA provides no guarantee that the implementation will do the right thing in this situation. With the most up-to-date ISA, we unfortunately still have to make this a 3 instruction sequence.
However, what's also missing here is the ability to generate better code for "shift immediate".

For reg/reg shifts, we need
(vsl (vslo %a, %b), (vspltb %b, 15)) - could also reduce reg pressure (at the cost of longer path length for the vslo) with (vsl (vslo %a, (vspltb %b, 15)), (vspltb %b, 15))
for reg/imm shifts, we need
(vsl (vslo %a, (xxspltib <imm>)), (xxspltib <imm>))

It may appear that the immediate form is no better than the register form since we're using the same number of instructions. However, if we're materializing a constant shift amount, it will be a load-immediate followed by a direct-move. So this is a significantly better overall sequence.

iteratee updated this revision to Diff 98555.May 10 2017, 4:20 PM

Add splat to the output.

I think this is fine now (other than a couple of inline nits). LGTM but I'll let @kbarton have a look as well rather than approving the patch. The output patterns could use (VSPLTB 15, %b) in the VSLO portion as well so that it can go in the same register, but I'll leave the choice to do so or not up to you.

We should also probably have a follow-up patch for immediate shifts.

lib/Target/PowerPC/PPCISelLowering.cpp
687 ↗(On Diff #98555)

s/2/3 (i.e. we need 3 instructions to do the shift). Same below.

iteratee updated this revision to Diff 99032.May 15 2017, 11:25 AM
iteratee added inline comments.
lib/Target/PowerPC/PPCISelLowering.cpp
687 ↗(On Diff #98555)

Thanks.

iteratee marked 5 inline comments as done.May 15 2017, 11:26 AM
This revision is now accepted and ready to land.May 16 2017, 12:08 PM
This revision was automatically updated to reflect the committed changes.