This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Exploit single instruction load-and-splat for word and doubleword
ClosedPublic

Authored by nemanjai on Jun 20 2019, 2:04 PM.

Details

Summary

We currently produce a load, followed by (possibly a move for integers and) a splat as separate instructions. VSX has always had a splatting load for doublewords, but as of Power9, we have it for words as well. This patch just exploits these instructions.

Diff Detail

Repository
rL LLVM

Event Timeline

nemanjai created this revision.Jun 20 2019, 2:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 20 2019, 2:04 PM
Herald added a subscriber: kbarton. · View Herald Transcript
nemanjai marked an inline comment as done.Jun 20 2019, 2:05 PM
nemanjai added inline comments.
test/CodeGen/PowerPC/power9-moves-and-splats.ll
15

Please forgive the trivial changes in this test case. The script that produces the checks apparently behaves slightly differently now and I would prefer to leave the test case exactly as produced by the script.

lebedev.ri added inline comments.
test/CodeGen/PowerPC/power9-moves-and-splats.ll
15

You can just regenerate all the affected files in a preparatory commit and rebase the patch.

jsji added a subscriber: amyk.Jun 20 2019, 2:18 PM

@amyk FYI, does this cover all the cases you want to do https://reviews.llvm.org/D51750 ?

nemanjai marked an inline comment as done.Jun 20 2019, 6:23 PM
nemanjai added inline comments.
test/CodeGen/PowerPC/power9-moves-and-splats.ll
15

Ah, yeah. That's a good idea. I don't know why I didn't think of that. I'll definitely do that next time I see this issue.

amyk added a comment.Jun 26 2019, 7:14 AM

Thanks @jsji for letting me know, and thanks @nemanjai for the handling of the load and splats!

I think this mostly looks good to me. I'm curious though, we also have test/CodeGen/PowerPC/load-v4i8-improved.ll that has a load->shift/permute->splat case. Could this patch be extended to cover this?

That test case is as follows:

define <16 x i8> @test(i32* %s, i32* %t) {
; CHECK-LE-LABEL: test:
; CHECK-LE:       # %bb.0: # %entry
; CHECK-LE-NEXT:    lfiwzx f0, 0, r3
; CHECK-LE-NEXT:    xxpermdi vs0, f0, f0, 2
; CHECK-LE-NEXT:    xxspltw v2, vs0, 3
; CHECK-LE-NEXT:    blr

; CHECK-LABEL: test:
; CHECK:       # %bb.0: # %entry
; CHECK-NEXT:    lfiwzx f0, 0, r3
; CHECK-NEXT:    xxsldwi vs0, f0, f0, 1
; CHECK-NEXT:    xxspltw v2, vs0, 0
; CHECK-NEXT:    blr
entry:
  %0 = bitcast i32* %s to <4 x i8>*
  %1 = load <4 x i8>, <4 x i8>* %0, align 4
  %2 = shufflevector <4 x i8> %1, <4 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  ret <16 x i8> %2
}

Thanks @jsji for letting me know, and thanks @nemanjai for the handling of the load and splats!

I think this mostly looks good to me. I'm curious though, we also have test/CodeGen/PowerPC/load-v4i8-improved.ll that has a load->shift/permute->splat case. Could this patch be extended to cover this?

This is an excellent point Amy. We should handle the shuffle case as well as that will probably come up quite a bit.

nemanjai updated this revision to Diff 207380.Jul 1 2019, 11:05 AM

Updated to handle the load->shuffle pattern in addition to the load->build_vector pattern.

nemanjai updated this revision to Diff 207385.Jul 1 2019, 11:36 AM

Missed some code cleanup on the last upload.

@jsji @amyk Do you guys have any further comments regarding this?

jsji requested changes to this revision.Jul 8 2019, 2:45 PM

Some comments, but request changes as you will at least need to rebase to pick up non-relevant testcase update in https://reviews.llvm.org/rL365330.

lib/Target/PowerPC/PPCISelLowering.cpp
1781

Looks like you are repurposing this function: instead of using it just for 'VSPLT/VSPLTH/VSPLTW' , use it for lxvdsx as well.
I don't think it is a great idea to just update the assert here.

We should either rename the function, or re-structure the code to two different functions, in another NFC patch?

8090

We can't support indexed load as well, and we are also trying to return the InputLoad,
so maybe something like getNormalLoadInput and add comments about returning true for success?

8094

Any other ISDs we can/should peek through? How about ANY_EXTEND_VECTOR_INREG/EXTRACT_SUBVECTOR?

8098

Why not just LD->isNormalLoad()?

8217

Do we need to check hasOneUse here as well?

8735

Similar to isSplatShuffleMask, we are repurposing getVSPLTImmediate as well, VSPLT* has only 3 forms (1/2/4), we should either rename this function, or use another wrapper.

lib/Target/PowerPC/PPCISelLowering.h
454

This ISD has chain, please update the comments to describe it.

lib/Target/PowerPC/PPCInstrVSX.td
62

Why SDTCisSameAS<0,1>? Shouldn't it be SDTCisPtrTy<1>?

test/CodeGen/PowerPC/load-and-splat.ll
4

Any specific reason that we want to use powerpc64 instead of powerpc64le for pwr9?

test/CodeGen/PowerPC/power9-moves-and-splats.ll
15

I have committed https://reviews.llvm.org/rL365330 to include the new ';', you patch should auto-merge when you rebase.

This revision now requires changes to proceed.Jul 8 2019, 2:45 PM
nemanjai marked 9 inline comments as done.Jul 10 2019, 5:19 PM
nemanjai added inline comments.
lib/Target/PowerPC/PPCISelLowering.cpp
1781

Updating the comment, I can see. But why do you not feel this function is adequate for 64-bit splats? The name is adequeate - we are looking for a splat shuffle mask. And we are in a legalized DAG which will make all shuffles v16i8.

8090

This is a good point. I can actually make it return a const SDValue * so success is simply signaled by the returned value not being nullptr.

8094

I don't think we want to peek through any vector operations since we are looking for a scalar non-extending, non-indexed load.

8098

I think you mean ISD::isNormalLoad(LD) and my answer is, there is no good reason. I first added the indexed check and realized later that I also have to ensure it is non-extending :)
I will definitely change it, thank you.

8217

I can certainly add it since this wouldn't be profitable if there are other uses of the load.

8735

Yeah, you're absolutely right. I should rename the function.
I initially left it alone as I thought that VSPLT adequately abbreviates Vector Splat, but since it is part of a mnemonic and capitalized, I agree with you that it is very misleading.

lib/Target/PowerPC/PPCISelLowering.h
454

Sure, I omitted it since all of these nodes have a chain and all these comments seem superfluous. But you're right, consistency is more important.

lib/Target/PowerPC/PPCInstrVSX.td
62

Wow, that's right. I neither know why I wrote it this way nor why table gen doesn't complain!

test/CodeGen/PowerPC/load-and-splat.ll
4

I wanted at least one of them to be big endian and I figured why not the one that has more hits in the test case. I don't think we really need four RUN lines for P8/P9 and LE/BE.

nemanjai updated this revision to Diff 209169.Jul 11 2019, 4:27 AM

Clean up some of the comments, function naming, single-use check and other comments from Jinsong. Thanks for the review Jinsong.

jsji accepted this revision.Jul 11 2019, 11:51 AM

LGTM. Thanks.

This revision is now accepted and ready to land.Jul 11 2019, 11:51 AM
amyk accepted this revision.Jul 11 2019, 12:14 PM

I think this overall looks good. Thanks for extending this to handle the load->shuffle case!

lib/Target/PowerPC/PPCISelLowering.cpp
5135–5136

Do comments like these need to be full sentences with periods, as well?

amyk added inline comments.Jul 11 2019, 1:42 PM
lib/Target/PowerPC/PPCISelLowering.cpp
5135–5136

Sorry, I just noticed that for some reason, it highlighted this as your change. Please disregard my comment.

This revision was automatically updated to reflect the committed changes.