We currently produce a load, followed by (possibly a move for integers and) a splat as separate instructions. VSX has always had a splatting load for doublewords, but as of Power9, we have it for words as well. This patch just exploits these instructions.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
test/CodeGen/PowerPC/power9-moves-and-splats.ll | ||
---|---|---|
15 | Please forgive the trivial changes in this test case. The script that produces the checks apparently behaves slightly differently now and I would prefer to leave the test case exactly as produced by the script. |
test/CodeGen/PowerPC/power9-moves-and-splats.ll | ||
---|---|---|
15 | You can just regenerate all the affected files in a preparatory commit and rebase the patch. |
@amyk FYI, does this cover all the cases you want to do https://reviews.llvm.org/D51750 ?
test/CodeGen/PowerPC/power9-moves-and-splats.ll | ||
---|---|---|
15 | Ah, yeah. That's a good idea. I don't know why I didn't think of that. I'll definitely do that next time I see this issue. |
Thanks @jsji for letting me know, and thanks @nemanjai for the handling of the load and splats!
I think this mostly looks good to me. I'm curious though, we also have test/CodeGen/PowerPC/load-v4i8-improved.ll that has a load->shift/permute->splat case. Could this patch be extended to cover this?
That test case is as follows:
define <16 x i8> @test(i32* %s, i32* %t) { ; CHECK-LE-LABEL: test: ; CHECK-LE: # %bb.0: # %entry ; CHECK-LE-NEXT: lfiwzx f0, 0, r3 ; CHECK-LE-NEXT: xxpermdi vs0, f0, f0, 2 ; CHECK-LE-NEXT: xxspltw v2, vs0, 3 ; CHECK-LE-NEXT: blr ; CHECK-LABEL: test: ; CHECK: # %bb.0: # %entry ; CHECK-NEXT: lfiwzx f0, 0, r3 ; CHECK-NEXT: xxsldwi vs0, f0, f0, 1 ; CHECK-NEXT: xxspltw v2, vs0, 0 ; CHECK-NEXT: blr entry: %0 = bitcast i32* %s to <4 x i8>* %1 = load <4 x i8>, <4 x i8>* %0, align 4 %2 = shufflevector <4 x i8> %1, <4 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> ret <16 x i8> %2 }
This is an excellent point Amy. We should handle the shuffle case as well as that will probably come up quite a bit.
Updated to handle the load->shuffle pattern in addition to the load->build_vector pattern.
Some comments, but request changes as you will at least need to rebase to pick up non-relevant testcase update in https://reviews.llvm.org/rL365330.
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
1775–1776 | Looks like you are repurposing this function: instead of using it just for 'VSPLT/VSPLTH/VSPLTW' , use it for lxvdsx as well. We should either rename the function, or re-structure the code to two different functions, in another NFC patch? | |
8143 | We can't support indexed load as well, and we are also trying to return the InputLoad, | |
8147 | Any other ISDs we can/should peek through? How about ANY_EXTEND_VECTOR_INREG/EXTRACT_SUBVECTOR? | |
8151 | Why not just LD->isNormalLoad()? | |
8282 | Do we need to check hasOneUse here as well? | |
8801 | Similar to isSplatShuffleMask, we are repurposing getVSPLTImmediate as well, VSPLT* has only 3 forms (1/2/4), we should either rename this function, or use another wrapper. | |
lib/Target/PowerPC/PPCISelLowering.h | ||
463 | This ISD has chain, please update the comments to describe it. | |
lib/Target/PowerPC/PPCInstrVSX.td | ||
62 | Why SDTCisSameAS<0,1>? Shouldn't it be SDTCisPtrTy<1>? | |
test/CodeGen/PowerPC/load-and-splat.ll | ||
4 | Any specific reason that we want to use powerpc64 instead of powerpc64le for pwr9? | |
test/CodeGen/PowerPC/power9-moves-and-splats.ll | ||
15 | I have committed https://reviews.llvm.org/rL365330 to include the new ';', you patch should auto-merge when you rebase. |
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
1775–1776 | Updating the comment, I can see. But why do you not feel this function is adequate for 64-bit splats? The name is adequeate - we are looking for a splat shuffle mask. And we are in a legalized DAG which will make all shuffles v16i8. | |
8143 | This is a good point. I can actually make it return a const SDValue * so success is simply signaled by the returned value not being nullptr. | |
8147 | I don't think we want to peek through any vector operations since we are looking for a scalar non-extending, non-indexed load. | |
8151 | I think you mean ISD::isNormalLoad(LD) and my answer is, there is no good reason. I first added the indexed check and realized later that I also have to ensure it is non-extending :) | |
8282 | I can certainly add it since this wouldn't be profitable if there are other uses of the load. | |
8801 | Yeah, you're absolutely right. I should rename the function. | |
lib/Target/PowerPC/PPCISelLowering.h | ||
463 | Sure, I omitted it since all of these nodes have a chain and all these comments seem superfluous. But you're right, consistency is more important. | |
lib/Target/PowerPC/PPCInstrVSX.td | ||
62 | Wow, that's right. I neither know why I wrote it this way nor why table gen doesn't complain! | |
test/CodeGen/PowerPC/load-and-splat.ll | ||
4 |
Clean up some of the comments, function naming, single-use check and other comments from Jinsong. Thanks for the review Jinsong.
I think this overall looks good. Thanks for extending this to handle the load->shuffle case!
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5153 | Do comments like these need to be full sentences with periods, as well? |
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
5153 | Sorry, I just noticed that for some reason, it highlighted this as your change. Please disregard my comment. |
This ISD has chain, please update the comments to describe it.