This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Back end improvements to vec_splat
ClosedPublic

Authored by nemanjai on Mar 30 2016, 4:27 AM.

Details

Summary

We currently have no way to emit the xxspltw for a word splat and a call to vec_splat for vectors with 8-byte elements get translated to vperm with a mask vector that comes from memory.
This patch provides intrinsics to get at xxspltw and xxspltd (extended mnemonic). In a subsequent patch, altivec.h will be modified to use these intrinsics for the respective vec_splat definitions.

This provides a significant improvement in one benchmark that uses vec_splat.

Diff Detail

Repository
rL LLVM

Event Timeline

nemanjai updated this revision to Diff 52035.Mar 30 2016, 4:27 AM
nemanjai retitled this revision from to [PowerPC] Back end improvements to vec_splat.
nemanjai updated this object.
nemanjai set the repository for this revision to rL LLVM.
nemanjai added a subscriber: llvm-commits.
amehsan added inline comments.Mar 30 2016, 4:37 AM
include/llvm/IR/IntrinsicsPowerPC.td
752–755

Are you going to define a builtin and emit this intrinsics in the clang, when that builtin is seen? Can't we use shufflevector instruction, instead of target specific intrinsic?

nemanjai added inline comments.Mar 30 2016, 8:07 AM
include/llvm/IR/IntrinsicsPowerPC.td
752–755

Yes, D18593 has the FE portions of this patch.
I don't think having this intrinsic prevents us from using these for the shufflevector instruction. These can be subsequently added as options to the framework for emitting vector shuffles. Namely, shufflevector is probably too general of an instruction to do this all in the .td files.
Notice that all the Altivec instructions that can be used for vector shuffles have code fragments that check whether the node has a mask that is eligible for that instruction.
The canonical type for all the vector shuffles is v16i8 which isn't a type that can be in vsrc. Of course, this can be changed...

I think that the simplest thing to do would be similar to the way the QPX QVESPLATI instruction is handled (namely, check for VSX, check that the node is a splat of the correct type, emit xxspltw/xxpermdi).

Of course, this is just my opinion and if people feel that I should implement this as a vector shuffle immediately, I can do so.

amehsan added inline comments.Mar 30 2016, 11:48 AM
include/llvm/IR/IntrinsicsPowerPC.td
752–755

You definitely know more about our codegen than I do and I may be missing something here. Also it will be good to hear from other reviewers about this (@hfinkel, @kbarton, @wschmidt, @seurer). The way I see it (and I maybe wrong) is this. We have two options:

1- Current implementation which is simpler.
2- Using vector shuffle, which requires extra C++ code, to get it right. It is more complicated.

I think (1) results in introducing a new target specific intrinsic which is not understood by optimizations. Also this intrinsic maybe used for other purposes in the future. Potentially causing optimization problems. So even though (2) is more complicated, to me it seems that (2) is the right way to go.

One reason that I see we may want to choose (1) is that the complexity of (2) is too much. There might be other reasons that I have not realized.

hfinkel added inline comments.Mar 30 2016, 12:16 PM
include/llvm/IR/IntrinsicsPowerPC.td
752–755

I don't really see a difference between what the two of you are proposing, except whether the frontend uses an intrinsic or not. Is this right?

Our general preference, across all targets, is that the frontend headers emit generic IR whenever possible (including using things like __builtin_shufflevector, or equivalent syntax. Then we match it in the backend.

Yes, please use generic IR if it is possible to represent. Thanks.

nemanjai added inline comments.Mar 30 2016, 2:38 PM
include/llvm/IR/IntrinsicsPowerPC.td
752–755

OK, I think it's settled. I'll update the patch so that we retain the information that a vec_splat is a vector_shuffle rather than using a target-specific builtin. Thank you all for your comments. New patch coming up tomorrow :).

nemanjai updated this revision to Diff 52216.Mar 31 2016, 8:50 AM

I've changed this around a bit to match an SDAG pattern that will be emitted when lowering vector_shuffle nodes rather than matching an intrinsic. This way, we'll get the FE to emit vector shuffles for splats (in the update to D18593).

Also, since XXSPLTW is a splat, I've added the handling for it to the swap removal pass.

hfinkel accepted this revision.Apr 26 2016, 6:11 PM
hfinkel edited edge metadata.

A couple of comments, otherwise LGTM.

lib/Target/PowerPC/PPCVSXSwapRemoval.cpp
410 ↗(On Diff #52216)

Looks like the "This is not yet implemented. When it is, we need to uncomment the following:" part of this comment is out-dated. If so, please remove it.

test/CodeGen/PowerPC/swaps-le-2.ll
90 ↗(On Diff #52216)

Please add a few more test cases for other index values. The space of bugs that might happen to continue to cause this index to be 0, but incorrectly compute others is pretty large.

This revision is now accepted and ready to land.Apr 26 2016, 6:11 PM
nemanjai closed this revision.May 4 2016, 9:10 AM

Committed revision 268516.