Page MenuHomePhabricator

[PowerPC] Implement vec_xxpermdi builtin.

Authored by jtony on May 10 2017, 10:27 AM.



The vec_xxpermdi builtin is missing from altivec.h. This has been requested by developers working on libvpx for VP9 support for Google (PR: Initially, I tried to define a new intrinsic to map it to the corresponding PowerPC hard instruction (XXPERMDI) directly. But there was feedback from the community that this can be done without introducing new intrinsic. This patch re-implement the vec_xxpermdi builtin by using the existing shuffleVector instruction just in the FE.

And we currently won't emit an XXPERMDI when parameters are not vectors of doubleword elements, subsequent BE optimization work needed to identify the correct shuffles so that we can emit XXPERMDI for all types of vector.

Diff Detail


Event Timeline

jtony created this revision.May 10 2017, 10:27 AM
nemanjai requested changes to this revision.May 13 2017, 6:26 AM

Add a test case like the one that currently crashes (see inline comment). Also, please do the following:

  • Put a note in the description (and the commit message) with a link to the PR this fixes
  • Put a note in the description that there is subsequent back end work to identify the correct shuffles (i.e. we currently won't emit an XXPERMDI when parameters are not vectors of doubleword elements)
7997 ↗(On Diff #98484)

It isn't appropriate to remove a target-independent diagnostic and replace it with a target-specific one (even if it's just in the name). I think it makes sense to update both err_shufflevector_non_vector and err_shufflevector_incompatible_vector to take an argument for the name of the builtin, but not to replace it with a target-specific one.

I would recommend renaming it to something like:
err_shufflevector_non_vector -> err_vec_builtin_non_vector
err_shufflevector_incompatible_vector -> err_vec_builtin_incompatible_vector
and add the parameter.

8005 ↗(On Diff #98484)

We shouldn't diagnose out-of-range values.

8407 ↗(On Diff #98484)

This change is unrelated and inconsequential. Please revert it.

8425 ↗(On Diff #98484)

We shouldn't clamp this. The correct semantics should be to mask out everything but the low order two bits (i.e. &0x3). But I think even that is unnecessary since you should just use the low order two bits from the input - see below.

8433 ↗(On Diff #98484)

The switch is overkill. You should just implement this in an obvious way (i.e. the same way as described in the ISA).
For big endian:
ElemIdx0 = (Index & 2;) >> 1
ElemIdx1 = 2 + (Index & 1)

For little endian:
ElemIdx0 = (~Index & 1) + 2;
ElemIdx1 = ~Index & 2 >> 1;

(of course, please verify the expressions).

8482 ↗(On Diff #98484)

This is not going to work. You can try it with this simple test case that it crashes with:

#include <altivec.h>
vector int test(vector int a, vector int b) {
  return vec_xxpermdi(a, b, 0);

The problem is that the return type of the call expression may be different from the return type of the shufflevector.

You'll probably need something like this:

QualType BIRetType = E->getType();
auto RetTy = ConvertType(BIRetType);
return Builder.CreateBitCast(ShuffleCall, RetTy);
3900 ↗(On Diff #98484)

I assume that we won't even need this at all if we're not diagnosing the range of the third argument.

3915 ↗(On Diff #98484)

No, we don't want to diagnose this. Values larger than 3 should just use the low-order two bits. This matches GCC behaviour.

This revision now requires changes to proceed.May 13 2017, 6:26 AM
jtony edited the summary of this revision. (Show Details)May 16 2017, 10:52 AM
jtony edited the summary of this revision. (Show Details)May 16 2017, 1:20 PM
jtony marked 6 inline comments as done.
jtony added inline comments.
8433 ↗(On Diff #98484)

Good call, fixed as suggested.

jtony updated this revision to Diff 99292.May 17 2017, 6:54 AM
jtony edited edge metadata.

Address all the comments from Nemanja.

jtony edited the summary of this revision. (Show Details)May 17 2017, 6:00 PM
jtony marked an inline comment as done.May 17 2017, 7:14 PM

Other than the few minor comments, this LGTM.

8458 ↗(On Diff #99292)

Minor nit: please add a comment explaining the expressions. For example:

// Element zero comes from the first input vector and element one comes from the
// second. The element indices within each vector are numbered in big-endian
// order so the shuffle mask must be adjusted for this on little endian
// platforms (i.e. index is complemented and source vector reversed).
3944 ↗(On Diff #99292)

It seems strange that we're comparing argument types above and element types of argument vectors here. Can we not just use hasSameUnqualifiedType like the Sema checks on __builtin_shufflevector do?

27 ↗(On Diff #99292)

Add a test for non-vector arguments. Perhaps something like:
vec_xxpermdi(1, 2, 3);

inouehrs added inline comments.May 19 2017, 2:16 AM
23 ↗(On Diff #99292)

I am not sure we can assure that clang always do a constant propagation to resolve index as a compile time constant. But it seems that an existing test case above already assumes clang does it.
IMO, const unsigned index = 5; is a little better.

jtony updated this revision to Diff 99955.May 23 2017, 11:17 AM
jtony marked 6 inline comments as done.

Address more comments from Nemanja and Hiroshi.

3900 ↗(On Diff #98484)

Changed to just check it is Compile Time constant without check the range.

23 ↗(On Diff #99292)

Hi Hiroshi, the index is used as a non-constant variable test input to test the diagnostic message. We want it to be a variable here. But I guess I can leave it uninitialized to be clear.

jtony marked 2 inline comments as done.May 23 2017, 11:19 AM
inouehrs added inline comments.May 23 2017, 11:37 AM
23 ↗(On Diff #99292)

Thank you for the clarification. Comment says expected error.
Sorry for confusing you.

echristo accepted this revision.May 23 2017, 11:38 AM

Couple of small nits and a request to make some of the change separately, but otherwise LGTM. For the split part please don't actually submit another patch, just go ahead and do it :)



8007 ↗(On Diff #99292)

Spaces after commas please.

8014–8017 ↗(On Diff #99955)

The name change can be done separately.

jtony updated this revision to Diff 99966.May 23 2017, 11:59 AM
seanbruno resigned from this revision.May 23 2017, 12:04 PM
jtony marked 2 inline comments as done.May 23 2017, 12:52 PM
nemanjai accepted this revision.May 23 2017, 2:57 PM

Much like Eric's comments, mine shouldn't hold up approval. Feel free to address them on the commit.


3900 ↗(On Diff #99966)

This statement doesn't belong in this comment. The Sema check is just that the third argument is a compile-time constant. There's no need to mention 0-3 or last two bits here. Just state what you're checking, not the details of what you're currently using it for.

3905 ↗(On Diff #99966)

All the statements in the comments as well as the code itself only handle functions that have exactly 3 parameters, the first two of which are vectors and the last an integral constant expression. I really don't see the need for the NumArgs parameter here. It would indeed be weird if someone down the line called it with something like:
SemaBuiltinVSX(TheCall, 1);
SemaBuiltinVSX(TheCall, 5);

This revision is now accepted and ready to land.May 23 2017, 2:57 PM
This revision was automatically updated to reflect the committed changes.
jtony marked 2 inline comments as done.