Page MenuHomePhabricator

[PPC] Add vec_insert4b/vec_extract4b to altivec.h

Authored by sfertile on Nov 11 2016, 6:48 AM.



Add macros that implement the vec_extract4b and vec_insert4b functionality.

vector unsigned long long vec_extract4b (vector unsigned char, const int)
Extracts a word from a vector at a byte position.
Result value:
The first doubleword element of the result contains the zero-extended extracted word from ARG1. The second doubleword is set to 0. ARG2 specifies the least-significant byte number (0 - 12) of the word to be extracted

vector unsigned char vec_insert4b (vector signed int, vector unsigned char, const int)
vector unsigned char vec_insert4b (vector unsigned int, vector unsigned char, const int)
Inserts a word into a vector at a byte position.
Result Value:
The contents of word element 1 of the first argument are extracted and placed into argument 2. The word is inserted into argument 2 starting at the byte offset indicated by the third argument.

Diff Detail


Event Timeline

sfertile updated this revision to Diff 77609.Nov 11 2016, 6:48 AM
sfertile retitled this revision from to [PPC] Add vec_insert4b/vec_extract4b to altivec.h.
sfertile updated this object.
sfertile set the repository for this revision to rL LLVM.
sfertile added subscribers: cfe-commits, echristo.
nemanjai added inline comments.Nov 11 2016, 4:22 PM

As far as I can tell by looking at this patch and the corresponding back end patch, the __a argument will have a word inserted into it and it will be returned.

Is that the semantics that the ABI specifies (I can't seem to make sense of the description).

vector unsigned int a = { 0xAAAAAAAA, 0xBBBBBB, 0xCCCCCC, 0xDDDDDD };
vector unsigned char b = (vector unsigned char) 0xFF;
vector unsigned char c = vec_insert4b(a, b, 4);
// Do we expect vector c to be:
// { 0xAA, 0xAA, 0xAA, 0xAA, 0xFF, 0xFF, 0xFF, 0xFF, 0xCC, 0xCC, 0xCC, 0xCC, 0xDD, 0xDD, 0xDD, 0xDD }
kbarton added inline comments.Nov 12 2016, 11:22 PM

I think the current version of the ABI document has an error in it. The description of the vec_insert4b is identical to the vec_extract4b, so I expect it was copy/pasted in error. I think we need to open up an (internal) bug against the ABI and wait for clarification to complete this.

nemanjai added inline comments.Nov 13 2016, 5:45 PM

I find it difficult to follow and understand this logic when it's in the header.
What I'd prefer to see here is that the macro simply expands into __builtin_vsx_xxextractuw and then handle all this logic in the code that emits an intrinsic call.
Namely if the target is little endian, we adjust the parameter, emit the intrinsic call and finally emit a shufflevector.

sfertile added inline comments.Nov 14 2016, 7:02 PM

I think this is a good idea, looking at the code its not obvious what is going on.


You have it correct Nemanja, word 1 will be extracted from b, and it will get inserted into a. The word will be inserted at the byte position starting at the 3rd argument. (so in this case byte offsets 4 to 7)

I talked to Bill Schmidt earlier today and he already has a bug open. It hasn't been updated yet, but it should roughly correspond to the description for the xxinsertw instruction:

The contents of word element 1 of VSR[XB] are placed
into byte elements UIM:UIM+3 of VSR[XT]. The contents
of the remaining byte elements of VSR[XT] are not

sfertile updated this object.Nov 14 2016, 7:13 PM
kbarton edited edge metadata.Nov 15 2016, 6:09 AM

Did you upload a new patch? The diff doesn't appear to change for me.

sfertile updated this revision to Diff 78760.Nov 21 2016, 12:57 PM
sfertile edited edge metadata.

Moved the endian related massaging from altivec.h into Clang codegen and clamped the input index into the valid range [0, 12].

syzaara added inline comments.Nov 21 2016, 1:13 PM

tiny comment, char is misspelled as cahr

sfertile updated this revision to Diff 78911.Nov 22 2016, 11:49 AM

Fixed spelling error in comment

sfertile marked an inline comment as done.Nov 22 2016, 11:51 AM
sfertile added inline comments.


nemanjai added inline comments.Nov 22 2016, 2:22 PM
assert(ArgCI && "The third operand of this intrinsic must be a compile-time constant")



Just a minor nit, but I'd initialize this inline here.


Same comment as above - make sure your asserts tell the developer why they trip.


Add testing with an argument out of range as well as non-constant (presumably in a separate test case for the latter).

kbarton accepted this revision.Nov 22 2016, 2:28 PM
kbarton edited edge metadata.

LGTM, but I'd like @nemanjai to have a quick look at the builtin_vsx_insertword and builtin_vsx_extractuword too.

This revision is now accepted and ready to land.Nov 22 2016, 2:28 PM
nemanjai requested changes to this revision.Nov 22 2016, 7:39 PM
nemanjai edited edge metadata.

Overall, I'd like to see another revision of this. This is not because I feel the patch requires major rework, but because there is a number of minor comments that I'd like to make sure are all addressed.
Just a few notes:

  • Please be careful with the naming convention (mentioned in inline comments)
  • Be descriptive with your asserts - imagine being a developer that gets an assert message that just says ArgCI - it would probably not mean much to you.
  • I am pretty sure this does not require any work in Sema since AFAICT the builtins are defined to require that the third parameter is a constant (i.e. Ii rather than just i). But this should be tested (mentioned in inline comments).
  • I would like to see testing cover all of the code you generate (i.e. you generate bitcasts to <2 x i64> and I don't see it in the tests)
  • I would like to see early exits if the subtarget does not have P9Vector. This is kind of subtle, but for most builtins that lower to an intrinsic that requires some subtarget feature that isn't present, the user gets a not able to compile this builtin yet. As you have it implemented, it would not be the case here - it would lower to an intrinsic and we'd get an ISEL error. I think the latter situation is inferior so we should find a way to get the former. I am hoping this will not require Sema (I'm not sure if returning a nullptr will achieve this goal but hopefully something along those lines will). Basically, the point is that you shouldn't assume that the only way you'd get a builtin call is through the interface defined in the ABI, but that someone may actually insert a __builtin_<whatever> in their code.

This nit applies to all naming in this changeset: variables start with an uppercase letter and functions with a lowercase letter.


Minor nit: please settle on where the star goes when declaring pointers.


Just a general nit here - we prefer comments to be complete sentences with capitalization and punctuation.


How does this pass? The "T2" is not enclosed in double square brackets.

This revision now requires changes to proceed.Nov 22 2016, 7:39 PM
sfertile updated this revision to Diff 79860.Nov 30 2016, 8:40 PM
sfertile edited edge metadata.
sfertile marked an inline comment as done.

Changed all variable names to start with capitals, added description strings to the assertions, changed uses of '12' to MaxIndex and initialized the arrays in their definition.

nemanjai added inline comments.Dec 1 2016, 1:36 AM

This will likely have to change when the wording is settled on for this builtin. Namely, the element to insert is in BE word element 1 for the instruction. So it is quite likely that the builtin will be made "endianness-neutral" by specifying the source element to be word element 1 (for both LE/BE).

nemanjai requested changes to this revision.Dec 7 2016, 2:37 AM
nemanjai edited edge metadata.

The finalized wording for vec_insert4b:
Let W be the first doubleword element of ARG1, truncated to 32 bits. The result vector is formed by inserting W into ARG2 at the byte position (0-12) specified by ARG3.

Given that wording, this implementation has almost the correct semantics. The only change required is that the source and target vectors need to be switched. Namely, a word is taken out of the first argument and inserted into the second argument. And it appears that the current implementation has that backwards.

This revision now requires changes to proceed.Dec 7 2016, 2:37 AM
sfertile updated this revision to Diff 82031.Dec 19 2016, 4:07 PM
sfertile edited edge metadata.
sfertile marked 6 inline comments as done.

Updated to swap the arguments when generating the intrinsic. Updated a number of the comments, and added some tests with the index out of range.

nemanjai accepted this revision.Dec 21 2016, 5:36 AM
nemanjai edited edge metadata.

Just one minor inline nit. Other than that, LGTM.


I'm just curious why -Werror? This should be an error even without that.

This revision is now accepted and ready to land.Dec 21 2016, 5:36 AM
nemanjai added inline comments.Jan 9 2017, 7:33 AM

I think this will fail on all the powerpc targets, such as powerpc64le, etc. Which isn't what you want I imagine.
Also, I think this isn't really the idea with XFAIL. The semantics we'd probably want are better served by using the not tool and checking for the error messages. You should be able to find some example of how the not tool is used in some of the other test cases (for example test/CodeGen/builtins-ppc-p7-disabled.c).

echristo closed this revision.Jan 30 2017, 11:48 PM

This was committed:

commit d65cd1f9424369c4ae7f945fac7fd9e4357451b2
Author: Sean Fertile <>
Date: Thu Jan 5 21:43:30 2017 +0000

Add vec_insert4b and vec_extract4b functions to altivec.h

Add builtins for the functions and custom codegen mapping the builtins to their
corresponding intrinsics and handling the endian related swapping.

git-svn-id: 91177308-0d34-0410-b5e6-96231b3b80d8