Add macros that implement the vec_extract4b and vec_insert4b functionality.
vector unsigned long long vec_extract4b (vector unsigned char, const int)
Purpose:
Extracts a word from a vector at a byte position.
Result value:
The first doubleword element of the result contains the zero-extended word extracted from ARG1. The second doubleword is set to 0. ARG2 specifies the least-significant byte number (0 - 12) of the word to be extracted.
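To make the result layout concrete, here is a rough scalar model of the extract. It is only a sketch, not the header implementation: `model_v2u64` and `model_extract4b` are hypothetical names, and it assumes the bytes are numbered big-endian (byte 0 is most significant) and that the index names the first of four consecutive bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for vector unsigned long long. */
typedef struct { uint64_t dw[2]; } model_v2u64;

/* Scalar model of the extract. Assumption: bytes[] holds the vector in
   big-endian byte numbering and index names the first of four bytes. */
static model_v2u64 model_extract4b(const uint8_t bytes[16], int index)
{
    assert(index >= 0 && index <= 12);   /* range given in the description */

    uint64_t word = ((uint64_t)bytes[index]     << 24) |
                    ((uint64_t)bytes[index + 1] << 16) |
                    ((uint64_t)bytes[index + 2] <<  8) |
                     (uint64_t)bytes[index + 3];

    /* First doubleword: zero-extended word. Second doubleword: 0. */
    model_v2u64 result = { { word, 0 } };
    return result;
}
```

The upper 32 bits of the first doubleword are always zero in this model, which matches the "zero-extended" wording above.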
vector unsigned char vec_insert4b (vector signed int, vector unsigned char, const int)
vector unsigned char vec_insert4b (vector unsigned int, vector unsigned char, const int)
Purpose:
Inserts a word into a vector at a byte position.
Result value:
Word element 1 of the first argument is extracted and inserted into the second argument, starting at the byte offset given by the third argument. The remaining bytes of the second argument are returned unchanged.
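A matching scalar model of the insert, again only as a sketch: `model_insert4b` is a hypothetical name, word element 1 is assumed to be bytes 4 to 7 in big-endian numbering, and the 0 to 12 offset range is assumed by analogy with the extract.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the insert. Assumptions: all arrays use big-endian
   byte numbering, word element 1 of src is bytes 4..7, and index must
   leave room for four bytes (0..12). */
static void model_insert4b(const uint8_t src[16], const uint8_t dst[16],
                           int index, uint8_t out[16])
{
    assert(index >= 0 && index <= 12);

    memmove(out, dst, 16);            /* start from the second argument */
    memmove(out + index, src + 4, 4); /* overwrite four bytes with word 1 */
}
```

Only the four bytes at the offset change; the other twelve bytes of the second argument pass through untouched.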
I find it difficult to follow and understand this logic when it's in the header.
I'd prefer that the macro simply expand to __builtin_vsx_xxextractuw, with all of this logic handled in the code that emits the intrinsic call.
Namely, if the target is little endian, we adjust the parameter, emit the intrinsic call, and finally emit a shufflevector.
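For reference, the two little-endian fix-ups could look roughly like the scalar sketch below. The comment does not spell out the adjustment, so `12 - index` is an assumption: it follows from reversing the byte numbering, where a word at little-endian bytes j..j+3 occupies big-endian bytes 12-j..15-j. The helper names are hypothetical, and the doubleword swap stands in for a shufflevector with mask {1, 0}.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed parameter adjustment: convert a little-endian byte offset
   into the big-endian start byte of the same four-byte word. */
static int model_le_to_be_index(int le_index)
{
    assert(le_index >= 0 && le_index <= 12);
    return 12 - le_index;
}

/* Models a shufflevector with mask {1, 0}: swap the two doublewords so
   the extracted word ends up in little-endian element 0. */
static void model_swap_doublewords(const uint64_t in[2], uint64_t out[2])
{
    uint64_t first = in[0];   /* copy first so in and out may alias */
    out[0] = in[1];
    out[1] = first;
}
```

Keeping both steps in the intrinsic lowering would leave the header macro as a plain one-line expansion on either endianness.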