This patch implements 128-bit Binary Vector Rotate builtins for Power10.
Details
- Reviewers: saghir, nemanjai, hfinkel
- Group Reviewers: Restricted Project
- Commits: rGd30155feaa9c: [PowerPC] Implementation of 128-bit Binary Vector Rotate builtins
Event Timeline
clang/lib/Headers/altivec.h | ||
---|---|---|
7980 | While correct, this implementation will require two constant pool loads (for the two shift amounts), then two vrlq's to shift the two vectors and finally an xxlor to OR them together. We should be able to do this with a single constant pool load and vperm. Presumably the implementation would be something like the following (but of course, double-check that the numbers are correct). | |

```c
// Merge __b and __c using an appropriate shuffle.
vector unsigned char TmpB = (vector unsigned char)__b;
vector unsigned char TmpC = (vector unsigned char)__c;
vector unsigned char MaskAndShift =
#ifdef __LITTLE_ENDIAN__
    __builtin_shufflevector(TmpB, TmpC, -1, -1, -1, -1, -1, -1, -1, -1, 16, 1,
                            0, -1, -1, -1, -1, -1);
#else
    __builtin_shufflevector(TmpB, TmpC, -1, -1, -1, -1, -1, 30, 31, 15, -1, -1,
                            -1, -1, -1, -1, -1, -1);
#endif
return __builtin_altivec_vrlqnm(__a, MaskAndShift);
```

clang/test/CodeGen/builtins-ppc-p10vector.c | ||
---|---|---|
1628 | Please show the shift in the test case as well. | |
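
For readers unfamiliar with the operation discussed in the altivec.h comment above, the following is a rough, portable model of "rotate a 128-bit value left, then AND it with a mask". It is only a sketch under stated assumptions, not the builtin's implementation. The names `u128`, `rotl128`, `mask128` and `rotate_then_mask` are hypothetical. The bit numbering (bit 0 is the most significant bit) and the wraparound behaviour when the mask start exceeds the mask end are assumptions. The sketch takes the shift amount and mask bounds as plain parameters and does not model which bytes of the vector operand carry them.

```c
#include <assert.h>

/* unsigned __int128 is a GCC/Clang extension available on 64-bit targets. */
typedef unsigned __int128 u128;

/* Rotate x left by n bits; n is reduced modulo 128. */
static u128 rotl128(u128 x, unsigned n) {
  n &= 127;
  return n ? (x << n) | (x >> (128 - n)) : x;
}

/* Build a mask with ones in bits begin..end inclusive, numbering bits from
   0 (most significant) to 127 (least significant). Assumed convention: if
   begin > end, the mask wraps and covers bits begin..127 and 0..end. */
static u128 mask128(unsigned begin, unsigned end) {
  assert(begin < 128 && end < 128);
  u128 all = ~(u128)0;
  u128 from_begin = all >> begin;        /* ones in bits begin..127 */
  u128 through_end = all << (127 - end); /* ones in bits 0..end */
  return begin <= end ? (from_begin & through_end)
                      : (from_begin | through_end);
}

/* Hypothetical scalar model: rotate a, then keep only the masked bits. */
static u128 rotate_then_mask(u128 a, unsigned shift, unsigned begin,
                             unsigned end) {
  return rotl128(a, shift) & mask128(begin, end);
}
```

For example, `rotate_then_mask((u128)0xFF, 8, 112, 119)` rotates the low byte up by eight bits and keeps exactly that byte, giving `0xFF00`.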
clang/test/CodeGen/builtins-ppc-p10vector.c | ||
---|---|---|
4 | The CHECK-COMMON should not be needed. You can just use the CHECK prefix in the tests since we have set up check prefixes. | |
llvm/lib/Target/PowerPC/PPCInstrPrefix.td | ||
1471 | If possible, I think it is better to leave the instruction patterns in the position they were in, and just add the patterns to them. | |
llvm/test/CodeGen/PowerPC/p10-vector-rotate.ll | ||
3 | Please also add the BE run line. |
The remaining requests can be fulfilled when committing. I don't think this requires another review. Thanks.
clang/lib/Headers/altivec.h | ||
---|---|---|
7988 | Please add an explicit cast to vector unsigned __int128 for MaskAndShift. Similarly below. I forgot to add that to my comment. | |
clang/test/CodeGen/builtins-ppc-p10vector.c | ||
1628 | This was still not addressed. Please show the shuffle in the checks. | |
llvm/test/CodeGen/PowerPC/p10-vector-rotate.ll | ||
60 | Add a test case for this that was produced from vec_rlnm at -O2. |