Page MenuHomePhabricator

[PowerPC][Power10] Implementation of 128-bit Binary Vector Rotate builtins
ClosedPublic

Authored by Conanap on Aug 28 2020, 3:09 PM.

Details

Summary

This patch implements 128-bit Binary Vector Rotate builtins for PowerPC10.

Diff Detail

Event Timeline

Conanap created this revision.Aug 28 2020, 3:09 PM
Conanap requested review of this revision.Aug 28 2020, 3:09 PM
Conanap updated this revision to Diff 289064.Aug 31 2020, 8:10 PM

Included extra test cases

nemanjai added inline comments.Sep 10 2020, 7:48 AM
clang/lib/Headers/altivec.h
7995

While correct, this implementation will require two constant pool loads (for the two shift amounts), then two vrlq's to shift the two vectors and finally an xxlor to OR them together. We should be able to do this with a single constant pool load and vperm.
Presumably the implementation would be something like:

  // Merge __b and __c using an appropriate shuffle.
  vector unsigned char TmpB = (vector unsigned char)__b;
  vector unsigned char TmpC = (vector unsigned char)__c;
  vector unsigned char MaskAndShift =
#ifdef __LITTLE_ENDIAN__
      __builtin_shufflevector(TmpB, TmpC, -1, -1, -1, -1, -1, -1, -1, -1, 16, 1,
                              0, -1, -1, -1, -1, -1);
#else
      __builtin_shufflevector(TmpB, TmpC, -1, -1, -1, -1, -1, 30, 31, 15, -1,
                              -1, -1, -1, -1, -1, -1, -1);
#endif
  return __builtin_altivec_vrlqnm(__a, MaskAndShift);

(but of course, double-check that the numbers are correct).

clang/test/CodeGen/builtins-ppc-p10vector.c
1628

Please show the shift in the test case as well.

amyk added a subscriber: amyk.Sep 17 2020, 2:29 PM
amyk added inline comments.
clang/test/CodeGen/builtins-ppc-p10vector.c
4

The CHECK-COMMON should not be needed. You can just use the CHECK prefix in the tests since we have set up check prefixes.

llvm/lib/Target/PowerPC/PPCInstrPrefix.td
2133

If possible, I think it is better to leave the instruction patterns in the position they were in, and just add the patterns to them.

llvm/test/CodeGen/PowerPC/p10-vector-rotate.ll
3

Please also add the BE run line.

Conanap updated this revision to Diff 293474.Sep 22 2020, 8:51 AM
Conanap marked 2 inline comments as done.

Changed implementation for vrlqnm as per Nemanja

nemanjai accepted this revision.Sep 24 2020, 6:46 AM

The remaining requests can be fulfilled when committing. I don't think this requires another review. Thanks.

clang/lib/Headers/altivec.h
8003

Please add explicit cast to vector unsigned __int128 for MaskAndShift. Similarly below. I forgot to add that to my comment.

clang/test/CodeGen/builtins-ppc-p10vector.c
1628

Please show the shift in the test case as well.

This was still not addressed. Please show the shuffle in the checks.

llvm/test/CodeGen/PowerPC/p10-vector-rotate.ll
60

Add a test case for this that was produced from vec_rlnm at -O2.

This revision is now accepted and ready to land.Sep 24 2020, 6:46 AM
Conanap updated this revision to Diff 295393.Wed, Sep 30, 1:27 PM
Conanap marked 6 inline comments as done.

Addressed comments and corrected incorrect behaviour on Power10