[PowerPC] MMA - Add __builtin_vsx_build_pair and __builtin_mma_build_acc builtins

Authored by saghir on Aug 6 2021, 7:44 AM.



This patch adds the following built-ins:

    • __builtin_vsx_build_pair
    • __builtin_mma_build_acc
Actually, we should not be removing the deprecated builtins; we just need to add the new ones.

Yes, and also the semantics are different.

Change the implementation to add the new builtins and keep the deprecated builtins.

It doesn't look like we need the intermediate variable IsLE. Just use the call directly within the if statement:

```cpp
if (getTarget().isLittleEndian()) {
  unsigned NumVecs = (BuiltinID == PPC::BI__builtin_mma_build_acc) ? 4 : 2;
  // ...
}
```
Inline comment at line 2 (on Diff #368176):

future -> pwr10
We need to add BE tests.


This should be -target-cpu pwr10 now.


Please add BE testing.


This looks like a duplicate of testVPLocal(). Why not just add the new call to that function, right below the call to the deprecated function?


Please update to pwr10.

There are some questions to answer here before proceeding:

  1. Why do we not get paired vector loads and stores in the back end test cases for either little or big endian and regardless of whether we use the old or new builtins/intrinsics?
  2. Is the code generated the same as that for GCC when:
    • The inputs are all from registers
    • The inputs are all from memory
    • The inputs are a mix of memory and registers
  3. What do execution tests show for both little endian and big endian targets? Namely, write execution test cases that cover everything mentioned in item 2 above for both sets of builtins and confirm:
    • The behaviour is the same both with GCC and Clang for both LE and BE
    • The behaviour for new builtins is the same on LE and BE with both GCC and Clang
    • The behaviour for old builtins is different on LE and BE with both GCC and Clang
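Before writing real execution tests, a host-only model can make the properties in item 3 concrete. The sketch below is not PowerPC code. It represents each vector argument by an int tag and assumes, as this review describes, that the deprecated builtins expose their vector arguments in reversed order on little-endian targets, and that the new builtins behave like the deprecated ones with their vector arguments reversed on LE. The names Order, assembleOrder and buildOrder are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: each vector argument is an int tag, and an Order is the
// sequence in which those tags are observed when the built pair/accumulator
// is read back. Big endian is taken as the reference order.
using Order = std::vector<int>;

// Deprecated builtins (__builtin_vsx_assemble_pair, __builtin_mma_assemble_acc):
// ASSUMED here to expose their vector arguments reversed on little endian.
Order assembleOrder(const Order &args, bool isLE) {
  Order out = args;
  if (isLE)
    std::reverse(out.begin(), out.end());
  return out;
}

// New builtins (__builtin_vsx_build_pair, __builtin_mma_build_acc):
// modeled as the deprecated builtin with its vector arguments reversed on LE,
// so the two reversals cancel and the result no longer depends on endianness.
Order buildOrder(const Order &args, bool isLE) {
  assert((args.size() == 2 || args.size() == 4) && "pair or accumulator");
  Order in = args;
  if (isLE)
    std::reverse(in.begin(), in.end());
  return assembleOrder(in, isLE);
}
```

Under this model, buildOrder returns the same order on LE and BE for both two and four vectors, while assembleOrder does not. Real execution tests would replace these functions with calls to the actual builtins, compiled with both GCC and Clang and run on pwr10 LE and BE targets.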

I am requesting changes until these questions are adequately addressed.

Inline comment at line 1442 (on Diff #368176):

I find the need for this rather surprising. The new builtins should do the exact same thing as the old builtins but with elements in reversed order. So why do we need new intrinsics? Can we not just call the old intrinsics in the front end codegen but with arguments in reversed order?

Inline comment at line 73 (on Diff #368176):

This indicates some kind of problem. Why are we moving a value from v2 to v3 and then not using v3?

Addressed review comments.

LGTM, except that the code can be simplified as suggested.


This entire block seems to simply be std::reverse(Ops.begin() + 1, Ops.end())

Also, please add a note that the very first operand is the pointer to the pair/accumulator that is actually being built.
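Here is a minimal sketch of the suggested simplification, written outside of clang. Operands is a hypothetical stand-in for the Ops list in the front-end codegen, with Ops[0] holding the pointer to the pair/accumulator being built. The block under review is not reproduced here, so reverseByHand is only an illustrative swap loop driven by NumVecs, used to show that the single std::reverse call is equivalent.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the operand list: Ops[0] is the pointer to the
// pair/accumulator being built; Ops[1..NumVecs] are the vector operands.
using Operands = std::vector<std::string>;

// Illustrative manual version: swap vector operands pairwise from both ends,
// skipping Ops[0]. NumVecs is 2 for a pair and 4 for an accumulator.
void reverseByHand(Operands &Ops, unsigned NumVecs) {
  for (unsigned I = 0; I < NumVecs / 2; ++I)
    std::swap(Ops[1 + I], Ops[NumVecs - I]);
}

// Suggested simplification: on little endian, reverse every operand after the
// destination pointer. The very first operand is the pointer to the
// pair/accumulator being built, so it must stay in place.
void reverseVectorOperands(Operands &Ops, bool IsLittleEndian) {
  assert(!Ops.empty() && "Ops[0] must be the destination pointer");
  if (IsLittleEndian)
    std::reverse(Ops.begin() + 1, Ops.end());
}
```

For both the pair (two vectors) and the accumulator (four vectors), reversing the range that starts at Ops.begin() + 1 gives the same result as the swap loop, and it does not need NumVecs at all.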

LGTM once the code is simplified as Nemanja suggested.

