This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] MMA - Add __builtin_vsx_build_pair and __builtin_mma_build_acc builtins
ClosedPublic

Authored by saghir on Aug 6 2021, 7:44 AM.

Details

Summary

This patch adds the following built-ins:

builtin_vsx_build_pair
builtin_mma_build_acc

Diff Detail

Event Timeline

saghir created this revision.Aug 6 2021, 7:44 AM
saghir requested review of this revision.Aug 6 2021, 7:44 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 6 2021, 7:44 AM
saghir retitled this revision from Remove depracated built-ins for MMA and replace with new built-ins to [PowerPC] MMA - Remove deprecated built-ins and add new built-ins.Aug 6 2021, 7:50 AM
saghir edited the summary of this revision. (Show Details)
saghir added reviewers: Restricted Project, nemanjai, stefanp, amyk.
saghir added a project: Restricted Project.
lei added a subscriber: lei.Aug 10 2021, 9:43 AM
This comment was removed by lei.
lei accepted this revision as: lei.Aug 10 2021, 9:44 AM
This revision is now accepted and ready to land.Aug 10 2021, 9:44 AM
lei requested changes to this revision.Aug 10 2021, 11:35 AM

Actually we should not be removing the deprecated bultins. Just need to add the new ones.

This revision now requires changes to proceed.Aug 10 2021, 11:35 AM

Actually we should not be removing the deprecated bultins. Just need to add the new ones.

Yes, and also the semantics are different.

saghir updated this revision to Diff 367530.Aug 19 2021, 9:46 AM

Change the implementation to add new builtins and keep the depracated builtins.

saghir retitled this revision from [PowerPC] MMA - Remove deprecated built-ins and add new built-ins to [PowerPC] MMA - Add __builtin_vsx_build_pair and __builtin_mma_build_acc builtins.Aug 19 2021, 10:10 AM
saghir edited the summary of this revision. (Show Details)
saghir updated this revision to Diff 368109.Aug 23 2021, 7:51 AM

Added some more tests.

saghir updated this revision to Diff 368176.Aug 23 2021, 12:37 PM

Added comments, and re-organized tests.

lei added inline comments.Aug 26 2021, 8:30 AM
clang/lib/CodeGen/CGBuiltin.cpp
15919

doesn't look like we need the interm var IsLE. Just use the call directly within the if stmt.

if (getTarget().isLittleEndian()) {
15921–15923
unsigned NumVecs = (BuiltinID == PPC::BI__builtin_mma_build_acc) ? 4 : 2;
clang/test/CodeGen/builtins-ppc-pair-mma.c
2 ↗(On Diff #368176)

future -> pwr10
We need to add BE tests.

clang/test/Sema/ppc-pair-mma-types.c
2–4

this should be -target-cpu pwer10 now.

2–4

Please add BE testing.

268

This looks like a dup of testVPLocal(). Why not just add the new call line to that function right below the call to the deprecated function?

clang/test/SemaCXX/ppc-pair-mma-types.cpp
2–4

please update to pwr10

nemanjai requested changes to this revision.Sep 2 2021, 4:38 AM

There are some questions to answer here before proceeding:

  1. Why do we not get paired vector loads and stores in the back end test cases for either little or big endian and regardless of whether we use the old or new builtins/intrinsics?
  2. Is the code generated the same as that for GCC when:
    • The inputs are all from registers
    • The inputs are all from memory
    • The inputs are a mix of memory and registers
  3. What do execution tests show for both little endian and big endian targets? Namely, write execution test cases that do everything mentioned in 2. above for both sets of builtins and confirm:
    • The behaviour is the same both with GCC and Clang for both LE and BE
    • The behaviour for new builtins is the same on LE and BE with both GCC and Clang
    • The behaviour for old builtins is different on LE and BE with both GCC and Clang

I am requesting changes until these questions are adequately addressed.

llvm/include/llvm/IR/IntrinsicsPowerPC.td
1442 ↗(On Diff #368176)

I find the need for this rather surprising. The new builtins should do the exact same thing as the old builtins but with elements in reversed order. So why do we need new intrinsics? Can we not just call the old intrinsics in the front end codegen but with arguments in reversed order?

llvm/test/CodeGen/PowerPC/paired-vector-intrinsics.ll
73 ↗(On Diff #368176)

This indicates some kind of problem. Why are we moving a value from v2 to v3 and then not using v3?

This revision now requires changes to proceed.Sep 2 2021, 4:38 AM
saghir updated this revision to Diff 372149.Sep 12 2021, 7:27 PM

Addressed review comments.

nemanjai accepted this revision.Sep 22 2021, 4:07 PM

LGTM other than the code can be simplified as suggested.

clang/lib/CodeGen/CGBuiltin.cpp
15919–15926

This entire block seems to simply be std::reverse(Ops.begin() + 1, Ops.end())

Also, please add a note that the very first operand is the pointer to the pair/accumulator that is actually being built.

lei accepted this revision.Sep 23 2021, 6:39 AM

LGTM once the code is simplified as Nemanja suggested.

This revision is now accepted and ready to land.Sep 23 2021, 6:39 AM
saghir updated this revision to Diff 375443.Sep 27 2021, 5:49 PM

Addressed review comments.

This revision was landed with ongoing or failed builds.Sep 27 2021, 5:51 PM
This revision was automatically updated to reflect the committed changes.