
[PowerPC] Add Support for PPC750 Paired Single ext
Needs Review (Public)

Authored by DarkKirb on Aug 3 2020, 8:02 AM.


Group Reviewers
Restricted Project

The Paired Single extension is an extension to the PowerPC ISA and is
found on the PPC 750CL-series processors. These processors were most
notably used in Nintendo home consoles.

The Paired Single extension adds a simple version of SIMD to the
architecture. Each SIMD Vector contains 2 32-bit floats. Most of the
floating-point instructions supported by the 750CL are also implemented
in the Paired Single extension.

This initial patch adds the following intrinsic declarations (in
roughly alphabetical order):
v2f32 ppc_paired_l(const v2f32 *src, i1 word, i8 gqr);
void  ppc_paired_st(v2f32 val, v2f32 *dst, i1 word, i8 gqr);
v2f32 ppc_paired_madds0(v2f32 a, v2f32 b, v2f32 c);
v2f32 ppc_paired_madds1(v2f32 a, v2f32 b, v2f32 c);
v2f32 ppc_paired_merge00(v2f32 a, v2f32 b);
v2f32 ppc_paired_merge01(v2f32 a, v2f32 b);
v2f32 ppc_paired_merge10(v2f32 a, v2f32 b);
v2f32 ppc_paired_merge11(v2f32 a, v2f32 b);
v2f32 ppc_paired_muls0(v2f32 a, v2f32 b);
v2f32 ppc_paired_muls1(v2f32 a, v2f32 b);
v2f32 ppc_paired_sel(v2f32 sel, v2f32 a, v2f32 b);
v2f32 ppc_paired_sum0(v2f32 a, v2f32 b, v2f32 c);
v2f32 ppc_paired_sum1(v2f32 a, v2f32 b, v2f32 c);
  • ppc_paired_l maps to psq_l, psq_lx, psq_lu, or psq_lux. It loads the paired single from the memory location.
  • ppc_paired_st maps to psq_st, psq_stx, psq_stu, or psq_stux. It stores the paired single to the memory location.
  • ppc_paired_madds0 and ppc_paired_madds1 compute the multiply-accumulate a * c + b, with the 1st or 2nd element of c, respectively, being used for both elements.
  • ppc_paired_merge[01][01] takes the 1st/2nd element of the first vector and the 1st/2nd element of the second vector (selected by the two digits in the name) and combines them into a new vector.
  • ppc_paired_muls[01] multiplies a with the 1st/2nd element of b (depending on the name).
  • ppc_paired_sel is a dynamic variant of the merge* functions. For each element, it takes the element of a if the corresponding entry of the control vector sel is smaller than 0, and the element of b otherwise.
  • ppc_paired_sum[01] replaces the 1st/2nd element of c with the sum of the first element of a and the second element of b.
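
The element-wise behaviour described in the list above can be sketched as a scalar reference model in C. This is only an illustration of the descriptions in this summary, not the code from the patch: the `v2f32` struct, the `mk` helper and all `model_*` names are hypothetical, the load/store intrinsics are left out, and the multiply-add is modelled as a separate multiply and add, so rounding and NaN or signed-zero handling may differ from the hardware.

```c
#include <assert.h>

/* Hypothetical scalar stand-in for a paired single: two 32-bit floats. */
typedef struct { float ps0, ps1; } v2f32;

static v2f32 mk(float ps0, float ps1) { v2f32 r; r.ps0 = ps0; r.ps1 = ps1; return r; }

/* madds0/madds1: a * c + b, with element 0 or 1 of c used for both lanes. */
static v2f32 model_madds0(v2f32 a, v2f32 b, v2f32 c) { return mk(a.ps0 * c.ps0 + b.ps0, a.ps1 * c.ps0 + b.ps1); }
static v2f32 model_madds1(v2f32 a, v2f32 b, v2f32 c) { return mk(a.ps0 * c.ps1 + b.ps0, a.ps1 * c.ps1 + b.ps1); }

/* muls0/muls1: multiply both lanes of a by element 0 or 1 of b. */
static v2f32 model_muls0(v2f32 a, v2f32 b) { return mk(a.ps0 * b.ps0, a.ps1 * b.ps0); }
static v2f32 model_muls1(v2f32 a, v2f32 b) { return mk(a.ps0 * b.ps1, a.ps1 * b.ps1); }

/* mergeXY: lane 0 is element X of a, lane 1 is element Y of b. */
static v2f32 model_merge00(v2f32 a, v2f32 b) { return mk(a.ps0, b.ps0); }
static v2f32 model_merge01(v2f32 a, v2f32 b) { return mk(a.ps0, b.ps1); }
static v2f32 model_merge10(v2f32 a, v2f32 b) { return mk(a.ps1, b.ps0); }
static v2f32 model_merge11(v2f32 a, v2f32 b) { return mk(a.ps1, b.ps1); }

/* sel: per lane, take a if the control entry is smaller than 0, else b. */
static v2f32 model_sel(v2f32 sel, v2f32 a, v2f32 b) {
    return mk(sel.ps0 < 0.0f ? a.ps0 : b.ps0, sel.ps1 < 0.0f ? a.ps1 : b.ps1);
}

/* sum0/sum1: replace element 0 or 1 of c with a.ps0 + b.ps1. */
static v2f32 model_sum0(v2f32 a, v2f32 b, v2f32 c) { return mk(a.ps0 + b.ps1, c.ps1); }
static v2f32 model_sum1(v2f32 a, v2f32 b, v2f32 c) { return mk(c.ps0, a.ps0 + b.ps1); }

/* Small usage example with values that are exact in float. */
static void model_example(void) {
    v2f32 a = mk(1.0f, 2.0f), b = mk(10.0f, 20.0f), c = mk(3.0f, 4.0f);
    v2f32 r = model_madds0(a, b, c);       /* {1*3+10, 2*3+20} */
    assert(r.ps0 == 13.0f && r.ps1 == 26.0f);
    r = model_merge10(a, b);               /* {a.ps1, b.ps0} */
    assert(r.ps0 == 2.0f && r.ps1 == 10.0f);
    r = model_sum0(a, b, c);               /* {a.ps0 + b.ps1, c.ps1} */
    assert(r.ps0 == 21.0f && r.ps1 == 4.0f);
}
```

A model like this could serve as an oracle when writing codegen tests for the intrinsics, once loads and stores work.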

[PowerPC] Add Intrinsic for broadway-exclusive dcbz_l

A feature that only exists on the Nintendo Versions of the 750CL is the
locked cache. To support it one instruction was added: dcbz_l.

NOTE: this instruction is not the same as dcbzl. The two have different instruction encodings and different semantics. dcbz_l works similarly to dcbz, but it requires the locked cache to be enabled.

[PowerPC] Add assembly support for Paired Single

In particular, this commit adds support for the 32 2x32-bit "paired
single" floating-point registers, changes the calling convention to
support paired singles, and adds all of the new instructions.

As far as I can tell, there is no standard ABI for paired singles.
Other compilers appear not to support passing paired single types as
arguments, or even to interact with the extension at all outside of
inline assembly.

This calling convention is based on the floating-point calling
convention, meaning that the first 8 arguments are passed in psf1-psf8
and psf14-psf31 are callee-saved registers. One requirement that the
code does not enforce is that GQR0 (SPR 912) has to be set to 0. There
is no GQR register class because the GQRs cannot be read or set from
userspace.
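
To illustrate why a GQR0 value of 0 is a sensible requirement, here is a minimal sketch of how a GQR value splits into its quantization fields. The field positions and the meaning of type 0 (plain float, no quantization) are my understanding of the 750CL GQR layout and are not taken from this patch, so treat them as assumptions to check against the processor manual. The `gqr_fields` struct and both function names are hypothetical.

```c
#include <stdint.h>

/* Assumed GQR layout, in LSB-0 bit numbering (not taken from the patch):
 *   bits  0-2   store type     bits  8-13  store scale
 *   bits 16-18  load type      bits 24-29  load scale
 * Type 0 is assumed to mean "plain 32-bit float, no quantization". */
typedef struct {
    unsigned st_type;
    unsigned st_scale;
    unsigned ld_type;
    unsigned ld_scale;
} gqr_fields;

static gqr_fields gqr_decode(uint32_t gqr) {
    gqr_fields f;
    f.st_type  = gqr & 0x7u;
    f.st_scale = (gqr >> 8) & 0x3Fu;
    f.ld_type  = (gqr >> 16) & 0x7u;
    f.ld_scale = (gqr >> 24) & 0x3Fu;
    return f;
}

/* Returns 1 if both loads and stores through this GQR use plain floats. */
static int gqr_is_unquantized(uint32_t gqr) {
    gqr_fields f = gqr_decode(gqr);
    return f.ld_type == 0 && f.st_type == 0;
}
```

Under these assumptions, GQR0 = 0 decodes to float load and store types with zero scale, so a psq_l or psq_st that selects GQR0 behaves like an ordinary pair of float accesses, which is what a compiler needs for spills and argument passing.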

Special thanks to GitHub user Tilka, who created a similar but much smaller patchset in 2014.

I'm currently working on getting codegen for the extension working. Many instructions already work, but the result is not yet useful because loads and stores do not work.

Diff Detail

Unit Tests: Failed

780 ms, linux > libomp.lock::omp_init_lock.c
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/lock/omp_init_lock.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/lock/Output/omp_init_lock.c.tmp -lm -latomic && /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/lock/Output/omp_init_lock.c.tmp

Event Timeline

DarkKirb created this revision. Aug 3 2020, 8:02 AM
DarkKirb requested review of this revision. Aug 3 2020, 8:02 AM
DarkKirb edited the summary of this revision. Aug 3 2020, 8:08 AM
Erk added a subscriber: Erk. Aug 3 2020, 8:10 AM
DarkKirb updated this revision to Diff 287892. Wed, Aug 26, 3:06 AM
This comment was removed by DarkKirb.
DarkKirb added reviewers: hfinkel, Restricted Project. Wed, Aug 26, 3:11 AM
DarkKirb updated this revision to Diff 287959. Wed, Aug 26, 6:37 AM
This comment was removed by DarkKirb.
DarkKirb updated this revision to Diff 287961. Wed, Aug 26, 6:46 AM

[PowerPC] Add Test for Paired Single Decoding

This has uncovered multiple bugs in my previous implementation, namely that
ps_res and ps_frsqte had the wrong opcode and that the decoding for
immediate-offset loads and stores in ps_l* and ps_st* caused internal compiler
errors.

I couldn't reuse the logic from other immediate-offset code because:

  1. The register field and the offset field are not right next to each other.
  2. The existing memri* asm operands do not cover 12-bit offsets that may be unaligned.
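
The two points above can be illustrated with a small field decoder for an immediate-offset load such as psq_l. The bit positions below are my reading of the psq_l encoding (primary opcode 56, then frD, rA, W, I and a signed 12-bit displacement) and are an assumption, not something copied from the patch. The `psq_fields` struct and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed psq_l layout, in LSB-0 bit numbering (not taken from the patch):
 *   31-26 opcode (56) | 25-21 frD | 20-16 rA | 15 W | 14-12 I | 11-0 d
 * W and I sit between the base register rA and the displacement d, and d is
 * a signed 12-bit byte offset with no alignment requirement. */
typedef struct {
    unsigned opcode, frd, ra, w, i;
    int32_t  disp;
} psq_fields;

static psq_fields psq_decode(uint32_t insn) {
    psq_fields f;
    f.opcode = insn >> 26;
    f.frd    = (insn >> 21) & 0x1Fu;
    f.ra     = (insn >> 16) & 0x1Fu;
    f.w      = (insn >> 15) & 0x1u;
    f.i      = (insn >> 12) & 0x7u;
    f.disp   = (int32_t)(insn & 0xFFFu);
    if (f.disp & 0x800)          /* sign-extend the 12-bit displacement */
        f.disp -= 0x1000;
    return f;
}

static uint32_t psq_l_encode(unsigned frd, unsigned ra, unsigned w,
                             unsigned i, int32_t disp) {
    assert(frd < 32 && ra < 32 && w < 2 && i < 8);
    assert(disp >= -2048 && disp <= 2047);   /* must fit in 12 signed bits */
    return (56u << 26) | (frd << 21) | (ra << 16) | (w << 15) | (i << 12) |
           ((uint32_t)disp & 0xFFFu);
}
```

With this layout, a base-plus-offset operand cannot be extracted as one contiguous bit range, and odd displacements such as -7 are legal, which is consistent with needing a dedicated operand type rather than reusing memri*.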

This commit also fixes the test suite for PowerPC by putting the new
instructions into a separate namespace (called Paired).