The Paired Single extension is an extension to the PowerPC ISA and is
found on the PPC 750CL-series processors. These processors were most
notably used in Nintendo home consoles.
The Paired Single extension adds a simple version of SIMD to the
architecture. Each SIMD vector contains two 32-bit floats. Most of the
floating-point instructions supported by the 750CL are also implemented
in the Paired Single extension.
This initial patch adds the following intrinsic declarations (in
alphabetic order):

    v2f32 ppc_paired_l(const v2f32 *src, i1 word, i8 gqr);
    void ppc_paired_st(v2f32 val, v2f32 *dst, i1 word, i8 gqr);
    v2f32 ppc_paired_madds0(v2f32 a, v2f32 b, v2f32 c);
    v2f32 ppc_paired_madds1(v2f32 a, v2f32 b, v2f32 c);
    v2f32 ppc_paired_merge00(v2f32 a, v2f32 b);
    v2f32 ppc_paired_merge01(v2f32 a, v2f32 b);
    v2f32 ppc_paired_merge10(v2f32 a, v2f32 b);
    v2f32 ppc_paired_merge11(v2f32 a, v2f32 b);
    v2f32 ppc_paired_muls0(v2f32 a, v2f32 b);
    v2f32 ppc_paired_muls1(v2f32 a, v2f32 b);
    v2f32 ppc_paired_sel(v2f32 sel, v2f32 a, v2f32 bit);
    v2f32 ppc_paired_sum0(v2f32 a, v2f32 b, v2f32 c);
    v2f32 ppc_paired_sum1(v2f32 a, v2f32 b, v2f32 c);

- ppc_paired_l maps to psq_l, psq_lx, psq_lu, or psq_lux. It loads the paired single from the memory location.
- ppc_paired_st maps to psq_st, psq_stx, psq_stu, or psq_stux. It stores the paired single to the memory location.
- ppc_paired_madds0 and ppc_paired_madds1 compute the multiply-accumulate a * c + b, with the 1st or 2nd element of c, respectively, used for both elements.
- ppc_paired_merge[01][01] takes the 1st/2nd element of the first vector and the 1st/2nd element of the second vector and combines them into a new vector.
- ppc_paired_muls[01] multiplies a by the 1st/2nd element of b (depending on the name).
- ppc_paired_sel is a dynamic variant of the merge* functions. For each element, it takes the element of a if the corresponding control entry is smaller than 0, and the element of b otherwise.
- ppc_paired_sum[01] replaces the 1st/2nd element of c with the sum of the first element of a and the second element of b.
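
The element-wise behaviour described in the list above can be sketched as a scalar reference model in plain C. This is only an illustration of the semantics as worded in this description, not the code in the patch: the `v2f32` struct, the `ref_` function names, and the `idx` parameters (standing in for the 0/1 suffix of each intrinsic) are hypothetical. The hardware multiply-add may also round differently from a separate multiply and add.

```c
/* Hypothetical stand-in for the v2f32 vector type: e[0] is the 1st element. */
typedef struct { float e[2]; } v2f32;

/* madds0/madds1: a * c + b, with element idx of c used for both lanes. */
static v2f32 ref_madds(v2f32 a, v2f32 b, v2f32 c, int idx) {
    v2f32 r = {{ a.e[0] * c.e[idx] + b.e[0], a.e[1] * c.e[idx] + b.e[1] }};
    return r;
}

/* muls0/muls1: multiply both lanes of a by element idx of b. */
static v2f32 ref_muls(v2f32 a, v2f32 b, int idx) {
    v2f32 r = {{ a.e[0] * b.e[idx], a.e[1] * b.e[idx] }};
    return r;
}

/* merge[ia][ib]: element ia of a becomes lane 0, element ib of b lane 1. */
static v2f32 ref_merge(v2f32 a, v2f32 b, int ia, int ib) {
    v2f32 r = {{ a.e[ia], b.e[ib] }};
    return r;
}

/* sel: per lane, take a where the control entry is below 0, else b
 * (this follows the wording of the description above). */
static v2f32 ref_sel(v2f32 sel, v2f32 a, v2f32 b) {
    v2f32 r = {{ sel.e[0] < 0.0f ? a.e[0] : b.e[0],
                 sel.e[1] < 0.0f ? a.e[1] : b.e[1] }};
    return r;
}

/* sum0/sum1: copy c, then replace element idx with a[0] + b[1]. */
static v2f32 ref_sum(v2f32 a, v2f32 b, v2f32 c, int idx) {
    v2f32 r = c;
    r.e[idx] = a.e[0] + b.e[1];
    return r;
}
```

For example, with a = (1, 2), b = (10, 20) and c = (3, 4), the model gives (13, 26) for the madds0 case and (21, 4) for the sum0 case.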
[PowerPC] Add intrinsic for Broadway-exclusive dcbz_l
A feature that only exists on the Nintendo versions of the 750CL is the
locked cache. To support it, one instruction was added: dcbz_l.
[PowerPC] Add assembly support for Paired Single
In particular, this commit adds support for the 32 2x32-bit "paired
single" floating-point registers, changes the calling convention to
support paired singles, and adds all of the new instructions.
As far as I can tell, there is no standard ABI for paired singles. Other
compilers do not appear to support passing paired single types as
arguments, or to interact with the extension at all outside of inline
assembly.
This calling convention is based on the floating-point calling
convention, meaning that the first 8 arguments are passed in psf1-psf8
and psf14-psf31 are callee-saved registers. One requirement that is not
visible in the code is that GQR0 (SPR 912) has to be set to 0. There is
no GQR register class because the GQRs cannot be read or set from
userspace.
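
The register assignment described above can be sketched as two small C helpers. This is a minimal sketch of the rule as stated here, not the patch's calling-convention code: the constant and function names are hypothetical, and since this description does not say where arguments beyond the eighth go, the helper only reports that they are not in a register.

```c
#include <stdbool.h>

/* Hypothetical constants mirroring the numbers given in the text. */
enum {
    PS_FIRST_ARG_REG      = 1,   /* psf1 holds the first argument   */
    PS_NUM_ARG_REGS       = 8,   /* psf1..psf8 carry arguments      */
    PS_FIRST_CALLEE_SAVED = 14,  /* psf14..psf31 are callee-saved   */
    PS_LAST_CALLEE_SAVED  = 31
};

/* Returns the psf register number for the 0-based paired-single
 * argument index, or -1 if the argument is not passed in a register. */
static int ps_arg_register(int arg_index) {
    if (arg_index < 0 || arg_index >= PS_NUM_ARG_REGS)
        return -1;
    return PS_FIRST_ARG_REG + arg_index;
}

/* True if the given psf register must be preserved by the callee. */
static bool ps_is_callee_saved(int reg) {
    return reg >= PS_FIRST_CALLEE_SAVED && reg <= PS_LAST_CALLEE_SAVED;
}
```

Under these assumptions, the third paired-single argument (index 2) lands in psf3, and a function that clobbers psf20 has to save and restore it.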
Special thanks to GitHub user Tilka, who created a similar but much smaller patchset in 2014.
I'm currently working on getting codegen for the extension working. Many instructions already work, but codegen is not very useful at the current stage because loads and stores do not work yet.