diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -13,6 +13,7 @@
    BlockLanguageSpec
    Block-ABI-Apple
    AutomaticReferenceCounting
+   MatrixTypes
 
 Introduction
 ============
@@ -492,6 +493,13 @@
 'select', they operate somewhat differently. OpenCL selects based on signedness of the condition operands, but GCC vectors use normal bool conversions (that is, != 0).
 
+Matrix Types
+============
+
+Clang provides an extension for matrix types, which is currently being
+implemented. See :ref:`matrixtypes` for more details.
+
+
 Half-Precision Floating Point
 =============================
 
diff --git a/clang/docs/MatrixTypes.rst b/clang/docs/MatrixTypes.rst
new file mode 100644
--- /dev/null
+++ b/clang/docs/MatrixTypes.rst
@@ -0,0 +1,323 @@
+==================
+Matrix Types
+==================
+
+.. contents::
+   :local:
+
+.. _matrixtypes:
+
+Clang provides a C/C++ language extension that allows users to directly express
+fixed-size matrices as language values and perform arithmetic on them.
+
+This feature is currently experimental, and both its design and its
+implementation are in flux.
+
+Draft Specification
+===================
+
+Matrix Type
+-----------
+
+A matrix type is a scalar type with an underlying *element type*, a constant
+number of *rows*, and a constant number of *columns*. Matrix types with the same
+element type, rows, and columns are the same type. A value of a matrix type
+includes storage for ``rows * columns`` values of the *element type*. The
+internal layout, overall size and alignment are implementation-defined.
+A *matrix element type* must be a real type (as in C99 6.2.5p17) excluding
+enumeration types, or an implementation-defined half-precision floating point
+type; otherwise, the program is ill-formed.
+
+The maximum of the product of the number of rows and columns is
+implementation-defined.
+If that implementation-defined limit is exceeded, the
+program is ill-formed.
+
+Matrix Type Attribute
+---------------------
+
+Matrix types can be declared by adding the ``matrix_type`` attribute to the
+declaration of a *typedef* (or a C++ alias declaration). The underlying type
+of the *typedef* must be a valid matrix element type. The
+attribute takes two arguments, both of which must be integer constant
+expressions that evaluate to a value greater than zero. The first specifies the
+number of rows, and the second specifies the number of columns. The underlying
+type of the *typedef* becomes a matrix type with the given dimensions and an
+element type of the former underlying type.
+
+If a declaration of a *typedef-name* has a ``matrix_type`` attribute, then all
+declarations of that *typedef-name* shall have a ``matrix_type`` attribute with
+the same element type, number of rows, and number of columns.
+
+Standard Conversions
+--------------------
+
+The standard conversions are extended as follows. Note that these conversions
+are intentionally not listed as satisfying the constraints for assignment,
+which is to say, they are only permitted as explicit casts, not as implicit
+conversions.
+
+A value of matrix type can be converted to another matrix type if the number of
+rows and columns are the same and the value's elements can be converted to the
+element type of the result type. The result is a matrix where each element is
+the converted corresponding element.
+
+A value of non-matrix type can be converted to a matrix type if it can be
+converted to the element type of the matrix. The result is a matrix where
+all elements are the converted original value.
+
+If the number of rows or columns differs between the original and resulting
+type, the program is ill-formed.
+
+
+Arithmetic Conversions
+----------------------
+
+The usual arithmetic conversions are extended as follows.
+
+Insert at the start:
+
+* If both operands are of matrix type, no arithmetic conversion is performed.
+* If one operand is of matrix type and the other operand is of a valid matrix
+  element type, convert the non-matrix type operand to the matrix type
+  according to the standard conversion rules.
+
+Matrix Type Element Access Operator
+-----------------------------------
+
+An expression of the form ``E1 [E2] [E3]``, where ``E1`` has matrix type
+``cv M``, is a matrix element access expression. Let ``T`` be the element type
+of ``M``, and let ``R`` and ``C`` be the number of rows and columns in ``M``
+respectively. The index expressions shall have integral or unscoped
+enumeration type and shall not be uses of the comma operator unless
+parenthesized. The first index expression shall evaluate to a
+non-negative value less than ``R``, and the second index expression shall
+evaluate to a non-negative value less than ``C``, or else the expression has
+undefined behavior. If ``E1`` is a prvalue, the result is a prvalue with type
+``T`` and is the value of the element at the given row and column in the matrix.
+Otherwise, the result is a glvalue of type ``cv T`` that refers to the element
+at the given row and column in the matrix and has the same value category as
+``E1``.
+
+Programs containing a single subscript expression into a matrix are ill-formed.
+
+**Note**: We considered providing an expression of the form
+``postfix-expression [expression]`` to access columns of a matrix. We think
+that such an expression would be problematic once both column-major and
+row-major matrices are supported: depending on the memory layout, either
+accessing columns or rows can be done efficiently, but not both. Instead, we
+propose to provide builtins to extract rows and columns from a matrix. This
+makes the operations more explicit.
+
+Matrix Type Binary Operators
+----------------------------
+
+Each matrix type supports the following binary operators: ``+``, ``-`` and ``*``.
+The ``*`` operator provides matrix multiplication, while ``+`` and ``-`` are
+performed element-wise. There are also scalar versions of the operators, which
+take a matrix type and the underlying element type. The operation is applied to
+all elements of the matrix using the scalar value.
+
+For ``BIN_OP`` in ``+``, ``-``, ``*``, given the expression ``M1 BIN_OP M2``,
+where at least one of ``M1`` or ``M2`` is of matrix type and, for ``*``, the
+other is of arithmetic type:
+
+* The usual arithmetic conversions are applied to ``M1`` and ``M2``. [ Note: if ``M1`` or
+  ``M2`` is of arithmetic type, it is broadcast to a matrix here. — end note ]
+* ``M1`` and ``M2`` shall be of the same matrix type.
+* The result is equivalent to ``Res`` in the following, where ``col`` is the
+  number of columns and ``row`` is the number of rows in the matrix type:
+
+.. code-block:: c++
+
+  decltype(M1) Res;
+  for (int C = 0; C < col; ++C)
+    for (int R = 0; R < row; ++R)
+      Res[R][C] = M1[R][C] BIN_OP M2[R][C];
+
+Given the expression ``M1 * M2`` where ``M1`` and ``M2`` are of matrix type:
+
+* The usual arithmetic conversions are applied to ``M1`` and ``M2``.
+* The type of ``M1`` shall have the same number of columns as the type of ``M2`` has
+  rows. The element types of ``M1`` and ``M2`` shall be the same type.
+* The resulting type, ``MTy``, is a matrix type with the common element type,
+  the number of rows of ``M1`` and the number of columns of ``M2``.
+* The result is equivalent to ``Res`` in the following, where ``EltTy`` is the
+  element type of ``MTy``, ``col`` is the number of columns, ``row`` is the
+  number of rows in ``MTy`` and ``inner`` is the number of columns of ``M1``:
+
+.. code-block:: c++
+
+  MTy Res;
+  for (int C = 0; C < col; ++C) {
+    for (int R = 0; R < row; ++R) {
+      EltTy Elt = 0;
+      for (int K = 0; K < inner; ++K) {
+        Elt += M1[R][K] * M2[K][C];
+      }
+      Res[R][C] = Elt;
+    }
+  }
+
+All operations on matrix types match the behavior of the underlying element
+type with respect to signed overflow.
+
+With respect to floating-point contraction, rounding and environment rules,
+operations on matrix types match the behavior of the element-wise operations
+in the corresponding expansions provided above.
+
+For the ``+=``, ``-=`` and ``*=`` operators, the semantics match their expanded
+variants.
+
+Matrix Type Builtin Operations
+------------------------------
+
+Each matrix type supports a collection of builtin expressions that look like
+function calls but do not form an overload set. Here they are described as
+function declarations with rules for how to construct the argument list types
+and return type and the library description elements from
+[library.description.structure.specifications]/3 in the C++ standard.
+
+Definitions:
+
+* *M*, *M1*, *M2*, *M3* - Matrix types
+* *T* - Element type
+* *row*, *col* - Row and column arguments respectively.
+
+
+``M2 __builtin_matrix_transpose(M1 matrix)``
+
+**Remarks**: The return type is a cv-unqualified matrix type that has the same
+element type as ``M1`` and has the same number of rows as ``M1`` has columns and
+the same number of columns as ``M1`` has rows.
+
+**Returns**: A matrix ``Res`` equivalent to the code below, where ``col`` refers to the
+number of columns of ``M1``, and ``row`` to the number of rows of ``M1``.
+
+**Effects**: Equivalent to:
+
+.. code-block:: c++
+
+  M2 Res;
+  for (int C = 0; C < col; ++C)
+    for (int R = 0; R < row; ++R)
+      Res[C][R] = matrix[R][C];
+
+
+``M __builtin_matrix_column_major_load(T *ptr, int row, int col, int columnStride)``
+
+**Mandates**: ``row`` and ``col`` shall be integral constants greater than 0.
+
+**Preconditions**: ``columnStride`` is greater than or equal to ``row``.
+
+**Remarks**: The return type is a cv-unqualified matrix type with an element
+type of the cv-unqualified version of ``T`` and a number of rows and columns equal
+to ``row`` and ``col`` respectively. The parameter ``columnStride`` is optional
+and, if omitted, ``row`` is used as ``columnStride``.
+
+**Returns**: A matrix ``Res`` equivalent to:
+
+.. code-block:: c++
+
+  M Res;
+  for (int C = 0; C < col; ++C) {
+    for (int R = 0; R < row; ++R)
+      Res[R][C] = ptr[R];
+    ptr += columnStride;
+  }
+
+
+``void __builtin_matrix_column_major_store(M matrix, T *ptr, int columnStride)``
+
+**Preconditions**: ``columnStride`` is greater than or equal to the number of rows in ``M``.
+
+**Remarks**: The type ``T`` is the const-unqualified version of the matrix
+argument’s element type. The parameter ``columnStride`` is optional and, if
+omitted, the number of rows of ``M`` is used as ``columnStride``.
+
+**Effects**: Equivalent to the following, where ``rows`` and ``columns`` are the
+number of rows and columns of ``M``:
+
+.. code-block:: c++
+
+  for (int C = 0; C < columns; ++C) {
+    for (int R = 0; R < rows; ++R)
+      ptr[R] = matrix[R][C];
+    ptr += columnStride;
+  }
+
+
+TODOs
+-----
+
+TODO: Does it make sense to allow M::element_type, M::rows, and M::columns
+where M is a matrix type? We don’t support this anywhere else, but it’s
+convenient. The alternative is using template deduction to extract this
+information. Also add spelling for C.
+
+Future Work: Initialization syntax.
+
+
+Decisions for the Implementation in Clang
+=========================================
+
+This section details decisions taken for the implementation in Clang and is not
+part of the draft specification.
+
+The elements of a value of a matrix type are laid out in column-major order
+without padding.
+
+We propose to provide a Clang option to override the floating-point contraction
+behavior specified above and allow contraction of matrix operations
+(e.g. *-ffp-contract=matrix*).
+
+TODO: Specify how matrix values are passed to functions.
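The column-major, padding-free layout decided above, together with the ``columnStride`` parameter of the load and store builtins, can be sketched in standard C++ without using the extension at all. This is a minimal sketch under stated assumptions, not Clang's implementation: ``ColMajor``, ``column_major_load`` and ``column_major_store`` are hypothetical names chosen for illustration (they are not Clang APIs). Element ``(row, col)`` of an ``R x C`` matrix is assumed to live at flat index ``col * R + row``, and the two helpers follow the loops given in the Returns and Effects clauses of ``__builtin_matrix_column_major_load`` and ``__builtin_matrix_column_major_store``.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical plain-C++ model of an R x C matrix value. It is NOT the Clang
// matrix type; it only mirrors the column-major, padding-free layout.
template <typename T, std::size_t R, std::size_t C>
struct ColMajor {
  std::array<T, R * C> data{};  // exactly R * C elements, no padding

  // Element (row, col) is stored at flat index col * R + row.
  T &at(std::size_t row, std::size_t col) { return data[col * R + row]; }
  const T &at(std::size_t row, std::size_t col) const {
    return data[col * R + row];
  }
};

// Hypothetical helper mirroring the Returns clause of
// __builtin_matrix_column_major_load: column c of the result is read from
// ptr + c * columnStride. If omitted, columnStride defaults to R.
template <typename T, std::size_t R, std::size_t C>
ColMajor<T, R, C> column_major_load(const T *ptr, std::size_t columnStride = R) {
  assert(columnStride >= R && "precondition: columnStride >= rows");
  ColMajor<T, R, C> res;
  for (std::size_t c = 0; c < C; ++c) {
    for (std::size_t r = 0; r < R; ++r)
      res.at(r, c) = ptr[r];
    ptr += columnStride;  // advance to the first element of the next column
  }
  return res;
}

// Hypothetical helper mirroring the Effects clause of
// __builtin_matrix_column_major_store. When columnStride > R, the elements
// between the end of one column and the start of the next are left untouched.
template <typename T, std::size_t R, std::size_t C>
void column_major_store(const ColMajor<T, R, C> &m, T *ptr,
                        std::size_t columnStride = R) {
  assert(columnStride >= R && "precondition: columnStride >= rows");
  for (std::size_t c = 0; c < C; ++c) {
    for (std::size_t r = 0; r < R; ++r)
      ptr[r] = m.at(r, c);
    ptr += columnStride;
  }
}
```

For example, ``column_major_load<float, 2, 3>(buf, 4)`` reads ``buf[0..1]``, ``buf[4..5]`` and ``buf[8..9]`` as the three columns of a 2x3 matrix. Storing that matrix back with the default stride writes six contiguous floats, one column after another.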
+
+Example
+=======
+
+This code performs a matrix multiplication of two 4x4 *float* matrices followed
+by a matrix addition:
+
+.. code-block:: c++
+
+  typedef float m4x4_t __attribute__((matrix_type(4, 4)));
+
+  void f(m4x4_t *a, m4x4_t *b, m4x4_t *c, m4x4_t *r) {
+    *r = (*a * *b) + *c;
+  }
+
+
+This will get lowered by Clang to the LLVM IR below. In our current
+implementation, we use LLVM’s array type as storage type for the matrix
+data. Before accessing the data, we cast the array to a vector type. This
+allows us to use the element width as alignment, without running into issues
+with LLVM’s large default alignment for vector types, which is problematic in
+structs.
+
+.. code::
+
+  define void @f([16 x float]* %a, [16 x float]* %b, [16 x float]* %c, [16 x float]* %r) #0 {
+  entry:
+    %a.addr = alloca [16 x float]*, align 8
+    %b.addr = alloca [16 x float]*, align 8
+    %c.addr = alloca [16 x float]*, align 8
+    %r.addr = alloca [16 x float]*, align 8
+    store [16 x float]* %a, [16 x float]** %a.addr, align 8
+    store [16 x float]* %b, [16 x float]** %b.addr, align 8
+    store [16 x float]* %c, [16 x float]** %c.addr, align 8
+    store [16 x float]* %r, [16 x float]** %r.addr, align 8
+    %0 = load [16 x float]*, [16 x float]** %a.addr, align 8
+    %1 = bitcast [16 x float]* %0 to <16 x float>*
+    %2 = load <16 x float>, <16 x float>* %1, align 4
+    %3 = load [16 x float]*, [16 x float]** %b.addr, align 8
+    %4 = bitcast [16 x float]* %3 to <16 x float>*
+    %5 = load <16 x float>, <16 x float>* %4, align 4
+    %6 = call <16 x float> @llvm.matrix.multiply.v16f32.v16f32.v16f32(<16 x float> %2, <16 x float> %5, i32 4, i32 4, i32 4)
+    %7 = load [16 x float]*, [16 x float]** %c.addr, align 8
+    %8 = bitcast [16 x float]* %7 to <16 x float>*
+    %9 = load <16 x float>, <16 x float>* %8, align 4
+    %10 = fadd <16 x float> %6, %9
+    %11 = load [16 x float]*, [16 x float]** %r.addr, align 8
+    %12 = bitcast [16 x float]* %11 to <16 x float>*
+    store <16 x float> %10, <16 x float>* %12, align 4
+    ret void
+  }
+  ; Function Attrs: nounwind readnone speculatable willreturn
+  declare <16 x float> @llvm.matrix.multiply.v16f32.v16f32.v16f32(<16 x float>, <16 x floa
+