diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -13,6 +13,7 @@
    BlockLanguageSpec
    Block-ABI-Apple
    AutomaticReferenceCounting
+   MatrixSupport
 
 Introduction
 ============
@@ -492,6 +493,12 @@
   'select', they operate somewhat differently. OpenCL selects based on signedness of
   the condition operands, but GCC vectors use normal bool conversions (that is, != 0).
 
+Matrixes
+========
+
+Clang provides a matrix extension, which is currently being implemented. See
+:ref:`matrixsupport` for more details.
+
 Half-Precision Floating Point
 =============================
 
diff --git a/clang/docs/MatrixSupport.rst b/clang/docs/MatrixSupport.rst
new file mode 100644
--- /dev/null
+++ b/clang/docs/MatrixSupport.rst
@@ -0,0 +1,305 @@
+==================
+Matrix Support
+==================
+
+.. contents::
+   :local:
+
+.. _matrixsupport:
+
+Clang provides a language extension that allows users to express high-level
+matrix math on the C/C++ level. The draft specification can be found
+:ref:`below <matrixsupport-draftspec>`.
+
+Note that the implementation is currently in progress.
+
+.. _matrixsupport-draftspec:
+
+Draft Specification
+===================
+
+Matrix Type Attribute
+---------------------
+
+The *attribute-token* ``matrix_type`` is used to declare a matrix type. It shall
+appear at most once in each *attribute-list*. The attribute shall only appertain
+to a *typedef-name* of a typedef of a non-volatile type that is a *signed
+integer type*, an *unsigned integer type*, or a *floating-point type*. The
+element type must be unqualified and qualifiers are applied to the matrix type
+directly. An *attribute-argument-clause* must be present and it shall have the
+form:
+
+``(constant-expression, constant-expression)``
+
+Both *constant-expressions* shall be a positive non-zero integral constant
+expressions. The maximum of the product of the constants is implementation
+defined. If that implementation defined limit is exceeded, the program is
+ill-formed.
+
+An *attribute* of the form ``matrix_type(R, C)`` forms a matrix type with an
+element type of the cv-qualified type the attribute appertains to and *R* rows
+and *C* columns.
+
+If a declaration of a *typedef-name* has a ``matrix_type`` attribute, then all
+declaration of that *typedef-name* shall have a matrix_type attribute with the
+same element type, number of rows, and number of columns.
+
+Matrix Type
+-----------
+
+A matrix type has an underlying *element type*, a constant number of rows, and
+a constant number of columns. Matrix types with the same element type, rows,
+and columns are the same type. A value of a matrix type contains ``rows *
+columns`` values of the *element type* laid out in column-major order without
+padding in a way compatible with an array of at least that many elements of the
+underlying *element type*.
+
+A matrix type is a *scalar type* and its alignment is implementation defined.
+
+TODO: Does it make sense to allow M::element_type, M::rows, and M::columns
+where M is a matrix type? We don’t support this anywhere else, but it’s
+convenient. The alternative is using template deduction to extract this
+information. Also add spelling for C.
+
+TODO: Specify how matrix values are passed to functions.
+
+Future Work: Initialization syntax.
+
+Standard Conversions
+--------------------
+
+The standard conversions are extended as follows. Note that these conversions
+are intentionally not listed as satisfying the constraints for assignment,
+which is to say, they are only permitted as explicit casts, not as implicit
+conversions.
+
+For integral promotions, floating-point promotion, integral conversions,
+floating-point conversions, and floating-integral conversions: apply the rules
+to the underlying type of the matrix type. The resulting type is a matrix type
+with that underlying element type. If the number of rows or columns differ
+between the original and resulting type, the program is ill-formed. Otherwise
+the resulting value is as follows:
+
+* If the original value was of matrix type, each element is converted element
+  by element.
+* If the original value was not of matrix type, each element takes the value of
+  the original value.
+
+Arithmetic Conversions
+----------------------
+
+The usual arithmetic conversions are extended as follows.
+
+Insert at the start:
+
+* If both operands are of matrix type, no arithmetic conversion is performed.
+* If one operand is of matrix type and the other operand is of an integer or
+  floating point type, convert the integer or floating point operand to the
+  underlying element type of the operand of matrix type.
+
+Matrix Type Element Access Operator
+-----------------------------------
+
+An expression of the form ``postfix-expression [expression][expression]`` where
+the ``postfix-expression`` is of matrix type is a matrix element access
+expression. ``expression`` shall not be a comma expression, and shall be a
+prvalue of unscoped enumeration or integral type. Given the expression
+``E1[E2][E3]`` the result is an lvalue of the same type as the underlying
+element type of the matrix that refers to the value at E2 row and E3 column in
+the matrix. The expression E1 is sequenced before E2 and E3. The expressions
+E2 and E3 are unsequenced.
+
+**Note**: We considered providing an expression of the form
+``postfix-expression [expression]`` to access columns of a matrix. We think
+that such an expression would be problematic once both column and row major
+matrixes are supported: depending on the memory layout, either accessing columns
+or rows can be done efficiently, but not both. Instead, we propose to provide
+builtins to extract rows and columns from a matrix. This makes the operations
+more explicit.
+
+Matrix Type Binary Operators
+----------------------------
+
+Each matrix type supports the following binary operators: ``+``, ``-`` and ``*``. The ``*``
+operator provides matrix multiplication, while ``+`` and ``-`` are performed
+element-wise. There are also scalar versions of the operators, which take a
+matrix type and the underlying element type. The operation is applied to all
+elements of the matrix using the scalar value.
+
+The operands of ``+``, ``-`` and ``*`` shall have either matrix type, arithmetic or
+unscoped enumeration type. At least one of the operands shall be of matrix type.
+
+For ``BIN_OP`` in ``+``, ``-``, ``*`` given the expression ``M1 BIN_OP M2`` where, for
+``*``, one of M1 or M2 is of arithmetic type:
+
+* The usual arithmetic conversions are applied to M1 and M2. [ Note: if M1 or
+  M2 are of arithmetic type, they are broadcast to matrices here. — end note ]
+* The matrix types of M1 and M2 shall have the same number of rows and columns.
+* The result is equivalent to Res in the following where col is the number of
+  columns and row is the number of rows in the matrix type:
+
+.. code-block:: c++
+
+  decltype(M1) Res;
+  for (int C = 0; C < col; ++C)
+    for (int R = 0; R < row; ++R)
+      Res[R][C] = M1[R][C] BIN_OP M2[R][C];
+
+Given the expression ``M1 * M2`` where ``M1`` and ``M2`` are of matrix type:
+
+* The usual arithmetic conversions are applied to ``M1`` and ``M2``.
+* The type of ``M1`` shall have the same number of columns as the type of ``M2`` has
+  rows.
+* The resulting type, ``MTy``, is the result of applying the usual arithmetic
+  conversions to ``M1`` and ``M2``, but with the same number of rows as M1’s matrix
+  type and the same number of columns as M2’s matrix type.
+* The result is equivalent to ``Res`` in the following where ``EltTy`` is the
+  element type of ``MTy``, ``col`` is the number of columns and ``row`` is the
+  number of rows in ``MTy``:
+
+.. code-block:: c++
+
+  MTy Res;
+  for (int C = 0; C < col; ++C) {
+    for (int R = 0; R < row; ++R) {
+      EltTy Elt = 0;
+      for (int K = 0; K < inner; ++K) {
+        Elt += M1[R][K] * M2[K][C];
+    }
+    Res[R][C] = Elt;
+  }
+
+All operations on matrix types match the behavior of the underlying element
+type with respect to signed overflows.
+
+With respect to rounding errors, the the ``*`` operator preserves the behavior of
+the separate multiply and add operations by default. We propose to provide a
+Clang option to override this behavior and allow contraction of those
+operations (e.g. *-ffp-contract=matrix*).
+
+Matrix Type builtin Operations
+------------------------------
+
+Each matrix type supports a collection of builtin expressions that look like
+function calls but do not form an overload set. Here they are described as
+function declarations with rules for how to construct the argument list types
+and return type and the library description elements from
+[library.description.structure.specifications]/3 in the C++ standard.
+
+Definitions:
+
+* *M*, *M1*, *M2*, *M3* - Matrix types
+* *T* - Element type
+* *row*, *col* - Row and column arguments respectively.
+
+
+``M2 __builtin_matrix_transpose(M1 matrix)``
+
+**Remarks**: The return type is a cv-unqualified matrix type that has the same
+element type as ``M1`` and has the the same number of rows as ``M1`` has columns and
+the same number of columns as ``M1`` has rows.
+
+**Returns**: A matrix ``Res`` equivalent to the code below, where ``col`` refers to the
+number of columns of ``M``, and ``row`` to the number of rows of ``M``.
+
+**Effects**: Equivalent to:
+
+.. code-block:: c++
+
+  M Res;
+  for (int C = 0; C < col; ++C)
+    for (int R = 0; R < row; ++R)
+      Res[C][R] = matrix[R][C];
+
+
+``M __builtin_matrix_column_load(T *ptr, int row, int col, int stride)``
+
+**Mandates**: ``row`` and ``col`` shall be integral constants greater than 0.
+
+**Preconditions**: ``stride >= row``.
+
+**Remarks**: The return type is a cv-unqualified matrix type with an element
+type of the cv-unqualified version of ``T`` and a number of rows and columns equal
+to ``row`` and ``col`` respectively.
+
+**Returns**: A matrix ``Res`` equivalent to:
+
+.. code-block:: c++
+
+  M Res;
+  for (int C = 0; C < col; ++C) {
+    for (int R = 0; R < row; ++K)
+      Res[R][C] = ptr[R];
+    ptr += stride
+  }
+
+
+``void __builtin_matrix_column_store(M matrix, T *ptr, int stride)``
+
+**Preconditions**: ``stride`` is greater than or equal to the number of rows in ``M``.
+
+**Effects**: Equivalent to:
+
+.. code-block:: c++
+
+  for (int C = 0; C < columns in M; ++C) {
+    for (int R = 0; R < rows in M; ++K)
+      ptr[R] = matrix[R][C];
+    ptr += stride
+  }
+
+**Remarks**: The type ``T`` is the const-unqualified version of the matrix
+argument’s element type.
+
+Example
+=======
+
+This code performs a matrix-multiply of two 4x4 *float* matrixes followed by an matrix addition:
+
+.. code-block:: c++
+
+  typedef float m4x4_t __attribute__((matrix_type(4, 4)));
+
+  void f(m4x4_t *a, m4x4_t *b, m4x4_t *c, m4x4_t *r) {
+    *r = *a +  (*b * *c);
+  }
+
+
+This will get lowered by Clang to the LLVM IR below. In our current
+implementation, we use LLVM’s array type as storage type for the matrix
+data. Before accessing the data, we cast the array to a vector type. This
+allows us to use the element width as alignment, without running into issues
+with LLVM’s large default alignment for vector types, which is problematic in
+structs.
+
+.. code::
+
+  define void @f([16 x float]* %a, [16 x float]* %b, [16 x float]* %c, [16 x float]* %r) #0 {
+  entry:
+    %a.addr = alloca [16 x float]*, align 8
+    %b.addr = alloca [16 x float]*, align 8
+    %c.addr = alloca [16 x float]*, align 8
+    %r.addr = alloca [16 x float]*, align 8
+    store [16 x float]* %a, [16 x float]** %a.addr, align 8
+    store [16 x float]* %b, [16 x float]** %b.addr, align 8
+    store [16 x float]* %c, [16 x float]** %c.addr, align 8
+    store [16 x float]* %r, [16 x float]** %r.addr, align 8
+    %0 = load [16 x float]*, [16 x float]** %a.addr, align 8
+    %1 = bitcast [16 x float]* %0 to <16 x float>*
+    %2 = load <16 x float>, <16 x float>* %1, align 4
+    %3 = load [16 x float]*, [16 x float]** %b.addr, align 8
+    %4 = bitcast [16 x float]* %3 to <16 x float>*
+    %5 = load <16 x float>, <16 x float>* %4, align 4
+    %6 = call <16 x float> @llvm.matrix.multiply.v16f32.v16f32.v16f32(<16 x float> %2, <16 x float> %5, i32 4, i32 4, i32 4)
+    %7 = load [16 x float]*, [16 x float]** %c.addr, align 8
+    %8 = bitcast [16 x float]* %7 to <16 x float>*
+    %9 = load <16 x float>, <16 x float>* %8, align 4
+    %10 = fadd <16 x float> %6, %9
+    %11 = load [16 x float]*, [16 x float]** %r.addr, align 8
+    %12 = bitcast [16 x float]* %11 to <16 x float>*
+    store <16 x float> %10, <16 x float>* %12, align 4
+    ret void
+  }
+  ; Function Attrs: nounwind readnone speculatable willreturn
+  declare <16 x float> @llvm.matrix.multiply.v16f32.v16f32.v16f32(<16 x float>, <16 x floa
+