Index: llvm/docs/LangRef.rst =================================================================== --- llvm/docs/LangRef.rst +++ llvm/docs/LangRef.rst @@ -15496,17 +15496,20 @@ ----------------- Operations on matrixes requiring shape information (like number of rows/columns -or the memory layout) can be expressed using the matrix intrinsics. Matrixes are -embedded in a flat vector and the intrinsics take the dimensions as arguments. -Currently column-major layout is assumed. The intrinsics support both integer -and floating point matrixes. +or the memory layout) can be expressed using the matrix intrinsics. These +intrinsics require matrix dimensions to be passed as immediate arguments, and +matrixes are passed and returned as vectors. This means that for an ``R`` x +``C`` matrix, element ``i`` of column ``j`` is at index ``j * R + i`` in its +vector, with indices starting at 0. Currently column-major layout is assumed. +The intrinsics support both integer and floating point matrixes. '``llvm.matrix.transpose.*``' Intrinsic -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Syntax: """"""" +This is an overloaded intrinsic. :: @@ -15515,21 +15518,24 @@ Overview: """"""""" -The '``llvm.matrix.transpose.*``' intrinsic treats %In as containing a matrix -with <Rows> rows and <Cols> columns and returns the transposed matrix embedded in -the result vector. +The '``llvm.matrix.transpose.*``' intrinsics treat %In as a <Rows> x <Cols> matrix +and return the transposed matrix in the result vector. Arguments: """""""""" -The <Rows> and <Cols> arguments must be constant integers. The vector argument -%In and the returned vector must have <Rows> * <Cols> elements. +First argument %In is a vector that corresponds to a <Rows> x <Cols> matrix. +Thus, arguments <Rows> and <Cols> correspond to the number of rows and columns, +respectively, and must be positive, constant integers. The returned vector must +have <Rows> * <Cols> elements, and have the same float or integer element type +as %In. 
'``llvm.matrix.multiply.*``' Intrinsic -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Syntax: """"""" +This is an overloaded intrinsic. :: @@ -15538,18 +15544,19 @@ Overview: """"""""" -The '``llvm.matrix.multiply.*``' intrinsic treats %A as a matrix with <OuterRows> -rows and <Inner> columns, %B as a matrix with <Inner> rows and <OuterColumns> -columns and multiplies them. The result matrix is returned embedded in the -result vector. +The '``llvm.matrix.multiply.*``' intrinsics treat %A as a <OuterRows> x <Inner> +matrix, %B as a <Inner> x <OuterColumns> matrix, and multiply them. The result +matrix is returned in the result vector. Arguments: """""""""" -The <OuterRows>, <Inner> and <OuterColumns> arguments must be constant -integers. The vector argument %A must have <OuterRows> * <Inner> elements, %B -must have <Inner> * <OuterColumns> elements and the returned vector must have -<OuterRows> * <OuterColumns> elements. +First vector argument %A corresponds to a matrix with <OuterRows> * <Inner> +elements, and second argument %B to a matrix with <Inner> * <OuterColumns> +elements. Arguments <OuterRows>, <Inner> and <OuterColumns> must be positive, +constant integers. The returned vector must have <OuterRows> * <OuterColumns> +elements. Vectors %A, %B, and the returned vector all have the same float or +integer element type. '``llvm.matrix.column.major.load.*``' Intrinsic @@ -15557,6 +15564,7 @@ Syntax: """"""" +This is an overloaded intrinsic. :: @@ -15566,22 +15574,26 @@ Overview: """"""""" -The '``llvm.matrix.column.major.load.*``' intrinsic loads a matrix with <Rows> -rows and <Cols> columns, using a stride of %Stride between columns. For two -consecutive columns A and B, %Stride refers to the distance (the number of -elements) between the start of column A and the start of column B. The result -matrix is returned embedded in the result vector. This allows for convenient -loading of sub matrixes. If <IsVolatile> is true, the intrinsic is considered -a :ref:`volatile memory access <volatile>`. - -If the %Ptr argument is known to be aligned to some boundary, this can be -specified as an attribute on the argument. 
+The '``llvm.matrix.column.major.load.*``' intrinsics load a <Rows> x <Cols> +matrix using a stride of %Stride to compute the start address of the different +columns. This allows for convenient loading of sub matrixes. If <IsVolatile> +is true, the intrinsic is considered a :ref:`volatile memory access +<volatile>`. The result matrix is returned in the result vector. If the %Ptr +argument is known to be aligned to some boundary, this can be specified as an +attribute on the argument. Arguments: """""""""" -The <IsVolatile>, <Rows> and <Cols> arguments must be constant integers. The -returned vector must have <Rows> * <Cols> elements. %Stride must be >= <Rows>. +First argument %Ptr is a pointer type to the returned vector type, and +corresponds to the start address to load from. Second argument %Stride is a +positive, constant integer with %Stride ``>=`` <Rows>. %Stride is used to compute +the column memory addresses. I.e., for a column ``C``, its start memory +address is calculated with %Ptr + ``C`` * %Stride. Third argument +<IsVolatile> is a boolean value. The fourth and fifth arguments, <Rows> and +<Cols>, correspond to the number of rows and columns, respectively, and must be +positive, constant integers. The returned vector must have <Rows> * <Cols> +elements. The :ref:`align <attr_align>` parameter attribute can be provided for the %Ptr arguments. @@ -15592,6 +15604,7 @@ Syntax: """"""" +This is an overloaded intrinsic. :: @@ -15601,12 +15614,10 @@ Overview: """"""""" -The '``llvm.matrix.column.major.store.*``' intrinsic stores the matrix with -<Rows> rows and <Cols> columns embedded in %In, using a stride of %Stride -between columns. For two consecutive columns A and B, %Stride refers to the -distance (the number of elements) between the start of column A and the start -of column B. If <IsVolatile> is true, the intrinsic is considered a -:ref:`volatile memory access <volatile>`. +The '``llvm.matrix.column.major.store.*``' intrinsics store the <Rows> x <Cols> +matrix in %In to memory using a stride of %Stride between columns. If +<IsVolatile> is true, the intrinsic is considered a :ref:`volatile memory +access <volatile>`. 
If the %Ptr argument is known to be aligned to some boundary, this can be specified as an attribute on the argument. @@ -15614,8 +15625,15 @@ Arguments: """""""""" -The <IsVolatile>, <Rows>, <Cols> arguments must be constant integers. The -vector argument %In must have <Rows> * <Cols> elements. %Stride must be >= <Rows>. +First argument %In is a vector that corresponds to a <Rows> x <Cols> matrix to be +stored to memory. Second argument %Ptr is a pointer type to the vector type of +%In, and is the start address of the matrix in memory. Third argument %Stride +is a positive, constant integer with %Stride ``>=`` <Rows>. %Stride is used to +compute the column memory addresses. I.e., for a column ``C``, its start memory +address is calculated with %Ptr + ``C`` * %Stride. Fourth argument +<IsVolatile> is a boolean value. Arguments <Rows> and <Cols> correspond to the +number of rows and columns, respectively, and must be positive, constant +integers. The :ref:`align <attr_align>` parameter attribute can be provided for the %Ptr arguments. Index: llvm/lib/IR/Verifier.cpp =================================================================== --- llvm/lib/IR/Verifier.cpp +++ llvm/lib/IR/Verifier.cpp @@ -5006,36 +5006,76 @@ case Intrinsic::matrix_transpose: case Intrinsic::matrix_column_major_load: case Intrinsic::matrix_column_major_store: { + Function *IF = Call.getCalledFunction(); + ConstantInt *Stride = nullptr; ConstantInt *NumRows; ConstantInt *NumColumns; - VectorType *TypeToCheck; + VectorType *ResultTy; + Type *Op0ElemTy = nullptr; + Type *Op1ElemTy = nullptr; switch (ID) { case Intrinsic::matrix_multiply: NumRows = cast<ConstantInt>(Call.getArgOperand(2)); NumColumns = cast<ConstantInt>(Call.getArgOperand(4)); - TypeToCheck = cast<VectorType>(Call.getType()); + ResultTy = cast<VectorType>(Call.getType()); + Op0ElemTy = + cast<VectorType>(Call.getArgOperand(0)->getType())->getElementType(); + Op1ElemTy = + cast<VectorType>(Call.getArgOperand(1)->getType())->getElementType(); break; case Intrinsic::matrix_transpose: NumRows = cast<ConstantInt>(Call.getArgOperand(1)); NumColumns = cast<ConstantInt>(Call.getArgOperand(2)); - TypeToCheck = cast<VectorType>(Call.getType()); + 
ResultTy = cast<VectorType>(Call.getType()); + Op0ElemTy = + cast<VectorType>(Call.getArgOperand(0)->getType())->getElementType(); break; - case Intrinsic::matrix_column_major_load: + case Intrinsic::matrix_column_major_load: { + Stride = dyn_cast<ConstantInt>(Call.getArgOperand(1)); NumRows = cast<ConstantInt>(Call.getArgOperand(3)); NumColumns = cast<ConstantInt>(Call.getArgOperand(4)); - TypeToCheck = cast<VectorType>(Call.getType()); + ResultTy = cast<VectorType>(Call.getType()); + auto *VecTy = cast<VectorType>( + cast<PointerType>(Call.getArgOperand(0)->getType())->getElementType()); + Op0ElemTy = VecTy->getElementType(); + } break; - case Intrinsic::matrix_column_major_store: + case Intrinsic::matrix_column_major_store: { + Stride = dyn_cast<ConstantInt>(Call.getArgOperand(2)); NumRows = cast<ConstantInt>(Call.getArgOperand(4)); NumColumns = cast<ConstantInt>(Call.getArgOperand(5)); - TypeToCheck = cast<VectorType>(Call.getArgOperand(0)->getType()); + ResultTy = cast<VectorType>(Call.getArgOperand(0)->getType()); + Op0ElemTy = + cast<VectorType>(Call.getArgOperand(0)->getType())->getElementType(); + auto *VecTy = cast<VectorType>( + cast<PointerType>(Call.getArgOperand(1)->getType())->getElementType()); + Op1ElemTy = VecTy->getElementType(); + } break; default: llvm_unreachable("unexpected intrinsic"); } - Assert(TypeToCheck->getNumElements() == + + Assert(ResultTy->getElementType()->isIntegerTy() || + ResultTy->getElementType()->isFloatingPointTy(), + "Result type must be an integer or floating-point type!", IF); + + Assert(ResultTy->getElementType() == Op0ElemTy, + "Vector element type mismatch of the result and first operand " + "vector!", IF); + + if (Op1ElemTy) + Assert(ResultTy->getElementType() == Op1ElemTy, + "Type mismatch of the result and second operand vector!", IF); + + Assert(ResultTy->getNumElements() == NumRows->getZExtValue() * NumColumns->getZExtValue(), "result of a matrix operation does not fit in the returned vector"); + + if (Stride) + Assert(Stride->getZExtValue() >= NumRows->getZExtValue(), + "Stride must be greater or equal than the number of rows!", IF); + break; } }; Index: llvm/test/Verifier/matrix-intrinsics.ll 
=================================================================== --- llvm/test/Verifier/matrix-intrinsics.ll +++ llvm/test/Verifier/matrix-intrinsics.ll @@ -64,3 +64,102 @@ call void @llvm.matrix.column.major.store.v6f32.p0v6f32(<6 x float> zeroinitializer, <6 x float>* %n, i64 %arg, i1 false, i32 3, i32 3) ret void } + +declare <4 x float> @llvm.matrix.transpose.v4f32.v4i32(<4 x i32>, i32, i32) +declare <4 x i32> @llvm.matrix.transpose.v4i32.v4f32(<4 x float>, i32, i32) + +define <4 x float> @transpose_mixed_types(<4 x float> %fvec, <4 x i32> %ivec, i32 %arg) { +; +; CHECK-NEXT: Intrinsic has incorrect argument type! +; CHECK-NEXT: <4 x float> (<4 x i32>, i32, i32)* @llvm.matrix.transpose.v4f32.v4i32 +; CHECK-NEXT: Intrinsic has incorrect argument type! +; CHECK-NEXT: <4 x i32> (<4 x float>, i32, i32)* @llvm.matrix.transpose.v4i32.v4f32 +; + %result.0 = call <4 x float> @llvm.matrix.transpose.v4f32.v4i32(<4 x i32> %ivec, i32 0, i32 0) + %result.1 = call <4 x i32> @llvm.matrix.transpose.v4i32.v4f32(<4 x float> %result.0, i32 3, i32 2) + ret <4 x float> %result.0 +} + +declare <4 x i32> @llvm.matrix.multiply.v4i32.v4f32.v4f32(<4 x float>, <4 x float>, i32, i32, i32) +declare <4 x float> @llvm.matrix.multiply.v4f32.v4i32.v4f32(<4 x i32>, <4 x float>, i32, i32, i32) +declare <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4i32(<4 x float>, <4 x i32>, i32, i32, i32) +declare <4 x float> @llvm.matrix.multiply.v4f32.v4i32.v4i32(<4 x i32>, <4 x i32>, i32, i32, i32) + +define <4 x float> @multiply_mixed_types(<4 x i32> %ivec, <4 x float> %fvec, i32 %arg) { +; +; CHECK-NEXT: Vector element type mismatch of the result and first operand vector! +; CHECK-NEXT: <4 x i32> (<4 x float>, <4 x float>, i32, i32, i32)* @llvm.matrix.multiply.v4i32.v4f32.v4f32 +; CHECK-NEXT: Vector element type mismatch of the result and first operand vector! 
+; CHECK-NEXT: <4 x float> (<4 x i32>, <4 x float>, i32, i32, i32)* @llvm.matrix.multiply.v4f32.v4i32.v4f32 +; CHECK-NEXT: Type mismatch of the result and second operand vector! +; CHECK-NEXT: <4 x float> (<4 x float>, <4 x i32>, i32, i32, i32)* @llvm.matrix.multiply.v4f32.v4f32.v4i32 +; CHECK-NEXT: Vector element type mismatch of the result and first operand vector! +; CHECK-NEXT: <4 x float> (<4 x i32>, <4 x i32>, i32, i32, i32)* @llvm.matrix.multiply.v4f32.v4i32.v4i32 +; + %result.0 = call <4 x i32> @llvm.matrix.multiply.v4i32.v4f32.v4f32(<4 x float> %fvec, <4 x float> %fvec, i32 2, i32 2, i32 2) + %result.1 = call <4 x float> @llvm.matrix.multiply.v4f32.v4i32.v4f32(<4 x i32> %result.0, <4 x float> %fvec, i32 2, i32 2, i32 2) + %result.2 = call <4 x float> @llvm.matrix.multiply.v4f32.v4f32.v4i32(<4 x float> %fvec, <4 x i32> %ivec, i32 2, i32 2, i32 2) + %result.3 = call <4 x float> @llvm.matrix.multiply.v4f32.v4i32.v4i32(<4 x i32> %ivec, <4 x i32> %ivec, i32 2, i32 2, i32 2) + ret <4 x float> %result.3 +} + +declare <4 x float> @llvm.matrix.column.major.load.v4f32.p0v4i32(<4 x i32>*, i64, i1, i32, i32) +declare <4 x i32> @llvm.matrix.column.major.load.v4i32.p0v4f32(<4 x float>*, i64, i1, i32, i32) + +define <4 x float> @column.major_load_mixed_types(<4 x i32>* %m, <4 x float>* %n, i32 %arg) { +; +; CHECK-NEXT: Vector element type mismatch of the result and first operand vector! +; CHECK-NEXT: <4 x float> (<4 x i32>*, i64, i1, i32, i32)* @llvm.matrix.column.major.load.v4f32.p0v4i32 +; CHECK-NEXT: Vector element type mismatch of the result and first operand vector! 
+; CHECK-NEXT: <4 x i32> (<4 x float>*, i64, i1, i32, i32)* @llvm.matrix.column.major.load.v4i32.p0v4f32 +; + %result.0 = call <4 x float> @llvm.matrix.column.major.load.v4f32.p0v4i32(<4 x i32>* %m, i64 2, i1 false, i32 2, i32 2) + %result.1 = call <4 x i32> @llvm.matrix.column.major.load.v4i32.p0v4f32(<4 x float>* %n, i64 2, i1 false, i32 2, i32 2) + ret <4 x float> %result.0 +} + +declare void @llvm.matrix.column.major.store.v4i32.p0v4f32(<4 x i32>, <4 x float>*, i64, i1, i32, i32) +declare void @llvm.matrix.column.major.store.v4f32.p0v4i32(<4 x float>, <4 x i32>*, i64, i1, i32, i32) + +define void @column.major_store_mixed_types(<4 x float>* %m, <4 x i32>* %n, i64 %arg) { +; +; CHECK-NEXT: Type mismatch of the result and second operand vector! +; CHECK-NEXT: void (<4 x i32>, <4 x float>*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4i32.p0v4f32 +; CHECK-NEXT: Type mismatch of the result and second operand vector! +; CHECK-NEXT: void (<4 x float>, <4 x i32>*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4f32.p0v4i32 +; + call void @llvm.matrix.column.major.store.v4i32.p0v4f32(<4 x i32> zeroinitializer, <4 x float>* %m, i64 2, i1 false, i32 2, i32 2) + call void @llvm.matrix.column.major.store.v4f32.p0v4i32(<4 x float> zeroinitializer, <4 x i32>* %n, i64 2, i1 false, i32 2, i32 2) + ret void +} + +declare void @llvm.matrix.column.major.store.v4f32p0.p0v4f32(<4 x float*>, <4 x float>*, i64, i1, i32, i32) + +define void @column.major_store_non_int_float_type(<4 x float>* %m, <4 x float>* %n, i64 %arg) { +; +; CHECK-NEXT: Result type must be an integer or floating-point type! 
+; CHECK-NEXT: void (<4 x float*>, <4 x float>*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4p0f32.p0v4f32 +; + call void @llvm.matrix.column.major.store.v4f32p0.p0v4f32(<4 x float*> zeroinitializer, <4 x float>* %n, i64 2, i1 false, i32 2, i32 2) + ret void +} + +define <4 x float> @column.major_load_stride_too_small(<4 x float>* %m, i32 %arg) { +; +; CHECK-NEXT: Stride must be greater or equal than the number of rows! +; CHECK-NEXT: <4 x float> (<4 x float>*, i64, i1, i32, i32)* @llvm.matrix.column.major.load.v4f32.p0v4f32 +; + %result.1 = call <4 x float> @llvm.matrix.column.major.load.v4f32.p0v4f32(<4 x float>* %m, i64 1, i1 false, i32 2, i32 2) + ret <4 x float> %result.1 +} + +define void @column.major_store_stride_too_small(<4 x float>* %m, i64 %arg) { +; +; CHECK-NEXT: Stride must be greater or equal than the number of rows! +; CHECK-NEXT: void (<4 x float>, <4 x float>*, i64, i1, i32, i32)* @llvm.matrix.column.major.store.v4f32.p0v4f32 +; + call void @llvm.matrix.column.major.store.v4f32.p0v4f32(<4 x float> zeroinitializer, <4 x float>* %m, i64 1, i1 false, i32 2, i32 2) + ret void +} + +
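Note on the layout rules in the LangRef hunks: the flat-vector index rule (``j * R + i``) and the column start rule (%Ptr + ``C`` * %Stride) can be checked with a small standalone model. The following is only a sketch and not part of the patch. The helper names `flatIndex`, `columnStart` and `loadColumnMajor` are hypothetical and do not exist in LLVM, and %Stride is assumed to count elements, as the pre-patch wording describes it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: index of element (Row, Col) of a matrix with Rows rows
// that is embedded column-major in a flat vector (the "j * R + i" rule).
inline std::size_t flatIndex(std::size_t Row, std::size_t Col,
                             std::size_t Rows) {
  assert(Row < Rows && "row index out of range");
  return Col * Rows + Row;
}

// Hypothetical helper: offset from the base pointer at which column Col starts
// (the "%Ptr + C * %Stride" rule), assuming Stride is measured in elements.
inline std::size_t columnStart(std::size_t Col, std::size_t Stride,
                               std::size_t Rows) {
  // Mirrors the verifier check added by the patch: Stride >= number of rows.
  assert(Stride >= Rows && "stride must be >= the number of rows");
  return Col * Stride;
}

// Hypothetical model of a column-major load: gathers a Rows x Cols matrix from
// Memory, whose columns start Stride elements apart, into a flat vector.
template <typename T>
std::vector<T> loadColumnMajor(const std::vector<T> &Memory,
                               std::size_t Stride, std::size_t Rows,
                               std::size_t Cols) {
  assert(Rows > 0 && Cols > 0 && "dimensions must be positive");
  assert(Memory.size() >= columnStart(Cols - 1, Stride, Rows) + Rows &&
         "memory too small for the requested matrix");
  std::vector<T> Result(Rows * Cols);
  for (std::size_t J = 0; J < Cols; ++J)
    for (std::size_t I = 0; I < Rows; ++I)
      Result[flatIndex(I, J, Rows)] = Memory[columnStart(J, Stride, Rows) + I];
  return Result;
}
```

Under these assumptions, loading a 2 x 3 sub-matrix with a stride of 4 from a buffer holding the values 0 through 11 reads elements 0, 1, 4, 5, 8 and 9, because the three columns start at offsets 0, 4 and 8.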