This is an archive of the discontinued LLVM Phabricator instance.

mlir/include/mlir/Dialect/GPU/GPUOps.td
1043–1044	This should be changed/dropped I think.
1056	This doesn't match with the valid types of mmaMatrixType.
mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNvvm.cpp
351–352	This comment needs to be updated.

This revision now requires changes to proceed. Jun 9 2021, 9:27 AM

Address review comments

ThomasRaoux marked 2 inline comments as done. Jun 9 2021, 9:44 AM

ThomasRaoux added inline comments.

mlir/include/mlir/Dialect/GPU/GPUOps.td
1043–1044	Changed it to only mention gpu.subgroup_mma_compute

In D103870#2808312, @navdeepkk wrote:

This is a great addition. We could also bring in a scaling op, which scales an mmaMatrix by a given value. Maybe I can take that up.

It would be nice to be able to handle most of the element-wise ops. Ideally we should reuse the std ops, but it looks like this would require infrastructure changes to MLIR (https://llvm.discourse.group/t/using-gpu-type-with-standard-ops/3542/2). The best short-term solution is probably to add an op taking an attribute like GPU_AllReduceOperationAttr. This is a bit hacky, but it would allow us to generate interesting code using the mma ops.
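To make the "one op plus an attribute" idea concrete, here is a minimal host-side C++ model. It is not MLIR code and not the eventual design: `Fragment`, `ElementwiseKind` and `applyElementwise` are hypothetical names. A single enum (playing the role that an attribute in the spirit of GPU_AllReduceOperationAttr would play) selects the operation applied slot by slot to an opaque per-thread fragment. Because every slot is treated identically, the result does not depend on how the fragment maps to matrix coordinates.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the per-thread slice of a !gpu.mma_matrix: a
// fixed-size bag of values whose mapping to matrix coordinates is opaque.
template <typename T, std::size_t N>
using Fragment = std::array<T, N>;

// Hypothetical attribute choosing the element-wise semantics.
enum class ElementwiseKind { Add, Mul, Max };

// A single "op" whose behavior is picked by the attribute. Every slot gets
// the same treatment, so the result does not depend on the opaque layout.
template <typename T, std::size_t N>
Fragment<T, N> applyElementwise(ElementwiseKind kind, const Fragment<T, N> &a,
                                const Fragment<T, N> &b) {
  Fragment<T, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    switch (kind) {
    case ElementwiseKind::Add:
      result[i] = a[i] + b[i];
      break;
    case ElementwiseKind::Mul:
      result[i] = a[i] * b[i];
      break;
    case ElementwiseKind::Max:
      result[i] = a[i] > b[i] ? a[i] : b[i];
      break;
    }
  }
  return result;
}
```

Under this model, the scaling op suggested above would simply be the `Mul` case applied against a fragment filled with the scale factor.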

Harbormaster completed remote builds in B108439: Diff 350929. Jun 9 2021, 10:33 AM

In D103870#2808376, @ThomasRaoux wrote:

It would be nice to be able to handle most of the element-wise ops. Ideally we should reuse the std ops, but it looks like this would require infrastructure changes to MLIR (https://llvm.discourse.group/t/using-gpu-type-with-standard-ops/3542/2). The best short-term solution is probably to add an op taking an attribute like GPU_AllReduceOperationAttr. This is a bit hacky, but it would allow us to generate interesting code using the mma ops.

Okay, that would allow us to use the same op and define the semantics of the pointwise op as we like. I will be happy to take this up in the near future. Let me know what you think.

This revision is now accepted and ready to land. Jun 10 2021, 8:12 AM

This revision was landed with ongoing or failed builds. Jun 10 2021, 8:34 AM

Closed by commit rG428a62f65f16: [mlir][gpu] Add op to create MMA constant matrix (authored by ThomasRaoux).

This revision was automatically updated to reflect the committed changes.

ThomasRaoux added a commit: rG428a62f65f16: [mlir][gpu] Add op to create MMA constant matrix.

In D103870#2810650, @navdeepkk wrote:

Okay, that would allow us to use the same op and define the semantics of the pointwise op as we like. I will be happy to take this up in the near future. Let me know what you think.

Yes, that would be great. Feel free to pick it up. I'll sync up with you when I get close to needing it, to make sure our timelines match, but for now I can live with what exists. My next step will most likely be adding transpose support (equivalent to the .col layout in the WMMA intrinsics).

Revision Contents

Path (size):
mlir/include/mlir/Dialect/GPU/GPUOps.td (45 lines)
mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNvvm.cpp (42 lines)
mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir (25 lines)
mlir/test/Dialect/GPU/ops.mlir (4 lines)

Diff 351185

mlir/include/mlir/Dialect/GPU/GPUOps.td

@@ … @@ def GPU_SubgroupMmaComputeOp : GPU_Op<"subgroup_mma_compute",
 
   let assemblyFormat = [{
     $opA`,` $opB`,` $opC attr-dict `:` type($opA)`,` type($opB) `->` type($res)
   }];
 
   let verifier = [{ return ::verify(*this); }];
 }
 
+def GPU_SubgroupMmaConstantMatrixOp : GPU_Op<"subgroup_mma_constant_matrix",
+    [NoSideEffect,
+     TypesMatchWith<"value type matches element type of mma_matrix",
+                    "res", "value",
+                    "$_self.cast<gpu::MMAMatrixType>().getElementType()">]>{
+
+  let summary = "GPU warp synchronous constant matrix";
+
+  let description = [{
+    The `gpu.subgroup_mma_constant_matrix` creates a `!gpu.mma_matrix` with
+    constant elements.
+
+    The operation takes a scalar input and returns a `!gpu.mma_matrix` where
+    each element is equal to the operand constant. The destination mma_matrix
+    type must have an element type equal to the constant type. Since the layout
+    of `!gpu.mma_matrix` is opaque, this only supports setting all the elements
+    to the same value.
+
+    This op is meant to be used along with `gpu.subgroup_mma_compute`.
+
+    Example:
+
+    ```mlir
+     %0 = gpu.subgroup_mma_constant_matrix %a :
+       !gpu.mma_matrix<16x16xf16, "AOp">
+     %1 = gpu.subgroup_mma_constant_matrix %b :
+       !gpu.mma_matrix<16x16xf32, "COp">
+    ```
+  }];
+
+  let arguments = (ins AnyTypeOf<[F16, F32]>:$value);
+
+  let results = (outs GPU_MMAMatrix:$res);
+
+  let extraClassDeclaration = [{
+    gpu::MMAMatrixType getType() {
+      return res().getType().cast<gpu::MMAMatrixType>();
+    }
+  }];
+
+  let assemblyFormat = [{
+    $value attr-dict `:` type($res)
+  }];
+}
+
 #endif // GPU_OPS
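The contract spelled out in the op description can be modeled on the host in a few lines. The sketch below is plain C++, not MLIR: `ElemType`, `MmaMatrixModel`, `makeConstantMatrix` and `elementAt` are hypothetical names, and values are held as `float` for simplicity. It mirrors, as a runtime check, the `TypesMatchWith` constraint (the scalar's type must equal the result's element type), and it shows why a splat is the only fill that is safe with an opaque layout: reading a logical element through a row-major or a column-major mapping returns the same value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical element-type tag; the op accepts f16 or f32 scalars.
enum class ElemType { F16, F32 };

// Hypothetical model of one mma_matrix: Rows x Cols values stored in an
// order the program may not rely on (an "opaque" layout).
template <std::size_t Rows, std::size_t Cols>
struct MmaMatrixModel {
  ElemType elemType;
  std::array<float, Rows * Cols> storage; // f16 values widened to float
};

// Model of gpu.subgroup_mma_constant_matrix: enforce the type-match rule,
// then write the same scalar into every storage slot.
template <std::size_t Rows, std::size_t Cols>
MmaMatrixModel<Rows, Cols> makeConstantMatrix(ElemType resultElemType,
                                              ElemType scalarType,
                                              float value) {
  if (scalarType != resultElemType)
    throw std::invalid_argument(
        "value type must match mma_matrix element type");
  MmaMatrixModel<Rows, Cols> m{resultElemType, {}};
  m.storage.fill(value);
  return m;
}

// Reads logical element (r, c) through a caller-chosen layout function.
// For a splat, the answer is the same whichever layout is used.
template <std::size_t Rows, std::size_t Cols, typename Layout>
float elementAt(const MmaMatrixModel<Rows, Cols> &m, std::size_t r,
                std::size_t c, Layout layout) {
  return m.storage[layout(r, c)];
}
```

A non-uniform fill would require knowing which slot holds which (row, column) pair, which is exactly what the opaque type hides.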

mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNvvm.cpp

@@ … @@ if (cType.getElementType().isF32()) {
         return success();
       }
       return rewriter.notifyMatchFailure(op, kInvalidCaseStr);
     }
     return failure();
   }
 };
 
+/// Convert GPU MMA ConstantMatrixOp to a chain of InsertValueOp.
+struct WmmaConstantOpToNVVMLowering
+    : public ConvertOpToLLVMPattern<gpu::SubgroupMmaConstantMatrixOp> {
+  using ConvertOpToLLVMPattern<
+      gpu::SubgroupMmaConstantMatrixOp>::ConvertOpToLLVMPattern;
+
+  LogicalResult
+  matchAndRewrite(gpu::SubgroupMmaConstantMatrixOp subgroupMmaConstantOp,
+                  ArrayRef<Value> operands,
+                  ConversionPatternRewriter &rewriter) const override {
+    if (failed(areAllLLVMTypes(subgroupMmaConstantOp.getOperation(), operands,
+                               rewriter)))
+      return failure();
+    Location loc = subgroupMmaConstantOp.getLoc();
+    Value cst = operands[0];
+    LLVM::LLVMStructType type = convertMMAToLLVMType(
+        subgroupMmaConstantOp.getType().cast<gpu::MMAMatrixType>());
+    // If the element type is a vector create a vector from the operand.
+    if (auto vecType = type.getBody()[0].dyn_cast<VectorType>()) {
+      Value vecCst = rewriter.create<LLVM::UndefOp>(loc, vecType);
+      for (int64_t vecEl = 0; vecEl < vecType.getNumElements(); vecEl++) {
+        Value idx = rewriter.create<LLVM::ConstantOp>(
+            loc, typeConverter->convertType(rewriter.getIntegerType(32)),
+            rewriter.getI32ArrayAttr(vecEl));
+        vecCst = rewriter.create<LLVM::InsertElementOp>(loc, vecType, vecCst,
+                                                        cst, idx);
+      }
+      cst = vecCst;
+    }
+    Value matrixStruct = rewriter.create<LLVM::UndefOp>(loc, type);
+    for (size_t i : llvm::seq(size_t(0), type.getBody().size())) {
+      matrixStruct = rewriter.create<LLVM::InsertValueOp>(
+          loc, matrixStruct, cst, rewriter.getI32ArrayAttr(i));
+    }
+    rewriter.replaceOp(subgroupMmaConstantOp, matrixStruct);
+    return success();
+  }
+};
+
 } // anonymous namespace
 
 namespace mlir {
 void populateGpuWMMAToNVVMConversionPatterns(LLVMTypeConverter &converter,
                                              RewritePatternSet &patterns) {
   patterns.insert<WmmaLoadOpToNVVMLowering, WmmaMmaOpToNVVMLowering,
-                  WmmaStoreOpToNVVMLowering>(converter);
+                  WmmaStoreOpToNVVMLowering, WmmaConstantOpToNVVMLowering>(
+      converter);
 }
 } // namespace mlir
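For the `!gpu.mma_matrix<16x16xf16, "COp">` case exercised by the conversion test, the CHECK lines show that the lowered value is an `!llvm.struct` of four `vector<2xf16>` members. The host-side C++ sketch below mirrors what the pattern emits for that case only; it does not use the MLIR or LLVM APIs, and `F16Bits`, `VecF16x2`, `FragmentStruct` and `lowerConstantMatrix` are hypothetical names. The scalar is first splatted into a vector lane by lane (the `insertelement` loop), and that single vector is then inserted into every struct member (the `insertvalue` loop). Other element types or operand kinds lower to differently shaped structs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side stand-ins for the LLVM types in the test:
// f16 is kept as its raw 16-bit pattern, vector<2xf16> as a 2-element
// array, and the fragment struct as four such vectors.
using F16Bits = std::uint16_t;
using VecF16x2 = std::array<F16Bits, 2>;
using FragmentStruct = std::array<VecF16x2, 4>;

// Mirrors the two loops of WmmaConstantOpToNVVMLowering::matchAndRewrite.
FragmentStruct lowerConstantMatrix(F16Bits cst) {
  // 1) Start from an "undef" vector and insert the scalar into each lane.
  VecF16x2 vecCst{};
  for (std::size_t vecEl = 0; vecEl < vecCst.size(); ++vecEl)
    vecCst[vecEl] = cst;
  // 2) Start from an "undef" struct and insert the same vector into each
  //    member position.
  FragmentStruct matrixStruct{};
  for (std::size_t i = 0; i < matrixStruct.size(); ++i)
    matrixStruct[i] = vecCst;
  return matrixStruct;
}
```

Because one splatted vector is reused for every member, this case needs only two `insertelement` ops and four `insertvalue` ops, which is the sequence the CHECK lines expect.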

mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir

@@ … @@ ^bb2: // pred: ^bb1
     %6 = gpu.subgroup_mma_compute %4, %5, %2 : !gpu.mma_matrix<16x16xf16, "AOp">, !gpu.mma_matrix<16x16xf16, "BOp"> -> !gpu.mma_matrix<16x16xf16, "COp">
     %7 = addi %1, %c32 : index
     br ^bb1(%7, %6 : index, !gpu.mma_matrix<16x16xf16, "COp">)
   ^bb3: // pred: ^bb1
     gpu.subgroup_mma_store_matrix %2, %arg2[%c0, %c0] {leadDimension = 128 : index} : !gpu.mma_matrix<16x16xf16, "COp">, memref<128x128xf16>
     return
   }
 }
+
+// -----
+
+gpu.module @test_module {
+
+  // CHECK-LABEL: func @gpu_wmma_constant_op
+  // CHECK: %[[CST:.+]] = llvm.mlir.constant(1.000000e+00 : f16) : f16
+  // CHECK: %[[V0:.+]] = llvm.mlir.undef : vector<2xf16>
+  // CHECK: %[[C0:.+]] = llvm.mlir.constant([0 : i32]) : i32
+  // CHECK: %[[V1:.+]] = llvm.insertelement %[[CST]], %[[V0]][%[[C0]] : i32] : vector<2xf16>
+  // CHECK: %[[C1:.+]] = llvm.mlir.constant([1 : i32]) : i32
+  // CHECK: %[[V2:.+]] = llvm.insertelement %[[CST]], %[[V1]][%[[C1]] : i32] : vector<2xf16>
+  // CHECK: %[[M0:.+]] = llvm.mlir.undef : !llvm.struct<(vector<2xf16>, vector<2xf16>, vector<2xf16>, vector<2xf16>)>
+  // CHECK: %[[M1:.+]] = llvm.insertvalue %[[V2]], %[[M0]][0 : i32] : !llvm.struct<(vector<2xf16>, vector<2xf16>, vector<2xf16>, vector<2xf16>)>
+  // CHECK: %[[M2:.+]] = llvm.insertvalue %[[V2]], %[[M1]][1 : i32] : !llvm.struct<(vector<2xf16>, vector<2xf16>, vector<2xf16>, vector<2xf16>)>
+  // CHECK: %[[M3:.+]] = llvm.insertvalue %[[V2]], %[[M2]][2 : i32] : !llvm.struct<(vector<2xf16>, vector<2xf16>, vector<2xf16>, vector<2xf16>)>
+  // CHECK: %[[M4:.+]] = llvm.insertvalue %[[V2]], %[[M3]][3 : i32] : !llvm.struct<(vector<2xf16>, vector<2xf16>, vector<2xf16>, vector<2xf16>)>
+  // CHECK: llvm.return %[[M4]] : !llvm.struct<(vector<2xf16>, vector<2xf16>, vector<2xf16>, vector<2xf16>)>
+  func @gpu_wmma_constant_op() ->(!gpu.mma_matrix<16x16xf16, "COp">) {
+    %cst = constant 1.0 : f16
+    %C = gpu.subgroup_mma_constant_matrix %cst : !gpu.mma_matrix<16x16xf16, "COp">
+    return %C : !gpu.mma_matrix<16x16xf16, "COp">
+  }
+}

mlir/test/Dialect/GPU/ops.mlir

@@ … @@ module attributes {gpu.container_module} {
   }
 
   func @mmamatrix_valid_element_type(){
     // CHECK-LABEL: func @mmamatrix_valid_element_type
     %wg = memref.alloca() {alignment = 32} : memref<32x32xf16, 3>
     // CHECK: %[[wg:.*]] = memref.alloca()
     %i = constant 16 : index
     // CHECK: %[[i:.*]] = constant 16 : index
+    %cst = constant 1.000000e+00 : f32
+    // CHECK: %[[cst:.*]] = constant 1.000000e+00 : f32
     %0 = gpu.subgroup_mma_load_matrix %wg[%i, %i] {leadDimension = 32 : index} : memref<32x32xf16, 3> -> !gpu.mma_matrix<16x16xf16, "AOp">
     // CHECK: gpu.subgroup_mma_load_matrix %[[wg]][%[[i]], %[[i]]] {leadDimension = 32 : index} : memref<32x32xf16, 3> -> !gpu.mma_matrix<16x16xf16, "AOp">
+    %1 = gpu.subgroup_mma_constant_matrix %cst : !gpu.mma_matrix<16x16xf32, "COp">
+    // CHECK: gpu.subgroup_mma_constant_matrix %[[cst]] : !gpu.mma_matrix<16x16xf32, "COp">
     return
   }
 }


[mlir][gpu] Add op to create MMA constant matrix (Closed, Public)
