This is an archive of the discontinued LLVM Phabricator instance.


Differential D146556

[mlir][GPU] Add gpu.create_stream op and add stream as an optional argument to gpu ops
Needs Review · Public

Authored by nbpatel on Mar 21 2023, 11:44 AM.


Details

Reviewers

bondhugula
ThomasRaoux
nicolasvasilache
herhut

Summary

This patch introduces a new op, gpu.create_stream, and adds a stream as an optional argument to several gpu dialect
ops. This gives users the flexibility to launch kernels on their own custom streams. We need this to enable the MLIR
pipeline on Intel GPUs.

The link to the discussion is:
https://discourse.llvm.org/t/proposal-to-add-stream-queue-as-an-optional-argument-to-few-gpu-dialect-ops/67920/13

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nbpatel created this revision. Mar 21 2023, 11:44 AM

Herald added a reviewer: bondhugula. Mar 21 2023, 11:44 AM

Herald added a reviewer: ThomasRaoux.

Herald added a project: Restricted Project.

Herald added subscribers: Moerafaat, zero9178, bzcheeseman and 22 others.

nbpatel requested review of this revision. Mar 21 2023, 11:44 AM

Herald added a reviewer: nicolasvasilache. Mar 21 2023, 11:44 AM

Herald added a reviewer: herhut.

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Hardcode84 added a subscriber: Hardcode84. Mar 21 2023, 11:47 AM

Harbormaster completed remote builds in B220795: Diff 507073. Mar 21 2023, 1:15 PM

rengolin added a subscriber: rengolin. Mar 28 2023, 6:11 AM

rengolin added inline comments.

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
973	If streams are optional, you should have builders without them, but this will force all builder calls to pass a `nullptr` or something for the stream, and could break existing downstream code. Is there a reason to drop the existing default builders?
1425	What creates a `device`? How can I get hold of it?
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
744	No validation for what a device is or should be?
mlir/test/Dialect/GPU/ops.mlir
134	You add functionality to pass a device but offer no test of how a device should be created, used or destroyed.

bondhugula added a comment.

But the gpu.wait async op already creates a stream. Can you expand the commit summary on why this is needed? If you already have a custom stream that you want to pass to your IR, a !gpu.async_token type could be passed.

In D146556#4229385, @bondhugula wrote:

But the gpu.wait async op already creates a stream. Can you expand the commit summary on why this is needed? If you already have a custom stream that you want to pass to your IR, a !gpu.async_token type could be passed.

Stream is a separate concept from synchronization; please see my comment in https://discourse.llvm.org/t/proposal-to-add-stream-queue-as-an-optional-argument-to-few-gpu-dialect-ops/67920/17?u=hardcode84

nbpatel added inline comments. Mar 31 2023, 10:42 AM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1425	Device is optional and just a placeholder for now. The user can pass an opaque pointer as device which will be used by create_stream op. We are just exposing the API.
mlir/test/Dialect/GPU/ops.mlir
134	Will add a test.

Added a test for creating stream with device

nbpatel added inline comments. Mar 31 2023, 10:55 AM

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
744	What sort of validation do you expect here?

Harbormaster completed remote builds in B223046: Diff 510084. Mar 31 2023, 1:19 PM

Ping for review

bondhugula added inline comments. Apr 3 2023, 7:17 PM

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
97–101	Both of these need proper doc comments.
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1418–1420	Reflow - use whole width.
1425	Please document what `device`'s type is/can be and how it's created.

What's the current behavior if a gpu dialect op with the stream argument is passed to the gpu-to-llvm pass? Please ensure it doesn't assert/crash.

mlir/test/Dialect/GPU/ops.mlir
228	Drop blank line.

mshahneo added a subscriber: mshahneo. Apr 19 2023, 4:35 PM

Herald added a subscriber: bviyer. Apr 19 2023, 4:35 PM

Hardcode84 mentioned this in D154803: [mlir][GPU] Add gpu.create_queue op and add queue as an optional argument to gpu ops. Jul 9 2023, 3:34 PM

Revision Contents

Path                                             Size
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td      8 lines
mlir/include/mlir/Dialect/GPU/IR/GPUDialect.h    12 lines
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td       93 lines
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp           64 lines
mlir/test/Dialect/GPU/ops.mlir                   35 lines

Diff 507073

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td

	… 88 unchanged lines not shown …
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// GPU Types.			// GPU Types.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def GPU_AsyncToken : DialectType<			def GPU_AsyncToken : DialectType<
	GPU_Dialect, CPred<"$_self.isa<::mlir::gpu::AsyncTokenType>()">, "async token type">,			GPU_Dialect, CPred<"$_self.isa<::mlir::gpu::AsyncTokenType>()">, "async token type">,
	BuildableType<"mlir::gpu::AsyncTokenType::get($_builder.getContext())">;			BuildableType<"mlir::gpu::AsyncTokenType::get($_builder.getContext())">;

				def GPU_Stream : DialectType<
				GPU_Dialect, CPred<"$_self.isa<::mlir::gpu::StreamType>()">, "stream type">,
				BuildableType<"mlir::gpu::StreamType::get($_builder.getContext())">;

				def GPU_Device : DialectType<
				GPU_Dialect, CPred<"$_self.isa<::mlir::gpu::DeviceType>()">, "device type">,
				BuildableType<"mlir::gpu::DeviceType::get($_builder.getContext())">;

	// Predicat to check if type is gpu::MMAMatrixType.			// Predicat to check if type is gpu::MMAMatrixType.
	def IsMMAMatrixTypePred : CPred<"$_self.isa<::mlir::gpu::MMAMatrixType>()">;			def IsMMAMatrixTypePred : CPred<"$_self.isa<::mlir::gpu::MMAMatrixType>()">;

	def GPU_MMAMatrix : DialectType<			def GPU_MMAMatrix : DialectType<
	GPU_Dialect, IsMMAMatrixTypePred, "MMAMatrix type">;			GPU_Dialect, IsMMAMatrixTypePred, "MMAMatrix type">;

	// Memref type acceptable to gpu.subgroup_mma_{load|store}_matrix ops.			// Memref type acceptable to gpu.subgroup_mma_{load|store}_matrix ops.
	def GPU_MMAMemRef : MemRefOf<[I8, I32, F16, F32, VectorOfRankAndType<[1], [I8, I32, F16, F32]>]>;			def GPU_MMAMemRef : MemRefOf<[I8, I32, F16, F32, VectorOfRankAndType<[1], [I8, I32, F16, F32]>]>;
	… 63 unchanged lines not shown …

mlir/include/mlir/Dialect/GPU/IR/GPUDialect.h

	… 39 unchanged lines not shown …

	class AsyncTokenType			class AsyncTokenType
	: public Type::TypeBase<AsyncTokenType, Type, TypeStorage> {			: public Type::TypeBase<AsyncTokenType, Type, TypeStorage> {
	public:			public:
	// Used for generic hooks in TypeBase.			// Used for generic hooks in TypeBase.
	using Base::Base;			using Base::Base;
	};			};

				class StreamType : public Type::TypeBase<StreamType, Type, TypeStorage> {
				public:
				// Used for generic hooks in TypeBase.
				using Base::Base;
				};

				class DeviceType : public Type::TypeBase<DeviceType, Type, TypeStorage> {
				public:
				// Used for generic hooks in TypeBase.
				using Base::Base;
				};

	/// MMAMatrixType storage and uniquing. Array is uniqued based on its shape			/// MMAMatrixType storage and uniquing. Array is uniqued based on its shape
	/// and type.			/// and type.
	struct MMAMatrixStorageType : public TypeStorage {			struct MMAMatrixStorageType : public TypeStorage {
	MMAMatrixStorageType(unsigned numDims, const int64_t *dimShapes,			MMAMatrixStorageType(unsigned numDims, const int64_t *dimShapes,
	Type elementType, StringRef operand)			Type elementType, StringRef operand)
	: dimShapes(dimShapes), numDims(numDims), elementType(elementType),			: dimShapes(dimShapes), numDims(numDims), elementType(elementType),
	operand(operand) {}			operand(operand) {}

	… 128 unchanged lines not shown …

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td

… 372 unchanged lines not shown …

def GPU_LaunchFuncOp : GPU_Op<"launch_func",		def GPU_LaunchFuncOp : GPU_Op<"launch_func",
[GPU_AsyncOpInterface, AttrSizedOperandSegments]>,		[GPU_AsyncOpInterface, AttrSizedOperandSegments]>,
Arguments<(ins Variadic<GPU_AsyncToken>:$asyncDependencies,		Arguments<(ins Variadic<GPU_AsyncToken>:$asyncDependencies,
SymbolRefAttr:$kernel,		SymbolRefAttr:$kernel,
Index:$gridSizeX, Index:$gridSizeY, Index:$gridSizeZ,		Index:$gridSizeX, Index:$gridSizeY, Index:$gridSizeZ,
Index:$blockSizeX, Index:$blockSizeY, Index:$blockSizeZ,		Index:$blockSizeX, Index:$blockSizeY, Index:$blockSizeZ,
Optional<I32>:$dynamicSharedMemorySize,		Optional<I32>:$dynamicSharedMemorySize,
Variadic<AnyType>:$kernelOperands)>,		Variadic<AnyType>:$kernelOperands,
		Optional<GPU_Stream>:$stream)>,
Results<(outs Optional<GPU_AsyncToken>:$asyncToken)> {		Results<(outs Optional<GPU_AsyncToken>:$asyncToken)> {
let summary = "Launches a function as a GPU kernel";		let summary = "Launches a function as a GPU kernel";

let description = [{		let description = [{
Launch a kernel function on the specified grid of thread blocks.		Launch a kernel function on the specified grid of thread blocks.
`gpu.launch` operations are lowered to `gpu.launch_func` operations by		`gpu.launch` operations are lowered to `gpu.launch_func` operations by
outlining the kernel body into a function in a dedicated module, which		outlining the kernel body into a function in a dedicated module, which
reflects the separate compilation process. The kernel function is required		reflects the separate compilation process. The kernel function is required
… 70 unchanged lines not shown; enclosing context: `module attributes {gpu.container_module} {` …
%arg1 : memref<?xf32, 1>)		%arg1 : memref<?xf32, 1>)
}		}
```		```
}];		}];

let skipDefaultBuilders = 1;		let skipDefaultBuilders = 1;

let builders = [		let builders = [
OpBuilder<(ins "GPUFuncOp":$kernelFunc, "KernelDim3":$gridSize,		OpBuilder<(ins "GPUFuncOp":$kernelFunc,
"KernelDim3":$blockSize, "Value":$dynamicSharedMemorySize,		"KernelDim3":$gridSize, "KernelDim3":$blockSize,
		"Value":$dynamicSharedMemorySize,
"ValueRange":$kernelOperands,		"ValueRange":$kernelOperands,
CArg<"Type", "nullptr">:$asyncTokenType,		CArg<"Type", "nullptr">:$asyncTokenType,
CArg<"ValueRange", "{}">:$asyncDependencies)>		CArg<"ValueRange", "{}">:$asyncDependencies,
];		CArg<"Value", "Value{}">:$stream)>];

let extraClassDeclaration = [{		let extraClassDeclaration = [{
/// The name of the kernel's containing module.		/// The name of the kernel's containing module.
StringAttr getKernelModuleName();		StringAttr getKernelModuleName();

/// The name of the kernel.		/// The name of the kernel.
StringAttr getKernelName();		StringAttr getKernelName();

… 15 unchanged lines not shown; enclosing context: `friend LogicalResult GPUDialect::verifyOperationAttribute(Operation *,` …
NamedAttribute);		NamedAttribute);
}];		}];

let assemblyFormat = [{		let assemblyFormat = [{
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
$kernel		$kernel
`blocks` `in` ` ` `(`$gridSizeX`,` $gridSizeY`,` $gridSizeZ`)`		`blocks` `in` ` ` `(`$gridSizeX`,` $gridSizeY`,` $gridSizeZ`)`
`threads` `in` ` ` `(`$blockSizeX`,` $blockSizeY`,` $blockSizeZ`)`		`threads` `in` ` ` `(`$blockSizeX`,` $blockSizeY`,` $blockSizeZ`)`
		(`stream` $stream^)?
(`dynamic_shared_memory_size` $dynamicSharedMemorySize^)?		(`dynamic_shared_memory_size` $dynamicSharedMemorySize^)?
custom<LaunchFuncOperands>($kernelOperands, type($kernelOperands)) attr-dict		custom<LaunchFuncOperands>($kernelOperands, type($kernelOperands)) attr-dict
}];		}];
let hasVerifier = 1;		let hasVerifier = 1;
}		}

def GPU_LaunchOp : GPU_Op<"launch", [		def GPU_LaunchOp : GPU_Op<"launch", [
AutomaticAllocationScope, AttrSizedOperandSegments, GPU_AsyncOpInterface,		AutomaticAllocationScope, AttrSizedOperandSegments, GPU_AsyncOpInterface,
… 411 unchanged lines not shown; enclosing context: `let description = [{` …
Writes from the host are guaranteed to be visible to device kernels that are		Writes from the host are guaranteed to be visible to device kernels that are
launched afterwards. Writes from the device are guaranteed to be visible on		launched afterwards. Writes from the device are guaranteed to be visible on
the host after synchronizing with the device kernel completion.		the host after synchronizing with the device kernel completion.
}];		}];

let assemblyFormat = "$value attr-dict `:` type($value)";		let assemblyFormat = "$value attr-dict `:` type($value)";
}		}

def GPU_WaitOp : GPU_Op<"wait", [GPU_AsyncOpInterface]> {		def GPU_WaitOp : GPU_Op<"wait", [GPU_AsyncOpInterface, AttrSizedOperandSegments]> {
let summary = "Wait for async gpu ops to complete.";		let summary = "Wait for async gpu ops to complete.";
let description = [{		let description = [{
This op synchronizes the host or the device with a list of dependent ops.		This op synchronizes the host or the device with a list of dependent ops.

If the op contains the `async` keyword, it returns a new async token which		If the op contains the `async` keyword, it returns a new async token which
is synchronized with the op arguments. This new token is merely a shortcut		is synchronized with the op arguments. This new token is merely a shortcut
to the argument list, and one could replace the uses of the result with the		to the argument list, and one could replace the uses of the result with the
arguments for the same effect. The async version of this op is primarily		arguments for the same effect. The async version of this op is primarily
… 17 unchanged lines not shown; enclosing context: `let description = [{` …
```mlir		```mlir
%t0 = gpu.foo async : !gpu.async.token		%t0 = gpu.foo async : !gpu.async.token
%t1 = gpu.bar async : !gpu.async.token		%t1 = gpu.bar async : !gpu.async.token
// The gpu.wait op blocks until gpu.foo and gpu.bar have completed.		// The gpu.wait op blocks until gpu.foo and gpu.bar have completed.
gpu.wait [%t0, %t1]		gpu.wait [%t0, %t1]
```		```
}];		}];

let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies);		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
		Optional<GPU_Stream>:$stream);
let results = (outs Optional<GPU_AsyncToken>:$asyncToken);		let results = (outs Optional<GPU_AsyncToken>:$asyncToken);

		let skipDefaultBuilders = 1;

		let builders = [
		OpBuilder<(ins
		CArg<"Type", "nullptr">:$asyncTokenType,
		CArg<"ValueRange", "{}">:$asyncDependencies,
		CArg<"Value", "Value{}">:$stream)>];

let assemblyFormat = [{		let assemblyFormat = [{
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies) attr-dict		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies) attr-dict
		(`stream` $stream^)?
}];		}];

let hasCanonicalizer = 1;		let hasCanonicalizer = 1;
}		}
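On rengolin's builder question: a minimal standalone C++ sketch of how the `CArg<..., default>` entries in the WaitOp builder above become C++ default arguments. `Token`, `Stream`, `WaitOpModel` and `buildWait` are hypothetical stand-ins, not MLIR types. Call sites that never mention a stream keep compiling; what this does not settle is whether downstream code relied on the other default ODS builder signatures that `skipDefaultBuilders = 1` removes.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-ins for value handles; not MLIR types.
struct Token {};
struct Stream { int id; };

// What a built gpu.wait carries in this simplified model.
struct WaitOpModel {
  bool producesToken = false;
  std::size_t numDependencies = 0;
  std::optional<Stream> stream;
};

// Shaped like the builder declared in GPUOps.td:
//   CArg<"Type", "nullptr">:$asyncTokenType,
//   CArg<"ValueRange", "{}">:$asyncDependencies,
//   CArg<"Value", "Value{}">:$stream
// Each CArg default becomes a C++ default argument.
WaitOpModel buildWait(bool asyncTokenType = false,
                      const std::vector<Token> &asyncDependencies = {},
                      std::optional<Stream> stream = std::nullopt) {
  WaitOpModel op;
  op.producesToken = asyncTokenType;
  op.numDependencies = asyncDependencies.size();
  op.stream = stream;
  return op;
}

// Calls written before the stream parameter existed still compile, and the
// stream is simply absent.
void exampleCallSites() {
  WaitOpModel blocking = buildWait();
  assert(!blocking.producesToken && !blocking.stream);

  WaitOpModel asyncWait = buildWait(true, {Token{}, Token{}});
  assert(asyncWait.producesToken && asyncWait.numDependencies == 2 &&
         !asyncWait.stream);

  WaitOpModel onStream = buildWait(true, {}, Stream{3});
  assert(onStream.stream && onStream.stream->id == 3);
}
```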

def GPU_AllocOp : GPU_Op<"alloc", [		def GPU_AllocOp : GPU_Op<"alloc", [
GPU_AsyncOpInterface,		GPU_AsyncOpInterface,
AttrSizedOperandSegments		AttrSizedOperandSegments
… 18 unchanged lines not shown; enclosing context: `let description = [{` …

```mlir		```mlir
%memref, %token = gpu.alloc async [%dep] host_shared (%width) : memref<64x?xf32, 1>		%memref, %token = gpu.alloc async [%dep] host_shared (%width) : memref<64x?xf32, 1>
```		```
}];		}];

let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
Variadic<Index>:$dynamicSizes, Variadic<Index>:$symbolOperands,		Variadic<Index>:$dynamicSizes, Variadic<Index>:$symbolOperands,
UnitAttr:$hostShared);		UnitAttr:$hostShared, Optional<GPU_Stream>:$stream);
let results = (outs Res<AnyMemRef, "", [MemAlloc]>:$memref,		let results = (outs Res<AnyMemRef, "", [MemAlloc]>:$memref,
Optional<GPU_AsyncToken>:$asyncToken);		Optional<GPU_AsyncToken>:$asyncToken);

let extraClassDeclaration = [{		let extraClassDeclaration = [{
MemRefType getType() { return getMemref().getType().cast<MemRefType>(); }		MemRefType getType() { return getMemref().getType().cast<MemRefType>(); }
}];		}];

let assemblyFormat = [{		let assemblyFormat = [{
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies) (` ` `host_shared` $hostShared^)? ` `		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
`(` $dynamicSizes `)` (`` `[` $symbolOperands^ `]`)? attr-dict `:` type($memref)		(` ` `host_shared` $hostShared^)? ` `
		`(` $dynamicSizes `)` (`` `[` $symbolOperands^ `]`)?
		(` ` `stream` $stream^)? ` ` attr-dict `:` type($memref)
}];		}];

let hasVerifier = 1;		let hasVerifier = 1;
let hasCanonicalizer = 1;		let hasCanonicalizer = 1;
}		}

def GPU_DeallocOp : GPU_Op<"dealloc", [GPU_AsyncOpInterface]> {		def GPU_DeallocOp : GPU_Op<"dealloc", [GPU_AsyncOpInterface, AttrSizedOperandSegments]> {

let summary = "GPU memory deallocation operation";		let summary = "GPU memory deallocation operation";

let description = [{		let description = [{
The `gpu.dealloc` operation frees the region of memory referenced by a		The `gpu.dealloc` operation frees the region of memory referenced by a
memref which was originally created by the `gpu.alloc` operation. It is		memref which was originally created by the `gpu.alloc` operation. It is
similar to the `memref.dealloc` op, but supports asynchronous GPU execution.		similar to the `memref.dealloc` op, but supports asynchronous GPU execution.

The op does not execute before all async dependencies have finished		The op does not execute before all async dependencies have finished
executing.		executing.

If the `async` keyword is present, the op is executed asynchronously (i.e.		If the `async` keyword is present, the op is executed asynchronously (i.e.
it does not block until the execution has finished on the device). In		it does not block until the execution has finished on the device). In
that case, it returns a !gpu.async.token.		that case, it returns a !gpu.async.token.

Example:		Example:

```mlir		```mlir
%token = gpu.dealloc async [%dep] %memref : memref<8x64xf32, 1>		%token = gpu.dealloc async [%dep] %memref : memref<8x64xf32, 1>
```		```
}];		}];

let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
Arg<AnyMemRef, "", [MemFree]>:$memref);		Arg<AnyMemRef, "", [MemFree]>:$memref,
		Optional<GPU_Stream>:$stream);
let results = (outs Optional<GPU_AsyncToken>:$asyncToken);		let results = (outs Optional<GPU_AsyncToken>:$asyncToken);

let assemblyFormat = [{		let assemblyFormat = [{
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
$memref attr-dict `:` type($memref)		$memref
		(`stream` $stream^)? attr-dict `:` type($memref)
}];		}];
}		}

def GPU_MemcpyOp : GPU_Op<"memcpy", [GPU_AsyncOpInterface]> {		def GPU_MemcpyOp : GPU_Op<"memcpy", [GPU_AsyncOpInterface, AttrSizedOperandSegments]> {

let summary = "GPU memcpy operation";		let summary = "GPU memcpy operation";

let description = [{		let description = [{
The `gpu.memcpy` operation copies the content of one memref to another.		The `gpu.memcpy` operation copies the content of one memref to another.

The op does not execute before all async dependencies have finished		The op does not execute before all async dependencies have finished
executing.		executing.

If the `async` keyword is present, the op is executed asynchronously (i.e.		If the `async` keyword is present, the op is executed asynchronously (i.e.
it does not block until the execution has finished on the device). In		it does not block until the execution has finished on the device). In
that case, it returns a !gpu.async.token.		that case, it returns a !gpu.async.token.

Example:		Example:

```mlir		```mlir
%token = gpu.memcpy async [%dep] %dst, %src : memref<?xf32, 1>, memref<?xf32>		%token = gpu.memcpy async [%dep] %dst, %src : memref<?xf32, 1>, memref<?xf32>
```		```
}];		}];

let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
Arg<AnyMemRef, "", [MemWrite]>:$dst,		Arg<AnyMemRef, "", [MemWrite]>:$dst,
Arg<AnyMemRef, "", [MemRead]>:$src);		Arg<AnyMemRef, "", [MemRead]>:$src,
		Optional<GPU_Stream>:$stream);
let results = (outs Optional<GPU_AsyncToken>:$asyncToken);		let results = (outs Optional<GPU_AsyncToken>:$asyncToken);

let assemblyFormat = [{		let assemblyFormat = [{
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
$dst`,` $src `:` type($dst)`,` type($src) attr-dict		$dst`,` $src (`stream` $stream^)? `:` type($dst)`,` type($src) attr-dict

}];		}];
let hasFolder = 1;		let hasFolder = 1;
let hasVerifier = 1;		let hasVerifier = 1;
let hasCanonicalizer = 1;		let hasCanonicalizer = 1;
}		}

def GPU_MemsetOp : GPU_Op<"memset",		def GPU_MemsetOp : GPU_Op<"memset",
[GPU_AsyncOpInterface, AllElementTypesMatch<["dst", "value"]>]> {		[GPU_AsyncOpInterface, AttrSizedOperandSegments,
		AllElementTypesMatch<["dst", "value"]>]> {

let summary = "GPU memset operation";		let summary = "GPU memset operation";

let description = [{		let description = [{
The `gpu.memset` operation sets the content of memref to a scalar value.		The `gpu.memset` operation sets the content of memref to a scalar value.

The op does not execute before all async dependencies have finished		The op does not execute before all async dependencies have finished
executing.		executing.

If the `async` keyword is present, the op is executed asynchronously (i.e.		If the `async` keyword is present, the op is executed asynchronously (i.e.
it does not block until the execution has finished on the device). In		it does not block until the execution has finished on the device). In
that case, it returns a !gpu.async.token.		that case, it returns a !gpu.async.token.

Example:		Example:

```mlir		```mlir
%token = gpu.memset async [%dep] %dst, %value : memref<?xf32, 1>, f32		%token = gpu.memset async [%dep] %dst, %value : memref<?xf32, 1>, f32
```		```
}];		}];

let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
Arg<AnyMemRef, "", [MemWrite]>:$dst,		Arg<AnyMemRef, "", [MemWrite]>:$dst,
Arg<AnyType, "">:$value);		Arg<AnyType, "">:$value,
		Optional<GPU_Stream>:$stream);
let results = (outs Optional<GPU_AsyncToken>:$asyncToken);		let results = (outs Optional<GPU_AsyncToken>:$asyncToken);

let assemblyFormat = [{		let assemblyFormat = [{
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
$dst`,` $value `:` type($dst)`,` type($value) attr-dict		$dst`,` $value (`stream` $stream^)? `:` type($dst)`,` type($value) attr-dict
}];		}];
let hasFolder = 1;		let hasFolder = 1;
}		}

def GPU_SetDefaultDeviceOp : GPU_Op<"set_default_device",		def GPU_SetDefaultDeviceOp : GPU_Op<"set_default_device",
[MemoryEffects<[MemWrite]>]>,		[MemoryEffects<[MemWrite]>]>,
Arguments<(ins I32:$devIndex)> {		Arguments<(ins I32:$devIndex)> {
let summary = "Set default GPU for operations after this by index";		let summary = "Set default GPU for operations after this by index";
… 255 unchanged lines not shown; enclosing context: `let extraClassDeclaration = [{` …
}		}
}];		}];

let assemblyFormat = [{		let assemblyFormat = [{
$opType $args attr-dict `:` functional-type($args, $res)		$opType $args attr-dict `:` functional-type($args, $res)
}];		}];
}		}


		def GPU_CreateStreamOp : GPU_Op<"create_stream", [SameVariadicOperandSize]> {

		let description = [{
		The `gpu.create_stream` takes an optional argument `device` as input and
		returns a stream, based on the device. If no device is provided, a default
		device will be created by the underlying runtime.
		The stream is then used for launching/queuing kernels
		on the GPU.

		Example:

		```mlir
		%stream = gpu.create_stream %device : !gpu.stream

		OR

		%stream = gpu.create_stream : !gpu.stream
		```

		}];

		let skipDefaultBuilders = 1;

		let arguments = (ins Optional<GPU_Device> : $device);
		let results = (outs GPU_Stream : $stream);
		let builders = [OpBuilder<(ins CArg<"Value", "Value{}">:$device)>];

		let assemblyFormat = [{
		($device^)? attr-dict `:` type($stream)
		}];

		}

#endif // GPU_OPS		#endif // GPU_OPS

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp

… 136 unchanged lines not shown; enclosing context: `bool isLegalToInline(Operation *, Region *, bool, IRMapping &) const final {` …
return true;		return true;
}		}
};		};
} // namespace		} // namespace

void GPUDialect::initialize() {		void GPUDialect::initialize() {
addTypes<AsyncTokenType>();		addTypes<AsyncTokenType>();
addTypes<MMAMatrixType>();		addTypes<MMAMatrixType>();
		addTypes<StreamType>();
		addTypes<DeviceType>();
addOperations<		addOperations<
#define GET_OP_LIST		#define GET_OP_LIST
#include "mlir/Dialect/GPU/IR/GPUOps.cpp.inc"		#include "mlir/Dialect/GPU/IR/GPUOps.cpp.inc"
>();		>();
addAttributes<		addAttributes<
#define GET_ATTRDEF_LIST		#define GET_ATTRDEF_LIST
#include "mlir/Dialect/GPU/IR/GPUOpsAttributes.cpp.inc"		#include "mlir/Dialect/GPU/IR/GPUOpsAttributes.cpp.inc"
>();		>();
addInterfaces<GPUInlinerInterface>();		addInterfaces<GPUInlinerInterface>();
}		}

Type GPUDialect::parseType(DialectAsmParser &parser) const {		Type GPUDialect::parseType(DialectAsmParser &parser) const {
// Parse the main keyword for the type.		// Parse the main keyword for the type.
StringRef keyword;		StringRef keyword;
if (parser.parseKeyword(&keyword))		if (parser.parseKeyword(&keyword))
return Type();		return Type();
MLIRContext *context = getContext();		MLIRContext *context = getContext();

// Handle 'async token' types.		// Handle 'async token' types.
if (keyword == "async.token")		if (keyword == "async.token")
return AsyncTokenType::get(context);		return AsyncTokenType::get(context);

		if (keyword == "stream")
		return StreamType::get(context);

		if (keyword == "device")
		return DeviceType::get(context);

if (keyword == "mma_matrix") {		if (keyword == "mma_matrix") {
SMLoc beginLoc = parser.getNameLoc();		SMLoc beginLoc = parser.getNameLoc();

// Parse '<'.		// Parse '<'.
if (parser.parseLess())		if (parser.parseLess())
return nullptr;		return nullptr;

// Parse the size and elementType.		// Parse the size and elementType.
… 23 unchanged lines not shown; enclosing context: `Type GPUDialect::parseType(DialectAsmParser &parser) const {` …

parser.emitError(parser.getNameLoc(), "unknown gpu type: " + keyword);		parser.emitError(parser.getNameLoc(), "unknown gpu type: " + keyword);
return Type();		return Type();
}		}
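The new `stream` and `device` branches in parseType only work together with the matching cases added to printType, so that `!gpu.stream` and `!gpu.device` print back in a form the parser accepts. A small standalone C++ model of that keyword round trip; the `GpuTypeKind` enum and the helper functions are hypothetical, and `mma_matrix` is left out because it carries parameters.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical enum standing in for the GPU dialect's parameterless types.
enum class GpuTypeKind { AsyncToken, Stream, Device };

// Mirrors the keyword dispatch in GPUDialect::parseType for the
// parameterless types.
std::optional<GpuTypeKind> parseKeyword(const std::string &keyword) {
  if (keyword == "async.token") return GpuTypeKind::AsyncToken;
  if (keyword == "stream") return GpuTypeKind::Stream;
  if (keyword == "device") return GpuTypeKind::Device;
  // The real parser tries mma_matrix next, then reports "unknown gpu type".
  return std::nullopt;
}

// Mirrors the matching TypeSwitch cases in GPUDialect::printType.
std::string printKeyword(GpuTypeKind kind) {
  switch (kind) {
  case GpuTypeKind::AsyncToken: return "async.token";
  case GpuTypeKind::Stream: return "stream";
  case GpuTypeKind::Device: return "device";
  }
  return "";
}

// A keyword round-trips if parsing it and printing the result gives it back.
bool roundTrips(const std::string &keyword) {
  std::optional<GpuTypeKind> kind = parseKeyword(keyword);
  return kind.has_value() && printKeyword(*kind) == keyword;
}

void checkRoundTrips() {
  assert(roundTrips("async.token"));
  assert(roundTrips("stream"));
  assert(roundTrips("device"));
  assert(!roundTrips("queue"));
}
```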

void GPUDialect::printType(Type type, DialectAsmPrinter &os) const {		void GPUDialect::printType(Type type, DialectAsmPrinter &os) const {
TypeSwitch<Type>(type)		TypeSwitch<Type>(type)
.Case<AsyncTokenType>([&](Type) { os << "async.token"; })		.Case<AsyncTokenType>([&](Type) { os << "async.token"; })
		.Case<StreamType>([&](Type) { os << "stream"; })
		.Case<DeviceType>([&](Type) { os << "device"; })
.Case<MMAMatrixType>([&](MMAMatrixType fragTy) {		.Case<MMAMatrixType>([&](MMAMatrixType fragTy) {
os << "mma_matrix<";		os << "mma_matrix<";
auto shape = fragTy.getShape();		auto shape = fragTy.getShape();
for (auto dim = shape.begin(), e = shape.end() - 1; dim != e; ++dim)		for (auto dim = shape.begin(), e = shape.end() - 1; dim != e; ++dim)
os << *dim << 'x';		os << *dim << 'x';
os << shape.back() << 'x' << fragTy.getElementType();		os << shape.back() << 'x' << fragTy.getElementType();
os << ", \"" << fragTy.getOperand() << "\"" << '>';		os << ", \"" << fragTy.getOperand() << "\"" << '>';
})		})
… 507 unchanged lines not shown …
};		};

void LaunchOp::getCanonicalizationPatterns(RewritePatternSet &rewrites,		void LaunchOp::getCanonicalizationPatterns(RewritePatternSet &rewrites,
MLIRContext *context) {		MLIRContext *context) {
rewrites.add<FoldLaunchArguments>(context);		rewrites.add<FoldLaunchArguments>(context);
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// CreateStreamOp
		//===----------------------------------------------------------------------===//

		void CreateStreamOp::build(OpBuilder &odsBuilder, OperationState &result,
		Value device) {
		if (device)
		result.addOperands(device);

		SmallVector<int32_t, 1> segmentSizes(1, 1);
		segmentSizes.front() = device ? 1 : 0;
		}
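As it appears in this diff, CreateStreamOp::build computes a `segmentSizes` vector that is never attached to the operation (the op uses `SameVariadicOperandSize`, not `AttrSizedOperandSegments`, so no segment attribute would be read), and it does not add the `!gpu.stream` result type that GPUOps.td declares. The standalone C++ model below, with hypothetical `FakeValue`/`FakeOperationState` types in place of MLIR's, sketches what a builder for one optional operand and one fixed result has to do. In MLIR itself the result step would presumably be along the lines of `result.addTypes(StreamType::get(odsBuilder.getContext()))`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for mlir::Value and mlir::OperationState.
struct FakeValue { int id; };

struct FakeOperationState {
  std::vector<FakeValue> operands;
  std::vector<std::string> resultTypes;
};

// Builder shape for an op with one optional operand (`device`) and one
// fixed result (`!gpu.stream`), starting from an empty state.
void buildCreateStream(FakeOperationState &state,
                       std::optional<FakeValue> device) {
  // Append the optional operand only when present. With a single optional
  // operand, the operand count (0 or 1) already says whether it was given,
  // so no segment-size bookkeeping is needed.
  if (device)
    state.operands.push_back(*device);

  // A custom builder replaces the skipped default ODS builders, so it must
  // register the result type itself.
  state.resultTypes.push_back("!gpu.stream");

  assert(state.operands.size() <= 1 && state.resultTypes.size() == 1);
}
```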

		//===----------------------------------------------------------------------===//
// LaunchFuncOp		// LaunchFuncOp
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

void LaunchFuncOp::build(OpBuilder &builder, OperationState &result,		void LaunchFuncOp::build(OpBuilder &builder, OperationState &result,
GPUFuncOp kernelFunc, KernelDim3 gridSize,		GPUFuncOp kernelFunc, KernelDim3 gridSize,
KernelDim3 getBlockSize, Value dynamicSharedMemorySize,		KernelDim3 getBlockSize, Value dynamicSharedMemorySize,
ValueRange kernelOperands, Type asyncTokenType,		ValueRange kernelOperands, Type asyncTokenType,
ValueRange asyncDependencies) {		ValueRange asyncDependencies, Value stream) {
result.addOperands(asyncDependencies);		result.addOperands(asyncDependencies);
if (asyncTokenType)		if (asyncTokenType)
result.types.push_back(builder.getType<AsyncTokenType>());		result.types.push_back(builder.getType<AsyncTokenType>());

// Add grid and block sizes as op operands, followed by the data operands.		// Add grid and block sizes as op operands, followed by the data operands.
result.addOperands({gridSize.x, gridSize.y, gridSize.z, getBlockSize.x,		result.addOperands({gridSize.x, gridSize.y, gridSize.z, getBlockSize.x,
getBlockSize.y, getBlockSize.z});		getBlockSize.y, getBlockSize.z});
if (dynamicSharedMemorySize)		if (dynamicSharedMemorySize)
result.addOperands(dynamicSharedMemorySize);		result.addOperands(dynamicSharedMemorySize);
result.addOperands(kernelOperands);		result.addOperands(kernelOperands);
auto kernelModule = kernelFunc->getParentOfType<GPUModuleOp>();		auto kernelModule = kernelFunc->getParentOfType<GPUModuleOp>();
auto kernelSymbol =		auto kernelSymbol =
SymbolRefAttr::get(kernelModule.getNameAttr(),		SymbolRefAttr::get(kernelModule.getNameAttr(),
{SymbolRefAttr::get(kernelFunc.getNameAttr())});		{SymbolRefAttr::get(kernelFunc.getNameAttr())});
result.addAttribute(getKernelAttrName(result.name), kernelSymbol);		result.addAttribute(getKernelAttrName(result.name), kernelSymbol);
SmallVector<int32_t, 9> segmentSizes(9, 1);
		if (stream)
		result.addOperands(stream);

		SmallVector<int32_t, 10> segmentSizes(10, 1);
segmentSizes.front() = asyncDependencies.size();		segmentSizes.front() = asyncDependencies.size();
segmentSizes[segmentSizes.size() - 2] = dynamicSharedMemorySize ? 1 : 0;		segmentSizes[segmentSizes.size() - 3] = dynamicSharedMemorySize ? 1 : 0;
segmentSizes.back() = static_cast<int32_t>(kernelOperands.size());		segmentSizes[segmentSizes.size() - 2] =
		static_cast<int32_t>(kernelOperands.size());
		segmentSizes.back() = stream ? 1 : 0;
result.addAttribute(getOperandSegmentSizeAttr(),		result.addAttribute(getOperandSegmentSizeAttr(),
builder.getDenseI32ArrayAttr(segmentSizes));		builder.getDenseI32ArrayAttr(segmentSizes));
}		}

StringAttr LaunchFuncOp::getKernelModuleName() {		StringAttr LaunchFuncOp::getKernelModuleName() {
return getKernel().getRootReference();		return getKernel().getRootReference();
}		}
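
The `LaunchFuncOp::build` hunk above changes the operand segment layout from nine entries to ten. The standalone C++ sketch below repeats its index arithmetic so the new layout is easier to check. It does not use MLIR. `launchFuncSegmentSizes` and `totalOperandCount` are hypothetical helpers. The segment order is inferred from the order in which `build` appends operands, not taken from the ODS definition in GPUOps.td, which is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, MLIR-free helper that repeats the segment-size arithmetic of
// the new LaunchFuncOp::build. Segment order, inferred from the order in which
// build() appends operands:
//   [0]    async dependencies (variadic)
//   [1-6]  grid x/y/z and block x/y/z (exactly one operand each)
//   [7]    dynamic shared memory size (optional: 0 or 1)
//   [8]    kernel operands (variadic)
//   [9]    stream (optional: 0 or 1)
std::vector<int32_t> launchFuncSegmentSizes(int32_t numAsyncDeps,
                                            bool hasDynamicSharedMemory,
                                            int32_t numKernelOperands,
                                            bool hasStream) {
  assert(numAsyncDeps >= 0 && numKernelOperands >= 0);
  std::vector<int32_t> segmentSizes(10, 1);
  segmentSizes.front() = numAsyncDeps;
  // Appending the stream last moves the optional shared-memory segment from
  // size() - 2 to size() - 3, and the kernel operands to size() - 2.
  segmentSizes[segmentSizes.size() - 3] = hasDynamicSharedMemory ? 1 : 0;
  segmentSizes[segmentSizes.size() - 2] = numKernelOperands;
  segmentSizes.back() = hasStream ? 1 : 0;
  return segmentSizes;
}

// The segment sizes must add up to the number of operands build() appended.
int32_t totalOperandCount(const std::vector<int32_t> &segmentSizes) {
  int32_t total = 0;
  for (int32_t size : segmentSizes)
    total += size;
  return total;
}
```

Take the `gpu.launch_func ... stream %stream args(%0 : f32, %1 : memref<?xf32, 1>)` case from ops.mlir, which has no async dependencies and no dynamic shared memory. For that case, `launchFuncSegmentSizes(0, false, 2, true)` yields `{0, 1, 1, 1, 1, 1, 1, 0, 2, 1}`. That is nine operands: six launch sizes, two kernel arguments, and the stream.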

@@ ... 545 unchanged lines not shown ... @@ LogicalResult MemsetOp::fold(FoldAdaptor adaptor,
                              SmallVectorImpl<::mlir::OpFoldResult> &results) {
   return memref::foldMemRefCast(*this);
 }
 
 //===----------------------------------------------------------------------===//
 // GPU_WaitOp
 //===----------------------------------------------------------------------===//
 
+void WaitOp::build(OpBuilder &builder, OperationState &result,
+                   Type asyncTokenType, ValueRange asyncDependencies,
+                   Value stream) {
+  result.addOperands(asyncDependencies);
+  if (asyncTokenType)
+    result.types.push_back(builder.getType<AsyncTokenType>());
+
+  if (stream)
+    result.addOperands(stream);
+
+  SmallVector<int32_t, 2> segmentSizes(2, 1);
+  segmentSizes.front() = asyncDependencies.size();
+  segmentSizes.back() = stream ? 1 : 0;
+  result.addAttribute(getOperandSegmentSizeAttr(),
+                      builder.getDenseI32ArrayAttr(segmentSizes));
+}
+
 namespace {
 
 /// Remove gpu.wait op use of gpu.wait op def without async dependencies.
 /// %t = gpu.wait async [] // No async dependencies.
 /// ... gpu.wait ... [%t, ...] // %t can be removed.
 struct EraseRedundantGpuWaitOpPairs : public OpRewritePattern<WaitOp> {
 public:
   using OpRewritePattern::OpRewritePattern;
 
   LogicalResult matchAndRewrite(WaitOp op,
                                 PatternRewriter &rewriter) const final {
     auto predicate = [](Value value) {
       auto waitOp = value.getDefiningOp<WaitOp>();
-      return waitOp && waitOp->getNumOperands() == 0;
+      return waitOp && waitOp.getAsyncDependencies().size() == 0;
     };
 
     if (llvm::none_of(op.getAsyncDependencies(), predicate))
       return failure();
 
     SmallVector<Value> validOperands;
-    for (Value operand : op->getOperands()) {
+    for (Value operand : op.getAsyncDependencies()) {
       if (predicate(operand))
         continue;
       validOperands.push_back(operand);
     }
-    rewriter.updateRootInPlace(op, [&]() { op->setOperands(validOperands); });
+    rewriter.updateRootInPlace(
+        op, [&]() { op.getAsyncDependenciesMutable().assign(validOperands); });
     return success();
   }
 };
 
 /// Simplify trivial gpu.wait ops for the following patterns.
 /// 1. %t = gpu.wait async ... ops, where %t has no uses (regardless of async
 /// dependencies).
 /// 2. %t1 = gpu.wait async [%t0], in this case, we can replace uses of %t1 with
@@ ... 103 unchanged lines not shown ... @@
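
The `gpu.wait` changes follow the same scheme. The sketch below is a simplified, MLIR-free model, not the pattern itself. `WaitModel` and `eraseRedundantDeps` are hypothetical names, and SSA values are modeled as strings. The `isRedundant` predicate stands in for the pattern's check that a dependency is produced by a `gpu.wait` with no async dependencies. The model shows two things: the two-entry segment sizes recorded by `WaitOp::build`, and a rewrite that filters only the async-dependency segment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, MLIR-free model of gpu.wait after this patch: a variadic list
// of async dependencies followed by an optional stream operand. SSA values are
// modeled as their textual names (e.g. "%t0", "%stream").
struct WaitModel {
  std::vector<std::string> asyncDependencies;
  std::optional<std::string> stream;

  // Mirrors the two-entry segment sizes that WaitOp::build records.
  std::vector<int32_t> segmentSizes() const {
    return {static_cast<int32_t>(asyncDependencies.size()),
            static_cast<int32_t>(stream ? 1 : 0)};
  }

  // Flat operand list, in the order WaitOp::build appends operands.
  std::vector<std::string> operands() const {
    std::vector<std::string> all = asyncDependencies;
    if (stream)
      all.push_back(*stream);
    return all;
  }
};

// Analogue of the updated EraseRedundantGpuWaitOpPairs rewrite: it inspects and
// rewrites only the async-dependency segment, so the stream is never touched.
// `isRedundant` stands in for "defined by a gpu.wait without async deps".
template <typename Pred>
bool eraseRedundantDeps(WaitModel &op, Pred isRedundant) {
  std::vector<std::string> &deps = op.asyncDependencies;
  if (std::none_of(deps.begin(), deps.end(), isRedundant))
    return false; // The pattern returns failure() here.
  deps.erase(std::remove_if(deps.begin(), deps.end(), isRedundant), deps.end());
  assert(std::none_of(deps.begin(), deps.end(), isRedundant));
  return true; // The pattern returns success() here.
}
```

Once `gpu.wait` has two operand segments, it matters that the rewrite goes through `getAsyncDependenciesMutable()` rather than the old `op->setOperands(validOperands)`. `setOperands` replaces the flat operand list but leaves the operand segment sizes attribute unchanged, so the recorded async-dependency count would no longer match. The predicate change follows the same logic. A `gpu.wait async stream %s` has one operand but no async dependencies, so `getNumOperands() == 0` would no longer identify it as dependency-free.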

mlir/test/Dialect/GPU/ops.mlir

@@ ... 125 unchanged lines not shown ... @@ module attributes {gpu.container_module} {
   func.func @foo() {
     %0 = "op"() : () -> (f32)
     %1 = "op"() : () -> (memref<?xf32, 1>)
     // CHECK: %{{.*}} = arith.constant 8
     %cst = arith.constant 8 : index
     %c0 = arith.constant 0 : i32
     %t0 = gpu.wait async
 
+    // CHECK: %[[stream:.*]] = gpu.create_stream : !gpu.stream
     [inline comment] rengolin (Not Done): You add functionality to pass a device but offer no test of how a device should be created, used, or destroyed.
     [inline comment] nbpatel (author, Done): Will add a test.
+    %stream = gpu.create_stream : !gpu.stream
+
     // CHECK: gpu.launch_func @kernels::@kernel_1 blocks in (%{{.*}}, %{{.*}}, %{{.*}}) threads in (%{{.*}}, %{{.*}}, %{{.*}}) args(%{{.*}} : f32, %{{.*}} : memref<?xf32, 1>)
     gpu.launch_func @kernels::@kernel_1 blocks in (%cst, %cst, %cst) threads in (%cst, %cst, %cst) args(%0 : f32, %1 : memref<?xf32, 1>)
 
+    gpu.launch_func @kernels::@kernel_1 blocks in (%cst, %cst, %cst) threads in (%cst, %cst, %cst) stream %stream args(%0 : f32, %1 : memref<?xf32, 1>)
+
+    gpu.launch_func @kernels::@kernel_1 blocks in (%cst, %cst, %cst) threads in (%cst, %cst, %cst) stream %stream dynamic_shared_memory_size %c0 args(%0 : f32, %1 : memref<?xf32, 1>)
+
     gpu.launch_func @kernels::@kernel_1 blocks in (%cst, %cst, %cst) threads in (%cst, %cst, %cst) dynamic_shared_memory_size %c0 args(%0 : f32, %1 : memref<?xf32, 1>)
 
     // CHECK: gpu.launch_func @kernels::@kernel_2 blocks in (%{{.*}}, %{{.*}}, %{{.*}}) threads in (%{{.*}}, %{{.*}}, %{{.*}})
     gpu.launch_func @kernels::@kernel_2 blocks in (%cst, %cst, %cst) threads in (%cst, %cst, %cst)
 
     // CHECK: %{{.*}} = gpu.launch_func async [%{{.*}}] @kernels::@kernel_2 blocks in (%{{.*}}, %{{.*}}, %{{.*}}) threads in (%{{.*}}, %{{.*}}, %{{.*}})
     %t1 = gpu.launch_func async [%t0] @kernels::@kernel_2 blocks in (%cst, %cst, %cst) threads in (%cst, %cst, %cst)
 
@@ ... 68 unchanged lines not shown ... @@ gpu.module @explicit_attributions {
     ^bb0(%arg0: f32, %arg1: memref<?xf32>, %arg2: memref<5xf32, 3>, %arg3: memref<5xf32, 5>):
       "gpu.return"() : () -> ()
     } ) {function_type = (f32, memref<?xf32>) -> (), gpu.kernel, sym_name = "kernel_1", workgroup_attributions = 1: i64} : () -> ()
   }
 
   func.func @alloc() {
     // CHECK-LABEL: func @alloc()
 
+
     [inline comment] bondhugula (Not Done): Drop blank line.
     // CHECK: %[[m0:.*]] = gpu.alloc () : memref<13xf32, 1>
     %m0 = gpu.alloc () : memref<13xf32, 1>
     // CHECK: gpu.dealloc %[[m0]] : memref<13xf32, 1>
     gpu.dealloc %m0 : memref<13xf32, 1>
 
     %t0 = gpu.wait async
     // CHECK: %[[m1:.*]], %[[t1:.*]] = gpu.alloc async [{{.*}}] () : memref<13xf32, 1>
     %m1, %t1 = gpu.alloc async [%t0] () : memref<13xf32, 1>
     // CHECK: gpu.dealloc async [%[[t1]]] %[[m1]] : memref<13xf32, 1>
     %t2 = gpu.dealloc async [%t1] %m1 : memref<13xf32, 1>
 
     // CHECK: %[[m2:.*]] = gpu.alloc host_shared () : memref<13xf32, 1>
     %m2 = gpu.alloc host_shared () : memref<13xf32, 1>
     // CHECK: gpu.dealloc %[[m2]] : memref<13xf32, 1>
     gpu.dealloc %m2 : memref<13xf32, 1>
 
+    // CHECK: %[[stream:.*]] = gpu.create_stream : !gpu.stream
+    %stream = gpu.create_stream : !gpu.stream
+    // CHECK: %[[m3:.*]] = gpu.alloc () stream %[[stream]] : memref<13xf32, 1>
+    %m3 = gpu.alloc () stream %stream : memref<13xf32, 1>
+    // CHECK: gpu.dealloc %[[m3]] stream %[[stream]] : memref<13xf32, 1>
+    gpu.dealloc %m3 stream %stream : memref<13xf32, 1>
+
     return
   }
 
   func.func @async_token(%arg0 : !gpu.async.token) -> !gpu.async.token {
     // CHECK-LABEL: func @async_token({{.*}}: !gpu.async.token)
     // CHECK: return {{.*}} : !gpu.async.token
     return %arg0 : !gpu.async.token
   }
@@ ... 18 unchanged lines not shown ... @@ module attributes {gpu.container_module} {
   func.func @memcpy(%dst : memref<3x7xf32>, %src : memref<3x7xf32, 1>) {
     // CHECK-LABEL: func @memcpy
     // CHECK: gpu.memcpy {{.*}}, {{.*}} : memref<3x7xf32>, memref<3x7xf32, 1>
     gpu.memcpy %dst, %src : memref<3x7xf32>, memref<3x7xf32, 1>
     // CHECK: %[[t0:.*]] = gpu.wait async
     %0 = gpu.wait async
     // CHECK: {{.*}} = gpu.memcpy async [%[[t0]]] {{.*}}, {{.*}} : memref<3x7xf32>, memref<3x7xf32, 1>
     %1 = gpu.memcpy async [%0] %dst, %src : memref<3x7xf32>, memref<3x7xf32, 1>
 
+    // CHECK: %[[stream:.*]] = gpu.create_stream : !gpu.stream
+    %stream = gpu.create_stream : !gpu.stream
+    // CHECK: gpu.memcpy {{.*}}, {{.*}} stream %[[stream]] : memref<3x7xf32>, memref<3x7xf32, 1>
+    gpu.memcpy %dst, %src stream %stream : memref<3x7xf32>, memref<3x7xf32, 1>
+    // CHECK: %[[t1:.*]] = gpu.wait async stream %[[stream]]
+    %2 = gpu.wait async stream %stream
+    // CHECK: {{.*}} = gpu.memcpy async [%[[t1]]] {{.*}}, {{.*}} stream %[[stream]] : memref<3x7xf32>, memref<3x7xf32, 1>
+    %3 = gpu.memcpy async [%2] %dst, %src stream %stream : memref<3x7xf32>, memref<3x7xf32, 1>
+
     return
   }
 
   func.func @memset(%dst : memref<3x7xf32>, %value : f32) {
     // CHECK-LABEL: func @memset
     // CHECK: gpu.memset {{.*}}, {{.*}} : memref<3x7xf32>, f32
     gpu.memset %dst, %value : memref<3x7xf32>, f32
     // CHECK: %[[t0:.*]] = gpu.wait async
     %0 = gpu.wait async
     // CHECK: {{.*}} = gpu.memset async [%[[t0]]] {{.*}}, {{.*}} : memref<3x7xf32>, f32
     %1 = gpu.memset async [%0] %dst, %value : memref<3x7xf32>, f32
 
+    // CHECK: %[[stream:.*]] = gpu.create_stream : !gpu.stream
+    %stream = gpu.create_stream : !gpu.stream
+    // CHECK: gpu.memset {{.*}}, {{.*}} stream %[[stream]] : memref<3x7xf32>, f32
+    gpu.memset %dst, %value stream %stream : memref<3x7xf32>, f32
+    // CHECK: %[[t1:.*]] = gpu.wait async stream %[[stream]]
+    %2 = gpu.wait async stream %stream
+    // CHECK: {{.*}} = gpu.memset async [%[[t1]]] {{.*}}, {{.*}} stream %[[stream]] : memref<3x7xf32>, f32
+    %3 = gpu.memset async [%2] %dst, %value stream %stream : memref<3x7xf32>, f32
+
     return
   }
 
   func.func @mmamatrix_valid_scalar_element_type(%src : memref<32x32xf16, affine_map<(d0, d1) -> (d0 * 64 + d1)>>){
     // CHECK-LABEL: func @mmamatrix_valid_scalar_element_type
     %wg = memref.alloca() {alignment = 32} : memref<32x32xf16, 3>
     // CHECK: %[[wg:.*]] = memref.alloca()
     %i = arith.constant 16 : index
@@ ... 38 unchanged lines not shown ... @@
