This is an archive of the discontinued LLVM Phabricator instance.

mlir/test/Dialect/GPU/promotion.mlir
100	You can remove all of the empty module wrappers; `module` is implicitly the top-level operation.
mlir/test/lib/Transforms/TestGpuMemoryPromotion.cpp
28	(On Diff #258911)	Can you just change the pipeline specification in the tests instead of changing this pass?
30	(On Diff #258911)	MLIR uses camelCase for variable names.

Nice!

mlir/test/Dialect/GPU/promotion.mlir
1–3	Maybe something like `// RUN: mlir-opt -allow-unregistered-dialect -pass-pipeline='module(gpu.module(gpu.func(test-gpu-memory-promotion)))' -split-input-file %s | FileCheck %s` is enough to direct the test pass to the functions through the extra wrapping modules.
55	Why this change?
mlir/test/lib/Transforms/TestGpuMemoryPromotion.cpp
28	(On Diff #258911)	See my comment above for how to do this. You can then revert these changes.
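
As a rough illustration of the nesting in the suggested `-pass-pipeline` string: each `name(...)` level names the operation type to descend into, and the innermost element names the pass to run there. The sketch below is a toy splitter for a single nested chain, written only to make that structure visible. `splitNestedPipeline` is a hypothetical helper, not part of mlir-opt or its pipeline parser.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not an MLIR API): splits a single-chain pipeline
// string such as "a(b(c))" into its elements, outermost first.
// It does not handle comma-separated sibling passes or pass options.
std::vector<std::string> splitNestedPipeline(const std::string &spec) {
  std::vector<std::string> parts;
  std::string current;
  for (char c : spec) {
    if (c == '(' || c == ')') {
      // A parenthesis ends the current element, if there is one.
      if (!current.empty()) {
        parts.push_back(current);
        current.clear();
      }
    } else {
      current += c;
    }
  }
  if (!current.empty())
    parts.push_back(current);
  return parts;
}
```

For `module(gpu.module(gpu.func(test-gpu-memory-promotion)))` this yields `module`, `gpu.module`, `gpu.func`, `test-gpu-memory-promotion`: three operation anchors followed by the pass name.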

This revision now requires changes to proceed. · Apr 21 2020, 1:11 AM

Harbormaster failed remote builds in B54047: Diff 258911! · Apr 21 2020, 2:07 AM

Simplify test case.

Herald added a subscriber: aartbik. · Apr 21 2020, 3:06 AM

frgossen added inline comments. · Apr 21 2020, 3:10 AM

mlir/test/Dialect/GPU/promotion.mlir
55	Was not intended. Locally, I did not see this because all the indented lines appear as a change.
100	Didn't know that. Thanks!
mlir/test/lib/Transforms/TestGpuMemoryPromotion.cpp
28	(On Diff #258911)	That's much nicer. Thanks!

ftynse accepted this revision. · Apr 21 2020, 3:53 AM

Harbormaster failed remote builds in B54071: Diff 258944! · Apr 21 2020, 4:17 AM

Thanks!

This revision is now accepted and ready to land. · Apr 21 2020, 4:43 AM

@ftynse, @herhut: is this giving any added semantic advantage? I have been trying to see if there is a way to target gpu.func directly without going through gpu.launch + outlining. Having the requirement that a gpu.func always live in a gpu.module seems too restrictive. Just want to understand the advantage apart from the fact that all gpu.func need to live in a module "separate" from the host side.

This revision now requires changes to proceed. · Apr 21 2020, 8:54 AM

In D78541#1994763, @mravishankar wrote:

@ftynse, @herhut: is this giving any added semantic advantage? I have been trying to see if there is a way to target gpu.func directly without going through gpu.launch + outlining. Having the requirement that a gpu.func always live in a gpu.module seems too restrictive. Just want to understand the advantage apart from the fact that all gpu.func need to live in a module "separate" from the host side.

We have always required that gpu.funcs live in a special kind of module. Originally, it was a module with a kernel_module attribute. This is necessary to pick out the modules that should be translated to device-specific dialects. Later, gpu.module was introduced, but the old condition with the attribute was not removed. This change basically completes the transition.

I do see some value in having a dedicated gpu.module to clearly separate what is intended for the device and what is not. Your issue with generating a gpu.func without gpu.launch_func seems mostly orthogonal: this is a matter of writing the conversion, not of the rules the resulting IR must follow. If you have another conversion pass (or just IR building code) that creates a gpu.func, it can equally create a gpu.module first. Not having a gpu.module would put the semantics in a weird state when, for example, the same function is called from both a gpu.func and a std.func, or if there is some global state mutation.

@mravishankar, can I land this?

Herald added a subscriber: Kayjukh. · Apr 23 2020, 5:20 AM

Sorry for the delay. Let's land this as is. If I have a more concrete issue/suggestion on this aspect, I can revisit this.

This revision is now accepted and ready to land. · Apr 23 2020, 7:16 AM

Closed by commit rG7e4b139a04d7: [MLIR] Ensure `gpu.func` must be inside a `gpu.module`. (authored by frgossen). · Apr 24 2020, 12:29 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
mlir/include/mlir/Dialect/GPU/GPUBase.td	3 lines
mlir/include/mlir/Dialect/GPU/GPUOps.td	3 lines
mlir/test/Dialect/GPU/ (5 files)	4 lines, 4 lines, 21 lines, 2 lines, 11 lines

Diff 258944

mlir/include/mlir/Dialect/GPU/GPUBase.td

@@ let extraClassDeclaration = [{
 /// kernel modules.
 static StringRef getContainerModuleAttrName() {
 return "gpu.container_module";
 }
 /// Get the name of the attribute used to annotate external kernel
 /// functions.
 static StringRef getKernelFuncAttrName() { return "gpu.kernel"; }

-/// Get the name of the attribute used to annotate kernel modules.
-static StringRef getKernelModuleAttrName() { return "gpu.kernel_module"; }
-
 /// Returns whether the given function is a kernel function, i.e., has the
 /// 'gpu.kernel' attribute.
 static bool isKernel(Operation *op);

 /// Returns the number of workgroup (thread, block) dimensions supported in
 /// the GPU dialect.
 // TODO(zinenko,herhut): consider generalizing this.
 static unsigned getNumWorkgroupDimensions() { return 3; }

mlir/include/mlir/Dialect/GPU/GPUOps.td

@@ let description = [{
 Example:

 ```mlir
 %tIdX = "gpu.thread_id"() {dimension = "x"} : () -> (index)
 ```
 }];
 }

-def GPU_GPUFuncOp : GPU_Op<"func", [AutomaticAllocationScope, FunctionLike,
+def GPU_GPUFuncOp : GPU_Op<"func", [HasParent<"GPUModuleOp">,
+AutomaticAllocationScope, FunctionLike,
 IsolatedFromAbove, Symbol]> {
 let summary = "Function executable on a GPU";

 let description = [{
 Defines a function that can be executed on a GPU. This supports memory
 attribution and its body has a particular execution model.

 GPU functions are either kernels (as indicated by the `kernel` attribute) or
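
To make the effect of the added `HasParent<"GPUModuleOp">` trait concrete, here is a self-contained toy model of a direct-parent check. It is only a sketch: `ToyOp` and `verifyHasParent` are hypothetical names, not MLIR classes, and the real check comes from the trait machinery rather than hand-written code like this. The error text mirrors the diagnostic expected by the new test case in invalid.mlir.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy stand-in for an operation: a name plus a pointer to its direct parent.
// This is NOT mlir::Operation; it exists only for this illustration.
struct ToyOp {
  std::string name;
  const ToyOp *parent;
  explicit ToyOp(std::string n, const ToyOp *p = nullptr)
      : name(std::move(n)), parent(p) {}
};

// Returns an empty string if `op` is directly nested in an op named
// `requiredParent`; otherwise returns a message shaped like the diagnostic
// expected in invalid.mlir.
std::string verifyHasParent(const ToyOp &op,
                            const std::string &requiredParent) {
  if (op.parent != nullptr && op.parent->name == requiredParent)
    return "";
  return "'" + op.name + "' op expects parent op '" + requiredParent + "'";
}
```

In this model, a `gpu.func` whose parent is a plain `module` produces `'gpu.func' op expects parent op 'gpu.module'`, while one nested directly in a `gpu.module` passes.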

mlir/test/Dialect/GPU/all-reduce-max.mlir

 // RUN: mlir-opt -test-all-reduce-lowering %s | FileCheck %s

 // NOTE: Assertions have been autogenerated by utils/generate-test-checks.py
-// CHECK: module @kernels attributes {gpu.kernel_module} {
-module @kernels attributes {gpu.kernel_module} {
+// CHECK: gpu.module @kernels {
+gpu.module @kernels {

 // CHECK-LABEL: gpu.func @kernel(
 // CHECK-SAME: [[VAL_0:%.*]]: f32) workgroup([[VAL_1:%.*]] : memref<32xf32, 3>) kernel {
 gpu.func @kernel(%arg0 : f32) attributes { gpu.kernel } {
 // CHECK: [[VAL_2:%.*]] = constant 31 : i32
 // CHECK: [[VAL_3:%.*]] = constant 0 : i32
 // CHECK: [[VAL_4:%.*]] = constant 0 : index
 // CHECK: [[VAL_5:%.*]] = constant 32 : i32

mlir/test/Dialect/GPU/all-reduce.mlir

 // RUN: mlir-opt -test-all-reduce-lowering %s | FileCheck %s

 // NOTE: Assertions have been autogenerated by utils/generate-test-checks.py
-// CHECK: module @kernels attributes {gpu.kernel_module} {
-module @kernels attributes {gpu.kernel_module} {
+// CHECK: gpu.module @kernels {
+gpu.module @kernels {

 // CHECK-LABEL: gpu.func @kernel(
 // CHECK-SAME: [[VAL_0:%.*]]: f32) workgroup([[VAL_1:%.*]] : memref<32xf32, 3>) kernel {
 gpu.func @kernel(%arg0 : f32) attributes { gpu.kernel } {
 // CHECK: [[VAL_2:%.*]] = constant 31 : i32
 // CHECK: [[VAL_3:%.*]] = constant 0 : i32
 // CHECK: [[VAL_4:%.*]] = constant 0 : index
 // CHECK: [[VAL_5:%.*]] = constant 32 : i32

mlir/test/Dialect/GPU/invalid.mlir

@@ func @launch_func_undefined_module(%sz : index) {
 return
 }
 }

 // -----

 module attributes {gpu.container_module} {
 module @kernels {
+// expected-error@+1 {{'gpu.func' op expects parent op 'gpu.module'}}
+gpu.func @kernel_1(%arg1 : !llvm<"float*">) {
+gpu.return
+}
+}
+}
+
+// -----
+
+module attributes {gpu.container_module} {
+module @kernels {
 }

 func @launch_func_missing_module_attribute(%sz : index) {
 // expected-error@+1 {{kernel module 'kernels' is undefined}}
 "gpu.launch_func"(%sz, %sz, %sz, %sz, %sz, %sz)
 { kernel = "kernel_1", kernel_module = @kernels }
 : (index, index, index, index, index, index) -> ()
 return
@@ ^bb0(%arg0: f32):
 gpu.return
 }
 }
 }

 // -----

 module {
-module @gpu_funcs attributes {gpu.kernel_module} {
+gpu.module @gpu_funcs {
 // expected-error @+1 {{requires 'type' attribute of function type}}
 "gpu.func"() ({
 gpu.return
 }) {sym_name="kernel_1", type=f32} : () -> ()
 }
 }

 // -----

 module {
-module @gpu_funcs attributes {gpu.kernel_module} {
+gpu.module @gpu_funcs {
 // expected-error @+1 {{expected memref type in attribution}}
 gpu.func @kernel() workgroup(%0: i32) {
 gpu.return
 }
 }
 }

 // -----

 module {
-module @gpu_funcs attributes {gpu.kernel_module} {
+gpu.module @gpu_funcs {
 // expected-error @+1 {{expected memory space 3 in attribution}}
 gpu.func @kernel() workgroup(%0: memref<4xf32>) {
 gpu.return
 }
 }
 }

 // -----

 module {
-module @gpu_funcs attributes {gpu.kernel_module} {
+gpu.module @gpu_funcs {
 // expected-error @+1 {{expected memory space 5 in attribution}}
 gpu.func @kernel() private(%0: memref<4xf32>) {
 gpu.return
 }
 }
 }

 // -----

 module {
-module @gpu_funcs attributes {gpu.kernel_module} {
+gpu.module @gpu_funcs {
 // expected-error @+1 {{expected memory space 5 in attribution}}
 gpu.func @kernel() private(%0: memref<4xf32>) {
 gpu.return
 }
 }
 }

 // -----

mlir/test/Dialect/GPU/ops.mlir

@@ func @foo() {
 // CHECK: "gpu.launch_func"(%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}) {kernel = "kernel_2", kernel_module = @kernels} : (index, index, index, index, index, index, f32, memref<?xf32, 1>) -> ()
 "gpu.launch_func"(%cst, %cst, %cst, %cst, %cst, %cst, %0, %1)
 { kernel = "kernel_2", kernel_module = @kernels }
 : (index, index, index, index, index, index, f32, memref<?xf32, 1>) -> ()

 return
 }

-module @gpu_funcs attributes {gpu.kernel_module} {
+gpu.module @gpu_funcs {
 // CHECK-LABEL: gpu.func @kernel_1({{.*}}: f32)
 // CHECK: workgroup
 // CHECK: private
 // CHECK: attributes
 gpu.func @kernel_1(%arg0: f32)
 workgroup(%arg1: memref<42xf32, 3>)
 private(%arg2: memref<2xf32, 5>, %arg3: memref<1xf32, 5>)
 kernel

mlir/test/Dialect/GPU/promotion.mlir

-// RUN: mlir-opt -allow-unregistered-dialect -test-gpu-memory-promotion -split-input-file %s | FileCheck %s
+// RUN: mlir-opt -allow-unregistered-dialect -test-gpu-memory-promotion -pass-pipeline='gpu.module(gpu.func(test-gpu-memory-promotion))' -split-input-file %s | FileCheck %s

-module @foo attributes {gpu.kernel_module} {
+gpu.module @foo {
>> herhut: Maybe something like `// RUN: mlir-opt -allow-unregistered-dialect -pass-pipeline='module(gpu.module(gpu.func(test-gpu-memory-promotion)))' -split-input-file %s | FileCheck %s` is enough to direct the test pass to the functions through the extra wrapping modules.
 // Verify that the attribution was indeed introduced
 // CHECK-LABEL: @memref3d
 // CHECK-SAME: (%[[arg:.*]]: memref<5x4xf32>
 // CHECK-SAME: workgroup(%[[promoted:.*]] : memref<5x4xf32, 3>)
 gpu.func @memref3d(%arg0: memref<5x4xf32> {gpu.test_promote_workgroup}) kernel {
 // Verify that loop bounds are emitted, the order does not matter.
 // CHECK-DAG: %[[c1:.*]] = constant 1
 // CHECK-DAG: %[[c4:.*]] = constant 4
@@ gpu.func @memref3d(%arg0: memref<5x4xf32> {gpu.test_promote_workgroup}) kernel {
 // CHECK: %[[v:.*]] = load %[[promoted]][%[[i1]], %[[i2]]]
 // CHECK: store %[[v]], %[[arg]][%[[i1]], %[[i2]]]
 gpu.return
 }
 }

 // -----

-module @foo attributes {gpu.kernel_module} {
+gpu.module @foo {

 // Verify that the attribution was indeed introduced
>> herhut: Why this change?
>> frgossen (author): Was not intended. Locally, I did not see this because all the indented lines appear as a change.
 // CHECK-LABEL: @memref5d
 // CHECK-SAME: (%[[arg:.*]]: memref<8x7x6x5x4xf32>
 // CHECK-SAME: workgroup(%[[promoted:.*]] : memref<8x7x6x5x4xf32, 3>)
 gpu.func @memref5d(%arg0: memref<8x7x6x5x4xf32> {gpu.test_promote_workgroup}) kernel {
 // Verify that loop bounds are emitted, the order does not matter.
 // CHECK-DAG: %[[c0:.*]] = constant 0
 // CHECK-DAG: %[[c1:.*]] = constant 1
 // CHECK-DAG: %[[c4:.*]] = constant 4
@@ gpu.func @memref5d(%arg0: memref<8x7x6x5x4xf32> {gpu.test_promote_workgroup}) kernel {
 // CHECK: loop.for %[[i1:.*]] =
 // CHECK: loop.for %[[i2:.*]] =
 // CHECK: loop.for %[[i3:.*]] =
 // CHECK: loop.for %[[i4:.*]] =

 // Verify that the copy is emitted.
 // CHECK: %[[v:.*]] = load %[[promoted]][%[[i0]], %[[i1]], %[[i2]], %[[i3]], %[[i4]]]
 // CHECK: store %[[v]], %[[arg]][%[[i0]], %[[i1]], %[[i2]], %[[i3]], %[[i4]]]
 gpu.return
>> rriddle: You can remove all of the empty module wrappers, module is implicitly the top-level operation.
>> frgossen (author): Didn't know that. Thanks!
 }
 }

 // -----

-module @foo attributes {gpu.kernel_module} {
+gpu.module @foo {

 // Check that attribution insertion works fine.
 // CHECK-LABEL: @insert
 // CHECK-SAME: (%{{.*}}: memref<4xf32>
 // CHECK-SAME: workgroup(%{{.*}}: memref<1x1xf64, 3>
 // CHECK-SAME: %[[wg2:.*]] : memref<4xf32, 3>)
 // CHECK-SAME: private(%{{.*}}: memref<1x1xi64, 5>)
 gpu.func @insert(%arg0: memref<4xf32> {gpu.test_promote_workgroup})
 workgroup(%arg1: memref<1x1xf64, 3>)
 private(%arg2: memref<1x1xi64, 5>)
 kernel {
 // CHECK: "use"(%[[wg2]])
 "use"(%arg0) : (memref<4xf32>) -> ()
 gpu.return
 }
 }

[MLIR] Ensure `gpu.func` must be inside a `gpu.module`. (Closed, Public)
