This is an archive of the discontinued LLVM Phabricator instance.

mlir/test/Dialect/GPU/promotion.mlir
55	You can remove all of the empty module wrappers, module is implicitly the top-level operation.
mlir/test/lib/Transforms/TestGpuMemoryPromotion.cpp
28	Can you just change the pipeline specification in the tests instead of changing this pass?
30	MLIR uses camelCase for variable names.

Nice!

mlir/test/Dialect/GPU/promotion.mlir
1	Maybe something like `// RUN: mlir-opt -allow-unregistered-dialect -pass-pipeline='module(gpu.module(gpu.func(test-gpu-memory-promotion)))' -split-input-file %s | FileCheck %s` is enough to direct the test pass to the functions through the extra wrapping modules.
11	Why this change?
mlir/test/lib/Transforms/TestGpuMemoryPromotion.cpp
28	See my above comment for how to do this. You can then revert these changes.
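
A hedged sketch of the suggestion above: the pipeline string nests op names the same way the IR nests operations, so a pass that still runs on `gpu.func` can be reached through the wrapping modules. The RUN line is copied from the comment; the function body is trimmed from the `@insert` case in this diff, and the CHECK lines are illustrative and not verified against a build. If the explicit `module { ... }` wrapper is removed from the test file, the outer `module(...)` level of the pipeline may need adjusting.

```mlir
// RUN: mlir-opt -allow-unregistered-dialect -pass-pipeline='module(gpu.module(gpu.func(test-gpu-memory-promotion)))' -split-input-file %s | FileCheck %s

module {
  gpu.module @foo {
    // The pass is scheduled on gpu.func, two levels below the wrapper module.
    // CHECK-LABEL: @insert
    // CHECK-SAME: workgroup(%[[promoted:.*]] : memref<4xf32, 3>)
    gpu.func @insert(%arg0: memref<4xf32> {gpu.test_promote_workgroup}) kernel {
      // CHECK: "use"(%[[promoted]])
      "use"(%arg0) : (memref<4xf32>) -> ()
      gpu.return
    }
  }
}
```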

This revision now requires changes to proceed. Apr 21 2020, 1:11 AM

Harbormaster failed remote builds in B54047: Diff 258911! Apr 21 2020, 2:07 AM

Simplify test case.

Herald added a subscriber: aartbik. Apr 21 2020, 3:06 AM

frgossen added inline comments. Apr 21 2020, 3:10 AM

mlir/test/Dialect/GPU/promotion.mlir
11	Was not intended. Locally, I did not see this because all the indented lines appear as a change.
55	Didn't know that. Thanks!
mlir/test/lib/Transforms/TestGpuMemoryPromotion.cpp
28	That's much nicer. Thanks!

ftynse accepted this revision. Apr 21 2020, 3:53 AM

Harbormaster failed remote builds in B54071: Diff 258944! Apr 21 2020, 4:17 AM

Thanks!

This revision is now accepted and ready to land. Apr 21 2020, 4:43 AM

@ftynse , @herhut is this giving any added semantic advantage? I have been trying to see if there is a way to target gpu.func directly without going through gpu.launch + outlining. Having the requirement that a gpu.func always live in a gpu.module seems too restrictive. Just want to understand the advantage apart from the fact that all gpu.func need to live in a module "separate" from the host side.

This revision now requires changes to proceed. Apr 21 2020, 8:54 AM

In D78541#1994763, @mravishankar wrote:

@ftynse , @herhut is this giving any added semantic advantage? I have been trying to see if there is a way to target gpu.func directly without going through gpu.launch + outlining. Having the requirement that a gpu.func always live in a gpu.module seems too restrictive. Just want to understand the advantage apart from the fact that all gpu.func need to live in a module "separate" from the host side.

We always had a requirement that gpu.funcs live in a somehow special module. Originally, it was a module with a kernel_module attribute. This is necessary to pick out the modules that should be translated to device-specific dialects. Later, gpu.module was introduced, but the old condition with the attribute was not removed. This change basically completes the transition.

I do see some value in having a dedicated gpu.module to clearly separate what is intended for device and what is not. Your issue with generating a gpu.func without gpu.launch_func seems mostly orthogonal: this is a matter of writing the conversion, not of the rules the resulting IR must follow. If you have another conversion pass (or just IR building code) that creates a gpu.func, it can equally create a gpu.module first. Not having a gpu.module would put the semantics in a weird state when, for example, the same function is called from both gpu.func and from std.func or if there is some global state mutation.
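
To make the host/device separation described above concrete, here is a minimal sketch assembled from the test cases in this diff. The names `@kernels`, `@kernel_1`, and `@host` are illustrative, and the syntax is that of this revision (April 2020); it may not parse with later MLIR versions.

```mlir
module attributes {gpu.container_module} {
  // Device side: after this change every gpu.func must sit directly inside
  // a gpu.module. Wrapping it in a plain `module @kernels { ... }` instead
  // is rejected with:
  //   'gpu.func' op expects parent op 'gpu.module'
  gpu.module @kernels {
    gpu.func @kernel_1(%arg0 : f32) kernel {
      gpu.return
    }
  }

  // Host side: refers to the kernel through the symbol of its gpu.module.
  func @host(%sz : index, %x : f32) {
    "gpu.launch_func"(%sz, %sz, %sz, %sz, %sz, %sz, %x)
        { kernel = "kernel_1", kernel_module = @kernels }
        : (index, index, index, index, index, index, f32) -> ()
    return
  }
}
```

A conversion that builds a `gpu.func` directly, without going through `gpu.launch` and outlining, would create the enclosing `gpu.module` first and then insert the function into it.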

@mravishankar, can I land this?

Herald added a subscriber: Kayjukh. Apr 23 2020, 5:20 AM

Sorry for the delay. Let's land this as is. If I have a more concrete issue/suggestion on this aspect, I can revisit this.

This revision is now accepted and ready to land. Apr 23 2020, 7:16 AM

Closed by commit rG7e4b139a04d7: [MLIR] Ensure `gpu.func` must be inside a `gpu.module`. (authored by frgossen). Apr 24 2020, 12:29 AM

This revision was automatically updated to reflect the committed changes.

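Revision Contents

Path	Size
mlir/include/mlir/Dialect/GPU/GPUBase.td	3 lines
mlir/include/mlir/Dialect/GPU/GPUOps.td	3 lines
mlir/test/Dialect/GPU/all-reduce-max.mlir	4 lines
mlir/test/Dialect/GPU/all-reduce.mlir	4 lines
mlir/test/Dialect/GPU/invalid.mlir	21 lines
mlir/test/Dialect/GPU/ops.mlir	2 lines
mlir/test/Dialect/GPU/promotion.mlir	219 lines
mlir/test/lib/Transforms/TestGpuMemoryPromotion.cpp	17 lines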

Diff 258911

mlir/include/mlir/Dialect/GPU/GPUBase.td

[27 lines not shown; enclosing context: let extraClassDeclaration = [{]
  /// kernel modules.
  static StringRef getContainerModuleAttrName() {
  return "gpu.container_module";
  }
  /// Get the name of the attribute used to annotate external kernel
  /// functions.
  static StringRef getKernelFuncAttrName() { return "gpu.kernel"; }

- /// Get the name of the attribute used to annotate kernel modules.
- static StringRef getKernelModuleAttrName() { return "gpu.kernel_module"; }
-
  /// Returns whether the given function is a kernel function, i.e., has the
  /// 'gpu.kernel' attribute.
  static bool isKernel(Operation *op);

  /// Returns the number of workgroup (thread, block) dimensions supported in
  /// the GPU dialect.
  // TODO(zinenko,herhut): consider generalizing this.
  static unsigned getNumWorkgroupDimensions() { return 3; }
[12 lines not shown]

mlir/include/mlir/Dialect/GPU/GPUOps.td

[79 lines not shown; enclosing context: let description = [{]
  Example:

  ```mlir
  %tIdX = "gpu.thread_id"() {dimension = "x"} : () -> (index)
  ```
  }];
  }

- def GPU_GPUFuncOp : GPU_Op<"func", [AutomaticAllocationScope, FunctionLike,
+ def GPU_GPUFuncOp : GPU_Op<"func", [HasParent<"GPUModuleOp">,
+ AutomaticAllocationScope, FunctionLike,
  IsolatedFromAbove, Symbol]> {
  let summary = "Function executable on a GPU";

  let description = [{
  Defines a function that can be executed on a GPU. This supports memory
  attribution and its body has a particular execution model.

  GPU functions are either kernels (as indicated by the `kernel` attribute) or
[600 lines not shown]

mlir/test/Dialect/GPU/all-reduce-max.mlir

  // RUN: mlir-opt -test-all-reduce-lowering %s | FileCheck %s

  // NOTE: Assertions have been autogenerated by utils/generate-test-checks.py
- // CHECK: module @kernels attributes {gpu.kernel_module} {
- module @kernels attributes {gpu.kernel_module} {
+ // CHECK: gpu.module @kernels {
+ gpu.module @kernels {

  // CHECK-LABEL: gpu.func @kernel(
  // CHECK-SAME: [[VAL_0:%.*]]: f32) workgroup([[VAL_1:%.*]] : memref<32xf32, 3>) kernel {
  gpu.func @kernel(%arg0 : f32) attributes { gpu.kernel } {
  // CHECK: [[VAL_2:%.*]] = constant 31 : i32
  // CHECK: [[VAL_3:%.*]] = constant 0 : i32
  // CHECK: [[VAL_4:%.*]] = constant 0 : index
  // CHECK: [[VAL_5:%.*]] = constant 32 : i32
[190 lines not shown]

mlir/test/Dialect/GPU/all-reduce.mlir

  // RUN: mlir-opt -test-all-reduce-lowering %s | FileCheck %s

  // NOTE: Assertions have been autogenerated by utils/generate-test-checks.py
- // CHECK: module @kernels attributes {gpu.kernel_module} {
- module @kernels attributes {gpu.kernel_module} {
+ // CHECK: gpu.module @kernels {
+ gpu.module @kernels {

  // CHECK-LABEL: gpu.func @kernel(
  // CHECK-SAME: [[VAL_0:%.*]]: f32) workgroup([[VAL_1:%.*]] : memref<32xf32, 3>) kernel {
  gpu.func @kernel(%arg0 : f32) attributes { gpu.kernel } {
  // CHECK: [[VAL_2:%.*]] = constant 31 : i32
  // CHECK: [[VAL_3:%.*]] = constant 0 : i32
  // CHECK: [[VAL_4:%.*]] = constant 0 : index
  // CHECK: [[VAL_5:%.*]] = constant 32 : i32
[170 lines not shown]

mlir/test/Dialect/GPU/invalid.mlir

[105 lines not shown; enclosing context: func @launch_func_undefined_module(%sz : index) {]
  return
  }
  }

  // -----

  module attributes {gpu.container_module} {
  module @kernels {
+ // expected-error@+1 {{'gpu.func' op expects parent op 'gpu.module'}}
+ gpu.func @kernel_1(%arg1 : !llvm<"float*">) {
+ gpu.return
+ }
+ }
+ }
+
+ // -----
+
+ module attributes {gpu.container_module} {
+ module @kernels {
  }

  func @launch_func_missing_module_attribute(%sz : index) {
  // expected-error@+1 {{kernel module 'kernels' is undefined}}
  "gpu.launch_func"(%sz, %sz, %sz, %sz, %sz, %sz)
  { kernel = "kernel_1", kernel_module = @kernels }
  : (index, index, index, index, index, index) -> ()
  return
[214 lines not shown; enclosing context: ^bb0(%arg0: f32):]
  gpu.return
  }
  }
  }

  // -----

  module {
- module @gpu_funcs attributes {gpu.kernel_module} {
+ gpu.module @gpu_funcs {
  // expected-error @+1 {{requires 'type' attribute of function type}}
  "gpu.func"() ({
  gpu.return
  }) {sym_name="kernel_1", type=f32} : () -> ()
  }
  }

  // -----

  module {
- module @gpu_funcs attributes {gpu.kernel_module} {
+ gpu.module @gpu_funcs {
  // expected-error @+1 {{expected memref type in attribution}}
  gpu.func @kernel() workgroup(%0: i32) {
  gpu.return
  }
  }
  }

  // -----

  module {
- module @gpu_funcs attributes {gpu.kernel_module} {
+ gpu.module @gpu_funcs {
  // expected-error @+1 {{expected memory space 3 in attribution}}
  gpu.func @kernel() workgroup(%0: memref<4xf32>) {
  gpu.return
  }
  }
  }

  // -----

  module {
- module @gpu_funcs attributes {gpu.kernel_module} {
+ gpu.module @gpu_funcs {
  // expected-error @+1 {{expected memory space 5 in attribution}}
  gpu.func @kernel() private(%0: memref<4xf32>) {
  gpu.return
  }
  }
  }

  // -----

  module {
- module @gpu_funcs attributes {gpu.kernel_module} {
+ gpu.module @gpu_funcs {
  // expected-error @+1 {{expected memory space 5 in attribution}}
  gpu.func @kernel() private(%0: memref<4xf32>) {
  gpu.return
  }
  }
  }

  // -----
[23 lines not shown]

mlir/test/Dialect/GPU/ops.mlir

[77 lines not shown; enclosing context: func @foo() {]
  // CHECK: "gpu.launch_func"(%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}) {kernel = "kernel_2", kernel_module = @kernels} : (index, index, index, index, index, index, f32, memref<?xf32, 1>) -> ()
  "gpu.launch_func"(%cst, %cst, %cst, %cst, %cst, %cst, %0, %1)
  { kernel = "kernel_2", kernel_module = @kernels }
  : (index, index, index, index, index, index, f32, memref<?xf32, 1>) -> ()

  return
  }

- module @gpu_funcs attributes {gpu.kernel_module} {
+ gpu.module @gpu_funcs {
  // CHECK-LABEL: gpu.func @kernel_1({{.*}}: f32)
  // CHECK: workgroup
  // CHECK: private
  // CHECK: attributes
  gpu.func @kernel_1(%arg0: f32)
  workgroup(%arg1: memref<42xf32, 3>)
  private(%arg2: memref<2xf32, 5>, %arg3: memref<1xf32, 5>)
  kernel
[41 lines not shown]

mlir/test/Dialect/GPU/promotion.mlir

  // RUN: mlir-opt -allow-unregistered-dialect -test-gpu-memory-promotion -split-input-file %s | FileCheck %s
[Inline comment] herhut: Maybe something like `// RUN: mlir-opt -allow-unregistered-dialect -pass-pipeline='module(gpu.module(gpu.func(test-gpu-memory-promotion)))' -split-input-file %s | FileCheck %s` is enough to direct the test pass to the functions through the extra wrapping modules.

- module @foo attributes {gpu.kernel_module} {
+ module {
+ gpu.module @foo {

  // Verify that the attribution was indeed introduced
  // CHECK-LABEL: @memref3d
  // CHECK-SAME: (%[[arg:.*]]: memref<5x4xf32>
  // CHECK-SAME: workgroup(%[[promoted:.*]] : memref<5x4xf32, 3>)
  gpu.func @memref3d(%arg0: memref<5x4xf32> {gpu.test_promote_workgroup}) kernel {
- // Verify that loop bounds are emitted, the order does not matter.
+ // verify that loop bounds are emitted, the order does not matter.
[Inline comment] herhut: Why this change?
[Inline comment] frgossen: Was not intended. Locally, I did not see this because all the indented lines appear as a change.
  // CHECK-DAG: %[[c1:.*]] = constant 1
  // CHECK-DAG: %[[c4:.*]] = constant 4
  // CHECK-DAG: %[[c5:.*]] = constant 5
  // CHECK-DAG: %[[tx:.*]] = "gpu.thread_id"() {dimension = "x"}
  // CHECK-DAG: %[[ty:.*]] = "gpu.thread_id"() {dimension = "y"}
  // CHECK-DAG: %[[tz:.*]] = "gpu.thread_id"() {dimension = "z"}
  // CHECK-DAG: %[[bdx:.*]] = "gpu.block_dim"() {dimension = "x"}
  // CHECK-DAG: %[[bdy:.*]] = "gpu.block_dim"() {dimension = "y"}
  // CHECK-DAG: %[[bdz:.*]] = "gpu.block_dim"() {dimension = "z"}

  // Verify that loops for the copy are emitted. We only check the number of
  // loops here since their bounds are produced by mapLoopToProcessorIds,
  // tested separately.
  // CHECK: loop.for %[[i0:.*]] =
  // CHECK: loop.for %[[i1:.*]] =
  // CHECK: loop.for %[[i2:.*]] =

  // Verify that the copy is emitted and uses only the last two loops.
  // CHECK: %[[v:.*]] = load %[[arg]][%[[i1]], %[[i2]]]
  // CHECK: store %[[v]], %[[promoted]][%[[i1]], %[[i2]]]

  // Verify that the use has been rewritten.
  // CHECK: "use"(%[[promoted]]) : (memref<5x4xf32, 3>)
  "use"(%arg0) : (memref<5x4xf32>) -> ()


  // Verify that loops for the copy are emitted. We only check the number of
  // loops here since their bounds are produced by mapLoopToProcessorIds,
  // tested separately.
  // CHECK: loop.for %[[i0:.*]] =
  // CHECK: loop.for %[[i1:.*]] =
  // CHECK: loop.for %[[i2:.*]] =

  // Verify that the copy is emitted and uses only the last two loops.
  // CHECK: %[[v:.*]] = load %[[promoted]][%[[i1]], %[[i2]]]
  // CHECK: store %[[v]], %[[arg]][%[[i1]], %[[i2]]]
  gpu.return
  }
  }
+ }

  // -----

- module @foo attributes {gpu.kernel_module} {
+ module {
[Inline comment] rriddle: You can remove all of the empty module wrappers, module is implicitly the top-level operation.
[Inline comment] frgossen: Didn't know that. Thanks!
+ gpu.module @foo {

  // Verify that the attribution was indeed introduced
  // CHECK-LABEL: @memref5d
  // CHECK-SAME: (%[[arg:.*]]: memref<8x7x6x5x4xf32>
  // CHECK-SAME: workgroup(%[[promoted:.*]] : memref<8x7x6x5x4xf32, 3>)
  gpu.func @memref5d(%arg0: memref<8x7x6x5x4xf32> {gpu.test_promote_workgroup}) kernel {
  // Verify that loop bounds are emitted, the order does not matter.
  // CHECK-DAG: %[[c0:.*]] = constant 0
  // CHECK-DAG: %[[c1:.*]] = constant 1
  // CHECK-DAG: %[[c4:.*]] = constant 4
  // CHECK-DAG: %[[c5:.*]] = constant 5
  // CHECK-DAG: %[[c6:.*]] = constant 6
  // CHECK-DAG: %[[c7:.*]] = constant 7
  // CHECK-DAG: %[[c8:.*]] = constant 8
  // CHECK-DAG: %[[tx:.*]] = "gpu.thread_id"() {dimension = "x"}
  // CHECK-DAG: %[[ty:.*]] = "gpu.thread_id"() {dimension = "y"}
  // CHECK-DAG: %[[tz:.*]] = "gpu.thread_id"() {dimension = "z"}
  // CHECK-DAG: %[[bdx:.*]] = "gpu.block_dim"() {dimension = "x"}
  // CHECK-DAG: %[[bdy:.*]] = "gpu.block_dim"() {dimension = "y"}
  // CHECK-DAG: %[[bdz:.*]] = "gpu.block_dim"() {dimension = "z"}

  // Verify that loops for the copy are emitted.
  // CHECK: loop.for %[[i0:.*]] =
  // CHECK: loop.for %[[i1:.*]] =
  // CHECK: loop.for %[[i2:.*]] =
  // CHECK: loop.for %[[i3:.*]] =
  // CHECK: loop.for %[[i4:.*]] =

  // Verify that the copy is emitted.
  // CHECK: %[[v:.*]] = load %[[arg]][%[[i0]], %[[i1]], %[[i2]], %[[i3]], %[[i4]]]
  // CHECK: store %[[v]], %[[promoted]][%[[i0]], %[[i1]], %[[i2]], %[[i3]], %[[i4]]]

  // Verify that the use has been rewritten.
  // CHECK: "use"(%[[promoted]]) : (memref<8x7x6x5x4xf32, 3>)
  "use"(%arg0) : (memref<8x7x6x5x4xf32>) -> ()

  // Verify that loop loops for the copy are emitted.
  // CHECK: loop.for %[[i0:.*]] =
  // CHECK: loop.for %[[i1:.*]] =
  // CHECK: loop.for %[[i2:.*]] =
  // CHECK: loop.for %[[i3:.*]] =
  // CHECK: loop.for %[[i4:.*]] =

  // Verify that the copy is emitted.
  // CHECK: %[[v:.*]] = load %[[promoted]][%[[i0]], %[[i1]], %[[i2]], %[[i3]], %[[i4]]]
  // CHECK: store %[[v]], %[[arg]][%[[i0]], %[[i1]], %[[i2]], %[[i3]], %[[i4]]]
  gpu.return
  }
  }
+ }

  // -----

- module @foo attributes {gpu.kernel_module} {
+ module {
+ gpu.module @foo {

  // Check that attribution insertion works fine.
  // CHECK-LABEL: @insert
  // CHECK-SAME: (%{{.*}}: memref<4xf32>
  // CHECK-SAME: workgroup(%{{.*}}: memref<1x1xf64, 3>
  // CHECK-SAME: %[[wg2:.*]] : memref<4xf32, 3>)
  // CHECK-SAME: private(%{{.*}}: memref<1x1xi64, 5>)
  gpu.func @insert(%arg0: memref<4xf32> {gpu.test_promote_workgroup})
  workgroup(%arg1: memref<1x1xf64, 3>)
  private(%arg2: memref<1x1xi64, 5>)
  kernel {
  // CHECK: "use"(%[[wg2]])
  "use"(%arg0) : (memref<4xf32>) -> ()
  gpu.return
  }
  }
+ }

mlir/test/lib/Transforms/TestGpuMemoryPromotion.cpp

[19 lines not shown]

  namespace {
  /// Simple pass for testing the promotion to workgroup memory in GPU functions.
  /// Promotes all arguments with "gpu.test_promote_workgroup" attribute. This
  /// does not check whether the promotion is legal (e.g., amount of memory used)
  /// or beneficial (e.g., makes previously uncoalesced loads coalesced).
  class TestGpuMemoryPromotionPass
  : public PassWrapper<TestGpuMemoryPromotionPass,
- OperationPass<gpu::GPUFuncOp>> {
+ OperationPass<gpu::GPUModuleOp>> {
[Inline comment] rriddle: Can you just change the pipeline specification in the tests instead of changing this pass?
[Inline comment] herhut: See my above comment for how to do this. You can then revert these changes.
[Inline comment] frgossen: That's much nicer. Thanks!
  void runOnOperation() override {
- gpu::GPUFuncOp op = getOperation();
+ gpu::GPUModuleOp gpu_module = getOperation();
[Inline comment] rriddle: MLIR uses camelCase for variable names.
+ gpu_module.walk([](Operation *operation) {
+ gpu::GPUFuncOp op = llvm::dyn_cast<gpu::GPUFuncOp>(operation);
+ if (op) {
  for (unsigned i = 0, e = op.getNumArguments(); i < e; ++i) {
  if (op.getArgAttrOfType<UnitAttr>(i, "gpu.test_promote_workgroup"))
  promoteToWorkgroupMemory(op, i);
  }
  }
+ });
+ }
  };
  } // end namespace

  namespace mlir {
  void registerTestGpuMemoryPromotionPass() {
  PassRegistration<TestGpuMemoryPromotionPass>(
  "test-gpu-memory-promotion",
  "Promotes the annotated arguments of gpu.func to workgroup memory.");
  }
  } // namespace mlir

[MLIR] Ensure `gpu.func` must be inside a `gpu.module`. (Closed, Public)