This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Stop allowing LLVMType Int arguments for GPULaunchFuncOp.
ClosedPublic

Authored by pifon2a on Sep 23 2020, 5:47 AM.


Details

Reviewers

herhut
csigg
mehdi_amini

Commits

rG56ffb8d16979: [mlir] Stop allowing LLVMType Int arguments for GPULaunchFuncOp.

Summary

Conversion to LLVM becomes confusing and incorrect if someone lowers
STD -> LLVM first and only then lowers GPULaunchFuncOp to LLVM separately.
Although this is technically allowed now, it works incorrectly because of
argument promotion. The correct way to use this conversion pattern is to add
it to the STD->LLVM patterns before running the pass.
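
As an illustration only, here is a minimal sketch of the kind of input the op accepts after this change, adapted from the test updated in this revision. The grid and block sizes stay in standard types (`index` here) when `gpu.launch_func` is created, so the STD and GPU lowerings run together. The trimmed `gpu.module` attributes and the comments are assumptions added for readability, not part of the commit.

```mlir
module attributes {gpu.container_module} {
  gpu.module @kernel_module attributes {nvvm.cubin = "CUBIN"} {
    llvm.func @kernel(%arg0: !llvm.i32, %arg1: !llvm.ptr<float>,
        %arg2: !llvm.ptr<float>, %arg3: !llvm.i64, %arg4: !llvm.i64,
        %arg5: !llvm.i64) attributes {gpu.kernel} {
      llvm.return
    }
  }

  func @foo(%buffer: memref<?xf32>) {
    // Accepted: sizes are standard `index` (or signless integer) values.
    %c8 = constant 8 : index
    %c32 = constant 32 : i32
    "gpu.launch_func"(%c8, %c8, %c8, %c8, %c8, %c8, %c32, %buffer) {
      kernel = @kernel_module::@kernel
    } : (index, index, index, index, index, index, i32, memref<?xf32>) -> ()

    // No longer accepted: sizes already lowered to LLVM dialect integers,
    // e.g. `%cst = llvm.mlir.constant(8 : index) : !llvm.i64` passed as
    // the six size operands with type `!llvm.i64`.
    return
  }
}
```

The updated test feeds input of this shape to `mlir-opt` with `--gpu-to-llvm="gpu-binary-annotation=nvvm.cubin"`.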

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pifon2a created this revision. Sep 23 2020, 5:47 AM

Herald added a project: Restricted Project. Sep 23 2020, 5:47 AM

Herald added subscribers: tatianashp, msifontes, jurahul and 13 others.

pifon2a requested review of this revision. Sep 23 2020, 5:47 AM

Herald added subscribers: stephenneuendorffer, nicolasvasilache. Sep 23 2020, 5:47 AM

Harbormaster completed remote builds in B72649: Diff 293707. Sep 23 2020, 6:03 AM

Removed todo

Harbormaster completed remote builds in B72667: Diff 293755. Sep 23 2020, 9:11 AM

mehdi_amini added inline comments. Sep 23 2020, 10:04 AM

mlir/test/Conversion/GPUCommon/lower-launch-func-to-gpu-runtime-calls.mlir
26	This seems to indicate that the verifier is fairly loose here about whether the call-site arguments match the callee's argument list?

pifon2a added inline comments. Sep 23 2020, 11:12 AM

mlir/test/Conversion/GPUCommon/lower-launch-func-to-gpu-runtime-calls.mlir
26	That's true. I don't think there is any verification. The GPU module is lowered to a binary blob, which becomes a global constant. GPU LaunchFuncOp then calls the code in that blob. There is probably a way to verify the arguments before we are in the LLVM dialect.

mehdi_amini accepted this revision. Sep 23 2020, 12:02 PM

mehdi_amini added inline comments.

mlir/test/Conversion/GPUCommon/lower-launch-func-to-gpu-runtime-calls.mlir
26	Seems like we won't be able to with launch_func right now; it is a bit uncomfortably loose, though. The region-based one doesn't suffer from this.

This revision is now accepted and ready to land. Sep 23 2020, 12:02 PM

Closed by commit rG56ffb8d16979: [mlir] Stop allowing LLVMType Int arguments for GPULaunchFuncOp. (authored by pifon2a). Sep 24 2020, 2:16 AM

This revision was automatically updated to reflect the committed changes.

pifon2a added a commit: rG56ffb8d16979: [mlir] Stop allowing LLVMType Int arguments for GPULaunchFuncOp..

Revision Contents

Path	Size
mlir/include/mlir/Dialect/GPU/GPUOps.td	20 lines
mlir/test/Conversion/GPUCommon/lower-launch-func-to-gpu-runtime-calls.mlir	60 lines

Diff 293707

mlir/include/mlir/Dialect/GPU/GPUOps.td

[12 lines not shown]
#ifndef GPU_OPS		#ifndef GPU_OPS
#define GPU_OPS		#define GPU_OPS

include "mlir/Dialect/GPU/GPUBase.td"		include "mlir/Dialect/GPU/GPUBase.td"
include "mlir/Dialect/LLVMIR/LLVMOpBase.td"		include "mlir/Dialect/LLVMIR/LLVMOpBase.td"
include "mlir/IR/SymbolInterfaces.td"		include "mlir/IR/SymbolInterfaces.td"
include "mlir/Interfaces/SideEffectInterfaces.td"		include "mlir/Interfaces/SideEffectInterfaces.td"

// Type constraint accepting standard integers, indices and wrapped LLVM integer		// Type constraint accepting standard integers, indices.
// types.		// TODO() : move to std?
def IntLikeOrLLVMInt : TypeConstraint<		def IntOrIndex : TypeConstraint<
Or<[AnySignlessInteger.predicate, Index.predicate,		Or<[AnySignlessInteger.predicate, Index.predicate]>, "integer or index">;
LLVM_AnyInteger.predicate]>,
"integer, index or LLVM dialect equivalent">;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// GPU Dialect operations.		// GPU Dialect operations.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

class GPU_Op<string mnemonic, list<OpTrait> traits = []> :		class GPU_Op<string mnemonic, list<OpTrait> traits = []> :
Op<GPU_Dialect, mnemonic, traits>;		Op<GPU_Dialect, mnemonic, traits>;

[259 lines not shown]	def GPU_GPUFuncOp : GPU_Op<"func", [HasParent<"GPUModuleOp">,
}];		}];

// let verifier = [{ return ::verifFuncOpy(*this); }];		// let verifier = [{ return ::verifFuncOpy(*this); }];
let printer = [{ printGPUFuncOp(p, *this); }];		let printer = [{ printGPUFuncOp(p, *this); }];
let parser = [{ return parseGPUFuncOp(parser, result); }];		let parser = [{ return parseGPUFuncOp(parser, result); }];
}		}

def GPU_LaunchFuncOp : GPU_Op<"launch_func">,		def GPU_LaunchFuncOp : GPU_Op<"launch_func">,
Arguments<(ins IntLikeOrLLVMInt:$gridSizeX, IntLikeOrLLVMInt:$gridSizeY,		Arguments<(ins IntOrIndex:$gridSizeX, IntOrIndex:$gridSizeY,
IntLikeOrLLVMInt:$gridSizeZ, IntLikeOrLLVMInt:$blockSizeX,		IntOrIndex:$gridSizeZ, IntOrIndex:$blockSizeX,
IntLikeOrLLVMInt:$blockSizeY, IntLikeOrLLVMInt:$blockSizeZ,		IntOrIndex:$blockSizeY, IntOrIndex:$blockSizeZ,
Variadic<AnyType>:$operands)>,		Variadic<AnyType>:$operands)>,
Results<(outs)> {		Results<(outs)> {
let summary = "Launches a function as a GPU kerneel";		let summary = "Launches a function as a GPU kerneel";

let description = [{		let description = [{
Launch a kernel function on the specified grid of thread blocks.		Launch a kernel function on the specified grid of thread blocks.
`gpu.launch` operations are lowered to `gpu.launch_func` operations by		`gpu.launch` operations are lowered to `gpu.launch_func` operations by
outlining the kernel body into a function in a dedicated module, which		outlining the kernel body into a function in a dedicated module, which
[15 lines not shown]	let description = [{

Example:		Example:

```mlir		```mlir
module attributes {gpu.container_module} {		module attributes {gpu.container_module} {

// This module creates a separate compilation unit for the GPU compiler.		// This module creates a separate compilation unit for the GPU compiler.
gpu.module @kernels {		gpu.module @kernels {
func @kernel_1(%arg0 : f32, %arg1 : !llvm<"float*">)		func @kernel_1(%arg0 : f32, %arg1 : memref<?xf32, 1>)
attributes { nvvm.kernel = true } {		attributes { nvvm.kernel = true } {

// Operations that produce block/thread IDs and dimensions are		// Operations that produce block/thread IDs and dimensions are
// injected when outlining the `gpu.launch` body to a function called		// injected when outlining the `gpu.launch` body to a function called
// by `gpu.launch_func`.		// by `gpu.launch_func`.
%tIdX = "gpu.thread_id"() {dimension = "x"} : () -> (index)		%tIdX = "gpu.thread_id"() {dimension = "x"} : () -> (index)
%tIdY = "gpu.thread_id"() {dimension = "y"} : () -> (index)		%tIdY = "gpu.thread_id"() {dimension = "y"} : () -> (index)
%tIdZ = "gpu.thread_id"() {dimension = "z"} : () -> (index)		%tIdZ = "gpu.thread_id"() {dimension = "z"} : () -> (index)
[15 lines not shown]	module attributes {gpu.container_module} {
}		}
}		}

"gpu.launch_func"(%cst, %cst, %cst, // Grid sizes.		"gpu.launch_func"(%cst, %cst, %cst, // Grid sizes.
%cst, %cst, %cst, // Block sizes.		%cst, %cst, %cst, // Block sizes.
%arg0, %arg1) // Arguments passed to the kernel.		%arg0, %arg1) // Arguments passed to the kernel.
{ kernel_module = @kernels, // Module containing the kernel.		{ kernel_module = @kernels, // Module containing the kernel.
kernel = "kernel_1" } // Kernel function.		kernel = "kernel_1" } // Kernel function.
: (index, index, index, index, index, index, f32, !llvm<"float*">)		: (index, index, index, index, index, index, f32, memref<?xf32, 1>)
-> ()		-> ()
}		}
```		```
}];		}];

let skipDefaultBuilders = 1;		let skipDefaultBuilders = 1;

let builders = [		let builders = [
[387 lines not shown]

mlir/test/Conversion/GPUCommon/lower-launch-func-to-gpu-runtime-calls.mlir

	// RUN: mlir-opt -allow-unregistered-dialect %s --gpu-to-llvm="gpu-binary-annotation=nvvm.cubin" | FileCheck %s			// RUN: mlir-opt -allow-unregistered-dialect %s --gpu-to-llvm="gpu-binary-annotation=nvvm.cubin" | FileCheck %s
	// RUN: mlir-opt -allow-unregistered-dialect %s --gpu-to-llvm="gpu-binary-annotation=rocdl.hsaco" | FileCheck %s --check-prefix=ROCDL			// RUN: mlir-opt -allow-unregistered-dialect %s --gpu-to-llvm="gpu-binary-annotation=rocdl.hsaco" | FileCheck %s --check-prefix=ROCDL

	module attributes {gpu.container_module} {			module attributes {gpu.container_module} {

	// CHECK: llvm.mlir.global internal constant @[[kernel_name:.*]]("kernel\00")			// CHECK: llvm.mlir.global internal constant @[[KERNEL_NAME:.*]]("kernel\00")
	// CHECK: llvm.mlir.global internal constant @[[global:.*]]("CUBIN")			// CHECK: llvm.mlir.global internal constant @[[GLOBAL:.*]]("CUBIN")
	// ROCDL: llvm.mlir.global internal constant @[[global:.*]]("HSACO")			// ROCDL: llvm.mlir.global internal constant @[[GLOBAL:.*]]("HSACO")

	gpu.module @kernel_module attributes {nvvm.cubin = "CUBIN", rocdl.hsaco = "HSACO"} {			gpu.module @kernel_module attributes {
	llvm.func @kernel(%arg0: !llvm.float, %arg1: !llvm.ptr<float>) attributes {gpu.kernel} {			nvvm.cubin = "CUBIN", rocdl.hsaco = "HSACO"
				} {
				llvm.func @kernel(%arg0: !llvm.i32, %arg1: !llvm.ptr<float>,
				%arg2: !llvm.ptr<float>, %arg3: !llvm.i64, %arg4: !llvm.i64,
				%arg5: !llvm.i64) attributes {gpu.kernel} {
	llvm.return			llvm.return
	}			}
	}			}

	llvm.func @foo() {			func @foo(%buffer: memref<?xf32>) {
	%0 = "op"() : () -> (!llvm.float)			%c8 = constant 8 : index
	%1 = "op"() : () -> (!llvm.ptr<float>)			%c32 = constant 32 : i32
	%cst = llvm.mlir.constant(8 : index) : !llvm.i64			"gpu.launch_func"(%c8, %c8, %c8, %c8, %c8, %c8, %c32, %buffer) {
				kernel = @kernel_module::@kernel
	// CHECK: %[[addressof:.*]] = llvm.mlir.addressof @[[global]]			} : (index, index, index, index, index, index, i32, memref<?xf32>) -> ()
	// CHECK: %[[c0:.*]] = llvm.mlir.constant(0 : index)			return
	// CHECK: %[[binary:.*]] = llvm.getelementptr %[[addressof]][%[[c0]], %[[c0]]]			}

				// CHECK: [[C8:%.*]] = llvm.mlir.constant(8 : index) : !llvm.i64
				// CHECK: [[ADDRESSOF:%.*]] = llvm.mlir.addressof @[[GLOBAL]]
				// CHECK: [[C0:%.*]] = llvm.mlir.constant(0 : index)
				// CHECK: [[BINARY:%.*]] = llvm.getelementptr [[ADDRESSOF]]{{\[}}[[C0]], [[C0]]]
	// CHECK-SAME: -> !llvm.ptr<i8>			// CHECK-SAME: -> !llvm.ptr<i8>
	// CHECK: %[[module:.*]] = llvm.call @mgpuModuleLoad(%[[binary]]) : (!llvm.ptr<i8>) -> !llvm.ptr<i8>
	// CHECK: %[[func:.*]] = llvm.call @mgpuModuleGetFunction(%[[module]], {{.*}}) : (!llvm.ptr<i8>, !llvm.ptr<i8>) -> !llvm.ptr<i8>
	// CHECK: llvm.call @mgpuStreamCreate
	// CHECK: llvm.call @mgpuLaunchKernel
	// CHECK: llvm.call @mgpuStreamSynchronize
	"gpu.launch_func"(%cst, %cst, %cst, %cst, %cst, %cst, %0, %1) { kernel = @kernel_module::@kernel }
	: (!llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.float, !llvm.ptr<float>) -> ()

	llvm.return			// CHECK: [[MODULE:%.*]] = llvm.call @mgpuModuleLoad([[BINARY]])
	}			// CHECK: [[FUNC:%.*]] = llvm.call @mgpuModuleGetFunction([[MODULE]], {{.*}})

				// CHECK: [[C0_I32:%.*]] = llvm.mlir.constant(0 : i32)
				// CHECK: [[STREAM:%.*]] = llvm.call @mgpuStreamCreate

				// CHECK: [[NUM_PARAMS:%.*]] = llvm.mlir.constant(6 : i32) : !llvm.i32
				// CHECK-NEXT: [[PARAMS:%.*]] = llvm.alloca [[NUM_PARAMS]] x !llvm.ptr<i8>

				// CHECK: [[EXTRA_PARAMS:%.*]] = llvm.mlir.null : !llvm.ptr<ptr<i8>>

				// CHECK: llvm.call @mgpuLaunchKernel([[FUNC]], [[C8]], [[C8]], [[C8]],
				// CHECK-SAME: [[C8]], [[C8]], [[C8]], [[C0_I32]], [[STREAM]],
				// CHECK-SAME: [[PARAMS]], [[EXTRA_PARAMS]])
				// CHECK: llvm.call @mgpuStreamSynchronize
	}			}