Infers block/grid dimensions/indices or ranges of such dimensions/indices.
Details
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
Two initial comments:
- Do we actually know that the number of blocks/threads is bounded above by 2^31? For one thing, the HIP (and presumably CUDA) launch APIs take uint32_t and so the bound could be 2^32
- Could you find a way to propagate these into (outlined) kernels, so that we get bounds on gpu.block_id and friends?
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp:695 (On Diff #441873): The ranges already specify a <= %v <= b, so there should be no need to subtract 1?
Regarding the upper bound: CUDA's gridDim.x limit is 2^31 - 1, and everything else is well below that. For ROCm, the limits are 2^16 - 1 and below.
> Could you find a way to propagate these into (outlined) kernels, so that we get bounds on gpu.block_id and friends?
It should be possible to infer gpu.block/thread_dim/id int ranges, but it would require combining ranges of the gpu.launch_func ops referencing the parent kernel.
IIUC, this would require building a symbol table for every op's inferResultRanges() call. Compared to that, inferring gpu.launch region arg int ranges is much simpler.
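For illustration, here is roughly what the gpu.launch inference boils down to. This is a simplified sketch, not the patch's code verbatim: the accessor names (getGridSize/getBlockSize/getBlockIds/getThreadIds returning the region arguments), the kMaxDim constant, and the assumption that the six size operands come first are all mine.

```cpp
#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "mlir/Interfaces/InferIntRangeInterface.h"
#include <algorithm>
#include <limits>

using namespace mlir;

// Conservative cap on any single dimension; see the bounds discussion below.
static constexpr uint64_t kMaxDim = std::numeric_limits<uint32_t>::max();

static ConstantIntRanges getIndexRange(uint64_t umin, uint64_t umax) {
  unsigned width = IndexType::kInternalStorageBitWidth;
  return ConstantIntRanges::fromUnsigned(APInt(width, umin),
                                         APInt(width, umax));
}

void gpu::LaunchOp::inferResultRanges(ArrayRef<ConstantIntRanges> argRanges,
                                      SetIntRangeFn setResultRanges) {
  // Each grid/block-size operand feeds two region arguments: the *_dim
  // argument takes the operand's range (clamped to [1, kMaxDim]) and the
  // *_id argument is zero-based and strictly below the dim.
  auto setRange = [&](const ConstantIntRanges &sizeRange, Value dimArg,
                      Value idArg) {
    uint64_t dimMin = std::max<uint64_t>(sizeRange.umin().getZExtValue(), 1);
    uint64_t dimMax =
        std::max(dimMin, std::min(sizeRange.umax().getZExtValue(), kMaxDim));
    setResultRanges(dimArg, getIndexRange(dimMin, dimMax));
    setResultRanges(idArg, getIndexRange(0, dimMax - 1));
  };
  // Assumes the grid-size operands come first, then the block-size operands.
  KernelDim3 gridDims = getGridSize(), blockIds = getBlockIds();
  setRange(argRanges[0], gridDims.x, blockIds.x);
  setRange(argRanges[1], gridDims.y, blockIds.y);
  setRange(argRanges[2], gridDims.z, blockIds.z);
  KernelDim3 blockDims = getBlockSize(), threadIds = getThreadIds();
  setRange(argRanges[3], blockDims.x, threadIds.x);
  setRange(argRanges[4], blockDims.y, threadIds.y);
  setRange(argRanges[5], blockDims.z, threadIds.z);
}
```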
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp:695 (On Diff #441873): threadIdx is in [0, blockDim-1] (inclusive), blockIdx is in [0, gridDim-1] (inclusive).
Thanks for digging up the sources on those bounds!
Would you be willing to go ahead and give the gpu.*_{dim,id} ops conservative bounds of {[1, 2^31], [0, 2^31 - 1]} in a future revision? (Or I can do it)
Could you add a test before we land this? It looks fine to me, but I'd like to make sure it doesn't break in the future
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp:695 (On Diff #441873): Ok, right, now I see what's going on here. That is, if I set a dimension to, say, 256, I'll get the dimension bounds being [256, 256] and the index bounds being [0, 255] like I'd expect.
Except they were not conservative enough. ;-)
The Vulkan device registry is the better source of what values are out in the wild, e.g.
https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupCount[0]
https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupSize[0]
https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupInvocations
https://vulkan.gpuinfo.org/displaycoreproperty.php?name=subgroupProperties.subgroupSize
I adjusted the values accordingly.
> Would you be willing to go ahead and give the gpu.*_{dim,id} ops conservative bounds of {[1, 2^31], [0, 2^31 - 1]} in a future revision? (Or I can do it)
Done with the conservative bounds for now.
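Concretely, the standalone ops fall back to hard bounds along these lines (sketch only, reusing the getIndexRange/kMaxDim helpers from the snippet above; the exact constants follow the Vulkan-registry numbers rather than anything the hardware guarantees):

```cpp
// Conservative fallback ranges when nothing is known about the launch site.
void gpu::BlockDimOp::inferResultRanges(ArrayRef<ConstantIntRanges>,
                                        SetIntRangeFn setResultRanges) {
  // A dimension is at least 1 and at most kMaxDim.
  setResultRanges(getResult(), getIndexRange(1, kMaxDim));
}

void gpu::BlockIdOp::inferResultRanges(ArrayRef<ConstantIntRanges>,
                                       SetIntRangeFn setResultRanges) {
  // An id is zero-based and strictly smaller than the corresponding dim.
  setResultRanges(getResult(), getIndexRange(0, kMaxDim - 1));
}
```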
As a follow-up, how about adding an attribute next to gpu.kernel that encodes the launch bounds?
They can be used to provide ranges for gpu.block_id and friends, and can be lowered to, e.g.:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#performance-tuning-directives
If SCCP supports function calls (not sure if it does), it should be possible to also derive the launch bounds from all gpu.launch_func during analysis.
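To make that follow-up concrete, one possible shape, purely hypothetical (the "gpu.launch_bounds" attribute name and the helper below are assumptions, not part of this patch): the outlined kernel carries per-dimension upper bounds that the id/dim inference consults before falling back to the conservative defaults above.

```cpp
#include <optional>

// Hypothetical: read a per-dimension upper bound off the enclosing kernel and
// use it to tighten the range of gpu.block_id. The attribute name and the
// getDimension() accessor are assumed.
static std::optional<uint64_t> getAnnotatedUpperBound(Operation *op,
                                                      unsigned dim) {
  auto func = op->getParentOfType<gpu::GPUFuncOp>();
  if (!func)
    return std::nullopt;
  auto bounds = func->getAttrOfType<DenseI64ArrayAttr>("gpu.launch_bounds");
  if (!bounds || dim >= bounds.asArrayRef().size())
    return std::nullopt;
  return static_cast<uint64_t>(bounds.asArrayRef()[dim]);
}

void gpu::BlockIdOp::inferResultRanges(ArrayRef<ConstantIntRanges>,
                                       SetIntRangeFn setResultRanges) {
  uint64_t upper = kMaxDim;
  if (std::optional<uint64_t> bound = getAnnotatedUpperBound(
          getOperation(), static_cast<unsigned>(getDimension())))
    upper = *bound;
  setResultRanges(getResult(), getIndexRange(0, upper - 1));
}
```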
> Could you add a test before we land this? It looks fine to me, but I'd like to make sure it doesn't break in the future
Oops, sorry. Test added.
This sounds like a reasonable approach to me. Outlining seems the ideal place to add these. We could also add a verifier that ensures that the annotations coincide with the call-sites. For example, we could have every gpu.launch_func check whether its parameters fit the annotation.
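A sketch of what such a check could look like (hypothetical: it assumes the same "gpu.launch_bounds"-style annotation, assumes the getBlockSize* accessor names, and only inspects block sizes that are compile-time constants):

```cpp
#include "mlir/IR/Matchers.h"

// Hypothetical check: a gpu.launch_func with constant block sizes must stay
// within the kernel's annotated launch bounds.
static LogicalResult verifyLaunchFitsBounds(gpu::LaunchFuncOp launch,
                                            ArrayRef<int64_t> bounds) {
  Value blockSizes[3] = {launch.getBlockSizeX(), launch.getBlockSizeY(),
                         launch.getBlockSizeZ()};
  for (unsigned dim = 0; dim < 3 && dim < bounds.size(); ++dim) {
    APInt size;
    if (!matchPattern(blockSizes[dim], m_ConstantInt(&size)))
      continue; // Non-constant sizes can only be checked dynamically.
    if (size.ugt(static_cast<uint64_t>(bounds[dim])))
      return launch.emitOpError()
             << "block size in dimension " << dim
             << " exceeds the kernel's annotated launch bound " << bounds[dim];
  }
  return success();
}
```

Grid sizes could be checked the same way; whether this lives in the verifier or in a separate pass is the open question.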
> If SCCP supports function calls (not sure if it does), it should be possible to also derive the launch bounds from all gpu.launch_func during analysis.
I see kernel outlining as a relatively late step in processing and would expect that there is no further optimization after it. After all, the inline representation of gpu.launch is meant for optimization. The only use I could see is if we want to reuse kernels but then the optimization that attaches kernels to multiple call-sites could take care of fixing up annotations. This would only be legal if the new annotations are a superset, anyway, as we might have used the static knowledge in optimization earlier.
I agree that getting anything more than very conservative bounds in an outlined kernel will be tricky and so it might not be worth doing (right now).
One note: that bound of 128 on the workitem ID is definitely wrong: from what I can tell, those Vulkan numbers are counted in units of warps/waves/..., not actual workitems. See, for example, https://www.llvm.org/docs/AMDGPUUsage.html#llvm-ir-attributes (where the default upper bound on workgroup size is 1024) and that, downstream, we routinely use workgroups with 256 items in them without issue.
I'd also *prefer* that we stick with what the API has told us the constraints on maximum workgroup/workitem dimensions are so we don't get suddenly bit by optimizations breaking on future hardware.
Yes, but we can't really query the target hardware in the compiler. At most, it could be a pass option. My feeling is that the current conservative bounds are large enough that this won't break in the foreseeable future: kernel dispatch (cu/hipLaunchKernel, vkCmdDispatch) uses 32-bit parameters for block/grid dimensions.
> One note: that bound of 128 on the workitem ID is definitely wrong: from what I can tell, those Vulkan numbers are counted in units of warps/waves/..., not actual workitems. See, for example, https://www.llvm.org/docs/AMDGPUUsage.html#llvm-ir-attributes (where the default upper bound on workgroup size is 1024) and that, downstream, we routinely use workgroups with 256 items in them without issue.
128 is the limit for subgroup (Vulkan-speak for warps/wavefronts), not workgroup. I was quite surprised to see that Turnip Adreno seems to use a subgroup size of 128, and I don't think there is much value in going higher. But if you prefer, we could set this to uint32_max as well.
Ah, I missed that the one with 128 was laneId, not any of the other ones - my bad for not reading carefully.
Also, wrt the value of kMaxDim ... didn't we establish that std::numeric_limits<int32_t>::max would be enough, or did the Vulkan info contradict that?
Mali GPUs report 2^32-1 for max grid dim:
https://vulkan.gpuinfo.org/listreports.php?limit=maxComputeWorkGroupCount[0]
I doubt that the values for max block dim above Intel's 2240 limit are correct, but I didn't want to do too much interpretation:
https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxComputeWorkGroupSize[0]
This infers block/grid dimensions/indices from launch operands.
Infers block/grid dimensions/indices or ranges of such dimensions/indices?