This is an archive of the discontinued LLVM Phabricator instance.


Differential D139865

[mlir][GPU] Add known_block_size and known_grid_size to gpu.func
ClosedPublic

Authored by krzysz00 on Dec 12 2022, 11:11 AM.


Details

Reviewers

bondhugula
ThomasRaoux
nicolasvasilache
herhut
antiagainst

Commits

rG85e38d7cd670: [mlir][GPU] Add known_block_size and known_grid_size to gpu.func

Summary

In many cases, the number of workgroups (the grid size) and the
number of workitems within each group (the block size) that a GPU
kernel will be launched with are known. For example, if gpu.launch is
called with constant block and grid sizes, we know that those are the
only possible sizes that will be used to launch that kernel. In other
cases, a custom code-generation pipeline that eventually produces GPU
kernels may know the launch dimensions of those kernels, or at least
may be able to provide an upper bound on them.

Other GPU programming systems, such as OpenCL, allow capturing such
information to enable compiler optimizations (see
reqd_work_group_size), but MLIR currently has no mechanism for doing so.

This set of attributes is the first step in enabling optimizations
based on the known launch dimensions of kernels. It extends the kernel
outline pass to set these bounds on kernels with constant launch
dimensions and extends integer range inference for GPU index
operations to account for the bounds when they are known.
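
As a rough illustration of the range-inference change described above, the following standalone sketch (not the MLIR implementation) shows how a known launch size narrows the inferred ranges. The names `idRange` and `dimRange` are hypothetical; `kMaxDim` mirrors the fallback constant in InferIntRangeInterfaceImpls.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <utility>

// Fallback bound used when no launch size is known (mirrors kMaxDim).
constexpr uint64_t kMaxDim = std::numeric_limits<uint32_t>::max();

// Hypothetical helper: inclusive [min, max] range of an ID op
// (gpu.thread_id or gpu.block_id) along one dimension.
std::pair<uint64_t, uint64_t> idRange(std::optional<uint32_t> knownSize) {
  uint64_t size = knownSize ? static_cast<uint64_t>(*knownSize) : kMaxDim;
  assert(size > 0 && "a launch dimension is assumed to be at least 1");
  // IDs are strictly less than the size of their dimension.
  return {0, size - 1};
}

// Hypothetical helper: inclusive range of a dimension op
// (gpu.block_dim or gpu.grid_dim). A known size gives an exact value.
std::pair<uint64_t, uint64_t> dimRange(std::optional<uint32_t> knownSize) {
  if (knownSize)
    return {*knownSize, *knownSize};
  return {1, kMaxDim};
}
```

With a known block size of 32, `idRange(32u)` yields the range 0 to 31; without one, the upper bound falls back to `kMaxDim - 1`, the 4294967294 that appears in the test expectations.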

Subsequent revisions will use this data when lowering GPU operations
to the ROCDL dialect.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

krzysz00 created this revision. Dec 12 2022, 11:11 AM

Herald added a reviewer: bondhugula. Dec 12 2022, 11:11 AM

Herald added a reviewer: ThomasRaoux.

Herald added a project: Restricted Project.

Herald added subscribers: Moerafaat, zero9178, bzcheeseman and 22 others.

krzysz00 requested review of this revision. Dec 12 2022, 11:11 AM

Herald added a reviewer: nicolasvasilache. Dec 12 2022, 11:11 AM

Herald added a reviewer: herhut.

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

krzysz00 added a child revision: D139866: [mlir][ROCDL] Translate known block size attributes to ROCDL. Dec 12 2022, 11:24 AM

Harbormaster completed remote builds in B202634: Diff 482196. Dec 12 2022, 11:37 AM

antiagainst added inline comments. Dec 21 2022, 10:38 AM

mlir/lib/Dialect/GPU/IR/InferIntRangeInterfaceImpls.cpp
128	Shouldn't this be `getKnownLaunchDim(*this, LaunchDims::Block) * getKnownLaunchDim(*this, LaunchDims::Grid) - 1` if both are available?
mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
153	Nit: name it as `inferConstantDimsAttr` or something similar?

antiagainst requested changes to this revision. Dec 21 2022, 10:38 AM

This revision now requires changes to proceed. Dec 21 2022, 10:38 AM

Address review comments

krzysz00 marked an inline comment as done. Dec 21 2022, 3:10 PM

krzysz00 added inline comments.

mlir/lib/Dialect/GPU/IR/InferIntRangeInterfaceImpls.cpp
128	`%res = gpu.global_id [dim]` is, as far as I'm aware, exactly `%tid = gpu.thread_id [dim]`, `%bid = gpu.block_id [dim]`, `%res = arith.muli %tid, %bid : index`, which means the upper bound is the upper bound on thread_id (known block dimension - 1) times the upper bound on the grid dimension (known grid dimension - 1).

Harbormaster completed remote builds in B204469: Diff 484693. Dec 21 2022, 3:24 PM

krzysz00 marked an inline comment as not done. Dec 21 2022, 8:56 PM

krzysz00 added inline comments.

mlir/lib/Dialect/GPU/IR/InferIntRangeInterfaceImpls.cpp
128	Wait, no, you're right, good catch. I'll fix it soon
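
The arithmetic behind this exchange can be checked with a small standalone sketch. It assumes the usual decomposition global_id = block_id * block_dim + thread_id (an assumption of this sketch, not something stated in the thread), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Largest global ID along one dimension, assuming
// global_id = block_id * block_dim + thread_id.
uint64_t maxGlobalId(uint64_t blockDim, uint64_t gridDim) {
  assert(blockDim > 0 && gridDim > 0);
  // (gridDim - 1) * blockDim + (blockDim - 1) == blockDim * gridDim - 1,
  // which is the bound the reviewer proposed.
  return blockDim * gridDim - 1;
}

// The bound used in the first diff: the product of the two maximum IDs.
uint64_t firstDiffBound(uint64_t blockDim, uint64_t gridDim) {
  assert(blockDim > 0 && gridDim > 0);
  return (blockDim - 1) * (gridDim - 1);
}
```

For a block size of 8 and a grid size of 4, the last thread of the last block has global ID 3 * 8 + 7 = 31, which matches `maxGlobalId(8, 4)`, while `firstDiffBound(8, 4)` gives 21 and so excludes valid IDs. With both sizes defaulting to 4294967295, the first-diff product is 18446744056529682436, which reads as -17179869180 when interpreted as a signed 64-bit value, the `umax` shown in the updated test expectations of this diff.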

Address bug caught in review, remove .transform()

antiagainst accepted this revision. Dec 22 2022, 12:29 PM

This revision is now accepted and ready to land. Dec 22 2022, 12:29 PM

Harbormaster completed remote builds in B204649: Diff 484926. Dec 22 2022, 12:40 PM

Closed by commit rG85e38d7cd670: [mlir][GPU] Add known_block_size and known_grid_size to gpu.func (authored by krzysz00). Dec 22 2022, 1:41 PM

This revision was automatically updated to reflect the committed changes.

krzysz00 added a commit: rG85e38d7cd670: [mlir][GPU] Add known_block_size and known_grid_size to gpu.func.

The valueByDim() switch in InferIntRangeInterfaceImpls.cpp appears to have broken the Windows buildbot (https://lab.llvm.org/buildbot/#/builders/13/builds/30180/steps/6/logs/stdio).

C:\buildbot\mlir-x64-windows-ninja\llvm-project\mlir\lib\Dialect\GPU\IR\InferIntRangeInterfaceImpls.cpp(49) : error C2220: the following warning is treated as an error
C:\buildbot\mlir-x64-windows-ninja\llvm-project\mlir\lib\Dialect\GPU\IR\InferIntRangeInterfaceImpls.cpp(49) : warning C4715: 'valueByDim': not all control paths return a value

@NathanialMcVicar Can you add a `return nullptr;` to the end of the function? Or an `llvm_unreachable()` to make the error clearer? (Not at a computer right now so I can't do it, but I can take care of it in a few hours if it still needs fixing.)
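
A minimal standalone sketch of the suggested fix follows. `Dimension` and `Dim3` are hypothetical stand-ins for `gpu::Dimension` and `gpu::KernelDim3`, and `std::abort()` substitutes for `llvm_unreachable`, which is what in-tree code would more likely use; this is not the change that was eventually landed.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-ins for gpu::Dimension and gpu::KernelDim3.
enum class Dimension { x, y, z };
struct Dim3 {
  int x, y, z;
};

int valueByDim(const Dim3 &dims, Dimension dim) {
  switch (dim) {
  case Dimension::x:
    return dims.x;
  case Dimension::y:
    return dims.y;
  case Dimension::z:
    return dims.z;
  }
  // Every enumerator is handled above, but MSVC still reports C4715
  // ("not all control paths return a value") unless the function ends in
  // a return or a call that never returns.
  assert(false && "unhandled Dimension");
  std::abort();
}
```

Ending the function with a call that does not return keeps the switch free of a `default:` case, so compilers that warn about unhandled enumerators can still flag a newly added dimension.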

Looks like this broke the windows mlir buildbot: https://lab.llvm.org/buildbot/#/builders/13/builds/30140

If you can't fix it today, can we revert it so we don't go into the weekend with a red bot?

Go ahead and revert, I'll land the fix after the holiday, probably on Monday or Tuesday.

stella.stamenova added a reverting change: rG828b4762caf4: Revert "[mlir][GPU] Add known_block_size and known_grid_size to gpu.func". Dec 23 2022, 5:30 PM

krzysz00 mentioned this in rGbe575c5dfc55: Re-land D139865 "Add known_block_size and known_grid_size to gpu.func". Jan 2 2023, 8:39 AM

Revision Contents

Path                                                     Size
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td               40 lines
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp                   22 lines
mlir/lib/Dialect/GPU/IR/InferIntRangeInterfaceImpls.cpp  85 lines
mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp      32 lines
mlir/test/Dialect/GPU/int-range-interface.mlir           97 lines
mlir/test/Dialect/GPU/invalid.mlir                       22 lines
mlir/test/Dialect/GPU/outlining.mlir                     19 lines

Diff 482196

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td

let description = [{
modification, followed by buffers defined in memory annotations. The body of		modification, followed by buffers defined in memory annotations. The body of
a GPU function, when launched, is executed by multiple work items. There are		a GPU function, when launched, is executed by multiple work items. There are
no guarantees on the order in which work items execute, or on the connection		no guarantees on the order in which work items execute, or on the connection
between them. In particular, work items are not necessarily executed in		between them. In particular, work items are not necessarily executed in
lock-step. Synchronization ops such as "gpu.barrier" should be used to		lock-step. Synchronization ops such as "gpu.barrier" should be used to
coordinate work items. Declarations of GPU functions, i.e. not having the		coordinate work items. Declarations of GPU functions, i.e. not having the
body region, are not supported.		body region, are not supported.

		A function may optionally be annotated with the block and/or grid sizes
		that will be used when it is launched using the `gpu.known_block_size` and
		`gpu.known_grid_size` attributes, respectively. If set, these attributes must
		be arrays of three 32-bit integers giving the x, y, and z launch dimensions.
		Launching a kernel that has these annotations, or that calls a function with
		these annotations, using a block size or grid size other than what is specified
		is undefined behavior.

Syntax:		Syntax:

```		```
op ::= `gpu.func` symbol-ref-id `(` argument-list `)` (`->`		op ::= `gpu.func` symbol-ref-id `(` argument-list `)` (`->`
function-result-list)?		function-result-list)?
memory-attribution `kernel`? function-attributes? region		memory-attribution `kernel`? function-attributes? region

memory-attribution ::= (`workgroup` `(` ssa-id-and-type-list `)`)?		memory-attribution ::= (`workgroup` `(` ssa-id-and-type-list `)`)?
let extraClassDeclaration = [{
BlockArgument addPrivateAttribution(Type type, Location loc);		BlockArgument addPrivateAttribution(Type type, Location loc);

/// Returns the name of the attribute containing the number of buffers		/// Returns the name of the attribute containing the number of buffers
/// located in the workgroup memory.		/// located in the workgroup memory.
static StringRef getNumWorkgroupAttributionsAttrName() {		static StringRef getNumWorkgroupAttributionsAttrName() {
return "workgroup_attributions";		return "workgroup_attributions";
}		}

		static constexpr StringLiteral getKnownBlockSizeAttrName() {
		return StringLiteral("gpu.known_block_size");
		}

		static constexpr StringLiteral getKnownGridSizeAttrName() {
		return StringLiteral("gpu.known_grid_size");
		}

		/// Returns the block size this kernel will be launched with along
		/// dimension `dim` if known. The value of gpu.thread_id dim will be strictly
		/// less than this size.
		Optional<uint32_t> getKnownBlockSize(gpu::Dimension dim) {
		if (auto array =
		(*this)->getAttrOfType<DenseI32ArrayAttr>(getKnownBlockSizeAttrName())) {
		return array[static_cast<uint32_t>(dim)];
		}
		return std::nullopt;
		}

		/// Returns the grid size this kernel will be launched with along
		/// dimension `dim` if known. The value of gpu.block_id dim will be strictly
		/// less than this size.
		Optional<uint32_t> getKnownGridSize(gpu::Dimension dim) {
		if (auto array =
		(*this)->getAttrOfType<DenseI32ArrayAttr>(getKnownGridSizeAttrName())) {
		return array[static_cast<uint32_t>(dim)];
		}
		return std::nullopt;
		}

/// Returns the argument types of this function.		/// Returns the argument types of this function.
ArrayRef<Type> getArgumentTypes() { return getFunctionType().getInputs(); }		ArrayRef<Type> getArgumentTypes() { return getFunctionType().getInputs(); }

/// Returns the result types of this function.		/// Returns the result types of this function.
ArrayRef<Type> getResultTypes() { return getFunctionType().getResults(); }		ArrayRef<Type> getResultTypes() { return getFunctionType().getResults(); }

/// Returns the keywords used in the custom syntax for this Op.		/// Returns the keywords used in the custom syntax for this Op.
static StringRef getWorkgroupKeyword() { return "workgroup"; }		static StringRef getWorkgroupKeyword() { return "workgroup"; }
static StringRef getPrivateKeyword() { return "private"; }		static StringRef getPrivateKeyword() { return "private"; }
static StringRef getKernelKeyword() { return "kernel"; }		static StringRef getKernelKeyword() { return "kernel"; }

/// Hook for FunctionOpInterface verifier.		/// Hook for FunctionOpInterface verifier.
LogicalResult verifyType();		LogicalResult verifyType();

/// Verifies the body of the function.		/// Verifies the body of the function.
LogicalResult verifyBody();		LogicalResult verifyBody();
}];		}];
let hasCustomAssemblyFormat = 1;		let hasCustomAssemblyFormat = 1;

		let hasVerifier = 1;
}		}

def GPU_LaunchFuncOp : GPU_Op<"launch_func",		def GPU_LaunchFuncOp : GPU_Op<"launch_func",
[GPU_AsyncOpInterface, AttrSizedOperandSegments]>,		[GPU_AsyncOpInterface, AttrSizedOperandSegments]>,
Arguments<(ins Variadic<GPU_AsyncToken>:$asyncDependencies,		Arguments<(ins Variadic<GPU_AsyncToken>:$asyncDependencies,
SymbolRefAttr:$kernel,		SymbolRefAttr:$kernel,
Index:$gridSizeX, Index:$gridSizeY, Index:$gridSizeZ,		Index:$gridSizeX, Index:$gridSizeY, Index:$gridSizeZ,
Index:$blockSizeX, Index:$blockSizeY, Index:$blockSizeZ,		Index:$blockSizeX, Index:$blockSizeY, Index:$blockSizeZ,

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "mlir/Dialect/GPU/IR/GPUDialect.h"		#include "mlir/Dialect/GPU/IR/GPUDialect.h"

#include "mlir/Dialect/Arith/IR/Arith.h"		#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/Dialect/MemRef/IR/MemRef.h"		#include "mlir/Dialect/MemRef/IR/MemRef.h"
#include "mlir/IR/Attributes.h"		#include "mlir/IR/Attributes.h"
#include "mlir/IR/Builders.h"		#include "mlir/IR/Builders.h"
		#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/BuiltinOps.h"		#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/BuiltinTypes.h"		#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/DialectImplementation.h"		#include "mlir/IR/DialectImplementation.h"
#include "mlir/IR/FunctionImplementation.h"		#include "mlir/IR/FunctionImplementation.h"
#include "mlir/IR/Matchers.h"		#include "mlir/IR/Matchers.h"
#include "mlir/IR/OpImplementation.h"		#include "mlir/IR/OpImplementation.h"
#include "mlir/IR/PatternMatch.h"		#include "mlir/IR/PatternMatch.h"
#include "mlir/IR/TypeUtilities.h"		#include "mlir/IR/TypeUtilities.h"
if (failed(verifyAttributions(getOperation(), getWorkgroupAttributions(),
GPUDialect::getWorkgroupAddressSpace())) ||		GPUDialect::getWorkgroupAddressSpace())) ||
failed(verifyAttributions(getOperation(), getPrivateAttributions(),		failed(verifyAttributions(getOperation(), getPrivateAttributions(),
GPUDialect::getPrivateAddressSpace())))		GPUDialect::getPrivateAddressSpace())))
return failure();		return failure();

return success();		return success();
}		}

		static LogicalResult verifyKnownLaunchSizeAttr(gpu::GPUFuncOp op,
		StringRef attrName) {
		auto maybeAttr = op->getAttr(attrName);
		if (!maybeAttr)
		return success();
		auto array = maybeAttr.dyn_cast<DenseI32ArrayAttr>();
		if (!array)
		return op.emitOpError(attrName + " must be a dense i32 array");
		if (array.size() != 3)
		return op.emitOpError(attrName + " must contain exactly 3 elements");
		return success();
		}

		LogicalResult GPUFuncOp::verify() {
		if (failed(verifyKnownLaunchSizeAttr(*this, getKnownBlockSizeAttrName())))
		return failure();
		if (failed(verifyKnownLaunchSizeAttr(*this, getKnownGridSizeAttrName())))
		return failure();
		return success();
		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// ReturnOp		// ReturnOp
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

LogicalResult gpu::ReturnOp::verify() {		LogicalResult gpu::ReturnOp::verify() {
GPUFuncOp function = (*this)->getParentOfType<GPUFuncOp>();		GPUFuncOp function = (*this)->getParentOfType<GPUFuncOp>();

FunctionType funType = function.getFunctionType();		FunctionType funType = function.getFunctionType();

mlir/lib/Dialect/GPU/IR/InferIntRangeInterfaceImpls.cpp

	//===- InferIntRangeInterfaceImpls.cpp - Integer range impls for gpu -===//			//===- InferIntRangeInterfaceImpls.cpp - Integer range impls for gpu -===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "mlir/Dialect/GPU/IR/GPUDialect.h"			#include "mlir/Dialect/GPU/IR/GPUDialect.h"
				#include "mlir/IR/Matchers.h"
	#include "mlir/Interfaces/InferIntRangeInterface.h"			#include "mlir/Interfaces/InferIntRangeInterface.h"
				#include "llvm/Support/MathExtras.h"
				#include <optional>

	using namespace mlir;			using namespace mlir;
	using namespace mlir::gpu;			using namespace mlir::gpu;

	// Maximum grid and block dimensions of all known GPUs are less than 2^32.			// Maximum grid and block dimensions of all known GPUs are less than 2^32.
	static constexpr uint64_t kMaxDim = std::numeric_limits<uint32_t>::max();			static constexpr uint64_t kMaxDim = std::numeric_limits<uint32_t>::max();
	// Maximum subgroups are no larger than 128.			// Maximum subgroups are no larger than 128.
	static constexpr uint64_t kMaxSubgroupSize = 128;			static constexpr uint64_t kMaxSubgroupSize = 128;

	static ConstantIntRanges getIndexRange(uint64_t umin, uint64_t umax) {			static ConstantIntRanges getIndexRange(uint64_t umin, uint64_t umax) {
	unsigned width = IndexType::kInternalStorageBitWidth;			unsigned width = IndexType::kInternalStorageBitWidth;
	return ConstantIntRanges::fromUnsigned(APInt(width, umin),			return ConstantIntRanges::fromUnsigned(APInt(width, umin),
	APInt(width, umax));			APInt(width, umax));
	}			}

				namespace {
				enum class LaunchDims : uint32_t { Block = 0, Grid = 1 };
				} // end namespace

				/// If the operation `op` is in a context that is annotated with maximum
				/// launch dimensions (a launch op with constant block or grid
				/// sizes or a launch_func op with the appropriate dimensions), return
				/// the bound on the maximum size of the dimension that the op is querying.
				/// IDs will be one less than this bound.

				static Value valueByDim(KernelDim3 dims, Dimension dim) {
				switch (dim) {
				case Dimension::x:
				return dims.x;
				case Dimension::y:
				return dims.y;
				case Dimension::z:
				return dims.z;
				}
				}

				static uint64_t zext(uint32_t arg) { return static_cast<uint64_t>(arg); }

				template <typename Op>
				static Optional<uint64_t> getKnownLaunchDim(Op op, LaunchDims type) {
				Dimension dim = op.getDimension();
				if (auto launch = op->template getParentOfType<LaunchOp>()) {
				KernelDim3 bounds;
				switch (type) {
				case LaunchDims::Block:
				bounds = launch.getBlockSizeOperandValues();
				break;
				case LaunchDims::Grid:
				bounds = launch.getGridSizeOperandValues();
				break;
				}
				Value maybeBound = valueByDim(bounds, dim);
				APInt value;
				if (matchPattern(maybeBound, m_ConstantInt(&value)))
				return value.getZExtValue();
				}

				if (auto func = op->template getParentOfType<GPUFuncOp>()) {
				switch (type) {
				case LaunchDims::Block:
				return func.getKnownBlockSize(dim).transform(zext);
				case LaunchDims::Grid:
				return func.getKnownGridSize(dim).transform(zext);
				}
				}
				return std::nullopt;
				}

	void BlockDimOp::inferResultRanges(ArrayRef<ConstantIntRanges>,			void BlockDimOp::inferResultRanges(ArrayRef<ConstantIntRanges>,
	SetIntRangeFn setResultRange) {			SetIntRangeFn setResultRange) {
				Optional<uint64_t> knownVal = getKnownLaunchDim(*this, LaunchDims::Block);
				if (knownVal)
				setResultRange(getResult(), getIndexRange(*knownVal, *knownVal));
				else
	setResultRange(getResult(), getIndexRange(1, kMaxDim));			setResultRange(getResult(), getIndexRange(1, kMaxDim));
	}			}

	void BlockIdOp::inferResultRanges(ArrayRef<ConstantIntRanges>,			void BlockIdOp::inferResultRanges(ArrayRef<ConstantIntRanges>,
	SetIntRangeFn setResultRange) {			SetIntRangeFn setResultRange) {
	setResultRange(getResult(), getIndexRange(0, kMaxDim - 1));			uint64_t max = getKnownLaunchDim(*this, LaunchDims::Grid).value_or(kMaxDim);
				setResultRange(getResult(), getIndexRange(0, max - 1ULL));
	}			}

	void GridDimOp::inferResultRanges(ArrayRef<ConstantIntRanges>,			void GridDimOp::inferResultRanges(ArrayRef<ConstantIntRanges>,
	SetIntRangeFn setResultRange) {			SetIntRangeFn setResultRange) {
				Optional<uint64_t> knownVal = getKnownLaunchDim(*this, LaunchDims::Grid);
				if (knownVal)
				setResultRange(getResult(), getIndexRange(*knownVal, *knownVal));
				else
	setResultRange(getResult(), getIndexRange(1, kMaxDim));			setResultRange(getResult(), getIndexRange(1, kMaxDim));
	}			}

	void ThreadIdOp::inferResultRanges(ArrayRef<ConstantIntRanges>,			void ThreadIdOp::inferResultRanges(ArrayRef<ConstantIntRanges>,
	SetIntRangeFn setResultRange) {			SetIntRangeFn setResultRange) {
	setResultRange(getResult(), getIndexRange(0, kMaxDim - 1));			uint64_t max = getKnownLaunchDim(*this, LaunchDims::Block).value_or(kMaxDim);
				setResultRange(getResult(), getIndexRange(0, max - 1ULL));
	}			}

	void LaneIdOp::inferResultRanges(ArrayRef<ConstantIntRanges>,			void LaneIdOp::inferResultRanges(ArrayRef<ConstantIntRanges>,
	SetIntRangeFn setResultRange) {			SetIntRangeFn setResultRange) {
	setResultRange(getResult(), getIndexRange(0, kMaxSubgroupSize - 1));			setResultRange(getResult(), getIndexRange(0, kMaxSubgroupSize - 1ULL));
	}			}

	void SubgroupIdOp::inferResultRanges(ArrayRef<ConstantIntRanges>,			void SubgroupIdOp::inferResultRanges(ArrayRef<ConstantIntRanges>,
	SetIntRangeFn setResultRange) {			SetIntRangeFn setResultRange) {
	setResultRange(getResult(), getIndexRange(0, kMaxDim - 1));			setResultRange(getResult(), getIndexRange(0, kMaxDim - 1ULL));
	}			}

	void GlobalIdOp::inferResultRanges(ArrayRef<ConstantIntRanges>,			void GlobalIdOp::inferResultRanges(ArrayRef<ConstantIntRanges>,
	SetIntRangeFn setResultRange) {			SetIntRangeFn setResultRange) {
	setResultRange(getResult(),			uint64_t blockIdMax =
	getIndexRange(0, std::numeric_limits<int64_t>::max()));			getKnownLaunchDim(*this, LaunchDims::Block).value_or(kMaxDim) - 1ULL;
				uint64_t gridIdMax =
				getKnownLaunchDim(*this, LaunchDims::Grid).value_or(kMaxDim) - 1ULL;
				setResultRange(getResult(), getIndexRange(0, blockIdMax * gridIdMax));
	}			}

	void NumSubgroupsOp::inferResultRanges(ArrayRef<ConstantIntRanges>,			void NumSubgroupsOp::inferResultRanges(ArrayRef<ConstantIntRanges>,
	SetIntRangeFn setResultRange) {			SetIntRangeFn setResultRange) {
	setResultRange(getResult(), getIndexRange(1, kMaxDim));			setResultRange(getResult(), getIndexRange(1, kMaxDim));
	}			}

	void SubgroupSizeOp::inferResultRanges(ArrayRef<ConstantIntRanges>,			void SubgroupSizeOp::inferResultRanges(ArrayRef<ConstantIntRanges>,

mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp

#include "mlir/Dialect/ControlFlow/IR/ControlFlowOps.h"		#include "mlir/Dialect/ControlFlow/IR/ControlFlowOps.h"
#include "mlir/Dialect/DLTI/DLTI.h"		#include "mlir/Dialect/DLTI/DLTI.h"
#include "mlir/Dialect/Func/IR/FuncOps.h"		#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Dialect/GPU/IR/GPUDialect.h"		#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "mlir/Dialect/GPU/Transforms/Utils.h"		#include "mlir/Dialect/GPU/Transforms/Utils.h"
#include "mlir/Dialect/MemRef/IR/MemRef.h"		#include "mlir/Dialect/MemRef/IR/MemRef.h"
#include "mlir/IR/BlockAndValueMapping.h"		#include "mlir/IR/BlockAndValueMapping.h"
#include "mlir/IR/Builders.h"		#include "mlir/IR/Builders.h"
		#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/Matchers.h"		#include "mlir/IR/Matchers.h"
#include "mlir/IR/SymbolTable.h"		#include "mlir/IR/SymbolTable.h"
#include "mlir/Support/LLVM.h"		#include "mlir/Support/LLVM.h"
#include "mlir/Transforms/RegionUtils.h"		#include "mlir/Transforms/RegionUtils.h"
		#include <limits>

namespace mlir {		namespace mlir {
#define GEN_PASS_DEF_GPULAUNCHSINKINDEXCOMPUTATIONS		#define GEN_PASS_DEF_GPULAUNCHSINKINDEXCOMPUTATIONS
#define GEN_PASS_DEF_GPUKERNELOUTLINING		#define GEN_PASS_DEF_GPUKERNELOUTLINING
#include "mlir/Dialect/GPU/Transforms/Passes.h.inc"		#include "mlir/Dialect/GPU/Transforms/Passes.h.inc"
} // namespace mlir		} // namespace mlir

using namespace mlir;		using namespace mlir;
for (Operation *op : toBeSunk) {
// Only replace uses within the launch op.		// Only replace uses within the launch op.
for (auto pair : llvm::zip(op->getResults(), clonedOp->getResults()))		for (auto pair : llvm::zip(op->getResults(), clonedOp->getResults()))
replaceAllUsesInRegionWith(std::get<0>(pair), std::get<1>(pair),		replaceAllUsesInRegionWith(std::get<0>(pair), std::get<1>(pair),
launchOp.getBody());		launchOp.getBody());
}		}
return success();		return success();
}		}

		/// Return the provided KernelDim3 as an array of i32 constants if possible.
		static DenseI32ArrayAttr constantDimsAttr(gpu::KernelDim3 dims) {
		SmallVector<int32_t, 3> constants;
		MLIRContext *ctx = dims.x.getContext();
		for (Value v : {dims.x, dims.y, dims.z}) {
		APInt constValue;
		if (!matchPattern(v, m_ConstantInt(&constValue)))
		return nullptr;
		// In the event someone called for a too-large block or grid dimension,
		// don't set bounds as it is likely to cause more confusing behavior.
		if (constValue.ugt(std::numeric_limits<uint32_t>::max()))
		return nullptr;
		constants.push_back(
		constValue.getLimitedValue(std::numeric_limits<uint32_t>::max()));
		}
		return DenseI32ArrayAttr::get(ctx, constants);
		}

/// Outline the `gpu.launch` operation body into a kernel function. Replace		/// Outline the `gpu.launch` operation body into a kernel function. Replace
/// `gpu.terminator` operations by `gpu.return` in the generated function.		/// `gpu.terminator` operations by `gpu.return` in the generated function.
		/// Set block and grid size bounds if known.
static gpu::GPUFuncOp outlineKernelFuncImpl(gpu::LaunchOp launchOp,		static gpu::GPUFuncOp outlineKernelFuncImpl(gpu::LaunchOp launchOp,
StringRef kernelFnName,		StringRef kernelFnName,
SetVector<Value> &operands) {		SetVector<Value> &operands) {
Location loc = launchOp.getLoc();		Location loc = launchOp.getLoc();
// Create a builder with no insertion point, insertion will happen separately		// Create a builder with no insertion point, insertion will happen separately
// due to symbol table manipulation.		// due to symbol table manipulation.
OpBuilder builder(launchOp.getContext());		OpBuilder builder(launchOp.getContext());
Region &launchOpBody = launchOp.getBody();		Region &launchOpBody = launchOp.getBody();

// Identify uses from values defined outside of the scope of the launch		// Identify uses from values defined outside of the scope of the launch
// operation.		// operation.
getUsedValuesDefinedAbove(launchOpBody, operands);		getUsedValuesDefinedAbove(launchOpBody, operands);

// Create the gpu.func operation.		// Create the gpu.func operation.
SmallVector<Type, 4> kernelOperandTypes;		SmallVector<Type, 4> kernelOperandTypes;
kernelOperandTypes.reserve(operands.size());		kernelOperandTypes.reserve(operands.size());
for (Value operand : operands) {		for (Value operand : operands) {
kernelOperandTypes.push_back(operand.getType());		kernelOperandTypes.push_back(operand.getType());
}		}
FunctionType type =		FunctionType type =
FunctionType::get(launchOp.getContext(), kernelOperandTypes, {});		FunctionType::get(launchOp.getContext(), kernelOperandTypes, {});
auto outlinedFunc = builder.create<gpu::GPUFuncOp>(loc, kernelFnName, type);		auto outlinedFunc = builder.create<gpu::GPUFuncOp>(loc, kernelFnName, type);
outlinedFunc->setAttr(gpu::GPUDialect::getKernelFuncAttrName(),		outlinedFunc->setAttr(gpu::GPUDialect::getKernelFuncAttrName(),
builder.getUnitAttr());		builder.getUnitAttr());

		// If we can infer bounds on the grid and/or block sizes from the arguments
		// to the launch op, propagate them to the generated kernel. This is safe
		// because multiple launches with the same body are not deduplicated.
		if (auto blockBounds = constantDimsAttr(launchOp.getBlockSizeOperandValues()))
		outlinedFunc->setAttr(gpu::GPUFuncOp::getKnownBlockSizeAttrName(),
		blockBounds);
		if (auto gridBounds = constantDimsAttr(launchOp.getGridSizeOperandValues()))
		outlinedFunc->setAttr(gpu::GPUFuncOp::getKnownGridSizeAttrName(),
		gridBounds);

BlockAndValueMapping map;		BlockAndValueMapping map;

// Map the arguments corresponding to the launch parameters like blockIdx,		// Map the arguments corresponding to the launch parameters like blockIdx,
// threadIdx, etc.		// threadIdx, etc.
Region &outlinedFuncBody = outlinedFunc.getBody();		Region &outlinedFuncBody = outlinedFunc.getBody();
injectGpuIndexOperations(loc, outlinedFuncBody, launchOpBody, map);		injectGpuIndexOperations(loc, outlinedFuncBody, launchOpBody, map);

// Map arguments from gpu.launch region to the arguments of the gpu.func		// Map arguments from gpu.launch region to the arguments of the gpu.func

mlir/test/Dialect/GPU/int-range-interface.mlir

// RUN: mlir-opt -test-int-range-inference %s | FileCheck %s		// RUN: mlir-opt -test-int-range-inference -split-input-file %s | FileCheck %s

// CHECK-LABEL: func @launch_func		// CHECK-LABEL: func @launch_func
func.func @launch_func(%arg0 : index) {		func.func @launch_func(%arg0 : index) {
%0 = test.with_bounds {		%0 = test.with_bounds {
umin = 3 : index, umax = 5 : index,		umin = 3 : index, umax = 5 : index,
smin = 3 : index, smax = 5 : index		smin = 3 : index, smax = 5 : index
}		}
%1 = test.with_bounds {		%1 = test.with_bounds {
gpu.launch blocks(%block_id_x, %block_id_y, %block_id_z) in (%grid_dim_x = %0, %grid_dim_y = %1, %grid_dim_z = %arg0)

// CHECK: test.reflect_bounds {smax = 4294967294 : index, smin = 0 : index, umax = 4294967294 : index, umin = 0 : index}		// CHECK: test.reflect_bounds {smax = 4294967294 : index, smin = 0 : index, umax = 4294967294 : index, umin = 0 : index}
// CHECK: test.reflect_bounds {smax = 4 : index, smin = 0 : index, umax = 4 : index, umin = 0 : index}		// CHECK: test.reflect_bounds {smax = 4 : index, smin = 0 : index, umax = 4 : index, umin = 0 : index}
// CHECK: test.reflect_bounds {smax = 10 : index, smin = 0 : index, umax = 10 : index, umin = 0 : index}		// CHECK: test.reflect_bounds {smax = 10 : index, smin = 0 : index, umax = 10 : index, umin = 0 : index}
%thread_id_x0 = test.reflect_bounds %thread_id_x		%thread_id_x0 = test.reflect_bounds %thread_id_x
%thread_id_y0 = test.reflect_bounds %thread_id_y		%thread_id_y0 = test.reflect_bounds %thread_id_y
%thread_id_z0 = test.reflect_bounds %thread_id_z		%thread_id_z0 = test.reflect_bounds %thread_id_z

		// The launch bounds are not constant, and so this can't infer anything
		// CHECK: test.reflect_bounds {smax = 4294967294 : index, smin = 0 : index, umax = 4294967294 : index, umin = 0 : index}
		%thread_id_op = gpu.thread_id y
		%thread_id_op0 = test.reflect_bounds %thread_id_op
gpu.terminator		gpu.terminator
}		}

func.return		func.return
}		}

		// -----

// CHECK-LABEL: func @kernel		// CHECK-LABEL: func @kernel
module attributes {gpu.container_module} {		module attributes {gpu.container_module} {
gpu.module @gpu_module {		gpu.module @gpu_module {
llvm.func @kernel() attributes {gpu.kernel} {		llvm.func @kernel() attributes {gpu.kernel} {

%grid_dim_x = gpu.grid_dim x		%grid_dim_x = gpu.grid_dim x
%grid_dim_y = gpu.grid_dim y		%grid_dim_y = gpu.grid_dim y
%grid_dim_z = gpu.grid_dim z		%grid_dim_z = gpu.grid_dim z
Show All 37 Lines	llvm.func @kernel() attributes {gpu.kernel} {
%thread_id_x0 = test.reflect_bounds %thread_id_x		%thread_id_x0 = test.reflect_bounds %thread_id_x
%thread_id_y0 = test.reflect_bounds %thread_id_y		%thread_id_y0 = test.reflect_bounds %thread_id_y
%thread_id_z0 = test.reflect_bounds %thread_id_z		%thread_id_z0 = test.reflect_bounds %thread_id_z

%global_id_x = gpu.global_id x		%global_id_x = gpu.global_id x
%global_id_y = gpu.global_id y		%global_id_y = gpu.global_id y
%global_id_z = gpu.global_id z		%global_id_z = gpu.global_id z

// CHECK: test.reflect_bounds {smax = 9223372036854775807 : index, smin = 0 : index, umax = 9223372036854775807 : index, umin = 0 : index}		// CHECK: test.reflect_bounds {smax = 9223372036854775807 : index, smin = -9223372036854775808 : index, umax = -17179869180 : index, umin = 0 : index}
// CHECK: test.reflect_bounds {smax = 9223372036854775807 : index, smin = 0 : index, umax = 9223372036854775807 : index, umin = 0 : index}		// CHECK: test.reflect_bounds {smax = 9223372036854775807 : index, smin = -9223372036854775808 : index, umax = -17179869180 : index, umin = 0 : index}
// CHECK: test.reflect_bounds {smax = 9223372036854775807 : index, smin = 0 : index, umax = 9223372036854775807 : index, umin = 0 : index}		// CHECK: test.reflect_bounds {smax = 9223372036854775807 : index, smin = -9223372036854775808 : index, umax = -17179869180 : index, umin = 0 : index}
%global_id_x0 = test.reflect_bounds %global_id_x		%global_id_x0 = test.reflect_bounds %global_id_x
%global_id_y0 = test.reflect_bounds %global_id_y		%global_id_y0 = test.reflect_bounds %global_id_y
%global_id_z0 = test.reflect_bounds %global_id_z		%global_id_z0 = test.reflect_bounds %global_id_z

%subgroup_size = gpu.subgroup_size : index		%subgroup_size = gpu.subgroup_size : index
%lane_id = gpu.lane_id		%lane_id = gpu.lane_id
%num_subgroups = gpu.num_subgroups : index		%num_subgroups = gpu.num_subgroups : index
%subgroup_id = gpu.subgroup_id : index		%subgroup_id = gpu.subgroup_id : index

// CHECK: test.reflect_bounds {smax = 128 : index, smin = 1 : index, umax = 128 : index, umin = 1 : index}		// CHECK: test.reflect_bounds {smax = 128 : index, smin = 1 : index, umax = 128 : index, umin = 1 : index}
// CHECK: test.reflect_bounds {smax = 127 : index, smin = 0 : index, umax = 127 : index, umin = 0 : index}		// CHECK: test.reflect_bounds {smax = 127 : index, smin = 0 : index, umax = 127 : index, umin = 0 : index}
// CHECK: test.reflect_bounds {smax = 4294967295 : index, smin = 1 : index, umax = 4294967295 : index, umin = 1 : index}		// CHECK: test.reflect_bounds {smax = 4294967295 : index, smin = 1 : index, umax = 4294967295 : index, umin = 1 : index}
// CHECK: test.reflect_bounds {smax = 4294967294 : index, smin = 0 : index, umax = 4294967294 : index, umin = 0 : index}		// CHECK: test.reflect_bounds {smax = 4294967294 : index, smin = 0 : index, umax = 4294967294 : index, umin = 0 : index}
%subgroup_size0 = test.reflect_bounds %subgroup_size		%subgroup_size0 = test.reflect_bounds %subgroup_size
%lane_id0 = test.reflect_bounds %lane_id		%lane_id0 = test.reflect_bounds %lane_id
%num_subgroups0 = test.reflect_bounds %num_subgroups		%num_subgroups0 = test.reflect_bounds %num_subgroups
%subgroup_id0 = test.reflect_bounds %subgroup_id		%subgroup_id0 = test.reflect_bounds %subgroup_id

llvm.return		llvm.return
}		}
}		}
}		}

		// -----

		// CHECK-LABEL: func @annotated_kernel
		module attributes {gpu.container_module} {
		gpu.module @gpu_module {
		gpu.func @annotated_kernel() kernel
		attributes {gpu.known_block_size = array<i32: 8, 12, 16>,
		gpu.known_grid_size = array<i32: 20, 24, 28>} {

		%grid_dim_x = gpu.grid_dim x
		%grid_dim_y = gpu.grid_dim y
		%grid_dim_z = gpu.grid_dim z

		// CHECK: test.reflect_bounds {smax = 20 : index, smin = 20 : index, umax = 20 : index, umin = 20 : index}
		// CHECK: test.reflect_bounds {smax = 24 : index, smin = 24 : index, umax = 24 : index, umin = 24 : index}
		// CHECK: test.reflect_bounds {smax = 28 : index, smin = 28 : index, umax = 28 : index, umin = 28 : index}
		%grid_dim_x0 = test.reflect_bounds %grid_dim_x
		%grid_dim_y0 = test.reflect_bounds %grid_dim_y
		%grid_dim_z0 = test.reflect_bounds %grid_dim_z

		%block_id_x = gpu.block_id x
		%block_id_y = gpu.block_id y
		%block_id_z = gpu.block_id z

		// CHECK: test.reflect_bounds {smax = 19 : index, smin = 0 : index, umax = 19 : index, umin = 0 : index}
		// CHECK: test.reflect_bounds {smax = 23 : index, smin = 0 : index, umax = 23 : index, umin = 0 : index}
		// CHECK: test.reflect_bounds {smax = 27 : index, smin = 0 : index, umax = 27 : index, umin = 0 : index}
		%block_id_x0 = test.reflect_bounds %block_id_x
		%block_id_y0 = test.reflect_bounds %block_id_y
		%block_id_z0 = test.reflect_bounds %block_id_z

		%block_dim_x = gpu.block_dim x
		%block_dim_y = gpu.block_dim y
		%block_dim_z = gpu.block_dim z

		// CHECK: test.reflect_bounds {smax = 8 : index, smin = 8 : index, umax = 8 : index, umin = 8 : index}
		// CHECK: test.reflect_bounds {smax = 12 : index, smin = 12 : index, umax = 12 : index, umin = 12 : index}
		// CHECK: test.reflect_bounds {smax = 16 : index, smin = 16 : index, umax = 16 : index, umin = 16 : index}
		%block_dim_x0 = test.reflect_bounds %block_dim_x
		%block_dim_y0 = test.reflect_bounds %block_dim_y
		%block_dim_z0 = test.reflect_bounds %block_dim_z

		%thread_id_x = gpu.thread_id x
		%thread_id_y = gpu.thread_id y
		%thread_id_z = gpu.thread_id z

		// CHECK: test.reflect_bounds {smax = 7 : index, smin = 0 : index, umax = 7 : index, umin = 0 : index}
		// CHECK: test.reflect_bounds {smax = 11 : index, smin = 0 : index, umax = 11 : index, umin = 0 : index}
		// CHECK: test.reflect_bounds {smax = 15 : index, smin = 0 : index, umax = 15 : index, umin = 0 : index}
		%thread_id_x0 = test.reflect_bounds %thread_id_x
		%thread_id_y0 = test.reflect_bounds %thread_id_y
		%thread_id_z0 = test.reflect_bounds %thread_id_z

		%global_id_x = gpu.global_id x
		%global_id_y = gpu.global_id y
		%global_id_z = gpu.global_id z

		// CHECK: test.reflect_bounds {smax = 133 : index, smin = 0 : index, umax = 133 : index, umin = 0 : index}
		// CHECK: test.reflect_bounds {smax = 253 : index, smin = 0 : index, umax = 253 : index, umin = 0 : index}
		// CHECK: test.reflect_bounds {smax = 405 : index, smin = 0 : index, umax = 405 : index, umin = 0 : index}
		%global_id_x0 = test.reflect_bounds %global_id_x
		%global_id_y0 = test.reflect_bounds %global_id_y
		%global_id_z0 = test.reflect_bounds %global_id_z

		%subgroup_size = gpu.subgroup_size : index
		%lane_id = gpu.lane_id
		%num_subgroups = gpu.num_subgroups : index
		%subgroup_id = gpu.subgroup_id : index

		// CHECK: test.reflect_bounds {smax = 128 : index, smin = 1 : index, umax = 128 : index, umin = 1 : index}
		// CHECK: test.reflect_bounds {smax = 127 : index, smin = 0 : index, umax = 127 : index, umin = 0 : index}
		// CHECK: test.reflect_bounds {smax = 4294967295 : index, smin = 1 : index, umax = 4294967295 : index, umin = 1 : index}
		// CHECK: test.reflect_bounds {smax = 4294967294 : index, smin = 0 : index, umax = 4294967294 : index, umin = 0 : index}
		%subgroup_size0 = test.reflect_bounds %subgroup_size
		%lane_id0 = test.reflect_bounds %lane_id
		%num_subgroups0 = test.reflect_bounds %num_subgroups
		%subgroup_id0 = test.reflect_bounds %subgroup_id

		gpu.return
		}
		}
		}
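
The expectations in @annotated_kernel follow from interval arithmetic on the annotated sizes. The sketch below is a hypothetical Python model of that arithmetic, not the code in InferIntRangeInterfaceImpls.cpp; the helper names `dim_range` and `id_range` are assumptions made for illustration. A dimension query with known size n gets the range [n, n], and an ID query gets [0, n - 1]. For gpu.global_id, the values this revision's CHECK lines expect (133, 253, 405) match the product of the two per-dimension maxima; the largest value of block_id * block_dim + thread_id is computed alongside for comparison.

```python
def dim_range(known_size):
    """Range of a gpu.block_dim / gpu.grid_dim query when the size is known."""
    return (known_size, known_size)


def id_range(known_size):
    """Range of a gpu.thread_id / gpu.block_id query when the size is known."""
    return (0, known_size - 1)


# Sizes taken from the @annotated_kernel test above.
block = [8, 12, 16]   # gpu.known_block_size
grid = [20, 24, 28]   # gpu.known_grid_size

grid_dim_ranges = [dim_range(n) for n in grid]
block_id_ranges = [id_range(n) for n in grid]
block_dim_ranges = [dim_range(n) for n in block]
thread_id_ranges = [id_range(n) for n in block]

# Upper bounds for gpu.global_id as written in this revision's CHECK lines:
# the product of the thread_id maximum and the block_id maximum.
global_id_max_in_test = [(b - 1) * (g - 1) for b, g in zip(block, grid)]

# Largest value of block_id * block_dim + thread_id, for comparison.
global_id_max_linear = [(g - 1) * b + (b - 1) for b, g in zip(block, grid)]

print(global_id_max_in_test)
print(global_id_max_linear)
```

The subgroup-related queries are unaffected by the annotations in this test, which is why their expected ranges are the same as in the unannotated @kernel case.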

mlir/test/Dialect/GPU/invalid.mlir


	// Number of dynamic dimension operand count less than memref dynamic dimension count.			// Number of dynamic dimension operand count less than memref dynamic dimension count.
	func.func @alloc() {			func.func @alloc() {
	%0 = arith.constant 7 : index			%0 = arith.constant 7 : index
	// expected-error@+1 {{dimension operand count does not equal memref dynamic dimension count}}			// expected-error@+1 {{dimension operand count does not equal memref dynamic dimension count}}
	%1 = gpu.alloc(%0) : memref<2x?x?xf32, 1>			%1 = gpu.alloc(%0) : memref<2x?x?xf32, 1>
	return			return
	}			}

				// -----

				module attributes {gpu.container_module} {
				gpu.module @kernel {
				// expected-error@+1 {{'gpu.func' op gpu.known_block_size must be a dense i32 array}}
				gpu.func @kernel() kernel attributes {gpu.known_block_size = 32 : i32} {
				gpu.return
				}
				}
				}

				// -----

				module attributes {gpu.container_module} {
				gpu.module @kernel {
				// expected-error@+1 {{'gpu.func' op gpu.known_block_size must contain exactly 3 elements}}
				gpu.func @kernel() kernel attributes {gpu.known_block_size = array<i32: 2, 1>} {
				gpu.return
				}
				}
				}
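
The two negative tests above exercise two separate checks on gpu.known_block_size: the attribute must be a dense i32 array, and it must have exactly three elements. A minimal sketch of that ordering, assuming the attribute is modeled as a plain Python value (a list of ints for an array, anything else for a non-array attribute). The function name `check_known_size` is hypothetical and this is not the verifier in GPUDialect.cpp; only the two message strings are taken from the tests.

```python
def check_known_size(name, value):
    """Return the diagnostic text for a bad attribute, or None if it is fine."""
    is_i32_array = isinstance(value, (list, tuple)) and all(
        isinstance(v, int)
        and not isinstance(v, bool)
        and -2**31 <= v < 2**31
        for v in value
    )
    # First check: the attribute kind (mirrors the `32 : i32` test case).
    if not is_i32_array:
        return f"{name} must be a dense i32 array"
    # Second check: one entry per dimension (x, y, z).
    if len(value) != 3:
        return f"{name} must contain exactly 3 elements"
    return None


print(check_known_size("gpu.known_block_size", 32))
print(check_known_size("gpu.known_block_size", [2, 1]))
print(check_known_size("gpu.known_block_size", [8, 12, 16]))
```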

mlir/test/Dialect/GPU/outlining.mlir

Show All 35 Lines	func.func @launch() {
return		return
}		}

// CHECK-DL-LABEL: gpu.module @launch_kernel attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<index, 32 : i32>>}		// CHECK-DL-LABEL: gpu.module @launch_kernel attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<index, 32 : i32>>}

// CHECK-LABEL: gpu.module @launch_kernel		// CHECK-LABEL: gpu.module @launch_kernel
// CHECK-NEXT: gpu.func @launch_kernel		// CHECK-NEXT: gpu.func @launch_kernel
// CHECK-SAME: (%[[KERNEL_ARG0:.*]]: f32, %[[KERNEL_ARG1:.*]]: memref<?xf32, 1>)		// CHECK-SAME: (%[[KERNEL_ARG0:.*]]: f32, %[[KERNEL_ARG1:.*]]: memref<?xf32, 1>)
		// CHECK-SAME: gpu.known_block_size = array<i32: 20, 24, 28>
		// CHECK-SAME: gpu.known_grid_size = array<i32: 8, 12, 16>
// CHECK-NEXT: %[[BID:.*]] = gpu.block_id x		// CHECK-NEXT: %[[BID:.*]] = gpu.block_id x
// CHECK-NEXT: = gpu.block_id y		// CHECK-NEXT: = gpu.block_id y
// CHECK-NEXT: = gpu.block_id z		// CHECK-NEXT: = gpu.block_id z
// CHECK-NEXT: %[[TID:.*]] = gpu.thread_id x		// CHECK-NEXT: %[[TID:.*]] = gpu.thread_id x
// CHECK-NEXT: = gpu.thread_id y		// CHECK-NEXT: = gpu.thread_id y
// CHECK-NEXT: = gpu.thread_id z		// CHECK-NEXT: = gpu.thread_id z
// CHECK-NEXT: = gpu.grid_dim x		// CHECK-NEXT: = gpu.grid_dim x
// CHECK-NEXT: = gpu.grid_dim y		// CHECK-NEXT: = gpu.grid_dim y
// CHECK: llvm.mlir.addressof @global : !llvm.ptr<i64>		// CHECK: llvm.mlir.addressof @global : !llvm.ptr<i64>
// CHECK: gpu.return		// CHECK: gpu.return
//		//
// CHECK: llvm.mlir.global internal @global(42 : i64) {addr_space = 0 : i32} : i64		// CHECK: llvm.mlir.global internal @global(42 : i64) {addr_space = 0 : i32} : i64
//		//
// CHECK: func @device_function()		// CHECK: func @device_function()
// CHECK: func @recursive_device_function()		// CHECK: func @recursive_device_function()
// CHECK-NOT: func @device_function		// CHECK-NOT: func @device_function

		// -----

		// CHECK-LABEL: @non_constant_launches
		func.func @non_constant_launches(%arg0 : index) {
		// CHECK-NOT: gpu.known_block_size
		// CHECK-NOT: gpu.known_grid_size
		gpu.launch blocks(%bx, %by, %bz) in (%grid_x = %arg0, %grid_y = %arg0,
		%grid_z = %arg0)
		threads(%tx, %ty, %tz) in (%block_x = %arg0, %block_y = %arg0,
		%block_z = %arg0) {
		gpu.terminator
		}
		return
		}

		// CHECK-DL-LABEL: gpu.module @non_constant_launches_kernel attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<index, 32 : i32>>}
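
The outlining tests check two outcomes: a launch with constant sizes yields gpu.known_block_size and gpu.known_grid_size on the outlined kernel, and a launch whose sizes come from a function argument yields neither. A minimal sketch of that decision, assuming each launch operand is modeled as an int when it is a constant and as None when it is dynamic. The name `maybe_constant_dims` is hypothetical, and the excerpt does not show what happens when only some operands are constant, so the all-or-nothing rule here is an assumption rather than the behavior of KernelOutlining.cpp.

```python
def maybe_constant_dims(dims):
    """Return [x, y, z] if every launch operand is a known constant, else None.

    `dims` holds three entries: an int for a constant operand, None otherwise.
    """
    if len(dims) != 3:
        raise ValueError("expected one entry per dimension (x, y, z)")
    if any(d is None for d in dims):
        return None  # assumed: no attribute unless all three are constant
    return list(dims)


# Constant launch, using the sizes the @launch_kernel CHECK lines expect.
known_block_size = maybe_constant_dims((20, 24, 28))
known_grid_size = maybe_constant_dims((8, 12, 16))

# @non_constant_launches: every operand is %arg0, so nothing is known.
no_block_size = maybe_constant_dims((None, None, None))

print(known_block_size, known_grid_size, no_block_size)
```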


Diff 482196
