This is an archive of the discontinued LLVM Phabricator instance.

Differential D154147

[mlir][gpu] Add the Select Object compilation attribute.
Closed, Public

Authored by fmorac on Jun 29 2023, 1:37 PM.

Details

Reviewers

aaron.ballman
bondhugula
ThomasRaoux
mehdi_amini
nicolasvasilache
herhut
ftynse
dcaballe

Commits

rG8ae074b19597: [mlir][gpu] Add the Select Object compilation attribute.

Summary

For an explanation of these patches see D154153.

Commit message:
This patch adds the default offloading handler for GPU binary ops, #gpu.select_object.
It selects the object to embed based on an index or a target attribute, embeds
that object as a global string, and launches the kernel using the scheme used in the
GPU to LLVM pass.
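
The selection rule described in the commit message can be sketched in standalone C++. This is only a rough model under stated assumptions, not the code in the patch: `GpuObject`, `Selector`, and `selectObject` are hypothetical names, targets are reduced to plain strings, and the sketch stops at the first matching target.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a `#gpu.object`: a target name plus its blob.
struct GpuObject {
  std::string target;
  std::string blob;
};

// Hypothetical stand-in for the optional `target` parameter of
// `#gpu.select_object`: absent, an index, or a target name.
using Selector = std::variant<std::monostate, long long, std::string>;

// Returns the position of the selected object, or std::nullopt when the
// request cannot be satisfied (negative or out-of-range index, unknown target).
std::optional<std::size_t> selectObject(const std::vector<GpuObject> &objects,
                                        const Selector &selector) {
  long long index = -1;
  if (std::holds_alternative<std::monostate>(selector)) {
    index = 0; // No target given: select the first object.
  } else if (const auto *requested = std::get_if<long long>(&selector)) {
    index = *requested; // A number is used directly as the index.
  } else {
    // Otherwise compare the requested target against every object.
    const std::string &wanted = std::get<std::string>(selector);
    for (std::size_t i = 0; i < objects.size(); ++i) {
      if (objects[i].target == wanted) {
        index = static_cast<long long>(i);
        break; // Assumption of this sketch: the first match wins.
      }
    }
  }
  if (index < 0 || index >= static_cast<long long>(objects.size()))
    return std::nullopt;
  return static_cast<std::size_t>(index);
}
```

With two objects targeting "gfx90a" and "gfx940", an empty selector picks position 0, while `Selector{1LL}` and `Selector{std::string("gfx940")}` both pick position 1.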

Depends on D154137

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fmorac created this revision. (Jun 29 2023, 1:37 PM)

Herald added a reviewer: aaron.ballman. (Jun 29 2023, 1:37 PM)

Herald added a reviewer: bondhugula.

Herald added a reviewer: ThomasRaoux.

Herald added a project: Restricted Project.

Herald added subscribers: bviyer, Moerafaat, zero9178 and 22 others.

fmorac added a child revision: D154149: [mlir][gpu] Add the `gpu-module-to-binary` pass. (Jun 29 2023, 1:53 PM)

Harbormaster completed remote builds in B242227: Diff 535981. (Jun 29 2023, 3:23 PM)

Rebasing.

fmorac edited the summary of this revision. (Jun 29 2023, 6:19 PM)

fmorac added a reviewer: mehdi_amini.

fmorac mentioned this in D154153: [mlir][gpu] Update GPU translation to accept binaries. (Jun 29 2023, 7:03 PM)

Harbormaster completed remote builds in B242302: Diff 536076. (Jun 29 2023, 9:56 PM)

fmorac published this revision for review. (Jun 30 2023, 6:30 AM)

Herald added a reviewer: nicolasvasilache. (Jun 30 2023, 6:30 AM)

Herald added a reviewer: herhut.

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Could this hypothetically be used to do runtime selection - that is, if I'll have a gfx90a binary and a gfx940 binary, could I write an object selector that'll query which GPU I have at execution time and select the relevant binary? Or is that out of scope here?

In D154147#4465163, @krzysz00 wrote:

> Could this hypothetically be used to do runtime selection - that is, if I'll have a gfx90a binary and a gfx940 binary, could I write an object selector that'll query which GPU I have at execution time and select the relevant binary? Or is that out of scope here?

With this particular attribute, no.

However, if you add the relevant runtime functions and create your own ObjectManager attribute (see D154108 for the interface), you could definitely do it. It would look something like:

gpu.binary @myobject <#mydialect.runtime_select_object> [object0, object1]

I even think it would be possible to have AMD and NVIDIA targets all packed in a single IR, and perform dispatching based on the GPU at execution time. The ObjectManager attribute interface leaves the room open for all of this.

> Could this hypothetically be used to do runtime selection - that is, if I'll have a gfx90a binary and a gfx940 binary, could I write an object selector that'll query which GPU I have at execution time and select the relevant binary? Or is that out of scope here?

That would make the compiler all but stateless, which is something I would be strongly against.

In D154147#4465656, @mehdi_amini wrote:

> > Could this hypothetically be used to do runtime selection - that is, if I'll have a gfx90a binary and a gfx940 binary, could I write an object selector that'll query which GPU I have at execution time and select the relevant binary? Or is that out of scope here?
>
> That would make the compiler all but stateless, which is something I would be strongly against.

I think we were talking about something along the lines of fat binaries. Like implementing a fatbin Object Manager attribute, something like:

gpu.binary @name <#gpu.fatbin> [#gpu.object<#gpu.amdgpu<chip = "gfx90a">, "BLOB">, #gpu.object<#gpu.amdgpu<chip = "gfx940">, "BLOB">, #gpu.object<#gpu.nvptx<chip = "sm_80">, "BLOB">]

That when translated down to LLVM IR embeds all the objects together in a single image. And instead of calling mgpuModuleLoad we have something like mgpuFatbinLoad and mgpuFatbinGetFunction loading the correct function at execution time depending on the detected GPU.

The above scheme is possible with Object Manager attributes.
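
To make the fat-binary idea concrete, here is a toy model in standalone C++ of dispatching on the GPU detected at execution time. Everything in it is an assumption made for illustration: `DeviceInfo`, `FatBinary`, and `load` are hypothetical names, and `mgpuFatbinLoad` is only the name floated in this thread, not an existing runtime function.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical description of the GPU found at execution time.
struct DeviceInfo {
  std::string vendor; // e.g. "amdgpu" or "nvptx"
  std::string chip;   // e.g. "gfx90a" or "sm_80"
};

// Hypothetical fat binary: every embedded object is keyed by (vendor, chip).
class FatBinary {
public:
  void addObject(const std::string &vendor, const std::string &chip,
                 std::string blob) {
    objects[std::make_pair(vendor, chip)] = std::move(blob);
  }

  // Plays the role suggested for a hypothetical `mgpuFatbinLoad`: return the
  // object matching the detected device, or nothing if none was embedded.
  std::optional<std::string> load(const DeviceInfo &device) const {
    auto it = objects.find(std::make_pair(device.vendor, device.chip));
    if (it == objects.end())
      return std::nullopt;
    return it->second;
  }

private:
  std::map<std::pair<std::string, std::string>, std::string> objects;
};
```

The compiler's output stays the same regardless of the machine it runs on; only the lookup in `load` depends on the device present when the program executes.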

Yeah, I was thinking something more like fat binaries here.

Looks reasonable to me

Rebasing.

Harbormaster completed remote builds in B247413: Diff 543193. (Jul 22 2023, 9:21 AM)

mehdi_amini added inline comments. (Jul 24 2023, 12:27 AM)

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1122	It's not clear to me when it makes sense to have multiple gpu.object when there is a select?

fmorac mentioned this in D154132: [mlir][gpu] Add `gpu.binary` op and `#gpu.object` attribute. (Jul 24 2023, 5:26 AM)

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1122	The `SelectObjectManager` is just the default manager, but you can have a `FatBinManager`, for which it makes sense to have multiple objects. `select_object` is designed this way to provide a method similar to `gpu-binary-annotation` in the existing implementation of `gpu-to-llvm`, where you can choose the binary: gpu.module @kernel_module attributes { nvvm.cubin = "CUBIN", rocdl.hsaco = "HSACO" }

mehdi_amini added inline comments. (Jul 24 2023, 11:05 PM)

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1122	How would the select be used? The gpu-binary-annotation is an input to a transformation and not an IR construct.

fmorac added inline comments. (Jul 25 2023, 5:15 PM)

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1122	I thought of two use cases: (1) a pass that lets you specify the target and changes the selected object; (2) the option of manually editing the selected object without heavy edits to the file (like removing multiple objects).

More nitpicking than actual issues - this translation seems reasonable

mlir/lib/Dialect/GPU/Targets/ObjectHandler.cpp
103	It feels weird to be declaring things into the `llvm` namespace from `mlir/`. Maybe this'd make sense in `mlir::LLVM` with a `using`? I might also be off about how this translation code works

fmorac added inline comments. (Aug 2 2023, 10:32 AM)

mlir/lib/Dialect/GPU/Targets/ObjectHandler.cpp
103	The issue is conflicting names between the namespaces, so I cannot have both `using namespace mlir` and `using namespace llvm`. Inside the translation code it made more sense to use `llvm` to fix this, as the code is mostly LLVM API.
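
The name clash described in this reply can be reproduced in a few lines of standalone C++. The sketch below uses hypothetical stand-in namespaces (`mlir_like`, `llvm_like`) that each define a `Value` type, as `mlir` and `llvm` do; it illustrates the lookup rule only and is not code from the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the two libraries; both define `Value`.
namespace mlir_like {
struct Value {
  std::string kind() const { return "mlir"; }
};
} // namespace mlir_like

namespace llvm_like {
struct Value {
  std::string kind() const { return "llvm"; }
};
} // namespace llvm_like

using namespace mlir_like;
// Also writing `using namespace llvm_like;` here would make every
// unqualified use of `Value` at file scope ambiguous.

namespace llvm_like {
// Inside this namespace, unqualified lookup finds llvm_like::Value first,
// so only the other library's type has to be spelled out in full.
std::string describe(const Value &llvmValue, const mlir_like::Value &mlirValue) {
  return llvmValue.kind() + "+" + mlirValue.kind();
}
} // namespace llvm_like
```

Calling `llvm_like::describe(llvm_like::Value{}, mlir_like::Value{})` yields "llvm+mlir", which shows that the unqualified `Value` inside the namespace resolved to the LLVM-side type.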

Moved the SelectObject implementation to the GPU translation library & added registration calls.

TODO:
The registration mechanism needs to be changed in the future, as the IR only verifies successfully if the registration call happens before verifying the IR. This issue also occurs with the NVVM & ROCDL registration mechanisms.
One solution is adding a promised flag in Tablegen that generates a separate trait without the interface methods and checks that a promise was made. The promised interface mechanism also needs finer granularity, so that promises can be made for specific attributes.
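
The ordering problem in this TODO can be shown with a deliberately simplified model in standalone C++. It is not MLIR's actual registration or promised-interface machinery: `InterfaceRegistry`, `registerImpl`, and `verifyBinaryOp` are hypothetical names chosen for the sketch.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry mapping an attribute name to the implementation of
// its translation interface.
class InterfaceRegistry {
public:
  using EmbedFn = std::function<bool(const std::string &)>;

  void registerImpl(const std::string &attrName, EmbedFn fn) {
    impls[attrName] = std::move(fn);
  }

  bool hasImpl(const std::string &attrName) const {
    return impls.count(attrName) != 0;
  }

private:
  std::map<std::string, EmbedFn> impls;
};

// Hypothetical verifier: the op is accepted only if the implementation for
// its manager attribute has already been registered.
bool verifyBinaryOp(const InterfaceRegistry &registry,
                    const std::string &managerAttr) {
  return registry.hasImpl(managerAttr);
}
```

In this model the same IR fails verification when checked before `registerImpl` is called and passes afterwards, which is the order dependence the TODO wants to remove.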

Herald added a reviewer: ftynse. (Aug 7 2023, 5:42 AM)

Herald added a reviewer: dcaballe.

Herald added subscribers: gysit, Dinistro, awarzynski.

Harbormaster completed remote builds in B250754: Diff 547744. (Aug 7 2023, 7:29 AM)

mehdi_amini added inline comments. (Aug 8 2023, 10:17 PM)

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
160	Isn't it an Attribute Interface? I'm confused what it means on the dialect here? Does it provide some fallback or something?
1609	cast?

fmorac added inline comments. (Aug 9 2023, 7:46 AM)

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
160	Yes, it's an attribute interface. However, to avoid bundling LLVM libs into the GPUDialect lib and to be able to put the implementation of the translation in the GPU Translation To LLVM lib, I have to register the interface as a promised interface. As far as I could tell, the promised interface mechanism doesn't offer the granularity to say that something is a promised interface for a specific attribute; it just promises an interface. That's why I listed it in the TODO in my previous comment: The registration mechanism needs to be changed in the future, as the IR only verifies successfully if the registration call happens before verifying the IR. This issue also occurs with the NVVM & ROCDL registration mechanisms. One solution is adding a promised flag in Tablegen that generates a separate trait without the interface methods and checks that a promise was made. The promised interface mechanism also needs finer granularity, so that promises can be made for specific attributes.

mehdi_amini added inline comments. (Aug 9 2023, 9:52 PM)

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
160	I don't understand what it means to promise this interface here actually? What happens if we just remove this line without changing anything else?

Remove the promised interface, instead use OffloadingTranslationAttrTrait in SelectObjectAttr
and register the interface automatically with the GPUTranslationInterface.

Harbormaster completed remote builds in B251913: Diff 549332. (Aug 11 2023, 6:06 AM)

mehdi_amini accepted this revision. (Aug 11 2023, 11:09 AM)

This revision is now accepted and ready to land. (Aug 11 2023, 11:09 AM)

Closed by commit rG8ae074b19597: [mlir][gpu] Add the Select Object compilation attribute. (authored by fmorac). (Aug 11 2023, 3:00 PM)

This revision was automatically updated to reflect the committed changes.

fmorac added a commit: rG8ae074b19597: [mlir][gpu] Add the Select Object compilation attribute.

fmorac mentioned this in rGb43068e8707d: [mlir][gpu] Update GPU translation to accept binaries. (Aug 11 2023, 5:29 PM)

Revision Contents

Path                                                      Size
mlir/include/mlir/Dialect/GPU/IR/GPUCompilationAttr.td    27 lines
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td                9 lines
mlir/lib/Dialect/GPU/CMakeLists.txt                       1 line
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp                    17 lines
mlir/lib/Dialect/GPU/Targets/ObjectHandler.cpp            350 lines
mlir/test/Dialect/GPU/invalid.mlir                        14 lines

Diff 543193

mlir/include/mlir/Dialect/GPU/IR/GPUCompilationAttr.td

@@ ... @@ bool $cppClass::getUnsafeMath() const {
       return hasFlag("unsafe_math");
     }
     bool $cppClass::getCorrectSqrt() const {
       return !hasFlag("unsafe_sqrt");
     }
   }];
 }
+
+//===----------------------------------------------------------------------===//
+// GPU object manager attributes.
+//===----------------------------------------------------------------------===//
+
+def GPU_SelectObjectAttr : GPU_Attr<"SelectObject", "select_object", [
+    DeclareAttrInterfaceMethods<GPUObjectManagerAttrInterface, [
+      "embedBinary"
+    ]>
+  ]> {
+  let description = [{
+    This GPU object manager selects a single GPU object for embedding. The
+    object is selected based on the `target` parameter, this parameter can be
+    either a number -i.e. selects the ith-target, or the target itself -i.e.
+    searches for the specified target in the object array.
+
+    If no target is given, it selects the first object in the array.
+  }];
+  let parameters = (ins
+    OptionalParameter<"Attribute", "Object target to embed.">:
+    $target
+  );
+  let assemblyFormat = [{
+    (`<` $target^ `>`)?
+  }];
+  let genVerifyDecl = 1;
+}
 
 #endif // GPU_COMPILATIONATTR

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td

@@ ... @@ let description = [{
     GPU binaries provide a semantic mechanism for storing GPU objects,
     e.g. the result of compiling a GPU module to an object file.
 
     This operation has 3 arguments:
      - The name of the binary.
      - An attribute implementing the `ObjectManagerAttrInterface` interface.
      - An array of GPU object attributes.
 
+    If no `objectManager` is present in the assembly format, then an implicit
+    `#gpu.select_object` attribute is added.
+
+    Examples:
+      1. GPU binary with implicit `#gpu.select_object` object manager attribute.
     ```
       gpu.binary @myobject [#gpu.object<...>, #gpu.object<...>]
     ```
+      2. GPU binary with an explicit `#gpu.select_object<1>` object manager attribute.
+    ```
+      gpu.binary @myobject #gpu.select_object<1> [#gpu.object<...>, #gpu.object<...>]
+    ```
   }];
   let builders = [
     OpBuilder<(ins "StringRef":$name, "Attribute":$objectManager,
                    "ArrayAttr":$objects)>
   ];
   let skipDefaultBuilders = 1;
   let assemblyFormat = [{
     $sym_name custom<ObjectManager>($objectManager) attr-dict-with-keyword $objects

mlir/lib/Dialect/GPU/CMakeLists.txt

@@ ... @@ add_mlir_dialect_library(MLIRGPUDialect
 
   PRIVATE
   MLIRGPUTargets
   )
 
 add_mlir_dialect_library(MLIRGPUTargets
   Targets/AMDGPUTarget.cpp
   Targets/NVPTXTarget.cpp
+  Targets/ObjectHandler.cpp
 
   ADDITIONAL_HEADER_DIRS
   ${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/GPU
 
   LINK_COMPONENTS
   Core
   MC
   Target

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp

@@ ... @@
 #define GET_OP_LIST
 #include "mlir/Dialect/GPU/IR/GPUOps.cpp.inc"
       >();
   addAttributes<
 #define GET_ATTRDEF_LIST
 #include "mlir/Dialect/GPU/IR/GPUOpsAttributes.cpp.inc"
       >();
   addInterfaces<GPUInlinerInterface>();
 }
 
 static std::string getSparseHandleKeyword(SparseHandleKind kind) {
   switch (kind) {
   case SparseHandleKind::DnTensor:
     return "sparse.dntensor_handle";
   case SparseHandleKind::SpMat:
     return "sparse.spmat_handle";
   }
@@ ... @@
 }
 
 void BinaryOp::build(OpBuilder &builder, OperationState &result, StringRef name,
                      Attribute manager, ArrayAttr objects) {
   auto &properties = result.getOrAddProperties<Properties>();
   result.attributes.push_back(builder.getNamedAttr(
       SymbolTable::getSymbolAttrName(), builder.getStringAttr(name)));
   properties.objects = objects;
+  if (manager)
     properties.objectManager = manager;
+  else
+    properties.objectManager = builder.getAttr<SelectObjectAttr>(nullptr);
 }
 
 static ParseResult parseObjectManager(OpAsmParser &parser,
                                       Attribute &objectManager) {
+  if (succeeded(parser.parseOptionalLess())) {
     if (parser.parseAttribute(objectManager))
       return failure();
+    if (parser.parseGreater())
+      return failure();
+  }
+  if (!objectManager)
+    objectManager = parser.getBuilder().getAttr<SelectObjectAttr>(nullptr);
   return success();
 }
 
 static void printObjectManager(OpAsmPrinter &printer, Operation *op,
                                Attribute objectManager) {
-  if (objectManager)
+  if (objectManager != SelectObjectAttr::get(op->getContext(), nullptr))
     printer << '<' << objectManager << '>';
 }
 
 //===----------------------------------------------------------------------===//
 // GPUMemcpyOp
 //===----------------------------------------------------------------------===//
 
 LogicalResult MemcpyOp::verify() {

mlir/lib/Dialect/GPU/Targets/ObjectHandler.cpp

This file was added.

				//===- ObjectHandler.cpp - Implements base ObjectManager attributes -------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements base ObjectManager attributes, like the default
				// SelectObject attribute.
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Dialect/GPU/IR/GPUDialect.h"

				#include "mlir/Target/LLVMIR/Export.h"
				#include "mlir/Target/LLVMIR/ModuleTranslation.h"

				#include "llvm/IR/Constants.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/LLVMContext.h"
				#include "llvm/IR/Module.h"
				#include "llvm/Support/FormatVariadic.h"

				using namespace mlir;

				namespace {
				std::string getBinaryIdentifier(StringRef moduleName) {
				return moduleName.str() + "_bin_cst";
				}
				} // namespace

				LogicalResult
				gpu::SelectObjectAttr::verify(function_ref<InFlightDiagnostic()> emitError,
				Attribute target) {
				// Check `target`, it can be null, an integer attr or an attr implementing
				// `TargetAttrInterface`.
				if (target) {
				if (auto intAttr = mlir::dyn_cast<IntegerAttr>(target)) {
				if (intAttr.getInt() < 0) {
				return emitError() << "The object index must be positive.";
				}
				} else if (!target.hasTrait<TargetAttrInterface::Trait>()) {
				return emitError()
				<< "The target attribute must implement the `TargetAttrInterface` "
				"interface or be an `IntegerAttr`.";
				}
				}
				return success();
				}

				LogicalResult gpu::SelectObjectAttr::embedBinary(
				Operation *op, llvm::IRBuilderBase &builder,
				LLVM::ModuleTranslation &moduleTranslation) const {

				auto binaryOp = mlir::dyn_cast<BinaryOp>(op);
				assert(binaryOp && "Op is not a BinaryOp.");

				ArrayRef<Attribute> objects = binaryOp.getObjectsAttr().getValue();

				// Obtain the index of the object to select.
				int64_t index = -1;
				if (Attribute target = getTarget()) {
				// If the target attribute is a number it is the index. Otherwise compare
				// the attribute to every target inside the object array to find the index.
				if (auto indexAttr = mlir::dyn_cast<IntegerAttr>(target)) {
				index = indexAttr.getInt();
				} else {
				for (auto [i, attr] : llvm::enumerate(objects)) {
				auto obj = mlir::dyn_cast<ObjectAttr>(attr);
				if (obj.getTarget() == target) {
				index = i;
				}
				}
				}
				} else {
				// If the target attribute is null then it's selecting the first object in
				// the object array.
				index = 0;
				}

				if (index < 0 || index >= static_cast<int64_t>(objects.size())) {
				op->emitError("The requested target object couldn't be found.");
				return failure();
				}
				ObjectAttr object = mlir::dyn_cast<ObjectAttr>(objects[index]);

				llvm::Module *module = moduleTranslation.getLLVMModule();

				// Embed the object as a global string.
				llvm::Constant *binary = llvm::ConstantDataArray::getString(
				builder.getContext(), object.getObject().getValue(), false);
				llvm::GlobalVariable *serializedObj =
				new llvm::GlobalVariable(*module, binary->getType(), true,
				llvm::GlobalValue::LinkageTypes::InternalLinkage,
				binary, getBinaryIdentifier(binaryOp.getName()));
				serializedObj->setLinkage(llvm::GlobalValue::LinkageTypes::InternalLinkage);
				serializedObj->setAlignment(llvm::MaybeAlign(8));
				serializedObj->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::None);
				return success();
				}

				namespace llvm {
				namespace {
				class LaunchKernel {
				public:
				LaunchKernel(Module &module, IRBuilderBase &builder,
				mlir::LLVM::ModuleTranslation &moduleTranslation);
				// Get the kernel launch callee.
				FunctionCallee getKernelLaunchFn();

				// Get the module function callee.
				FunctionCallee getModuleFunctionFn();

				// Get the module load callee.
				FunctionCallee getModuleLoadFn();

				// Get the module unload callee.
				FunctionCallee getModuleUnloadFn();

				// Get the stream create callee.
				FunctionCallee getStreamCreateFn();

				// Get the stream destroy callee.
				FunctionCallee getStreamDestroyFn();

				// Get the stream sync callee.
				FunctionCallee getStreamSyncFn();

				// Get or create the function name global string.
				Value *getOrCreateFunctionName(StringRef moduleName, StringRef kernelName);

				// Create the void* kernel array for passing the arguments.
				Value *createKernelArgArray(mlir::gpu::LaunchFuncOp op);

				// Create the full kernel launch.
				mlir::LogicalResult createKernelLaunch(mlir::gpu::LaunchFuncOp op);

				private:
				Module &module;
				IRBuilderBase &builder;
				mlir::LLVM::ModuleTranslation &moduleTranslation;
				Type *i32Ty{};
				Type *voidTy{};
				Type *intPtrTy{};
				PointerType *ptrTy{};
				};
				} // namespace
				} // namespace llvm

				llvm::LaunchKernel::LaunchKernel(
				Module &module, IRBuilderBase &builder,
				mlir::LLVM::ModuleTranslation &moduleTranslation)
				: module(module), builder(builder), moduleTranslation(moduleTranslation) {
				i32Ty = builder.getInt32Ty();
				ptrTy = builder.getPtrTy(0);
				voidTy = builder.getVoidTy();
				intPtrTy = builder.getIntPtrTy(module.getDataLayout());
				}

				llvm::FunctionCallee llvm::LaunchKernel::getKernelLaunchFn() {
				return module.getOrInsertFunction(
				"mgpuLaunchKernel",
				FunctionType::get(
				voidTy,
				ArrayRef<Type *>({ptrTy, intPtrTy, intPtrTy, intPtrTy, intPtrTy,
				intPtrTy, intPtrTy, i32Ty, ptrTy, ptrTy, ptrTy}),
				false));
				}

				llvm::FunctionCallee llvm::LaunchKernel::getModuleFunctionFn() {
				return module.getOrInsertFunction(
				"mgpuModuleGetFunction",
				FunctionType::get(ptrTy, ArrayRef<Type *>({ptrTy, ptrTy}), false));
				}

				llvm::FunctionCallee llvm::LaunchKernel::getModuleLoadFn() {
				return module.getOrInsertFunction(
				"mgpuModuleLoad",
				FunctionType::get(ptrTy, ArrayRef<Type *>({ptrTy}), false));
				}

				llvm::FunctionCallee llvm::LaunchKernel::getModuleUnloadFn() {
				return module.getOrInsertFunction(
				"mgpuModuleUnload",
				FunctionType::get(voidTy, ArrayRef<Type *>({ptrTy}), false));
				}

				llvm::FunctionCallee llvm::LaunchKernel::getStreamCreateFn() {
				return module.getOrInsertFunction("mgpuStreamCreate",
				FunctionType::get(ptrTy, false));
				}

				llvm::FunctionCallee llvm::LaunchKernel::getStreamDestroyFn() {
				return module.getOrInsertFunction(
				"mgpuStreamDestroy",
				FunctionType::get(voidTy, ArrayRef<Type *>({ptrTy}), false));
				}

				llvm::FunctionCallee llvm::LaunchKernel::getStreamSyncFn() {
				return module.getOrInsertFunction(
				"mgpuStreamSynchronize",
				FunctionType::get(voidTy, ArrayRef<Type *>({ptrTy}), false));
				}

				// Generates an LLVM IR dialect global that contains the name of the given
				// kernel function as a C string, and returns a pointer to its beginning.
				llvm::Value *llvm::LaunchKernel::getOrCreateFunctionName(StringRef moduleName,
				StringRef kernelName) {
				std::string globalName =
				std::string(formatv("{0}_{1}_kernel_name", moduleName, kernelName));

				if (GlobalVariable *gv = module.getGlobalVariable(globalName))
				return gv;

				return builder.CreateGlobalString(kernelName, globalName);
				}

				// Creates a struct containing all kernel parameters on the stack and returns
				// an array of type-erased pointers to the fields of the struct. The array can
				// then be passed to the CUDA / ROCm (HIP) kernel launch calls.
				// The generated code is essentially as follows:
				//
				// %struct = alloca(sizeof(struct { Parameters... }))
				// %array = alloca(NumParameters * sizeof(void *))
				// for (i : [0, NumParameters))
				// %fieldPtr = llvm.getelementptr %struct[0, i]
				// llvm.store parameters[i], %fieldPtr
				// %elementPtr = llvm.getelementptr %array[i]
				// llvm.store %fieldPtr, %elementPtr
				// return %array
				llvm::Value *
				llvm::LaunchKernel::createKernelArgArray(mlir::gpu::LaunchFuncOp op) {
				SmallVector<Value *> args =
				moduleTranslation.lookupValues(op.getKernelOperands());
				SmallVector<Type *> structTypes(args.size(), nullptr);

				for (auto [i, arg] : llvm::enumerate(args))
				structTypes[i] = arg->getType();

				Type *structTy = StructType::create(module.getContext(), structTypes);
				Value *argStruct = builder.CreateAlloca(structTy, 0u);
				Value *argArray = builder.CreateAlloca(
				ptrTy, ConstantInt::get(intPtrTy, structTypes.size()));

				for (auto [i, arg] : enumerate(args)) {
				Value *structMember = builder.CreateStructGEP(structTy, argStruct, i);
				builder.CreateStore(arg, structMember);
				Value *arrayMember = builder.CreateConstGEP1_32(ptrTy, argArray, i);
				builder.CreateStore(structMember, arrayMember);
				}
				return argArray;
				}

				// Emits LLVM IR to launch a kernel function. Expects the module that contains
				// the compiled kernel function as a cubin in the 'nvvm.cubin' attribute, or a
				// hsaco in the 'rocdl.hsaco' attribute of the kernel function in the IR.
				//
				// %0 = call %binarygetter
				// %1 = call %moduleLoad(%0)
				// %2 = <see generateKernelNameConstant>
				// %3 = call %moduleGetFunction(%1, %2)
				// %4 = call %streamCreate()
				// %5 = <see generateParamsArray>
				// call %launchKernel(%3, <launchOp operands 0..5>, 0, %4, %5, nullptr)
				// call %streamSynchronize(%4)
				// call %streamDestroy(%4)
				// call %moduleUnload(%1)
				mlir::LogicalResult
				llvm::LaunchKernel::createKernelLaunch(mlir::gpu::LaunchFuncOp op) {
				auto llvmValue = [&](mlir::Value value) -> Value * {
				Value *v = moduleTranslation.lookupValue(value);
				assert(v && "Value has not been translated.");
				return v;
				};

				// Get grid dimensions.
				mlir::gpu::KernelDim3 grid = op.getGridSizeOperandValues();
				Value *gx = llvmValue(grid.x), *gy = llvmValue(grid.y),
				*gz = llvmValue(grid.z);

				// Get block dimensions.
				mlir::gpu::KernelDim3 block = op.getBlockSizeOperandValues();
				Value *bx = llvmValue(block.x), *by = llvmValue(block.y),
				*bz = llvmValue(block.z);

				// Get dynamic shared memory size.
				Value *dynamicMemorySize = nullptr;
				if (mlir::Value dynSz = op.getDynamicSharedMemorySize())
				dynamicMemorySize = llvmValue(dynSz);
				else
				dynamicMemorySize = ConstantInt::get(i32Ty, 0);

				// Create the argument array.
				Value *argArray = createKernelArgArray(op);

				// Load the kernel module.
				StringRef moduleName = op.getKernelModuleName().getValue();
				std::string binaryIdentifier = getBinaryIdentifier(moduleName);
				Value *binary = module.getGlobalVariable(binaryIdentifier, true);
				if (!binary)
				return op.emitError() << "Couldn't find the binary: " << binaryIdentifier;
				Value *moduleObject = builder.CreateCall(getModuleLoadFn(), {binary});

				// Load the kernel function.
				Value *moduleFunction = builder.CreateCall(
				getModuleFunctionFn(),
				{moduleObject,
				getOrCreateFunctionName(moduleName, op.getKernelName().getValue())});

				// Get the stream to use for execution. If there's no async object then create
				// a stream to make a synchronous kernel launch.
				Value *stream = nullptr;
				bool handleStream = false;
				if (mlir::Value asyncObject = op.getAsyncObject()) {
				stream = llvmValue(asyncObject);
				} else {
				handleStream = true;
				stream = builder.CreateCall(getStreamCreateFn(), {});
				}

				// Create the launch call.
				Value *nullPtr = ConstantPointerNull::get(ptrTy);
				builder.CreateCall(
				getKernelLaunchFn(),
				ArrayRef<Value *>({moduleFunction, gx, gy, gz, bx, by, bz,
				dynamicMemorySize, stream, argArray, nullPtr}));

				// Sync & destroy the stream, for synchronous launches.
				if (handleStream) {
				builder.CreateCall(getStreamSyncFn(), {stream});
				builder.CreateCall(getStreamDestroyFn(), {stream});
				}

				// Unload the kernel module.
				builder.CreateCall(getModuleUnloadFn(), {moduleObject});

				return success();
				}
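
Note on the call order emitted above: it can be traced with a small mock. The sketch below assumes a hypothetical `MockRuntime` whose methods only record their names. The method names follow the comment block that precedes `createKernelLaunch`, not the real runtime wrapper symbols. It illustrates that the stream is created, synchronized, and destroyed only when the launch op carries no async object.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the runtime wrapper functions; each method only
// records its name so the resulting call order can be inspected.
struct MockRuntime {
  std::vector<std::string> calls;
  int moduleLoad() { calls.push_back("moduleLoad"); return 1; }
  int moduleGetFunction(int) { calls.push_back("moduleGetFunction"); return 2; }
  int streamCreate() { calls.push_back("streamCreate"); return 3; }
  void launchKernel(int, int) { calls.push_back("launchKernel"); }
  void streamSynchronize(int) { calls.push_back("streamSynchronize"); }
  void streamDestroy(int) { calls.push_back("streamDestroy"); }
  void moduleUnload(int) { calls.push_back("moduleUnload"); }
};

// Mirrors the launch scheme: when no async stream is supplied (nullptr), the
// launch owns a temporary stream and makes the call synchronous.
void launch(MockRuntime &rt, const int *asyncStream) {
  int mod = rt.moduleLoad();
  int function = rt.moduleGetFunction(mod);
  bool ownsStream = (asyncStream == nullptr);
  int stream = ownsStream ? rt.streamCreate() : *asyncStream;
  rt.launchKernel(function, stream);
  if (ownsStream) {
    rt.streamSynchronize(stream);
    rt.streamDestroy(stream);
  }
  rt.moduleUnload(mod);
}
```

Calling `launch(rt, nullptr)` records the full seven-call sequence, while passing a pointer to an existing stream handle skips the three stream-management calls.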

				LogicalResult gpu::SelectObjectAttr::launchKernel(
				Operation *launchFuncOperation, Operation *binaryOperation,
				llvm::IRBuilderBase &builder,
				LLVM::ModuleTranslation &moduleTranslation) const {
				llvm::Module *module = moduleTranslation.getLLVMModule();
				auto launchFuncOp = mlir::dyn_cast<LaunchFuncOp>(launchFuncOperation);
				assert(launchFuncOp && "Op is not a LaunchFuncOp.");
				return llvm::LaunchKernel(*module, builder, moduleTranslation)
				.createKernelLaunch(launchFuncOp);
				}

mlir/test/Dialect/GPU/invalid.mlir

	// -----

	module {
	// expected-error @+1 {{'gpu.module' op attribute 'targets' failed to satisfy constraint: Array of GPU target attributes with at least 1 elements}}
	gpu.module @gpu_funcs [1] {
	}
	}

				// -----

				module {
				// expected-error @+1 {{'gpu.binary' op attribute 'objects' failed to satisfy constraint: An array of GPU object attributes with at least 1 elements}}
				gpu.binary @binary []
				}

				// -----

				module {
				// expected-error @+1 {{gpu.binary' op attribute 'objectManager' failed to satisfy constraint: any attribute Attribute implementing the `ObjectManagerAttrInterface` interface.}}
				gpu.binary @binary <1> [#gpu.object<#gpu.nvptx, "">]
				}

Revision Contents

Diff 543193