This is an archive of the discontinued LLVM Phabricator instance.


Differential D154147

[mlir][gpu] Add the Select Object compilation attribute.
ClosedPublic

Authored by fmorac on Jun 29 2023, 1:37 PM.

Details

Reviewers

aaron.ballman
bondhugula
ThomasRaoux
mehdi_amini
nicolasvasilache
herhut
ftynse
dcaballe

Commits

rG8ae074b19597: [mlir][gpu] Add the Select Object compilation attribute.

Summary

For an explanation of these patches see D154153.

Commit message:
This patch adds the default offloading handler for GPU binary ops, #gpu.select_object.
It selects the object to embed based on an index or a target attribute, embeds
the object as a global string, and launches the kernel using the scheme used in the
GPU to LLVM pass.
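
The selection rule in the commit message can be sketched outside of MLIR as a small standalone C++ model. This is only an illustration under stated assumptions: `Object`, `Selector`, and `selectObject` are hypothetical names, targets are modelled as plain strings, and the fallback to the first object when no selector is given follows the attribute description in this revision.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a `#gpu.object<target, "BLOB">` attribute.
struct Object {
  std::string target;
  std::string blob;
};

// No selector, an index, or a target name (illustrative types only).
using Selector = std::variant<std::monostate, std::int64_t, std::string>;

// Returns the position of the object to embed, or nullopt if none matches.
std::optional<std::size_t> selectObject(const std::vector<Object> &objects,
                                        const Selector &selector) {
  std::int64_t index = -1;
  if (std::holds_alternative<std::monostate>(selector)) {
    // No selector: pick the first object.
    index = 0;
  } else if (const auto *number = std::get_if<std::int64_t>(&selector)) {
    // A number is used directly as the index.
    index = *number;
  } else {
    // Otherwise compare the requested target against every object.
    // The scan does not stop early, so a later duplicate wins in this sketch.
    const std::string &target = std::get<std::string>(selector);
    for (std::size_t i = 0; i < objects.size(); ++i)
      if (objects[i].target == target)
        index = static_cast<std::int64_t>(i);
  }
  if (index < 0 || index >= static_cast<std::int64_t>(objects.size()))
    return std::nullopt;
  return static_cast<std::size_t>(index);
}
```

An out-of-range index and an unknown target both end in the same "not found" result, which keeps the error path in one place.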

Depends on D154137

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fmorac created this revision. Jun 29 2023, 1:37 PM

Herald added a reviewer: aaron.ballman. Jun 29 2023, 1:37 PM

Herald added a reviewer: bondhugula.

Herald added a reviewer: ThomasRaoux.

Herald added a project: Restricted Project.

Herald added subscribers: bviyer, Moerafaat, zero9178 and 22 others.

fmorac added a child revision: D154149: [mlir][gpu] Add the `gpu-module-to-binary` pass. Jun 29 2023, 1:53 PM

Harbormaster completed remote builds in B242227: Diff 535981. Jun 29 2023, 3:23 PM

Rebasing.

fmorac edited the summary of this revision. Jun 29 2023, 6:19 PM

fmorac added a reviewer: mehdi_amini.

fmorac mentioned this in D154153: [mlir][gpu] Update GPU translation to accept binaries. Jun 29 2023, 7:03 PM

Harbormaster completed remote builds in B242302: Diff 536076. Jun 29 2023, 9:56 PM

fmorac published this revision for review. Jun 30 2023, 6:30 AM

Herald added a reviewer: nicolasvasilache. Jun 30 2023, 6:30 AM

Herald added a reviewer: herhut.

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Could this hypothetically be used to do runtime selection - that is, if I'll have a gfx90a binary and a gfx940 binary, could I write an object selector that'll query which GPU I have at execution time and select the relevant binary? Or is that out of scope here?

In D154147#4465163, @krzysz00 wrote:

Could this hypothetically be used to do runtime selection - that is, if I'll have a gfx90a binary and a gfx940 binary, could I write an object selector that'll query which GPU I have at execution time and select the relevant binary? Or is that out of scope here?

With this particular attribute, no.

However, if you add the relevant runtime functions and create your own ObjectManager attribute (see D154108 for the interface), you could definitely do it. It would look something like:

gpu.binary @myobject <#mydialect.runtime_select_object> [object0, object1]

I even think it would be possible to have AMD and NVIDIA targets all packed in a single IR, and perform dispatching based on the GPU at execution time. The ObjectManager attribute interface leaves the room open for all of this.

Could this hypothetically be used to do runtime selection - that is, if I'll have a gfx90a binary and a gfx940 binary, could I write an object selector that'll query which GPU I have at execution time and select the relevant binary? Or is that out of scope here?

That would make the compiler all but stateless, which is something I would be strongly against.

In D154147#4465656, @mehdi_amini wrote:

Could this hypothetically be used to do runtime selection - that is, if I'll have a gfx90a binary and a gfx940 binary, could I write an object selector that'll query which GPU I have at execution time and select the relevant binary? Or is that out of scope here?

That would make the compiler all but stateless, which is something I would be strongly against.

I think we were talking about something along the lines of fat binaries. Like implementing a fatbin Object Manager attribute, something like:

gpu.binary @name <#gpu.fatbin> [#gpu.object<#gpu.amdgpu<chip = "gfx90a">, "BLOB">, #gpu.object<#gpu.amdgpu<chip = "gfx940">, "BLOB">, #gpu.object<#gpu.nvptx<chip = "sm_80">, "BLOB">]

When translated down to LLVM IR, this would embed all the objects together in a single image. Instead of calling mgpuModuleLoad, we would have something like mgpuFatbinLoad and mgpuFatbinGetFunction, which load the correct function at execution time depending on the detected GPU.

The above scheme is possible with Object Manager attributes.
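
The execution-time lookup described in this comment could look roughly like the following standalone C++ sketch. Everything here is an assumption made for illustration: `EmbeddedObject` and `findObjectForChip` are hypothetical names, the detected chip is passed in by the caller, and no GPU runtime is called. The `mgpuFatbinLoad` and `mgpuFatbinGetFunction` names from the comment are proposals in this thread, so they are not modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one object packed into a combined image.
struct EmbeddedObject {
  std::string chip; // e.g. "gfx90a", "gfx940", "sm_80"
  std::string blob; // serialized device code
};

// Returns the first object built for the detected chip, or nullptr if the
// image carries no object for that chip.
const EmbeddedObject *
findObjectForChip(const std::vector<EmbeddedObject> &image,
                  const std::string &detectedChip) {
  for (const EmbeddedObject &object : image)
    if (object.chip == detectedChip)
      return &object;
  return nullptr;
}
```

A caller would query the device once at startup and pass the result as `detectedChip`; a null result means none of the embedded binaries fits the device.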

Yeah, I was thinking something more like fat binaries here.

Looks reasonable to me

Rebasing.

Harbormaster completed remote builds in B247413: Diff 543193. Jul 22 2023, 9:21 AM

mehdi_amini added inline comments. Jul 24 2023, 12:27 AM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
Line 1112: It's not clear to me when it makes sense to have multiple gpu.object when there is a select?

fmorac mentioned this in D154132: [mlir][gpu] Add `gpu.binary` op and `#gpu.object` attribute. Jul 24 2023, 5:26 AM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
Line 1112: The `SelectObjectManager` is just the default manager, but you can have a `FatBinManager`, for which it makes sense to have multiple objects. `select_object` is the way it is in order to provide a method similar to `gpu-binary-annotation` in the existing implementation of `gpu-to-llvm`, where you can choose the binary: gpu.module @kernel_module attributes { nvvm.cubin = "CUBIN", rocdl.hsaco = "HSACO" }

mehdi_amini added inline comments. Jul 24 2023, 11:05 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
Line 1112: How would the select be used? The gpu-binary-annotation is an input to a transformation and not an IR construct.

fmorac added inline comments. Jul 25 2023, 5:15 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
Line 1112: I thought of two use cases:
1. Having a pass that lets you specify the target and changes the selected object.
2. The option of manually editing the selected object without having to make heavy edits to a file (like removing multiple objects).

More nitpicking than actual issues - this translation seems reasonable

mlir/lib/Dialect/GPU/Targets/ObjectHandler.cpp
Line 103 (On Diff #543193): It feels weird to be declaring things in the `llvm` namespace from `mlir/`. Maybe this would make sense in `mlir::LLVM` with a `using`? I might also be off about how this translation code works.

fmorac added inline comments. Aug 2 2023, 10:32 AM

mlir/lib/Dialect/GPU/Targets/ObjectHandler.cpp
Line 103 (On Diff #543193): The issue is conflicting names between the namespaces, so I cannot have both `using mlir` and `using llvm`. Inside the translation it made more sense to use `llvm` to fix this issue, as the code is mostly LLVM API.

Moved the SelectObject implementation to the GPU translation library & added registration calls.

TODO:
The registration mechanism needs to be changed in the future, as the IR only verifies successfully if the registration call happens before the IR is verified. This issue also occurs with the NVVM & ROCDL registration mechanisms.
One solution is adding a promised flag in TableGen that generates a separate trait without the interface methods and checks that a promise was made. The promised interface mechanism also needs finer granularity, so that promises can be made for specific attributes.
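
The ordering problem in this TODO can be illustrated with a toy registry in standalone C++. This is not MLIR's API: `ToyContext`, `promise`, `attachInterface`, and `verify` are hypothetical names, and the per-attribute promise is the finer granularity the TODO asks for, not something the revision implements.

```cpp
#include <cassert>
#include <set>
#include <string>

enum class VerifyResult { Ok, PromisedButNotAttached, NotPromised };

// Toy model: an attribute verifies only once its interface is attached.
class ToyContext {
public:
  // Records that an implementation will be attached later for this attribute.
  void promise(const std::string &attrName) { promised.insert(attrName); }

  // Records that the implementation is now available.
  void attachInterface(const std::string &attrName) {
    attached.insert(attrName);
  }

  // Verification depends on what has been registered so far.
  VerifyResult verify(const std::string &attrName) const {
    if (attached.count(attrName) != 0)
      return VerifyResult::Ok;
    if (promised.count(attrName) != 0)
      return VerifyResult::PromisedButNotAttached;
    return VerifyResult::NotPromised;
  }

private:
  std::set<std::string> promised;
  std::set<std::string> attached;
};
```

Verifying before `attachInterface` runs fails even though nothing is wrong with the IR itself, which is the ordering dependency the TODO describes. Tracking promises per attribute lets the failure say which attachment is missing.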

Herald added a reviewer: ftynse. Aug 7 2023, 5:42 AM

Herald added a reviewer: dcaballe.

Herald added subscribers: gysit, Dinistro, awarzynski.

Harbormaster completed remote builds in B250754: Diff 547744. Aug 7 2023, 7:29 AM

mehdi_amini added inline comments. Aug 8 2023, 10:17 PM

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
Line 211: Isn't it an Attribute Interface? I'm confused what it means on the dialect here? Does it provide some fallback or something?
Line 1654: cast?

fmorac added inline comments. Aug 9 2023, 7:46 AM

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
Line 211: Yes, it's an attribute interface. However, to avoid bundling LLVM libs into the GPUDialect lib, and to be able to put the implementation of the translation in the GPU Translation To LLVM lib, I have to register the interface as a promised interface. As far as I could tell, the promised interface mechanism doesn't provide the granularity to say that something is a promised interface for a specific attribute; it just promises an interface. That's why I listed it in the TODO in my previous comment: The registration mechanism needs to be changed in the future, as the IR only verifies successfully if the registration call happens before verifying the IR. This issue also occurs with the NVVM & ROCDL registration mechanisms. One solution is adding a promised flag in Tablegen that generates a separate trait without the interface methods and checks a promise was made. Also, the promised interface mechanism needs to allow finer granularity, allowing to specify specific attribute promises.

mehdi_amini added inline comments. Aug 9 2023, 9:52 PM

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
Line 211: I don't understand what it means to promise this interface here actually? What happens if we just remove this line without changing anything else?

Remove the promised interface; instead, use OffloadingTranslationAttrTrait in SelectObjectAttr
and register the interface automatically with the GPUTranslationInterface.

Harbormaster completed remote builds in B251913: Diff 549332. Aug 11 2023, 6:06 AM

mehdi_amini accepted this revision. Aug 11 2023, 11:09 AM

This revision is now accepted and ready to land. Aug 11 2023, 11:09 AM

Closed by commit rG8ae074b19597: [mlir][gpu] Add the Select Object compilation attribute. (authored by fmorac). Aug 11 2023, 3:00 PM

This revision was automatically updated to reflect the committed changes.

fmorac added a commit: rG8ae074b19597: [mlir][gpu] Add the Select Object compilation attribute.

fmorac mentioned this in rGb43068e8707d: [mlir][gpu] Update GPU translation to accept binaries. Aug 11 2023, 5:29 PM

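Revision Contents

Path	Size
mlir/include/mlir/Dialect/GPU/IR/CompilationAttrs.td	23 lines
mlir/include/mlir/Dialect/GPU/IR/GPUDialect.h	4 lines
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td	6 lines
mlir/include/mlir/InitAllDialects.h	1 line
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp	49 lines
mlir/lib/Target/LLVMIR/Dialect/GPU/CMakeLists.txt	1 line
mlir/lib/Target/LLVMIR/Dialect/GPU/GPUToLLVMIRTranslation.cpp	1 line
mlir/lib/Target/LLVMIR/Dialect/GPU/SelectObjectAttr.cpp	370 lines
mlir/test/Dialect/GPU/invalid.mlir	2 lines
mlir/test/Dialect/GPU/ops.mlir	9 lines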

Diff 547744

mlir/include/mlir/Dialect/GPU/IR/CompilationAttrs.td

[33 lines not shown; context: def GPU_ObjectAttr : GPU_Attr<"Object", "object"> {]
 }];
 let parameters = (ins "TargetAttrInterface":$target, "StringAttr":$object);
 let assemblyFormat = [{`<` $target `,` $object `>`}];
 }

 def GPUObjectArrayAttr :
 TypedArrayAttrBase<GPU_ObjectAttr, "an array of GPU object attributes">;

+//===----------------------------------------------------------------------===//
+// GPU offloading LLVM translation handler attributes.
+//===----------------------------------------------------------------------===//
+
+def GPU_SelectObjectAttr : GPU_Attr<"SelectObject", "select_object"> {
+let description = [{
+This GPU offloading handler selects a single GPU object for embedding. The
+object is selected based on the `target` parameter, this parameter can be
+either a number -i.e. selects the ith-target, or the target itself -i.e.
+searches for the specified target in the object array.
+
+The first object in a `gpu.binary` operation is selected if no target is
+specified.
+}];
+let parameters = (ins
+OptionalParameter<"Attribute", "Target to select for embedding.">:$target
+);
+let assemblyFormat = [{
+(`<` $target^ `>`)?
+}];
+let genVerifyDecl = 1;
+}
+
 #endif // GPU_COMPILATION_ATTRS

mlir/include/mlir/Dialect/GPU/IR/GPUDialect.h

[174 lines not shown; context: public:]
 using Base =
 typename Type::TypeBase<SparseHandleType<K>, Type, TypeStorage>::Base;
 using Base::Base;
 };

 using SparseDnTensorHandleType = SparseHandleType<SparseHandleKind::DnTensor>;
 using SparseSpMatHandleType = SparseHandleType<SparseHandleKind::SpMat>;

+/// Registers offloading LLVM translation interfaces. TODO: Remove this
+/// function.
+void registerOffloadingLLVMTranslationInterfacesExternalModels(
+mlir::DialectRegistry &registry);
 } // namespace gpu
 } // namespace mlir

 #include "mlir/Dialect/GPU/IR/GPUOpsEnums.h.inc"

 #include "mlir/Dialect/GPU/IR/GPUOpsDialect.h.inc"

 #include "mlir/Dialect/GPU/IR/GPUOpInterfaces.h.inc"
[10 lines not shown]

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td

[1,094 lines not shown; context: let description = [{]

 During translation into LLVM, the offloading attribute will be called
 for translating GPU binary and launch operations into LLVM instructions. If
 no attribute is provided, the default handler selects the first object from
 the array and embeds it as a string.

 Examples:
 ```
+// Selects the first object.
 gpu.binary @myobject [#gpu.object<...>, #gpu.object<...>]
+// Uses the `#foo.my_handler` for handling the binary during translation.
 gpu.binary @myobject <#foo.my_handler> [#gpu.object<...>, #gpu.object<...>]
+// Selects the object with the `#rocdl.target` target attribute.
+gpu.binary @myobject <#gpu.select_object<#rocdl.target>> [#gpu.object<...>, #gpu.object<#rocdl.target, ...>]
 ```
 }];
 let builders = [
 OpBuilder<(ins "StringRef":$name,
 "OffloadingLLVMTranslationAttrInterface":$offloadingHandler,
 "ArrayAttr":$objects)>,
 OpBuilder<(ins "StringRef":$name,
 "OffloadingLLVMTranslationAttrInterface":$offloadingHandler,
 "ArrayRef<Attribute>":$objects)>
 ];
 let skipDefaultBuilders = 1;
 let assemblyFormat = [{
-$sym_name (`<` $offloadingHandler ^ `>`)? attr-dict $objects
+$sym_name custom<OffloadingHandler>($offloadingHandler) attr-dict $objects
 }];
 }

 def GPU_HostRegisterOp : GPU_Op<"host_register">,
 Arguments<(ins AnyUnrankedMemRef:$value)> {
 let summary = "Registers a memref for access from device.";
 let description = [{
 This op maps the provided host buffer into the device address space.
[1,079 lines not shown]

mlir/include/mlir/InitAllDialects.h

[149 lines not shown; context: inline void registerAllDialects(DialectRegistry &registry) {]

 // Register all external models.
 affine::registerValueBoundsOpInterfaceExternalModels(registry);
 arith::registerBufferizableOpInterfaceExternalModels(registry);
 arith::registerValueBoundsOpInterfaceExternalModels(registry);
 bufferization::func_ext::registerBufferizableOpInterfaceExternalModels(
 registry);
 builtin::registerCastOpInterfaceExternalModels(registry);
+gpu::registerOffloadingLLVMTranslationInterfacesExternalModels(registry);
 linalg::registerBufferizableOpInterfaceExternalModels(registry);
 linalg::registerTilingInterfaceExternalModels(registry);
 linalg::registerValueBoundsOpInterfaceExternalModels(registry);
 memref::registerBufferizableOpInterfaceExternalModels(registry);
 memref::registerRuntimeVerifiableOpInterfaceExternalModels(registry);
 memref::registerValueBoundsOpInterfaceExternalModels(registry);
 memref::registerMemorySlotExternalModels(registry);
 scf::registerBufferizableOpInterfaceExternalModels(registry);
[21 lines not shown]

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp

[202 lines not shown]
 #define GET_OP_LIST
 #include "mlir/Dialect/GPU/IR/GPUOps.cpp.inc"
 >();
 addAttributes<
 #define GET_ATTRDEF_LIST
 #include "mlir/Dialect/GPU/IR/GPUOpsAttributes.cpp.inc"
 >();
 addInterfaces<GPUInlinerInterface>();
+declarePromisedInterface<OffloadingLLVMTranslationAttrInterface>();
 }

 static std::string getSparseHandleKeyword(SparseHandleKind kind) {
 switch (kind) {
 case SparseHandleKind::DnTensor:
 return "sparse.dntensor_handle";
 case SparseHandleKind::SpMat:
 return "sparse.spmat_handle";
[1,421 lines not shown]
 //===----------------------------------------------------------------------===//
 void BinaryOp::build(OpBuilder &builder, OperationState &result, StringRef name,
 OffloadingLLVMTranslationAttrInterface offloadingHandler,
 ArrayAttr objects) {
 auto &properties = result.getOrAddProperties<Properties>();
 result.attributes.push_back(builder.getNamedAttr(
 SymbolTable::getSymbolAttrName(), builder.getStringAttr(name)));
 properties.objects = objects;
+if (offloadingHandler) {
 properties.offloadingHandler = offloadingHandler;
+} else {
+auto offloadingHandler = builder.getAttr<SelectObjectAttr>(nullptr);
+properties.offloadingHandler =
+dyn_cast<OffloadingLLVMTranslationAttrInterface>(offloadingHandler);
+}
 }

 void BinaryOp::build(OpBuilder &builder, OperationState &result, StringRef name,
 OffloadingLLVMTranslationAttrInterface offloadingHandler,
 ArrayRef<Attribute> objects) {
 build(builder, result, name, offloadingHandler,
 objects.size() > 0 ? builder.getArrayAttr(objects) : ArrayAttr());
 }

+static ParseResult parseOffloadingHandler(OpAsmParser &parser,
+Attribute &offloadingHandler) {
+if (succeeded(parser.parseOptionalLess())) {
+if (parser.parseAttribute(offloadingHandler))
+return failure();
+if (parser.parseGreater())
+return failure();
+}
+if (!offloadingHandler)
+offloadingHandler = parser.getBuilder().getAttr<SelectObjectAttr>(nullptr);
+return success();
+}
+
+static void printOffloadingHandler(OpAsmPrinter &printer, Operation *op,
+Attribute offloadingHandler) {
+if (offloadingHandler != SelectObjectAttr::get(op->getContext(), nullptr))
+printer << '<' << offloadingHandler << '>';
+}
+
 //===----------------------------------------------------------------------===//
 // GPUMemcpyOp
 //===----------------------------------------------------------------------===//

 LogicalResult MemcpyOp::verify() {
 auto srcType = getSrc().getType();
 auto dstType = getDst().getType();

[256 lines not shown]
 } // namespace

 void AllocOp::getCanonicalizationPatterns(RewritePatternSet &results,
 MLIRContext *context) {
 results.add<SimplifyDimOfAllocOp>(context);
 }

 //===----------------------------------------------------------------------===//
+// GPU select object attribute
+//===----------------------------------------------------------------------===//
+
+LogicalResult
+gpu::SelectObjectAttr::verify(function_ref<InFlightDiagnostic()> emitError,
+Attribute target) {
+// Check `target`, it can be null, an integer attr or a GPU Target attribute.
+if (target) {
+if (auto intAttr = mlir::dyn_cast<IntegerAttr>(target)) {
+if (intAttr.getInt() < 0) {
+return emitError() << "The object index must be positive.";
+}
+} else if (!(::mlir::isa<TargetAttrInterface>(target))) {
+return emitError()
+<< "The target attribute must be a GPU Target attribute.";
+}
+}
+return success();
+}
+
+//===----------------------------------------------------------------------===//
 // GPU target options
 //===----------------------------------------------------------------------===//

 TargetOptions::TargetOptions(StringRef toolkitPath,
 ArrayRef<std::string> linkFiles,
 StringRef cmdOptions,
 CompilationTarget compilationTarget)
 : TargetOptions(TypeID::get<TargetOptions>(), toolkitPath, linkFiles,
[48 lines not shown]
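
The custom `OffloadingHandler` parser and printer in the GPUDialect.cpp diff above elide the default handler: the printer emits nothing when the handler is the default `SelectObjectAttr`, and the parser fills that default back in when no `<...>` is present. A minimal standalone C++ sketch of that round trip follows, assuming handlers are modelled as plain strings; `kDefaultHandler`, `printHandler`, and `parseHandler` are hypothetical names and do not use MLIR's parser or printer classes.

```cpp
#include <cassert>
#include <string>

// Illustrative spelling of the default handler.
const std::string kDefaultHandler = "#gpu.select_object";

// Prints nothing for the default handler, `<handler>` otherwise.
std::string printHandler(const std::string &handler) {
  if (handler == kDefaultHandler)
    return "";
  return "<" + handler + ">";
}

// Reads `<handler>` if present; otherwise falls back to the default.
std::string parseHandler(const std::string &text) {
  if (text.size() >= 2 && text.front() == '<' && text.back() == '>')
    return text.substr(1, text.size() - 2);
  return kDefaultHandler;
}
```

Because the parser and printer agree on the default, an op written without a handler and an op written with the explicit default print identically.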

mlir/lib/Target/LLVMIR/Dialect/GPU/CMakeLists.txt

 add_mlir_translation_library(MLIRGPUToLLVMIRTranslation
 GPUToLLVMIRTranslation.cpp
+SelectObjectAttr.cpp

 LINK_COMPONENTS
 Core

 LINK_LIBS PUBLIC
 MLIRIR
 MLIRGPUDialect
 MLIRLLVMDialect
 MLIRSupport
 MLIRTargetLLVMIRExport
 )

mlir/lib/Target/LLVMIR/Dialect/GPU/GPUToLLVMIRTranslation.cpp

[30 lines not shown]

 } // namespace

 void mlir::registerGPUDialectTranslation(DialectRegistry &registry) {
 registry.insert<gpu::GPUDialect>();
 registry.addExtension(+[](MLIRContext *ctx, gpu::GPUDialect *dialect) {
 dialect->addInterfaces<GPUDialectLLVMIRTranslationInterface>();
 });
+gpu::registerOffloadingLLVMTranslationInterfacesExternalModels(registry);
 }

 void mlir::registerGPUDialectTranslation(MLIRContext &context) {
 DialectRegistry registry;
 registerGPUDialectTranslation(registry);
 context.appendDialectRegistry(registry);
 }

mlir/lib/Target/LLVMIR/Dialect/GPU/SelectObjectAttr.cpp

This file was added.

				//===- ObjectHandler.cpp - Implements base ObjectManager attributes -------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements the `OffloadingLLVMTranslationAttrInterface` for the
				// `SelectObject` attribute.
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Dialect/GPU/IR/GPUDialect.h"

				#include "mlir/Target/LLVMIR/Export.h"
				#include "mlir/Target/LLVMIR/ModuleTranslation.h"

				#include "llvm/IR/Constants.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/LLVMContext.h"
				#include "llvm/IR/Module.h"
				#include "llvm/Support/FormatVariadic.h"

				using namespace mlir;

				namespace {
				// Implementation of the `OffloadingLLVMTranslationAttrInterface` model.
				class SelectObjectAttrImpl
				: public gpu::OffloadingLLVMTranslationAttrInterface::FallbackModel<
				SelectObjectAttrImpl> {
				public:
				// Translates a `gpu.binary`, embedding the binary into a host LLVM module as
				// global binary string.
				LogicalResult embedBinary(Attribute attribute, Operation *operation,
				llvm::IRBuilderBase &builder,
				LLVM::ModuleTranslation &moduleTranslation) const;

				// Translates a `gpu.launch_func` to a sequence of LLVM instructions resulting
				// in a kernel launch call.
				LogicalResult launchKernel(Attribute attribute,
				Operation *launchFuncOperation,
				Operation *binaryOperation,
				llvm::IRBuilderBase &builder,
				LLVM::ModuleTranslation &moduleTranslation) const;
				};
				// Returns an identifier for the global string holding the binary.
				std::string getBinaryIdentifier(StringRef binaryName) {
				return binaryName.str() + "_bin_cst";
				}
				} // namespace

				void mlir::gpu::registerOffloadingLLVMTranslationInterfacesExternalModels(
				DialectRegistry &registry) {
				registry.addExtension(+[](MLIRContext *ctx, gpu::GPUDialect *dialect) {
				SelectObjectAttr::attachInterface<SelectObjectAttrImpl>(*ctx);
				});
				}

				LogicalResult SelectObjectAttrImpl::embedBinary(
				Attribute attribute, Operation *operation, llvm::IRBuilderBase &builder,
				LLVM::ModuleTranslation &moduleTranslation) const {
				assert(operation && "The binary operation must be non null.");
				if (!operation)
				return failure();

				auto op = mlir::dyn_cast<gpu::BinaryOp>(operation);
				if (!op) {
				operation->emitError("Operation must be a GPU binary.");
				return failure();
				}

				ArrayRef<Attribute> objects = op.getObjectsAttr().getValue();

				// Obtain the index of the object to select.
				int64_t index = -1;
				if (Attribute target = cast<gpu::SelectObjectAttr>(attribute).getTarget()) {
				// If the target attribute is a number it is the index. Otherwise compare
				// the attribute to every target inside the object array to find the index.
				if (auto indexAttr = mlir::dyn_cast<IntegerAttr>(target)) {
				index = indexAttr.getInt();
				} else {
				for (auto [i, attr] : llvm::enumerate(objects)) {
				auto obj = mlir::dyn_cast<gpu::ObjectAttr>(attr);
				if (obj.getTarget() == target) {
				index = i;
				}
				}
				}
				} else {
				// If the target attribute is null then it's selecting the first object in
				// the object array.
				index = 0;
				}

				if (index < 0 || index >= static_cast<int64_t>(objects.size())) {
				op->emitError("The requested target object couldn't be found.");
				return failure();
				}
				auto object = mlir::dyn_cast<gpu::ObjectAttr>(objects[index]);

				llvm::Module *module = moduleTranslation.getLLVMModule();

				// Embed the object as a global string.
				llvm::Constant *binary = llvm::ConstantDataArray::getString(
				builder.getContext(), object.getObject().getValue(), false);
				llvm::GlobalVariable *serializedObj =
				new llvm::GlobalVariable(*module, binary->getType(), true,
				llvm::GlobalValue::LinkageTypes::InternalLinkage,
				binary, getBinaryIdentifier(op.getName()));
				serializedObj->setLinkage(llvm::GlobalValue::LinkageTypes::InternalLinkage);
				serializedObj->setAlignment(llvm::MaybeAlign(8));
				serializedObj->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::None);
				return success();
				}

				namespace llvm {
				namespace {
				class LaunchKernel {
				public:
				LaunchKernel(Module &module, IRBuilderBase &builder,
				mlir::LLVM::ModuleTranslation &moduleTranslation);
				// Get the kernel launch callee.
				FunctionCallee getKernelLaunchFn();

				// Get the module function callee.
				FunctionCallee getModuleFunctionFn();

				// Get the module load callee.
				FunctionCallee getModuleLoadFn();

				// Get the module unload callee.
				FunctionCallee getModuleUnloadFn();

				// Get the stream create callee.
				FunctionCallee getStreamCreateFn();

				// Get the stream destroy callee.
				FunctionCallee getStreamDestroyFn();

				// Get the stream sync callee.
				FunctionCallee getStreamSyncFn();

				// Get or create the function name global string.
				Value *getOrCreateFunctionName(StringRef moduleName, StringRef kernelName);

				// Create the void* kernel array for passing the arguments.
				Value *createKernelArgArray(mlir::gpu::LaunchFuncOp op);

				// Create the full kernel launch.
				mlir::LogicalResult createKernelLaunch(mlir::gpu::LaunchFuncOp op);

				private:
				Module &module;
				IRBuilderBase &builder;
				mlir::LLVM::ModuleTranslation &moduleTranslation;
				Type *i32Ty{};
				Type *voidTy{};
				Type *intPtrTy{};
				PointerType *ptrTy{};
				};
				} // namespace
				} // namespace llvm

				LogicalResult SelectObjectAttrImpl::launchKernel(
				Attribute attribute, Operation *launchFuncOperation,
				Operation *binaryOperation, llvm::IRBuilderBase &builder,
				LLVM::ModuleTranslation &moduleTranslation) const {

				assert(launchFuncOperation && "The launch func operation must be non null.");
				if (!launchFuncOperation)
				return failure();

				auto launchFuncOp = mlir::dyn_cast<gpu::LaunchFuncOp>(launchFuncOperation);
				if (!launchFuncOp) {
				launchFuncOperation->emitError("Operation must be a GPU launch func Op.");
				return failure();
				}

				return llvm::LaunchKernel(*moduleTranslation.getLLVMModule(), builder,
				moduleTranslation)
				.createKernelLaunch(launchFuncOp);
				}

				llvm::LaunchKernel::LaunchKernel(
				Module &module, IRBuilderBase &builder,
				mlir::LLVM::ModuleTranslation &moduleTranslation)
				: module(module), builder(builder), moduleTranslation(moduleTranslation) {
				i32Ty = builder.getInt32Ty();
				ptrTy = builder.getPtrTy(0);
				voidTy = builder.getVoidTy();
				intPtrTy = builder.getIntPtrTy(module.getDataLayout());
				}

				llvm::FunctionCallee llvm::LaunchKernel::getKernelLaunchFn() {
				return module.getOrInsertFunction(
				"mgpuLaunchKernel",
				FunctionType::get(
				voidTy,
				ArrayRef<Type *>({ptrTy, intPtrTy, intPtrTy, intPtrTy, intPtrTy,
				intPtrTy, intPtrTy, i32Ty, ptrTy, ptrTy, ptrTy}),
				false));
				}

				llvm::FunctionCallee llvm::LaunchKernel::getModuleFunctionFn() {
				return module.getOrInsertFunction(
				"mgpuModuleGetFunction",
				FunctionType::get(ptrTy, ArrayRef<Type *>({ptrTy, ptrTy}), false));
				}

				llvm::FunctionCallee llvm::LaunchKernel::getModuleLoadFn() {
				return module.getOrInsertFunction(
				"mgpuModuleLoad",
				FunctionType::get(ptrTy, ArrayRef<Type *>({ptrTy}), false));
				}

				llvm::FunctionCallee llvm::LaunchKernel::getModuleUnloadFn() {
				return module.getOrInsertFunction(
				"mgpuModuleUnload",
				FunctionType::get(voidTy, ArrayRef<Type *>({ptrTy}), false));
				}

				llvm::FunctionCallee llvm::LaunchKernel::getStreamCreateFn() {
				return module.getOrInsertFunction("mgpuStreamCreate",
				FunctionType::get(ptrTy, false));
				}

				llvm::FunctionCallee llvm::LaunchKernel::getStreamDestroyFn() {
				return module.getOrInsertFunction(
				"mgpuStreamDestroy",
				FunctionType::get(voidTy, ArrayRef<Type *>({ptrTy}), false));
				}

				llvm::FunctionCallee llvm::LaunchKernel::getStreamSyncFn() {
				return module.getOrInsertFunction(
				"mgpuStreamSynchronize",
				FunctionType::get(voidTy, ArrayRef<Type *>({ptrTy}), false));
				}

				// Generates an LLVM IR dialect global that contains the name of the given
				// kernel function as a C string, and returns a pointer to its beginning.
				llvm::Value *llvm::LaunchKernel::getOrCreateFunctionName(StringRef moduleName,
				StringRef kernelName) {
				std::string globalName =
				std::string(formatv("{0}_{1}_kernel_name", moduleName, kernelName));

				if (GlobalVariable *gv = module.getGlobalVariable(globalName))
				return gv;

				return builder.CreateGlobalString(kernelName, globalName);
				}

				// Creates a struct containing all kernel parameters on the stack and returns
				// an array of type-erased pointers to the fields of the struct. The array can
				// then be passed to the CUDA / ROCm (HIP) kernel launch calls.
				// The generated code is essentially as follows:
				//
				// %struct = alloca(sizeof(struct { Parameters... }))
				// %array = alloca(NumParameters * sizeof(void *))
				// for (i : [0, NumParameters))
				// %fieldPtr = llvm.getelementptr %struct[0, i]
				// llvm.store parameters[i], %fieldPtr
				// %elementPtr = llvm.getelementptr %array[i]
				// llvm.store %fieldPtr, %elementPtr
				// return %array
				llvm::Value *
				llvm::LaunchKernel::createKernelArgArray(mlir::gpu::LaunchFuncOp op) {
				SmallVector<Value *> args =
				moduleTranslation.lookupValues(op.getKernelOperands());
				SmallVector<Type *> structTypes(args.size(), nullptr);

				for (auto [i, arg] : llvm::enumerate(args))
				structTypes[i] = arg->getType();

				Type *structTy = StructType::create(module.getContext(), structTypes);
				Value *argStruct = builder.CreateAlloca(structTy, 0u);
				Value *argArray = builder.CreateAlloca(
				ptrTy, ConstantInt::get(intPtrTy, structTypes.size()));

				for (auto [i, arg] : enumerate(args)) {
				Value *structMember = builder.CreateStructGEP(structTy, argStruct, i);
				builder.CreateStore(arg, structMember);
				Value *arrayMember = builder.CreateConstGEP1_32(ptrTy, argArray, i);
				builder.CreateStore(structMember, arrayMember);
				}
				return argArray;
				}

				// Emits LLVM IR to launch a kernel function:
				// %0 = call %binarygetter
				// %1 = call %moduleLoad(%0)
				// %2 = <see getOrCreateFunctionName>
				// %3 = call %moduleGetFunction(%1, %2)
				// %4 = call %streamCreate()
				// %5 = <see createKernelArgArray>
				// call %launchKernel(%3, <launchOp operands 0..5>, 0, %4, %5, nullptr)
				// call %streamSynchronize(%4)
				// call %streamDestroy(%4)
				// call %moduleUnload(%1)
				mlir::LogicalResult
				llvm::LaunchKernel::createKernelLaunch(mlir::gpu::LaunchFuncOp op) {
				auto llvmValue = [&](mlir::Value value) -> Value * {
				Value *v = moduleTranslation.lookupValue(value);
				assert(v && "Value has not been translated.");
				return v;
				};

				// Get grid dimensions.
				mlir::gpu::KernelDim3 grid = op.getGridSizeOperandValues();
				Value *gx = llvmValue(grid.x), *gy = llvmValue(grid.y),
				*gz = llvmValue(grid.z);

				// Get block dimensions.
				mlir::gpu::KernelDim3 block = op.getBlockSizeOperandValues();
				Value *bx = llvmValue(block.x), *by = llvmValue(block.y),
				*bz = llvmValue(block.z);

				// Get dynamic shared memory size.
				Value *dynamicMemorySize = nullptr;
				if (mlir::Value dynSz = op.getDynamicSharedMemorySize())
				dynamicMemorySize = llvmValue(dynSz);
				else
				dynamicMemorySize = ConstantInt::get(i32Ty, 0);

				// Create the argument array.
				Value *argArray = createKernelArgArray(op);

				// Load the kernel module.
				StringRef moduleName = op.getKernelModuleName().getValue();
				std::string binaryIdentifier = getBinaryIdentifier(moduleName);
				Value *binary = module.getGlobalVariable(binaryIdentifier, true);
				if (!binary)
				return op.emitError() << "Couldn't find the binary: " << binaryIdentifier;
				Value *moduleObject = builder.CreateCall(getModuleLoadFn(), {binary});

				// Load the kernel function.
				Value *moduleFunction = builder.CreateCall(
				getModuleFunctionFn(),
				{moduleObject,
				getOrCreateFunctionName(moduleName, op.getKernelName().getValue())});

				// Get the stream to use for execution. If there's no async object then create
				// a stream to make a synchronous kernel launch.
				Value *stream = nullptr;
				bool handleStream = false;
				if (mlir::Value asyncObject = op.getAsyncObject()) {
				stream = llvmValue(asyncObject);
				} else {
				handleStream = true;
				stream = builder.CreateCall(getStreamCreateFn(), {});
				}

				// Create the launch call.
				Value *nullPtr = ConstantPointerNull::get(ptrTy);
				builder.CreateCall(
				getKernelLaunchFn(),
				ArrayRef<Value *>({moduleFunction, gx, gy, gz, bx, by, bz,
				dynamicMemorySize, stream, argArray, nullPtr}));

				// Sync & destroy the stream, for synchronous launches.
				if (handleStream) {
				builder.CreateCall(getStreamSyncFn(), {stream});
				builder.CreateCall(getStreamDestroyFn(), {stream});
				}

				// Unload the kernel module.
				builder.CreateCall(getModuleUnloadFn(), {moduleObject});

				return success();
				}
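
The argument-packing scheme described in the comment on `createKernelArgArray` can be illustrated outside of LLVM. The following is only a minimal sketch in plain C++, not the code the patch emits: `Params`, `fakeLaunch`, and `launchWithPackedArgs` are hypothetical names, and `fakeLaunch` merely stands in for a runtime entry point that receives a `void **` argument array.

```cpp
#include <array>
#include <cstdint>

// Hypothetical kernel parameters, playing the role of the stack-allocated
// struct that holds every kernel operand.
struct Params {
  float scale;
  std::int32_t *data;
};

// Hypothetical stand-in for a runtime launch call: it only sees an array of
// type-erased pointers, one per parameter, and reads each value through it.
float fakeLaunch(void **args) {
  float scale = *static_cast<float *>(args[0]);
  std::int32_t *data = *static_cast<std::int32_t **>(args[1]);
  return scale * static_cast<float>(data[0]);
}

// Stores the parameters in a struct, then builds an array whose i-th entry
// points at the i-th struct field, and passes that array to the launch call.
float launchWithPackedArgs(float scale, std::int32_t *data) {
  Params params{scale, data};
  std::array<void *, 2> argArray{&params.scale, &params.data};
  return fakeLaunch(argArray.data());
}
```

Each array entry points at a struct field rather than holding the value itself, which is why the struct has to stay alive until the launch call returns.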

mlir/test/Dialect/GPU/invalid.mlir

	Show First 20 Lines • Show All 631 Lines • ▼ Show 20 Lines
	module {			module {
	// expected-error @+1 {{'gpu.binary' op attribute 'objects' failed to satisfy constraint: an array of GPU object attributes with at least 1 elements}}			// expected-error @+1 {{'gpu.binary' op attribute 'objects' failed to satisfy constraint: an array of GPU object attributes with at least 1 elements}}
	gpu.binary @binary []			gpu.binary @binary []
	}			}

	// -----			// -----

	module {			module {
	// expected-error @+1 {{custom op 'gpu.binary' invalid kind of attribute specified}}			// expected-error @+1 {{'gpu.binary' op attribute 'offloadingHandler' failed to satisfy constraint: OffloadingLLVMTranslationAttrInterface instance}}
	gpu.binary @binary <1> [#gpu.object<#nvvm.target, "">]			gpu.binary @binary <1> [#gpu.object<#nvvm.target, "">]
	}			}

mlir/test/Dialect/GPU/ops.mlir

Show First 20 Lines • Show All 115 Lines • ▼ Show 20 Lines	gpu.func @kernel_1(%arg0 : f32, %arg1 : memref<?xf32, 1>) kernel {
gpu.return		gpu.return
}		}

gpu.func @kernel_2() kernel {		gpu.func @kernel_2() kernel {
gpu.return		gpu.return
}		}
}		}

		gpu.binary @binary_1 [#gpu.object<#nvvm.target, "">]

		gpu.binary @binary_2 <#gpu.select_object<#rocdl.target>> [#gpu.object<#nvvm.target, "">, #gpu.object<#rocdl.target, "">]

		gpu.binary @binary_3 <#gpu.select_object<1>> [#gpu.object<#nvvm.target, "">, #gpu.object<#rocdl.target, "">]

func.func private @two_value_generator() -> (f32, memref<?xf32, 1>)		func.func private @two_value_generator() -> (f32, memref<?xf32, 1>)

func.func @foo() {		func.func @foo() {
%0 = "op"() : () -> (f32)		%0 = "op"() : () -> (f32)
%1 = "op"() : () -> (memref<?xf32, 1>)		%1 = "op"() : () -> (memref<?xf32, 1>)
// CHECK: %{{.*}} = arith.constant 8		// CHECK: %{{.*}} = arith.constant 8
%cst = arith.constant 8 : index		%cst = arith.constant 8 : index
%cstI64 = arith.constant 8 : i64		%cstI64 = arith.constant 8 : i64
Show All 13 Lines	func.func @foo() {
%t1 = gpu.launch_func async [%t0] @kernels::@kernel_2 blocks in (%cst, %cst, %cst) threads in (%cst, %cst, %cst)		%t1 = gpu.launch_func async [%t0] @kernels::@kernel_2 blocks in (%cst, %cst, %cst) threads in (%cst, %cst, %cst)

// CHECK: gpu.launch_func <%{{.*}} : !llvm.ptr> @kernels::@kernel_1 blocks in (%{{.*}}, %{{.*}}, %{{.*}}) threads in (%{{.*}}, %{{.*}}, %{{.*}}) : i64 args(%{{.*}} : f32, %{{.*}} : memref<?xf32, 1>)		// CHECK: gpu.launch_func <%{{.*}} : !llvm.ptr> @kernels::@kernel_1 blocks in (%{{.*}}, %{{.*}}, %{{.*}}) threads in (%{{.*}}, %{{.*}}, %{{.*}}) : i64 args(%{{.*}} : f32, %{{.*}} : memref<?xf32, 1>)
gpu.launch_func <%lowStream : !llvm.ptr> @kernels::@kernel_1 blocks in (%cstI64, %cstI64, %cstI64) threads in (%cstI64, %cstI64, %cstI64) : i64 args(%0 : f32, %1 : memref<?xf32, 1>)		gpu.launch_func <%lowStream : !llvm.ptr> @kernels::@kernel_1 blocks in (%cstI64, %cstI64, %cstI64) threads in (%cstI64, %cstI64, %cstI64) : i64 args(%0 : f32, %1 : memref<?xf32, 1>)

// CHECK: gpu.launch_func @kernels::@kernel_1 blocks in (%{{.*}}, %{{.*}}, %{{.*}}) threads in (%{{.*}}, %{{.*}}, %{{.*}}) : i32 args(%{{.*}} : f32, %{{.*}} : memref<?xf32, 1>)		// CHECK: gpu.launch_func @kernels::@kernel_1 blocks in (%{{.*}}, %{{.*}}, %{{.*}}) threads in (%{{.*}}, %{{.*}}, %{{.*}}) : i32 args(%{{.*}} : f32, %{{.*}} : memref<?xf32, 1>)
gpu.launch_func @kernels::@kernel_1 blocks in (%c0, %c0, %c0) threads in (%c0, %c0, %c0) : i32 args(%0 : f32, %1 : memref<?xf32, 1>)		gpu.launch_func @kernels::@kernel_1 blocks in (%c0, %c0, %c0) threads in (%c0, %c0, %c0) : i32 args(%0 : f32, %1 : memref<?xf32, 1>)

		// CHECK: gpu.launch_func @binary_1::@kernel blocks in (%{{.*}}, %{{.*}}, %{{.*}}) threads in (%{{.*}}, %{{.*}}, %{{.*}}) : i32 args(%{{.*}} : f32, %{{.*}} : memref<?xf32, 1>)
		gpu.launch_func @binary_1::@kernel blocks in (%c0, %c0, %c0) threads in (%c0, %c0, %c0) : i32 args(%0 : f32, %1 : memref<?xf32, 1>)

// CHECK: %[[VALUES:.*]]:2 = call		// CHECK: %[[VALUES:.*]]:2 = call
%values:2 = func.call @two_value_generator() : () -> (f32, memref<?xf32, 1>)		%values:2 = func.call @two_value_generator() : () -> (f32, memref<?xf32, 1>)
// CHECK: gpu.launch_func @kernels::@kernel_1 {{.*}} args(%[[VALUES]]#0 : f32, %[[VALUES]]#1 : memref<?xf32, 1>)		// CHECK: gpu.launch_func @kernels::@kernel_1 {{.*}} args(%[[VALUES]]#0 : f32, %[[VALUES]]#1 : memref<?xf32, 1>)
gpu.launch_func @kernels::@kernel_1 blocks in (%cst, %cst, %cst) threads in (%cst, %cst, %cst) args(%values#0 : f32, %values#1 : memref<?xf32, 1>)		gpu.launch_func @kernels::@kernel_1 blocks in (%cst, %cst, %cst) threads in (%cst, %cst, %cst) args(%values#0 : f32, %values#1 : memref<?xf32, 1>)

return		return
}		}
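
The three `gpu.binary` forms in this test (no handler, a target attribute, and an integer index) exercise the selection rule implemented in `embedBinary`. A rough standalone sketch of that rule, under the assumption that targets can be modelled as plain strings, might look as follows. `Object`, `Selector`, and `selectObject` are hypothetical names and are not part of MLIR.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a `#gpu.object<target, "...">` entry.
struct Object {
  std::string target;  // assumption: targets modelled as strings
  std::string payload;
};

// Hypothetical selector: empty (no target given), an index, or a target.
using Selector = std::variant<std::monostate, std::int64_t, std::string>;

// Returns the position of the selected object, or nullopt if none qualifies.
std::optional<std::size_t> selectObject(const std::vector<Object> &objects,
                                        const Selector &selector) {
  std::int64_t index = -1;
  if (std::holds_alternative<std::monostate>(selector)) {
    // No target: pick the first object.
    index = 0;
  } else if (const auto *requested = std::get_if<std::int64_t>(&selector)) {
    // An integer is used directly as the index.
    index = *requested;
  } else {
    // Otherwise compare against every object's target. As in the patch, the
    // loop does not stop early, so the last matching object wins.
    const std::string &target = std::get<std::string>(selector);
    for (std::size_t i = 0; i < objects.size(); ++i)
      if (objects[i].target == target)
        index = static_cast<std::int64_t>(i);
  }
  // Reject out-of-range indices and targets that were never found.
  if (index < 0 || index >= static_cast<std::int64_t>(objects.size()))
    return std::nullopt;
  return static_cast<std::size_t>(index);
}
```

With the two-object arrays used by `@binary_2` and `@binary_3`, both a `rocdl` target selector and the index `1` would resolve to the second object in this sketch.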

▲ Show 20 Lines • Show All 227 Lines • Show Last 20 Lines


Diff 547744
