This is an archive of the discontinued LLVM Phabricator instance.


Differential D86112

[MLIR][mlir-spirv-cpu-runner] A pass to emulate a call to kernel in LLVM
ClosedPublic

Authored by georgemitenkov on Aug 17 2020, 2:51 PM.

Details

Reviewers

antiagainst
mravishankar
ftynse
herhut
mehdi_amini
rriddle

Commits

rGcae4067ec1cd: [MLIR][mlir-spirv-cpu-runner] A pass to emulate a call to kernel in LLVM

Summary

This patch introduces a pass for running
mlir-spirv-cpu-runner - LowerHostCodeToLLVMPass.

This pass emulates a gpu.launch_func call in the LLVM dialect and lowers
the host module code to LLVM. It removes the gpu.module and creates a
sequence of global variables that are later linked to the variables
in the kernel module, as well as a series of copies to and from
them to emulate the memory transfers between the host and
device sides. It also converts the remaining Standard dialect ops into
the LLVM dialect, emitting C wrappers.
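
The copy-in, call, copy-out sequence described in the summary can be pictured with a small standalone C++ sketch. This is not code from the patch: `kernelArg0`, `kernel`, and `emulateLaunch` are hypothetical names, and a fixed-size array stands in for the global variable that the pass would link to the kernel module.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a global variable shared between the host
// module and the kernel module (4 elements, chosen for illustration).
static int kernelArg0[4];

// Hypothetical kernel: reads and writes its argument through the global.
static void kernel() {
  for (int &value : kernelArg0)
    value *= 2;
}

// Emulates a kernel launch on the CPU: copy the host buffer into the
// global, call the kernel, then copy the result back to the host buffer.
void emulateLaunch(int *hostBuffer, std::size_t count) {
  assert(count <= 4 && "buffer larger than the emulated global");
  std::memcpy(kernelArg0, hostBuffer, count * sizeof(int));
  kernel();
  std::memcpy(hostBuffer, kernelArg0, count * sizeof(int));
}
```

Calling `emulateLaunch` on a four-element host buffer doubles every element in place, which mirrors the host-to-device and device-to-host transfers without any GPU runtime.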

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

georgemitenkov created this revision. Aug 17 2020, 2:51 PM

Herald added a project: Restricted Project. Aug 17 2020, 2:51 PM

Herald added subscribers: ThomasRaoux, AlexeySotkin, msifontes and 13 others.

georgemitenkov requested review of this revision. Aug 17 2020, 2:51 PM

Herald added subscribers: stephenneuendorffer, nicolasvasilache. Aug 17 2020, 2:51 PM

georgemitenkov added a parent revision: D86109: [MLIR][SPIRVToLLVM] Additional conversions for spirv-runner. Aug 17 2020, 2:52 PM

georgemitenkov added a child revision: D86108: [MLIR][mlir-spirv-cpu-runner] A SPIR-V cpu runner prototype.

Harbormaster completed remote builds in B68681: Diff 286155. Aug 17 2020, 3:12 PM

I did an initial pass on this. Will come back and look in more detail, but have some high-level comments.

mravishankar added inline comments. Aug 17 2020, 11:37 PM

mlir/include/mlir/Conversion/Passes.td
94	Looking through this, I am not sure why this is split into two passes. It might be better to combine them into one pass. AFAICS you have all the information you need to convert a `gpu.launch_func` into a series of copies from the arguments to the global variables, then the actual call to the kernel function, and then a series of copies to copy the result back.
mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToLLVMCalls.cpp
181	The GPU dialect adds the `gpu.container_module` to identify the kernel side. Further, the LLVM -> SPIR-V lowering preserves the `spv.target_env` attribute. Isn't it better to check for one or both of these attributes to be more targeted in what is treated as a kernel module?

This revision now requires changes to proceed. Aug 17 2020, 11:37 PM

georgemitenkov added inline comments. Aug 18 2020, 1:04 AM

mlir/include/mlir/Conversion/Passes.td
94	The reason for the split is that we cannot convert Standard to LLVM before our pass: the `gpu.launch_func` argument types would then not match, because memrefs are converted into memref descriptors in LLVM. So we would have something like this:

"gpu.launch_func"(%24, %24, %24, %24, %24, %24, %16) {kernel = @kernels::@simple} : (!llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>) -> ()

Nor can we run `std-to-llvm` afterwards. If we replace the launch op with `llvm.call`, then its arguments will also be of LLVMType. This means we cannot really write the series of copies from the arguments to the global variables, because the argument types are in LLVM but the actual values passed are still in Standard. My solution to this was to preprocess `gpu.launch_func` so that it can be lowered to LLVM as a normal function call, and then replace it as intended once everything is in LLVM. There might be something I am missing here; I will have a look at how to combine the 2 passes into one (which is definitely a nicer way) :)
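
For readers unfamiliar with the struct type in the snippet above, the following C++ sketch shows a rank-1 memref descriptor with the same five fields as `!llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>`. The field names and the helpers `makeDescriptor` and `byteSize` are illustrative assumptions, not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Rank-1 memref descriptor for i32 elements. Field names are illustrative;
// the order follows the LLVM struct type quoted in the comment above.
struct MemRefDescriptor1D {
  int32_t *allocated; // pointer returned by the allocation
  int32_t *aligned;   // aligned pointer used for element accesses
  int64_t offset;     // element offset from the aligned pointer
  int64_t sizes[1];   // number of elements per dimension
  int64_t strides[1]; // stride (in elements) per dimension
};

// Hypothetical helper: wraps a contiguous buffer in a descriptor.
MemRefDescriptor1D makeDescriptor(int32_t *data, int64_t size) {
  assert(size >= 0 && "negative size");
  return MemRefDescriptor1D{data, data, 0, {size}, {1}};
}

// Hypothetical helper: number of bytes a copy of the whole buffer moves.
int64_t byteSize(const MemRefDescriptor1D &desc) {
  return desc.sizes[0] * static_cast<int64_t>(sizeof(int32_t));
}
```

Because the launch operand becomes a struct like this after `std-to-llvm`, the copies to and from the global variables have to read the pointer and size out of the descriptor rather than use the original memref value.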
mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToLLVMCalls.cpp
181	Thanks! I am not sure how `gpu.container_module` helps in general, because it only identifies that the module contains a kernel module, and it may also have other modules in it? In this case, we can check for the module that is `!= gpu.container_module`, as we consider only 2 modules. I think reusing `spv.target_env` and the `spv.module` attributes is a great way, but for that we need a way of passing this info through the SPIR-V to LLVM conversion. Something like module attributes { "kernel" } { ... Also, I am wondering why the GPU -> SPIR-V module conversion doesn't pass the symbolic name? If we had a SPIR-V module with the name "spv.{gpu module name here}", for example, we could find the kernel straight away by looking it up in the symbol table?

georgemitenkov marked 2 inline comments as not done. Aug 18 2020, 1:06 AM

georgemitenkov marked an inline comment as done. Aug 19 2020, 8:59 AM

georgemitenkov added inline comments.

mlir/include/mlir/Conversion/Passes.td
94	Actually, there is a way to do it by pulling in `std-to-llvm` patterns. I will update the patch with a new combined version of the passes.

Combined 2 passes into 1.
Finding kernel module is the same for now (to be changed).

Harbormaster completed remote builds in B69012: Diff 286779. Aug 20 2020, 4:53 AM

Added a utility function for kernel global variable lookup.

georgemitenkov marked an inline comment as done. Aug 20 2020, 5:19 AM

Added missing change.

Harbormaster completed remote builds in B69017: Diff 286789. Aug 20 2020, 5:39 AM

Harbormaster completed remote builds in B69016: Diff 286787.

georgemitenkov edited the summary of this revision. Aug 20 2020, 7:00 AM

georgemitenkov retitled this revision from [MLIR][mlir-spirv-runner] Passes for spirv-runner to [MLIR][mlir-spirv-cpu-runner] Passes for spirv-cpu-runner. Aug 20 2020, 7:10 AM

mravishankar requested changes to this revision. Aug 20 2020, 10:59 PM

mravishankar added inline comments.

mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToLLVMCalls.cpp
111	Correct me if I am wrong, but it seems like the assumption here is that the structure is actually

module {
  spv.module { spv.func @foo }
  module @kernel_module_name {gpu.container_module} { gpu.func @foo }
  gpu.launch_func @kernel_module_name::foo
}

This is the case now, but that could change. We should really be looking at the `spv.module` directly. There are some steps to get there, though.

You probably need to add `SymAttr` to `spv.module`. As you mentioned, when converting the gpu module to a SPIR-V module, we can use the same symbol name.

`spv.EntryPoint` is actually supposed to have a list of the `spv.globalVariables` that are accessed within the entry point. We don't emit it currently because it isn't needed by the Vulkan side, but it is useful here to get the "arguments". The `spv.EntryPoint` is added by the `LowerABIAttributes` pass. We probably need to do it when lowering a `func` to `spv.func`, where you have all the information.

This is a fairly big change, though. An intermediate stage would be to check for a single `spv.module` and a single entry point function. Under that assumption, all non-builtin `spv.globalVariables` are kernel arguments, and they have the `[binding, descriptor_set]` as `[0, 0], [1, 0], [2, 0]`... and so on. So argument 0 -> `[0, 0]`, argument 1 -> `[1, 0]`, etc. Use that to define/find the symbol that is used to do the copy. Does this make sense?
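
The intermediate scheme suggested above (argument i gets binding i in descriptor set 0) can be written down as a small standalone C++ sketch. The type `DescriptorBinding` and the function `assumedKernelArgBindings` are hypothetical names used only for illustration, and the mapping holds only under the stated assumption of a single `spv.module` with a single entry point.

```cpp
#include <cassert>
#include <vector>

// Illustrative pair of [binding, descriptor_set] for one kernel argument.
struct DescriptorBinding {
  unsigned binding;
  unsigned descriptorSet;
};

// Returns the bindings assumed for a kernel with `numArgs` arguments:
// argument 0 -> [0, 0], argument 1 -> [1, 0], and so on.
std::vector<DescriptorBinding> assumedKernelArgBindings(unsigned numArgs) {
  std::vector<DescriptorBinding> bindings;
  bindings.reserve(numArgs);
  for (unsigned argIndex = 0; argIndex < numArgs; ++argIndex)
    bindings.push_back({argIndex, 0});
  return bindings;
}
```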
146	Can we also append the `gpu.container_module` name and the kernel function name to avoid conflicts?
181	> Also, I am wondering why GPU-> SPIR-V module conversion doesn't pass the symbolic name? If we would have a SPIR-V module with the name "spv.{gpu module name here}" for example, we can find the kernel straight away by looking up in symbol table?

Good point. We didn't need it, but totally agree that the spv.module should have a symbol name as well. (See comment above.)

This revision now requires changes to proceed. Aug 20 2020, 10:59 PM

georgemitenkov added inline comments. Aug 21 2020, 2:10 AM

mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToLLVMCalls.cpp
111	Actually the current structure was

module {gpu.container_module} {
  module { llvm.func() @foo }
  gpu.module @kernel_module_name { gpu.func @foo }
  gpu.launch_func @kernel_module_name::foo
}

because we first convert SPIR-V to LLVM for easier global variable type deduction. But dealing with the `gpu.launch_func` op before that makes more sense, as we do not lose `spv.EntryPoint` and the binding/descriptor set info.

> Probably need to do it when lowering a func to spv.func where you have all the information.

I think that this can be done within `LowerABIAttributes`, where we convert `spv.func(args)` to `spv.func()`? Then the ideal flow would be:

1. Look up the needed SPIR-V module by `__spv_{kernel_module_name}`.
2. Look up an entry point for this kernel and get the list of `spv.globalVariable`s associated with this kernel. I think these globals are named by
   auto name = funcOp.getName().str() + "_arg_" + std::to_string(argIndex);
   I think it would be better to prepend the module name as well?
3. Create the `llvm.global` (need to convert the type here) with the name:
   name + "_binding0_descriptor_set_1"

The same naming convention can be used when we lower SPIR-V globals to LLVM. This would remove the need for `EncodeDescriptorSetsPass`, because it can be done in `ConvertSPIRVToLLVM`?
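
The naming scheme proposed in this comment can be sketched as two string helpers. This is only an illustration in standalone C++: `kernelArgName` and `encodedGlobalName` are hypothetical functions, and the suffix format simply follows the shape of the `"_binding0_descriptor_set_1"` example quoted above rather than any format confirmed by the committed code.

```cpp
#include <cassert>
#include <string>

// Mirrors the quoted expression:
//   funcOp.getName().str() + "_arg_" + std::to_string(argIndex)
std::string kernelArgName(const std::string &funcName, unsigned argIndex) {
  return funcName + "_arg_" + std::to_string(argIndex);
}

// Appends the binding and descriptor set to a global's name, following the
// shape of the example suffix in the comment (an assumption, not the final
// encoding used by the conversion).
std::string encodedGlobalName(const std::string &name, unsigned binding,
                              unsigned descriptorSet) {
  return name + "_binding" + std::to_string(binding) + "_descriptor_set_" +
         std::to_string(descriptorSet);
}
```

With such a scheme, the host-side pass and the SPIR-V to LLVM conversion can derive the same symbol name independently, which is what allows the globals to be matched up at link time.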

This is a big update to the pass structure. The pass now runs after the GPU to
SPIR-V conversion (and all ABI lowering, etc.). This makes it possible to use
the SPIR-V module and the bindings of its global variables for kernel call
emulation. Also, we can have multiple kernels in the program (to fully run
these, the linking has to be updated).

Also, the kernel function and the kernel operands (global variables) are renamed
during the pass so that their names start with the kernel name.

EncodeDescriptorSetsPass is removed, as the binding encoding can be done in the
SPIR-V to LLVM conversion (TODO).

Added missing file.

Removed EncodeDescriptorSetsPass.

Harbormaster completed remote builds in B69190: Diff 287128. Aug 21 2020, 3:56 PM

Harbormaster completed remote builds in B69192: Diff 287130. Aug 21 2020, 4:02 PM

georgemitenkov added a parent revision: D86384: [MLIR][GPUToSPIRV] Passing gpu module name to SPIR-V module. Aug 21 2020, 4:04 PM

Harbormaster completed remote builds in B69191: Diff 287129. Aug 21 2020, 4:07 PM

mravishankar added inline comments. Aug 24 2020, 12:09 PM

mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToLLVMCalls.cpp
111	Thanks for clarifying. This makes it clear what's happening. The `LowerABIAttributes` pass already does what you want for the `spv.globalVariable` (here). I think encoding the name as you mentioned seems fine to me. Removing an extra pass is good as well.

georgemitenkov added a parent revision: D86515: [MLIR][SPIRVToLLVM] Added a hook for descriptor set / binding encoding. Aug 25 2020, 3:11 AM

Rebased on top of other patches, with minor changes: variable/function names now keep __spv__ from the SPIR-V module.

georgemitenkov marked an inline comment as done. Aug 26 2020, 10:39 AM

georgemitenkov retitled this revision from [MLIR][mlir-spirv-cpu-runner] Passes for spirv-cpu-runner to [MLIR][mlir-spirv-cpu-runner] A pass to emulate a call to kernel in LLVM. Aug 26 2020, 10:41 AM

georgemitenkov edited the summary of this revision.

Harbormaster completed remote builds in B69637: Diff 288037. Aug 26 2020, 10:58 AM

rriddle added inline comments. Aug 26 2020, 2:57 PM

mlir/include/mlir/Conversion/Passes.td
95	Is this wrapped at 80 characters?
mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToLLVMCalls.cpp
98	Please avoid using distance as it is O(N) in size. You can use llvm::hasSingleElement() for this check.
99	nit: You can return this directly, it auto-converts to failure().
115	nit: Please just use `module.sym_name()` instead.
127	You can set an attribute via `module.sym_name(newName)`.
254	nit: Spell out auto.
290	nit: Drop trivial braces.
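
The point about `distance` being O(N) can be illustrated with a standalone check that only inspects the first two positions of a range. The sketch below is a re-implementation for illustration, named `hasSingleElementSketch` to make clear it is not the LLVM utility itself; in the pass, `llvm::hasSingleElement()` would be used as the reviewer suggests.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>

// Returns true if the range holds exactly one element. Unlike computing
// std::distance(begin, end) == 1, this advances the iterator at most once,
// so the cost does not depend on the length of the range.
template <typename Range>
bool hasSingleElementSketch(const Range &range) {
  auto begin = std::begin(range);
  auto end = std::end(range);
  return begin != end && std::next(begin) == end;
}
```

This matters for ranges such as `getOps<...>()`, whose iterators are not random access, so counting elements requires walking the whole range.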

georgemitenkov marked 6 inline comments as done. Aug 26 2020, 8:29 PM

georgemitenkov added inline comments.

mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToLLVMCalls.cpp
127	I am not sure this can be done like that? Do you mean using `sym_nameAttr()` and passing a string attribute instead?

Addressed comments.

Harbormaster completed remote builds in B69714: Diff 288173. Aug 26 2020, 8:31 PM

This looks good! I just have a few final nits.

Btw, it would be better for me to commit this for you. Please let me know when this is ready, and I will commit this. I am setting this as "request change" only so that once you update this it shows up on my dashboard.

Thanks for addressing all the issues raised! It looks fairly clean to me now.

mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToLLVMCalls.cpp
30	Nit: Instead of macros I would rather have a `static` method that returns a std::string.
162	Nit: Just add `{ ... }` around statements that span multiple lines...
177	Use `OpBuilder::InsertionGuard(rewriter)`. It will reset the insertion point after the scope.
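
The insertion guard mentioned in the last nit is an RAII helper: it records the builder's insertion point on construction and restores it on destruction. The following standalone C++ sketch shows the idea with hypothetical stand-ins (`Builder`, `InsertionGuardSketch`, and an integer insertion point); it is not the MLIR class. With the real API, the guard is normally declared as a named local, for example `OpBuilder::InsertionGuard guard(rewriter);`, so that it lives until the end of the scope.

```cpp
#include <cassert>

// Hypothetical stand-in for a builder with a single insertion point.
struct Builder {
  int insertionPoint = 0;
};

// Saves the insertion point on construction and restores it on destruction.
class InsertionGuardSketch {
public:
  explicit InsertionGuardSketch(Builder &builder)
      : builder(builder), saved(builder.insertionPoint) {}
  ~InsertionGuardSketch() { builder.insertionPoint = saved; }

  InsertionGuardSketch(const InsertionGuardSketch &) = delete;
  InsertionGuardSketch &operator=(const InsertionGuardSketch &) = delete;

private:
  Builder &builder;
  int saved;
};

// Moves the insertion point temporarily; the guard restores it on return.
int emitInNestedScope(Builder &builder) {
  InsertionGuardSketch guard(builder);
  builder.insertionPoint = 42;
  return builder.insertionPoint;
}
```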

This revision now requires changes to proceed. Aug 26 2020, 9:46 PM

Addressed comments and rebased on master to pick up committed patches.

Harbormaster completed remote builds in B69727: Diff 288205. Aug 26 2020, 11:38 PM

Fixed StringRef error.

Added a missing change.

Harbormaster completed remote builds in B69747: Diff 288244. Aug 27 2020, 2:51 AM

Harbormaster completed remote builds in B69748: Diff 288249. Aug 27 2020, 2:54 AM

flaub added a subscriber: flaub. Aug 30 2020, 1:45 PM

Herald added a subscriber: danielkiss. Aug 30 2020, 1:45 PM

mravishankar mentioned this in D89448: [MLIR][mlir-spirv-cpu-runner] A pass to emulate a call to kernel in LLVM. Oct 14 2020, 10:56 PM

Coming around to landing this change now. This was causing some issues with builds (internal builds within Google), and I figured a better placement of the files works better. See this patch: https://reviews.llvm.org/D89448 for the changes. I will commit that change instead. The dependent change should be modified to suit this, though. I can take care of that as well.

This revision is now accepted and ready to land. Oct 14 2020, 11:01 PM

Herald added subscribers: rdzhabarov, tatianashp. Oct 14 2020, 11:01 PM

Moved files around as suggested by @mravishankar.

In D86112#2331714, @mravishankar wrote:

Coming around to landing this change now. This was causing some issues with builds (internal builds within Google), and I figured a better placement of the files works better. See this patch: https://reviews.llvm.org/D89448 for the changes. I will commit that change instead. The dependent change should be modified to suit this, though. I can take care of that as well.

Great, thanks! I will update the dependent runner patch accordingly.

Harbormaster completed remote builds in B75228: Diff 298474. Oct 15 2020, 3:15 PM

Updated struct types according to https://reviews.llvm.org/D87206.

Harbormaster completed remote builds in B75308: Diff 298625. Oct 16 2020, 7:17 AM

@mravishankar So this patch should be good to land then?

This revision was landed with ongoing or failed builds. Oct 26 2020, 5:14 AM

Closed by commit rGcae4067ec1cd: [MLIR][mlir-spirv-cpu-runner] A pass to emulate a call to kernel in LLVM (authored by georgemitenkov, committed by antiagainst).

This revision was automatically updated to reflect the committed changes.

antiagainst added a commit: rGcae4067ec1cd: [MLIR][mlir-spirv-cpu-runner] A pass to emulate a call to kernel in LLVM.

Revision Contents

Path                                                              Size
mlir/include/mlir/Conversion/GPUCommon/GPUCommonPass.h            6 lines
mlir/include/mlir/Conversion/Passes.td                            10 lines
mlir/include/mlir/Dialect/SPIRV/Passes.h                          5 lines
mlir/include/mlir/Dialect/SPIRV/Passes.td                         7 lines
mlir/lib/Conversion/GPUCommon/CMakeLists.txt                      1 line
mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToLLVMCalls.cpp    300 lines
mlir/lib/Dialect/SPIRV/Transforms/CMakeLists.txt                  1 line
mlir/lib/Dialect/SPIRV/Transforms/EncodeDescriptorSetsPass.cpp    75 lines
mlir/test/Conversion/GPUCommon/emulate-kernel-call.mlir           79 lines
mlir/test/Conversion/GPUCommon/gpu-launch-to-std-call.mlir        52 lines
mlir/test/Dialect/SPIRV/Transforms/descriptor-sets-encoding.mlir  29 lines

Diff 286155

mlir/include/mlir/Conversion/GPUCommon/GPUCommonPass.h

[32 lines not shown]
 } // namespace LLVM

 using OwnedBlob = std::unique_ptr<std::vector<char>>;
 using BlobGenerator =
     std::function<OwnedBlob(const std::string &, Location, StringRef)>;
 using LoweringCallback = std::function<std::unique_ptr<llvm::Module>(
     Operation *, llvm::LLVMContext &, StringRef)>;

+/// Creates a pass to convert `gpu.launch_func` to a standard function call.
+std::unique_ptr<OperationPass<ModuleOp>>
+createGPULaunchFuncToStandardCallPass();
+/// Creates a pass to emulate a kernel function call in LLVM dialect.
+std::unique_ptr<OperationPass<ModuleOp>> createEmulateKernelCallInLLVMPass();
+
 /// Creates a pass to convert a gpu.launch_func operation into a sequence of
 /// GPU runtime calls.
 ///
 /// This pass does not generate code to call GPU runtime APIs directly but
 /// instead uses a small wrapper library that exports a stable and conveniently
 /// typed ABI on top of GPU runtimes such as CUDA or ROCm (HIP).
 std::unique_ptr<OperationPass<ModuleOp>>
 createGpuToLLVMConversionPass(StringRef gpuBinaryAnnotation = "");
[37 lines not shown]

mlir/include/mlir/Conversion/Passes.td

[85 lines not shown]
 def GpuToLLVMConversionPass : Pass<"gpu-to-llvm", "ModuleOp"> {
   let summary = "Convert GPU dialect to LLVM dialect with GPU runtime calls";
   let constructor = "mlir::createGpuToLLVMConversionPass()";
   let options = [
     Option<"gpuBinaryAnnotation", "gpu-binary-annotation", "std::string",
            "", "Annotation attribute string for GPU binary">,
   ];
 }

+def GPULaunchFuncToStandardCall : Pass<"gpu-launch-to-std-call", "ModuleOp"> {
+  let summary = "Convert gpu.launch_func to a standard dialect call";
+  let constructor = "mlir::createGPULaunchFuncToStandardCallPass()";
+}
+
+def EmulateKernelCallInLLVM : Pass<"emulate-kernel-llvm-call", "ModuleOp"> {
+  let summary = "Emulate a kernel function call in LLVM dialect";
+  let constructor = "mlir::createEmulateKernelCallInLLVMPass()";
+}
+
 //===----------------------------------------------------------------------===//
 // GPUToNVVM
 //===----------------------------------------------------------------------===//

 def ConvertGpuOpsToNVVMOps : Pass<"convert-gpu-to-nvvm", "gpu::GPUModuleOp"> {
   let summary = "Generate NVVM operations for gpu operations";
   let constructor = "mlir::createLowerGpuOpsToNVVMOpsPass()";
   let options = [
[235 lines not shown]

mlir/include/mlir/Dialect/SPIRV/Passes.h

[20 lines not shown]
 class ModuleOp;
 /// Creates a module pass that converts composite types used by objects in the
 /// StorageBuffer, PhysicalStorageBuffer, Uniform, and PushConstant storage
 /// classes with layout information.
 /// Right now this pass only supports Vulkan layout rules.
 std::unique_ptr<OperationPass<mlir::ModuleOp>>
 createDecorateSPIRVCompositeTypeLayoutPass();

+/// Creates a module pass that encodes bind attribute of each
+/// `spv.globalVariable` into its symbolic name.
+std::unique_ptr<OperationPass<spirv::ModuleOp>>
+createEncodeDescriptorSetsPass();
+
 /// Creates an operation pass that deduces and attaches the minimal version/
 /// capabilities/extensions requirements for spv.module ops.
 /// For each spv.module op, this pass requires a `spv.target_env` attribute on
 /// it or an enclosing module-like op to drive the deduction. The reason is
 /// that an op can be enabled by multiple extensions/capabilities. So we need
 /// to know which one to pick. `spv.target_env` gives the hard limit as for
 /// what the target environment can support; this pass deduces what are
 /// actually needed for a specific spv.module op.
[28 lines not shown]

mlir/include/mlir/Dialect/SPIRV/Passes.td

[11 lines not shown]
 include "mlir/Pass/PassBase.td"

 def SPIRVCompositeTypeLayout
     : Pass<"decorate-spirv-composite-type-layout", "ModuleOp"> {
   let summary = "Decorate SPIR-V composite type with layout info";
   let constructor = "mlir::spirv::createDecorateSPIRVCompositeTypeLayoutPass()";
 }

+def SPIRVEncodeDescriptorSets
+    : Pass<"spirv-encode-descriptor-sets", "spirv::ModuleOp"> {
+  let summary = "Encode `spv.globalVariable`'s' bind attribute into its "
+                "symbolic name";
+  let constructor = "mlir::spirv::createEncodeDescriptorSetsPass()";
+}
+
 def SPIRVLowerABIAttributes : Pass<"spirv-lower-abi-attrs", "spirv::ModuleOp"> {
   let summary = "Decorate SPIR-V composite type with layout info";
   let constructor = "mlir::spirv::createLowerABIAttributesPass()";
 }

 def SPIRVRewriteInsertsPass : Pass<"spirv-rewrite-inserts", "spirv::ModuleOp"> {
   let summary = "Rewrite sequential chains of spv.CompositeInsert operations into "
                 "spv.CompositeConstruct operations";
[10 lines not shown]

mlir/lib/Conversion/GPUCommon/CMakeLists.txt

[9 lines not shown]
 if (MLIR_ROCM_CONVERSIONS_ENABLED)
   set(AMDGPU_LIBS
     AMDGPUCodeGen
     AMDGPUDesc
     AMDGPUInfo
   )
 endif()

 add_mlir_conversion_library(MLIRGPUToGPURuntimeTransforms
+  ConvertLaunchFuncToLLVMCalls.cpp
   ConvertLaunchFuncToRuntimeCalls.cpp
   ConvertKernelFuncToBlob.cpp

   DEPENDS
   MLIRConversionPassIncGen
   intrinsics_gen

   LINK_COMPONENTS
[13 lines not shown]

mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToLLVMCalls.cpp

This file was added.

//===- ConvertLaunchFuncToLLVMCalls.cpp - MLIR GPU launch to LLVM pass ----===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements passes to convert `gpu.launch_func` op into a sequence
// of LLVM calls that emulate the host and device sides.
//
//===----------------------------------------------------------------------===//

#include "../PassDetail.h"
#include "mlir/Conversion/GPUCommon/GPUCommonPass.h"
#include "mlir/Conversion/StandardToLLVM/ConvertStandardToLLVM.h"
#include "mlir/Dialect/GPU/GPUDialect.h"
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Dialect/StandardOps/IR/Ops.h"
#include "mlir/IR/Module.h"
#include "mlir/IR/SymbolTable.h"
#include "mlir/Transforms/DialectConversion.h"
#include "llvm/Support/FormatVariadic.h"

using namespace mlir;

static constexpr const char kLLVMLaunchName[] = "__llvm_launch";
static constexpr const char kStandardLaunchName[] = "__std_launch";
static constexpr const unsigned memRefDescriptorNumElements = 5;
static constexpr const unsigned numElementsOffset = 3;

//===----------------------------------------------------------------------===//
// Utility functions
//===----------------------------------------------------------------------===//

/// Erases redundant functions from the module.
static void clearFunctionDefinitions(ModuleOp module) {
  for (auto funcOp :
       llvm::make_early_inc_range(module.getOps<LLVM::LLVMFuncOp>())) {
    StringRef name = funcOp.getName();
    if (name == kStandardLaunchName ||
        name ==
            StringRef(llvm::formatv("_mlir_ciface_{0}", kStandardLaunchName)))
      funcOp.erase();
  }
}

/// Emits code to copy the given number of bytes from src to dst pointers.
static void copyData(Location loc, Value dst, Value src, Value size,
                     OpBuilder &builder) {
  MLIRContext *context = builder.getContext();
  auto llvmI1Type = LLVM::LLVMType::getInt1Ty(context);
  Value isVolatile = builder.create<LLVM::ConstantOp>(
      loc, llvmI1Type, builder.getBoolAttr(false));
  builder.create<LLVM::MemcpyOp>(loc, dst, src, size, isVolatile);
}

/// Returns true if the function call is ex-`gpu.launch_func` op.
static bool isLaunchCallOp(LLVM::CallOp callOp) {
  return callOp.callee() && callOp.callee().getValue() == kStandardLaunchName;
}

/// Verifies if the module contains exactly one nested module with exactly one
/// (kernel) function.
static bool hasOneNestedModuleAndOneKernel(ModuleOp module) {
  bool hasOneNestedModule = false;
  auto walkResult =
      module.walk([&hasOneNestedModule](ModuleOp moduleOp) -> WalkResult {
        // Interrupt if more than one nested module has been found.
        if (moduleOp.getParentOp() && hasOneNestedModule)
          return WalkResult::interrupt();

        // If there is a parent operation, it means we walked to a nested
        // module. Verify there is only a single function in it.
        if (moduleOp.getParentOp()) {
          auto funcs = moduleOp.getOps<LLVM::LLVMFuncOp>();
          SmallVector<LLVM::LLVMFuncOp, 4> funcVector(funcs.begin(),
                                                      funcs.end());
          if (funcVector.size() != 1)
            return WalkResult::interrupt();
          hasOneNestedModule = true;
        }
        // Otherwise, advance.
        return WalkResult::advance();
      });

  if (walkResult.wasInterrupted())
    return false;
  return hasOneNestedModule;
}

static LogicalResult renameKernel(ModuleOp kernelModule) {
  kernelModule.walk([&](LLVM::LLVMFuncOp funcOp) {
    SymbolTable::setSymbolName(funcOp, kLLVMLaunchName);
  });
  return success();
}

//===----------------------------------------------------------------------===//
// Conversion patterns
//===----------------------------------------------------------------------===//

namespace {
class GPULaunchFuncToStandardCall
    : public GPULaunchFuncToStandardCallBase<GPULaunchFuncToStandardCall> {
public:
  void runOnOperation() override;
};

class EmulateKernelCallInLLVM
    : public EmulateKernelCallInLLVMBase<EmulateKernelCallInLLVM> {
				mravishankar (Done): Correct me if I am wrong, but it seems like the assumption here is that the structure is actually

				    module {
				      spv.module {
				        spv.func @foo
				      }
				      module @kernel_module_name {gpu.container_module} {
				        gpu.func @foo
				      }
				      gpu.launch_func @kernel_module_name::foo
				    }

				This is the case now, but that could change. We should really be looking at the `spv.module` directly. There are some steps to get there, though:
				- You probably need to add `SymAttr` to `spv.module`. As you mentioned, when converting the gpu module to a SPIR-V module, we can use the same symbol name.
				- `spv.EntryPoint` is actually supposed to have a list of `spv.globalVariables` that are accessed within the entry point. We don't emit it currently because it isn't needed by the Vulkan side. But that is useful here to get the "arguments". The `spv.EntryPoint` is added by the `LowerABIAttributes` pass. Probably need to do it when lowering a `func` to `spv.func` where you have all the information.

				This is a fairly big change though. An intermediate stage would be: given that you check for a single `spv.Module` and a single entry point function, under that assumption all non-builtin `spv.globalVariables` are kernel arguments, and they have the `[binding, descriptor_set]` as `[0, 0], [1, 0], [2, 0]`... and so on. So argument 0 -> `[0, 0]`, argument 1 -> `[1, 0]`, etc. Use that to define/find the symbol that is used to do the copy. Does this make sense?

				georgemitenkov (Author, Done): Actually the current structure was

				    module {gpu.container_module} {
				      module {
				        llvm.func() @foo
				      }
				      gpu.module @kernel_module_name {
				        gpu.func @foo
				      }
				      gpu.launch_func @kernel_module_name::foo
				    }

				because we first convert SPIR-V to LLVM for easier global variable type deduction. But dealing with the `gpu.launch_func` op before makes more sense as we do not lose `spv.EntryPoint` and binding/descriptor set info.

				> Probably need to do it when lowering a func to spv.func where you have all the information.

				I think that this can be done within `LowerABIAttributes` where we convert `spv.func(args)` to `spv.func()`? Then the ideal flow would be:
				- Look up the needed SPIR-V module by `__spv_{kernel_module_name}`.
				- Look up an entry point for this kernel and get the list of `spv.globalVariable`s associated with this kernel. I think these globals are named by

				    auto name = funcOp.getName().str() + "_arg_" + std::to_string(argIndex);

				  I think it will be better to append the module name in the beginning as well?
				- Create the `llvm.global` (need to convert the type here) with the name:

				    name + "_binding0_descriptor_set_1"

				The same naming convention can be used when we lower SPIR-V globals to LLVM. This would remove the need for `EncodeDescriptorSetsPass`, because it can be done in `ConvertSPIRVToLLVM`?

				mravishankar (Done): Thanks for clarifying. This makes it clear what's happening. The `LowerABIAttributes` pass already does what you want for the `spv.globalVariable` (here). I think encoding the name as you mentioned seems fine to me. Removing an extra pass is good as well.
				public:
				void runOnOperation() override;
				};

				rriddle (Done): nit: Please just use `module.sym_name()` instead.
				/// This pattern prepares the conversion of kernel launch to LLVM dialect. It:
				/// - Declares a placeholder function with the same argument types and a void
				/// result type.
				/// - Replaces `gpu.launch_func` with a call to placeholder.
				class KernelCallPattern : public OpRewritePattern<gpu::LaunchFuncOp> {
				public:
				using OpRewritePattern<gpu::LaunchFuncOp>::OpRewritePattern;

				LogicalResult matchAndRewrite(gpu::LaunchFuncOp launchOp,
				PatternRewriter &rewriter) const override {
				MLIRContext *context = rewriter.getContext();
				auto module = launchOp.getParentOfType<ModuleOp>();
				rriddle (Not Done): You can set an attribute via `module.sym_name(newName)`.
				georgemitenkov (Author, Not Done): I am not sure this can be done like that? Do you mean using `sym_nameAttr()` and passing a string attribute instead?
				OpBuilder::InsertionGuard guard(rewriter);
				rewriter.setInsertionPointToStart(module.getBody());

				// Declare a standard function that will keep kernel memref arguments.
				SmallVector<Type, 8> gpuLaunchTypes(launchOp.getOperandTypes());
				SmallVector<Type, 4> kernelOperands(gpuLaunchTypes.begin() +
				gpu::LaunchOp::kNumConfigOperands,
				gpuLaunchTypes.end());
				auto kernelFunc =
				rewriter.create<FuncOp>(rewriter.getUnknownLoc(), kStandardLaunchName,
				FunctionType::get(kernelOperands, {}, context));

				// Replace `gpu.launch_func` with a standard function call.
				rewriter.setInsertionPoint(launchOp);
				SmallVector<Value, 4> operands;
				for (unsigned i = 0, e = launchOp.getNumKernelOperands(); i < e; ++i)
				operands.push_back(launchOp.getKernelOperand(i));
				rewriter.replaceOpWithNewOp<CallOp>(launchOp, kernelFunc, operands);
				return success();
				mravishankar (Done): Can we also append the `gpu.container_module` name and the kernel function name to avoid conflicts?
				}
				};

				/// This pattern emulates a call to kernel in LLVM dialect. For that, we
				/// copy the data to the global variable (emulating device side), call
				/// the kernel as a normal void LLVM function, and copy the data back
				/// (emulating host side).
				class StandardCallPattern : public OpRewritePattern<LLVM::CallOp> {
				public:
				using OpRewritePattern<LLVM::CallOp>::OpRewritePattern;

				LogicalResult matchAndRewrite(LLVM::CallOp op,
				PatternRewriter &rewriter) const override {
				MLIRContext *context = rewriter.getContext();
				auto module = op.getParentOfType<ModuleOp>();
				OpBuilder::InsertionGuard guard(rewriter);
				mravishankar (Done): Nit: Just add `{ ... }` around statements that span multiple lines...
				rewriter.setInsertionPointToStart(module.getBody());

				// Declare kernel function in the main module so that it later can be linked
				// with its definition from the kernel module. We know that the kernel
				// function would have no arguments and the data is passed via global
				// variables.
				auto kernelFunc = rewriter.create<LLVM::LLVMFuncOp>(
				rewriter.getUnknownLoc(), kLLVMLaunchName,
				LLVM::LLVMType::getFunctionTy(LLVM::LLVMType::getVoidTy(context),
				ArrayRef<LLVM::LLVMType>(),
				/*isVarArg=*/false));
				rewriter.setInsertionPoint(op);
				Location loc = op.getLoc();

				// Find ex-GPU module (now a nested LLVM module). Rename the kernel function
				mravishankar (Done): Use `OpBuilder::InsertionGuard(rewriter)`. It will reset the insertion point after the scope.
				// to adhere to convention.
				ModuleOp kernelModule;
				auto walkResult = module.walk([&](ModuleOp nested) -> WalkResult {
				if (nested.getParentOp()) {
				mravishankar (Done): The GPU Dialect adds the `gpu.container_module` to identify the kernel side. Further, the LLVM -> SPIR-V lowering preserves the `spv.target_env` attribute. It's better to check for one or both of these attributes to be more targeted in what is treated as a kernel module?

				georgemitenkov (Author, Done): Thanks! I am not sure how `gpu.container_module` helps in general, because it only identifies that it contains a kernel module, but may also have other modules in it? In this case, we can check for the module that is `!= gpu.container_module` as we consider only 2 modules. I think reusing `spv.target_env` and `spv.module`'s attributes is a great way, but for that we have to have a way of passing this info in the SPIR-V to LLVM conversion. Something like

				    module attributes { "kernel" } { ...

				Also, I am wondering why the GPU -> SPIR-V module conversion doesn't pass the symbolic name? If we had a SPIR-V module with the name "spv.{gpu module name here}", for example, we could find the kernel straight away by looking it up in the symbol table?

				mravishankar (Done):
				> Also, I am wondering why GPU-> SPIR-V module conversion doesn't pass the symbolic name? If we would have a SPIR-V module with the name "spv.{gpu module name here}" for example, we can find the kernel straight away by looking up in symbol table?

				Good point. We didn't need it, but I totally agree that the spv.module should have a symbol name as well. (See comment above).
				kernelModule = nested;
				renameKernel(kernelModule);
				return WalkResult::interrupt();
				}
				return WalkResult::advance();
				});
				if (!walkResult.wasInterrupted())
				return failure();

				// Kernel's memref arguments are converted into MemRefDescriptors in LLVM.
				// Walk over pointers to the allocated data and the number of elements to
				// calculate the number of bytes to copy.
				SmallVector<ValueRange, 4> copyTriples;
				for (unsigned i = 0, e = op.getNumOperands(); i < e;
				i += memRefDescriptorNumElements) {
				Value src = op.getOperand(i);

				// Calculate a size of buffer in bytes. Support only integers for now for
				// easier calculation.
				auto elementType =
				src.getType().cast<LLVM::LLVMPointerType>().getPointerElementTy();
				auto integerType = elementType.dyn_cast<LLVM::LLVMIntegerType>();
				if (!integerType)
				return failure();
				unsigned elementSizeInBytes = integerType.getBitWidth() / 8;
				auto llvmI64Type = LLVM::LLVMType::getInt64Ty(context);
				Value constantSize = rewriter.create<LLVM::ConstantOp>(
				loc, llvmI64Type, rewriter.getI64IntegerAttr(elementSizeInBytes));
				Value numElements = op.getOperand(i + numElementsOffset);
				Value size = rewriter.create<LLVM::MulOp>(loc, numElements, constantSize);

				// Look up the global variable from the kernel module. For each, create a
				// global variable that will be linked with the global from the kernel
				// module. We follow convention taken in EncodeDescriptorSetsPass to label
				// globals as __var__{i}, where i is incremented sequentially.
				std::string symName =
				"__var__" + std::to_string(i / memRefDescriptorNumElements);
				auto extrenalGlobal = kernelModule.lookupSymbol<LLVM::GlobalOp>(symName);
				rewriter.setInsertionPointToStart(module.getBody());
				auto global = rewriter.create<LLVM::GlobalOp>(
				loc, extrenalGlobal.type().cast<LLVM::LLVMType>(),
				/*isConstant=*/false, LLVM::Linkage::Linkonce, symName, Attribute());
				rewriter.setInsertionPoint(op);

				// Get the address of created global. Then copy data from src pointer to
				// dst pointer.
				Value dst = rewriter.create<LLVM::AddressOfOp>(loc, global);
				copyData(loc, dst, src, size, rewriter);

				// Store src and dst pointers to reuse after the kernel call.
				copyTriples.push_back({dst, src, size});
				}

				// Emulate the kernel call and then copy the data back from global variables
				// to buffers.
				rewriter.replaceOpWithNewOp<LLVM::CallOp>(op, kernelFunc,
				ArrayRef<Value>());
				for (auto triple : copyTriples) {
				copyData(loc, triple[1], triple[0], triple[2], rewriter);
				}
				return success();
				}
				};
				} // namespace

				void GPULaunchFuncToStandardCall::runOnOperation() {
				ModuleOp module = getOperation();
				auto *context = module.getContext();
				OwningRewritePatternList patterns;
				patterns.insert<KernelCallPattern>(context);

				// Convert `gpu.launch_func` to standard function call.
				ConversionTarget target(*context);
				rriddle (Done): nit: Spell out auto.
				target.addIllegalOp<gpu::LaunchFuncOp>();
				target.addLegalOp<CallOp>();
				target.addLegalOp<FuncOp>();
				if (failed(applyPartialConversion(module, target, patterns))) {
				signalPassFailure();
				}

				// Erase GPU module.
				for (auto gpuModule :
				llvm::make_early_inc_range(getOperation().getOps<gpu::GPUModuleOp>()))
				gpuModule.erase();
				}

				void EmulateKernelCallInLLVM::runOnOperation() {
				ModuleOp module = getOperation();

				// Check if the module structure can be supported by the current conversion.
				// For now, only 2 modules (main and kernel) are supported.
				if (!hasOneNestedModuleAndOneKernel(module)) {
				module.emitError("Module should contain exactly one nested module");
				return signalPassFailure();
				}

				// Emulate kernel call.
				auto *context = module.getContext();
				OwningRewritePatternList patterns;
				patterns.insert<StandardCallPattern>(context);
				ConversionTarget target(*context);
				target.addLegalDialect<LLVM::LLVMDialect>();
				target.addDynamicallyLegalOp<LLVM::CallOp>(
				[&](LLVM::CallOp op) { return !isLaunchCallOp(op); });
				if (failed(applyPartialConversion(module, target, patterns))) {
				signalPassFailure();
				}

				// Remove unused functions that were generated during the previous pass.
				clearFunctionDefinitions(module);
				}
				rriddle (Done): nit: Drop trivial braces.

				std::unique_ptr<mlir::OperationPass<mlir::ModuleOp>>
				mlir::createGPULaunchFuncToStandardCallPass() {
				return std::make_unique<GPULaunchFuncToStandardCall>();
				}

				std::unique_ptr<mlir::OperationPass<mlir::ModuleOp>>
				mlir::createEmulateKernelCallInLLVMPass() {
				return std::make_unique<EmulateKernelCallInLLVM>();
				}
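
The copy-in, call, copy-out sequence that StandardCallPattern emits can be illustrated outside MLIR. The following is only a minimal standalone sketch, not the code the pass generates: `deviceVar0`, `llvmLaunch`, and `emulateKernelCall` are hypothetical names, and the kernel body (adding 5 to every element) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the kernel-side global `__var__0`: an array of
// six i32 values, matching the shape used in the tests of this revision.
static int32_t deviceVar0[6];

// Hypothetical stand-in for the renamed kernel. It takes no arguments and
// only touches the global; the "+ 5" body is an assumption for illustration.
static void llvmLaunch() {
  for (int32_t &value : deviceVar0)
    value += 5;
}

// Emulates the launch: copy host -> global, call the kernel as a plain void
// function, then copy global -> host.
static void emulateKernelCall(int32_t *hostBuffer, std::size_t numElements) {
  assert(numElements <= 6 && "buffer must fit in the emulated global");
  // Size in bytes = number of elements * element size in bytes.
  std::size_t sizeInBytes = numElements * sizeof(int32_t);
  std::memcpy(deviceVar0, hostBuffer, sizeInBytes);
  llvmLaunch();
  std::memcpy(hostBuffer, deviceVar0, sizeInBytes);
}
```

The sketch keeps the same order of operations as the pattern: the destination and source of the first copy are swapped for the second one, and the byte count is reused.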

mlir/lib/Dialect/SPIRV/Transforms/CMakeLists.txt

				  add_mlir_dialect_library(MLIRSPIRVTransforms
				  DecorateSPIRVCompositeTypeLayoutPass.cpp
				+ EncodeDescriptorSetsPass.cpp
				  LowerABIAttributesPass.cpp
				  RewriteInsertsPass.cpp
				  UpdateVCEPass.cpp

				  ADDITIONAL_HEADER_DIRS
				  ${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/SPIRV

				  DEPENDS
				  MLIRSPIRVPassIncGen

				  LINK_LIBS PUBLIC
				  MLIRPass
				  MLIRSPIRV
				  )

mlir/lib/Dialect/SPIRV/Transforms/EncodeDescriptorSetsPass.cpp

This file was added.

				//===- EncodeDescriptorSetsPass.cpp ---------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass encodes set and binding of the global variable into its symbolic
				// name.
				//
				//===----------------------------------------------------------------------===//

				#include "PassDetail.h"
				#include "mlir/Dialect/SPIRV/Passes.h"
				#include "mlir/Dialect/SPIRV/SPIRVDialect.h"
				#include "mlir/Dialect/SPIRV/SPIRVLowering.h"
				#include "mlir/Dialect/SPIRV/SPIRVOps.h"
				#include "mlir/IR/SymbolTable.h"
				#include "mlir/Transforms/DialectConversion.h"
				#include "llvm/Support/FormatVariadic.h"

				#define BINDING_NAME \
				llvm::convertToSnakeFromCamelCase( \
				stringifyDecoration(spirv::Decoration::Binding))
				#define DESCRIPTOR_SET_NAME \
				llvm::convertToSnakeFromCamelCase( \
				stringifyDecoration(spirv::Decoration::DescriptorSet))

				using namespace mlir;

				/// Returns true if the given global variable has both a descriptor set number
				/// and a binding number.
				static bool hasDescriptorSetAndBinding(spirv::GlobalVariableOp op) {
				IntegerAttr descriptorSet =
				op.getAttrOfType<IntegerAttr>(DESCRIPTOR_SET_NAME);
				IntegerAttr binding = op.getAttrOfType<IntegerAttr>(BINDING_NAME);
				return descriptorSet && binding;
				}

				namespace {

				class EncodeDescriptorSetsPass
				: public SPIRVEncodeDescriptorSetsBase<EncodeDescriptorSetsPass> {
				public:
				void runOnOperation() override {
				spirv::ModuleOp module = getOperation();
				unsigned i = 0;

				// Walk over all `spv.globalVariable` ops within the module. If the variable
				// has a bind attribute, update the symbol and all symbol's uses.
				module.walk([&](spirv::GlobalVariableOp op) {
				if (hasDescriptorSetAndBinding(op)) {
				// For now, we do not need to store bind attribute info. May need
				// to revisit in the future.
				// TODO: encode the data from the descriptor set and binding
				// numbers in the global variable.
				std::string name = llvm::formatv("__var__{0}", i++);
				if (failed(SymbolTable::replaceAllSymbolUses(op, name, module)))
				return signalPassFailure();

				// Set new symbol name and remove unused attributes.
				SymbolTable::setSymbolName(op, name);
				op.removeAttr(DESCRIPTOR_SET_NAME);
				op.removeAttr(BINDING_NAME);
				}
				});
				}
				};
				} // namespace

				std::unique_ptr<OperationPass<spirv::ModuleOp>>
				mlir::spirv::createEncodeDescriptorSetsPass() {
				return std::make_unique<EncodeDescriptorSetsPass>();
				}
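
The renaming scheme of this pass can be modelled without any MLIR types. The sketch below is an illustration under assumptions, not the pass itself: `GlobalVar` and `encodeDescriptorSets` are hypothetical names, and two booleans stand in for the presence of the descriptor set and binding attributes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a `spv.globalVariable`: a symbol name plus flags
// recording whether descriptor set and binding attributes are present.
struct GlobalVar {
  std::string name;
  bool hasDescriptorSet;
  bool hasBinding;
};

// Renames every variable carrying both attributes to `__var__{i}`, with `i`
// incremented sequentially, and drops the attributes. Other variables are
// left untouched.
static void encodeDescriptorSets(std::vector<GlobalVar> &globals) {
  unsigned i = 0;
  for (GlobalVar &global : globals) {
    if (!global.hasDescriptorSet || !global.hasBinding)
      continue;
    global.name = "__var__" + std::to_string(i++);
    global.hasDescriptorSet = false;
    global.hasBinding = false;
  }
}
```

Note that, as in the pass, the counter depends only on the order in which bound variables are visited, so the actual descriptor set and binding numbers are not encoded in the name.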

mlir/test/Conversion/GPUCommon/emulate-kernel-call.mlir

This file was added.

				// RUN: mlir-opt --emulate-kernel-llvm-call %s | FileCheck %s

				// CHECK: module
				// CHECK: llvm.mlir.global linkonce @__var__0() : !llvm.struct<packed (array<6 x i32>)>
				// CHECK-LABEL: @__llvm_launch
				// CHECK: module
				// CHECK-LABEL: @__llvm_launch

				// CHECK-LABEL: @main
				// CHECK: %[[ELEMENT_SIZE:.*]] = llvm.mlir.constant(4 : i64) : !llvm.i64
				// CHECK: %[[SIZE:.*]] = llvm.mul %{{.*}}, %[[ELEMENT_SIZE]] : !llvm.i64
				// CHECK: %[[DST:.*]] = llvm.mlir.addressof @__var__0 : !llvm.ptr<struct<packed (array<6 x i32>)>>
				// CHECK: llvm.mlir.constant(false) : !llvm.i1
				// CHECK: "llvm.intr.memcpy"(%[[DST]], %{{.*}}, %[[SIZE]], %{{.*}}) : (!llvm.ptr<struct<packed (array<6 x i32>)>>, !llvm.ptr<i32>, !llvm.i64, !llvm.i1) -> ()
				// CHECK: llvm.call @__llvm_launch() : () -> ()
				// CHECK: llvm.mlir.constant(false) : !llvm.i1
				// CHECK: "llvm.intr.memcpy"(%{{.*}}, %[[DST]], %[[SIZE]], %{{.*}}) : (!llvm.ptr<i32>, !llvm.ptr<struct<packed (array<6 x i32>)>>, !llvm.i64, !llvm.i1) -> ()

				module attributes {gpu.container_module, spv.target_env = #spv.target_env<#spv.vce<v1.0, [Shader], [SPV_KHR_variable_pointers]>, {max_compute_workgroup_invocations = 128 : i32, max_compute_workgroup_size = dense<[128, 128, 64]> : vector<3xi32>}>} {
				llvm.func @__std_launch(%arg0: !llvm.ptr<i32>, %arg1: !llvm.ptr<i32>, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64) {
				llvm.return
				}
				llvm.func @_mlir_ciface___std_launch(!llvm.ptr<struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>>)
				module {
				llvm.mlir.global external @__var__0() : !llvm.struct<packed (array<6 x i32>)>
				llvm.func @simple() {
				llvm.return
				}
				}
				llvm.func @main() {
				%0 = llvm.mlir.constant(6 : index) : !llvm.i64
				%1 = llvm.mlir.null : !llvm.ptr<i32>
				%2 = llvm.mlir.constant(1 : index) : !llvm.i64
				%3 = llvm.getelementptr %1[%2] : (!llvm.ptr<i32>, !llvm.i64) -> !llvm.ptr<i32>
				%4 = llvm.ptrtoint %3 : !llvm.ptr<i32> to !llvm.i64
				%5 = llvm.mul %0, %4 : !llvm.i64
				%6 = llvm.mlir.constant(1 : index) : !llvm.i64
				%7 = llvm.call @malloc(%5) : (!llvm.i64) -> !llvm.ptr<i8>
				%8 = llvm.bitcast %7 : !llvm.ptr<i8> to !llvm.ptr<i32>
				%9 = llvm.mlir.undef : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				%10 = llvm.insertvalue %8, %9[0] : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				%11 = llvm.insertvalue %8, %10[1] : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				%12 = llvm.mlir.constant(0 : index) : !llvm.i64
				%13 = llvm.insertvalue %12, %11[2] : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				%14 = llvm.mlir.constant(1 : index) : !llvm.i64
				%15 = llvm.insertvalue %0, %13[3, 0] : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				%16 = llvm.insertvalue %14, %15[4, 0] : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				%17 = llvm.mlir.constant(4 : i32) : !llvm.i32
				%18 = llvm.bitcast %16 : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)> to !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				%19 = llvm.extractvalue %18[0] : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				%20 = llvm.extractvalue %18[1] : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				%21 = llvm.extractvalue %18[2] : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				%22 = llvm.extractvalue %18[3, 0] : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				%23 = llvm.extractvalue %18[4, 0] : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				llvm.call @fillI32Buffer(%19, %20, %21, %22, %23, %17) : (!llvm.ptr<i32>, !llvm.ptr<i32>, !llvm.i64, !llvm.i64, !llvm.i64, !llvm.i32) -> ()
				%24 = llvm.mlir.constant(1 : index) : !llvm.i64
				%25 = llvm.extractvalue %16[0] : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				%26 = llvm.extractvalue %16[1] : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				%27 = llvm.extractvalue %16[2] : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				%28 = llvm.extractvalue %16[3, 0] : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				%29 = llvm.extractvalue %16[4, 0] : !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>
				llvm.call @__std_launch(%25, %26, %27, %28, %29) : (!llvm.ptr<i32>, !llvm.ptr<i32>, !llvm.i64, !llvm.i64, !llvm.i64) -> ()
				%30 = llvm.mlir.constant(1 : index) : !llvm.i64
				%31 = llvm.alloca %30 x !llvm.struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)> : (!llvm.i64) -> !llvm.ptr<struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>>
				llvm.store %16, %31 : !llvm.ptr<struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>>
				%32 = llvm.bitcast %31 : !llvm.ptr<struct<(ptr<i32>, ptr<i32>, i64, array<1 x i64>, array<1 x i64>)>> to !llvm.ptr<i8>
				%33 = llvm.mlir.constant(1 : i64) : !llvm.i64
				%34 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
				%35 = llvm.insertvalue %33, %34[0] : !llvm.struct<(i64, ptr<i8>)>
				%36 = llvm.insertvalue %32, %35[1] : !llvm.struct<(i64, ptr<i8>)>
				%37 = llvm.extractvalue %36[0] : !llvm.struct<(i64, ptr<i8>)>
				%38 = llvm.extractvalue %36[1] : !llvm.struct<(i64, ptr<i8>)>
				llvm.call @print_memref_i32(%37, %38) : (!llvm.i64, !llvm.ptr<i8>) -> ()
				llvm.return
				}

				llvm.func @fillI32Buffer(%arg0: !llvm.ptr<i32>, %arg1: !llvm.ptr<i32>, %arg2: !llvm.i64, %arg3: !llvm.i64, %arg4: !llvm.i64, %arg5: !llvm.i32)
				llvm.func @print_memref_i32(%arg0: !llvm.i64, %arg1: !llvm.ptr<i8>)
				}
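
The test above calls `@__std_launch` with five operands for one rank-1 memref, and the pattern walks such operands one descriptor at a time to derive a global name and a byte count. The following standalone sketch mirrors that walk under stated assumptions: the constant values (5 operands per descriptor, number of elements at offset 3) are inferred from this test rather than taken from the definitions in the patch, and `CopyPlan` and `planCopies` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed layout of a flattened rank-1 memref descriptor: allocated pointer,
// aligned pointer, offset, size, stride. Both values are inferred from the
// test, not copied from the patch.
constexpr std::size_t kDescriptorNumElements = 5;
constexpr std::size_t kNumElementsOffset = 3;

// Hypothetical record of one planned copy: target global and size in bytes.
struct CopyPlan {
  std::string globalName;
  uint64_t sizeInBytes;
};

// Walks the flattened operands one descriptor at a time. Pointers are not
// needed for the computation, so operands are modelled as plain integers.
static std::vector<CopyPlan> planCopies(const std::vector<uint64_t> &operands,
                                        unsigned elementBitWidth) {
  assert(operands.size() % kDescriptorNumElements == 0);
  std::vector<CopyPlan> plans;
  for (std::size_t i = 0; i < operands.size(); i += kDescriptorNumElements) {
    uint64_t numElements = operands[i + kNumElementsOffset];
    uint64_t elementSizeInBytes = elementBitWidth / 8;
    plans.push_back({"__var__" + std::to_string(i / kDescriptorNumElements),
                     numElements * elementSizeInBytes});
  }
  return plans;
}
```

For a single `memref<6xi32>` this yields the global `__var__0` and the element size of 4 bytes that the CHECK lines expect as `llvm.mlir.constant(4 : i64)`.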

mlir/test/Conversion/GPUCommon/gpu-launch-to-std-call.mlir

This file was added.

				// RUN: mlir-opt --gpu-launch-to-std-call %s | FileCheck %s

				module attributes {gpu.container_module, spv.target_env = #spv.target_env<#spv.vce<v1.0, [Shader], [SPV_KHR_variable_pointers]>, {max_compute_workgroup_invocations = 128 : i32, max_compute_workgroup_size = dense<[128, 128, 64]> : vector<3xi32>}>} {

				// CHECK: func @__std_launch(memref<6xi32>, memref<6xi32>)

				// CHECK-LABEL: @main
				// CHECK: %[[BUFFER1:.*]] = alloc() : memref<6xi32>
				// CHECK: %[[BUFFER2:.*]] = alloc() : memref<6xi32>
				// CHECK: call @__std_launch(%[[BUFFER1]], %[[BUFFER2]]) : (memref<6xi32>, memref<6xi32>) -> ()

				spv.module Logical GLSL450 requires #spv.vce<v1.0, [Shader], [SPV_KHR_variable_pointers]> {
				spv.globalVariable @__var__0 : !spv.ptr<!spv.struct<!spv.array<6 x i32, stride=4> [0]>, StorageBuffer>
				spv.globalVariable @__var__1 : !spv.ptr<!spv.struct<!spv.array<6 x i32, stride=4> [0]>, StorageBuffer>
				spv.func @simple() "None" attributes {workgroup_attributions = 0 : i64} {
				%0 = spv._address_of @__var__1 : !spv.ptr<!spv.struct<!spv.array<6 x i32, stride=4> [0]>, StorageBuffer>
				%1 = spv._address_of @__var__0 : !spv.ptr<!spv.struct<!spv.array<6 x i32, stride=4> [0]>, StorageBuffer>
				spv.Return
				}
				spv.EntryPoint "GLCompute" @simple
				spv.ExecutionMode @simple "LocalSize", 1, 1, 1
				}

				gpu.module @kernels {
				gpu.func @simple(%arg0: memref<6xi32>, %arg1: memref<6xi32>) kernel attributes {spv.entry_point_abi = {local_size = dense<1> : vector<3xi32>}} {
				%c5_i32 = constant 5 : i32
				%c0 = constant 0 : index
				%0 = load %arg0[%c0] : memref<6xi32>
				%1 = addi %0, %c5_i32 : i32
				store %1, %arg1[%c0] : memref<6xi32>
				gpu.return
				}
				}

				func @main() {
				%0 = alloc() : memref<6xi32>
				%1 = alloc() : memref<6xi32>
				%c4_i32 = constant 4 : i32
				%c3_i32 = constant 3 : i32
				%2 = memref_cast %0 : memref<6xi32> to memref<?xi32>
				%3 = memref_cast %1 : memref<6xi32> to memref<?xi32>
				call @fillI32Buffer(%2, %c3_i32) : (memref<?xi32>, i32) -> ()
				call @fillI32Buffer(%3, %c4_i32) : (memref<?xi32>, i32) -> ()
				%c1 = constant 1 : index
				"gpu.launch_func"(%c1, %c1, %c1, %c1, %c1, %c1, %0, %1) {kernel = @kernels::@simple} : (index, index, index, index, index, index, memref<6xi32>, memref<6xi32>) -> ()
				%4 = memref_cast %1 : memref<6xi32> to memref<*xi32>
				call @print_memref_i32(%4) : (memref<*xi32>) -> ()
				return
				}
				func @fillI32Buffer(memref<?xi32>, i32)
				func @print_memref_i32(memref<*xi32>)
				}
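
In this test the `gpu.launch_func` op carries six index operands followed by the two memref arguments, and only the latter reach `@__std_launch`. A minimal standalone sketch of that operand split follows; the value 6 is read off the test rather than taken from `gpu::LaunchOp::kNumConfigOperands`, and `kernelOperands` is a hypothetical helper operating on operand names as strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed from the test: grid and block sizes occupy the first six operands.
constexpr std::size_t kNumConfigOperands = 6;

// Returns the operands that remain after the launch configuration is
// dropped, i.e. the arguments forwarded to the placeholder function.
static std::vector<std::string>
kernelOperands(const std::vector<std::string> &launchOperands) {
  assert(launchOperands.size() >= kNumConfigOperands);
  return std::vector<std::string>(launchOperands.begin() + kNumConfigOperands,
                                  launchOperands.end());
}
```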

mlir/test/Dialect/SPIRV/Transforms/descriptor-sets-encoding.mlir

This file was added.

				// RUN: mlir-opt -spirv-encode-descriptor-sets -verify-diagnostics %s | FileCheck %s

				spv.module Logical GLSL450 {

				// CHECK: spv.module
				// CHECK: spv.globalVariable [[VAR0:@.*__var__0]] : !spv.ptr<i32, StorageBuffer>
				// CHECK: spv.globalVariable [[VAR1:@.*__var__1]] : !spv.ptr<i32, StorageBuffer>
				// CHECK: spv.globalVariable [[VAR2:@.*]] : !spv.ptr<i32, Input>
				// CHECK: spv.func @func0
				// CHECK: spv._address_of [[VAR0]]
				// CHECK: spv._address_of [[VAR1]]
				// CHECK: spv.func @func1
				// CHECK: spv._address_of [[VAR1]]
				// CHECK: spv._address_of [[VAR2]]

				spv.globalVariable @var0 bind(0,0) : !spv.ptr<i32, StorageBuffer>
				spv.globalVariable @var1 bind(0,1) : !spv.ptr<i32, StorageBuffer>
				spv.globalVariable @var2 : !spv.ptr<i32, Input>
				spv.func @func0() -> () "None" {
				%ptr0 = spv._address_of @var0 : !spv.ptr<i32, StorageBuffer>
				%ptr1 = spv._address_of @var1 : !spv.ptr<i32, StorageBuffer>
				spv.Return
				}
				spv.func @func1() -> () "None" {
				%ptr1 = spv._address_of @var1 : !spv.ptr<i32, StorageBuffer>
				%ptr2 = spv._address_of @var2 : !spv.ptr<i32, Input>
				spv.Return
				}
				}

Diff 286155
