This is an archive of the discontinued LLVM Phabricator instance.

mlir/include/mlir/Dialect/GPU/IR/GPUCompilationAttr.td
24	Wondering why this is in the gpu dialect and not a target-specific one (like nvvm)?
83	Nit: `hasFastMath` since it returns a bool. I would think that `getFastMath` would return a FastMathFlag attribute.
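
The naming point can be illustrated with a small standalone sketch. `TargetFlags` below is a hypothetical stand-in for the attribute's `flags` dictionary, not the MLIR class from the patch; it only shows boolean queries carrying a `has` prefix.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the attribute's `flags` dictionary; this is not
// the MLIR NVPTXTargetAttr class.
class TargetFlags {
public:
  explicit TargetFlags(std::set<std::string> flags) : flags(std::move(flags)) {}

  // Boolean queries use a `has` prefix, so a caller does not expect an
  // attribute object back.
  bool hasFlag(const std::string &flag) const { return flags.count(flag) != 0; }
  bool hasFastMath() const { return hasFlag("fast"); }
  bool hasFtz() const { return hasFlag("ftz"); }

private:
  std::set<std::string> flags;
};
```

A `getFastMath` name would then stay free for an accessor that returns the flag attribute itself.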

mehdi_amini added inline comments. Jul 24 2023, 12:38 AM

mlir/lib/Dialect/GPU/Targets/NVPTXTarget.cpp
214	We should have options to stop after emitting LLVMIR and PTX, seems like we have to go to cubin right now? I would also think that going to PTX can be possible without requiring any dependency on the Cuda toolkit?

Do we have a way to test this without too much of the other code by the way?

In D154117#4526902, @mehdi_amini wrote:

Do we have a way to test this without too much of the other code by the way?

Not really? I added some in the final patches, where you have all the infra ready. The best test I can think of at this point is a unit test for this serializer?

mlir/include/mlir/Dialect/GPU/IR/GPUCompilationAttr.td
24	I initially added these attributes (NVPTX & AMDGPU) to `nvvm` & `rocdl`, however I decided against that in the final patch because it added GPU dependencies to those dialects (includes & libs) and created more libs instead of a single (GPUTargets). I'm open to change it.
83	I'll change it.
mlir/lib/Dialect/GPU/Targets/NVPTXTarget.cpp
214	This series of patches is intended to be a full replacement of the current pipeline; upon approval we should immediately deprecate the current pipeline and then remove it after a deprecation period. The idea is that in future patches we change this behavior and stop at LLVM, but that requires having the LLVM Offload work done. The benefit of this approach is that those future changes would be invisible to users, and at the same time it makes them adopt this new mechanism now; that's why I go straight to cubin.

mehdi_amini added inline comments. Jul 24 2023, 10:47 PM

mlir/include/mlir/Dialect/GPU/IR/GPUCompilationAttr.td
24	Where will the dependency come from? The GPUTargetAttrInterface? Maybe we should revisit the way the interface is specified. Worst case we're just one level of indirection away :) Here if the issue is just the gpu::TargetOptions, maybe it can be declared with the interface definition, which makes it an isolated library?
mlir/lib/Dialect/GPU/Targets/NVPTXTarget.cpp
214	OK, thanks for explaining, can you add a TODO with this info in the code?

Moved the target attribute to NVVM and added it as a promised external model.

Herald added a reviewer: ftynse. Jul 30 2023, 5:07 PM

Herald added a reviewer: dcaballe.

Herald added subscribers: gysit, Dinistro, awarzynski.

Harbormaster completed remote builds in B249090: Diff 545485. Jul 30 2023, 5:08 PM

Fix typo in unit test.

Harbormaster completed remote builds in B249091: Diff 545486. Jul 30 2023, 5:12 PM

fmorac retitled this revision from [mlir][gpu] Adds the NVPTX target attribute. to [mlir][NVVM] Adds the NVVM target attribute. Jul 30 2023, 5:13 PM

fmorac edited the summary of this revision.

mehdi_amini added inline comments. Jul 30 2023, 5:53 PM

mlir/lib/Target/LLVM/NVVM/Target.cpp
55 ↗	(On Diff #545486)	Do we need to do the init in the registration? Can we do the init when the serialization is called instead?
mlir/unittests/Target/LLVM/SerializeNVVMTarget.cpp
33 ↗	(On Diff #545486)	Not clear to me where we need the native target?
46 ↗	(On Diff #545486)	Can we tone this down? There is no reason to link all of MLIR into this unit-test.

Addressed comments.

Harbormaster completed remote builds in B249099: Diff 545495. Jul 30 2023, 6:19 PM

fmorac mentioned this in D154100: [mlir][Target][LLVM] Adds an utility class for serializing operations to binary strings. Jul 31 2023, 5:01 AM

mehdi_amini accepted this revision. Aug 1 2023, 3:06 PM

This revision is now accepted and ready to land. Aug 1 2023, 3:06 PM

(Waiting on update to use ptxas)

This revision now requires changes to proceed. Aug 1 2023, 7:12 PM

Switched to ptxas, added options for stopping compilation at LLVM IR & PTX, as well as unit tests testing the added functionality.
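
As a rough illustration of what "stopping at LLVM IR or PTX" means for a staged serializer, here is a minimal standalone sketch. The enum, the function names and the stub stages are all hypothetical and are not the API added by the patch; the real stages translate MLIR to LLVM IR, LLVM IR to PTX, and PTX to a cubin.

```cpp
#include <string>

// Hypothetical names for illustration only; they are not the MLIR API.
enum class StopAfter { LLVMIR, PTX, Binary };

// Stub stages standing in for the real translation steps.
std::string toLLVMIR(const std::string &module) { return "llvm(" + module + ")"; }
std::string toPTX(const std::string &ir) { return "ptx(" + ir + ")"; }
std::string toCubin(const std::string &ptx) { return "cubin(" + ptx + ")"; }

// Runs the stages in order and returns as soon as the requested stage is done.
std::string serialize(const std::string &module, StopAfter stop) {
  std::string ir = toLLVMIR(module);
  if (stop == StopAfter::LLVMIR)
    return ir;
  std::string ptx = toPTX(ir);
  if (stop == StopAfter::PTX)
    return ptx;
  return toCubin(ptx);
}
```

In this shape only the last stage would need `ptxas` (or a toolkit library), so the earlier stop points carry no CUDA toolkit dependency.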

Harbormaster completed remote builds in B249749: Diff 546432. Aug 2 2023, 8:20 AM

Bring @tra here to get feedback on pros/cons of invoking ptxas through a temp file vs using the library APIs exposed by nvptxcompile.

In D154117#4554942, @mehdi_amini wrote:

Bring @tra here to get feedback on pros/cons of invoking ptxas through a temp file vs using the library APIs exposed by nvptxcompile.

One immediate benefit is that we no longer have a dependency on the toolkit to build this. You can detect ptxas by setting the env variable CUDA_ROOT to the location of the toolkit or by adding it to PATH.

mlir/lib/Target/LLVM/NVVM/Target.cpp
55 ↗	(On Diff #545486)	Yes, it's possible I'll change it.
mlir/unittests/Target/LLVM/SerializeNVVMTarget.cpp
33 ↗	(On Diff #545486)	Not needed, removed.
46 ↗	(On Diff #545486)	Yes, I've removed it.

In D154117#4554966, @fmorac wrote:

In D154117#4554942, @mehdi_amini wrote:

Bring @tra here to get feedback on pros/cons of invoking ptxas through a temp file vs using the library APIs exposed by nvptxcompile.

One immediate benefit is that we no longer have a dependency on the toolkit to build this. You can detect ptxas by setting the env variable CUDA_ROOT to the location of the toolkit or by adding it to PATH.

I think compiling by ptxas has merits. One can use a different version of ptxas in case of performance regression.

guraypp added inline comments. Aug 3 2023, 7:35 AM

mlir/lib/Target/LLVM/NVVM/Target.cpp
290 ↗	(On Diff #546432)	It would be nice to pass parameters to `ptxas` from MLIR.
325 ↗	(On Diff #546432)	What do you think about always passing `-v` to `ptxas` and printing the output (perhaps the stdout file) using `llvm::errs`? I found `-v` very useful to see compile-time register usage and local memory.

guraypp added inline comments. Aug 3 2023, 7:43 AM

mlir/lib/Target/LLVM/NVVM/Target.cpp
286 ↗	(On Diff #546432)	Can we keep the PTX file? After codegen, it is very natural to look at the PTX file, if we keep the file via a flag, I think it would be great. I implemented the `dump-ptx` flag to the `gpu-to-cubin` Pass earlier. I guess we cannot use that flag here.

fmorac marked 3 inline comments as done. Aug 3 2023, 9:47 AM

fmorac added inline comments.

mlir/lib/Target/LLVM/NVVM/Target.cpp
286 ↗	(On Diff #546432)	If you pass `-debug-only=serialize-to-ptx` (I'm changing it to `serialize-to-isa`) to `mlir-opt` then the PTX file will be printed to `stdout`. This mechanism also allows emitting PTX instead of binary, so you have multiple ways of obtaining the PTX.
290 ↗	(On Diff #546432)	You can, the args are passed through the `targetOptions` variable. Below you'll find that I add them to the PTX invocation.
325 ↗	(On Diff #546432)	We could consider adding a dedicated variable, but as it stands you can pass that variable through the cmdline and use `--debug-only=serialize-to-binary` to dump that output into `stdout`.

I think compiling by ptxas has merits. One can use a different version of ptxas in case of performance regression.

nvptxcompile provides the same facility I believe, doesn't it?

In D154117#4558565, @mehdi_amini wrote:

I think compiling by ptxas has merits. One can use a different version of ptxas in case of performance regression.

nvptxcompile provides the same facility I believe, doesn't it?

Though I haven't personally worked with nvptxcompile, it appears to be bundled with the toolkit. If one could use a different toolkit version for PTX compilation, then yes, it gives the same facility.

Recently, I used both the driver (current MLIR compilation) and ptxas for PTX compilation. I noticed that even if I feed the same PTX code, the driver occasionally generates different SASS than ptxas. Maybe I hit a corner case, but it was really hard to find the reason.

I believe it's crucial for us to know the potential disparities between nvptxcompile and ptxas, if there are any.

My 2 cents, as far as I know the nvptxcompiler and ptxas should produce the same code for a given release. However unlike ptxas the only way to change the version of nvptxcompiler is to recompile the NVVMTarget library with a different version.

Adding support for nvptxcompiler is about 20 extra lines of code in this patch, plus a couple in CMake. How do we feel about having both?

If nvptxcompiler is detected during build then nvptxcompiler is used, if not, then ptxas is used, and we also give the user the ability to choose on CMake?

If nvptxcompiler is detected during build then nvptxcompiler is used, if not, then ptxas is used, and we also give the user the ability to choose on CMake?

Adding a CMake option looks good to me. I would pick a default and stick to it (that is, fail the CMake configuration with a helpful message).

I don't like "magic" fallback because detection is fragile and it makes the behavior dependent on the environment of the user. It's harder to know what you build, and when we get bug reports it also makes it harder to know what's happening.

Added NVPTX compiler support & removed unnecessary fwd declarations. The CMake variable MLIR_ENABLE_NVPTXCOMPILER controls the usage of the NVPTXCompiler library and is disabled by default, as CMake 3.20 doesn't support CUDA::nvptxcompiler_static to find the library.
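
The "explicit option, no magic fallback" approach can be sketched as a compile-time switch. The macro name `EXAMPLE_NVPTXCOMPILER_ENABLED` is hypothetical; it stands in for whatever definition the build system sets when the CMake option is on, and the function is illustrative rather than code from the patch.

```cpp
#include <string>

// Hypothetical macro standing in for a definition set by the build system
// when the CMake option is enabled; the name is illustrative only.
#ifndef EXAMPLE_NVPTXCOMPILER_ENABLED
#define EXAMPLE_NVPTXCOMPILER_ENABLED 0
#endif

// Returns which PTX assembler path this build uses. The choice is fixed at
// configure time, so there is no runtime detection or silent fallback.
std::string ptxAssemblerBackend() {
#if EXAMPLE_NVPTXCOMPILER_ENABLED
  return "nvptxcompiler library";
#else
  return "ptxas subprocess";
#endif
}
```

Because the selection is baked into the binary, a bug report only needs the build configuration to tell which path was taken.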

Harbormaster completed remote builds in B250743: Diff 547733. Aug 7 2023, 6:42 AM

tra added inline comments. Aug 7 2023, 11:05 AM

mlir/lib/Target/LLVM/NVVM/Target.cpp
123 ↗	(On Diff #547733)	A word of caution -- some Linux distributions scatter the CUDA SDK across the 'standard' Linux filesystem paths, so a single `getToolkitPath()` would not be able to find all the necessary bits, as libdevice and the binaries will be in different places. You may need additional heuristics along the lines of what the clang driver does. https://github.com/llvm/llvm-project/blob/1b74459df8a6d960f7387f0c8379047e42811f58/clang/lib/Driver/ToolChains/Cuda.cpp#L180

fmorac added inline comments. Aug 7 2023, 2:03 PM

mlir/lib/Target/LLVM/NVVM/Target.cpp
123 ↗	(On Diff #547733)	Thank you, that's good to know. I'll see how to rework it or add docs indicating how to make it work. As it stands the user could set the CUDA path to `/usr/lib/cuda` for `libdevice`, somewhat similar to `--cuda-path`. For `ptxas` this mechanism searches several places in the following order: (1) the command line option, (2) the PATH variable, (3) the toolkit path as specified by a couple of env variables or the one detected by CMake. So it would work on Debian & Ubuntu, but the mechanism could be overburdening the user.
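
The search order described above can be sketched as follows. This is a hypothetical helper, not the code from the patch: the function name, the `bin/ptxas` layout under the toolkit root, and the colon-separated PATH format are assumptions, and the existence check is injected so the sketch never touches the real filesystem.

```cpp
#include <functional>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper illustrating the described search order; it is not the
// code from the patch. `exists` is injected so the sketch stays testable.
std::optional<std::string>
findPtxas(const std::string &cmdlineOption, const std::string &pathVar,
          const std::string &toolkitRoot,
          const std::function<bool(const std::string &)> &exists) {
  // 1. An explicit command-line option wins if it points at a real file.
  if (!cmdlineOption.empty() && exists(cmdlineOption))
    return cmdlineOption;
  // 2. Each directory of a colon-separated PATH value (POSIX-style assumed).
  std::istringstream dirs(pathVar);
  for (std::string dir; std::getline(dirs, dir, ':');) {
    if (dir.empty())
      continue;
    std::string candidate = dir + "/ptxas";
    if (exists(candidate))
      return candidate;
  }
  // 3. The toolkit root, taken from an environment variable or a build-time
  //    default; the `bin/ptxas` layout is an assumption.
  if (!toolkitRoot.empty()) {
    std::string candidate = toolkitRoot + "/bin/ptxas";
    if (exists(candidate))
      return candidate;
  }
  return std::nullopt;
}
```

On a distribution that splits the SDK across system paths, step 2 would typically find `ptxas` while `libdevice` still needs its own path, which is the concern raised in the comment above.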

Reapplied clang-format.
Adds a couple of syntax tests & fixes a bug not displaying the link option in the attribute.

Thanks Fabian!

mlir/CMakeLists.txt
122 ↗	(On Diff #548030)	The description isn't super clear, because it conflicts with what "the NVPTX backend" is in LLVM. What about: "Statically link the nvptxlibrary instead of calling `ptxas` as a subprocess for compiling PTX to cubin" ?

This revision is now accepted and ready to land. Aug 7 2023, 7:37 PM

Harbormaster completed remote builds in B250968: Diff 548030. Aug 7 2023, 9:23 PM

Changed the description of the CMake variable MLIR_ENABLE_NVPTXCOMPILER. Applied clang-format again.

Harbormaster completed remote builds in B251075: Diff 548180. Aug 8 2023, 7:36 AM

Fixed merge conflicts with D157183.

Harbormaster completed remote builds in B251122: Diff 548238. Aug 8 2023, 11:43 AM

Closed by commit rG211c9752c820: [mlir][NVVM] Adds the NVVM target attribute. (authored by fmorac). Aug 8 2023, 12:21 PM

This revision was automatically updated to reflect the committed changes.

fmorac added a commit: rG211c9752c820: [mlir][NVVM] Adds the NVVM target attribute.

fmorac mentioned this in rGb43068e8707d: [mlir][gpu] Update GPU translation to accept binaries. Aug 11 2023, 5:29 PM

guraypp mentioned this in D159347: [MLIR] Run the TMA test for sm_90. Sep 1 2023, 3:20 AM

guraypp mentioned this in rG8031a088eb40: [MLIR] Run the TMA test for sm_90. Sep 4 2023, 9:15 AM

Revision Contents

Path	Size
mlir/include/mlir/Dialect/GPU/IR/GPUCompilationAttr.td	91 lines
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td	1 line
mlir/lib/Dialect/GPU/CMakeLists.txt	54 lines
mlir/lib/Dialect/GPU/Targets/NVPTXTarget.cpp	254 lines

Diff 535906

mlir/include/mlir/Dialect/GPU/IR/GPUCompilationAttr.td

This file was added.

				//===-- GPUTargetAttr.td - GPU compilation attributes ------- tablegen --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines the GPU NVPTX target attribute.
				//
				//===----------------------------------------------------------------------===//

				#ifndef GPU_COMPILATIONATTR
				#define GPU_COMPILATIONATTR

				include "mlir/Dialect/GPU/IR/GPUBase.td"
				include "mlir/Dialect/GPU/IR/CompilationAttrInterfaces.td"

				//===----------------------------------------------------------------------===//
				// GPU NVPTX target attribute.
				//===----------------------------------------------------------------------===//

				def GPU_NVPTXTargetAttr : GPU_Attr<"NVPTXTarget", "nvptx", [
				DeclareAttrInterfaceMethods<GPUTargetAttrInterface, [
				"serializeToObject"
				]>
				]> {
				let description = [{
				NVPTX target attribute for controlling compilation of NVIDIA targets. All
				parameters decay into default values if not present.

				Examples:

				1. Target with default values.
				```
				gpu.module @mymodule [#gpu.nvptx] attributes {...} {
				...
				}
				```

				2. Target with `sm_90` chip and fast math.
				```
				gpu.module @mymodule [#gpu.nvptx<chip = "sm_90", flags = {fast}>] {
				...
				}
				```
				}];
				let parameters = (ins
				DefaultValuedParameter<"int", "2", "Optimization level to apply.">:$O,
				StringRefParameter<"Target triple.", "\"nvptx64-nvidia-cuda\"">:$triple,
				StringRefParameter<"Target chip.", "\"sm_50\"">:$chip,
				StringRefParameter<"Target chip features.", "\"+ptx60\"">:$features,
				OptionalParameter<"DictionaryAttr", "Target specific flags.">:$flags,
				OptionalParameter<"ArrayAttr", "Files to link to the LLVM module.">:$link
				);
				let assemblyFormat = [{
				(`<` struct($O, $triple, $chip, $features, $flags)^ `>`)?
				}];
				let builders = [
				AttrBuilder<(ins CArg<"int", "2">:$optLevel,
				CArg<"StringRef", "\"nvptx64-nvidia-cuda\"">:$triple,
				CArg<"StringRef", "\"sm_50\"">:$chip,
				CArg<"StringRef", "\"+ptx60\"">:$features,
				CArg<"DictionaryAttr", "nullptr">:$targetFlags,
				CArg<"ArrayAttr", "nullptr">:$linkFiles), [{
				return Base::get($_ctxt, optLevel, triple, chip, features, targetFlags, linkFiles);
				}]>
				];
				let skipDefaultBuilders = 1;
				let genVerifyDecl = 1;
				let extraClassDeclaration = [{
				bool hasFlag(StringRef flag) const;
				bool getFastMath() const;
				bool getFtz() const;
				}];
				let extraClassDefinition = [{
				bool $cppClass::hasFlag(StringRef flag) const {
				if (DictionaryAttr flags = getFlags())
				return flags.get(flag) != nullptr;
				return false;
				}
				bool $cppClass::getFastMath() const {
				return hasFlag("fast");
				}
				bool $cppClass::getFtz() const {
				return hasFlag("ftz");
				}
				}];
				}

				#endif // GPU_COMPILATIONATTR

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td

  (10 lines not shown)
  //===----------------------------------------------------------------------===//

  #ifndef GPU_OPS
  #define GPU_OPS

  include "mlir/Dialect/DLTI/DLTIBase.td"
  include "mlir/Dialect/GPU/IR/GPUBase.td"
  include "mlir/Dialect/GPU/IR/CompilationAttrInterfaces.td"
+ include "mlir/Dialect/GPU/IR/GPUCompilationAttr.td"
  include "mlir/Dialect/GPU/IR/ParallelLoopMapperAttr.td"
  include "mlir/Dialect/GPU/TransformOps/GPUDeviceMappingAttr.td"
  include "mlir/IR/EnumAttr.td"
  include "mlir/IR/FunctionInterfaces.td"
  include "mlir/IR/SymbolInterfaces.td"
  include "mlir/Interfaces/DataLayoutInterfaces.td"
  include "mlir/Interfaces/InferIntRangeInterface.td"
  include "mlir/Interfaces/InferTypeOpInterface.td"
  (2,166 lines not shown)

mlir/lib/Dialect/GPU/CMakeLists.txt

  (36 lines not shown; context: add_mlir_dialect_library(MLIRGPUDialect)
  LINK_LIBS PUBLIC
  MLIRArithDialect
  MLIRDLTIDialect
  MLIRInferIntRangeInterface
  MLIRIR
  MLIRMemRefDialect
  MLIRSideEffectInterfaces
  MLIRSupport
+
+ PRIVATE
+ MLIRGPUTargets
+ )
+
+ add_mlir_dialect_library(MLIRGPUTargets
+ Targets/NVPTXTarget.cpp
+
+ ADDITIONAL_HEADER_DIRS
+ ${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/GPU
+
+ LINK_COMPONENTS
+ Core
+ MC
+ Target
+ ${NVPTX_LIBS}
+
+ LINK_LIBS PUBLIC
+ MLIRIR
+ MLIRExecutionEngineUtils
+ MLIRSupport
+ MLIRTargetLLVMIRExport
+
+ PRIVATE
+ MLIRGPUDialect
  )

  add_mlir_dialect_library(MLIRGPUTransforms
  Transforms/AllReduceLowering.cpp
  Transforms/AsyncRegionRewriter.cpp
  Transforms/GlobalIdRewriter.cpp
  Transforms/KernelOutlining.cpp
  Transforms/MemoryPromotion.cpp
  (71 lines not shown; context: if(MLIR_ENABLE_CUDA_RUNNER))

  find_library(CUDA_DRIVER_LIBRARY cuda HINTS ${CMAKE_CUDA_IMPLICIT_LINK_DIRECTORIES} REQUIRED)
  target_link_libraries(MLIRGPUTransforms
  PRIVATE
  MLIRNVVMToLLVMIRTranslation
  ${CUDA_DRIVER_LIBRARY}
  )
+
+ # Find the CUDA toolkit.
+ if (NOT DEFINED CUDAToolkit_ROOT)
+ find_package(CUDAToolkit)
+ get_filename_component(CUDAToolkit_ROOT ${CUDAToolkit_BIN_DIR} DIRECTORY ABSOLUTE)
+ endif()
+ message(VERBOSE "MLIR Default CUDA toolkit path: ${CUDAToolkit_ROOT}")
+
+ # Enable the gpu to cubin target.
+ target_compile_definitions(obj.MLIRGPUTargets
+ PRIVATE
+ MLIR_GPU_NVPTX_TARGET_ENABLED=1
+ __DEFAULT_CUDATOOLKIT_PATH__="${CUDAToolkit_ROOT}"
+ )
+ # Enable the gpu to cubin target.
+ target_compile_definitions(obj.MLIRGPUTransforms
+ PRIVATE
+ MLIR_GPU_NVPTX_TARGET_ENABLED=1
+ )
+
+ # Add CUDA headers includes and the libcuda.so library.
+ target_include_directories(obj.MLIRGPUTargets
+ PRIVATE
+ ${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES}
+ )
+ target_link_libraries(MLIRGPUTargets
+ PRIVATE
+ ${CUDA_DRIVER_LIBRARY}
+ )

  endif()

  if(MLIR_ENABLE_ROCM_CONVERSIONS)
  if (NOT ("AMDGPU" IN_LIST LLVM_TARGETS_TO_BUILD))
  message(SEND_ERROR
  "Building mlir with ROCm support requires the AMDGPU backend")
  endif()

  (12 lines not shown)

mlir/lib/Dialect/GPU/Targets/NVPTXTarget.cpp

This file was added.

				//===- NVPTXTarget.cpp - MLIR GPU Dialect NVPTX target attribute ----------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements the NVPTX target attribute.
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Dialect/GPU/IR/GPUDialect.h"
				#include "mlir/Dialect/GPU/Transforms/Passes.h"

				using namespace mlir;
				using namespace mlir::gpu;

				#ifdef MLIR_GPU_NVPTX_TARGET_ENABLED
				#include "mlir/ExecutionEngine/ModuleToObject.h"
				#include "mlir/Target/LLVMIR/Dialect/GPU/GPUToLLVMIRTranslation.h"
				#include "mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h"
				#include "mlir/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.h"
				#include "mlir/Target/LLVMIR/Export.h"

				#include "llvm/Support/FileSystem.h"
				#include "llvm/Support/Path.h"
				#include "llvm/Support/TargetSelect.h"

				#ifndef __DEFAULT_CUDATOOLKIT_PATH__
				#define __DEFAULT_CUDATOOLKIT_PATH__ ""
				#endif

				#define DEBUG_TYPE "serialize-to-object"

				#include <cuda.h>

				static void emitCudaError(const llvm::Twine &expr, const char *buffer,
				CUresult result, Location loc) {
				const char *error;
				cuGetErrorString(result, &error);
				emitError(loc, expr.concat(" failed with error code ")
				.concat(llvm::Twine{error})
				.concat("[")
				.concat(buffer)
				.concat("]"));
				}

				#define RETURN_ON_CUDA_ERROR(expr) \
				do { \
				if (auto status = (expr)) { \
				emitCudaError(#expr, jitErrorBuffer, status, loc); \
				return {}; \
				} \
				} while (false)

				namespace {
				struct InitTarget {
				InitTarget() {
				LLVMInitializeNVPTXTarget();
				LLVMInitializeNVPTXTargetInfo();
				LLVMInitializeNVPTXTargetMC();
				LLVMInitializeNVPTXAsmPrinter();
				}
				};

				class SerializeToCubin : public ModuleToObject {
				public:
				SerializeToCubin(Operation &module, NVPTXTargetAttr target,
				TargetOptions targetOptions = {});

				// Init the target.
				static void init();

				std::optional<SmallVector<std::unique_ptr<llvm::Module>>>
				loadBitcodeFiles(llvm::LLVMContext &context, llvm::Module &module) override;

				std::optional<SmallVector<char, 0>>
				moduleToObject(llvm::Module &llvmModule,
				llvm::TargetMachine &targetMachine) override;

				private:
				StringRef toolkitPath;
				SmallVector<std::string> fileList;
				};
				} // namespace

				SerializeToCubin::SerializeToCubin(Operation &module, NVPTXTargetAttr target,
				TargetOptions targetOptions)
				: ModuleToObject(module, target.getTriple(), target.getChip(),
				target.getFeatures(), target.getO()),
				toolkitPath(targetOptions.getToolkitPath()),
				fileList(targetOptions.getBitcodeFiles()) {
				if (toolkitPath.empty())
				toolkitPath = __DEFAULT_CUDATOOLKIT_PATH__;

				if (ArrayAttr files = target.getLink())
				for (Attribute attr : files.getValue())
				if (auto file = dyn_cast<StringAttr>(attr))
				fileList.push_back(file.str());
				}

				void SerializeToCubin::init() { static InitTarget target = InitTarget(); }

				std::optional<SmallVector<std::unique_ptr<llvm::Module>>>
				SerializeToCubin::loadBitcodeFiles(llvm::LLVMContext &context,
				llvm::Module &module) {
				// Try loading `libdevice` from a CUDA toolkit installation.
				StringRef pathRef = toolkitPath;
				if (pathRef.size()) {
				SmallVector<char, 256> path;
				path.insert(path.begin(), pathRef.begin(), pathRef.end());
				pathRef = StringRef(path.data(), path.size());
				if (!llvm::sys::fs::is_directory(pathRef)) {
				getOperation().emitError() << "CUDA path: " << pathRef
				<< " does not exist or is not a directory.\n";
				return std::nullopt;
				}
				// TODO remove this hard coded path.
				llvm::sys::path::append(path, "nvvm", "libdevice", "libdevice.10.bc");
				pathRef = StringRef(path.data(), path.size());
				if (!llvm::sys::fs::is_regular_file(pathRef)) {
				getOperation().emitError() << "LibDevice path: " << pathRef
				<< " does not exist or is not a file.\n";
				return std::nullopt;
				}
				fileList.push_back(pathRef.str());
				}

				SmallVector<std::unique_ptr<llvm::Module>> bcFiles;
				if (failed(loadBitcodeFilesFromList(context, fileList, bcFiles, true)))
				return std::nullopt;
				return bcFiles;
				}

				std::optional<SmallVector<char, 0>>
				SerializeToCubin::moduleToObject(llvm::Module &llvmModule,
				llvm::TargetMachine &targetMachine) {
				std::optional<std::string> serializedISA =
				translateToISA(llvmModule, targetMachine);
				if (!serializedISA) {
				getOperation().emitError() << "Failed translating the module to ISA.";
				return std::nullopt;
				}

				LLVM_DEBUG({
				llvm::dbgs() << "ISA for module: "
				<< dyn_cast<GPUModuleOp>(&getOperation()).getNameAttr()
				<< "\n";
				llvm::dbgs() << *serializedISA << "\n";
				llvm::dbgs().flush();
				});

				auto loc = getOperation().getLoc();
				char jitErrorBuffer[4096] = {0};

				RETURN_ON_CUDA_ERROR(cuInit(0));

				// Linking requires a device context.
				CUdevice device;
				RETURN_ON_CUDA_ERROR(cuDeviceGet(&device, 0));
				CUcontext context;
				RETURN_ON_CUDA_ERROR(cuCtxCreate(&context, 0, device));
				CUlinkState linkState;

				CUjit_option jitOptions[] = {CU_JIT_ERROR_LOG_BUFFER,
				CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES};
				void *jitOptionsVals[] = {jitErrorBuffer,
				reinterpret_cast<void *>(sizeof(jitErrorBuffer))};

				RETURN_ON_CUDA_ERROR(cuLinkCreate(2, /* number of jit options */
				jitOptions, /* jit options */
				jitOptionsVals, /* jit option values */
				&linkState));

				auto kernelName = dyn_cast<gpu::GPUModuleOp>(getOperation()).getName().str();
				RETURN_ON_CUDA_ERROR(cuLinkAddData(
				linkState, CUjitInputType::CU_JIT_INPUT_PTX,
				const_cast<void *>(static_cast<const void *>(serializedISA->c_str())),
				serializedISA->length(), kernelName.c_str(),
				0, /* number of jit options */
				nullptr, /* jit options */
				nullptr /* jit option values */
				));

				void *cubinData;
				size_t cubinSize;
				RETURN_ON_CUDA_ERROR(cuLinkComplete(linkState, &cubinData, &cubinSize));

				char *cubinAsChar = static_cast<char *>(cubinData);
				auto result = SmallVector<char, 0>(cubinAsChar, cubinAsChar + cubinSize);

				// This will also destroy the cubin data.
				RETURN_ON_CUDA_ERROR(cuLinkDestroy(linkState));
				RETURN_ON_CUDA_ERROR(cuCtxDestroy(context));
				return result;
				}

				std::optional<SmallVector<char, 0>>
				NVPTXTargetAttr::serializeToObject(Operation *module,
				const TargetOptions &options) const {
				assert(module && "The module must be non null.");
				if (!module)
				return std::nullopt;
				if (!mlir::isa<GPUModuleOp>(module)) {
				module->emitError("Module must be a GPU module.");
				return std::nullopt;
				}
				SerializeToCubin::init();
				SerializeToCubin serializer(*module, *this, options);
				return serializer.run();
				}

				#else
				// Provide a null vector for testing purposes.
				std::optional<SmallVector<char, 0>>
				NVPTXTargetAttr::serializeToObject(Operation *module,
				const TargetOptions &options) const {
				assert(module && "The module must be non null.");
				if (!module)
				return std::nullopt;
				if (!mlir::isa<GPUModuleOp>(module)) {
				module->emitError("Module must be a GPU module.");
				return std::nullopt;
				}
				return SmallVector<char, 0>{};
				}
				#endif // MLIR_GPU_NVPTX_TARGET_ENABLED

				LogicalResult
				NVPTXTargetAttr::verify(function_ref<InFlightDiagnostic()> emitError,
				int optLevel, StringRef triple, StringRef chip,
				StringRef features, DictionaryAttr flags,
				ArrayAttr files) {
				if (optLevel < 0 || optLevel > 3) {
				emitError() << "The optimization level must be a number between 0 and 3.";
				return failure();
				}
				if (triple.empty()) {
				emitError() << "The target triple cannot be empty.";
				return failure();
				}
				if (chip.empty()) {
				emitError() << "The target chip cannot be empty.";
				return failure();
				}
				if (files && !llvm::all_of(files, [](::mlir::Attribute attr) {
				return attr && mlir::isa<StringAttr>(attr);
				})) {
				emitError() << "All the elements in the `link` array must be strings.";
				return failure();
				}
				return success();
				}
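
The intent of these checks can be shown in a standalone sketch that rejects the `link` list only when some element is not a string. `Attr` and `verifyTarget` are hypothetical stand-ins written with the standard library; they are not MLIR types or the patch's code.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for an attribute that may or may not be a string.
using Attr = std::variant<std::string, int>;

// Returns an error message, or std::nullopt when all checks pass.
std::optional<std::string> verifyTarget(int optLevel, const std::string &triple,
                                        const std::string &chip,
                                        const std::vector<Attr> &link) {
  if (optLevel < 0 || optLevel > 3)
    return "The optimization level must be a number between 0 and 3.";
  if (triple.empty())
    return "The target triple cannot be empty.";
  if (chip.empty())
    return "The target chip cannot be empty.";
  // Reject the list only when some element is NOT a string.
  bool allStrings = std::all_of(link.begin(), link.end(), [](const Attr &a) {
    return std::holds_alternative<std::string>(a);
  });
  if (!allStrings)
    return "All the elements in the `link` array must be strings.";
  return std::nullopt;
}
```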

[mlir][NVVM] Adds the NVVM target attribute. (Closed, Public)