Details

Reviewers

herhut
rriddle
ftynse
mehdi_amini

Commits

rG2224221fb3fa: [mlir] Add NVVM to CUBIN conversion to mlir-opt

Summary

If MLIR_CUDA_RUNNER_ENABLED, register a 'gpu-to-cubin' conversion pass to mlir-opt.

The next step is to switch CUDA integration tests from mlir-cuda-runner to mlir-opt + mlir-cpu-runner and remove mlir-cuda-runner.

Depends On D98279

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

csigg created this revision. Mar 8 2021, 11:26 AM

Herald added subscribers: dcaballe, cota, teijeong and 18 others. Mar 8 2021, 11:26 AM

csigg requested review of this revision. Mar 8 2021, 11:26 AM

Herald added a reviewer: herhut. Mar 8 2021, 11:26 AM

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

rriddle requested changes to this revision. Mar 8 2021, 11:30 AM

rriddle added inline comments.

mlir/include/mlir/Conversion/GPUToCUBIN/GPUToCUBINPass.h
1 ↗	(On Diff #329078)	The Conversion/ directory is intended for dialect->dialect conversions; that doesn't seem to be the case here.

This revision now requires changes to proceed. Mar 8 2021, 11:30 AM

csigg added inline comments. Mar 8 2021, 11:55 AM

mlir/include/mlir/Conversion/GPUToCUBIN/GPUToCUBINPass.h
1 ↗	(On Diff #329078)	This change registers the pre-existing `GpuKernelToBlobPass` pass in mlir/lib/Conversion/GPUCommon/ConvertKernelFuncToBlob.cpp. Where do you think this registration should live, and should the pass move as well?

mehdi_amini added a reviewer: ftynse. Mar 8 2021, 11:56 AM

mehdi_amini added inline comments.

mlir/lib/Conversion/GPUToCUBIN/LowerGPUToCUBIN.cpp
96 ↗	(On Diff #329078)	I don't think this is safe to do in a multi-threaded environment. I suspect we're missing an assert for this in the context @ftynse?
mlir/test/Integration/GPU/CUDA/shuffle.mlir
1–3	I think this is a leftover debugging option

rriddle added inline comments. Mar 8 2021, 12:01 PM

mlir/include/mlir/Conversion/GPUToCUBIN/GPUToCUBINPass.h
1 ↗	(On Diff #329078)	The pass looks like a much more natural fit for the Target/ directory, given that it is translating an MLIR module to an external format.

Harbormaster completed remote builds in B92712: Diff 329078. Mar 8 2021, 4:48 PM

mehdi_amini added inline comments. Mar 8 2021, 8:56 PM

mlir/include/mlir/Conversion/GPUToCUBIN/GPUToCUBINPass.h
1 ↗	(On Diff #329078)	Not exactly: it takes MLIR and generates MLIR; it only takes a subset of the IR and turns it into a CUBIN serialized into the IR :) I suspect it would be better as a GPU/Transforms pass?

switch CUDA integration tests from mlir-cuda-runner to mlir-opt + mlir-cpu-runner

If mlir-cpu-runner can also run things on the GPUs, its name should be changed (to mlir-runner?). Or there could be an mlir-gpu-runner in the interest of reducing the link time for those who may just want to run on the CPU and not link in many other passes and dependencies?

csigg planned changes to this revision. Mar 9 2021, 11:40 AM

csigg added inline comments.

mlir/include/mlir/Conversion/GPUToCUBIN/GPUToCUBINPass.h
1 ↗	(On Diff #329078)	Thanks River and Mehdi. I will move it to GPU/Transforms. For that, I would like to first land a prep revision.
mlir/lib/Conversion/GPUToCUBIN/LowerGPUToCUBIN.cpp
96 ↗	(On Diff #329078)	Good point. I will change GpuKernelToBlobPass to a base class (and move it to GPU/Transforms), so that we can add those as dependent dialects.
mlir/test/Integration/GPU/CUDA/shuffle.mlir
1–3	Indeed. Thanks for spotting it.

csigg mentioned this in D98279: [mlir] Add base class for GpuKernelToBlobPass. Mar 9 2021, 11:47 AM

mehdi_amini added inline comments. Mar 9 2021, 12:03 PM

mlir/lib/Conversion/GPUToCUBIN/LowerGPUToCUBIN.cpp
96 ↗	(On Diff #329078)	We can't rely on "dependent dialects": these registrations are separate from the dialect registration. I would suggest using the Pass initialization that is invoked at the beginning of the pipeline, before the pass manager runs, for this purpose.
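
The concern in this thread is that one-time setup which mutates shared state should not happen while passes may already be running on several threads. The following is a minimal, self-contained sketch of that idea only. It is not MLIR code: `ToyRegistry`, `ToyPass`, `ToySerializePass` and `runToyPipeline` are hypothetical names, and the sketch merely models a hook that runs once for every pass before any unit of work is processed.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical shared state that is not safe to mutate concurrently,
// e.g. a registry of translation interfaces.
struct ToyRegistry {
  std::set<std::string> entries;
};

// Hypothetical pass interface: initialize() may mutate the registry,
// runOnUnit() only gets read-only access to it.
class ToyPass {
public:
  virtual ~ToyPass() = default;
  virtual void initialize(ToyRegistry &registry) = 0;
  virtual bool runOnUnit(const ToyRegistry &registry) const = 0;
};

class ToySerializePass : public ToyPass {
public:
  void initialize(ToyRegistry &registry) override {
    registry.entries.insert("nvvm-translation");
  }
  bool runOnUnit(const ToyRegistry &registry) const override {
    return registry.entries.count("nvvm-translation") == 1;
  }
};

// Phase 1: every pass is initialized up front on the calling thread.
// Phase 2: the passes run on each unit with a const view of the registry.
// The units are processed sequentially here; a real pass manager could
// distribute phase 2 across threads because nothing mutates the registry.
inline int runToyPipeline(const std::vector<std::unique_ptr<ToyPass>> &passes,
                          ToyRegistry &registry, int numUnits) {
  assert(numUnits >= 0);
  for (const auto &pass : passes)
    pass->initialize(registry);

  const ToyRegistry &frozen = registry;
  int successes = 0;
  for (int unit = 0; unit < numUnits; ++unit)
    for (const auto &pass : passes)
      if (pass->runOnUnit(frozen))
        ++successes;
  return successes;
}
```

The design point is only the ordering: all mutation is confined to the initialization phase, so the run phase can treat the registry as immutable.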

Use base class added in D98279.

Harbormaster completed remote builds in B93033: Diff 329565. Mar 10 2021, 1:02 AM

csigg edited the summary of this revision. Mar 10 2021, 1:03 AM

csigg added a parent revision: D98279: [mlir] Add base class for GpuKernelToBlobPass.

csigg mentioned this in rG4d295cf5b54e: [mlir] Add base class for GpuKernelToBlobPass. Mar 10 2021, 3:14 AM

Rebase, cmake fixes.

This change should be ready for review again.

mlir/lib/Conversion/GPUToCUBIN/LowerGPUToCUBIN.cpp
96 ↗	(On Diff #329078)	Thanks Mehdi. Does the reworked implementation look correct?

LGTM, but please give @herhut a chance to take a general look, since he reviewed the previous revisions in this series!

mlir/lib/Conversion/GPUToCUBIN/LowerGPUToCUBIN.cpp
96 ↗	(On Diff #329078)	I had another hook in mind ( https://mlir.llvm.org/docs/PassManagement/#initialization ), but what you wrote is equally fine in this case, I think!

csigg added a child revision: D98360: [mlir] Change test-gpu-to-cubin to derive from SerializeToBlobPass. Mar 10 2021, 9:44 AM

rriddle accepted this revision. Mar 10 2021, 10:27 AM

rriddle added inline comments.

mlir/include/mlir/Dialect/GPU/Passes.h
89	nit: ///

This revision is now accepted and ready to land. Mar 10 2021, 10:27 AM

Harbormaster completed remote builds in B93086: Diff 329650. Mar 10 2021, 5:27 PM

csigg added a child revision: D98396: [mlir] Remove mlir-cuda-runner. Mar 10 2021, 11:35 PM

Fix nit.

Thanks!

csigg edited the summary of this revision. Mar 11 2021, 1:06 AM

This revision was landed with ongoing or failed builds. Mar 11 2021, 1:07 AM

Closed by commit rG2224221fb3fa: [mlir] Add NVVM to CUBIN conversion to mlir-opt (authored by csigg).

This revision was automatically updated to reflect the committed changes.

csigg added a commit: rG2224221fb3fa: [mlir] Add NVVM to CUBIN conversion to mlir-opt.

Harbormaster completed remote builds in B93235: Diff 329864. Mar 11 2021, 7:06 AM

csigg mentioned this in D98447: [mlir] Remove mlir-rocm-runner. Mar 11 2021, 12:36 PM

csigg mentioned this in rGa825fb2c0733: [mlir] Remove mlir-rocm-runner. Mar 19 2021, 12:24 AM

Diff 329565

mlir/include/mlir/Dialect/GPU/Passes.h

	(47 lines not shown)
	/// user-specified IR and add the resulting blob as module attribute.			/// user-specified IR and add the resulting blob as module attribute.
	class SerializeToBlobPass : public OperationPass<gpu::GPUModuleOp> {			class SerializeToBlobPass : public OperationPass<gpu::GPUModuleOp> {
	public:			public:
	SerializeToBlobPass(TypeID passID);			SerializeToBlobPass(TypeID passID);
	SerializeToBlobPass(const SerializeToBlobPass &other);			SerializeToBlobPass(const SerializeToBlobPass &other);

	void runOnOperation() final;			void runOnOperation() final;

				protected:
				void getDependentDialects(DialectRegistry &registry) const override;

	private:			private:
	// Creates the LLVM target machine to generate the ISA.			// Creates the LLVM target machine to generate the ISA.
	std::unique_ptr<llvm::TargetMachine> createTargetMachine();			std::unique_ptr<llvm::TargetMachine> createTargetMachine();

	// Translates the 'getOperation()' result to an LLVM module.			// Translates the 'getOperation()' result to an LLVM module.
	virtual std::unique_ptr<llvm::Module>			virtual std::unique_ptr<llvm::Module>
	translateToLLVMIR(llvm::LLVMContext &llvmContext) = 0;			translateToLLVMIR(llvm::LLVMContext &llvmContext);

	// Serializes the target ISA to binary form.			// Serializes the target ISA to binary form.
	virtual std::unique_ptr<std::vector<char>>			virtual std::unique_ptr<std::vector<char>>
	serializeISA(const std::string &isa) = 0;			serializeISA(const std::string &isa) = 0;

	protected:			protected:
	Option<std::string> triple{*this, "triple",			Option<std::string> triple{*this, "triple",
	::llvm::cl::desc("Target triple")};			::llvm::cl::desc("Target triple")};
	Option<std::string> chip{*this, "chip",			Option<std::string> chip{*this, "chip",
	::llvm::cl::desc("Target architecture")};			::llvm::cl::desc("Target architecture")};
	Option<std::string> features{*this, "features",			Option<std::string> features{*this, "features",
	::llvm::cl::desc("Target features")};			::llvm::cl::desc("Target features")};
	Option<std::string> gpuBinaryAnnotation{			Option<std::string> gpuBinaryAnnotation{
	*this, "gpu-binary-annotation",			*this, "gpu-binary-annotation",
	llvm::cl::desc("Annotation attribute string for GPU binary"),			llvm::cl::desc("Annotation attribute string for GPU binary"),
	llvm::cl::init(getDefaultGpuBinaryAnnotation())};			llvm::cl::init(getDefaultGpuBinaryAnnotation())};
	};			};
	} // namespace gpu			} // namespace gpu

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Registration			// Registration
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

				// Register pass to serialize GPU kernel functions to a CUBIN binary annotation.
				void registerGpuSerializeToCubinPass();

	/// Generate the code for registering passes.			/// Generate the code for registering passes.
	#define GEN_PASS_REGISTRATION			#define GEN_PASS_REGISTRATION
	#include "mlir/Dialect/GPU/Passes.h.inc"			#include "mlir/Dialect/GPU/Passes.h.inc"

	} // namespace mlir			} // namespace mlir

	#endif // MLIR_DIALECT_GPU_PASSES_H_			#endif // MLIR_DIALECT_GPU_PASSES_H_
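
In the right-hand column of this header, `translateToLLVMIR` becomes a virtual function with a default implementation while `serializeISA` stays pure virtual, so a concrete pass only has to supply the final serialization step. A rough, self-contained sketch of that class shape is given here. `ToySerializeToBlob`, `ToySerializeToBytes`, `translate` and the string-based "IR" are hypothetical stand-ins chosen for illustration, not the MLIR classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical base class modelling the structure of SerializeToBlobPass.
class ToySerializeToBlob {
public:
  virtual ~ToySerializeToBlob() = default;

  // Fixed driver: translate first, then serialize.
  // Returns nullptr if the translation step produced nothing.
  std::unique_ptr<std::vector<char>> run(const std::string &input) {
    std::string isa = translate(input);
    if (isa.empty())
      return nullptr;
    return serializeISA(isa);
  }

private:
  // Default translation step; subclasses may override it but need not.
  virtual std::string translate(const std::string &input) {
    return input.empty() ? std::string() : "isa:" + input;
  }

  // Target-specific step that every subclass must provide.
  virtual std::unique_ptr<std::vector<char>>
  serializeISA(const std::string &isa) = 0;
};

// Hypothetical concrete pass: only the serialization step is supplied,
// the default translation is inherited.
class ToySerializeToBytes : public ToySerializeToBlob {
  std::unique_ptr<std::vector<char>>
  serializeISA(const std::string &isa) override {
    return std::make_unique<std::vector<char>>(isa.begin(), isa.end());
  }
};
```

With this shape, `ToySerializeToBytes().run("kernel")` goes through the inherited translation and the overridden serialization without the subclass touching the driver.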

mlir/include/mlir/InitAllPasses.h

(45 lines not shown)	inline void registerAllPasses() {

// Conversion passes		// Conversion passes
registerConversionPasses();		registerConversionPasses();

// Dialect passes		// Dialect passes
registerAffinePasses();		registerAffinePasses();
registerAsyncPasses();		registerAsyncPasses();
registerGPUPasses();		registerGPUPasses();
		registerGpuSerializeToCubinPass();
registerLinalgPasses();		registerLinalgPasses();
LLVM::registerLLVMPasses();		LLVM::registerLLVMPasses();
quant::registerQuantPasses();		quant::registerQuantPasses();
registerSCFPasses();		registerSCFPasses();
registerShapePasses();		registerShapePasses();
spirv::registerSPIRVPasses();		spirv::registerSPIRVPasses();
registerStandardPasses();		registerStandardPasses();
tensor::registerTensorPasses();		tensor::registerTensorPasses();
tosa::registerTosaOptPasses();		tosa::registerTosaOptPasses();
}		}

} // namespace mlir		} // namespace mlir

#endif // MLIR_INITALLPASSES_H_		#endif // MLIR_INITALLPASSES_H_
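
`registerAllPasses()` calls `registerGpuSerializeToCubinPass()` unconditionally. That works because SerializeToCubin.cpp in this diff defines the function with an empty body when `MLIR_CUDA_SERIALIZATION_ENABLED` is not set. The sketch below illustrates this pattern in isolation, under stated assumptions: `toyPassRegistry`, `registerToySerializeToCubinPass`, `registerAllToyPasses` and the macro `TOY_CUDA_SERIALIZATION_ENABLED` are hypothetical names, and the registry is a plain map rather than MLIR's pass registry.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical registry mapping pass names to factories.
inline std::map<std::string, std::function<std::string()>> &toyPassRegistry() {
  static std::map<std::string, std::function<std::string()>> registry;
  return registry;
}

// TOY_CUDA_SERIALIZATION_ENABLED is a hypothetical macro playing the role of
// MLIR_CUDA_SERIALIZATION_ENABLED. It is deliberately left undefined here.
#if defined(TOY_CUDA_SERIALIZATION_ENABLED)
inline void registerToySerializeToCubinPass() {
  toyPassRegistry()["gpu-to-cubin"] = [] { return std::string("cubin"); };
}
#else
// Fallback: same function, empty body, so callers need no #if of their own.
inline void registerToySerializeToCubinPass() {}
#endif

// The caller registers everything unconditionally.
inline void registerAllToyPasses() {
  toyPassRegistry()["gpu-kernel-outlining"] = [] {
    return std::string("outlined");
  };
  registerToySerializeToCubinPass();
}
```

The trade-off is that a build without the feature silently lacks the pass; the call site stays free of preprocessor conditionals in exchange.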

mlir/lib/Conversion/GPUCommon/ConvertKernelFuncToBlob.cpp

(59 lines not shown)	GpuKernelToBlobPass(LoweringCallback loweringCallback,
if (!features.empty())		if (!features.empty())
this->features = features.str();		this->features = features.str();
if (!gpuBinaryAnnotation.empty())		if (!gpuBinaryAnnotation.empty())
this->gpuBinaryAnnotation = gpuBinaryAnnotation.str();		this->gpuBinaryAnnotation = gpuBinaryAnnotation.str();
}		}

private:		private:
// Translates the 'getOperation()' result to an LLVM module.		// Translates the 'getOperation()' result to an LLVM module.
		// Note: when this class is removed, this function no longer needs to be
		// virtual.
std::unique_ptr<llvm::Module>		std::unique_ptr<llvm::Module>
translateToLLVMIR(llvm::LLVMContext &llvmContext) override {		translateToLLVMIR(llvm::LLVMContext &llvmContext) override {
return loweringCallback(getOperation(), llvmContext, "LLVMDialectModule");		return loweringCallback(getOperation(), llvmContext, "LLVMDialectModule");
}		}

// Serializes the target ISA to binary form.		// Serializes the target ISA to binary form.
std::unique_ptr<std::vector<char>>		std::unique_ptr<std::vector<char>>
serializeISA(const std::string &isa) override {		serializeISA(const std::string &isa) override {
(20 lines not shown)

mlir/lib/Dialect/GPU/CMakeLists.txt

	add_mlir_dialect_library(MLIRGPU			add_mlir_dialect_library(MLIRGPU
	IR/GPUDialect.cpp			IR/GPUDialect.cpp
	Transforms/AllReduceLowering.cpp			Transforms/AllReduceLowering.cpp
	Transforms/AsyncRegionRewriter.cpp			Transforms/AsyncRegionRewriter.cpp
	Transforms/KernelOutlining.cpp			Transforms/KernelOutlining.cpp
	Transforms/MemoryPromotion.cpp			Transforms/MemoryPromotion.cpp
	Transforms/ParallelLoopMapper.cpp			Transforms/ParallelLoopMapper.cpp
	Transforms/SerializeToBlob.cpp			Transforms/SerializeToBlob.cpp
				Transforms/SerializeToCUBIN.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/GPU			${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/GPU

	DEPENDS			DEPENDS
	MLIRGPUOpsIncGen			MLIRGPUOpsIncGen
	MLIRGPUOpInterfacesIncGen			MLIRGPUOpInterfacesIncGen
	MLIRGPUPassIncGen			MLIRGPUPassIncGen
	MLIRParallelLoopMapperAttrGen			MLIRParallelLoopMapperAttrGen
	MLIRParallelLoopMapperEnumsGen			MLIRParallelLoopMapperEnumsGen

	LINK_LIBS PUBLIC			LINK_LIBS PUBLIC
	MLIRAsync			MLIRAsync
	MLIREDSC			MLIREDSC
	MLIRIR			MLIRIR
	MLIRLLVMIR			MLIRLLVMIR
	MLIRSCF			MLIRSCF
	MLIRPass			MLIRPass
	MLIRSideEffectInterfaces			MLIRSideEffectInterfaces
	MLIRStandard			MLIRStandard
	MLIRSupport			MLIRSupport
	MLIRTransformUtils			MLIRTransformUtils
	)			)

				if(MLIR_CUDA_RUNNER_ENABLED)
				if(NOT MLIR_CUDA_CONVERSIONS_ENABLED)
				message(SEND_ERROR
				"Building mlir with cuda support requires the NVPTX backend")
				endif()

				# Configure CUDA language support. Using check_language first allows us to
				# give a custom error message.
				include(CheckLanguage)
				check_language(CUDA)
				if (CMAKE_CUDA_COMPILER)
				enable_language(CUDA)
				else()
				message(SEND_ERROR
				"Building mlir with cuda support requires a working CUDA install")
				endif()

				# Enable gpu-to-cubin pass.
				target_compile_definitions(MLIRGPU
				PRIVATE
				MLIR_CUDA_SERIALIZATION_ENABLED
				)

				# Add CUDA headers includes and the libcuda.so library.
				target_include_directories(MLIRGPU
				PRIVATE
				${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES}
				)

				find_library(CUDA_DRIVER_LIBRARY cuda)

				target_link_libraries(MLIRGPU
				PRIVATE
				MLIRGPUToGPURuntimeTransforms
				MLIRLLVMToLLVMIRTranslation
				MLIRNVVMToLLVMIRTranslation
				${CUDA_DRIVER_LIBRARY}
				)

				endif()

mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp

//===- SerializeToBlob.cpp - MLIR GPU lowering pass -----------------------===//		//===- SerializeToBlob.cpp - MLIR GPU lowering pass -----------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements a base class for a pass to serialize a gpu module		// This file implements a base class for a pass to serialize a gpu module
// into a binary blob that can be executed on a GPU. The binary blob is added		// into a binary blob that can be executed on a GPU. The binary blob is added
// as a string attribute to the gpu module.		// as a string attribute to the gpu module.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "mlir/Dialect/GPU/Passes.h"		#include "mlir/Dialect/GPU/Passes.h"
#include "mlir/Pass/Pass.h"		#include "mlir/Pass/Pass.h"
		#include "mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h"
		#include "mlir/Target/LLVMIR/Export.h"
#include "llvm/IR/LegacyPassManager.h"		#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Support/TargetRegistry.h"		#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"		#include "llvm/Support/TargetSelect.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"

using namespace mlir;		using namespace mlir;

std::string gpu::getDefaultGpuBinaryAnnotation() { return "gpu.binary"; }		std::string gpu::getDefaultGpuBinaryAnnotation() { return "gpu.binary"; }
(44 lines not shown)	void gpu::SerializeToBlobPass::runOnOperation() {
if (!blob)		if (!blob)
return signalPassFailure();		return signalPassFailure();

// Add the blob as module attribute.		// Add the blob as module attribute.
auto attr = StringAttr::get(&getContext(), {blob->data(), blob->size()});		auto attr = StringAttr::get(&getContext(), {blob->data(), blob->size()});
getOperation()->setAttr(gpuBinaryAnnotation, attr);		getOperation()->setAttr(gpuBinaryAnnotation, attr);
}		}

		void gpu::SerializeToBlobPass::getDependentDialects(
		DialectRegistry &registry) const {
		registerLLVMDialectTranslation(registry);
		OperationPass<gpu::GPUModuleOp>::getDependentDialects(registry);
		}

std::unique_ptr<llvm::TargetMachine>		std::unique_ptr<llvm::TargetMachine>
gpu::SerializeToBlobPass::createTargetMachine() {		gpu::SerializeToBlobPass::createTargetMachine() {
Location loc = getOperation().getLoc();		Location loc = getOperation().getLoc();
std::string error;		std::string error;
const llvm::Target *target =		const llvm::Target *target =
llvm::TargetRegistry::lookupTarget(triple, error);		llvm::TargetRegistry::lookupTarget(triple, error);
if (!target) {		if (!target) {
emitError(loc, Twine("failed to lookup target: ") + error);		emitError(loc, Twine("failed to lookup target: ") + error);
return {};		return {};
}		}
llvm::TargetMachine *machine =		llvm::TargetMachine *machine =
target->createTargetMachine(triple, chip, features, {}, {});		target->createTargetMachine(triple, chip, features, {}, {});
if (!machine) {		if (!machine) {
emitError(loc, "failed to create target machine");		emitError(loc, "failed to create target machine");
return {};		return {};
}		}

return std::unique_ptr<llvm::TargetMachine>{machine};		return std::unique_ptr<llvm::TargetMachine>{machine};
}		}

		std::unique_ptr<llvm::Module>
		gpu::SerializeToBlobPass::translateToLLVMIR(llvm::LLVMContext &llvmContext) {
		return translateModuleToLLVMIR(getOperation(), llvmContext,
		"LLVMDialectModule");
		}

mlir/lib/Dialect/GPU/Transforms/SerializeToCubin.cpp

This file was added.

				//===- LowerGPUToCUBIN.cpp - Convert GPU kernel to CUBIN blob -------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements a pass that serializes a gpu module into CUBIN blob and
				// adds that blob as a string attribute of the module.
				//
				//===----------------------------------------------------------------------===//
				#include "mlir/Dialect/GPU/Passes.h"

				#if MLIR_CUDA_SERIALIZATION_ENABLED
				#include "mlir/Pass/Pass.h"
				#include "mlir/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.h"
				#include "mlir/Target/LLVMIR/Export.h"
				#include "llvm/Support/TargetSelect.h"

				#include <cuda.h>

				using namespace mlir;

				static void emitCudaError(const llvm::Twine &expr, const char *buffer,
				CUresult result, Location loc) {
				const char *error;
				cuGetErrorString(result, &error);
				emitError(loc, expr.concat(" failed with error code ")
				.concat(llvm::Twine{error})
				.concat("[")
				.concat(buffer)
				.concat("]"));
				}

				#define RETURN_ON_CUDA_ERROR(expr) \
				do { \
				if (auto status = (expr)) { \
				emitCudaError(#expr, jitErrorBuffer, status, loc); \
				return {}; \
				} \
				} while (false)

				namespace {
				class SerializeToCubinPass
				: public PassWrapper<SerializeToCubinPass, gpu::SerializeToBlobPass> {
				public:
				SerializeToCubinPass();

				private:
				void getDependentDialects(DialectRegistry &registry) const override;

				// Serializes PTX to CUBIN.
				std::unique_ptr<std::vector<char>>
				serializeISA(const std::string &isa) override;
				};
				} // namespace

				// Sets the 'option' to 'value' unless it already has a value.
				static void maybeSetOption(Pass::Option<std::string> &option,
				const char *value) {
				if (!option.hasValue())
				option = value;
				}

				SerializeToCubinPass::SerializeToCubinPass() {
				maybeSetOption(this->triple, "nvptx64-nvidia-cuda");
				maybeSetOption(this->chip, "sm_35");
				maybeSetOption(this->features, "+ptx60");
				}

				void SerializeToCubinPass::getDependentDialects(
				DialectRegistry &registry) const {
				registerNVVMDialectTranslation(registry);
				gpu::SerializeToBlobPass::getDependentDialects(registry);
				}

				std::unique_ptr<std::vector<char>>
				SerializeToCubinPass::serializeISA(const std::string &isa) {
				Location loc = getOperation().getLoc();
				char jitErrorBuffer[4096] = {0};

				RETURN_ON_CUDA_ERROR(cuInit(0));

				// Linking requires a device context.
				CUdevice device;
				RETURN_ON_CUDA_ERROR(cuDeviceGet(&device, 0));
				CUcontext context;
				RETURN_ON_CUDA_ERROR(cuCtxCreate(&context, 0, device));
				CUlinkState linkState;

				CUjit_option jitOptions[] = {CU_JIT_ERROR_LOG_BUFFER,
				CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES};
				void *jitOptionsVals[] = {jitErrorBuffer,
				reinterpret_cast<void *>(sizeof(jitErrorBuffer))};

				RETURN_ON_CUDA_ERROR(cuLinkCreate(2, /* number of jit options */
				jitOptions, /* jit options */
				jitOptionsVals, /* jit option values */
				&linkState));

				auto kernelName = getOperation().getName().str();
				RETURN_ON_CUDA_ERROR(cuLinkAddData(
				linkState, CUjitInputType::CU_JIT_INPUT_PTX,
				const_cast<void *>(static_cast<const void *>(isa.c_str())), isa.length(),
				kernelName.c_str(), 0, /* number of jit options */
				nullptr, /* jit options */
				nullptr /* jit option values */
				));

				void *cubinData;
				size_t cubinSize;
				RETURN_ON_CUDA_ERROR(cuLinkComplete(linkState, &cubinData, &cubinSize));

				char *cubinAsChar = static_cast<char *>(cubinData);
				auto result =
				std::make_unique<std::vector<char>>(cubinAsChar, cubinAsChar + cubinSize);

				// This will also destroy the cubin data.
				RETURN_ON_CUDA_ERROR(cuLinkDestroy(linkState));
				RETURN_ON_CUDA_ERROR(cuCtxDestroy(context));

				return result;
				}

				// Register pass to serialize GPU kernel functions to a CUBIN binary annotation.
				void mlir::registerGpuSerializeToCubinPass() {
				PassRegistration<SerializeToCubinPass> registerSerializeToCubin(
				"gpu-to-cubin", "Lower GPU kernel function to CUBIN binary annotations",
				[] {
				// Initialize LLVM NVPTX backend.
				LLVMInitializeNVPTXTarget();
				LLVMInitializeNVPTXTargetInfo();
				LLVMInitializeNVPTXTargetMC();
				LLVMInitializeNVPTXAsmPrinter();

				return std::make_unique<SerializeToCubinPass>();
				});
				}
				#else // MLIR_CUDA_SERIALIZATION_ENABLED
				void mlir::registerGpuSerializeToCubinPass() {}
				#endif // MLIR_CUDA_SERIALIZATION_ENABLED
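
`serializeISA` above relies on the `RETURN_ON_CUDA_ERROR` macro: each driver call is evaluated once, and a non-zero status emits a diagnostic and returns an empty pointer from the enclosing function. The following is a small sketch of that control-flow pattern without any CUDA dependency. Everything in it is an assumption made for illustration: `ToyStatus`, `toyDiagnostics`, `TOY_RETURN_ON_ERROR`, `toyInit`, `toyLink` and `toySerialize` are hypothetical names, and the diagnostics go to a vector instead of `emitError`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical status code standing in for CUresult: 0 means success.
using ToyStatus = int;

// Hypothetical diagnostic sink used instead of emitError().
inline std::vector<std::string> &toyDiagnostics() {
  static std::vector<std::string> log;
  return log;
}

// Evaluates `expr` once. On a non-zero status, records the stringified
// expression with the code and returns an empty result from the caller.
// The do/while(false) wrapper makes the macro behave like one statement.
#define TOY_RETURN_ON_ERROR(expr)                                              \
  do {                                                                         \
    if (ToyStatus status = (expr)) {                                           \
      toyDiagnostics().push_back(std::string(#expr) + " failed with code " +   \
                                 std::to_string(status));                      \
      return {};                                                               \
    }                                                                          \
  } while (false)

// Hypothetical steps standing in for the driver calls.
inline ToyStatus toyInit() { return 0; }
inline ToyStatus toyLink(bool fail) { return fail ? 7 : 0; }

// Mirrors the shape of serializeISA: every step either succeeds or makes
// the function return a null pointer.
inline std::unique_ptr<std::vector<char>> toySerialize(const std::string &isa,
                                                       bool failLink) {
  TOY_RETURN_ON_ERROR(toyInit());
  TOY_RETURN_ON_ERROR(toyLink(failLink));
  return std::make_unique<std::vector<char>>(isa.begin(), isa.end());
}
```

One property of this pattern worth keeping in mind is that an early return skips any cleanup calls that appear later in the function.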

mlir/lib/Target/CUDA/CMakeLists.txt

This file was added.

				if (MLIR_CUDA_CONVERSIONS_ENABLED AND MLIR_CUDA_RUNNER_ENABLED)

				# Configure CUDA language support. Using check_language first allows us to
				# give a custom error message.
				include(CheckLanguage)
				check_language(CUDA)
				if (CMAKE_CUDA_COMPILER)
				enable_language(CUDA)
				else()
				message(SEND_ERROR
				"Building mlir with cuda support requires a working CUDA install")
				endif()

				# We need the libcuda.so library.
				find_library(CUDA_DRIVER_LIBRARY cuda)

				add_mlir_conversion_library(MLIRGPUToCUBIN
				GPUToCUBINPass.cpp

				LINK_COMPONENTS
				NVPTXCodeGen
				NVPTXDesc
				NVPTXInfo

				LINK_LIBS PRIVATE
				MLIRGPUToGPURuntimeTransforms
				MLIRGPUToNVVMTransforms
				MLIRLLVMToLLVMIRTranslation
				MLIRNVVMToLLVMIRTranslation
				${CUDA_DRIVER_LIBRARY}
				)
				endif()

mlir/test/Integration/GPU/CUDA/shuffle.mlir

	// RUN: mlir-cuda-runner %s \			// RUN: mlir-opt %s \
	// RUN: -gpu-to-cubin="gpu-binary-annotation=nvvm.cubin" \			// RUN: -gpu-kernel-outlining \
				// RUN: -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-nvvm,gpu-to-cubin{gpu-binary-annotation=nvvm.cubin})' \
	// RUN: -gpu-to-llvm="gpu-binary-annotation=nvvm.cubin" \			// RUN: -gpu-to-llvm="gpu-binary-annotation=nvvm.cubin" \
				// RUN: \| mlir-cpu-runner \
	// RUN: --shared-libs=%linalg_test_lib_dir/libmlir_cuda_runtime%shlibext \			// RUN: --shared-libs=%linalg_test_lib_dir/libmlir_cuda_runtime%shlibext \
	// RUN: --shared-libs=%linalg_test_lib_dir/libmlir_runner_utils%shlibext \			// RUN: --shared-libs=%linalg_test_lib_dir/libmlir_runner_utils%shlibext \
	// RUN: --entry-point-result=void \			// RUN: --entry-point-result=void \
	// RUN: \| FileCheck %s			// RUN: \| FileCheck %s

	// CHECK: [4, 5, 6, 7, 0, 1, 2, 3, 12, -1, -1, -1, 8]			// CHECK: [4, 5, 6, 7, 0, 1, 2, 3, 12, -1, -1, -1, 8]
	func @main() {			func @main() {
	%arg = alloc() : memref<13xf32>			%arg = alloc() : memref<13xf32>
	(26 lines not shown)

This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Add NVVM to CUBIN conversion to mlir-opt
Closed · Public
