This is an archive of the discontinued LLVM Phabricator instance.

Differential D78207

[MLIR] Allow for multiple gpu modules during translation.
ClosedPublic

Authored by herhut on Apr 15 2020, 7:03 AM.


Details

Reviewers

ftynse
jdoerfert
csigg

Commits

rG69040d5b0bfa: [MLIR] Allow for multiple gpu modules during translation.

Summary

This change makes the ModuleTranslation threadsafe by locking on the
LLVMContext. Furthermore, we now clone the llvm module into a new
context when compiling to PTX similar to what the OrcJit does.
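
The locking half of this summary can be pictured with a small, standard-library-only sketch. It is not the patch itself: `SharedContext`, `DialectSketch`, `translateModule` and `translateConcurrently` are hypothetical names, and `std::mutex` with `std::lock_guard` stands in for the `llvm::sys::SmartMutex<true>` and `llvm::sys::SmartScopedLock<true>` used in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for state that is not thread-safe on its own
// (the role llvm::LLVMContext plays in the patch).
struct SharedContext {
  std::set<std::string> internedTypes;
};

// Hypothetical stand-in for the dialect: it owns the context and the mutex
// guarding it (compare getLLVMContextMutex() in the patch).
class DialectSketch {
public:
  SharedContext &getContext() { return context; }
  std::mutex &getContextMutex() { return contextMutex; }

private:
  SharedContext context;
  std::mutex contextMutex;
};

// Every translation holds the lock for as long as it touches the context.
void translateModule(DialectSketch &dialect, const std::string &moduleName) {
  assert(!moduleName.empty() && "module name must not be empty");
  std::lock_guard<std::mutex> lock(dialect.getContextMutex());
  dialect.getContext().internedTypes.insert(moduleName + "_type");
}

// Translates the given modules on one thread each and returns how many
// distinct types ended up in the shared context.
std::size_t translateConcurrently(const std::vector<std::string> &modules) {
  DialectSketch dialect;
  std::vector<std::thread> workers;
  for (const std::string &name : modules)
    workers.emplace_back([&dialect, name] { translateModule(dialect, name); });
  for (std::thread &worker : workers)
    worker.join();
  return dialect.getContext().internedTypes.size();
}
```

Without the `std::lock_guard`, the concurrent `insert` calls would race on the shared set, which is the kind of problem the patch guards against for the shared LLVM context.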

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

herhut created this revision. Apr 15 2020, 7:03 AM

Herald added a reviewer: jdoerfert. Apr 15 2020, 7:03 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, frgossen, grosul1 and 15 others.

Harbormaster failed remote builds in B53356: Diff 257707! Apr 15 2020, 7:06 AM

csigg accepted this revision. Apr 15 2020, 8:03 AM

csigg added inline comments.

mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h
29	Is this needed?

This revision is now accepted and ready to land. Apr 15 2020, 8:03 AM

Rebase and mild cleanup.

herhut marked an inline comment as done. Apr 15 2020, 8:27 AM

herhut added inline comments.

mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h
29	No, a leftover from using `std::mutex` in an earlier version. Thanks!

Harbormaster failed remote builds in B53372: Diff 257731! Apr 15 2020, 8:44 AM

I don't suppose there is a way to make this method only visible to ModuleTranslation...

mehdi_amini added inline comments. Apr 15 2020, 10:15 PM

mlir/lib/Conversion/GPUToCUDA/ConvertKernelFuncToCubin.cpp
115	Why don't you use the high level API the same way this is done in the ExecutionEngine?

// TODO(zinenko): Reevaluate model of ownership of LLVMContext in LLVMDialect.
SmallVector<char, 1> buffer;
{
  llvm::raw_svector_ostream os(buffer);
  WriteBitcodeToFile(*llvmModule, os);
}
llvm::MemoryBufferRef bufferRef(StringRef(buffer.data(), buffer.size()),
                                "cloned module buffer");
auto expectedModule = parseBitcodeFile(bufferRef, *ctx);
if (!expectedModule)
  return expectedModule.takeError();
std::unique_ptr<Module> deserModule = std::move(*expectedModule);
auto dataLayout = deserModule->getDataLayout();

I'd also like a TODO also here for Alex to actually fix: the fact that we have a LLVMContext tied to the LLVM dialect is really something we need to fix.

115	(It may be even worth extracting this logic in a helper by the way)
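
The helper suggested here boils down to a round trip: write the module into a context-independent buffer, then parse that buffer into a different context. The sketch below shows only that shape using the standard library, so it builds without LLVM. `ContextSketch`, `ModuleSketch`, `serialize`, `parseInto` and `cloneIntoContext` are hypothetical names, and the line-based text format is an assumption standing in for bitcode.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::LLVMContext: owns the storage that modules
// created in it point into.
struct ContextSketch {
  std::vector<std::unique_ptr<std::string>> storage;
  const std::string *intern(const std::string &name) {
    storage.push_back(std::make_unique<std::string>(name));
    return storage.back().get();
  }
};

// Hypothetical stand-in for llvm::Module: tied to the context it was built in.
struct ModuleSketch {
  ContextSketch *context = nullptr;
  std::vector<const std::string *> functionNames;
};

// Serializes the module into a context-independent buffer
// (the role the bitcode writer plays in the patch).
std::string serialize(const ModuleSketch &module) {
  std::ostringstream os;
  for (const std::string *name : module.functionNames)
    os << *name << '\n';
  return os.str();
}

// Parses the buffer into a module owned by `target`
// (the role parseBitcodeFile plays in the patch).
ModuleSketch parseInto(const std::string &buffer, ContextSketch &target) {
  ModuleSketch module;
  module.context = &target;
  std::istringstream is(buffer);
  for (std::string line; std::getline(is, line);)
    module.functionNames.push_back(target.intern(line));
  return module;
}

// The extracted helper: round-trip through the buffer so that the clone
// shares no state with the original module's context.
ModuleSketch cloneIntoContext(const ModuleSketch &module,
                              ContextSketch &target) {
  assert(module.context != &target && "clone must go into a different context");
  return parseInto(serialize(module), target);
}
```

After the round trip, the clone points only into the target context, so code generation on the clone would no longer touch the context shared with other threads.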

Extract helper and rebase.

Harbormaster failed remote builds in B53544: Diff 258012! Apr 16 2020, 4:30 AM

ftynse accepted this revision. Apr 16 2020, 4:36 AM

ftynse added inline comments.

mlir/lib/Conversion/GPUToCUDA/ConvertKernelFuncToCubin.cpp
115	> I'd also like a TODO also here for Alex to actually fix: the fact that we have a LLVMContext tied to the LLVM dialect is really something we need to fix.

	Yeah, this is the next big thing on my todo list. Let's see how many discussions we can have in parallel :)

Closed by commit rG69040d5b0bfa: [MLIR] Allow for multiple gpu modules during translation. (authored by herhut). Apr 16 2020, 5:37 AM

This revision was automatically updated to reflect the committed changes.

This is causing TSAN failures. Looks like ConvertKernelFuncToCubin isn't thread safe. More specifically the call to LLVMInitializeNVPTXTargetInfo.

mlir/lib/Conversion/GPUToCUDA/ConvertKernelFuncToCubin.cpp:62:5
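
One common way to remove this kind of race is to funnel the initialization through `std::call_once`. This is not necessarily the fix that was applied upstream. The sketch is standard-library-only: `initializeTargetSketch` is a hypothetical stand-in for the real initializer, and `ensureTargetInitialized` and `runPassesConcurrently` are names invented for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Counts how often the initializer actually ran; only here so that the effect
// can be observed.
std::atomic<int> initializationCount{0};

// Hypothetical stand-in for a global initializer that is not thread-safe,
// such as LLVMInitializeNVPTXTargetInfo.
void initializeTargetSketch() { ++initializationCount; }

// Safe to call from any thread: the initializer runs exactly once, and every
// caller returns only after it has completed.
void ensureTargetInitialized() {
  static std::once_flag initialized;
  std::call_once(initialized, initializeTargetSketch);
}

// Simulates several pass instances starting up concurrently and returns the
// number of times the initializer ran.
int runPassesConcurrently(int numThreads) {
  assert(numThreads > 0 && "need at least one thread");
  std::vector<std::thread> passes;
  for (int i = 0; i < numThreads; ++i)
    passes.emplace_back(ensureTargetInitialized);
  for (std::thread &pass : passes)
    pass.join();
  return initializationCount.load();
}
```

An alternative with the same effect would be to perform the initialization once before any worker threads are started.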

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
500	Can we just lock once at beginning?
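
The alternative raised in this comment, taking the lock once at the entry point instead of inside each conversion step, could look roughly like the sketch below. `TranslationSketch` and its members are hypothetical, and `std::mutex` again stands in for the mutex returned by `getLLVMContextMutex()`.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for ModuleTranslation; the referenced mutex plays the
// role of the one returned by getLLVMContextMutex().
class TranslationSketch {
public:
  explicit TranslationSketch(std::mutex &contextMutex)
      : contextMutex(contextMutex) {}

  // Single entry point: takes the lock once and holds it across all steps.
  bool translate() {
    assert(steps == 0 && "translate() is expected to run once per instance");
    std::lock_guard<std::mutex> lock(contextMutex);
    return convertGlobals() && convertFunctions();
  }

  int stepsRun() const { return steps; }

private:
  // The steps assume the caller already holds the lock and do not lock again.
  bool convertGlobals() {
    ++steps;
    return true;
  }
  bool convertFunctions() {
    ++steps;
    return true;
  }

  std::mutex &contextMutex;
  int steps = 0;
};
```

The trade-off is a longer critical section in exchange for fewer lock acquisitions and a single, obvious place where the locking policy lives.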

Revision Contents

Path	Size
mlir/include/mlir/Dialect/LLVMIR/LLVMDialect.h	4 lines
mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td	1 line
mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h	4 lines
mlir/lib/Conversion/GPUToCUDA/CMakeLists.txt	2 lines
mlir/lib/Conversion/GPUToCUDA/ConvertKernelFuncToCubin.cpp	19 lines
mlir/lib/Conversion/GPUToCUDA/ConvertLaunchFuncToCudaCalls.cpp	12 lines
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp	3 lines
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp	15 lines
mlir/test/mlir-cuda-runner/two-modules.mlir	28 lines

Diff 257707

mlir/include/mlir/Dialect/LLVMIR/LLVMDialect.h

@@ 26 lines not shown @@
  #include "llvm/IR/Module.h"
  #include "llvm/IR/Type.h"

  #include "mlir/Dialect/LLVMIR/LLVMOpsEnums.h.inc"

  namespace llvm {
  class Type;
  class LLVMContext;
+ namespace sys {
+ template <bool mt_only>
+ class SmartMutex;
+ } // end namespace sys
  } // end namespace llvm

  namespace mlir {
  namespace LLVM {
  class LLVMDialect;

  namespace detail {
  struct LLVMTypeStorage;
@@ 180 lines not shown @@

mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td

@@ 18 lines not shown @@
  def LLVM_Dialect : Dialect {
    let name = "llvm";
    let cppNamespace = "LLVM";
    let hasRegionArgAttrVerify = 1;
    let extraClassDeclaration = [{
      ~LLVMDialect();
      llvm::LLVMContext &getLLVMContext();
      llvm::Module &getLLVMModule();
+     llvm::sys::SmartMutex<true> &getLLVMContextMutex();

    private:
      friend LLVMType;

      std::unique_ptr<detail::LLVMDialectImpl> impl;
    }];
  }

@@ 188 lines not shown @@

mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h

@@ 20 lines not shown @@
  #include "mlir/IR/Value.h"

  #include "llvm/Frontend/OpenMP/OMPIRBuilder.h"
  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/MatrixBuilder.h"
  #include "llvm/IR/Value.h"
+ #include <mutex>
      csigg: Is this needed?
      herhut: No, a leftover from using `std::mutex` in an earlier version. Thanks!

  namespace mlir {
  class Attribute;
  class Location;
  class ModuleOp;
  class Operation;

  namespace LLVM {
@@ 64 lines not shown @@ private:
    LogicalResult convertBlock(Block &bb, bool ignoreArguments);

    llvm::Constant *getLLVMConstant(llvm::Type *llvmType, Attribute attr,
                                    Location loc);

    /// Original and translated module.
    Operation *mlirModule;
    std::unique_ptr<llvm::Module> llvmModule;

    /// A converter for translating debug information.
    std::unique_ptr<detail::DebugTranslation> debugTranslation;

    /// Builder for LLVM IR generation of OpenMP constructs.
    std::unique_ptr<llvm::OpenMPIRBuilder> ompBuilder;
    /// Precomputed pointer to OpenMP dialect.
    const Dialect *ompDialect;
+   /// Pointer to the llvmDialect;
+   LLVMDialect *llvmDialect;

    /// Mappings between llvm.mlir.global definitions and corresponding globals.
    DenseMap<Operation *, llvm::GlobalValue *> globalsMapping;

  protected:
    /// Mappings between original and translated values, used for lookups.
    llvm::StringMap<llvm::Function *> functionMapping;
    DenseMap<Value, llvm::Value *> valueMapping;
    DenseMap<Block *, llvm::BasicBlock *> blockMapping;
  };

  } // namespace LLVM
  } // namespace mlir

  #endif // MLIR_TARGET_LLVMIR_MODULETRANSLATION_H

mlir/lib/Conversion/GPUToCUDA/CMakeLists.txt

@@ 18 lines not shown @@ add_mlir_conversion_library(MLIRGPUtoCUDATransforms
    ${SOURCES}

    DEPENDS
    MLIRConversionPassIncGen
  )
  target_link_libraries(MLIRGPUtoCUDATransforms
    PUBLIC
    ${NVPTX_LIBS}
+   LLVMBitReader
+   LLVMBitWriter
    LLVMCore
    LLVMMC
    LLVMSupport
    MLIRGPU
    MLIRIR
    MLIRLLVMIR
    MLIRNVVMIR
    MLIRPass
    MLIRSupport
    MLIRTargetNVVMIR
  )

mlir/lib/Conversion/GPUToCUDA/ConvertKernelFuncToCubin.cpp

@@ 20 lines not shown @@
  #include "mlir/IR/Module.h"
  #include "mlir/Pass/Pass.h"
  #include "mlir/Pass/PassRegistry.h"
  #include "mlir/Support/LogicalResult.h"
  #include "mlir/Target/NVVMIR.h"

  #include "llvm/ADT/Optional.h"
  #include "llvm/ADT/Twine.h"
+ #include "llvm/Bitcode/BitcodeReader.h"
+ #include "llvm/Bitcode/BitcodeWriter.h"
  #include "llvm/IR/Constants.h"
  #include "llvm/IR/LegacyPassManager.h"
  #include "llvm/IR/Module.h"
  #include "llvm/Support/Error.h"
  #include "llvm/Support/TargetRegistry.h"
  #include "llvm/Support/TargetSelect.h"
  #include "llvm/Target/TargetMachine.h"

@@ 56 lines not shown @@
  };

  } // anonymous namespace

  std::string GpuKernelToCubinPass::translateModuleToPtx(
      llvm::Module &module, llvm::TargetMachine &target_machine) {
    std::string ptx;
    {
+     // Clone the llvm module into a new context to enable concurrent compilation
+     // with multiple threads.
+     llvm::LLVMContext llvmContext;
+     llvm::SmallVector<char, 1> bitcodeBuffer;
+     llvm::BitcodeWriter bitcodeWriter(bitcodeBuffer);
+     bitcodeWriter.writeModule(module);
+     bitcodeWriter.writeSymtab();
+     bitcodeWriter.writeStrtab();
+     llvm::MemoryBufferRef clonedModuleBufferRef(
+         StringRef(bitcodeBuffer.data(), bitcodeBuffer.size()),
+         "cloned module buffer");
+     auto clone = llvm::cantFail(
+         llvm::parseBitcodeFile(clonedModuleBufferRef, llvmContext));
        mehdi_amini: Why don't you use the high level API the same way this is done in the ExecutionEngine? …
        mehdi_amini: (It may be even worth extracting this logic in a helper by the way)
        ftynse: > I'd also like a TODO also here for Alex to actually fix: the fact that we have a LLVMContext…

      llvm::raw_string_ostream stream(ptx);
      llvm::buffer_ostream pstream(stream);
      llvm::legacy::PassManager codegen_passes;
      target_machine.addPassesToEmitFile(codegen_passes, pstream, nullptr,
                                         llvm::CGFT_AssemblyFile);
-     codegen_passes.run(module);
+     codegen_passes.run(*clone);
    }

    return ptx;
  }

  OwnedCubin GpuKernelToCubinPass::convertModuleToCubin(llvm::Module &llvmModule,
                                                        Location loc,
                                                        StringRef name) {
@@ 36 lines not shown @@

mlir/lib/Conversion/GPUToCUDA/ConvertLaunchFuncToCudaCalls.cpp

@@ 110 lines not shown @@ Value allocatePointer(OpBuilder &builder, Location loc) {
      return builder.create<LLVM::AllocaOp>(loc, getPointerPointerType(), one,
                                            /*alignment=*/0);
    }

    void declareCudaFunctions(Location loc);
    void addParamToList(OpBuilder &builder, Location loc, Value param, Value list,
                        unsigned pos, Value one);
    Value setupParamsArray(gpu::LaunchFuncOp launchOp, OpBuilder &builder);
-   Value generateKernelNameConstant(StringRef name, Location loc,
-                                    OpBuilder &builder);
+   Value generateKernelNameConstant(StringRef moduleName, StringRef name,
+                                    Location loc, OpBuilder &builder);
    void translateGpuLaunchCalls(mlir::gpu::LaunchFuncOp launchOp);

  public:
    // Run the dialect converter on the module.
    void runOnOperation() override {
      // Cache the LLVMDialect for the current module.
      llvmDialect = getContext().getRegisteredDialect<LLVM::LLVMDialect>();
      // Cache the used LLVM types.
@@ 211 lines not shown @@
  //
  // llvm.global constant @kernel_name("function_name\00")
  // func(...) {
  //   %0 = llvm.addressof @kernel_name
  //   %1 = llvm.constant (0 : index)
  //   %2 = llvm.getelementptr %0[%1, %1] : !llvm<"i8*">
  // }
  Value GpuLaunchFuncToCudaCallsPass::generateKernelNameConstant(
-     StringRef name, Location loc, OpBuilder &builder) {
+     StringRef moduleName, StringRef name, Location loc, OpBuilder &builder) {
    // Make sure the trailing zero is included in the constant.
    std::vector<char> kernelName(name.begin(), name.end());
    kernelName.push_back('\0');

-   std::string globalName = std::string(llvm::formatv("{0}_kernel_name", name));
+   std::string globalName =
+       std::string(llvm::formatv("{0}_{1}_kernel_name", moduleName, name));
    return LLVM::createGlobalString(
        loc, builder, globalName, StringRef(kernelName.data(), kernelName.size()),
        LLVM::Linkage::Internal, llvmDialect);
  }

  // Emits LLVM IR to launch a kernel function. Expects the module that contains
  // the compiled kernel function as a cubin in the 'nvvm.cubin' attribute of the
  // kernel function in the IR.
@@ 48 lines not shown @@ auto cuModuleLoad =
        getOperation().lookupSymbol<LLVM::LLVMFuncOp>(cuModuleLoadName);
    builder.create<LLVM::CallOp>(loc, ArrayRef<Type>{getCUResultType()},
                                 builder.getSymbolRefAttr(cuModuleLoad),
                                 ArrayRef<Value>{cuModule, data});
    // Get the function from the module. The name corresponds to the name of
    // the kernel function.
    auto cuOwningModuleRef =
        builder.create<LLVM::LoadOp>(loc, getPointerType(), cuModule);
-   auto kernelName = generateKernelNameConstant(launchOp.kernel(), loc, builder);
+   auto kernelName = generateKernelNameConstant(launchOp.getKernelModuleName(),
+                                                launchOp.kernel(), loc, builder);
    auto cuFunction = allocatePointer(builder, loc);
    auto cuModuleGetFunction =
        getOperation().lookupSymbol<LLVM::LLVMFuncOp>(cuModuleGetFunctionName);
    builder.create<LLVM::CallOp>(
        loc, ArrayRef<Type>{getCUResultType()},
        builder.getSymbolRefAttr(cuModuleGetFunction),
        ArrayRef<Value>{cuFunction, cuOwningModuleRef, kernelName});
    // Grab the global stream needed for execution.
@@ 40 lines not shown @@
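
The renaming in `generateKernelNameConstant` above changes the symbol pattern from `{0}_kernel_name` to `{0}_{1}_kernel_name`. The sketch below reproduces only the two string patterns, using plain `std::string` concatenation in place of `llvm::formatv` so that it builds without LLVM. The function names are hypothetical, and the module names in the usage note are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Old scheme (left-hand side of the diff): derived from the kernel name only.
std::string globalNameFromKernel(const std::string &kernel) {
  assert(!kernel.empty() && "kernel name must not be empty");
  return kernel + "_kernel_name";
}

// New scheme (right-hand side of the diff): module name and kernel name.
std::string globalNameFromModuleAndKernel(const std::string &module,
                                          const std::string &kernel) {
  assert(!kernel.empty() && "kernel name must not be empty");
  return module + "_" + kernel + "_kernel_name";
}
```

With the old scheme, two kernels named `main_kernel` living in different gpu modules would both map to `main_kernel_kernel_name`. With the new scheme, hypothetical modules `mod_a` and `mod_b` give `mod_a_main_kernel_kernel_name` and `mod_b_main_kernel_kernel_name`, so the two global strings no longer collide.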

mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp

@@ 1,674 lines not shown @@

  LLVMDialect::~LLVMDialect() {}

  #define GET_OP_CLASSES
  #include "mlir/Dialect/LLVMIR/LLVMOps.cpp.inc"

  llvm::LLVMContext &LLVMDialect::getLLVMContext() { return impl->llvmContext; }
  llvm::Module &LLVMDialect::getLLVMModule() { return impl->module; }
+ llvm::sys::SmartMutex<true> &LLVMDialect::getLLVMContextMutex() {
+   return impl->mutex;
+ }

  /// Parse a type registered to this dialect.
  Type LLVMDialect::parseType(DialectAsmParser &parser) const {
    StringRef tyData = parser.getFullSymbolSpec();

    // LLVM is not thread-safe, so lock access to it.
    llvm::sys::SmartScopedLock<true> lock(impl->mutex);

@@ 281 lines not shown @@

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp

@@ 295 lines not shown @@
  }

  ModuleTranslation::ModuleTranslation(Operation *module,
                                       std::unique_ptr<llvm::Module> llvmModule)
      : mlirModule(module), llvmModule(std::move(llvmModule)),
        debugTranslation(
            std::make_unique<DebugTranslation>(module, *this->llvmModule)),
        ompDialect(
-           module->getContext()->getRegisteredDialect<omp::OpenMPDialect>()) {
+           module->getContext()->getRegisteredDialect<omp::OpenMPDialect>()),
+       llvmDialect(module->getContext()->getRegisteredDialect<LLVMDialect>()) {
    assert(satisfiesLLVMModule(mlirModule) &&
           "mlirModule should honor LLVM's module semantics.");
  }
  ModuleTranslation::~ModuleTranslation() {}

  /// Given an OpenMP MLIR operation, create the corresponding LLVM IR
  /// (including OpenMP runtime calls).
  LogicalResult
@@ 177 lines not shown @@ LogicalResult ModuleTranslation::convertBlock(Block &bb, bool ignoreArguments) {
    }

    return success();
  }

  /// Create named global variables that correspond to llvm.mlir.global
  /// definitions.
  LogicalResult ModuleTranslation::convertGlobals() {
+   // Lock access to the llvm context.
+   llvm::sys::SmartScopedLock<true> scopedLock(
+       llvmDialect->getLLVMContextMutex());
        rriddle: Can we just lock once at beginning?
    for (auto op : getModuleBody(mlirModule).getOps<LLVM::GlobalOp>()) {
      llvm::Type *type = op.getType().getUnderlyingType();
      llvm::Constant *cst = llvm::UndefValue::get(type);
      if (op.getValueOrNull()) {
-       // String attributes are treated separately because they cannot appear as
-       // in-function constants and are thus not supported by getLLVMConstant.
+       // String attributes are treated separately because they cannot appear
+       // as in-function constants and are thus not supported by
+       // getLLVMConstant.
        if (auto strAttr = op.getValueOrNull().dyn_cast_or_null<StringAttr>()) {
          cst = llvm::ConstantDataArray::getString(
              llvmModule->getContext(), strAttr.getValue(), /*AddNull=*/false);
          type = cst->getType();
        } else if (!(cst = getLLVMConstant(type, op.getValueOrNull(),
                                           op.getLoc()))) {
          return failure();
        }
@@ 239 lines not shown @@ if (!isa<LLVM::LLVMFuncOp>(&o) && !isa<LLVM::GlobalOp>(&o) &&
          !o.isKnownTerminator())
        return o.emitOpError("unsupported module-level operation");
    return success();
  }

  LogicalResult ModuleTranslation::convertFunctions() {
    // Declare all functions first because there may be function calls that form a
    // call graph with cycles.
+   llvm::sys::SmartScopedLock<true> scopedLock(
+       llvmDialect->getLLVMContextMutex());
    for (auto function : getModuleBody(mlirModule).getOps<LLVMFuncOp>()) {
      llvm::FunctionCallee llvmFuncCst = llvmModule->getOrInsertFunction(
          function.getName(),
          cast<llvm::FunctionType>(function.getType().getUnderlyingType()));
      llvm::Function *llvmFunc = cast<llvm::Function>(llvmFuncCst.getCallee());
      functionMapping[function.getName()] = llvmFunc;

      // Forward the pass-through attributes to LLVM.
@@ 26 lines not shown @@ ModuleTranslation::lookupValues(ValueRange values) {
    }
    return remapped;
  }

  std::unique_ptr<llvm::Module>
  ModuleTranslation::prepareLLVMModule(Operation *m) {
    auto *dialect = m->getContext()->getRegisteredDialect<LLVM::LLVMDialect>();
    assert(dialect && "LLVM dialect must be registered");
+   // Lock the LLVM context as we might create new types here.
+   llvm::sys::SmartScopedLock<true> scopedLock(dialect->getLLVMContextMutex());

    auto llvmModule = llvm::CloneModule(dialect->getLLVMModule());
    if (!llvmModule)
      return nullptr;

    llvm::LLVMContext &llvmContext = llvmModule->getContext();
    llvm::IRBuilder<> builder(llvmContext);

@@ 9 lines not shown @@

mlir/test/mlir-cuda-runner/two-modules.mlir

This file was added.

+ // RUN: mlir-cuda-runner %s --print-ir-after-all --shared-libs=%cuda_wrapper_library_dir/libcuda-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s --dump-input=always
+
+ // CHECK: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
+ func @main() {
+   %arg = alloc() : memref<13xi32>
+   %dst = memref_cast %arg : memref<13xi32> to memref<?xi32>
+   %one = constant 1 : index
+   %sx = dim %dst, 0 : memref<?xi32>
+   call @mcuMemHostRegisterMemRef1dInt32(%dst) : (memref<?xi32>) -> ()
+   gpu.launch blocks(%bx, %by, %bz) in (%grid_x = %one, %grid_y = %one, %grid_z = %one)
+              threads(%tx, %ty, %tz) in (%block_x = %sx, %block_y = %one, %block_z = %one) {
+     %t0 = index_cast %tx : index to i32
+     store %t0, %dst[%tx] : memref<?xi32>
+     gpu.terminator
+   }
+   gpu.launch blocks(%bx, %by, %bz) in (%grid_x = %one, %grid_y = %one, %grid_z = %one)
+              threads(%tx, %ty, %tz) in (%block_x = %sx, %block_y = %one, %block_z = %one) {
+     %t0 = index_cast %tx : index to i32
+     store %t0, %dst[%tx] : memref<?xi32>
+     gpu.terminator
+   }
+   %U = memref_cast %dst : memref<?xi32> to memref<*xi32>
+   call @print_memref_i32(%U) : (memref<*xi32>) -> ()
+   return
+ }
+
+ func @mcuMemHostRegisterMemRef1dInt32(%ptr : memref<?xi32>)
+ func @print_memref_i32(%ptr : memref<*xi32>)
