This is an archive of the discontinued LLVM Phabricator instance.

[mlir][GPU] make the index bitwidth configurable during the GPU to LLVM conversion
Abandoned · Public

Authored by gysit on Jan 21 2021, 12:06 PM.


Details

Reviewers

herhut
csigg

Summary

This patch exposes the LLVM lowering options through the GPU to LLVM conversion pass (gpu-to-llvm). In addition, it adds the logic needed to cast index-typed arguments passed to the GPU runtime wrapper when the index bitwidth is set below the 64 bits the wrapper expects; such arguments are zero-extended to 64 bits.
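A minimal, self-contained C++ sketch of what the cast achieves, assuming a 32-bit index type and a wrapper entry point whose size parameter is fixed at 64 bits. The function names are hypothetical, and std::malloc stands in for the GPU allocation. The patch emits llvm.zext (zero extension); the sketch contrasts it with sign extension, which would turn any 32-bit value with the top bit set into a huge 64-bit size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical runtime-wrapper entry point whose size parameter is 64 bits
// wide regardless of the index bitwidth chosen in MLIR. std::malloc stands in
// for the actual GPU allocation.
extern "C" void *mgpuMemAllocSketch(std::uint64_t sizeBytes) {
  return std::malloc(static_cast<std::size_t>(sizeBytes));
}

// Widen a 32-bit index value the way llvm.zext does: the upper 32 bits are
// filled with zeros, so every unsigned 32-bit value is preserved.
std::uint64_t zeroExtendIndex(std::uint32_t value) {
  return static_cast<std::uint64_t>(value);
}

// For comparison, sign extension (llvm.sext) copies bit 31 into the upper
// 32 bits, which turns large sizes into huge 64-bit values.
std::uint64_t signExtendIndex(std::uint32_t value) {
  std::uint64_t widened = value;
  if (value & 0x80000000u)
    widened |= 0xFFFFFFFF00000000ull;
  return widened;
}
```

Zero extension is the natural choice here because sizes and grid dimensions are never negative.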

Note for reviewers: this is a rather intrusive change, and it may interfere with the broader data layout refactoring. If you prefer, I can maintain it as a local change.

Event Timeline

gysit created this revision. Jan 21 2021, 12:06 PM

Herald added subscribers: teijeong, rdzhabarov, tatianashp and 15 others. Jan 21 2021, 12:06 PM

gysit requested review of this revision. Jan 21 2021, 12:06 PM

Herald added a project: Restricted Project. Jan 21 2021, 12:06 PM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B86142: Diff 318265. Jan 21 2021, 2:15 PM

csigg added a comment.

Hi Tobias!

I'm curious, wouldn't it be easier to compile the runtime wrappers for 32bit instead of casting the arguments?

mehdi_amini added inline comments. Feb 22 2021, 2:43 PM

mlir/include/mlir/Conversion/Passes.td, line 127:
You're adding many options that I don't find tests for at the moment?

gysit added inline comments.

In D95160#2579798, @csigg wrote:

Hi Tobias!

I'm curious, wouldn't it be easier to compile the runtime wrappers for 32bit instead of casting the arguments?

Hi Christian,

My idea was to keep things isolated in the pass and not touch the runtime wrapper. I hadn't thought about 32-bit compilation. It seems like a clean solution, assuming all the ROCm / CUDA libraries work with a 32-bit binary? An alternative could be to use preprocessor directives in the wrapper. I am not a big fan of that solution, but it may be cleaner and require less code (there are multiple wrappers, though).

I suggest we abandon this revision for the time being (we can apply the patch locally in our project). Without the casting, it basically boils down to exposing the LLVM lowering options, which should probably be replaced by data layout attributes anyway?
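The preprocessor-directive alternative mentioned in the reply above could look roughly like the following sketch. The macro MGPU_INDEX_BITWIDTH, the alias mgpu_index_t, and the function name are hypothetical (the real wrappers are not part of this revision), and std::malloc stands in for the CUDA / ROCm allocation call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical build-time switch; compile with -DMGPU_INDEX_BITWIDTH=32 to
// match a lowering that uses a 32-bit index type. Defaults to 64 bits.
#ifndef MGPU_INDEX_BITWIDTH
#define MGPU_INDEX_BITWIDTH 64
#endif

#if MGPU_INDEX_BITWIDTH == 32
using mgpu_index_t = std::uint32_t;
#elif MGPU_INDEX_BITWIDTH == 64
using mgpu_index_t = std::uint64_t;
#else
#error "MGPU_INDEX_BITWIDTH must be 32 or 64"
#endif

// The size parameter has the same width as the lowered index type, so the
// generated call would not need an llvm.zext. std::malloc stands in for the
// GPU allocation, and the stream argument is ignored in this sketch.
extern "C" void *mgpuMemAllocIndexSketch(mgpu_index_t sizeBytes, void *stream) {
  (void)stream;
  return std::malloc(static_cast<std::size_t>(sizeBytes));
}
```

The cost is the one noted in the reply: every wrapper (CUDA and ROCm) would need the same switch, and the wrapper build would have to match the index bitwidth chosen for the pass.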

mlir/include/mlir/Conversion/Passes.td, line 127:
That is a good point! I only test the option I actually use and added the other ones for completeness; I should have added and tested only the ones I use. AFAIK the LLVM lowering options are to be replaced by data layout attributes anyway? I think that is a much nicer solution, and not only from a testing perspective. I therefore suggest abandoning this revision (see my answer to @csigg's comment).

gysit abandoned this revision. Mar 1 2021, 6:16 AM

Herald added a subscriber: cota. Mar 1 2021, 6:16 AM

Revision Contents

mlir/include/mlir/Conversion/GPUCommon/GPUCommonPass.h (5 lines)
mlir/include/mlir/Conversion/Passes.td (16 lines)
mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp (85 lines)
mlir/test/Conversion/GPUCommon/lower-alloc-to-gpu-runtime-calls.mlir (3 lines)
mlir/test/Conversion/GPUCommon/lower-launch-func-to-gpu-runtime-calls.mlir (3 lines)
mlir/test/Conversion/GPUCommon/lower-memcpy-to-gpu-runtime-calls.mlir (3 lines)

Diff 318265

mlir/include/mlir/Conversion/GPUCommon/GPUCommonPass.h

	//===- GPUCommonPass.h - MLIR GPU runtime support -------------------------===//			//===- GPUCommonPass.h - MLIR GPU runtime support -------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	#ifndef MLIR_CONVERSION_GPUCOMMON_GPUCOMMONPASS_H_			#ifndef MLIR_CONVERSION_GPUCOMMON_GPUCOMMONPASS_H_
	#define MLIR_CONVERSION_GPUCOMMON_GPUCOMMONPASS_H_			#define MLIR_CONVERSION_GPUCOMMON_GPUCOMMONPASS_H_

				#include "mlir/Conversion/StandardToLLVM/ConvertStandardToLLVMPass.h"
	#include "mlir/Support/LLVM.h"			#include "mlir/Support/LLVM.h"
	#include "llvm/IR/Module.h"			#include "llvm/IR/Module.h"
	#include <vector>			#include <vector>

	namespace mlir {			namespace mlir {

	class LLVMTypeConverter;			class LLVMTypeConverter;
	class Location;			class Location;
	[21 unchanged lines not shown]

	/// Creates a pass to convert a gpu.launch_func operation into a sequence of			/// Creates a pass to convert a gpu.launch_func operation into a sequence of
	/// GPU runtime calls.			/// GPU runtime calls.
	///			///
	/// This pass does not generate code to call GPU runtime APIs directly but			/// This pass does not generate code to call GPU runtime APIs directly but
	/// instead uses a small wrapper library that exports a stable and conveniently			/// instead uses a small wrapper library that exports a stable and conveniently
	/// typed ABI on top of GPU runtimes such as CUDA or ROCm (HIP).			/// typed ABI on top of GPU runtimes such as CUDA or ROCm (HIP).
	std::unique_ptr<OperationPass<ModuleOp>>			std::unique_ptr<OperationPass<ModuleOp>>
	createGpuToLLVMConversionPass(StringRef gpuBinaryAnnotation = "");			createGpuToLLVMConversionPass(StringRef gpuBinaryAnnotation = "",
				const LowerToLLVMOptions &options =
				LowerToLLVMOptions::getDefaultOptions());

	/// Collect a set of patterns to convert from the GPU dialect to LLVM.			/// Collect a set of patterns to convert from the GPU dialect to LLVM.
	void populateGpuToLLVMConversionPatterns(LLVMTypeConverter &converter,			void populateGpuToLLVMConversionPatterns(LLVMTypeConverter &converter,
	OwningRewritePatternList &patterns,			OwningRewritePatternList &patterns,
	StringRef gpuBinaryAnnotation);			StringRef gpuBinaryAnnotation);

	/// Creates a pass to convert kernel functions into GPU target object blobs.			/// Creates a pass to convert kernel functions into GPU target object blobs.
	///			///
	[29 unchanged lines not shown]

mlir/include/mlir/Conversion/Passes.td

	[103 unchanged lines not shown]

	def GpuToLLVMConversionPass : Pass<"gpu-to-llvm", "ModuleOp"> {			def GpuToLLVMConversionPass : Pass<"gpu-to-llvm", "ModuleOp"> {
	let summary = "Convert GPU dialect to LLVM dialect with GPU runtime calls";			let summary = "Convert GPU dialect to LLVM dialect with GPU runtime calls";
	let constructor = "mlir::createGpuToLLVMConversionPass()";			let constructor = "mlir::createGpuToLLVMConversionPass()";
	let dependentDialects = ["LLVM::LLVMDialect"];			let dependentDialects = ["LLVM::LLVMDialect"];
	let options = [			let options = [
	Option<"gpuBinaryAnnotation", "gpu-binary-annotation", "std::string",			Option<"gpuBinaryAnnotation", "gpu-binary-annotation", "std::string",
	"", "Annotation attribute string for GPU binary">,			"", "Annotation attribute string for GPU binary">,
				Option<"useAlignedAlloc", "use-aligned-alloc", "bool", /*default=*/"false",
				"Use aligned_alloc in place of malloc for heap allocations">,
				Option<"useBarePtrCallConv", "use-bare-ptr-memref-call-conv", "bool",
				/*default=*/"false",
				"Replace FuncOp's MemRef arguments with bare pointers to the MemRef "
				"element types">,
				Option<"emitCWrappers", "emit-c-wrappers", "bool", /*default=*/"false",
				"Emit wrappers for C-compatible pointer-to-struct memref "
				"descriptors">,
				Option<"indexBitwidth", "index-bitwidth", "unsigned",
				/*default=kDeriveIndexBitwidthFromDataLayout*/"0",
				"Bitwidth of the index type, 0 to use size of machine word">,
				Option<"dataLayout", "data-layout", "std::string",
				/*default=*/"\"\"",
				"String description (LLVM format) of the data layout that is "
				"expected on the produced module">
				mehdi_amini (inline comment): You're adding many options that I don't find tests for at the moment?
				gysit (author, reply): That is a good point! I test only the option I actually use and added the other ones for completeness. [...]
	];			];
	}			}

	def LowerHostCodeToLLVM : Pass<"lower-host-to-llvm", "ModuleOp"> {			def LowerHostCodeToLLVM : Pass<"lower-host-to-llvm", "ModuleOp"> {
	let summary = "Lowers the host module code and `gpu.launch_func` to LLVM";			let summary = "Lowers the host module code and `gpu.launch_func` to LLVM";
	let constructor = "mlir::createLowerHostCodeToLLVMPass()";			let constructor = "mlir::createLowerHostCodeToLLVMPass()";
	let dependentDialects = ["LLVM::LLVMDialect"];			let dependentDialects = ["LLVM::LLVMDialect"];
	}			}
	[403 unchanged lines not shown]
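The index-bitwidth option's help text says 0 means "use size of machine word", and its default is annotated kDeriveIndexBitwidthFromDataLayout. A minimal sketch of that resolution rule, assuming the machine word size is the pointer size reported by the data layout; the names below are hypothetical and not part of MLIR, where this resolution is done by the LLVM type converter.

```cpp
#include <cassert>

// Sentinel mirroring the option's default value: 0 means "derive the index
// bitwidth from the data layout" (hypothetical name, not the MLIR constant).
constexpr unsigned kDeriveFromDataLayoutSketch = 0;

// The 64-bit width the runtime wrappers use for index/size parameters.
constexpr unsigned kWrapperIndexBitwidth = 64;

// An explicit index-bitwidth wins; otherwise fall back to the pointer size
// reported by the (assumed) data layout.
unsigned resolveIndexBitwidth(unsigned requested, unsigned pointerSizeInBits) {
  assert(pointerSizeInBits > 0 && "data layout must provide a pointer size");
  return requested == kDeriveFromDataLayoutSketch ? pointerSizeInBits
                                                  : requested;
}

// Runtime-call arguments need widening only when the resolved width is
// narrower than the wrapper's parameters.
bool needsWideningForWrapper(unsigned indexBitwidth) {
  return indexBitwidth < kWrapperIndexBitwidth;
}
```

On a 64-bit host the default therefore needs no casts; only an explicit index-bitwidth=32 triggers the widening logic added in this patch.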

mlir/lib/Conversion/GPUCommon/ConvertLaunchFuncToRuntimeCalls.cpp

[35 unchanged lines not shown]

static constexpr const char *kGpuBinaryStorageSuffix = "_gpubin_cst";		static constexpr const char *kGpuBinaryStorageSuffix = "_gpubin_cst";

namespace {		namespace {

class GpuToLLVMConversionPass		class GpuToLLVMConversionPass
: public GpuToLLVMConversionPassBase<GpuToLLVMConversionPass> {		: public GpuToLLVMConversionPassBase<GpuToLLVMConversionPass> {
public:		public:
GpuToLLVMConversionPass(StringRef gpuBinaryAnnotation) {		GpuToLLVMConversionPass(StringRef gpuBinaryAnnotation,
		bool useBarePtrCallConv, bool emitCWrappers,
		unsigned indexBitwidth, bool useAlignedAlloc,
		const llvm::DataLayout &dataLayout) {
if (!gpuBinaryAnnotation.empty())		if (!gpuBinaryAnnotation.empty())
this->gpuBinaryAnnotation = gpuBinaryAnnotation.str();		this->gpuBinaryAnnotation = gpuBinaryAnnotation.str();
		this->useBarePtrCallConv = useBarePtrCallConv;
		this->emitCWrappers = emitCWrappers;
		this->indexBitwidth = indexBitwidth;
		this->useAlignedAlloc = useAlignedAlloc;
		this->dataLayout = dataLayout.getStringRepresentation();
}		}

// Run the dialect converter on the module.		// Run the dialect converter on the module.
void runOnOperation() override;		void runOnOperation() override;
};		};

		/// Helper class to build cast operations that adapt the bitwidth of index and
		/// size arguments to match the target function parameters.
		class IndexCastBuilder {
		public:
		IndexCastBuilder(unsigned indexBitwidth) : indexBitwidth(indexBitwidth) {}
		Value create(Location loc, OpBuilder &builder, Value value,
		Type paramType) const;

		private:
		unsigned indexBitwidth;
		};

class FunctionCallBuilder {		class FunctionCallBuilder {
public:		public:
FunctionCallBuilder(StringRef functionName, Type returnType,		FunctionCallBuilder(StringRef functionName, Type returnType,
ArrayRef<Type> argumentTypes)		ArrayRef<Type> argumentTypes)
: functionName(functionName),		: functionName(functionName),
functionType(LLVM::LLVMFunctionType::get(returnType, argumentTypes)) {}		functionType(LLVM::LLVMFunctionType::get(returnType, argumentTypes)) {}
LLVM::CallOp create(Location loc, OpBuilder &builder,		LLVM::CallOp create(Location loc, OpBuilder &builder,
ArrayRef<Value> arguments) const;		ArrayRef<Value> arguments,
		const IndexCastBuilder *indexCastBuilder = nullptr) const;

private:		private:
StringRef functionName;		StringRef functionName;
LLVM::LLVMFunctionType functionType;		LLVM::LLVMFunctionType functionType;
};		};

template <typename OpTy>		template <typename OpTy>
class ConvertOpToGpuRuntimeCallPattern : public ConvertOpToLLVMPattern<OpTy> {		class ConvertOpToGpuRuntimeCallPattern : public ConvertOpToLLVMPattern<OpTy> {
[9 unchanged lines not shown]
Type llvmPointerType =		Type llvmPointerType =
LLVM::LLVMPointerType::get(IntegerType::get(context, 8));		LLVM::LLVMPointerType::get(IntegerType::get(context, 8));
Type llvmPointerPointerType = LLVM::LLVMPointerType::get(llvmPointerType);		Type llvmPointerPointerType = LLVM::LLVMPointerType::get(llvmPointerType);
Type llvmInt8Type = IntegerType::get(context, 8);		Type llvmInt8Type = IntegerType::get(context, 8);
Type llvmInt32Type = IntegerType::get(context, 32);		Type llvmInt32Type = IntegerType::get(context, 32);
Type llvmInt64Type = IntegerType::get(context, 64);		Type llvmInt64Type = IntegerType::get(context, 64);
Type llvmIntPtrType = IntegerType::get(		Type llvmIntPtrType = IntegerType::get(
context, this->getTypeConverter()->getPointerBitwidth(0));		context, this->getTypeConverter()->getPointerBitwidth(0));

		IndexCastBuilder indexCastBuilder = {
		this->getTypeConverter()->getIndexTypeBitwidth()};

FunctionCallBuilder moduleLoadCallBuilder = {		FunctionCallBuilder moduleLoadCallBuilder = {
"mgpuModuleLoad",		"mgpuModuleLoad",
llvmPointerType /* void *module */,		llvmPointerType /* void *module */,
{llvmPointerType /* void *cubin */}};		{llvmPointerType /* void *cubin */}};
FunctionCallBuilder moduleUnloadCallBuilder = {		FunctionCallBuilder moduleUnloadCallBuilder = {
"mgpuModuleUnload", llvmVoidType, {llvmPointerType /* void *module */}};		"mgpuModuleUnload", llvmVoidType, {llvmPointerType /* void *module */}};
FunctionCallBuilder moduleGetFunctionCallBuilder = {		FunctionCallBuilder moduleGetFunctionCallBuilder = {
"mgpuModuleGetFunction",		"mgpuModuleGetFunction",
[192 unchanged lines not shown]
private:		private:
LogicalResult		LogicalResult
matchAndRewrite(gpu::MemcpyOp memcpyOp, ArrayRef<Value> operands,		matchAndRewrite(gpu::MemcpyOp memcpyOp, ArrayRef<Value> operands,
ConversionPatternRewriter &rewriter) const override;		ConversionPatternRewriter &rewriter) const override;
};		};
} // namespace		} // namespace

void GpuToLLVMConversionPass::runOnOperation() {		void GpuToLLVMConversionPass::runOnOperation() {
LLVMTypeConverter converter(&getContext());		LowerToLLVMOptions options = {useBarePtrCallConv, emitCWrappers,
		indexBitwidth, useAlignedAlloc,
		llvm::DataLayout(this->dataLayout)};
		LLVMTypeConverter converter(&getContext(), options);
OwningRewritePatternList patterns;		OwningRewritePatternList patterns;
populateStdToLLVMConversionPatterns(converter, patterns);		populateStdToLLVMConversionPatterns(converter, patterns);
populateGpuToLLVMConversionPatterns(converter, patterns, gpuBinaryAnnotation);		populateGpuToLLVMConversionPatterns(converter, patterns, gpuBinaryAnnotation);

LLVMConversionTarget target(getContext());		LLVMConversionTarget target(getContext());
if (failed(		if (failed(
applyPartialConversion(getOperation(), target, std::move(patterns))))		applyPartialConversion(getOperation(), target, std::move(patterns))))
signalPassFailure();		signalPassFailure();
}		}

LLVM::CallOp FunctionCallBuilder::create(Location loc, OpBuilder &builder,		Value IndexCastBuilder::create(Location loc, OpBuilder &builder, Value value,
ArrayRef<Value> arguments) const {		Type paramType) const {
		// Only cast index arguments, or signless integer arguments of the index
		// bitwidth, if their bitwidth is lower than that of the target parameter.
		if ((value.getType().isIndex() ||
		value.getType().isSignlessInteger(indexBitwidth)) &&
		paramType.getIntOrFloatBitWidth() > indexBitwidth) {
		return builder.create<LLVM::ZExtOp>(loc, paramType, value);
		}
		return value;
		}

		LLVM::CallOp
		FunctionCallBuilder::create(Location loc, OpBuilder &builder,
		ArrayRef<Value> arguments,
		const IndexCastBuilder *indexCastBuilder) const {
auto module = builder.getBlock()->getParent()->getParentOfType<ModuleOp>();		auto module = builder.getBlock()->getParent()->getParentOfType<ModuleOp>();
auto function = [&] {		auto function = [&] {
if (auto function = module.lookupSymbol<LLVM::LLVMFuncOp>(functionName))		if (auto function = module.lookupSymbol<LLVM::LLVMFuncOp>(functionName))
return function;		return function;
return OpBuilder(module.getBody()->getTerminator())		return OpBuilder(module.getBody()->getTerminator())
.create<LLVM::LLVMFuncOp>(loc, functionName, functionType);		.create<LLVM::LLVMFuncOp>(loc, functionName, functionType);
}();		}();
		// Optionally cast the index arguments to extend the bitwidth of index or
		// integer arguments to match the bitwidth of the function parameters.
		SmallVector<Value, 4> castedArguments(arguments.begin(), arguments.end());
		if (indexCastBuilder) {
		castedArguments.reserve(arguments.size());
		for (auto en : llvm::enumerate(arguments)) {
		// Get the function parameter type.
		auto paramType = const_cast<LLVM::LLVMFunctionType &>(functionType)
		.getParamType(en.index());
		castedArguments[en.index()] =
		indexCastBuilder->create(loc, builder, en.value(), paramType);
		}
		}
return builder.create<LLVM::CallOp>(		return builder.create<LLVM::CallOp>(
loc, const_cast<LLVM::LLVMFunctionType &>(functionType).getReturnType(),		loc, const_cast<LLVM::LLVMFunctionType &>(functionType).getReturnType(),
builder.getSymbolRefAttr(function), arguments);		builder.getSymbolRefAttr(function), castedArguments);
}		}

// Returns whether all operands are of LLVM type.		// Returns whether all operands are of LLVM type.
static LogicalResult areAllLLVMTypes(Operation *op, ValueRange operands,		static LogicalResult areAllLLVMTypes(Operation *op, ValueRange operands,
ConversionPatternRewriter &rewriter) {		ConversionPatternRewriter &rewriter) {
if (!llvm::all_of(operands, [](Value value) {		if (!llvm::all_of(operands, [](Value value) {
return LLVM::isCompatibleType(value.getType());		return LLVM::isCompatibleType(value.getType());
}))		}))
[26 unchanged lines not shown; context: ConvertHostRegisterOpToGpuRuntimeCallPattern::matchAndRewrite]

auto memRefType = hostRegisterOp.value().getType();		auto memRefType = hostRegisterOp.value().getType();
auto elementType = memRefType.cast<UnrankedMemRefType>().getElementType();		auto elementType = memRefType.cast<UnrankedMemRefType>().getElementType();
auto elementSize = getSizeInBytes(loc, elementType, rewriter);		auto elementSize = getSizeInBytes(loc, elementType, rewriter);

auto arguments = getTypeConverter()->promoteOperands(loc, op->getOperands(),		auto arguments = getTypeConverter()->promoteOperands(loc, op->getOperands(),
operands, rewriter);		operands, rewriter);
arguments.push_back(elementSize);		arguments.push_back(elementSize);
hostRegisterCallBuilder.create(loc, rewriter, arguments);		hostRegisterCallBuilder.create(loc, rewriter, arguments, &indexCastBuilder);

rewriter.eraseOp(op);		rewriter.eraseOp(op);
return success();		return success();
}		}

LogicalResult ConvertAllocOpToGpuRuntimeCallPattern::matchAndRewrite(		LogicalResult ConvertAllocOpToGpuRuntimeCallPattern::matchAndRewrite(
gpu::AllocOp allocOp, ArrayRef<Value> operands,		gpu::AllocOp allocOp, ArrayRef<Value> operands,
ConversionPatternRewriter &rewriter) const {		ConversionPatternRewriter &rewriter) const {
[15 unchanged lines not shown; context: ConvertAllocOpToGpuRuntimeCallPattern::matchAndRewrite]
getMemRefDescriptorSizes(loc, memRefType, adaptor.dynamicSizes(), rewriter,		getMemRefDescriptorSizes(loc, memRefType, adaptor.dynamicSizes(), rewriter,
shape, strides, sizeBytes);		shape, strides, sizeBytes);

// Allocate the underlying buffer and store a pointer to it in the MemRef		// Allocate the underlying buffer and store a pointer to it in the MemRef
// descriptor.		// descriptor.
Type elementPtrType = this->getElementPtrType(memRefType);		Type elementPtrType = this->getElementPtrType(memRefType);
auto stream = adaptor.asyncDependencies().front();		auto stream = adaptor.asyncDependencies().front();
Value allocatedPtr =		Value allocatedPtr =
allocCallBuilder.create(loc, rewriter, {sizeBytes, stream}).getResult(0);		allocCallBuilder
		.create(loc, rewriter, {sizeBytes, stream}, &indexCastBuilder)
		.getResult(0);
allocatedPtr =		allocatedPtr =
rewriter.create<LLVM::BitcastOp>(loc, elementPtrType, allocatedPtr);		rewriter.create<LLVM::BitcastOp>(loc, elementPtrType, allocatedPtr);

// No alignment.		// No alignment.
Value alignedPtr = allocatedPtr;		Value alignedPtr = allocatedPtr;

// Create the MemRef descriptor.		// Create the MemRef descriptor.
auto memRefDescriptor = this->createMemRefDescriptor(		auto memRefDescriptor = this->createMemRefDescriptor(
[241 unchanged lines not shown; context: ConvertLaunchFuncOpToGpuRuntimeCallPattern::matchAndRewrite]
auto kernelParams = generateParamsArray(launchOp, operands, rewriter);		auto kernelParams = generateParamsArray(launchOp, operands, rewriter);
auto nullpointer = rewriter.create<LLVM::NullOp>(loc, llvmPointerPointerType);		auto nullpointer = rewriter.create<LLVM::NullOp>(loc, llvmPointerPointerType);
launchKernelCallBuilder.create(loc, rewriter,		launchKernelCallBuilder.create(loc, rewriter,
{function.getResult(0), launchOp.gridSizeX(),		{function.getResult(0), launchOp.gridSizeX(),
launchOp.gridSizeY(), launchOp.gridSizeZ(),		launchOp.gridSizeY(), launchOp.gridSizeZ(),
launchOp.blockSizeX(), launchOp.blockSizeY(),		launchOp.blockSizeX(), launchOp.blockSizeY(),
launchOp.blockSizeZ(),		launchOp.blockSizeZ(),
/*sharedMemBytes=*/zero, stream, kernelParams,		/*sharedMemBytes=*/zero, stream, kernelParams,
/*extra=*/nullpointer});		/*extra=*/nullpointer},
		&indexCastBuilder);

if (launchOp.asyncToken()) {		if (launchOp.asyncToken()) {
// Async launch: make dependent ops use the same stream.		// Async launch: make dependent ops use the same stream.
rewriter.replaceOp(launchOp, {stream});		rewriter.replaceOp(launchOp, {stream});
} else {		} else {
// Synchronize with host and destroy stream. This must be the stream created		// Synchronize with host and destroy stream. This must be the stream created
// above (with no other uses) because we check that the synchronous version		// above (with no other uses) because we check that the synchronous version
// does not have any async dependencies.		// does not have any async dependencies.
[38 unchanged lines not shown; context: ConvertMemcpyOpToGpuRuntimeCallPattern::matchAndRewrite]

auto src = rewriter.create<LLVM::BitcastOp>(		auto src = rewriter.create<LLVM::BitcastOp>(
loc, llvmPointerType, srcDesc.alignedPtr(rewriter, loc));		loc, llvmPointerType, srcDesc.alignedPtr(rewriter, loc));
auto dst = rewriter.create<LLVM::BitcastOp>(		auto dst = rewriter.create<LLVM::BitcastOp>(
loc, llvmPointerType,		loc, llvmPointerType,
MemRefDescriptor(adaptor.dst()).alignedPtr(rewriter, loc));		MemRefDescriptor(adaptor.dst()).alignedPtr(rewriter, loc));

auto stream = adaptor.asyncDependencies().front();		auto stream = adaptor.asyncDependencies().front();
memcpyCallBuilder.create(loc, rewriter, {dst, src, sizeBytes, stream});		memcpyCallBuilder.create(loc, rewriter, {dst, src, sizeBytes, stream},
		&indexCastBuilder);

rewriter.replaceOp(memcpyOp, {stream});		rewriter.replaceOp(memcpyOp, {stream});

return success();		return success();
}		}

std::unique_ptr<mlir::OperationPass<mlir::ModuleOp>>		std::unique_ptr<mlir::OperationPass<mlir::ModuleOp>>
mlir::createGpuToLLVMConversionPass(StringRef gpuBinaryAnnotation) {		mlir::createGpuToLLVMConversionPass(StringRef gpuBinaryAnnotation,
return std::make_unique<GpuToLLVMConversionPass>(gpuBinaryAnnotation);		const LowerToLLVMOptions &options) {
		return std::make_unique<GpuToLLVMConversionPass>(
		gpuBinaryAnnotation, options.useBarePtrCallConv, options.emitCWrappers,
		options.indexBitwidth, options.useAlignedAlloc, options.dataLayout);
}		}

void mlir::populateGpuToLLVMConversionPatterns(		void mlir::populateGpuToLLVMConversionPatterns(
LLVMTypeConverter &converter, OwningRewritePatternList &patterns,		LLVMTypeConverter &converter, OwningRewritePatternList &patterns,
StringRef gpuBinaryAnnotation) {		StringRef gpuBinaryAnnotation) {
converter.addConversion(		converter.addConversion(
[context = &converter.getContext()](gpu::AsyncTokenType type) -> Type {		[context = &converter.getContext()](gpu::AsyncTokenType type) -> Type {
return LLVM::LLVMPointerType::get(IntegerType::get(context, 8));		return LLVM::LLVMPointerType::get(IntegerType::get(context, 8));
[11 unchanged lines not shown]
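The cast rule in IndexCastBuilder::create and the per-argument loop in FunctionCallBuilder::create can be modeled without MLIR. This sketch uses simplified, hypothetical stand-ins for the MLIR types (TypeSketch, Kind) and only records which arguments would be wrapped in an llvm.zext. One deliberate difference: it also checks that the parameter is an integer before comparing widths, as a defensive guard, since in MLIR getIntOrFloatBitWidth() is only meaningful for integer and float types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical stand-ins for the MLIR types involved.
enum class Kind { Index, Integer, Pointer };
struct TypeSketch {
  Kind kind;
  unsigned width; // bitwidth for Integer; unused for Index and Pointer
};

// Mirrors IndexCastBuilder::create: widen only index values, or signless
// integers of exactly indexBitwidth bits, whose target parameter is a wider
// integer. The Integer check on the parameter is an extra safeguard.
bool needsZExt(const TypeSketch &value, const TypeSketch &param,
               unsigned indexBitwidth) {
  bool indexLike = value.kind == Kind::Index ||
                   (value.kind == Kind::Integer && value.width == indexBitwidth);
  return indexLike && param.kind == Kind::Integer &&
         param.width > indexBitwidth;
}

// Mirrors the loop in FunctionCallBuilder::create: pair each argument with
// the corresponding function parameter and decide whether to cast it.
std::vector<bool> planCasts(const std::vector<TypeSketch> &args,
                            const std::vector<TypeSketch> &params,
                            unsigned indexBitwidth) {
  assert(args.size() == params.size() && "argument/parameter count mismatch");
  std::vector<bool> casts(args.size(), false);
  for (std::size_t i = 0; i < args.size(); ++i)
    casts[i] = needsZExt(args[i], params[i], indexBitwidth);
  return casts;
}
```

For an mgpuMemcpy-style call (dst, src, sizeBytes, stream) with a 32-bit index, only the size argument is widened; pointers and the i32 sharedMemBytes constant of the kernel launch are left alone.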

mlir/test/Conversion/GPUCommon/lower-alloc-to-gpu-runtime-calls.mlir

	// RUN: mlir-opt %s --gpu-to-llvm | FileCheck %s			// RUN: mlir-opt %s --gpu-to-llvm | FileCheck %s
				// RUN: mlir-opt %s --gpu-to-llvm="index-bitwidth=32" | FileCheck %s --check-prefix=CHECK32

	module attributes {gpu.container_module} {			module attributes {gpu.container_module} {
	// CHECK-LABEL: llvm.func @main			// CHECK-LABEL: llvm.func @main
	// CHECK-SAME: %[[size:.*]]: i64			// CHECK-SAME: %[[size:.*]]: i64
	func @main(%size : index) {			func @main(%size : index) {
	// CHECK: %[[stream:.*]] = llvm.call @mgpuStreamCreate()			// CHECK: %[[stream:.*]] = llvm.call @mgpuStreamCreate()
	%0 = gpu.wait async			%0 = gpu.wait async
	// CHECK: %[[gep:.*]] = llvm.getelementptr {{.*}}[%[[size]]]			// CHECK: %[[gep:.*]] = llvm.getelementptr {{.*}}[%[[size]]]
	// CHECK: %[[size_bytes:.*]] = llvm.ptrtoint %[[gep]]			// CHECK: %[[size_bytes:.*]] = llvm.ptrtoint %[[gep]]
				// CHECK32: %[[size_bytes:.*]] = llvm.ptrtoint
				// CHECK32: {{%.*}} = llvm.zext %[[size_bytes:.*]] : i32 to i64
	// CHECK: llvm.call @mgpuMemAlloc(%[[size_bytes]], %[[stream]])			// CHECK: llvm.call @mgpuMemAlloc(%[[size_bytes]], %[[stream]])
	%1, %2 = gpu.alloc async [%0] (%size) : memref<?xf32>			%1, %2 = gpu.alloc async [%0] (%size) : memref<?xf32>
	// CHECK: %[[float_ptr:.*]] = llvm.extractvalue {{.*}}[0]			// CHECK: %[[float_ptr:.*]] = llvm.extractvalue {{.*}}[0]
	// CHECK: %[[void_ptr:.*]] = llvm.bitcast %[[float_ptr]]			// CHECK: %[[void_ptr:.*]] = llvm.bitcast %[[float_ptr]]
	// CHECK: llvm.call @mgpuMemFree(%[[void_ptr]], %[[stream]])			// CHECK: llvm.call @mgpuMemFree(%[[void_ptr]], %[[stream]])
	%3 = gpu.dealloc async [%2] %1 : memref<?xf32>			%3 = gpu.dealloc async [%2] %1 : memref<?xf32>
	// CHECK: llvm.call @mgpuStreamSynchronize(%[[stream]])			// CHECK: llvm.call @mgpuStreamSynchronize(%[[stream]])
	// CHECK: llvm.call @mgpuStreamDestroy(%[[stream]])			// CHECK: llvm.call @mgpuStreamDestroy(%[[stream]])
	gpu.wait [%3]			gpu.wait [%3]
	return			return
	}			}
	}			}
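Per the CHECK32 lines above, with index-bitwidth=32 the byte count is produced by llvm.ptrtoint as an i32 and only then zero-extended to i64 for mgpuMemAlloc. The sketch reproduces that arithmetic in plain C++ for memref<?xf32>, assuming 4-byte f32 elements; the function names are hypothetical. It also makes one consequence visible: because the byte count is formed as an i32, a buffer of 4 GiB or more wraps around before the extension.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(float) == 4, "sketch assumes 4-byte f32 elements");

// Byte size of a memref<?xf32> as seen by the runtime call when the index
// type is 32 bits: the count is formed as a 32-bit value (wrapping modulo
// 2^32) and then zero-extended to the 64-bit wrapper parameter.
std::uint64_t sizeBytesWith32BitIndex(std::uint32_t numElements) {
  std::uint32_t bytes = numElements * static_cast<std::uint32_t>(sizeof(float));
  return static_cast<std::uint64_t>(bytes);
}

// Same computation with the default 64-bit index type, for comparison.
std::uint64_t sizeBytesWith64BitIndex(std::uint64_t numElements) {
  return numElements * sizeof(float);
}
```

For 2^30 elements the 32-bit variant yields 0 while the 64-bit variant yields 2^32, so the zext preserves the value but cannot recover bits lost before it.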

mlir/test/Conversion/GPUCommon/lower-launch-func-to-gpu-runtime-calls.mlir

// RUN: mlir-opt %s --gpu-to-llvm="gpu-binary-annotation=nvvm.cubin" | FileCheck %s		// RUN: mlir-opt %s --gpu-to-llvm="gpu-binary-annotation=nvvm.cubin" | FileCheck %s
		// RUN: mlir-opt %s --gpu-to-llvm="gpu-binary-annotation=nvvm.cubin index-bitwidth=32" | FileCheck %s --check-prefix=CHECK32
// RUN: mlir-opt %s --gpu-to-llvm="gpu-binary-annotation=rocdl.hsaco" | FileCheck %s --check-prefix=ROCDL		// RUN: mlir-opt %s --gpu-to-llvm="gpu-binary-annotation=rocdl.hsaco" | FileCheck %s --check-prefix=ROCDL

module attributes {gpu.container_module} {		module attributes {gpu.container_module} {

// CHECK: llvm.mlir.global internal constant @[[KERNEL_NAME:.*]]("kernel\00")		// CHECK: llvm.mlir.global internal constant @[[KERNEL_NAME:.*]]("kernel\00")
// CHECK: llvm.mlir.global internal constant @[[GLOBAL:.*]]("CUBIN")		// CHECK: llvm.mlir.global internal constant @[[GLOBAL:.*]]("CUBIN")
// ROCDL: llvm.mlir.global internal constant @[[GLOBAL:.*]]("HSACO")		// ROCDL: llvm.mlir.global internal constant @[[GLOBAL:.*]]("HSACO")

[13 unchanged lines not shown; context: func @foo(%buffer: memref<?xf32>)]
gpu.launch_func @kernel_module::@kernel		gpu.launch_func @kernel_module::@kernel
blocks in (%c8, %c8, %c8)		blocks in (%c8, %c8, %c8)
threads in (%c8, %c8, %c8)		threads in (%c8, %c8, %c8)
args(%c32 : i32, %buffer : memref<?xf32>)		args(%c32 : i32, %buffer : memref<?xf32>)
return		return
}		}

// CHECK: [[C8:%.*]] = llvm.mlir.constant(8 : index) : i64		// CHECK: [[C8:%.*]] = llvm.mlir.constant(8 : index) : i64
		// CHECK32: [[C8:%.*]] = llvm.mlir.constant(8 : index) : i32
		// CHECK32: {{%.*}} = llvm.zext [[C8]] : i32 to i64
// CHECK: [[ADDRESSOF:%.*]] = llvm.mlir.addressof @[[GLOBAL]]		// CHECK: [[ADDRESSOF:%.*]] = llvm.mlir.addressof @[[GLOBAL]]
// CHECK: [[C0:%.*]] = llvm.mlir.constant(0 : index)		// CHECK: [[C0:%.*]] = llvm.mlir.constant(0 : index)
// CHECK: [[BINARY:%.*]] = llvm.getelementptr [[ADDRESSOF]]{{\[}}[[C0]], [[C0]]]		// CHECK: [[BINARY:%.*]] = llvm.getelementptr [[ADDRESSOF]]{{\[}}[[C0]], [[C0]]]
// CHECK-SAME: -> !llvm.ptr<i8>		// CHECK-SAME: -> !llvm.ptr<i8>

// CHECK: [[MODULE:%.*]] = llvm.call @mgpuModuleLoad([[BINARY]])		// CHECK: [[MODULE:%.*]] = llvm.call @mgpuModuleLoad([[BINARY]])
// CHECK: [[FUNC:%.*]] = llvm.call @mgpuModuleGetFunction([[MODULE]], {{.*}})		// CHECK: [[FUNC:%.*]] = llvm.call @mgpuModuleGetFunction([[MODULE]], {{.*}})

[15 unchanged lines not shown]

mlir/test/Conversion/GPUCommon/lower-memcpy-to-gpu-runtime-calls.mlir

	// RUN: mlir-opt %s --gpu-to-llvm | FileCheck %s			// RUN: mlir-opt %s --gpu-to-llvm | FileCheck %s
				// RUN: mlir-opt %s --gpu-to-llvm="index-bitwidth=32" | FileCheck %s --check-prefix=CHECK32

	module attributes {gpu.container_module} {			module attributes {gpu.container_module} {

	// CHECK: func @foo			// CHECK: func @foo
	func @foo(%dst : memref<7xf32, 1>, %src : memref<7xf32>) {			func @foo(%dst : memref<7xf32, 1>, %src : memref<7xf32>) {
	// CHECK: %[[t0:.*]] = llvm.call @mgpuStreamCreate			// CHECK: %[[t0:.*]] = llvm.call @mgpuStreamCreate
	%t0 = gpu.wait async			%t0 = gpu.wait async
	// CHECK: %[[size_bytes:.*]] = llvm.ptrtoint			// CHECK: %[[size_bytes:.*]] = llvm.ptrtoint
				// CHECK32: %[[size_bytes:.*]] = llvm.ptrtoint
				// CHECK32: {{%.*}} = llvm.zext %[[size_bytes:.*]] : i32 to i64
	// CHECK: %[[src:.*]] = llvm.bitcast			// CHECK: %[[src:.*]] = llvm.bitcast
	// CHECK: %[[dst:.*]] = llvm.bitcast			// CHECK: %[[dst:.*]] = llvm.bitcast
	// CHECK: llvm.call @mgpuMemcpy(%[[dst]], %[[src]], %[[size_bytes]], %[[t0]])			// CHECK: llvm.call @mgpuMemcpy(%[[dst]], %[[src]], %[[size_bytes]], %[[t0]])
	%t1 = gpu.memcpy async [%t0] %dst, %src : memref<7xf32, 1>, memref<7xf32>			%t1 = gpu.memcpy async [%t0] %dst, %src : memref<7xf32, 1>, memref<7xf32>
	// CHECK: llvm.call @mgpuStreamSynchronize(%[[t0]])			// CHECK: llvm.call @mgpuStreamSynchronize(%[[t0]])
	// CHECK: llvm.call @mgpuStreamDestroy(%[[t0]])			// CHECK: llvm.call @mgpuStreamDestroy(%[[t0]])
	gpu.wait [%t1]			gpu.wait [%t1]
	return			return
	}			}
	}			}
