
Details

Reviewers

ftynse
ThomasRaoux
dcaballe
herhut
mehdi_amini

Commits

rGfcfeb1e5b3cd: [mlir][gpu] Add GPU target support to `gpu-to-llvm`.

Summary

For an explanation of this patch series, see D154153.

This patch modifies the lowering of `gpu.module` and `gpu.launch_func` in the `gpu-to-llvm` pass,
so that the new GPU compilation mechanism from the patch series ending in D154153 can be used.

Instead of removing GPU modules, this patch preserves a module if it has target attributes, so that
the `gpu-module-to-binary` pass can serialize it later. Modules without target attributes, and the
launches that refer to them, still go through the existing lowering.

Instead of lowering kernel calls to the LLVM dialect, this patch mainly updates the operation's
arguments, leaving the conversion of the operation into LLVM instructions to the translation stage.
The operation is not lowered to LLVM at this stage because a kernel launch has no single, one-to-one
representation in LLVM; for example, it can be represented as a call to a kernel stub, as in CUDA or HIP.
Kernel launches are also intrinsically tied to the binary associated with the call, and binaries are
converted during translation.
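The per-module decision described above can be sketched as a toy model in plain C++. This is a minimal illustration, not MLIR code: `GpuModule`, `LaunchPlan`, and `planLaunch` are hypothetical stand-ins for `gpu::GPUModuleOp`, `gpu::LaunchFuncOp`, and the dialect conversion machinery. An empty `std::optional` models a module whose targets attribute is absent.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a gpu.module. `targets` is std::nullopt when the
// module has no target attribute (mirroring getTargetsAttr() == nullptr).
struct GpuModule {
  std::string name;
  std::optional<std::vector<std::string>> targets;
};

enum class ModuleAction { Preserve, Erase };
enum class LaunchAction { UpdateOperandsOnly, LowerToRuntimeCalls };

struct LaunchPlan {
  ModuleAction moduleAction; // what gpu-to-llvm does with the gpu.module
  LaunchAction launchAction; // what happens to a gpu.launch_func using it
};

// Toy decision: a module with targets is kept for gpu-module-to-binary, and
// its launches keep the gpu.launch_func form with lowered operands (the final
// LLVM form is produced at translation time). A module without targets takes
// the legacy path: it is erased and its launches become runtime calls.
LaunchPlan planLaunch(const GpuModule &gpuModule) {
  if (gpuModule.targets.has_value())
    return {ModuleAction::Preserve, LaunchAction::UpdateOperandsOnly};
  return {ModuleAction::Erase, LaunchAction::LowerToRuntimeCalls};
}
```

Keeping both branches in one place reflects the backward-compatibility point made later in the review: the legacy path stays available until it is deprecated.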

Depends on D154149

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fmorac created this revision. Jun 29 2023, 2:04 PM

Herald added a reviewer: ftynse. Jun 29 2023, 2:04 PM

Herald added a reviewer: ThomasRaoux.

Herald added a reviewer: dcaballe.

Herald added a project: Restricted Project.

Herald added subscribers: gysit, Dinistro, bviyer and 25 others.

fmorac added a child revision: D154153: [mlir][gpu] Update GPU translation to accept binaries. Jun 29 2023, 2:12 PM

Harbormaster completed remote builds in B242236: Diff 535992. Jun 29 2023, 4:17 PM

Rebasing.

Harbormaster completed remote builds in B242305: Diff 536081. Jun 29 2023, 10:01 PM

fmorac published this revision for review. Jun 30 2023, 6:30 AM

Herald added a reviewer: herhut. Jun 30 2023, 6:30 AM

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Can we get a test for the bare pointer calling convention please?

Added bare pointer test.

Added new line.

Harbormaster completed remote builds in B247432: Diff 543215. Jul 22 2023, 12:24 PM

Can you add context / motivation in the description? The why is often more important than the what there.

(This would actually be valuable for most of the patches in this series)

mehdi_amini added inline comments. Jul 23 2023, 11:51 PM

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
Lines 1090–1091: You created a symbolTable; we should try to take advantage of it everywhere, I think?
Line 1135: If you're just changing the operands of the op, can't it be done in place?
mlir/test/Conversion/GPUCommon/lower-launch-func-bare-ptr.mlir
Line 2: Please remove -allow-unregistered-dialect; you may use the test dialect if needed.

I think I understand this: this is for backward compatibility where the new mechanism isn't used, and when it is used there is a target attribute and the LLVM translation will be done through the new mechanism.

As mentioned in another patch: this is the kind of context that is helpful to explain in the description.

In D154152#4526546, @mehdi_amini wrote:

Can you add context / motivation in the description? The why is often more important than the what there.

(This would actually be valuable for most of the patches in this series)

Ok, I'll do it for the patch series.

In D154152#4526813, @mehdi_amini wrote:

I think I understand this: this is for backward compatibility where the new mechanism isn't used, and when it is used there is a target attribute and the LLVM translation will be done through the new mechanism.
As mentioned in another patch: this is the kind of context that is helpful to explain in the description.

Yes, the changes in this patch preserve the old mechanism while introducing the new one. The idea is to deprecate the old one once this gets approved, and then patch the LaunchFuncOp pattern to use only the new one.

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
Lines 1090–1091: Yes, I'll change it.
Line 1135: I think so? I don't remember whether the in-place update gave me issues. In any case, I'll change it or explain what the issue was.
mlir/test/Conversion/GPUCommon/lower-launch-func-bare-ptr.mlir
Line 2: OK, yes, this was a test I copied and modified to test the bare-pointer convention, so I presume it was an old test. I'll change it.

Removed the `-allow-unregistered-dialect` flag and added an option for reusing the symbol table.
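The reviewer's suggestion amounts to building the symbol table once per pass run and reusing it for every launch, with a per-launch lookup as the fallback when no table is supplied. In the patch, this is the `SymbolTable *cachedModuleTable` parameter, which defaults to `nullptr`. The following is a hypothetical plain-C++ sketch of that pattern. `ModuleTable`, `lookupByScan`, and `findKernelModule` are illustrative names, not MLIR APIs, and the linear scan only loosely stands in for `SymbolTable::lookupNearestSymbolFrom`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct GpuModule {
  std::string name;
};

// Hypothetical cache: built once, then queried in O(1) on average.
class ModuleTable {
public:
  explicit ModuleTable(std::vector<GpuModule> &modules) {
    for (GpuModule &m : modules)
      byName.emplace(m.name, &m);
  }
  GpuModule *lookup(const std::string &name) const {
    auto it = byName.find(name);
    return it == byName.end() ? nullptr : it->second;
  }

private:
  std::unordered_map<std::string, GpuModule *> byName;
};

// Uncached fallback: scans all modules on every call.
GpuModule *lookupByScan(std::vector<GpuModule> &modules,
                        const std::string &name) {
  for (GpuModule &m : modules)
    if (m.name == name)
      return &m;
  return nullptr;
}

// Same shape as the patch: use the cached table when one is provided,
// otherwise fall back to the per-call lookup.
GpuModule *findKernelModule(const ModuleTable *cached,
                            std::vector<GpuModule> &modules,
                            const std::string &name) {
  return cached ? cached->lookup(name) : lookupByScan(modules, name);
}
```

Keeping `nullptr` as the default means existing callers of `populateGpuToLLVMConversionPatterns` compile unchanged and keep the old lookup behavior.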

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
Line 1135: I'm not sure the `launch_func` op can be updated safely in place, since in some cases I change the number of results of the op (I have to get rid of the async token, as it's illegal).
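In the targets path, the pattern picks a stream, creates a new `gpu.launch_func`, and then either replaces the old op's async token with that stream or simply erases the old op. Creating a new op fits the reply above, because an MLIR operation's number of results is fixed when it is created. The following is a toy C++ model of only the stream-selection step. `LaunchInfo`, `StreamChoice`, `chooseStream`, and the integer `StreamHandle` are hypothetical stand-ins for SSA values and rewriter calls.

```cpp
#include <cassert>
#include <optional>
#include <vector>

using StreamHandle = int; // hypothetical stand-in for an SSA stream value

struct LaunchInfo {
  std::vector<StreamHandle> asyncDependencies; // lowered async dependencies
  bool hasAsyncToken;                          // launch marked `async`?
};

struct StreamChoice {
  std::optional<StreamHandle> stream; // stream given to the new launch
  bool createdNewStream;              // a fresh stream had to be created
  bool replaceTokenWithStream;        // replaceOp(token -> stream) vs. eraseOp
};

// Toy model of the stream handling in the targets path. Non-async launches
// that have async dependencies are rejected earlier by the pattern, so that
// case is not modeled here.
StreamChoice chooseStream(const LaunchInfo &launch, StreamHandle freshStream) {
  StreamChoice choice{std::nullopt, false, launch.hasAsyncToken};
  if (!launch.asyncDependencies.empty()) {
    // Reuse the first dependency as the stream.
    choice.stream = launch.asyncDependencies.front();
  } else if (launch.hasAsyncToken) {
    // `async` with no dependencies: create a stream so that subsequent
    // operations have something to depend on.
    choice.stream = freshStream;
    choice.createdNewStream = true;
  }
  return choice;
}
```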

fmorac added a reviewer: mehdi_amini. Aug 7 2023, 5:50 AM

Harbormaster completed remote builds in B250759: Diff 547749. Aug 7 2023, 7:57 AM

Please update the commit description to provide better context, as discussed before.

This revision is now accepted and ready to land. Aug 7 2023, 7:51 PM

fmorac edited the summary of this revision. Aug 9 2023, 7:00 PM

Rebasing.

Harbormaster completed remote builds in B251916: Diff 549335. Aug 11 2023, 6:03 AM

Closed by commit rGfcfeb1e5b3cd: [mlir][gpu] Add GPU target support to `gpu-to-llvm`. (authored by fmorac). Aug 11 2023, 5:27 PM

This revision was automatically updated to reflect the committed changes.

fmorac added a commit: rGfcfeb1e5b3cd: [mlir][gpu] Add GPU target support to `gpu-to-llvm`.

Diff 549335

mlir/include/mlir/Conversion/GPUCommon/GPUCommonPass.h

@@ ... @@ LLVM::CallOp create(Location loc, OpBuilder &builder,
                       ArrayRef<Value> arguments) const;
 
   StringRef functionName;
   LLVM::LLVMFunctionType functionType;
 };
 
 /// Collect a set of patterns to convert from the GPU dialect to LLVM and
 /// populate converter for gpu types.
-void populateGpuToLLVMConversionPatterns(LLVMTypeConverter &converter,
-                                         RewritePatternSet &patterns,
-                                         StringRef gpuBinaryAnnotation = {},
-                                         bool kernelBarePtrCallConv = false);
+void populateGpuToLLVMConversionPatterns(
+    LLVMTypeConverter &converter, RewritePatternSet &patterns,
+    StringRef gpuBinaryAnnotation = {}, bool kernelBarePtrCallConv = false,
+    SymbolTable *cachedModuleTable = nullptr);
 
 /// A function that maps a MemorySpace enum to a target-specific integer value.
 using MemorySpaceMapping = std::function<unsigned(gpu::AddressSpace)>;
 
 /// Populates memory space attribute conversion rules for lowering
 /// gpu.address_space to integer values.
 void populateGpuMemorySpaceAttributeConversions(
     TypeConverter &typeConverter, const MemorySpaceMapping &mapping);
 } // namespace mlir
 
 #endif // MLIR_CONVERSION_GPUCOMMON_GPUCOMMONPASS_H_

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp

@@ ... @@
 /// * streamSynchronize -- waits for operations on the stream to finish
 ///
 /// Intermediate data structures are allocated on the stack.
 class ConvertLaunchFuncOpToGpuRuntimeCallPattern
     : public ConvertOpToGpuRuntimeCallPattern<gpu::LaunchFuncOp> {
 public:
   ConvertLaunchFuncOpToGpuRuntimeCallPattern(LLVMTypeConverter &typeConverter,
                                              StringRef gpuBinaryAnnotation,
-                                             bool kernelBarePtrCallConv)
+                                             bool kernelBarePtrCallConv,
+                                             SymbolTable *cachedModuleTable)
       : ConvertOpToGpuRuntimeCallPattern<gpu::LaunchFuncOp>(typeConverter),
         gpuBinaryAnnotation(gpuBinaryAnnotation),
-        kernelBarePtrCallConv(kernelBarePtrCallConv) {}
+        kernelBarePtrCallConv(kernelBarePtrCallConv),
+        cachedModuleTable(cachedModuleTable) {}
 
 private:
   Value generateParamsArray(gpu::LaunchFuncOp launchOp, OpAdaptor adaptor,
                             OpBuilder &builder) const;
   Value generateKernelNameConstant(StringRef moduleName, StringRef name,
                                    Location loc, OpBuilder &builder) const;
 
   LogicalResult
   matchAndRewrite(gpu::LaunchFuncOp launchOp, OpAdaptor adaptor,
                   ConversionPatternRewriter &rewriter) const override;
 
   llvm::SmallString<32> gpuBinaryAnnotation;
   bool kernelBarePtrCallConv;
+  SymbolTable *cachedModuleTable;
 };
 
 class EraseGpuModuleOpPattern : public OpRewritePattern<gpu::GPUModuleOp> {
   using OpRewritePattern<gpu::GPUModuleOp>::OpRewritePattern;
 
   LogicalResult matchAndRewrite(gpu::GPUModuleOp op,
                                 PatternRewriter &rewriter) const override {
     // GPU kernel modules are no longer necessary since we have a global
@@ ... @@ void GpuToLLVMConversionPass::runOnOperation() {
   LowerToLLVMOptions options(&getContext());
   options.useOpaquePointers = useOpaquePointers;
   options.useBarePtrCallConv = hostBarePtrCallConv;
 
   LLVMTypeConverter converter(&getContext(), options);
   RewritePatternSet patterns(&getContext());
   LLVMConversionTarget target(getContext());
 
-  target.addIllegalDialect<gpu::GPUDialect>();
+  SymbolTable symbolTable = SymbolTable(getOperation());
+  // Preserve GPU modules if they have target attributes.
+  target.addDynamicallyLegalOp<gpu::GPUModuleOp>(
+      [](gpu::GPUModuleOp module) -> bool {
+        return module.getTargetsAttr() != nullptr;
+      });
+  // Accept as legal LaunchFuncOps if they refer to GPU Modules with targets and
+  // the operands have been lowered.
+  target.addDynamicallyLegalOp<gpu::LaunchFuncOp>(
+      [&](gpu::LaunchFuncOp op) -> bool {
+        auto module =
+            symbolTable.lookup<gpu::GPUModuleOp>(op.getKernelModuleName());
+        return converter.isLegal(op->getOperandTypes()) &&
+               converter.isLegal(op->getResultTypes()) &&
+               (module && module.getTargetsAttr() &&
+                module.getTargetsAttr().size());
+      });
 
   mlir::arith::populateArithToLLVMConversionPatterns(converter, patterns);
   mlir::cf::populateControlFlowToLLVMConversionPatterns(converter, patterns);
   populateVectorToLLVMConversionPatterns(converter, patterns);
   populateFinalizeMemRefToLLVMConversionPatterns(converter, patterns);
   populateFuncToLLVMConversionPatterns(converter, patterns);
   populateAsyncStructuralTypeConversionsAndLegality(converter, patterns,
                                                     target);
   populateGpuToLLVMConversionPatterns(converter, patterns, gpuBinaryAnnotation,
-                                      kernelBarePtrCallConv);
+                                      kernelBarePtrCallConv, &symbolTable);
 
   if (failed(
           applyPartialConversion(getOperation(), target, std::move(patterns))))
     signalPassFailure();
 }
 
 LLVM::CallOp FunctionCallBuilder::create(Location loc, OpBuilder &builder,
                                          ArrayRef<Value> arguments) const {
@@ ... @@ LogicalResult ConvertLaunchFuncOpToGpuRuntimeCallPattern::matchAndRewrite(
   // use of the stream after this op.
   if (!launchOp.getAsyncToken() && !launchOp.getAsyncDependencies().empty())
     return rewriter.notifyMatchFailure(
         launchOp, "Cannot convert non-async op with async dependencies.");
 
   Location loc = launchOp.getLoc();
 
   // Create an LLVM global with CUBIN extracted from the kernel annotation and
   // obtain a pointer to the first byte in it.
-  auto kernelModule = SymbolTable::lookupNearestSymbolFrom<gpu::GPUModuleOp>(
+  gpu::GPUModuleOp kernelModule;
+  if (cachedModuleTable)
+    kernelModule = cachedModuleTable->lookup<gpu::GPUModuleOp>(
+        launchOp.getKernelModuleName());
+  else
+    kernelModule = SymbolTable::lookupNearestSymbolFrom<gpu::GPUModuleOp>(
         launchOp, launchOp.getKernelModuleName());
   assert(kernelModule && "expected a kernel module");
 
+  // If the module has Targets then just update the op operands.
+  if (ArrayAttr targets = kernelModule.getTargetsAttr()) {
+    Value stream = Value();
+    if (adaptor.getAsyncDependencies().size())
+      stream = adaptor.getAsyncDependencies().front();
+    // If the async keyword is present and there are no dependencies, then a
+    // stream must be created to pass to subsequent operations.
+    else if (launchOp.getAsyncToken())
+      stream = streamCreateCallBuilder.create(loc, rewriter, {}).getResult();
+
+    // Lower the kernel operands to match kernel parameters.
+    SmallVector<Value, 4> arguments;
+    if (kernelBarePtrCallConv) {
+      // Hack the bare pointer value on just for the argument promotion
+      LLVMTypeConverter *converter = getTypeConverter();
+      LowerToLLVMOptions options = converter->getOptions();
+      LowerToLLVMOptions overrideToMatchKernelOpts = options;
+      overrideToMatchKernelOpts.useBarePtrCallConv = true;
+      converter->dangerousSetOptions(overrideToMatchKernelOpts);
+      arguments =
+          converter->promoteOperands(loc, launchOp.getKernelOperands(),
+                                     adaptor.getKernelOperands(), rewriter);
+      converter->dangerousSetOptions(options);
+    } else {
+      arguments = getTypeConverter()->promoteOperands(
+          loc, launchOp.getKernelOperands(), adaptor.getKernelOperands(),
+          rewriter);
+    }
+
+    rewriter.create<gpu::LaunchFuncOp>(
+        launchOp.getLoc(), launchOp.getKernelAttr(),
+        gpu::KernelDim3{adaptor.getGridSizeX(), adaptor.getGridSizeY(),
+                        adaptor.getGridSizeZ()},
+        gpu::KernelDim3{adaptor.getBlockSizeX(), adaptor.getBlockSizeY(),
+                        adaptor.getBlockSizeZ()},
+        adaptor.getDynamicSharedMemorySize(), arguments, stream);
+    if (launchOp.getAsyncToken())
+      rewriter.replaceOp(launchOp, {stream});
+    else
+      rewriter.eraseOp(launchOp);
+    return success();
+  }
+
   auto binaryAttr =
       kernelModule->getAttrOfType<StringAttr>(gpuBinaryAnnotation);
   if (!binaryAttr) {
     kernelModule.emitOpError()
         << "missing " << gpuBinaryAnnotation << " attribute";
     return failure();
   }

@@ ... @@ createSetCsrPointersBuilder.create(
       loc, rewriter, {adaptor.getSpmat(), pPos, pCrd, pVal, stream});
   rewriter.replaceOp(op, {stream});
   return success();
 }
 
 void mlir::populateGpuToLLVMConversionPatterns(LLVMTypeConverter &converter,
                                                RewritePatternSet &patterns,
                                                StringRef gpuBinaryAnnotation,
-                                               bool kernelBarePtrCallConv) {
+                                               bool kernelBarePtrCallConv,
+                                               SymbolTable *cachedModuleTable) {
   addOpaquePointerConversion<gpu::AsyncTokenType>(converter);
   addOpaquePointerConversion<gpu::SparseDnTensorHandleType>(converter);
   addOpaquePointerConversion<gpu::SparseSpMatHandleType>(converter);
   addOpaquePointerConversion<gpu::SparseSpGEMMOpHandleType>(converter);
 
   patterns.add<ConvertAllocOpToGpuRuntimeCallPattern,
                ConvertDeallocOpToGpuRuntimeCallPattern,
                ConvertHostRegisterOpToGpuRuntimeCallPattern,
@@ ... @@ patterns.add<ConvertAllocOpToGpuRuntimeCallPattern,
                ConvertSDDMMOpToGpuRuntimeCallPattern,
                ConvertSpGEMMCreateDescrOpToGpuRuntimeCallPattern,
                ConvertSpGEMMDestroyDescrOpToGpuRuntimeCallPattern,
                ConvertSpGEMMWorkEstimationOrComputeOpToGpuRuntimeCallPattern,
                ConvertSpGEMMCopyOpToGpuRuntimeCallPattern,
                ConvertSpGEMMGetSizeOpToGpuRuntimeCallPattern,
                ConvertSetCsrPointersOpToGpuRuntimeCallPattern>(converter);
   patterns.add<ConvertLaunchFuncOpToGpuRuntimeCallPattern>(
-      converter, gpuBinaryAnnotation, kernelBarePtrCallConv);
+      converter, gpuBinaryAnnotation, kernelBarePtrCallConv, cachedModuleTable);
   patterns.add<EraseGpuModuleOpPattern>(&converter.getContext());
 }
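The two dynamic-legality callbacks added to `runOnOperation` can be modeled outside MLIR as plain predicates. In this hypothetical sketch, `ModuleInfo`, `LaunchInfo`, `SymbolMap`, `isModuleLegal`, and `isLaunchLegal` are illustrative names, and `operandsLowered` stands in for the `converter.isLegal(...)` checks on operand and result types. The model makes one detail of the patch visible: a module counts as legal as soon as it has a targets attribute, but a launch is accepted only if that attribute is also non-empty.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct ModuleInfo {
  // std::nullopt models a gpu.module with no targets attribute.
  std::optional<std::vector<std::string>> targets;
};

struct LaunchInfo {
  std::string kernelModule; // symbol name of the referenced gpu.module
  bool operandsLowered;     // stands in for converter.isLegal(...) on types
};

using SymbolMap = std::map<std::string, ModuleInfo>;

// Mirrors the gpu.module callback: legal as soon as a targets attribute exists.
bool isModuleLegal(const ModuleInfo &info) { return info.targets.has_value(); }

// Mirrors the gpu.launch_func callback: operand/result types are legal, the
// referenced module exists, and its target list is non-empty.
bool isLaunchLegal(const LaunchInfo &launch, const SymbolMap &symbols) {
  auto it = symbols.find(launch.kernelModule);
  if (it == symbols.end())
    return false;
  const ModuleInfo &info = it->second;
  return launch.operandsLowered && info.targets.has_value() &&
         !info.targets->empty();
}
```

Under `applyPartialConversion`, ops that these callbacks report as legal are left untouched. This is how targeted modules and their already-updated launches survive the pass.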

mlir/test/Conversion/GPUCommon/lower-launch-func-bare-ptr.mlir

This file was added.

// RUN: mlir-opt %s --gpu-to-llvm="use-bare-pointers-for-kernels=1" -split-input-file | FileCheck %s

module attributes {gpu.container_module} {
  gpu.module @kernels [#nvvm.target] {
    llvm.func @kernel_1(%arg0: f32, %arg1: !llvm.ptr<1>) attributes {gpu.kernel, nvvm.kernel} {
      %0 = llvm.mlir.undef : !llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>
      %1 = llvm.insertvalue %arg1, %0[0] : !llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>
      %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>
      %3 = llvm.mlir.constant(0 : index) : i64
      %4 = llvm.insertvalue %3, %2[2] : !llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>
      %5 = llvm.mlir.constant(10 : index) : i64
      %6 = llvm.insertvalue %5, %4[3, 0] : !llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>
      %7 = llvm.mlir.constant(1 : index) : i64
      %8 = llvm.insertvalue %7, %6[4, 0] : !llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>
      llvm.return
    }
  }
  func.func @foo() {
    // CHECK: [[MEMREF:%.*]] = gpu.alloc () : memref<10xf32, 1>
    // CHECK: [[DESCRIPTOR:%.*]] = builtin.unrealized_conversion_cast [[MEMREF]] : memref<10xf32, 1> to !llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>
    // CHECK: [[PTR:%.*]] = llvm.extractvalue [[DESCRIPTOR]][1] : !llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>
    // CHECK: gpu.launch_func @kernels::@kernel_1 blocks in ({{.*}}) threads in ({{.*}}) : i64
    // CHECK: args(%{{.*}} : f32, [[PTR]] : !llvm.ptr<1>)
    %0 = arith.constant 0. : f32
    %1 = gpu.alloc () : memref<10xf32, 1>
    %c8 = arith.constant 8 : index
    gpu.launch_func @kernels::@kernel_1 blocks in (%c8, %c8, %c8) threads in (%c8, %c8, %c8) args(%0 : f32, %1 : memref<10xf32, 1>)
    return
  }
}
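The bare-pointer test above checks that, with `use-bare-pointers-for-kernels=1`, a `memref<10xf32, 1>` argument reaches the kernel as the aligned pointer only. That pointer is field `[1]` of the descriptor `!llvm.struct<(ptr<1>, ptr<1>, i64, array<1 x i64>, array<1 x i64>)>`. Under the default convention, exercised in `lower-launch-func-to-gpu-runtime-calls.mlir`, every descriptor field becomes a separate argument, giving `ptr, ptr, i64, i64, i64` for a rank-1 memref. The following is a toy C++ model of the two conventions. `MemRef1D` follows MLIR's descriptor field order (allocated pointer, aligned pointer, offset, sizes, strides), but `KernelArg`, `expandDefault`, and `expandBarePtr` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>
#include <vector>

// Rank-1 memref descriptor, in the same field order as the LLVM struct above.
struct MemRef1D {
  float *allocated;        // [0] pointer returned by the allocator
  float *aligned;          // [1] aligned pointer used to access elements
  std::int64_t offset;     // [2] offset from the aligned pointer, in elements
  std::int64_t sizes[1];   // [3] one size per dimension
  std::int64_t strides[1]; // [4] one stride per dimension
};

// A lowered kernel argument is either a pointer or a 64-bit integer.
using KernelArg = std::variant<float *, std::int64_t>;

// Default convention: each descriptor field becomes its own kernel argument.
std::vector<KernelArg> expandDefault(const MemRef1D &m) {
  return {m.allocated, m.aligned, m.offset, m.sizes[0], m.strides[0]};
}

// Bare-pointer convention: only the aligned pointer is passed, so shape and
// layout must be known statically (MLIR restricts this convention to
// statically shaped memrefs with the default layout).
std::vector<KernelArg> expandBarePtr(const MemRef1D &m) {
  return {m.aligned};
}
```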

mlir/test/Conversion/GPUCommon/lower-launch-func-to-gpu-runtime-calls.mlir

-// RUN: mlir-opt %s --gpu-to-llvm="gpu-binary-annotation=nvvm.cubin use-opaque-pointers=1" | FileCheck %s
-// RUN: mlir-opt %s --gpu-to-llvm="gpu-binary-annotation=rocdl.hsaco use-opaque-pointers=1" | FileCheck %s --check-prefix=ROCDL
+// RUN: mlir-opt %s --gpu-to-llvm="gpu-binary-annotation=nvvm.cubin use-opaque-pointers=1" -split-input-file | FileCheck %s
+// RUN: mlir-opt %s --gpu-to-llvm="gpu-binary-annotation=rocdl.hsaco use-opaque-pointers=1" -split-input-file | FileCheck %s --check-prefix=ROCDL
 
 module attributes {gpu.container_module} {
 
   // CHECK: llvm.mlir.global internal constant @[[KERNEL_NAME:.*]]("kernel\00")
   // CHECK: llvm.mlir.global internal constant @[[GLOBAL:.*]]("CUBIN")
   // ROCDL: llvm.mlir.global internal constant @[[GLOBAL:.*]]("HSACO")
 
   gpu.module @kernel_module attributes {
@@ ... @@ module attributes {gpu.container_module} {
 
   // CHECK: llvm.call @mgpuLaunchKernel([[FUNC]], [[C8]], [[C8]], [[C8]],
   // CHECK-SAME: [[C8]], [[C8]], [[C8]], [[C256]], [[STREAM]],
   // CHECK-SAME: [[PARAMS]], [[EXTRA_PARAMS]])
   // CHECK: llvm.call @mgpuStreamSynchronize
   // CHECK: llvm.call @mgpuStreamDestroy
   // CHECK: llvm.call @mgpuModuleUnload
 }
 
+// -----
+
+module attributes {gpu.container_module} {
+  // CHECK: gpu.module
+  // ROCDL: gpu.module
+  gpu.module @kernel_module [#nvvm.target] {
+    llvm.func @kernel(%arg0: i32, %arg1: !llvm.ptr<f32>,
+        %arg2: !llvm.ptr<f32>, %arg3: i64, %arg4: i64,
+        %arg5: i64) attributes {gpu.kernel} {
+      llvm.return
+    }
+  }
+
+  func.func @foo(%buffer: memref<?xf32>) {
+    // CHECK: [[C8:%.*]] = llvm.mlir.constant(8 : index) : i64
+    // CHECK: [[C32:%.*]] = llvm.mlir.constant(32 : i32) : i32
+    // CHECK: [[C256:%.*]] = llvm.mlir.constant(256 : i32) : i32
+    %c8 = arith.constant 8 : index
+    %c32 = arith.constant 32 : i32
+    %c256 = arith.constant 256 : i32
+
+    // CHECK: gpu.launch_func @kernel_module::@kernel
+    // CHECK: blocks in ([[C8]], [[C8]], [[C8]]) threads in ([[C8]], [[C8]], [[C8]]) : i64
+    // CHECK: dynamic_shared_memory_size [[C256]]
+    // CHECK: args([[C32]] : i32, %{{.*}} : !llvm.ptr, %{{.*}} : !llvm.ptr, %{{.*}} : i64, %{{.*}} : i64, %{{.*}} : i64)
+    gpu.launch_func @kernel_module::@kernel
+        blocks in (%c8, %c8, %c8)
+        threads in (%c8, %c8, %c8)
+        dynamic_shared_memory_size %c256
+        args(%c32 : i32, %buffer : memref<?xf32>)
+    return
+  }
+}

This is an archive of the discontinued LLVM Phabricator instance.

[mlir][gpu] Add GPU target support to `gpu-to-llvm`.
Closed, Public
