This is an archive of the discontinued LLVM Phabricator instance.

Differential D110448

[MLIR][GPU] Define gpu.printf op and its lowerings
Closed (Public)

Authored by krzysz00 on Sep 24 2021, 2:42 PM.

Details

Reviewers

herhut
ftynse
bondhugula
mehdi_amini
nicolasvasilache
ThomasRaoux
dcaballe

Commits

rG79a0330a5257: Fix crash from use of a temporary after its scope exit
rGe1da62910e14: [MLIR][GPU] Define gpu.printf op and its lowerings

Summary

Define a gpu.printf op, which can be lowered to any GPU printf() support (which is present in CUDA, HIP, and OpenCL). This op only supports constant format strings and scalar arguments.
Define the lowering of gpu.printf to a call to printf() (which is what is required for AMD GPUs when using OpenCL) as well as to the hostcall interface present in the AMD Open Compute device library, which is the interface present when kernels are running under HIP.
Add a "runtime" enum that allows specifying which of the possible runtimes a ROCDL kernel will be executed under, or that the runtime is unknown. This enum controls how gpu.printf is lowered.

This change does not enable lowering for Nvidia GPUs, but such a lowering should be possible in principle.

And:
[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels

This is something that Clang always sets on both OpenCL and HIP kernels, and failing to include it causes mysterious crashes with printf() support.

In addition, revert the max-flat-work-group-size to (1, 256) to avoid triggering bugs in the AMDGPU backend.
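
As a rough illustration of how the "runtime" enum described in the summary could steer the choice of lowering, here is a minimal standalone sketch. The three enumerators mirror the ones this revision adds in Runtimes.h (written here as a scoped enum for the sketch); `PrintfLowering`, `selectPrintfLowering`, and `parseRuntime` are hypothetical names invented for this example and are not part of MLIR. Mapping unrecognized option text to Unknown is also an assumption.

```cpp
#include <cassert>
#include <string_view>

// Mirrors the enumerators this revision adds in Runtimes.h.
enum class Runtime { Unknown = 0, HIP = 1, OpenCL = 2 };

// Hypothetical labels for the lowering strategies named in the summary.
enum class PrintfLowering {
  None,               // runtime unknown: gpu.printf is left alone
  HostcallOckl,       // HIP: hostcall interface of the device library
  ExternalPrintfCall, // OpenCL: plain call to an external printf()
};

// Pick a lowering strategy for gpu.printf given the target runtime.
constexpr PrintfLowering selectPrintfLowering(Runtime runtime) {
  switch (runtime) {
  case Runtime::HIP:
    return PrintfLowering::HostcallOckl;
  case Runtime::OpenCL:
    return PrintfLowering::ExternalPrintfCall;
  case Runtime::Unknown:
    return PrintfLowering::None;
  }
  return PrintfLowering::None;
}

// Hypothetical helper: map the pass option spelling ("HIP", "OpenCL",
// "unknown") to the enum. Anything unrecognized is treated as Unknown,
// which is an assumption of this sketch.
constexpr Runtime parseRuntime(std::string_view text) {
  if (text == "HIP")
    return Runtime::HIP;
  if (text == "OpenCL")
    return Runtime::OpenCL;
  return Runtime::Unknown;
}
```

For example, `selectPrintfLowering(parseRuntime("HIP"))` yields `PrintfLowering::HostcallOckl`, while the default `unknown` setting selects no lowering at all.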

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

krzysz00 created this revision. Sep 24 2021, 2:42 PM

Herald added subscribers: wenzhicui, wrengr, Chia-hungDuan and 20 others. Sep 24 2021, 2:42 PM

krzysz00 requested review of this revision. Sep 24 2021, 2:42 PM

Herald added a reviewer: herhut. Sep 24 2021, 2:42 PM

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B125648: Diff 374967. Sep 24 2021, 3:10 PM

rriddle added inline comments. Sep 24 2021, 3:17 PM

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h
86	This overload is deprecated, please switch to the adaptor variant.
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
908–916 ↗	(On Diff #374967)	Can you encode this in ODS instead?

bondhugula added a subscriber: bondhugula. Sep 24 2021, 6:19 PM

bondhugula added inline comments.

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp
149	Doc comment here or on the struct.
197	As compact to just use `op.args()` inline below.
198–200	Reserve first before pushing?
201	Much more compact with: printArgs.append(..., ...);
mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
121	Please add this in sorted order.
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
910 ↗	(On Diff #374967)	This constraint can be placed on the TD definition.
911–912 ↗	(On Diff #374967)	`return op.emitOpError(...`

bondhugula requested changes to this revision. Sep 24 2021, 6:21 PM

bondhugula added inline comments.

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp
160	This isn't needed. Ops have a default null init. Nit: `printf` -> `printfOp`.
171–195	Code comments for the major blocks please.

This revision now requires changes to proceed. Sep 24 2021, 6:21 PM

mehdi_amini commented: It isn't clear to me that this belongs to the GPU dialect: what is GPU specific to this op or the lowering right now?

Address first round of review comments

In D110448#3022192, @mehdi_amini wrote:

It isn't clear to me that this belongs to the GPU dialect: what is GPU specific to this op or the lowering right now?

For the lowering I've defined, there's not much GPU-specific about it (except that the globals are placed in the gpu.module). However, IIRC, platforms like Vulkan have a much different printf() machinery that would require a different lowering.

From what I can tell, there could be an argument for putting printf as a concept somewhere else. Do you think std would be a good place for it instead?

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp
160	Good to know, and done.
198–200	Whoops, fixed.
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
910 ↗	(On Diff #374967)	Thanks for pointing out that's possible now. This whole function has been removed

Harbormaster completed remote builds in B125892: Diff 375289. Sep 27 2021, 9:31 AM

Commit title fixes:

ILVM -> LLVM
it's -> its

bondhugula added inline comments. Sep 27 2021, 8:40 PM

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp
195	pointr -> pointer
203	`args()` gives you a `ValueRange`. You don't need rvalue references here. You can simply use: `ValueRange args = adaptor.args();` like it's done everywhere.

To update y'all, it turns out the printf() lowering I was trying to use targets OpenCL, while AMD's MLIR kernels are often run under HIP, which has a different printf() arrangement. I'll likely be resubmitting with a much different lowering pass soon

bondhugula removed a reviewer: bondhugula. Sep 29 2021, 7:37 PM

Rework printf lowering to account for HIP being different
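
To make the shape of the reworked HIP lowering easier to follow: the GPUOpsLowering.cpp code in this revision hands arguments to `__ockl_printf_append_args` seven at a time and zero-pads the final group. Below is a standalone sketch of that grouping on plain integers. `AppendArgsCall` and `planAppendCalls` are hypothetical names made up for this example, and setting `isLast` only on the final group is an assumption about how that flag is used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One call to __ockl_printf_append_args, modeled as plain data
// (hypothetical struct, for illustration only).
struct AppendArgsCall {
  std::uint32_t numArgs;  // how many of the seven slots carry real values
  std::uint64_t slots[7]; // unused slots stay zero (padding)
  std::uint32_t isLast;   // assumption: 1 only on the final call
};

// Split already-widened 64-bit arguments into groups of seven.
std::vector<AppendArgsCall>
planAppendCalls(const std::vector<std::uint64_t> &args) {
  constexpr std::size_t argsPerAppend = 7;
  std::vector<AppendArgsCall> calls;
  for (std::size_t group = 0; group < args.size(); group += argsPerAppend) {
    const std::size_t bound = std::min(group + argsPerAppend, args.size());
    AppendArgsCall call{}; // value-initialization zero-fills every slot
    call.numArgs = static_cast<std::uint32_t>(bound - group);
    std::copy(args.begin() + static_cast<std::ptrdiff_t>(group),
              args.begin() + static_cast<std::ptrdiff_t>(bound), call.slots);
    call.isLast = (bound == args.size()) ? 1u : 0u;
    calls.push_back(call);
  }
  return calls;
}
```

With nine arguments this plans two calls: the first carries seven values, the second carries two values followed by five zero slots.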

Herald added a reviewer: ftynse. Oct 27 2021, 5:42 PM

Herald added subscribers: ormris, steven_wu, hiraditya and 2 others.

(I'm definitely open to having this op in a different dialect, say std, but that feels like a broader discussion that would need to happen)

Harbormaster completed remote builds in B131086: Diff 382865. Oct 27 2021, 5:56 PM

(adding back Uday since the issue isn't inactive anymore?)

@herhut @ftynse - could I get another look at this, since it's been sitting here for a while?

[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels
[MLIR][GPU] Define gpu.printf op and its lowering
Add integration test and call to set amdgpu_hostcall attribute

Herald added a subscriber: sdasgup3. Nov 22 2021, 3:33 PM

Harbormaster completed remote builds in B135521: Diff 389056. Nov 22 2021, 3:34 PM

krzysz00 edited the summary of this revision. Nov 22 2021, 3:34 PM

Correctly promote floats
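
For readers unfamiliar with what "promoting" means here: in the HIP lowering shown in this revision, floating-point arguments narrower than f64 are extended to f64 and then bitcast to i64, and integers narrower than 64 bits are zero-extended. The sketch below shows the same steps on host-side C++ values. The function names are hypothetical, and the bit patterns in the usage note assume IEEE 754 doubles.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the 64 bits of a double as an integer (the "bitcast" step).
inline std::uint64_t bitsOfDouble(double value) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "this sketch assumes a 64-bit double");
  std::uint64_t bits = 0;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Extend a float to double first, as C varargs promotion would, then bitcast.
inline std::uint64_t widenFloat(float value) {
  return bitsOfDouble(static_cast<double>(value));
}

// Zero-extend a 32-bit integer; the upper 32 bits are always zero, so a
// negative value is not sign-extended.
inline std::uint64_t widenInt(std::uint32_t value) {
  return static_cast<std::uint64_t>(value);
}
```

On an IEEE 754 platform, `widenFloat(1.0f)` gives `0x3FF0000000000000`, the bit pattern of the double `1.0`, which is what a `%f` conversion on the host side expects to decode.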

Harbormaster completed remote builds in B135665: Diff 389245. Nov 23 2021, 10:03 AM

mehdi_amini accepted this revision. Dec 7 2021, 5:46 PM

mehdi_amini added inline comments.

mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h
29	This should be documented as requiring to be executed on a ModulePass.
mlir/include/mlir/Dialect/GPU/GPUOps.td
562	This would seem nicer :)
mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h
51
mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
129	Can you document the address space?

This revision is now accepted and ready to land. Dec 7 2021, 5:46 PM

[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels
[MLIR][GPU] Define gpu.printf op and its lowering
Add integration test and call to set amdgpu_hostcall attribute
Correctly promote floats
Address pre-landing review comments on gpu.printf

Harbormaster completed remote builds in B138301: Diff 392953. Dec 8 2021, 3:29 PM

Re-push for patch weirdness

Harbormaster completed remote builds in B138307: Diff 392961. Dec 8 2021, 3:48 PM

krzysz00 removed a parent revision: D112668: [MLIR][GPU] Update SerializeToHsaco to match downstream. Dec 8 2021, 3:51 PM

Removed dependency on old SerializeToHsaco revision

Harbormaster completed remote builds in B138310: Diff 392964. Dec 8 2021, 7:32 PM

Closed by commit rGe1da62910e14: [MLIR][GPU] Define gpu.printf op and its lowerings (authored by krzysz00). Dec 9 2021, 7:54 AM

This revision was automatically updated to reflect the committed changes.

krzysz00 added a commit: rGe1da62910e14: [MLIR][GPU] Define gpu.printf op and its lowerings.

mehdi_amini added inline comments. Dec 9 2021, 6:00 PM

mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-hip.mlir

This test is crashing on one of my bots.

The CMake invocation is:

cmake -G Ninja ${src_dir}/llvm -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU;NVPTX" -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_BUILD_EXAMPLES=ON -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=Off -DLLVM_LINK_LLVM_DYLIB=ON -DMLIR_INCLUDE_INTEGRATION_TESTS=ON -DMLIR_INCLUDE_INTEGRATION_TESTS=ON

(on Linux Ubuntu with gcc 8 as host compiler)

Can you have a look at it?

mehdi_amini added inline comments. Dec 9 2021, 9:00 PM

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp

243

The issue is here, this ValueRange refers to a temporary. I'll fix it.

mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-hip.mlir

ASAN:

=================================================================
==4963==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7f17d905cde0 at pc 0x560629d0fdcf bp 0x7ffd9716f9b0 sp 0x7ffd9716f9a8
READ of size 8 at 0x7f17d905cde0 thread T0
    #0 0x560629d0fdce in dereference_iterator third_party/llvm/llvm-project/mlir/lib/IR/OperationSupport.cpp:608:12
    #1 0x560629d0fdce in operator* third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLExtras.h:1133:14
    #2 0x560629d0fdce in mlir::Value* std::__u::uninitialized_copy<llvm::detail::indexed_accessor_range_base<mlir::ValueRange, llvm::PointerUnion<mlir::Value const*, mlir::OpOperand*, mlir::detail::OpResultImpl*>, mlir::Value, mlir::Value, mlir::Value>::iterator, mlir::Value*>(llvm::detail::indexed_accessor_range_base<mlir::ValueRange, llvm::PointerUnion<mlir::Value const*, mlir::OpOperand*, mlir::detail::OpResultImpl*>, mlir::Value, mlir::Value, mlir::Value>::iterator, llvm::detail::indexed_accessor_range_base<mlir::ValueRange, llvm::PointerUnion<mlir::Value const*, mlir::OpOperand*, mlir::detail::OpResultImpl*>, mlir::Value, mlir::Value, mlir::Value>::iterator, mlir::Value*) third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/uninitialized_algorithms.h:36:62
    #3 0x560629d054d4 in uninitialized_copy<llvm::detail::indexed_accessor_range_base<mlir::ValueRange, llvm::PointerUnion<const mlir::Value *, mlir::OpOperand *, mlir::detail::OpResultImpl *>, mlir::Value, mlir::Value, mlir::Value>::iterator, mlir::Value *> third_party/llvm/llvm-project/llvm/include/llvm/ADT/SmallVector.h:490:5
    #4 0x560629d054d4 in append<llvm::detail::indexed_accessor_range_base<mlir::ValueRange, llvm::PointerUnion<const mlir::Value *, mlir::OpOperand *, mlir::detail::OpResultImpl *>, mlir::Value, mlir::Value, mlir::Value>::iterator, void> third_party/llvm/llvm-project/llvm/include/llvm/ADT/SmallVector.h:652:5
    #5 0x560629d054d4 in mlir::OperationState::addOperands(mlir::ValueRange) third_party/llvm/llvm-project/mlir/lib/IR/OperationSupport.cpp:188:12
    #6 0x560628e65810 in mlir::LLVM::CallOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::LLVM::LLVMFuncOp, mlir::ValueRange, llvm::ArrayRef<mlir::NamedAttribute>) blaze-out/k8-asan-opt/bin/third_party/llvm/llvm-project/mlir/_virtual_includes/LLVMOpsIncGen/mlir/Dialect/LLVMIR/LLVMOps.cpp.inc:4117:16
    #7 0x560623ffb6b8 in mlir::LLVM::CallOp mlir::OpBuilder::create<mlir::LLVM::CallOp, mlir::LLVM::LLVMFuncOp&, mlir::ValueRange&>(mlir::Location, mlir::LLVM::LLVMFuncOp&, mlir::ValueRange&) third_party/llvm/llvm-project/mlir/include/mlir/IR/Builders.h:427:5
    #8 0x560623ff8bc8 in mlir::GPUPrintfOpToHIPLowering::matchAndRewrite(mlir::gpu::PrintfOp, mlir::gpu::PrintfOpAdaptor, mlir::ConversionPatternRewriter&) const third_party/llvm/llvm-project/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp:245:16
    #9 0x560623fff4e0 in mlir::ConvertOpToLLVMPattern<mlir::gpu::PrintfOp>::matchAndRewrite(mlir::Operation*, llvm::ArrayRef<mlir::Value>,

mehdi_amini added a commit: rG79a0330a5257: Fix crash from use of a temporary after its scope exit. Dec 9 2021, 9:08 PM
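
The stack-use-after-scope reported above is a general hazard with non-owning range types: a view initialized from a temporary keeps pointing at storage that is gone once the statement ends. The sketch below reproduces the shape of the problem with a toy type. `IntRange` is a hypothetical stand-in for a non-owning range such as `mlir::ValueRange`, and the two safe variants are common remedies in general, not necessarily what rG79a0330a5257 does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view: it stores a pointer and a length, never the
// elements themselves.
struct IntRange {
  const int *data;
  std::size_t size;
  IntRange(const std::vector<int> &vec) : data(vec.data()), size(vec.size()) {}
};

int sum(IntRange range) {
  int total = 0;
  for (std::size_t i = 0; i < range.size; ++i)
    total += range.data[i];
  return total;
}

// Buggy shape, left as a comment on purpose:
//   IntRange args = std::vector<int>{a, b, c}; // temporary dies at the ';'
//   return sum(args);                          // reads freed memory

// Safe variant 1: keep the elements in an owning container that outlives
// every use of the view.
int sumWithOwnedStorage(int a, int b, int c) {
  std::vector<int> storage = {a, b, c};
  return sum(storage); // the view points into `storage`, which is still alive
}

// Safe variant 2: build the view inside the call expression, so the temporary
// lives until the end of that full expression.
int sumInline(int a, int b, int c) {
  return sum(std::vector<int>{a, b, c});
}
```

Both safe variants return the same result; the difference is only in how long the backing storage is kept alive relative to the view.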

mehdi_amini added inline comments. Mon, Nov 27, 11:10 PM

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp
166	(same for the implementation)
mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h
41	Hey @krzysz00 : seems like this should be in the ROCDL implementation instead of the common code?

Herald added a reviewer: nicolasvasilache. Mon, Nov 27, 11:10 PM

Herald added a reviewer: ThomasRaoux.

Herald added a reviewer: dcaballe.

Herald added a project: Restricted Project.

Herald added subscribers: TimAtGoogle, gysit, Dinistro and 5 others.

Revision Contents

Path

Size

mlir/

include/

mlir/

Conversion/

GPUToROCDL/

GPUToROCDLPass.h

9 lines

Runtimes.h

24 lines

Passes.td

10 lines

Dialect/

GPU/

GPUOps.td

16 lines

lib/

Conversion/

GPUCommon/

GPUOpsLowering.h

34 lines

GPUOpsLowering.cpp

198 lines

GPUToROCDL/

LowerGpuOpsToROCDLOps.cpp

24 lines

PassDetail.h

2 lines

Dialect/

GPU/

Transforms/

SerializeToHsaco.cpp

6 lines

Target/

LLVMIR/

Dialect/

ROCDL/

ROCDLToLLVMIRTranslation.cpp

7 lines

test/

Conversion/

GPUToROCDL/

gpu-to-rocdl-hip.mlir

44 lines

gpu-to-rocdl-opencl.mlir

16 lines

Dialect/

GPU/

ops.mlir

8 lines

Integration/

GPU/

ROCM/

printf.mlir

29 lines

Diff 393167

mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h

	//===- GPUToROCDLPass.h - Convert GPU kernel to ROCDL dialect ---- C++ --===//			//===- GPUToROCDLPass.h - Convert GPU kernel to ROCDL dialect ---- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	#ifndef MLIR_CONVERSION_GPUTOROCDL_GPUTOROCDLPASS_H_			#ifndef MLIR_CONVERSION_GPUTOROCDL_GPUTOROCDLPASS_H_
	#define MLIR_CONVERSION_GPUTOROCDL_GPUTOROCDLPASS_H_			#define MLIR_CONVERSION_GPUTOROCDL_GPUTOROCDLPASS_H_

				#include "mlir/Conversion/GPUToROCDL/Runtimes.h"
	#include "mlir/Conversion/LLVMCommon/LoweringOptions.h"			#include "mlir/Conversion/LLVMCommon/LoweringOptions.h"
	#include <memory>			#include <memory>

	namespace mlir {			namespace mlir {
	class LLVMTypeConverter;			class LLVMTypeConverter;
	class ConversionTarget;			class ConversionTarget;
	class RewritePatternSet;			class RewritePatternSet;
	using OwningRewritePatternList = RewritePatternSet;			using OwningRewritePatternList = RewritePatternSet;

	template <typename OpT>			template <typename OpT>
	class OperationPass;			class OperationPass;

	namespace gpu {			namespace gpu {
	class GPUModuleOp;			class GPUModuleOp;
	} // namespace gpu			} // namespace gpu

	/// Collect a set of patterns to convert from the GPU dialect to ROCDL.			/// Collect a set of patterns to convert from the GPU dialect to ROCDL.
				/// If `runtime` is Unknown, gpu.printf will not be lowered
				mehdi_amini (inline comment, Not Done): This should be documented as requiring to be executed on a ModulePass.
				/// The resulting pattern set should be run over a gpu.module op
	void populateGpuToROCDLConversionPatterns(LLVMTypeConverter &converter,			void populateGpuToROCDLConversionPatterns(LLVMTypeConverter &converter,
	RewritePatternSet &patterns);			RewritePatternSet &patterns,
				gpu::amd::Runtime runtime);

	/// Configure target to convert from the GPU dialect to ROCDL.			/// Configure target to convert from the GPU dialect to ROCDL.
	void configureGpuToROCDLConversionLegality(ConversionTarget &target);			void configureGpuToROCDLConversionLegality(ConversionTarget &target);

	/// Creates a pass that lowers GPU dialect operations to ROCDL counterparts. The			/// Creates a pass that lowers GPU dialect operations to ROCDL counterparts. The
	/// index bitwidth used for the lowering of the device side index computations			/// index bitwidth used for the lowering of the device side index computations
	/// is configurable.			/// is configurable.
	std::unique_ptr<OperationPass<gpu::GPUModuleOp>>			std::unique_ptr<OperationPass<gpu::GPUModuleOp>>
	createLowerGpuOpsToROCDLOpsPass(			createLowerGpuOpsToROCDLOpsPass(
	unsigned indexBitwidth = kDeriveIndexBitwidthFromDataLayout);			unsigned indexBitwidth = kDeriveIndexBitwidthFromDataLayout,
				gpu::amd::Runtime runtime = gpu::amd::Runtime::Unknown);

	} // namespace mlir			} // namespace mlir

	#endif // MLIR_CONVERSION_GPUTOROCDL_GPUTOROCDLPASS_H_			#endif // MLIR_CONVERSION_GPUTOROCDL_GPUTOROCDLPASS_H_

mlir/include/mlir/Conversion/GPUToROCDL/Runtimes.h

This file was added.

				//===- Runtimes.h - Possible runtimes for AMD GPUs ---- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				#ifndef MLIR_CONVERSION_GPUTOROCDL_RUNTIMES_H
				#define MLIR_CONVERSION_GPUTOROCDL_RUNTIMES_H

				namespace mlir {
				namespace gpu {
				namespace amd {
				/// Potential runtimes for AMD GPU kernels
				enum Runtime {
				Unknown = 0,
				HIP = 1,
				OpenCL = 2,
				};
				} // end namespace amd
				} // end namespace gpu
				} // end namespace mlir

				#endif // MLIR_CONVERSION_GPUTOROCDL_RUNTIMES_H

mlir/include/mlir/Conversion/Passes.td

	def ConvertGpuOpsToROCDLOps : Pass<"convert-gpu-to-rocdl", "gpu::GPUModuleOp"> {			def ConvertGpuOpsToROCDLOps : Pass<"convert-gpu-to-rocdl", "gpu::GPUModuleOp"> {
	let summary = "Generate ROCDL operations for gpu operations";			let summary = "Generate ROCDL operations for gpu operations";
	let constructor = "mlir::createLowerGpuOpsToROCDLOpsPass()";			let constructor = "mlir::createLowerGpuOpsToROCDLOpsPass()";
	let dependentDialects = ["ROCDL::ROCDLDialect"];			let dependentDialects = ["ROCDL::ROCDLDialect"];
	let options = [			let options = [
	Option<"indexBitwidth", "index-bitwidth", "unsigned",			Option<"indexBitwidth", "index-bitwidth", "unsigned",
	/*default=kDeriveIndexBitwidthFromDataLayout*/"0",			/*default=kDeriveIndexBitwidthFromDataLayout*/"0",
	"Bitwidth of the index type, 0 to use size of machine word">			"Bitwidth of the index type, 0 to use size of machine word">,
				Option<"runtime", "runtime", "::mlir::gpu::amd::Runtime",
				"::mlir::gpu::amd::Runtime::Unknown",
				"Runtime code will be run on (default is Unknown, can also use HIP or OpenCl)",
				[{::llvm::cl::values(
				clEnumValN(::mlir::gpu::amd::Runtime::Unknown, "unknown", "Unknown (default)"),
				clEnumValN(::mlir::gpu::amd::Runtime::HIP, "HIP", "HIP"),
				clEnumValN(::mlir::gpu::amd::Runtime::OpenCL, "OpenCL", "OpenCL")
				)}]>
	];			];
	}			}

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// GPUToSPIRV			// GPUToSPIRV
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def ConvertGPUToSPIRV : Pass<"convert-gpu-to-spirv", "ModuleOp"> {			def ConvertGPUToSPIRV : Pass<"convert-gpu-to-spirv", "ModuleOp"> {

mlir/include/mlir/Dialect/GPU/GPUOps.td

def GPU_LaunchOp : GPU_Op<"launch">,

}];

let parser = [{ return parseLaunchOp(parser, result); }];

let printer = [{ printLaunchOp(p, *this); }];

let verifier = [{ return ::verify(*this); }];

let hasCanonicalizer = 1;

}

def GPU_PrintfOp : GPU_Op<"printf", [MemoryEffects<[MemWrite]>]>,

Arguments<(ins StrAttr:$format,

Variadic<AnyTypeOf<[AnyInteger, Index, AnyFloat]>>:$args)> {

let summary = "Device-side printf, as in CUDA or OpenCL, for debugging";

let description = [{

`gpu.printf` takes a literal format string `format` and an arbitrary number of

scalar arguments that should be printed.

The format string is a C-style printf string, subject to any restrictions

imposed by one's target platform.

}];

let assemblyFormat = [{

$format attr-dict ($args^ `:` type($args))?

mehdi_amini (inline comment, Not Done): This would seem nicer :)

let assemblyFormat = [{

- attr-dict ($args^ `:` type($args))?

+ $format %args attr-dict ($args^ `:` type($args))?

}];

}

}];

}

def GPU_ReturnOp : GPU_Op<"return", [HasParent<"GPUFuncOp">, NoSideEffect,

Terminator]>,

Arguments<(ins Variadic<AnyType>:$operands)>, Results<(outs)> {

let summary = "Terminator for GPU functions.";

let description = [{

A terminator operation for regions that appear in the body of `gpu.func`

functions. The operands to the `gpu.return` are the result values returned

by an invocation of the `gpu.func`.


mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h

private:

/// The address spcae to use for `alloca`s in private memory.

unsigned allocaAddrSpace;

/// The attribute name to use instead of `gpu.kernel`.

StringAttr kernelAttributeName;

};

/// The lowering of gpu.printf to a call to HIP hostcalls

///

/// Simplifies llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp, as we don't have

/// to deal with %s (even if there were first-class strings in MLIR, they're not

/// legal input to gpu.printf) or non-constant format strings

struct GPUPrintfOpToHIPLowering : public ConvertOpToLLVMPattern<gpu::PrintfOp> {

mehdi_amini (inline comment, Not Done): Hey @krzysz00 : seems like this should be in the ROCDL implementation instead of the common code?

using ConvertOpToLLVMPattern<gpu::PrintfOp>::ConvertOpToLLVMPattern;

LogicalResult

matchAndRewrite(gpu::PrintfOp gpuPrintfOp, gpu::PrintfOpAdaptor adaptor,

ConversionPatternRewriter &rewriter) const override;

};

/// The lowering of gpu.printf to a call to an external printf() function

///

/// This pass will add a declaration of printf() to the GPUModule if needed

mehdi_amini (inline comment, Not Done), suggested edit:

/// The lowering of gpu.printf to a call to an external printf() function

///

- /// This pass will add a decleration of printf() to the GPUModule if needed

+ /// This pass will add a declaration of printf() to the GPUModule if needed

/// and seperate out the format strings into global constants. For some

/// and seperate out the format strings into global constants. For some

/// runtimes, such as OpenCL on AMD, this is sufficient setup, as the compiler

/// will lower printf calls to appropriate device-side code

struct GPUPrintfOpToLLVMCallLowering

: public ConvertOpToLLVMPattern<gpu::PrintfOp> {

GPUPrintfOpToLLVMCallLowering(LLVMTypeConverter &converter,

int addressSpace = 0)

: ConvertOpToLLVMPattern<gpu::PrintfOp>(converter),

addressSpace(addressSpace) {}

LogicalResult

matchAndRewrite(gpu::PrintfOp gpuPrintfOp, gpu::PrintfOpAdaptor adaptor,

ConversionPatternRewriter &rewriter) const override;

private:

int addressSpace;

};

struct GPUReturnOpLowering : public ConvertOpToLLVMPattern<gpu::ReturnOp> {

using ConvertOpToLLVMPattern<gpu::ReturnOp>::ConvertOpToLLVMPattern;

LogicalResult

matchAndRewrite(gpu::ReturnOp op, OpAdaptor adaptor,

ConversionPatternRewriter &rewriter) const override {

rewriter.replaceOpWithNewOp<LLVM::ReturnOp>(op, adaptor.getOperands());

return success();

}

};

} // namespace mlir

#endif // MLIR_CONVERSION_GPUCOMMON_GPUOPSLOWERING_H_

rriddle (inline comment, Done): This overload is deprecated, please switch to the adaptor variant.

LogicalResult

- matchAndRewrite(gpu::PrintfOp gpuPrintfOp, ArrayRef<Value> operands,

+ matchAndRewrite(gpu::PrintfOp gpuPrintfOp, OpAdaptor adaptor,

ConversionPatternRewriter &rewriter) const override;

mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp

//===- GPUOpsLowering.cpp - GPU FuncOp / ReturnOp lowering ----------------===//		//===- GPUOpsLowering.cpp - GPU FuncOp / ReturnOp lowering ----------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "GPUOpsLowering.h"		#include "GPUOpsLowering.h"
		#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Dialect/StandardOps/IR/Ops.h"		#include "mlir/Dialect/StandardOps/IR/Ops.h"
#include "mlir/IR/Builders.h"		#include "mlir/IR/Builders.h"
#include "llvm/Support/FormatVariadic.h"		#include "llvm/Support/FormatVariadic.h"

using namespace mlir;		using namespace mlir;

LogicalResult		LogicalResult
GPUFuncOpLowering::matchAndRewrite(gpu::GPUFuncOp gpuFuncOp, OpAdaptor adaptor,		GPUFuncOpLowering::matchAndRewrite(gpu::GPUFuncOp gpuFuncOp, OpAdaptor adaptor,
[...]
rewriter.inlineRegionBefore(gpuFuncOp.getBody(), llvmFuncOp.getBody(),
llvmFuncOp.end());		llvmFuncOp.end());
if (failed(rewriter.convertRegionTypes(&llvmFuncOp.getBody(), *typeConverter,		if (failed(rewriter.convertRegionTypes(&llvmFuncOp.getBody(), *typeConverter,
&signatureConversion)))		&signatureConversion)))
return failure();		return failure();

rewriter.eraseOp(gpuFuncOp);		rewriter.eraseOp(gpuFuncOp);
return success();		return success();
}		}

		static const char formatStringPrefix[] = "printfFormat_";
		bondhugula (inline comment, Done): Doc comment here or on the struct.

		template <typename T>
		static LLVM::LLVMFuncOp getOrDefineFunction(T &moduleOp, const Location loc,
		ConversionPatternRewriter &rewriter,
		StringRef name,
		LLVM::LLVMFunctionType type) {
		LLVM::LLVMFuncOp ret;
		if (!(ret = moduleOp.template lookupSymbol<LLVM::LLVMFuncOp>(name))) {
		ConversionPatternRewriter::InsertionGuard guard(rewriter);
		rewriter.setInsertionPointToStart(moduleOp.getBody());
		ret = rewriter.create<LLVM::LLVMFuncOp>(loc, name, type,
		bondhugula (inline comment, Done): This isn't needed. Ops have a default null init. Nit: `printf` -> `printfOp`.
		krzysz00 (author, inline comment, Done): Good to know, and done.
		LLVM::Linkage::External);
		}
		return ret;
		}

		LogicalResult GPUPrintfOpToHIPLowering::matchAndRewrite(
		mehdi_amini (inline comment, Not Done): (same for the implementation)
		gpu::PrintfOp gpuPrintfOp, gpu::PrintfOpAdaptor adaptor,
		ConversionPatternRewriter &rewriter) const {
		Location loc = gpuPrintfOp->getLoc();

		mlir::Type llvmI8 = typeConverter->convertType(rewriter.getI8Type());
		mlir::Type i8Ptr = LLVM::LLVMPointerType::get(llvmI8);
		mlir::Type llvmIndex = typeConverter->convertType(rewriter.getIndexType());
		mlir::Type llvmI32 = typeConverter->convertType(rewriter.getI32Type());
		mlir::Type llvmI64 = typeConverter->convertType(rewriter.getI64Type());
		// Note: this is the GPUModule op, not the ModuleOp that surrounds it
		// This ensures that global constants and declarations are placed within
		// the device code, not the host code
		auto moduleOp = gpuPrintfOp->getParentOfType<gpu::GPUModuleOp>();

		auto ocklBegin =
		getOrDefineFunction(moduleOp, loc, rewriter, "__ockl_printf_begin",
		LLVM::LLVMFunctionType::get(llvmI64, {llvmI64}));
		LLVM::LLVMFuncOp ocklAppendArgs;
		if (!adaptor.args().empty()) {
		ocklAppendArgs = getOrDefineFunction(
		moduleOp, loc, rewriter, "__ockl_printf_append_args",
		LLVM::LLVMFunctionType::get(
		llvmI64, {llvmI64, /*numArgs*/ llvmI32, llvmI64, llvmI64, llvmI64,
		llvmI64, llvmI64, llvmI64, llvmI64, /*isLast*/ llvmI32}));
		}
		auto ocklAppendStringN = getOrDefineFunction(
		moduleOp, loc, rewriter, "__ockl_printf_append_string_n",
		LLVM::LLVMFunctionType::get(
		llvmI64,
		bondhugula (inline comment, Done): Code comments for the major blocks please.
		bondhugula (inline comment, Not Done): pointr -> pointer
		{llvmI64, i8Ptr, /*length (bytes)*/ llvmI64, /*isLast*/ llvmI32}));

		bondhugula (inline comment, Not Done): As compact to just use `op.args()` inline below.
		/// Start the printf hostcall
		Value zeroI64 = rewriter.create<LLVM::ConstantOp>(
		loc, llvmI64, rewriter.getI64IntegerAttr(0));
		bondhugula (inline comment, Done): Reserve first before pushing?
		krzysz00 (author, inline comment, Done): Whoops, fixed.
		auto printfBeginCall = rewriter.create<LLVM::CallOp>(loc, ocklBegin, zeroI64);
		bondhugula (inline comment, Done): Much more compact with: `printArgs.append(..., ...);`
		Value printfDesc = printfBeginCall.getResult(0);

		bondhugula (inline comment, Not Done): `args()` gives you a `ValueRange`. You don't need rvalue references here. You can simply use: `ValueRange args = adaptor.args();` like it's done everywhere.
		// Create a global constant for the format string
		unsigned stringNumber = 0;
		SmallString<16> stringConstName;
		do {
		stringConstName.clear();
		(formatStringPrefix + Twine(stringNumber++)).toStringRef(stringConstName);
		} while (moduleOp.lookupSymbol(stringConstName));

		llvm::SmallString<20> formatString(adaptor.format().getValue());
		formatString.push_back('\0'); // Null terminate for C
		size_t formatStringSize = formatString.size_in_bytes();

		auto globalType = LLVM::LLVMArrayType::get(llvmI8, formatStringSize);
		LLVM::GlobalOp global;
		{
		ConversionPatternRewriter::InsertionGuard guard(rewriter);
		rewriter.setInsertionPointToStart(moduleOp.getBody());
		global = rewriter.create<LLVM::GlobalOp>(
		loc, globalType,
		/*isConstant=*/true, LLVM::Linkage::Internal, stringConstName,
		rewriter.getStringAttr(formatString));
		}

		// Get a pointer to the format string's first element and pass it to printf()
		Value globalPtr = rewriter.create<LLVM::AddressOfOp>(loc, global);
		Value zero = rewriter.create<LLVM::ConstantOp>(
		loc, llvmIndex, rewriter.getIntegerAttr(llvmIndex, 0));
		Value stringStart = rewriter.create<LLVM::GEPOp>(
		loc, i8Ptr, globalPtr, mlir::ValueRange({zero, zero}));
		Value stringLen = rewriter.create<LLVM::ConstantOp>(
		loc, llvmI64, rewriter.getI64IntegerAttr(formatStringSize));

		Value oneI32 = rewriter.create<LLVM::ConstantOp>(
		loc, llvmI32, rewriter.getI32IntegerAttr(1));
		Value zeroI32 = rewriter.create<LLVM::ConstantOp>(
		loc, llvmI32, rewriter.getI32IntegerAttr(0));

		mlir::ValueRange appendFormatArgs = {printfDesc, stringStart, stringLen,
		adaptor.args().empty() ? oneI32
		: zeroI32};
		mehdi_amini (inline comment, Not Done): The issue is here: this ValueRange refers to a temporary. I'll fix it.
		auto appendFormatCall =
		rewriter.create<LLVM::CallOp>(loc, ocklAppendStringN, appendFormatArgs);
		printfDesc = appendFormatCall.getResult(0);

		// __ockl_printf_append_args takes 7 values per append call
		constexpr size_t argsPerAppend = 7;
		size_t nArgs = adaptor.args().size();
		for (size_t group = 0; group < nArgs; group += argsPerAppend) {
		size_t bound = std::min(group + argsPerAppend, nArgs);
		size_t numArgsThisCall = bound - group;

		SmallVector<mlir::Value, 2 + argsPerAppend + 1> arguments;
		arguments.push_back(printfDesc);
		arguments.push_back(rewriter.create<LLVM::ConstantOp>(
		loc, llvmI32, rewriter.getI32IntegerAttr(numArgsThisCall)));
		for (size_t i = group; i < bound; ++i) {
		Value arg = adaptor.args()[i];
		if (auto floatType = arg.getType().dyn_cast<FloatType>()) {
		if (!floatType.isF64())
		arg = rewriter.create<LLVM::FPExtOp>(
		loc, typeConverter->convertType(rewriter.getF64Type()), arg);
		arg = rewriter.create<LLVM::BitcastOp>(loc, llvmI64, arg);
		}
		if (arg.getType().getIntOrFloatBitWidth() != 64)
		arg = rewriter.create<LLVM::ZExtOp>(loc, llvmI64, arg);

		arguments.push_back(arg);
		}
		// Pad out to 7 arguments since the hostcall always needs 7
		for (size_t extra = numArgsThisCall; extra < argsPerAppend; ++extra) {
		arguments.push_back(zeroI64);
		}

		auto isLast = (bound == nArgs) ? oneI32 : zeroI32;
		arguments.push_back(isLast);
		auto call = rewriter.create<LLVM::CallOp>(loc, ocklAppendArgs, arguments);
		printfDesc = call.getResult(0);
		}
		rewriter.eraseOp(gpuPrintfOp);
		return success();
		}
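
The grouping loop in the HIP lowering above can be modeled outside of MLIR. What follows is a minimal standalone sketch, not the code from this revision: `PrintfArg`, `AppendCall`, `widenToI64`, and `packAppendCalls` are hypothetical names, and plain integers stand in for `mlir::Value`s. It assumes, as the lowering appears to, that floats are extended to `double` and bitcast to 64 bits, narrower integers are zero-extended, every call carries exactly 7 payload slots padded with zeros, and only the final call sets the is-last flag.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <variant>
#include <vector>

// Hypothetical stand-in for one scalar printf argument.
using PrintfArg = std::variant<std::uint32_t, std::uint64_t, float, double>;

// Hypothetical model of one call to __ockl_printf_append_args.
struct AppendCall {
  std::uint32_t numArgs;  // number of meaningful slots in this call
  std::uint64_t slots[7]; // payload, zero padded up to 7 entries
  std::uint32_t isLast;   // 1 on the final call, 0 otherwise
};

constexpr std::size_t kArgsPerAppend = 7;

// Widen one argument to 64 bits: floats go through double and a bit copy,
// integers are zero-extended.
std::uint64_t widenToI64(const PrintfArg &arg) {
  if (const auto *f = std::get_if<float>(&arg)) {
    double extended = *f;
    std::uint64_t bits;
    std::memcpy(&bits, &extended, sizeof(bits));
    return bits;
  }
  if (const auto *d = std::get_if<double>(&arg)) {
    std::uint64_t bits;
    std::memcpy(&bits, d, sizeof(bits));
    return bits;
  }
  if (const auto *narrow = std::get_if<std::uint32_t>(&arg))
    return static_cast<std::uint64_t>(*narrow);
  return std::get<std::uint64_t>(arg);
}

// Split the arguments into groups of at most 7, one modeled call per group.
std::vector<AppendCall> packAppendCalls(const std::vector<PrintfArg> &args) {
  std::vector<AppendCall> calls;
  for (std::size_t group = 0; group < args.size(); group += kArgsPerAppend) {
    std::size_t bound = std::min(group + kArgsPerAppend, args.size());
    AppendCall call{}; // value-initialized, so unused slots stay 0
    call.numArgs = static_cast<std::uint32_t>(bound - group);
    assert(call.numArgs <= kArgsPerAppend);
    for (std::size_t i = group; i < bound; ++i)
      call.slots[i - group] = widenToI64(args[i]);
    call.isLast = (bound == args.size()) ? 1 : 0;
    calls.push_back(call);
  }
  return calls;
}
```

With nine arguments the sketch produces two calls, the first carrying seven values and the second carrying two values plus five zero slots. With no arguments it produces no calls at all, which matches the lowering setting the is-last flag on the format-string append instead.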

		LogicalResult GPUPrintfOpToLLVMCallLowering::matchAndRewrite(
		gpu::PrintfOp gpuPrintfOp, gpu::PrintfOpAdaptor adaptor,
		ConversionPatternRewriter &rewriter) const {
		Location loc = gpuPrintfOp->getLoc();

		mlir::Type llvmI8 = typeConverter->convertType(rewriter.getIntegerType(8));
		mlir::Type i8Ptr = LLVM::LLVMPointerType::get(llvmI8, addressSpace);
		mlir::Type llvmIndex = typeConverter->convertType(rewriter.getIndexType());

		// Note: this is the GPUModule op, not the ModuleOp that surrounds it
		// This ensures that global constants and declarations are placed within
		// the device code, not the host code
		auto moduleOp = gpuPrintfOp->getParentOfType<gpu::GPUModuleOp>();

		auto printfType = LLVM::LLVMFunctionType::get(rewriter.getI32Type(), {i8Ptr},
		/*isVarArg=*/true);
		LLVM::LLVMFuncOp printfDecl =
		getOrDefineFunction(moduleOp, loc, rewriter, "printf", printfType);

		// Create a global constant for the format string
		unsigned stringNumber = 0;
		SmallString<16> stringConstName;
		do {
		stringConstName.clear();
		(formatStringPrefix + Twine(stringNumber++)).toStringRef(stringConstName);
		} while (moduleOp.lookupSymbol(stringConstName));

		llvm::SmallString<20> formatString(adaptor.format().getValue());
		formatString.push_back('\0'); // Null terminate for C
		auto globalType =
		LLVM::LLVMArrayType::get(llvmI8, formatString.size_in_bytes());
		LLVM::GlobalOp global;
		{
		ConversionPatternRewriter::InsertionGuard guard(rewriter);
		rewriter.setInsertionPointToStart(moduleOp.getBody());
		global = rewriter.create<LLVM::GlobalOp>(
		loc, globalType,
		/*isConstant=*/true, LLVM::Linkage::Internal, stringConstName,
		rewriter.getStringAttr(formatString), /*alignment=*/0, addressSpace);
		}

		// Get a pointer to the format string's first element
		Value globalPtr = rewriter.create<LLVM::AddressOfOp>(loc, global);
		Value zero = rewriter.create<LLVM::ConstantOp>(
		loc, llvmIndex, rewriter.getIntegerAttr(llvmIndex, 0));
		Value stringStart = rewriter.create<LLVM::GEPOp>(
		loc, i8Ptr, globalPtr, mlir::ValueRange({zero, zero}));

		// Construct arguments and function call
		auto argsRange = adaptor.args();
		SmallVector<Value, 4> printfArgs;
		printfArgs.reserve(argsRange.size() + 1);
		printfArgs.push_back(stringStart);
		printfArgs.append(argsRange.begin(), argsRange.end());

		rewriter.create<LLVM::CallOp>(loc, printfDecl, printfArgs);
		rewriter.eraseOp(gpuPrintfOp);
		return success();
		}
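
Both lowerings name the format-string global by probing for an unused symbol and size it to include the trailing NUL. The following is a minimal standalone sketch of those two steps, assuming a `std::set` in place of the module's symbol table. `kFormatStringPrefix`, `pickUniqueName`, and `globalArraySize` are hypothetical names, and the prefix value is an assumption since `formatStringPrefix` is not defined in the lines shown here.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Assumed prefix; the real value of formatStringPrefix is not shown in this
// part of the diff.
const std::string kFormatStringPrefix = "printfFormat_";

// Try prefix0, prefix1, ... until a name is found that is not already taken.
std::string pickUniqueName(const std::set<std::string> &existingSymbols) {
  unsigned stringNumber = 0;
  std::string name;
  do {
    name = kFormatStringPrefix + std::to_string(stringNumber++);
  } while (existingSymbols.count(name) != 0);
  return name;
}

// Size of the i8 array backing the global: the format text plus one NUL byte.
std::size_t globalArraySize(const std::string &format) {
  std::string withNul = format;
  withNul.push_back('\0'); // null terminate for C
  return withNul.size();
}
```

Under these assumptions `"Hello, world\n"` needs 14 bytes and `"Hello: %d\n"` needs 11, which agrees with the `array<14 x i8>` and `array<11 x i8>` types checked in the conversion tests.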

mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp

[45 lines not shown]
// A pass that replaces all occurrences of GPU device operations with their		// A pass that replaces all occurrences of GPU device operations with their
// corresponding ROCDL equivalent.		// corresponding ROCDL equivalent.
//		//
// This pass only handles device code and is not meant to be run on GPU host		// This pass only handles device code and is not meant to be run on GPU host
// code.		// code.
struct LowerGpuOpsToROCDLOpsPass		struct LowerGpuOpsToROCDLOpsPass
: public ConvertGpuOpsToROCDLOpsBase<LowerGpuOpsToROCDLOpsPass> {		: public ConvertGpuOpsToROCDLOpsBase<LowerGpuOpsToROCDLOpsPass> {
LowerGpuOpsToROCDLOpsPass() = default;		LowerGpuOpsToROCDLOpsPass() = default;
LowerGpuOpsToROCDLOpsPass(unsigned indexBitwidth) {		LowerGpuOpsToROCDLOpsPass(unsigned indexBitwidth, gpu::amd::Runtime runtime) {
this->indexBitwidth = indexBitwidth;		this->indexBitwidth = indexBitwidth;
		this->runtime = runtime;
}		}

void runOnOperation() override {		void runOnOperation() override {
gpu::GPUModuleOp m = getOperation();		gpu::GPUModuleOp m = getOperation();

/// Customize the bitwidth used for the device side index computations.		/// Customize the bitwidth used for the device side index computations.
LowerToLLVMOptions options(		LowerToLLVMOptions options(
m.getContext(),		m.getContext(),
[10 lines not shown]	void runOnOperation() override {
(void)applyPatternsAndFoldGreedily(m, std::move(patterns));		(void)applyPatternsAndFoldGreedily(m, std::move(patterns));

mlir::arith::populateArithmeticToLLVMConversionPatterns(converter,		mlir::arith::populateArithmeticToLLVMConversionPatterns(converter,
llvmPatterns);		llvmPatterns);
populateVectorToLLVMConversionPatterns(converter, llvmPatterns);		populateVectorToLLVMConversionPatterns(converter, llvmPatterns);
populateVectorToROCDLConversionPatterns(converter, llvmPatterns);		populateVectorToROCDLConversionPatterns(converter, llvmPatterns);
populateStdToLLVMConversionPatterns(converter, llvmPatterns);		populateStdToLLVMConversionPatterns(converter, llvmPatterns);
populateMemRefToLLVMConversionPatterns(converter, llvmPatterns);		populateMemRefToLLVMConversionPatterns(converter, llvmPatterns);
populateGpuToROCDLConversionPatterns(converter, llvmPatterns);		populateGpuToROCDLConversionPatterns(converter, llvmPatterns, runtime);
LLVMConversionTarget target(getContext());		LLVMConversionTarget target(getContext());
configureGpuToROCDLConversionLegality(target);		configureGpuToROCDLConversionLegality(target);
if (failed(applyPartialConversion(m, target, std::move(llvmPatterns))))		if (failed(applyPartialConversion(m, target, std::move(llvmPatterns))))
signalPassFailure();		signalPassFailure();
}		}
};		};

} // namespace		} // namespace

void mlir::configureGpuToROCDLConversionLegality(ConversionTarget &target) {		void mlir::configureGpuToROCDLConversionLegality(ConversionTarget &target) {
target.addIllegalOp<FuncOp>();		target.addIllegalOp<FuncOp>();
target.addLegalDialect<::mlir::LLVM::LLVMDialect>();		target.addLegalDialect<::mlir::LLVM::LLVMDialect>();
target.addLegalDialect<ROCDL::ROCDLDialect>();		target.addLegalDialect<ROCDL::ROCDLDialect>();
target.addIllegalDialect<gpu::GPUDialect>();		target.addIllegalDialect<gpu::GPUDialect>();
target.addIllegalOp<LLVM::CosOp, LLVM::ExpOp, LLVM::Exp2Op, LLVM::FAbsOp,		target.addIllegalOp<LLVM::CosOp, LLVM::ExpOp, LLVM::Exp2Op, LLVM::FAbsOp,
LLVM::FCeilOp, LLVM::FFloorOp, LLVM::LogOp, LLVM::Log10Op,		LLVM::FCeilOp, LLVM::FFloorOp, LLVM::LogOp, LLVM::Log10Op,
LLVM::Log2Op, LLVM::PowOp, LLVM::SinOp, LLVM::SqrtOp>();		LLVM::Log2Op, LLVM::PowOp, LLVM::SinOp, LLVM::SqrtOp>();

// TODO: Remove once we support replacing non-root ops.		// TODO: Remove once we support replacing non-root ops.
target.addLegalOp<gpu::YieldOp, gpu::GPUModuleOp, gpu::ModuleEndOp>();		target.addLegalOp<gpu::YieldOp, gpu::GPUModuleOp, gpu::ModuleEndOp>();
}		}

void mlir::populateGpuToROCDLConversionPatterns(LLVMTypeConverter &converter,		void mlir::populateGpuToROCDLConversionPatterns(
RewritePatternSet &patterns) {		LLVMTypeConverter &converter, RewritePatternSet &patterns,
		mlir::gpu::amd::Runtime runtime) {
		using mlir::gpu::amd::Runtime;

populateWithGenerated(patterns);		populateWithGenerated(patterns);
patterns		patterns
.add<GPUIndexIntrinsicOpLowering<gpu::ThreadIdOp, ROCDL::ThreadIdXOp,		.add<GPUIndexIntrinsicOpLowering<gpu::ThreadIdOp, ROCDL::ThreadIdXOp,
ROCDL::ThreadIdYOp, ROCDL::ThreadIdZOp>,		ROCDL::ThreadIdYOp, ROCDL::ThreadIdZOp>,
GPUIndexIntrinsicOpLowering<gpu::BlockDimOp, ROCDL::BlockDimXOp,		GPUIndexIntrinsicOpLowering<gpu::BlockDimOp, ROCDL::BlockDimXOp,
ROCDL::BlockDimYOp, ROCDL::BlockDimZOp>,		ROCDL::BlockDimYOp, ROCDL::BlockDimZOp>,
GPUIndexIntrinsicOpLowering<gpu::BlockIdOp, ROCDL::BlockIdXOp,		GPUIndexIntrinsicOpLowering<gpu::BlockIdOp, ROCDL::BlockIdXOp,
ROCDL::BlockIdYOp, ROCDL::BlockIdZOp>,		ROCDL::BlockIdYOp, ROCDL::BlockIdZOp>,
GPUIndexIntrinsicOpLowering<gpu::GridDimOp, ROCDL::GridDimXOp,		GPUIndexIntrinsicOpLowering<gpu::GridDimOp, ROCDL::GridDimXOp,
ROCDL::GridDimYOp, ROCDL::GridDimZOp>,		ROCDL::GridDimYOp, ROCDL::GridDimZOp>,
GPUReturnOpLowering>(converter);		GPUReturnOpLowering>(converter);
		bondhugula (inline comment, Done): Please add this in sorted order.
patterns.add<GPUFuncOpLowering>(		patterns.add<GPUFuncOpLowering>(
converter, /*allocaAddrSpace=*/5,		converter, /*allocaAddrSpace=*/5,
StringAttr::get(&converter.getContext(),		StringAttr::get(&converter.getContext(),
ROCDL::ROCDLDialect::getKernelFuncAttrName()));		ROCDL::ROCDLDialect::getKernelFuncAttrName()));
		if (Runtime::HIP == runtime) {
		patterns.add<GPUPrintfOpToHIPLowering>(converter);
		} else if (Runtime::OpenCL == runtime) {
		// Use address space = 4 to match the OpenCL definition of printf()
		mehdi_amini (inline comment, Not Done): Can you document the address space?
		patterns.add<GPUPrintfOpToLLVMCallLowering>(converter, /*addressSpace=*/4);
		}

patterns.add<OpToFuncCallLowering<math::AbsOp>>(converter, "__ocml_fabs_f32",		patterns.add<OpToFuncCallLowering<math::AbsOp>>(converter, "__ocml_fabs_f32",
"__ocml_fabs_f64");		"__ocml_fabs_f64");
patterns.add<OpToFuncCallLowering<math::AtanOp>>(converter, "__ocml_atan_f32",		patterns.add<OpToFuncCallLowering<math::AtanOp>>(converter, "__ocml_atan_f32",
"__ocml_atan_f64");		"__ocml_atan_f64");
patterns.add<OpToFuncCallLowering<math::Atan2Op>>(		patterns.add<OpToFuncCallLowering<math::Atan2Op>>(
converter, "__ocml_atan2_f32", "__ocml_atan2_f64");		converter, "__ocml_atan2_f32", "__ocml_atan2_f64");
patterns.add<OpToFuncCallLowering<math::CeilOp>>(converter, "__ocml_ceil_f32",		patterns.add<OpToFuncCallLowering<math::CeilOp>>(converter, "__ocml_ceil_f32",
"__ocml_ceil_f64");		"__ocml_ceil_f64");
[23 lines not shown]	patterns.add<OpToFuncCallLowering<math::SinOp>>(converter, "__ocml_sin_f32",
"__ocml_sin_f64");		"__ocml_sin_f64");
patterns.add<OpToFuncCallLowering<math::SqrtOp>>(converter, "__ocml_sqrt_f32",		patterns.add<OpToFuncCallLowering<math::SqrtOp>>(converter, "__ocml_sqrt_f32",
"__ocml_sqrt_f64");		"__ocml_sqrt_f64");
patterns.add<OpToFuncCallLowering<math::TanhOp>>(converter, "__ocml_tanh_f32",		patterns.add<OpToFuncCallLowering<math::TanhOp>>(converter, "__ocml_tanh_f32",
"__ocml_tanh_f64");		"__ocml_tanh_f64");
}		}

std::unique_ptr<OperationPass<gpu::GPUModuleOp>>		std::unique_ptr<OperationPass<gpu::GPUModuleOp>>
mlir::createLowerGpuOpsToROCDLOpsPass(unsigned indexBitwidth) {		mlir::createLowerGpuOpsToROCDLOpsPass(unsigned indexBitwidth,
return std::make_unique<LowerGpuOpsToROCDLOpsPass>(indexBitwidth);		gpu::amd::Runtime runtime) {
		return std::make_unique<LowerGpuOpsToROCDLOpsPass>(indexBitwidth, runtime);
}		}
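
The runtime-dependent branch in `populateGpuToROCDLConversionPatterns` can be summarized as a small selection function. This is a hedged standalone sketch rather than the pass itself: `printfLoweringFor` is a hypothetical helper, the enum below only mirrors `gpu::amd::Runtime`, and the `Unknown` enumerator is assumed from the revision summary since `Runtimes.h` is not shown here.

```cpp
#include <optional>
#include <string>

// Hypothetical mirror of gpu::amd::Runtime. HIP and OpenCL appear in the diff;
// Unknown is an assumption based on the revision summary.
enum class Runtime { Unknown, OpenCL, HIP };

// Returns the name of the printf lowering pattern that would be registered,
// or nothing when no printf pattern is added for that runtime.
std::optional<std::string> printfLoweringFor(Runtime runtime) {
  switch (runtime) {
  case Runtime::HIP:
    // Lowers to the hostcall-based __ockl_printf_* functions.
    return std::string("GPUPrintfOpToHIPLowering");
  case Runtime::OpenCL:
    // Lowers to a variadic printf() call with the format string in
    // address space 4.
    return std::string("GPUPrintfOpToLLVMCallLowering");
  case Runtime::Unknown:
    return std::nullopt;
  }
  return std::nullopt;
}
```

Because `configureGpuToROCDLConversionLegality` marks the GPU dialect illegal, a `gpu.printf` left in a module lowered with an unknown runtime would presumably cause the partial conversion to fail rather than be dropped silently.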

mlir/lib/Conversion/PassDetail.h

	//===- PassDetail.h - Conversion Pass class details -------------- C++ --===//			//===- PassDetail.h - Conversion Pass class details -------------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef CONVERSION_PASSDETAIL_H_			#ifndef CONVERSION_PASSDETAIL_H_
	#define CONVERSION_PASSDETAIL_H_			#define CONVERSION_PASSDETAIL_H_

	#include "mlir/Pass/Pass.h"			#include "mlir/Pass/Pass.h"

				#include "mlir/Conversion/GPUToROCDL/Runtimes.h"

	namespace mlir {			namespace mlir {
	class AffineDialect;			class AffineDialect;
	class StandardOpsDialect;			class StandardOpsDialect;

	// Forward declaration from Dialect.h			// Forward declaration from Dialect.h
	template <typename ConcreteDialect>			template <typename ConcreteDialect>
	void registerDialect(DialectRegistry &registry);			void registerDialect(DialectRegistry &registry);

	[75 lines not shown]

mlir/lib/Dialect/GPU/Transforms/SerializeToHsaco.cpp

[300 lines not shown]	for (std::unique_ptr<llvm::Module> &libModule : mbModules.getValue()) {
// True is linker failure		// True is linker failure
if (err) {		if (err) {
getOperation().emitError(		getOperation().emitError(
"Unrecoverable failure during device library linking.");		"Unrecoverable failure during device library linking.");
// We have no guaranties about the state of `ret`, so bail		// We have no guaranties about the state of `ret`, so bail
return nullptr;		return nullptr;
}		}
}		}

		// Set amdgpu_hostcall if host calls have been linked, as needed by newer LLVM
		// FIXME: Is there a way to set this during printf() lowering that makes sense
		if (ret->getFunction("__ockl_hostcall_internal"))
		if (!ret->getModuleFlag("amdgpu_hostcall"))
		ret->addModuleFlag(llvm::Module::Override, "amdgpu_hostcall", 1);
return ret;		return ret;
}		}

LogicalResult		LogicalResult
SerializeToHsacoPass::optimizeLlvm(llvm::Module &llvmModule,		SerializeToHsacoPass::optimizeLlvm(llvm::Module &llvmModule,
llvm::TargetMachine &targetMachine) {		llvm::TargetMachine &targetMachine) {
int optLevel = this->optLevel.getValue();		int optLevel = this->optLevel.getValue();
if (optLevel < 0 || optLevel > 3)		if (optLevel < 0 || optLevel > 3)
[162 lines not shown]

mlir/lib/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.cpp

[65 lines not shown]	amendOperation(Operation *op, NamedAttribute attribute,
LLVM::ModuleTranslation &moduleTranslation) const final {		LLVM::ModuleTranslation &moduleTranslation) const final {
if (attribute.getName() == ROCDL::ROCDLDialect::getKernelFuncAttrName()) {		if (attribute.getName() == ROCDL::ROCDLDialect::getKernelFuncAttrName()) {
auto func = dyn_cast<LLVM::LLVMFuncOp>(op);		auto func = dyn_cast<LLVM::LLVMFuncOp>(op);
if (!func)		if (!func)
return failure();		return failure();

// For GPU kernels,		// For GPU kernels,
// 1. Insert AMDGPU_KERNEL calling convention.		// 1. Insert AMDGPU_KERNEL calling convention.
// 2. Insert amdgpu-flat-workgroup-size(1, 1024) attribute.		// 2. Insert amdgpu-flat-workgroup-size(1, 256) attribute.
		// 3. Insert amdgpu-implicitarg-num-bytes=56 (which must be set on OpenCL
		// and HIP kernels per Clang)
llvm::Function *llvmFunc =		llvm::Function *llvmFunc =
moduleTranslation.lookupFunction(func.getName());		moduleTranslation.lookupFunction(func.getName());
llvmFunc->setCallingConv(llvm::CallingConv::AMDGPU_KERNEL);		llvmFunc->setCallingConv(llvm::CallingConv::AMDGPU_KERNEL);
llvmFunc->addFnAttr("amdgpu-flat-work-group-size", "1, 1024");		llvmFunc->addFnAttr("amdgpu-flat-work-group-size", "1, 256");
		llvmFunc->addFnAttr("amdgpu-implicitarg-num-bytes", "56");
}		}
return success();		return success();
}		}
};		};
} // namespace		} // namespace

void mlir::registerROCDLDialectTranslation(DialectRegistry &registry) {		void mlir::registerROCDLDialectTranslation(DialectRegistry &registry) {
registry.insert<ROCDL::ROCDLDialect>();		registry.insert<ROCDL::ROCDLDialect>();
[9 lines not shown]

mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-hip.mlir

This file was added.

				// RUN: mlir-opt %s -convert-gpu-to-rocdl=runtime=HIP -split-input-file | FileCheck %s
				mehdi_amini (inline comment, Not Done): This test is crashing on one of my bots. The CMake invocation is:
				cmake -G Ninja ${src_dir}/llvm -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU;NVPTX" -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_BUILD_EXAMPLES=ON -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=Off -DLLVM_LINK_LLVM_DYLIB=ON -DMLIR_INCLUDE_INTEGRATION_TESTS=ON -DMLIR_INCLUDE_INTEGRATION_TESTS=ON
				(on Linux Ubuntu with gcc 8 as host compiler). Can you have a look at it?
				mehdi_amini (inline comment, Not Done): ASAN:
				=================================================================
				==4963==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7f17d905cde0 at pc 0x560629d0fdcf bp 0x7ffd9716f9b0 sp 0x7ffd9716f9a8
				READ of size 8 at 0x7f17d905cde0 thread T0
				#0 0x560629d0fdce in dereference_iterator third_party/llvm/llvm-project/mlir/lib/IR/OperationSupport.cpp:608:12
				#1 0x560629d0fdce in operator* third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLExtras.h:1133:14
				#2 0x560629d0fdce in mlir::Value* std::__u::uninitialized_copy<llvm::detail::indexed_accessor_range_base<mlir::ValueRange, llvm::PointerUnion<mlir::Value const, mlir::OpOperand, mlir::detail::OpResultImpl>, mlir::Value, mlir::Value, mlir::Value>::iterator, mlir::Value>(llvm::detail::indexed_accessor_range_base<mlir::ValueRange, llvm::PointerUnion<mlir::Value const, mlir::OpOperand, mlir::detail::OpResultImpl>, mlir::Value, mlir::Value, mlir::Value>::iterator, llvm::detail::indexed_accessor_range_base<mlir::ValueRange, llvm::PointerUnion<mlir::Value const, mlir::OpOperand, mlir::detail::OpResultImpl>, mlir::Value, mlir::Value, mlir::Value>::iterator, mlir::Value) third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/uninitialized_algorithms.h:36:62
				#3 0x560629d054d4 in uninitialized_copy<llvm::detail::indexed_accessor_range_base<mlir::ValueRange, llvm::PointerUnion<const mlir::Value , mlir::OpOperand , mlir::detail::OpResultImpl >, mlir::Value, mlir::Value, mlir::Value>::iterator, mlir::Value > third_party/llvm/llvm-project/llvm/include/llvm/ADT/SmallVector.h:490:5
				#4 0x560629d054d4 in append<llvm::detail::indexed_accessor_range_base<mlir::ValueRange, llvm::PointerUnion<const mlir::Value , mlir::OpOperand , mlir::detail::OpResultImpl >, mlir::Value, mlir::Value, mlir::Value>::iterator, void> third_party/llvm/llvm-project/llvm/include/llvm/ADT/SmallVector.h:652:5
				#5 0x560629d054d4 in mlir::OperationState::addOperands(mlir::ValueRange) third_party/llvm/llvm-project/mlir/lib/IR/OperationSupport.cpp:188:12
				#6 0x560628e65810 in mlir::LLVM::CallOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::LLVM::LLVMFuncOp, mlir::ValueRange, llvm::ArrayRef<mlir::NamedAttribute>) blaze-out/k8-asan-opt/bin/third_party/llvm/llvm-project/mlir/_virtual_includes/LLVMOpsIncGen/mlir/Dialect/LLVMIR/LLVMOps.cpp.inc:4117:16
				#7 0x560623ffb6b8 in mlir::LLVM::CallOp mlir::OpBuilder::create<mlir::LLVM::CallOp, mlir::LLVM::LLVMFuncOp&, mlir::ValueRange&>(mlir::Location, mlir::LLVM::LLVMFuncOp&, mlir::ValueRange&) third_party/llvm/llvm-project/mlir/include/mlir/IR/Builders.h:427:5
				#8 0x560623ff8bc8 in mlir::GPUPrintfOpToHIPLowering::matchAndRewrite(mlir::gpu::PrintfOp, mlir::gpu::PrintfOpAdaptor, mlir::ConversionPatternRewriter&) const third_party/llvm/llvm-project/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp:245:16
				#9 0x560623fff4e0 in mlir::ConvertOpToLLVMPattern<mlir::gpu::PrintfOp>::matchAndRewrite(mlir::Operation, llvm::ArrayRef<mlir::Value>,

				gpu.module @test_module {
				// CHECK-DAG: llvm.mlir.global internal constant @[[$PRINT_GLOBAL0:[A-Za-z0-9_]+]]("Hello, world\0A\00")
				// CHECK-DAG: llvm.mlir.global internal constant @[[$PRINT_GLOBAL1:[A-Za-z0-9_]+]]("Hello: %d\0A\00")
				// CHECK-DAG: llvm.func @__ockl_printf_append_args(i64, i32, i64, i64, i64, i64, i64, i64, i64, i32) -> i64
				// CHECK-DAG: llvm.func @__ockl_printf_append_string_n(i64, !llvm.ptr<i8>, i64, i32) -> i64
				// CHECK-DAG: llvm.func @__ockl_printf_begin(i64) -> i64

				// CHECK-LABEL: func @test_const_printf
				gpu.func @test_const_printf() {
				// CHECK: %[[CST0:.*]] = llvm.mlir.constant(0 : i64) : i64
				// CHECK-NEXT: %[[DESC0:.*]] = llvm.call @__ockl_printf_begin(%0) : (i64) -> i64
				// CHECK-NEXT: %[[FORMATSTR:.*]] = llvm.mlir.addressof @[[$PRINT_GLOBAL0]] : !llvm.ptr<array<14 x i8>>
				// CHECK-NEXT: %[[CST1:.*]] = llvm.mlir.constant(0 : i64) : i64
				// CHECK-NEXT: %[[FORMATSTART:.*]] = llvm.getelementptr %[[FORMATSTR]][%[[CST1]], %[[CST1]]] : (!llvm.ptr<array<14 x i8>>, i64, i64) -> !llvm.ptr<i8>
				// CHECK-NEXT: %[[FORMATLEN:.*]] = llvm.mlir.constant(14 : i64) : i64
				// CHECK-NEXT: %[[ISLAST:.*]] = llvm.mlir.constant(1 : i32) : i32
				// CHECK-NEXT: %[[ISNTLAST:.*]] = llvm.mlir.constant(0 : i32) : i32
				// CHECK-NEXT: %{{.*}} = llvm.call @__ockl_printf_append_string_n(%[[DESC0]], %[[FORMATSTART]], %[[FORMATLEN]], %[[ISLAST]]) : (i64, !llvm.ptr<i8>, i64, i32) -> i64
				gpu.printf "Hello, world\n"
				gpu.return
				}


				// CHECK-LABEL: func @test_printf
				// CHECK: (%[[ARG0:.*]]: i32)
				gpu.func @test_printf(%arg0: i32) {
				// CHECK: %[[CST0:.*]] = llvm.mlir.constant(0 : i64) : i64
				// CHECK-NEXT: %[[DESC0:.*]] = llvm.call @__ockl_printf_begin(%0) : (i64) -> i64
				// CHECK-NEXT: %[[FORMATSTR:.*]] = llvm.mlir.addressof @[[$PRINT_GLOBAL1]] : !llvm.ptr<array<11 x i8>>
				// CHECK-NEXT: %[[CST1:.*]] = llvm.mlir.constant(0 : i64) : i64
				// CHECK-NEXT: %[[FORMATSTART:.*]] = llvm.getelementptr %[[FORMATSTR]][%[[CST1]], %[[CST1]]] : (!llvm.ptr<array<11 x i8>>, i64, i64) -> !llvm.ptr<i8>
				// CHECK-NEXT: %[[FORMATLEN:.*]] = llvm.mlir.constant(11 : i64) : i64
				// CHECK-NEXT: %[[ISLAST:.*]] = llvm.mlir.constant(1 : i32) : i32
				// CHECK-NEXT: %[[ISNTLAST:.*]] = llvm.mlir.constant(0 : i32) : i32
				// CHECK-NEXT: %[[DESC1:.*]] = llvm.call @__ockl_printf_append_string_n(%[[DESC0]], %[[FORMATSTART]], %[[FORMATLEN]], %[[ISNTLAST]]) : (i64, !llvm.ptr<i8>, i64, i32) -> i64
				// CHECK-NEXT: %[[NARGS1:.*]] = llvm.mlir.constant(1 : i32) : i32
				// CHECK-NEXT: %[[ARG0_64:.*]] = llvm.zext %[[ARG0]] : i32 to i64
				// CHECK-NEXT: %{{.*}} = llvm.call @__ockl_printf_append_args(%[[DESC1]], %[[NARGS1]], %[[ARG0_64]], %[[CST0]], %[[CST0]], %[[CST0]], %[[CST0]], %[[CST0]], %[[CST0]], %[[ISLAST]]) : (i64, i32, i64, i64, i64, i64, i64, i64, i64, i32) -> i64
				gpu.printf "Hello: %d\n" %arg0 : i32
				gpu.return
				}
				}
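
The stack-use-after-scope reported in the inline comments on this test points at `GPUPrintfOpToHIPLowering::matchAndRewrite`, where a `ValueRange` is initialized from a braced list and used in a later statement. The following standalone sketch shows the same lifetime pattern and one way to avoid it. `IntView`, `sum`, and `sumOfFour` are hypothetical names, with `IntView` standing in for a non-owning range such as `mlir::ValueRange`. It is not the fix that landed for this revision, only an illustration of the hazard.

```cpp
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical non-owning view: it stores a pointer and a length, nothing more.
struct IntView {
  const int *data = nullptr;
  std::size_t size = 0;

  IntView(std::initializer_list<int> list)
      : data(list.begin()), size(list.size()) {}
  IntView(const std::vector<int> &values)
      : data(values.data()), size(values.size()) {}
};

int sum(IntView view) {
  int total = 0;
  for (std::size_t i = 0; i < view.size; ++i)
    total += view.data[i];
  return total;
}

int sumOfFour(int a, int b, int c, int d) {
  // Unsafe pattern, left commented out on purpose: the array behind the
  // braced list is a temporary that is destroyed at the end of the
  // declaration, so the view would dangle by the time it is read.
  //   IntView view = {a, b, c, d};
  //   return sum(view);

  // Safe pattern: keep the elements in an owning container that outlives
  // every use of the view.
  std::vector<int> storage = {a, b, c, d};
  return sum(storage);
}
```

Passing the braced list directly as a call argument is also safe, because the temporary array then lives until the end of the full expression containing the call.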

mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-opencl.mlir

This file was added.

				// RUN: mlir-opt %s -convert-gpu-to-rocdl=runtime=OpenCL | FileCheck %s

				gpu.module @test_module {
				// CHECK: llvm.mlir.global internal constant @[[$PRINT_GLOBAL:[A-Za-z0-9_]+]]("Hello: %d\0A\00") {addr_space = 4 : i32}
				// CHECK: llvm.func @printf(!llvm.ptr<i8, 4>, ...) -> i32
				// CHECK-LABEL: func @test_printf
				// CHECK: (%[[ARG0:.*]]: i32)
				gpu.func @test_printf(%arg0: i32) {
				// CHECK: %[[IMM0:.*]] = llvm.mlir.addressof @[[$PRINT_GLOBAL]] : !llvm.ptr<array<11 x i8>, 4>
				// CHECK-NEXT: %[[IMM1:.*]] = llvm.mlir.constant(0 : i64) : i64
				// CHECK-NEXT: %[[IMM2:.*]] = llvm.getelementptr %[[IMM0]][%[[IMM1]], %[[IMM1]]] : (!llvm.ptr<array<11 x i8>, 4>, i64, i64) -> !llvm.ptr<i8, 4>
				// CHECK-NEXT: %{{.*}} = llvm.call @printf(%[[IMM2]], %[[ARG0]]) : (!llvm.ptr<i8, 4>, i32) -> i32
				gpu.printf "Hello: %d\n" %arg0 : i32
				gpu.return
				}
				}

mlir/test/Dialect/GPU/ops.mlir

[106 lines not shown]	gpu.func @kernel_1(%arg0: f32)
kernel		kernel
attributes {foo="bar"} {		attributes {foo="bar"} {
"use"(%arg1) : (memref<42xf32, 3>) -> ()		"use"(%arg1) : (memref<42xf32, 3>) -> ()
"use"(%arg2) : (memref<2xf32, 5>) -> ()		"use"(%arg2) : (memref<2xf32, 5>) -> ()
"use"(%arg3) : (memref<1xf32, 5>) -> ()		"use"(%arg3) : (memref<1xf32, 5>) -> ()
gpu.return		gpu.return
}		}

		// CHECK-LABEL: gpu.func @printf_test
		// CHECK: (%[[ARG0:.*]]: i32)
		// CHECK: gpu.printf "Value: %d" %[[ARG0]] : i32
		gpu.func @printf_test(%arg0 : i32) {
		gpu.printf "Value: %d" %arg0 : i32
		gpu.return
		}

// CHECK-LABEL: gpu.func @no_attribution		// CHECK-LABEL: gpu.func @no_attribution
// CHECK: {		// CHECK: {
gpu.func @no_attribution(%arg0: f32) {		gpu.func @no_attribution(%arg0: f32) {
gpu.return		gpu.return
}		}

// CHECK-LABEL: @no_attribution_attrs		// CHECK-LABEL: @no_attribution_attrs
// CHECK: attributes		// CHECK: attributes
[113 lines not shown]

mlir/test/Integration/GPU/ROCM/printf.mlir

This file was added.

				// RUN: mlir-opt %s \
				// RUN: -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-rocdl{index-bitwidth=32 runtime=HIP},gpu-to-hsaco{chip=%chip})' \
				// RUN: -gpu-to-llvm \
				// RUN: | mlir-cpu-runner \
				// RUN: --shared-libs=%linalg_test_lib_dir/libmlir_rocm_runtime%shlibext \
				// RUN: --shared-libs=%linalg_test_lib_dir/libmlir_runner_utils%shlibext \
				// RUN: --entry-point-result=void \
				// RUN: | FileCheck %s

				// CHECK: Hello from 0
				// CHECK: Hello from 1
				module attributes {gpu.container_module} {
				gpu.module @kernels {
				gpu.func @hello() kernel {
				%0 = "gpu.thread_id"() {dimension="x"} : () -> (index)
				gpu.printf "Hello from %d\n" %0 : index
				gpu.return
				}
				}

				func @main() {
				%c2 = arith.constant 2 : index
				%c1 = arith.constant 1 : index
				gpu.launch_func @kernels::@hello
				blocks in (%c1, %c1, %c1)
				threads in (%c2, %c1, %c1)
				return
				}
				}
