mlir/lib/Conversion/GPUToROCm/ConvertKernelFuncToHsaco.cpp
51	This is a lot of common code. Could we have a base class, e.g., `GpuKernelToLLVMGeneratedBlobPass`, that factors out all the common logic but has extension points for the LLVM target to use, etc.?
147	`ptx` -> `hsa`?

Address part of code review comments.

whchung marked an inline comment as done. May 19 2020, 8:53 AM

whchung added inline comments.

mlir/lib/Conversion/GPUToROCm/ConvertKernelFuncToHsaco.cpp
51	@herhut I would argue that factoring out common logic into a base class is a premature optimization at this point. In D80167, the logic for lowering `gpu.launch` to runtime API invocations is more or less the same between the two platforms, so I think it's good to make that logic generic. In this patch, however, the passes that turn a GPUModule into a binary blob essentially do the following tasks: 1) acquire the GPUModule and an LLVM context lock; 2) initialize the LLVM backend; 3) translate the GPUModule to the llvm/nvvm/rocdl dialect; 4) invoke the LLVM backend, specifying triple / datalayout / target features; 5) invoke the platform-specific binary creator (the cubin generator on the NV platform, LLVM lld on the AMD platform); 6) convert the binary into a StringAttr. 1) and 6) are platform-neutral, but 2) - 5) are highly platform-dependent and may change from time to time. Therefore, I'd like to keep both paths separate for now. Once the ROCm side of the logic becomes more mature, we can consider factoring out common logic in subsequent patches.

Harbormaster failed remote builds in B57215: Diff 264930! May 19 2020, 9:17 AM

Address warnings found by clang-tidy.

Harbormaster failed remote builds in B57226: Diff 264948! May 19 2020, 10:55 AM

Revise function naming to better depict formats used in the process.

Fix build errors.

Fix build errors again.

Harbormaster failed remote builds in B57245: Diff 264993! May 19 2020, 1:10 PM

Harbormaster failed remote builds in B57241: Diff 264989!

clang-tidy.

Harbormaster failed remote builds in B57250: Diff 265000! May 19 2020, 1:45 PM

Harbormaster failed remote builds in B57262: Diff 265027! May 19 2020, 2:19 PM

herhut added inline comments. May 20 2020, 3:24 AM

mlir/lib/Conversion/GPUToROCm/ConvertKernelFuncToHsaco.cpp
51	The code is pretty much identical, except for 1) the target triple, 2) the need for initializing an LLVM target because of the triple, 3) the use of a different annotation name, and 4) a different compiler callback from ISA to blob. For 1), the target could also be passed in as a parameter. For 3), I am not even sure we need a different name, but that could just be a parameter to the pass. 4) is already passed in and goes from string -> vector, so there is nothing platform-dependent there. That leaves 2), which could also be a callback that is passed in, or the user has to make sure that the target is initialized. It does not need to be part of the pass itself. The only future extension I can envision is a different configuration of the lowering pipeline during code generation. That could also be done via a callback that gets a LegacyPassManager as a parameter, similar to how the LLVM lowering can be configured. Also, if we split this now, it will likely diverge, making later refactoring harder.
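A minimal sketch of the parameterization suggested in this comment, assuming the triple, the annotation name, and the ISA-to-blob callback are all passed in. Every name here (`BlobPassOptions`, `BlobGenerator`, `runGenericBlobPass`) is hypothetical and uses plain standard-library stand-ins rather than the real MLIR/LLVM types; it is not the code that eventually landed.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these names come from the patch.
using OwnedBlob = std::unique_ptr<std::vector<char>>;
using BlobGenerator = std::function<OwnedBlob(const std::string &isa)>;

// Everything platform-specific is bundled here and supplied by the caller.
struct BlobPassOptions {
  std::string triple;         // e.g. "amdgcn-amd-amdhsa"
  std::string annotationName; // e.g. "rocdl.hsaco"
  BlobGenerator generator;    // platform-specific ISA -> blob step
};

// Platform-neutral driver. Returns the (annotation name, blob bytes) pair,
// or a pair of empty strings if the generator reports failure via nullptr.
std::pair<std::string, std::string>
runGenericBlobPass(const BlobPassOptions &options, const std::string &isa) {
  assert(options.generator && "a blob generator must be provided");
  OwnedBlob blob = options.generator(isa);
  if (!blob)
    return {};
  return {options.annotationName, std::string(blob->begin(), blob->end())};
}
```

With this shape, a CUDA caller and a ROCm caller would differ only in the `BlobPassOptions` value they construct, which is the point the comment argues for.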

whchung marked an inline comment as done. May 20 2020, 10:28 AM

whchung added inline comments.

mlir/lib/Conversion/GPUToROCm/ConvertKernelFuncToHsaco.cpp
51	@herhut Points taken. I'm working on finalizing the implementation of `mlir-rocm-runner`. Let me revise the patch after it's done in a couple of days.

Rewrite the patch to make the pass be generic between CUDA+NVPTX and ROCm+AMDGPU.

whchung retitled this revision from "[mlir][gpu][rocdl] Introduce GPUToROCm conversion passes." to "[mlir][gpu][mlir-cuda-runner] Refactor ConvertKernelFuncToCubin to be generic." May 22 2020, 3:01 PM

whchung edited the summary of this revision.

Herald added a subscriber: yaxunl. May 22 2020, 3:01 PM

@herhut I've revised the patch so that the ConvertGpuKernelToCubin pass is now ConvertGpuKernelToBlob and works on both the CUDA and ROCm platforms. Could you help review it once again? Thanks.

In my downstream fork I also have the implementation of mlir-rocm-runner ready, pending this patch.

Harbormaster failed remote builds in B57682: Diff 265806! May 22 2020, 3:34 PM

Thanks. I think this avoids a lot of code duplication. I would prefer to remove the initialize target callback and move that into the caller.

With that addressed I am happy to see this land.
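One way to read the request to move target initialization into the caller, sketched with standard-library stand-ins only. `initializeBackendOnce`, `initCount`, `buildPipeline`, and the pass name string are all hypothetical; the real caller would invoke the relevant `LLVMInitialize*` functions instead of bumping a counter.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical counter standing in for the side effects of the real
// LLVMInitialize*Target* calls.
int &initCount() {
  static int count = 0;
  return count;
}

// The caller owns initialization. A function-local static guarantees the
// body runs once, even if the pipeline is built several times.
void initializeBackendOnce() {
  static const bool initialized = [] {
    ++initCount();
    return true;
  }();
  (void)initialized;
}

// Pipeline construction initializes the backend itself, so the pass no
// longer needs an "initialize target" callback.
std::vector<std::string> buildPipeline() {
  initializeBackendOnce();
  assert(initCount() == 1 && "backend must be initialized exactly once");
  return {"hypothetical-kernel-to-blob-pass"};
}
```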

mlir/include/mlir/Conversion/GPUCommon/GPUCommonPass.h
39 ↗	(On Diff #265806)	Do we need to actually do this in this pass or could we do this in the code that constructs the pipeline? If the calling context has to provide this callback, it could also just call the init itself.
40 ↗	(On Diff #265806)	Could this be a `std::function<std::unique_ptr<llvm::Module>(Operation *)>` instead, signalling failure by returning a `nullptr`?
56 ↗	(On Diff #265806)	Nit: `to target object`
mlir/lib/Conversion/GPUCommon/ConvertKernelFuncToBlob.cpp
92 ↗	(On Diff #265806)	`lob` -> `blob`
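The callback shape proposed in the comment on line 40 (return a `std::unique_ptr` and signal failure with `nullptr`) might look roughly like the sketch below. `FakeOperation`, `FakeModule`, `ModuleTranslator`, `lowerWith`, and `translateForDemo` are hypothetical stand-ins for `Operation`, `llvm::Module`, and the surrounding pass code.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-ins for mlir::Operation and llvm::Module.
struct FakeOperation {
  std::string name;
  bool valid;
};
struct FakeModule {
  std::string name;
};

// Same shape as the suggested std::function: nullptr means failure.
using ModuleTranslator =
    std::function<std::unique_ptr<FakeModule>(const FakeOperation *)>;

// Returns false where the real pass would call signalPassFailure().
bool lowerWith(const ModuleTranslator &translate, const FakeOperation &op) {
  assert(translate && "a translation callback must be provided");
  std::unique_ptr<FakeModule> module = translate(&op);
  return module != nullptr;
}

// Example callback: fails on null or invalid input by returning nullptr.
std::unique_ptr<FakeModule> translateForDemo(const FakeOperation *op) {
  if (!op || !op->valid)
    return nullptr;
  return std::make_unique<FakeModule>(FakeModule{op->name});
}
```

Compared with a separate status return plus an out-parameter, the owning pointer carries both the result and the success flag.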

This revision is now accepted and ready to land. May 25 2020, 7:07 AM

Revise the patch addressing code review comments.

whchung marked 4 inline comments as done. May 27 2020, 12:42 PM

Remove obsolete comment.

rriddle added inline comments. May 27 2020, 1:50 PM

mlir/include/mlir/Conversion/GPUCommon/GPUCommonPass.h
12 ↗	(On Diff #266649)	Is this header really necessary?

whchung marked 2 inline comments as done. May 27 2020, 2:09 PM

whchung added inline comments.

mlir/include/mlir/Conversion/GPUCommon/GPUCommonPass.h
12 ↗	(On Diff #266649)	@rriddle It's necessary so that `llvm::Module` is visible. A forward declaration is not possible because `sizeof(llvm::Module)` would be used in one of the unit tests.
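As a general illustration of why a forward declaration can fall short here: naming `std::unique_ptr<T>` in a signature works with an incomplete `T`, but destroying one requires the complete type, since the default deleter needs `sizeof(T)`. The sketch below uses a hypothetical `Widget` type and is not taken from the patch or its unit tests.

```cpp
#include <cassert>
#include <memory>

struct Widget; // forward declaration: enough to *name* the type

// Declaring a factory type that returns std::unique_ptr<Widget> is fine
// while Widget is still incomplete.
using WidgetFactory = std::unique_ptr<Widget> (*)();

// Any code that destroys a std::unique_ptr<Widget> needs the full
// definition, so it must be visible from here on.
struct Widget {
  int value = 7;
};

std::unique_ptr<Widget> makeWidget() { return std::make_unique<Widget>(); }

int useFactory(WidgetFactory factory) {
  assert(factory != nullptr);
  std::unique_ptr<Widget> w = factory(); // destroyed at end of scope
  return w ? w->value : -1;
}
```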

Harbormaster failed remote builds in B58093: Diff 266631! May 27 2020, 2:10 PM

whchung mentioned this in D80167: [mlir][gpu] Refactor ConvertGpuLaunchFuncToCudaCalls pass. May 27 2020, 2:25 PM

Remove unused headers.

Harbormaster failed remote builds in B58121: Diff 266676! May 27 2020, 2:43 PM

whchung mentioned this in D80676: [mlir][gpu] Introduce mlir-rocm-runner. May 27 2020, 4:34 PM

Thanks!

Harbormaster failed remote builds in B58105: Diff 266649! May 28 2020, 2:08 AM

Closed by commit rG061fb8eb2d9f: [mlir][gpu][mlir-cuda-runner] Refactor ConvertKernelFuncToCubin to be generic. (authored by whchung). May 28 2020, 7:36 AM

This revision was automatically updated to reflect the committed changes.

ftynse mentioned this in D80739: [mlir][GPU] Link relevant LLVM components in GPUCommon instead of test. May 28 2020, 10:01 AM

ftynse mentioned this in D80698: Ignore MLIRGPUtoCUDATransforms in the MLIRTestTransforms library when the NVPTX target is disabled. May 28 2020, 10:45 AM

ftynse mentioned this in rG72ede60b75ee: [mlir][GPU] Link relevant LLVM components in GPUCommon instead of test. May 28 2020, 11:33 AM

Diff 265027

mlir/CMakeLists.txt

...
 if ("NVPTX" IN_LIST LLVM_TARGETS_TO_BUILD)
   set(MLIR_CUDA_CONVERSIONS_ENABLED 1)
 else()
   set(MLIR_CUDA_CONVERSIONS_ENABLED 0)
 endif()
 # TODO: we should use a config.h file like LLVM does
 add_definitions(-DMLIR_CUDA_CONVERSIONS_ENABLED=${MLIR_CUDA_CONVERSIONS_ENABLED})

+# Build the ROCm conversions and run according tests if the AMDGPU backend
+# is available
+if ("AMDGPU" IN_LIST LLVM_TARGETS_TO_BUILD)
+  set(MLIR_ROCM_CONVERSIONS_ENABLED 1)
+else()
+  set(MLIR_ROCM_CONVERSIONS_ENABLED 0)
+endif()
+add_definitions(-DMLIR_ROCM_CONVERSIONS_ENABLED=${MLIR_ROCM_CONVERSIONS_ENABLED})
+
 set(MLIR_CUDA_RUNNER_ENABLED 0 CACHE BOOL "Enable building the mlir CUDA runner")
 set(MLIR_VULKAN_RUNNER_ENABLED 0 CACHE BOOL "Enable building the mlir Vulkan runner")

 option(MLIR_INCLUDE_TESTS
        "Generate build targets for the MLIR unit tests."
        ${LLVM_INCLUDE_TESTS})

 include_directories( "include")
...

mlir/include/mlir/Conversion/GPUToROCm/GPUToROCmPass.h

This file was added.

//===- GPUToROCmPass.h - MLIR ROCm runtime support --------------- C++ --===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#ifndef MLIR_CONVERSION_GPUTOROCM_GPUTOROCMPASS_H_
#define MLIR_CONVERSION_GPUTOROCM_GPUTOROCMPASS_H_

#include "mlir/Support/LLVM.h"
#include <functional>
#include <memory>
#include <string>
#include <vector>

namespace mlir {

class Location;
class ModuleOp;

template <typename T>
class OperationPass;

namespace gpu {
class GPUModuleOp;
} // namespace gpu

namespace LLVM {
class LLVMDialect;
} // namespace LLVM

using OwnedHsaco = std::unique_ptr<std::vector<char>>;
using HsacoGenerator =
    std::function<OwnedHsaco(const std::string &, Location, StringRef)>;

/// Creates a pass to convert kernel functions into HSA code object blobs.
///
/// This transformation takes the body of each function that is annotated with
/// the 'gpu.kernel' attribute, copies it to a new LLVM module, compiles the
/// module with help of the AMDGPU backend to HSA code object and then invokes
/// the provided hsacoGenerator to produce a binary blob (the hsaco). Such blob
/// is then attached as a string attribute named 'rocdl.hsaco' to the kernel
/// function.
/// After the transformation, the body of the kernel function is removed (i.e.,
/// it is turned into a declaration).
std::unique_ptr<OperationPass<gpu::GPUModuleOp>>
createConvertGPUKernelToHsacoPass(HsacoGenerator hsacoGenerator);

} // namespace mlir

#endif // MLIR_CONVERSION_GPUTOROCM_GPUTOROCMPASS_H_

mlir/include/mlir/InitAllPasses.h

...

 #ifndef MLIR_INITALLPASSES_H_
 #define MLIR_INITALLPASSES_H_

 #include "mlir/Conversion/AVX512ToLLVM/ConvertAVX512ToLLVM.h"
 #include "mlir/Conversion/GPUToCUDA/GPUToCUDAPass.h"
 #include "mlir/Conversion/GPUToNVVM/GPUToNVVMPass.h"
 #include "mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h"
+#include "mlir/Conversion/GPUToROCm/GPUToROCmPass.h"
 #include "mlir/Conversion/GPUToSPIRV/ConvertGPUToSPIRVPass.h"
 #include "mlir/Conversion/GPUToVulkan/ConvertGPUToVulkanPass.h"
 #include "mlir/Conversion/LinalgToLLVM/LinalgToLLVM.h"
 #include "mlir/Conversion/LinalgToSPIRV/LinalgToSPIRVPass.h"
 #include "mlir/Conversion/LinalgToStandard/LinalgToStandard.h"
 #include "mlir/Conversion/SCFToGPU/SCFToGPUPass.h"
 #include "mlir/Conversion/SCFToStandard/SCFToStandard.h"
 #include "mlir/Conversion/StandardToLLVM/ConvertStandardToLLVMPass.h"
...

mlir/lib/Conversion/CMakeLists.txt

 add_subdirectory(AffineToStandard)
 add_subdirectory(AVX512ToLLVM)
 add_subdirectory(GPUToCUDA)
 add_subdirectory(GPUToNVVM)
 add_subdirectory(GPUToROCDL)
+add_subdirectory(GPUToROCm)
 add_subdirectory(GPUToSPIRV)
 add_subdirectory(GPUToVulkan)
 add_subdirectory(LinalgToLLVM)
 add_subdirectory(LinalgToSPIRV)
 add_subdirectory(LinalgToStandard)
 add_subdirectory(SCFToGPU)
 add_subdirectory(SCFToStandard)
 add_subdirectory(StandardToLLVM)
 add_subdirectory(StandardToSPIRV)
 add_subdirectory(VectorToLLVM)
 add_subdirectory(VectorToSCF)

mlir/lib/Conversion/GPUToROCm/CMakeLists.txt

This file was added.

set(LLVM_OPTIONAL_SOURCES
  ConvertKernelFuncToHsaco.cpp
)

if (MLIR_ROCM_CONVERSIONS_ENABLED)
  list(APPEND SOURCES "ConvertKernelFuncToHsaco.cpp")
  set(AMDGPU_LIBS
    MC
    AMDGPUCodeGen
    AMDGPUDesc
    AMDGPUInfo
  )

endif()

add_mlir_conversion_library(MLIRGPUtoROCmTransforms
  ${SOURCES}

  DEPENDS
  MLIRConversionPassIncGen
  intrinsics_gen

  LINK_COMPONENTS
  Core
  ${AMDGPU_LIBS}

  LINK_LIBS PUBLIC
  MLIRGPU
  MLIRIR
  MLIRLLVMIR
  MLIRROCDLIR
  MLIRPass
  MLIRSupport
  MLIRTargetROCDLIR
)

mlir/lib/Conversion/GPUToROCm/ConvertKernelFuncToHsaco.cpp

This file was added.

//===- ConvertKernelFuncToHsaco.cpp - MLIR GPU lowering passes ------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements a pass to convert gpu kernel functions into a
// corresponding binary blob that can be executed on a ROCm GPU. Currently
// only translates the function itself but no dependencies.
//
//===----------------------------------------------------------------------===//

#include "mlir/Conversion/GPUToROCm/GPUToROCmPass.h"

#include "mlir/Dialect/GPU/GPUDialect.h"
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/IR/Attributes.h"
#include "mlir/IR/Builders.h"
#include "mlir/IR/Function.h"
#include "mlir/IR/Module.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassRegistry.h"
#include "mlir/Support/LogicalResult.h"
#include "mlir/Target/ROCDLIR.h"

#include "llvm/ADT/Optional.h"
#include "llvm/ADT/Twine.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/Mutex.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Target/TargetMachine.h"

using namespace mlir;

namespace {
static constexpr const char *kHsacoAnnotation = "rocdl.hsaco";

/// A pass converting tagged kernel modules to hsaco blobs.
///
/// If tagged as a kernel module, each contained function is translated to ROCDL
/// IR. A user provided HsacoGenerator compiles the IR to GPU binary code in HSA
/// code object format, which is then attached as an attribute to the function.
/// The function body is erased.
class GpuKernelToHsacoPass
    : public PassWrapper<GpuKernelToHsacoPass,
                         OperationPass<gpu::GPUModuleOp>> {
public:
  GpuKernelToHsacoPass(HsacoGenerator hsacoGenerator)
      : hsacoGenerator(hsacoGenerator) {}

  void runOnOperation() override {
    gpu::GPUModuleOp module = getOperation();

    // Lock access to the llvm context.
    llvm::sys::SmartScopedLock<true> scopedLock(
        module.getContext()
            ->getRegisteredDialect<LLVM::LLVMDialect>()
            ->getLLVMContextMutex());

    // Make sure the AMDGPU target is initialized.
    LLVMInitializeAMDGPUTarget();
    LLVMInitializeAMDGPUTargetInfo();
    LLVMInitializeAMDGPUTargetMC();
    LLVMInitializeAMDGPUAsmPrinter();

    auto llvmModule = translateModuleToROCDLIR(module);
    if (!llvmModule)
      return signalPassFailure();

    // Translate the module to HSA code object and attach the result as
    // attribute to the module.
    if (auto hsacoAttr = translateGPUModuleToHsacoAnnotation(
            *llvmModule, module.getLoc(), module.getName()))
      module.setAttr(kHsacoAnnotation, hsacoAttr);
    else
      signalPassFailure();
  }

private:
  std::string translateModuleToISA(llvm::Module &module,
                                   llvm::TargetMachine &targetMachine);

  /// Converts llvmModule to hsaco using the user-provided generator. Location
  /// is used for error reporting and name is forwarded to the HSACO generator
  /// to use in its logging mechanisms.
  OwnedHsaco convertModuleToHsaco(llvm::Module &llvmModule, Location loc,
                                  StringRef name);

  /// Translates llvmModule to hsaco and returns the result as attribute.
  StringAttr translateGPUModuleToHsacoAnnotation(llvm::Module &llvmModule,
                                                 Location loc, StringRef name);

  HsacoGenerator hsacoGenerator;
};

} // anonymous namespace

std::string
GpuKernelToHsacoPass::translateModuleToISA(llvm::Module &module,
                                           llvm::TargetMachine &targetMachine) {
  std::string targetISA;
  {
    // Clone the llvm module into a new context to enable concurrent compilation
    // with multiple threads.
    llvm::LLVMContext llvmContext;
    auto clone = LLVM::cloneModuleIntoNewContext(&llvmContext, &module);

    llvm::raw_string_ostream stream(targetISA);
    llvm::buffer_ostream pstream(stream);
    llvm::legacy::PassManager codegenPasses;
    targetMachine.addPassesToEmitFile(codegenPasses, pstream, nullptr,
                                      llvm::CGFT_AssemblyFile);
    codegenPasses.run(*clone);
  }

  return targetISA;
}

OwnedHsaco GpuKernelToHsacoPass::convertModuleToHsaco(llvm::Module &llvmModule,
                                                      Location loc,
                                                      StringRef name) {
  std::unique_ptr<llvm::TargetMachine> targetMachine;
  {
    std::string error;
    constexpr const char *rocmTriple = "amdgcn-amd-amdhsa";
    llvm::Triple triple(rocmTriple);
    const llvm::Target *target =
        llvm::TargetRegistry::lookupTarget("", triple, error);
    if (target == nullptr) {
      emitError(loc, "cannot initialize target triple");
      return {};
    }
    // TODO(whchung): be able to set target.
    targetMachine.reset(
        target->createTargetMachine(triple.str(), "gfx900", "", {}, {}));
  }

  llvmModule.setDataLayout(targetMachine->createDataLayout());

  auto targetISA = translateModuleToISA(llvmModule, *targetMachine);

  return hsacoGenerator(targetISA, loc, name);
}

StringAttr GpuKernelToHsacoPass::translateGPUModuleToHsacoAnnotation(
    llvm::Module &llvmModule, Location loc, StringRef name) {
  auto hsaco = convertModuleToHsaco(llvmModule, loc, name);
  if (!hsaco)
    return {};
  return StringAttr::get({hsaco->data(), hsaco->size()}, loc->getContext());
}

std::unique_ptr<OperationPass<gpu::GPUModuleOp>>
mlir::createConvertGPUKernelToHsacoPass(HsacoGenerator hsacoGenerator) {
  return std::make_unique<GpuKernelToHsacoPass>(hsacoGenerator);
}
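The last step of the pass turns the generated blob into a string attribute by passing both the data pointer and the size. A small standard-library sketch of why the explicit length matters for binary data follows; `blobToAttributePayload` is a hypothetical helper, not an MLIR API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Copies the blob byte for byte. Using the full range, rather than treating
// the data as a C string, keeps embedded '\0' bytes, which binary code
// objects commonly contain.
std::string blobToAttributePayload(const std::vector<char> &blob) {
  std::string payload(blob.begin(), blob.end());
  assert(payload.size() == blob.size());
  return payload;
}
```

Constructing the string from `blob.data()` alone would stop at the first zero byte and silently truncate the payload.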

mlir/test/Conversion/GPUToROCm/lit.local.cfg

This file was added.

if not config.run_rocm_tests:
    config.unsupported = True

mlir/test/Conversion/GPUToROCm/lower-rocdl-kernel-to-hsaco.mlir

This file was added.

// RUN: mlir-opt %s --test-kernel-to-hsaco -split-input-file | FileCheck %s

// CHECK: attributes {rocdl.hsaco = "HSACO"}
gpu.module @foo {
  llvm.func @kernel(%arg0 : !llvm.float, %arg1 : !llvm<"float*">)
    // CHECK: attributes  {gpu.kernel}
    attributes  { gpu.kernel } {
    llvm.return
  }
}

// -----

gpu.module @bar {
  // CHECK: func @kernel_a
  llvm.func @kernel_a()
    attributes  { gpu.kernel } {
    llvm.return
  }

  // CHECK: func @kernel_b
  llvm.func @kernel_b()
    attributes  { gpu.kernel } {
    llvm.return
  }
}

mlir/test/lib/Transforms/CMakeLists.txt

 # Exclude tests from libMLIR.so
 add_mlir_library(MLIRTestTransforms
   TestAllReduceLowering.cpp
   TestBufferPlacement.cpp
   TestCallGraph.cpp
   TestConstantFold.cpp
   TestConvertGPUKernelToCubin.cpp
+  TestConvertGPUKernelToHsaco.cpp
   TestDominance.cpp
   TestLoopFusion.cpp
   TestGpuMemoryPromotion.cpp
   TestGpuParallelLoopMapping.cpp
   TestInlining.cpp
   TestLinalgTransforms.cpp
   TestLiveness.cpp
   TestLoopMapping.cpp
...
   MLIRTestVectorTransformPatternsIncGen

   LINK_LIBS PUBLIC
   MLIRAffineOps
   MLIRAnalysis
   MLIREDSC
   MLIRGPU
   MLIRGPUtoCUDATransforms
+  MLIRGPUtoROCmTransforms
   MLIRLinalgOps
   MLIRLinalgTransforms
   MLIRSCF
   MLIRGPU
   MLIRPass
   MLIRStandardOpsTransforms
   MLIRTestDialect
   MLIRTransformUtils
   MLIRVectorToSCF
   MLIRVector
   )

 include_directories(${CMAKE_CURRENT_SOURCE_DIR}/../Dialect/Test)
 include_directories(${CMAKE_CURRENT_BINARY_DIR}/../Dialect/Test)
 include_directories(${CMAKE_CURRENT_SOURCE_DIR}/../DeclarativeTransforms)
 include_directories(${CMAKE_CURRENT_BINARY_DIR}/../DeclarativeTransforms)

mlir/test/lib/Transforms/TestConvertGPUKernelToHsaco.cpp

This file was added.

//===- TestConvertGPUKernelToHsaco.cpp - Test gpu kernel hsaco lowering ---===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "mlir/Conversion/GPUToROCm/GPUToROCmPass.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassManager.h"
using namespace mlir;

#if MLIR_ROCM_CONVERSIONS_ENABLED
static OwnedHsaco compileROCDLToHsacoForTesting(const std::string &, Location,
                                                StringRef) {
  const char data[] = "HSACO";
  return std::make_unique<std::vector<char>>(data, data + sizeof(data) - 1);
}

namespace mlir {
void registerTestConvertGPUKernelToHsacoPass() {
  PassPipelineRegistration<>("test-kernel-to-hsaco",
                             "Convert all kernel functions to ROCm HSACO blobs",
                             [](OpPassManager &pm) {
                               pm.addPass(createConvertGPUKernelToHsacoPass(
                                   compileROCDLToHsacoForTesting));
                             });
}
} // namespace mlir
#endif
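The test generator above builds its blob from `data` to `data + sizeof(data) - 1`, which drops the string literal's terminating NUL so that the attribute reads exactly `"HSACO"`. The standalone sketch below isolates that detail; `makeStubBlob` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

using OwnedBlob = std::unique_ptr<std::vector<char>>;

// Mirrors the stub generator: sizeof(data) counts the trailing '\0' of the
// literal (6 bytes in total), so subtracting 1 copies only the 5 visible
// characters.
OwnedBlob makeStubBlob() {
  const char data[] = "HSACO";
  OwnedBlob blob =
      std::make_unique<std::vector<char>>(data, data + sizeof(data) - 1);
  assert(blob->size() == sizeof(data) - 1);
  return blob;
}
```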

mlir/test/lit.site.cfg.py.in

...
 config.mlir_obj_root = "@MLIR_BINARY_DIR@"
 config.mlir_runner_utils_dir = "@MLIR_RUNNER_UTILS_DIR@"
 config.mlir_tools_dir = "@MLIR_TOOLS_DIR@"
 config.linalg_test_lib_dir = "@MLIR_DIALECT_LINALG_INTEGRATION_TEST_LIB_DIR@"
 config.build_examples = @LLVM_BUILD_EXAMPLES@
 config.run_cuda_tests = @MLIR_CUDA_CONVERSIONS_ENABLED@
 config.cuda_wrapper_library_dir = "@MLIR_CUDA_WRAPPER_LIBRARY_DIR@"
 config.enable_cuda_runner = @MLIR_CUDA_RUNNER_ENABLED@
+config.run_rocm_tests = @MLIR_ROCM_CONVERSIONS_ENABLED@
 config.vulkan_wrapper_library_dir = "@MLIR_VULKAN_WRAPPER_LIBRARY_DIR@"
 config.enable_vulkan_runner = @MLIR_VULKAN_RUNNER_ENABLED@

 # Support substitution of the tools_dir with user parameters. This is
 # used when we can't determine the tool dir at configuration time.
 try:
     config.llvm_tools_dir = config.llvm_tools_dir % lit_config.params
     config.llvm_shlib_dir = config.llvm_shlib_dir % lit_config.params
...

mlir/tools/mlir-opt/mlir-opt.cpp

...
 void registerTestAffineDataCopyPass();
 void registerTestAllReduceLoweringPass();
 void registerTestAffineLoopUnswitchingPass();
 void registerTestBufferPlacementPreparationPass();
 void registerTestLoopPermutationPass();
 void registerTestCallGraphPass();
 void registerTestConstantFold();
 void registerTestConvertGPUKernelToCubinPass();
+void registerTestConvertGPUKernelToHsacoPass();
 void registerTestDominancePass();
 void registerTestFunc();
 void registerTestGpuMemoryPromotionPass();
 void registerTestLinalgTransforms();
 void registerTestLivenessPass();
 void registerTestLoopFusion();
 void registerTestLoopMappingPass();
 void registerTestLoopUnrollingPass();
...
 void registerTestPasses() {
...
   registerTestAllReduceLoweringPass();
   registerTestAffineLoopUnswitchingPass();
   registerTestLoopPermutationPass();
   registerTestCallGraphPass();
   registerTestConstantFold();
 #if MLIR_CUDA_CONVERSIONS_ENABLED
   registerTestConvertGPUKernelToCubinPass();
 #endif
+#if MLIR_ROCM_CONVERSIONS_ENABLED
+  registerTestConvertGPUKernelToHsacoPass();
+#endif
   registerTestBufferPlacementPreparationPass();
   registerTestDominancePass();
   registerTestFunc();
   registerTestGpuMemoryPromotionPass();
   registerTestLinalgTransforms();
   registerTestLivenessPass();
   registerTestLoopFusion();
   registerTestLoopMappingPass();
...
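The registration above is compiled in only when the build defines `MLIR_ROCM_CONVERSIONS_ENABLED` to a non-zero value, which the CMake change does via `add_definitions`. A self-contained sketch of that guard pattern follows; the default of `0` and the `registeredTestPasses` helper are assumptions made here so that the snippet builds on its own.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumption for this sketch only: fall back to "disabled" when the build
// system did not pass -DMLIR_ROCM_CONVERSIONS_ENABLED=<0|1>.
#ifndef MLIR_ROCM_CONVERSIONS_ENABLED
#define MLIR_ROCM_CONVERSIONS_ENABLED 0
#endif

// Hypothetical helper: lists the test pipelines that would be registered
// under the current configuration.
std::vector<std::string> registeredTestPasses() {
  std::vector<std::string> names;
#if MLIR_ROCM_CONVERSIONS_ENABLED
  names.push_back("test-kernel-to-hsaco");
#endif
  return names;
}
```

Because the guard uses `#if` rather than `#ifdef`, a value of `0` disables the block while still leaving the macro defined.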

This is an archive of the discontinued LLVM Phabricator instance.

[mlir][gpu][mlir-cuda-runner] Refactor ConvertKernelFuncToCubin to be generic.
ClosedPublic

