This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][GPU] Update SerializeToHsaco to match downstream
Abandoned (Public)

Authored by krzysz00 on Oct 27 2021, 3:21 PM.


Details

Reviewers

b-sumner
herhut
aartbik

Summary

[MLIR][GPU] Run generic LLVM optimizations when serializing (on AMD)

Adds hooks that allow SerializeTo* passes to arbitrarily transform the produced LLVM Module before it is passed to the code generation passes.

Uses these hooks within the SerializeToHsaco pass in order to run LLVM optimizations and to set the optimization level on the TargetMachine.
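The hook pattern described above can be sketched in plain C++. This is a minimal illustration under stated assumptions, not the patch itself: `Module`, `BlobPass` and `HsacoLikePass` are hypothetical stand-ins for `llvm::Module`, `SerializeToBlobPass` and `SerializeToHsacoPass`, and string concatenation replaces real code generation. The base class runs a virtual `optimizeLlvm` hook before codegen and treats a failed hook as a failed translation; the default hook does nothing.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for llvm::Module: just holds some textual "IR".
struct Module {
  std::string ir;
};

// Shaped like SerializeToBlobPass: a fixed pipeline with one overridable hook.
class BlobPass {
public:
  virtual ~BlobPass() = default;

  // Returns std::nullopt on failure, like the patched translateToISA.
  std::optional<std::string> translateToISA(Module &module) {
    if (!optimizeLlvm(module)) // the hook runs before "codegen"
      return std::nullopt;
    return "isa(" + module.ir + ")"; // stand-in for emitting assembly
  }

protected:
  // Default hook: leave the module untouched and report success.
  virtual bool optimizeLlvm(Module &) { return true; }
};

// Shaped like SerializeToHsacoPass: overrides the hook and checks optLevel.
class HsacoLikePass : public BlobPass {
public:
  explicit HsacoLikePass(int level) : optLevel(level) {}

protected:
  bool optimizeLlvm(Module &module) override {
    if (optLevel < 0 || optLevel > 3)
      return false; // reject out-of-range levels, as the patch does
    module.ir += "+O" + std::to_string(optLevel); // stand-in for optimizing
    return true;
  }

private:
  int optLevel;
};
```

With this shape, a backend that does not override the hook keeps its previous behavior, while an override can both transform the module and reject a bad configuration.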

Adds an optLevel parameter to SerializeToHsaco.
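A small sketch of how an integer `optLevel` can be validated before it is cast to a codegen level, mirroring the range check in `SerializeToHsacoPass::optimizeLlvm`. `CodeGenLevel` and `toCodeGenLevel` are hypothetical; the enum is assumed to follow the same 0 to 3 numbering as LLVM's codegen optimization levels but is not the LLVM type.

```cpp
#include <cassert>
#include <optional>

// Local, hypothetical mirror of the four codegen levels (assumed 0..3,
// in the same order as LLVM's None/Less/Default/Aggressive).
enum class CodeGenLevel { None = 0, Less = 1, Default = 2, Aggressive = 3 };

// Rejects anything outside [0, 3] before the cast, so an out-of-range
// integer never becomes an invalid enum value.
std::optional<CodeGenLevel> toCodeGenLevel(int optLevel) {
  if (optLevel < 0 || optLevel > 3)
    return std::nullopt;
  return static_cast<CodeGenLevel>(optLevel);
}
```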

Future work may include moving much of what's been added to
SerializeToHsaco to SerializeToBlob, but that would require
confirmation from the NVVM backend maintainers that it would be
appropriate to do so.

[MLIR][AMDGPU] Link device libraries where needed

The ROCm library path is now computed at runtime instead of relying on a compile-time default (except as a fallback).

SerializeToHsaco no longer depends on HIP, as it no longer does chipset autodetection (which wasn't being used anyway).

A --rocm-path option has been added to allow the user to override the ROCm path, in addition to the typical ROCM_PATH environment variable.
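The lookup order described above (option, then environment variable, then compile-time default) can be sketched as follows. `resolveRocmPath` and the `EXAMPLE_DEFAULT_ROCM_PATH` macro are hypothetical; in the patch the option is `--rocm-path`, the fallback arrives through the `__DEFAULT_ROCM_PATH__` compile definition set in CMake, and the real function is `SerializeToHsacoPass::getRocmPath`.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical compile-time fallback, standing in for __DEFAULT_ROCM_PATH__.
#ifndef EXAMPLE_DEFAULT_ROCM_PATH
#define EXAMPLE_DEFAULT_ROCM_PATH "/opt/rocm"
#endif

// An explicitly given option wins, then the ROCM_PATH environment variable,
// and only then the value baked in at build time.
std::string resolveRocmPath(const std::optional<std::string> &cliOption) {
  if (cliOption)
    return *cliOption;
  if (const char *env = std::getenv("ROCM_PATH"))
    return env;
  return EXAMPLE_DEFAULT_ROCM_PATH;
}
```

Keeping the build-time value only as a last resort lets a single build work on machines whose ROCm installs live in different places.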

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

krzysz00 created this revision. Oct 27 2021, 3:21 PM

Herald added subscribers: wenzhicui, wrengr, Chia-hungDuan and 22 others. Oct 27 2021, 3:21 PM

krzysz00 requested review of this revision. Oct 27 2021, 3:21 PM

Herald added a reviewer: herhut. Oct 27 2021, 3:21 PM

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B131057: Diff 382817. Oct 27 2021, 4:02 PM

krzysz00 added a child revision: D110448: [MLIR][GPU] Define gpu.printf op and its lowerings. Oct 27 2021, 5:42 PM

Restore the --gpu-to-hsaco option because it's used in integration tests. Update

Herald added a reviewer: aartbik. Nov 2 2021, 8:31 AM

Harbormaster completed remote builds in B131988: Diff 384120. Nov 2 2021, 8:46 AM

Restore the --gpu-to-hsaco option because it's used in integration tests. Update
Pass rocm path to integration tests

Harbormaster completed remote builds in B133338: Diff 385938. Nov 9 2021, 1:06 PM

whchung added a subscriber: whchung. Nov 10 2021, 1:29 PM

Nit: "Update SerializeToHsaco to match downstream" isn't a very clear / descriptive commit title here.

Also the description seems like an aggregate of changes, see here for a more explicit guideline: https://mlir.llvm.org/getting_started/Contributing/#commit-messages (and https://chris.beams.io/posts/git-commit/ ).

If you can't describe the changes easily, that may be an indication that there are many changes that would better be split in separate changes.

Herald added a subscriber: sdasgup3. Nov 16 2021, 4:45 PM

mehdi_amini added inline comments. Nov 16 2021, 4:48 PM

mlir/lib/Dialect/GPU/Transforms/SerializeToHsaco.cpp, line 73:
The default copy constructor wouldn't copy all these already?

In D112668#3136285, @mehdi_amini wrote:

If you can't describe the changes easily, that may be an indication that there are many changes that would better be split in separate changes.

Yeah, I think I can split this into multiple changesets. The catch will be splitting in such a way that each changeset doesn't break the build. Fortunately, our repo has the history of the various changes, though I can't directly yank the patches out.

I'll find a way to withdraw this and send in smaller parts - can I add you as a reviewer on each of those?

mlir/lib/Dialect/GPU/Transforms/SerializeToHsaco.cpp, line 73:
It didn't seem to (I end up with highly negative optLevel values) but maybe that was some other issue.
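The question above rests on a C++ rule: an implicitly defined copy constructor copies every non-static member, including a plain `int` such as `optLevel`, even when the base class's own user-written copy constructor copies nothing. A minimal sketch with hypothetical `BasePass`/`DerivedPass` types; it does not model how MLIR copies `Pass::Option` members, so it does not settle the negative-`optLevel` issue reported in the reply.

```cpp
#include <cassert>

// Hypothetical base whose copy constructor copies no state, similar in shape
// to SerializeToBlobPass's copy constructor (which only forwards to its base).
struct BasePass {
  BasePass() = default;
  BasePass(const BasePass &) {}
  virtual ~BasePass() = default;
};

// No copy constructor is declared here, so the compiler generates one that
// calls BasePass's copy constructor and then copies optLevel member-wise.
struct DerivedPass : BasePass {
  explicit DerivedPass(int level) : optLevel(level) {}
  int optLevel;
};
```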

Split into, and replaced by, https://reviews.llvm.org/D114107 , https://reviews.llvm.org/D114113 , https://reviews.llvm.org/D114114 , and https://reviews.llvm.org/D114117 , in that order.

krzysz00 removed a child revision: D110448: [MLIR][GPU] Define gpu.printf op and its lowerings. Dec 8 2021, 3:51 PM

Revision Contents

Path: Size
mlir/include/mlir/Dialect/GPU/Passes.h: 15 lines
mlir/lib/Dialect/GPU/CMakeLists.txt: 27 lines
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp: 40 lines
mlir/lib/Dialect/GPU/Transforms/SerializeToHsaco.cpp: 330 lines
mlir/test/Integration/GPU/ROCM/ (four files, names not captured): 2 lines, 21 lines, 2 lines, 2 lines
mlir/test/Integration/GPU/ROCM/vector-transferops.mlir: 2 lines

Diff 384120

mlir/include/mlir/Dialect/GPU/Passes.h

(unchanged lines not shown)
public:
SerializeToBlobPass(TypeID passID);		SerializeToBlobPass(TypeID passID);
SerializeToBlobPass(const SerializeToBlobPass &other);		SerializeToBlobPass(const SerializeToBlobPass &other);

void runOnOperation() final;		void runOnOperation() final;

protected:		protected:
void getDependentDialects(DialectRegistry &registry) const override;		void getDependentDialects(DialectRegistry &registry) const override;

private:		/// Translates the module to ISA
/// Creates the LLVM target machine to generate the ISA.		virtual Optional<std::string>
std::unique_ptr<llvm::TargetMachine> createTargetMachine();		translateToISA(llvm::Module &llvmModule, llvm::TargetMachine &targetMachine);

		/// Hook allowing the application of optimizations before codegen
		/// By default, does nothing
		virtual LogicalResult optimizeLlvm(llvm::Module &llvmModule,
		llvm::TargetMachine &targetMachine);

/// Translates the 'getOperation()' result to an LLVM module.		/// Translates the 'getOperation()' result to an LLVM module.
virtual std::unique_ptr<llvm::Module>		virtual std::unique_ptr<llvm::Module>
translateToLLVMIR(llvm::LLVMContext &llvmContext);		translateToLLVMIR(llvm::LLVMContext &llvmContext);

		private:
		/// Creates the LLVM target machine to generate the ISA.
		std::unique_ptr<llvm::TargetMachine> createTargetMachine();

/// Serializes the target ISA to binary form.		/// Serializes the target ISA to binary form.
virtual std::unique_ptr<std::vector<char>>		virtual std::unique_ptr<std::vector<char>>
serializeISA(const std::string &isa) = 0;		serializeISA(const std::string &isa) = 0;

protected:		protected:
Option<std::string> triple{*this, "triple",		Option<std::string> triple{*this, "triple",
::llvm::cl::desc("Target triple")};		::llvm::cl::desc("Target triple")};
Option<std::string> chip{*this, "chip",		Option<std::string> chip{*this, "chip",
(29 unchanged lines not shown)

mlir/lib/Dialect/GPU/CMakeLists.txt

if (MLIR_ENABLE_CUDA_CONVERSIONS)		if (MLIR_ENABLE_CUDA_CONVERSIONS)
set(NVPTX_LIBS		set(NVPTX_LIBS
NVPTXCodeGen		NVPTXCodeGen
NVPTXDesc		NVPTXDesc
NVPTXInfo		NVPTXInfo
)		)
endif()		endif()

if (MLIR_ENABLE_ROCM_CONVERSIONS)		if (MLIR_ENABLE_ROCM_CONVERSIONS)
set(AMDGPU_LIBS		set(AMDGPU_LIBS
		IRReader
		linker
MCParser		MCParser
AMDGPUAsmParser		AMDGPUAsmParser
AMDGPUCodeGen		AMDGPUCodeGen
AMDGPUDesc		AMDGPUDesc
AMDGPUInfo		AMDGPUInfo
		target
)		)
endif()		endif()

add_mlir_dialect_library(MLIRGPUOps		add_mlir_dialect_library(MLIRGPUOps
IR/GPUDialect.cpp		IR/GPUDialect.cpp

ADDITIONAL_HEADER_DIRS		ADDITIONAL_HEADER_DIRS
${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/GPU		${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/GPU
(97 unchanged lines not shown)
message(SEND_ERROR
"Building mlir with ROCm support requires the AMDGPU backend")		"Building mlir with ROCm support requires the AMDGPU backend")
endif()		endif()

# Ensure lld is enabled.		# Ensure lld is enabled.
if (NOT "lld" IN_LIST LLVM_ENABLE_PROJECTS)		if (NOT "lld" IN_LIST LLVM_ENABLE_PROJECTS)
message(SEND_ERROR "lld is not enabled. Please revise LLVM_ENABLE_PROJECTS")		message(SEND_ERROR "lld is not enabled. Please revise LLVM_ENABLE_PROJECTS")
endif()		endif()

# Configure ROCm support.		set(DEFAULT_ROCM_PATH "/opt/rocm" CACHE PATH "Fallback path to search for ROCm installs")
if (NOT DEFINED ROCM_PATH)
if (NOT DEFINED ENV{ROCM_PATH})
set(ROCM_PATH "/opt/rocm" CACHE PATH "Path to which ROCm has been installed")
else()
set(ROCM_PATH $ENV{ROCM_PATH} CACHE PATH "Path to which ROCm has been installed")
endif()
set(HIP_PATH "${ROCM_PATH}/hip" CACHE PATH " Path to which HIP has been installed")
endif()
set(CMAKE_MODULE_PATH "${HIP_PATH}/cmake" ${CMAKE_MODULE_PATH})
find_package(HIP)
if (NOT HIP_FOUND)
message(SEND_ERROR "Building mlir with ROCm support requires a working ROCm and HIP install")
else()
message(STATUS "ROCm HIP version: ${HIP_VERSION}")
endif()

target_compile_definitions(obj.MLIRGPUOps		target_compile_definitions(obj.MLIRGPUOps
PRIVATE		PRIVATE
__HIP_PLATFORM_HCC__		__DEFAULT_ROCM_PATH__="${DEFAULT_ROCM_PATH}"
__ROCM_PATH__="${ROCM_PATH}"
MLIR_GPU_TO_HSACO_PASS_ENABLE=1		MLIR_GPU_TO_HSACO_PASS_ENABLE=1
)		)

target_include_directories(obj.MLIRGPUOps		target_include_directories(obj.MLIRGPUOps
PRIVATE		PRIVATE
${MLIR_SOURCE_DIR}/../lld/include		${MLIR_SOURCE_DIR}/../lld/include
${HIP_PATH}/include
${ROCM_PATH}/include
)		)

target_link_libraries(MLIRGPUOps		target_link_libraries(MLIRGPUOps
PRIVATE		PRIVATE
lldELF		lldELF
		MLIRExecutionEngine
MLIRROCDLToLLVMIRTranslation		MLIRROCDLToLLVMIRTranslation
)		)

# Link lldELF also to libmlir.so. Create an alias that starts with LLVM		# Link lldELF also to libmlir.so. Create an alias that starts with LLVM
# because LINK_COMPONENTS elements are implicitly prefixed with LLVM.		# because LINK_COMPONENTS elements are implicitly prefixed with LLVM.
add_library(LLVMAliasTolldELF ALIAS lldELF)		add_library(LLVMAliasTolldELF ALIAS lldELF)
set_property(GLOBAL APPEND PROPERTY MLIR_LLVM_LINK_COMPONENTS AliasTolldELF)		set_property(GLOBAL APPEND PROPERTY MLIR_LLVM_LINK_COMPONENTS AliasTolldELF)

endif()		endif()

mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp

	(25 unchanged lines not shown)
	std::string gpu::getDefaultGpuBinaryAnnotation() { return "gpu.binary"; }			std::string gpu::getDefaultGpuBinaryAnnotation() { return "gpu.binary"; }

	gpu::SerializeToBlobPass::SerializeToBlobPass(TypeID passID)			gpu::SerializeToBlobPass::SerializeToBlobPass(TypeID passID)
	: OperationPass<gpu::GPUModuleOp>(passID) {}			: OperationPass<gpu::GPUModuleOp>(passID) {}

	gpu::SerializeToBlobPass::SerializeToBlobPass(const SerializeToBlobPass &other)			gpu::SerializeToBlobPass::SerializeToBlobPass(const SerializeToBlobPass &other)
	: OperationPass<gpu::GPUModuleOp>(other) {}			: OperationPass<gpu::GPUModuleOp>(other) {}

	static std::string translateToISA(llvm::Module &llvmModule,			Optional<std::string>
				gpu::SerializeToBlobPass::translateToISA(llvm::Module &llvmModule,
	llvm::TargetMachine &targetMachine) {			llvm::TargetMachine &targetMachine) {
	llvmModule.setDataLayout(targetMachine.createDataLayout());			llvmModule.setDataLayout(targetMachine.createDataLayout());

				if (failed(optimizeLlvm(llvmModule, targetMachine))) {
				return llvm::None;
				}
	std::string targetISA;			std::string targetISA;
	llvm::raw_string_ostream stream(targetISA);			llvm::raw_string_ostream stream(targetISA);
	llvm::buffer_ostream pstream(stream);
	llvm::legacy::PassManager codegenPasses;			llvm::legacy::PassManager codegenPasses;
	targetMachine.addPassesToEmitFile(codegenPasses, pstream, nullptr,
	llvm::CGFT_AssemblyFile);			{ // Drop pstream after this to prevent the ISA from being stuck buffering
				llvm::buffer_ostream pstream(stream);
				if (targetMachine.addPassesToEmitFile(codegenPasses, pstream, nullptr,
				llvm::CGFT_AssemblyFile)) {
				return llvm::None;
				}
	codegenPasses.run(llvmModule);			codegenPasses.run(llvmModule);
	return targetISA;			}
				return stream.str();
	}			}

	void gpu::SerializeToBlobPass::runOnOperation() {			void gpu::SerializeToBlobPass::runOnOperation() {
	// Lower the module to an LLVM IR module using a separate context to enable			// Lower the module to an LLVM IR module using a separate context to enable
	// multi-threaded processing.			// multi-threaded processing.
	llvm::LLVMContext llvmContext;			llvm::LLVMContext llvmContext;
	std::unique_ptr<llvm::Module> llvmModule = translateToLLVMIR(llvmContext);			std::unique_ptr<llvm::Module> llvmModule = translateToLLVMIR(llvmContext);
	if (!llvmModule)			if (!llvmModule)
	return signalPassFailure();			return signalPassFailure();

	// Lower the LLVM IR module to target ISA.			// Lower the LLVM IR module to target ISA.
	std::unique_ptr<llvm::TargetMachine> targetMachine = createTargetMachine();			std::unique_ptr<llvm::TargetMachine> targetMachine = createTargetMachine();
	if (!targetMachine)			if (!targetMachine)
	return signalPassFailure();			return signalPassFailure();

	std::string targetISA = translateToISA(*llvmModule, *targetMachine);			Optional<std::string> maybeTargetISA =
				translateToISA(*llvmModule, *targetMachine);

				if (!maybeTargetISA.hasValue()) {
				return signalPassFailure();
				}
				std::string targetISA = maybeTargetISA.getValue();

	// Serialize the target ISA.			// Serialize the target ISA.
	std::unique_ptr<std::vector<char>> blob = serializeISA(targetISA);			std::unique_ptr<std::vector<char>> blob = serializeISA(targetISA);
	if (!blob)			if (!blob)
	return signalPassFailure();			return signalPassFailure();

	// Add the blob as module attribute.			// Add the blob as module attribute.
	auto attr =			auto attr =
	StringAttr::get(&getContext(), StringRef(blob->data(), blob->size()));			StringAttr::get(&getContext(), StringRef(blob->data(), blob->size()));
	getOperation()->setAttr(gpuBinaryAnnotation, attr);			getOperation()->setAttr(gpuBinaryAnnotation, attr);
	}			}

				LogicalResult
				gpu::SerializeToBlobPass::optimizeLlvm(llvm::Module &llvmModule,
				llvm::TargetMachine &targetMachine) {
				// TODO: If serializeToCubin ends up defining optimizations, factor them
				// into here from SerializeToHsaco
				return success();
				}

	void gpu::SerializeToBlobPass::getDependentDialects(			void gpu::SerializeToBlobPass::getDependentDialects(
	DialectRegistry &registry) const {			DialectRegistry &registry) const {
	registerLLVMDialectTranslation(registry);			registerLLVMDialectTranslation(registry);
	OperationPass<gpu::GPUModuleOp>::getDependentDialects(registry);			OperationPass<gpu::GPUModuleOp>::getDependentDialects(registry);
	}			}

	std::unique_ptr<llvm::TargetMachine>			std::unique_ptr<llvm::TargetMachine>
	gpu::SerializeToBlobPass::createTargetMachine() {			gpu::SerializeToBlobPass::createTargetMachine() {
	(23 unchanged lines not shown)
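The extra braces around `pstream` in `translateToISA` matter because `llvm::buffer_ostream` keeps output in its own buffer and writes it to the underlying stream when it is destroyed. The sketch below imitates that behavior with a hypothetical `ScopedBuffer` class built only on `std::string`; it is an analogue for illustration, not the LLVM class.

```cpp
#include <cassert>
#include <string>

// Hypothetical analogue of llvm::buffer_ostream: writes accumulate in a
// private buffer and reach the target string only on destruction.
class ScopedBuffer {
public:
  explicit ScopedBuffer(std::string &target) : target(target) {}
  ~ScopedBuffer() { target += buffer; } // flush happens here
  ScopedBuffer(const ScopedBuffer &) = delete;
  ScopedBuffer &operator=(const ScopedBuffer &) = delete;

  void write(const std::string &text) { buffer += text; }

private:
  std::string &target;
  std::string buffer;
};

// Reads the result only after the buffer's scope has ended, mirroring the
// braces around pstream in the patched translateToISA.
std::string emitIsa(const std::string &ir) {
  std::string isa;
  {
    ScopedBuffer pstream(isa);
    pstream.write("; ISA for " + ir + "\n");
    // At this point `isa` is still empty.
  }
  return isa;
}
```

Returning `isa` from inside the inner scope would hand back an empty string, which is the "stuck buffering" failure the patch comment guards against.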

mlir/lib/Dialect/GPU/Transforms/SerializeToHsaco.cpp

//===- LowerGPUToHSACO.cpp - Convert GPU kernel to HSACO blob -------------===//		//===- LowerGPUToHSACO.cpp - Convert GPU kernel to HSACO blob -------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements a pass that serializes a gpu module into HSAco blob and		// This file implements a pass that serializes a gpu module into HSAco blob and
// adds that blob as a string attribute of the module.		// adds that blob as a string attribute of the module.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
#include "mlir/Dialect/GPU/Passes.h"		#include "mlir/Dialect/GPU/Passes.h"

#if MLIR_GPU_TO_HSACO_PASS_ENABLE		#if MLIR_GPU_TO_HSACO_PASS_ENABLE
		#include "mlir/ExecutionEngine/OptUtils.h"
#include "mlir/Pass/Pass.h"		#include "mlir/Pass/Pass.h"
#include "mlir/Support/FileUtilities.h"		#include "mlir/Support/FileUtilities.h"
#include "mlir/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.h"		#include "mlir/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.h"
#include "mlir/Target/LLVMIR/Export.h"		#include "mlir/Target/LLVMIR/Export.h"

		#include "llvm/IR/Constants.h"
		#include "llvm/IR/GlobalVariable.h"
		#include "llvm/IRReader/IRReader.h"
		#include "llvm/Linker/Linker.h"

#include "llvm/MC/MCAsmBackend.h"		#include "llvm/MC/MCAsmBackend.h"
#include "llvm/MC/MCAsmInfo.h"		#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCCodeEmitter.h"		#include "llvm/MC/MCCodeEmitter.h"
#include "llvm/MC/MCContext.h"		#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCObjectFileInfo.h"		#include "llvm/MC/MCObjectFileInfo.h"
#include "llvm/MC/MCObjectWriter.h"		#include "llvm/MC/MCObjectWriter.h"
#include "llvm/MC/MCParser/MCTargetAsmParser.h"		#include "llvm/MC/MCParser/MCTargetAsmParser.h"
#include "llvm/MC/MCStreamer.h"		#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCSubtargetInfo.h"		#include "llvm/MC/MCSubtargetInfo.h"

#include "llvm/MC/TargetRegistry.h"		#include "llvm/MC/TargetRegistry.h"

		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/FileUtilities.h"		#include "llvm/Support/FileUtilities.h"
#include "llvm/Support/LineIterator.h"
#include "llvm/Support/Program.h"		#include "llvm/Support/Program.h"
		#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/TargetSelect.h"		#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/WithColor.h"		#include "llvm/Support/WithColor.h"

#include "llvm/Target/TargetOptions.h"		#include "llvm/Target/TargetOptions.h"

#include "lld/Common/Driver.h"		#include "lld/Common/Driver.h"

#include "hip/hip_version.h"

#include <mutex>		#include <mutex>

using namespace mlir;		using namespace mlir;

namespace {		namespace {
class SerializeToHsacoPass		class SerializeToHsacoPass
: public PassWrapper<SerializeToHsacoPass, gpu::SerializeToBlobPass> {		: public PassWrapper<SerializeToHsacoPass, gpu::SerializeToBlobPass> {
public:		public:
		// Needed to make options work
SerializeToHsacoPass();		SerializeToHsacoPass();
		SerializeToHsacoPass(const SerializeToHsacoPass &other) {
		if (other.triple.hasValue()) {
		this->triple = other.triple;
		}
		if (other.chip.hasValue()) {
		this->chip = other.chip;
		}
		if (other.features.hasValue()) {
		this->features = other.features;
		}
		if (other.rocmPath.hasValue()) {
		this->rocmPath = other.rocmPath;
		}
		this->optLevel = other.optLevel;
		};

		SerializeToHsacoPass(StringRef triple, StringRef arch, StringRef features,
		int optLevel);

StringRef getArgument() const override { return "gpu-to-hsaco"; }		StringRef getArgument() const override { return "gpu-to-hsaco"; }
StringRef getDescription() const override {		StringRef getDescription() const override {
return "Lower GPU kernel function to HSACO binary annotations";		return "Lower GPU kernel function to HSACO binary annotations";
}		}

		protected:
		Option<std::string> rocmPath{*this, "rocm-path",
		llvm::cl::desc("Path to ROCm install")};

private:		private:
void getDependentDialects(DialectRegistry &registry) const override;		void getDependentDialects(DialectRegistry &registry) const override;

// Serializes ROCDL to HSACO.		// Serializes ROCDL to HSACO.
std::unique_ptr<std::vector<char>>		std::unique_ptr<std::vector<char>>
serializeISA(const std::string &isa) override;		serializeISA(const std::string &isa) override;

		// Overload to allow linking in device libs
		std::unique_ptr<llvm::Module>
		translateToLLVMIR(llvm::LLVMContext &llvmContext) override;

		/// Adds LLVM optimization passes
		LogicalResult optimizeLlvm(llvm::Module &llvmModule,
		llvm::TargetMachine &targetMachine) override;

std::unique_ptr<SmallVectorImpl<char>> assembleIsa(const std::string &isa);		std::unique_ptr<SmallVectorImpl<char>> assembleIsa(const std::string &isa);
std::unique_ptr<std::vector<char>>		std::unique_ptr<std::vector<char>>
createHsaco(const SmallVectorImpl<char> &isaBinary);		createHsaco(const SmallVectorImpl<char> &isaBinary);
};
} // namespace

static std::string getDefaultChip() {		std::string getRocmPath();
const char kDefaultChip[] = "gfx900";

// Locate rocm_agent_enumerator.		int optLevel;
const char kRocmAgentEnumerator[] = "rocm_agent_enumerator";		};
llvm::ErrorOr<std::string> rocmAgentEnumerator = llvm::sys::findProgramByName(		} // end namespace
kRocmAgentEnumerator, {__ROCM_PATH__ "/bin"});
if (!rocmAgentEnumerator) {
llvm::WithColor::warning(llvm::errs())
<< kRocmAgentEnumerator << "couldn't be located under " << __ROCM_PATH__
<< "/bin\n";
return kDefaultChip;
}

// Prepare temp file to hold the outputs.		/// Get a user-specified path to ROCm
int tempFd = -1;		// Tries, in order, the --rocm-path option, the ROCM_PATH environment variable
SmallString<128> tempFilename;		// and a compile-time default
if (llvm::sys::fs::createTemporaryFile("rocm_agent", "txt", tempFd,		std::string SerializeToHsacoPass::getRocmPath() {
tempFilename)) {		if (rocmPath.getNumOccurrences() > 0) {
llvm::WithColor::warning(llvm::errs())		return rocmPath.getValue();
<< "temporary file for " << kRocmAgentEnumerator << " creation error\n";
return kDefaultChip;
}		}
llvm::FileRemover cleanup(tempFilename);		if (auto env = llvm::sys::Process::GetEnv("ROCM_PATH")) {
		return env.getValue();
// Invoke rocm_agent_enumerator.
std::string errorMessage;
SmallVector<StringRef, 2> args{"-t", "GPU"};
Optional<StringRef> redirects[3] = {{""}, tempFilename.str(), {""}};
int result =
llvm::sys::ExecuteAndWait(rocmAgentEnumerator.get(), args, llvm::None,
redirects, 0, 0, &errorMessage);
if (result) {
llvm::WithColor::warning(llvm::errs())
<< kRocmAgentEnumerator << " invocation error: " << errorMessage
<< "\n";
return kDefaultChip;
}

// Load and parse the result.
auto gfxIsaList = openInputFile(tempFilename);
if (!gfxIsaList) {
llvm::WithColor::error(llvm::errs())
<< "read ROCm agent list temp file error\n";
return kDefaultChip;
}
for (llvm::line_iterator lines(*gfxIsaList); !lines.is_at_end(); ++lines) {
// Skip the line with content "gfx000".
if (*lines == "gfx000")
continue;
// Use the first ISA version found.
return lines->str();
}		}
		return __DEFAULT_ROCM_PATH__;
return kDefaultChip;
}		}

// Sets the 'option' to 'value' unless it already has a value.		// Sets the 'option' to 'value' unless it already has a value.
static void maybeSetOption(Pass::Option<std::string> &option,		static void maybeSetOption(Pass::Option<std::string> &option,
function_ref<std::string()> getValue) {		function_ref<std::string()> getValue) {
if (!option.hasValue())		if (!option.hasValue())
option = getValue();		option = getValue();
}		}

SerializeToHsacoPass::SerializeToHsacoPass() {		SerializeToHsacoPass::SerializeToHsacoPass(StringRef triple, StringRef arch,
maybeSetOption(this->triple, [] { return "amdgcn-amd-amdhsa"; });		StringRef features, int optLevel)
maybeSetOption(this->chip, [] {		: optLevel(optLevel) {
static auto chip = getDefaultChip();		maybeSetOption(this->triple, [&triple] { return triple.str(); });
return chip;		maybeSetOption(this->chip, [&arch] { return arch.str(); });
});		maybeSetOption(this->features, [&features] { return features.str(); });
}		}

void SerializeToHsacoPass::getDependentDialects(		void SerializeToHsacoPass::getDependentDialects(
DialectRegistry &registry) const {		DialectRegistry &registry) const {
registerROCDLDialectTranslation(registry);		registerROCDLDialectTranslation(registry);
gpu::SerializeToBlobPass::getDependentDialects(registry);		gpu::SerializeToBlobPass::getDependentDialects(registry);
}		}

		static Optional<SmallVector<std::unique_ptr<llvm::Module>, 3>>
		loadLibraries(SmallVectorImpl<char> &path,
		SmallVectorImpl<StringRef> &libraries,
		llvm::LLVMContext &context) {
		SmallVector<std::unique_ptr<llvm::Module>, 3> ret;
		auto dirLength = path.size();

		if (!llvm::sys::fs::is_directory(path)) {
		llvm::dbgs() << "Bitcode path: " << path
		<< " does not exist or is not a directory\n";
		return llvm::None;
		}

		for (const auto &file : libraries) {
		llvm::SMDiagnostic error;
		llvm::sys::path::append(path, file);
		llvm::StringRef pathRef(path.data(), path.size());
		std::unique_ptr<llvm::Module> library =
		llvm::getLazyIRFileModule(pathRef, error, context);
		path.set_size(dirLength);
		if (!library) {
		llvm::dbgs() << "Failed to load library " << file << " from " << path;
		error.print("[MLIR backend]", llvm::dbgs());
		return llvm::None;
		}
		// Some ROCM builds don't strip this like they should
		if (auto *openclVersion = library->getNamedMetadata("opencl.ocl.version")) {
		library->eraseNamedMetadata(openclVersion);
		}
		// Stop spamming us with clang version numbers
		if (auto *ident = library->getNamedMetadata("llvm.ident")) {
		library->eraseNamedMetadata(ident);
		}
		ret.push_back(std::move(library));
		}

		return ret;
		}

		std::unique_ptr<llvm::Module>
		SerializeToHsacoPass::translateToLLVMIR(llvm::LLVMContext &llvmContext) {
		// MLIR -> LLVM translation
		std::unique_ptr<llvm::Module> ret =
		gpu::SerializeToBlobPass::translateToLLVMIR(llvmContext);

		if (!ret) {
		llvm::dbgs() << "Module creation failed";
		return ret;
		}
		// Walk the LLVM module in order to determine if we need to link in device
		// libs
		bool needOpenCl = false;
		bool needOckl = false;
		bool needOcml = false;
		for (auto &f : ret->functions()) {
		if (f.hasExternalLinkage() && f.hasName() && !f.hasExactDefinition()) {
		StringRef funcName = f.getName();
		if ("printf" == funcName) {
		needOpenCl = true;
		}
		if (funcName.startswith("__ockl_")) {
		needOckl = true;
		}
		if (funcName.startswith("__ocml_")) {
		needOcml = true;
		}
		}
		}

		if (needOpenCl) {
		needOcml = needOckl = true;
		}

		// No libraries needed (the typical case)
		if (!(needOpenCl || needOcml || needOckl)) {
		return ret;
		}

		auto addControlConstant = [&module = *ret](StringRef name, uint32_t value,
		uint32_t bitwidth) {
		using llvm::GlobalVariable;
		if (module.getNamedGlobal(name)) {
		return;
		}
		llvm::IntegerType *type =
		llvm::IntegerType::getIntNTy(module.getContext(), bitwidth);
		auto initializer = llvm::ConstantInt::get(type, value, /*isSigned=*/false);
		auto *constant = new GlobalVariable(
		module, type,
		/*isConstant=*/true, GlobalVariable::LinkageTypes::LinkOnceODRLinkage,
		initializer, name,
		/*before=*/nullptr,
		/*threadLocalMode=*/GlobalVariable::ThreadLocalMode::NotThreadLocal,
		/*addressSpace=*/4);
		constant->setUnnamedAddr(GlobalVariable::UnnamedAddr::Local);
		constant->setVisibility(
		GlobalVariable::VisibilityTypes::ProtectedVisibility);
		constant->setAlignment(llvm::MaybeAlign(bitwidth / 8));
		};

		// Set up control variables in the module instead of linking in tiny bitcode
		if (needOcml) {
		// TODO(kdrewnia): Enable math optimizations once we have support for
		// `-ffast-math`-like options
		addControlConstant("__oclc_finite_only_opt", 0, 8);
		addControlConstant("__oclc_daz_opt", 0, 8);
		addControlConstant("__oclc_correctly_rounded_sqrt32", 1, 8);
		addControlConstant("__oclc_unsafe_math_opt", 0, 8);
		}
		if (needOcml || needOckl) {
		addControlConstant("__oclc_wavefrontsize64", 1, 8);
		StringRef chipSet = this->chip.getValue();
		if (chipSet.startswith("gfx")) {
		chipSet = chipSet.substr(3);
		}
		uint32_t minor =
		llvm::APInt(32, chipSet.substr(chipSet.size() - 2), 16).getZExtValue();
		uint32_t major = llvm::APInt(32, chipSet.substr(0, chipSet.size() - 2), 10)
		.getZExtValue();
		uint32_t isaNumber = minor + 1000 * major;
		addControlConstant("__oclc_ISA_version", isaNumber, 32);
		}

		// Determine libraries we need to link
		llvm::SmallVector<StringRef, 4> libraries;
		if (needOpenCl) {
		libraries.push_back("opencl.bc");
		}
		if (needOcml) {
		libraries.push_back("ocml.bc");
		}
		if (needOckl) {
		libraries.push_back("ockl.bc");
		}

		Optional<SmallVector<std::unique_ptr<llvm::Module>, 3>> mbModules;
		auto theRocmPath = getRocmPath();
		llvm::SmallString<32> bitcodePath(theRocmPath);
		llvm::sys::path::append(bitcodePath, "amdgcn", "bitcode");
		mbModules = loadLibraries(bitcodePath, libraries, llvmContext);

		// Handle legacy override variable
		auto env = llvm::sys::Process::GetEnv("HIP_DEVICE_LIB_PATH");
		if (env && (rocmPath.getNumOccurrences() == 0)) {
		llvm::SmallString<32> overrideValue(env.getValue());
		auto mbAtOldPath = loadLibraries(overrideValue, libraries, llvmContext);
		if (mbAtOldPath) {
		mbModules = std::move(mbAtOldPath);
		}
		}

		if (!mbModules) {
		llvm::WithColor::warning(llvm::errs())
		<< "Warning: Could not load required device libraries\n";
		llvm::WithColor::note(llvm::errs())
		<< "Note: this will probably cause link-time or run-time failures\n";
		return ret; // We can still abort here
		}

		llvm::Linker linker(*ret);
		for (auto &libModule : mbModules.getValue()) {
		// Failure is true
		auto err = linker.linkInModule(
		std::move(libModule), llvm::Linker::Flags::LinkOnlyNeeded,
		[](llvm::Module &m, const StringSet<> &gvs) {
		llvm::internalizeModule(m, [&gvs](const llvm::GlobalValue &gv) {
		return !gv.hasName() || (gvs.count(gv.getName()) == 0);
		});
		});
		if (err) {
		llvm::errs() << "Error: Failure in library bitcode linking\n";
		// We have no guarantees about the state of `ret`, so bail
		return nullptr;
		}
		}
		return ret;
		}

		LogicalResult
		SerializeToHsacoPass::optimizeLlvm(llvm::Module &llvmModule,
		llvm::TargetMachine &targetMachine) {
		if (optLevel < 0 \|\| optLevel > 3) {
		llvm::errs() << "Invalid optimization level passed to SerializeToHsaco: "
		<< optLevel << "\n";
		return failure();
		}
		targetMachine.setOptLevel(static_cast<llvm::CodeGenOpt::Level>(optLevel));

		auto transformer =
		makeOptimizingTransformer(optLevel, /*sizeLevel=*/0, &targetMachine);
		auto error = transformer(&llvmModule);
		if (error) {
		llvm::handleAllErrors(std::move(error), [](const llvm::ErrorInfoBase &ei) {
		llvm::errs() << "Could not optimize LLVM IR: ";
		ei.log(llvm::errs());
		llvm::errs() << "\n";
		});
		return failure();
		}
		return success();
		}

std::unique_ptr<SmallVectorImpl<char>>		std::unique_ptr<SmallVectorImpl<char>>
SerializeToHsacoPass::assembleIsa(const std::string &isa) {		SerializeToHsacoPass::assembleIsa(const std::string &isa) {
auto loc = getOperation().getLoc();		auto loc = getOperation().getLoc();

SmallVector<char, 0> result;		SmallVector<char, 0> result;
llvm::raw_svector_ostream os(result);		llvm::raw_svector_ostream os(result);

llvm::Triple triple(llvm::Triple::normalize(this->triple));		llvm::Triple triple(llvm::Triple::normalize(this->triple));
(119 unchanged lines not shown)
PassRegistration<SerializeToHsacoPass> registerSerializeToHSACO(
[] {		[] {
// Initialize LLVM AMDGPU backend.		// Initialize LLVM AMDGPU backend.
LLVMInitializeAMDGPUAsmParser();		LLVMInitializeAMDGPUAsmParser();
LLVMInitializeAMDGPUAsmPrinter();		LLVMInitializeAMDGPUAsmPrinter();
LLVMInitializeAMDGPUTarget();		LLVMInitializeAMDGPUTarget();
LLVMInitializeAMDGPUTargetInfo();		LLVMInitializeAMDGPUTargetInfo();
LLVMInitializeAMDGPUTargetMC();		LLVMInitializeAMDGPUTargetMC();

return std::make_unique<SerializeToHsacoPass>();		return std::make_unique<SerializeToHsacoPass>("amdgcn-amd-amdhsa", "",
		"", 2);
});		});
}		}
#else // MLIR_GPU_TO_HSACO_PASS_ENABLE		#else // MLIR_GPU_TO_HSACO_PASS_ENABLE
void mlir::registerGpuSerializeToHsacoPass() {}		void mlir::registerGpuSerializeToHsacoPass() {}
#endif // MLIR_GPU_TO_HSACO_PASS_ENABLE		#endif // MLIR_GPU_TO_HSACO_PASS_ENABLE
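The device-library logic in `translateToLLVMIR` reduces to two small computations: choosing which bitcode files to link from the undefined external function names, and deriving the `__oclc_ISA_version` value from the chip name. The following standalone sketch covers both, assuming the caller has already collected the names of external functions that lack a definition (the patch gets these by walking `llvm::Module::functions()`); `selectDeviceLibraries` and `oclcIsaVersion` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// printf pulls in opencl.bc (and with it ocml and ockl); __ockl_* names need
// ockl.bc; __ocml_* names need ocml.bc. The output order follows the patch.
std::vector<std::string>
selectDeviceLibraries(const std::vector<std::string> &undefinedExternals) {
  bool needOpenCl = false, needOckl = false, needOcml = false;
  for (const std::string &name : undefinedExternals) {
    if (name == "printf")
      needOpenCl = true;
    if (name.rfind("__ockl_", 0) == 0) // starts with "__ockl_"
      needOckl = true;
    if (name.rfind("__ocml_", 0) == 0) // starts with "__ocml_"
      needOcml = true;
  }
  if (needOpenCl)
    needOcml = needOckl = true;

  std::vector<std::string> libraries;
  if (needOpenCl)
    libraries.push_back("opencl.bc");
  if (needOcml)
    libraries.push_back("ocml.bc");
  if (needOckl)
    libraries.push_back("ockl.bc");
  return libraries;
}

// Same parsing rule as the patch: drop a leading "gfx", read the last two
// characters as hexadecimal and the rest as decimal, then compute
// minor + 1000 * major. Assumes a well-formed name such as "gfx908".
uint32_t oclcIsaVersion(std::string chip) {
  if (chip.rfind("gfx", 0) == 0)
    chip = chip.substr(3);
  uint32_t minor = std::stoul(chip.substr(chip.size() - 2), nullptr, 16);
  uint32_t major = std::stoul(chip.substr(0, chip.size() - 2), nullptr, 10);
  return minor + 1000 * major;
}
```

The ISA-version helper reproduces the patch's parsing rule as written; it does not check that the result matches what the ROCm device libraries expect for every chip family.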

mlir/test/Integration/GPU/ROCM/gpu-to-hsaco.mlir

 // RUN: mlir-opt %s \
 // RUN: -gpu-kernel-outlining \
-// RUN: -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-rocdl,gpu-to-hsaco)' \
+// RUN: -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-rocdl,gpu-to-hsaco{chip=%chip})' \
 // RUN: -gpu-to-llvm \
 // RUN: | mlir-cpu-runner \
 // RUN: --shared-libs=%linalg_test_lib_dir/libmlir_rocm_runtime%shlibext \
 // RUN: --shared-libs=%linalg_test_lib_dir/libmlir_runner_utils%shlibext \
 // RUN: --entry-point-result=void \
 // RUN: | FileCheck %s

 func @other_func(%arg0 : f32, %arg1 : memref<?xf32>) {
...
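The RUN lines now pass `chip=%chip` to `gpu-to-hsaco`, so the pipeline string only names a real chip after lit replaces `%chip` with the value that `lit.local.cfg` registers. The sketch below is a simplified model of that step: `expand_substitutions` is a hypothetical helper, real lit treats each pattern as a regular expression rather than a literal string, and `gfx908` is only an example chip name.

```python
from typing import Sequence, Tuple


def expand_substitutions(run_line: str,
                         substitutions: Sequence[Tuple[str, str]]) -> str:
    """Apply (placeholder, value) pairs to a RUN line, in order.

    Hypothetical, simplified model of lit's substitution pass: each
    placeholder is replaced literally, whereas lit itself uses regexes.
    """
    for placeholder, value in substitutions:
        run_line = run_line.replace(placeholder, value)
    return run_line


pipeline = ("-pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-rocdl,"
            "gpu-to-hsaco{chip=%chip})'")
print(expand_substitutions(pipeline, [("%chip", "gfx908")]))
# prints -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-rocdl,gpu-to-hsaco{chip=gfx908})'
```

Order matters when one placeholder is a prefix of another: in these RUN lines `%s` is a prefix of `%shlibext`, so replacing `%s` first would corrupt the library paths, and lit has to apply its substitutions in a suitable order.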

mlir/test/Integration/GPU/ROCM/lit.local.cfg

+import subprocess

 if not config.enable_rocm_runner:
   config.unsupported = True

+# Need to specify the chip sub-option to the gpu-to-hsaco pass, and also check
+# that we have a gpu at all. Use rocm_agent_enumerator to find the chip and
+# use %chip to insert it, and if no chip is found, mark the test unsupported.
+# Even though the tests here use mlir-cpu-runner, they still call mgpu
+# functions.
+config.chip = 'gfx000'
+if config.rocm_path:
+  try:
+    p = subprocess.run([config.rocm_path + "/bin/rocm_agent_enumerator"],
+                       check=True, stdout=subprocess.PIPE)
+    agents = [x for x in p.stdout.split() if x != b'gfx000']
+    if agents:
+      config.chip = agents[0].decode('utf-8')
+    else:
+      config.unsupported = True
+  except subprocess.CalledProcessError:
+    config.unsupported = True
+config.substitutions.append(('%chip', config.chip))
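The chip detection in this `lit.local.cfg` can be split into a pure parsing step, which is easy to test without a GPU, and the subprocess call. This is a sketch, not the code in the diff: `pick_gpu_chip` and `detect_chip` are hypothetical names, it assumes `rocm_agent_enumerator` prints whitespace-separated agent names with the host CPU listed as `gfx000` (as the cfg's filter implies), and it also catches `FileNotFoundError`, which `subprocess.run` raises when the enumerator binary is missing and which the cfg above does not handle.

```python
import subprocess
from typing import Optional


def pick_gpu_chip(enumerator_stdout: bytes) -> Optional[str]:
    """Return the first non-CPU agent name, or None if only gfx000 is listed."""
    # Same filter as the cfg: drop the 'gfx000' entry used for the host CPU.
    agents = [x for x in enumerator_stdout.split() if x != b"gfx000"]
    return agents[0].decode("utf-8") if agents else None


def detect_chip(rocm_path: str) -> Optional[str]:
    """Run <rocm_path>/bin/rocm_agent_enumerator; None means 'no usable GPU'."""
    try:
        p = subprocess.run([rocm_path + "/bin/rocm_agent_enumerator"],
                           check=True, stdout=subprocess.PIPE)
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Enumerator failed or is not installed: treat as "no GPU found".
        return None
    return pick_gpu_chip(p.stdout)


print(pick_gpu_chip(b"gfx000\ngfx906\ngfx908\n"))  # prints gfx906
print(pick_gpu_chip(b"gfx000\n"))                  # prints None
```

Keeping the parsing separate means the selection rule (first non-`gfx000` agent wins) can be checked on canned output, while only `detect_chip` needs a real ROCm install.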

mlir/test/Integration/GPU/ROCM/two-modules.mlir

 // RUN: mlir-opt %s \
 // RUN: -gpu-kernel-outlining \
-// RUN: -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-rocdl,gpu-to-hsaco)' \
+// RUN: -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-rocdl,gpu-to-hsaco{chip=%chip})' \
 // RUN: -gpu-to-llvm \
 // RUN: | mlir-cpu-runner \
 // RUN: --shared-libs=%linalg_test_lib_dir/libmlir_rocm_runtime%shlibext \
 // RUN: --shared-libs=%linalg_test_lib_dir/libmlir_runner_utils%shlibext \
 // RUN: --entry-point-result=void \
 // RUN: | FileCheck %s

 // CHECK: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
...

mlir/test/Integration/GPU/ROCM/vecadd.mlir

 // RUN: mlir-opt %s \
 // RUN: -convert-scf-to-std \
 // RUN: -gpu-kernel-outlining \
-// RUN: -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-rocdl,gpu-to-hsaco)' \
+// RUN: -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-rocdl,gpu-to-hsaco{chip=%chip})' \
 // RUN: -gpu-to-llvm \
 // RUN: | mlir-cpu-runner \
 // RUN: --shared-libs=%linalg_test_lib_dir/libmlir_rocm_runtime%shlibext \
 // RUN: --shared-libs=%linalg_test_lib_dir/libmlir_runner_utils%shlibext \
 // RUN: --entry-point-result=void \
 // RUN: | FileCheck %s

 func @vecadd(%arg0 : memref<?xf32>, %arg1 : memref<?xf32>, %arg2 : memref<?xf32>) {
...

mlir/test/Integration/GPU/ROCM/vector-transferops.mlir

 // RUN: mlir-opt %s \
 // RUN: -convert-scf-to-std \
 // RUN: -gpu-kernel-outlining \
-// RUN: -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-rocdl,gpu-to-hsaco)' \
+// RUN: -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-rocdl,gpu-to-hsaco{chip=%chip})' \
 // RUN: -gpu-to-llvm \
 // RUN: | mlir-cpu-runner \
 // RUN: --shared-libs=%linalg_test_lib_dir/libmlir_rocm_runtime%shlibext \
 // RUN: --shared-libs=%linalg_test_lib_dir/libmlir_runner_utils%shlibext \
 // RUN: --entry-point-result=void \
 // RUN: | FileCheck %s

 func @vectransferx2(%arg0 : memref<?xf32>, %arg1 : memref<?xf32>) {
...

Revision Contents

Diff 384120

mlir/include/mlir/Dialect/GPU/Passes.h

mlir/lib/Dialect/GPU/CMakeLists.txt

mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp

mlir/lib/Dialect/GPU/Transforms/SerializeToHsaco.cpp

mlir/test/Integration/GPU/ROCM/gpu-to-hsaco.mlir

mlir/test/Integration/GPU/ROCM/lit.local.cfg

mlir/test/Integration/GPU/ROCM/two-modules.mlir

mlir/test/Integration/GPU/ROCM/vecadd.mlir

mlir/test/Integration/GPU/ROCM/vector-transferops.mlir
