Before serializing, optimizations on the LLVM IR were only run on the path to
hsaco, and not on the path to cubin. Define an opt-level for the gpu-to-cubin
pass as well, and move the call that optimizes the LLVM IR to a common place.
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
mlir/lib/Dialect/GPU/CMakeLists.txt:127
I am a little skeptical about adding this. Please let me know if there is a better way to do this.
The update looks good but the layering needs to be fixed.
mlir/lib/Dialect/GPU/CMakeLists.txt:127
This can't be right. GPUTransforms can't depend on the ExecutionEngine. You'll need to move the OptUtils to its own library.
mlir/lib/Dialect/GPU/CMakeLists.txt:127
You can create an MLIRLLVMOptUtils with the needed functions shared between SerializeToBlob and ExecutionEngine.
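The suggested split might look roughly like this in CMake (only the MLIRLLVMOptUtils name comes from the comment; the source list and surrounding layout are assumptions):

```cmake
# Hypothetical new library holding the shared LLVM-IR optimization
# helpers (source layout is illustrative):
add_mlir_library(MLIRLLVMOptUtils
  OptUtils.cpp
)
# Both former users then link against it, so GPUTransforms no longer
# needs to pull in the whole ExecutionEngine:
#   MLIRExecutionEngine -> MLIRLLVMOptUtils
#   MLIRGPUTransforms   -> MLIRLLVMOptUtils
```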
mlir/lib/Dialect/GPU/Transforms/SerializeToHsaco.cpp:323
Since this was removed, does a header include need to be updated/trimmed?
mlir/lib/Dialect/GPU/Transforms/SerializeToHsaco.cpp:323
The declaration was in the same file. I have removed it.
mlir/include/mlir/Dialect/GPU/Transforms/Passes.h:136
... with optLevel (default level 2).
mlir/lib/Dialect/GPU/CMakeLists.txt:127
Missing dep on MLIRExecutionEngineUtils - please see below.
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp:16
This is again an issue, I believe. OptUtils.h/.cpp is part of MLIRExecutionEngineUtils. If you include this file, you'll need to depend on MLIRExecutionEngineUtils. Without it, and in this form, the build/link will simply fail with shared libs turned on.
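Until a library split happens, the immediate fix for the link failure would be to add the dependency directly (a sketch; the actual contents of mlir/lib/Dialect/GPU/CMakeLists.txt are elided here):

```cmake
add_mlir_dialect_library(MLIRGPUTransforms
  # ... existing sources ...

  LINK_LIBS PUBLIC
  # Needed because SerializeToBlob.cpp includes OptUtils.h from
  # MLIRExecutionEngineUtils; without it, shared-lib builds fail to link.
  MLIRExecutionEngineUtils
)
```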
mlir/include/mlir/Dialect/GPU/Transforms/Passes.h:115
I wouldn't have the default level be 2. This is surprising for both "classical compiler" users, who would expect it to be 0 (as in clang/gcc), and "ML performance-oriented compiler" users, who would expect it to be 3 or the maximum level. I see that this seems to have been the case for HSACO before; can we track down the commit that introduced that and ask the author for their rationale?
mlir/include/mlir/Dialect/GPU/Transforms/Passes.h:140
Ditto. Especially dangerous because CUBIN defaults to 2 and HSACO doesn't have a default.
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp:109
Ultra-nit: don't start diagnostic messages from a letter.
mlir/include/mlir/Dialect/GPU/Transforms/Passes.h:115
Sure. @krzysz00 Can you please check this comment on the rationale behind the opt level?
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp:109
You meant capital letters?
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp:114
This method is in the MLIRExecutionEngineUtils library. The dependency on LLVMTarget itself won't help.
mlir/include/mlir/Dialect/GPU/Transforms/Passes.h:115
As for HSACO: if we want to change the default opt level, it should be done in another revision. This revision is about adding the optimizer on the serialize-to-cubin path; the commit shouldn't surprise users by changing the existing default.
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp:109
Diagnostic messages shouldn't end with a newline; they get appropriately suffixed with more things. You are also missing a space after "level". The message should start in lowercase, I believe.
A note about the default opt level.
Before this patch:
- The HSACO path had a default opt level of 2.
- The CUBIN path had only a dummy (no-op) call to optimizeLlvm, so its effective opt level was 0.
Since this patch makes the default opt level common to both paths:
- if we keep it at 2, cubin users will see the change;
- if we make it 0, hsaco users will see the change.
The commit title and purpose already capture the fact that the cubin path will now be optimized, so keeping 2 is fine: the HSACO path's behavior is unchanged. We can change both to O3 in a subsequent commit if that's natural for the amd/hsaco path as well.