
Details

Reviewers

bondhugula
herhut

Commits

rG40aef79db0b0: [MLIR][GPU] Add debug output to enable dumping GPU assembly

Summary

Set the DEBUG_TYPE of SerializeToBlob to serialize-to-blob
Add debug output to print the assembly or PTX for GPU modules before they are assembled and linked

Note that, as SerializeToBlob is a superclass of SerializeToCubin and
SerializeToHsaco, --debug-only=serialize-to-blob will dump the
intermediate compiler result for both of these passes.

In addition, if LLVM options such as --stop-after are used to control
the GPU kernel compilation process, the debug output will contain the
appropriate intermediate IR.
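
To illustrate the gating described in the summary, here is a minimal, self-contained sketch of how a `DEBUG_TYPE`-keyed debug print behaves. It is a simplified model and not LLVM's implementation: `SKETCH_DEBUG`, `enabledDebugTypes`, `debugStream`, `isDebugTypeEnabled`, and `dumpIsa` are hypothetical names standing in for `LLVM_DEBUG`, the set of types selected via `--debug-only`, and `llvm::dbgs()`. In LLVM itself the macro comes from `llvm/Support/Debug.h` and is only active in builds with assertions enabled.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-in for the debug types selected with --debug-only=<type>.
static std::set<std::string> enabledDebugTypes;

// Hypothetical stand-in for llvm::dbgs(); a string stream so the output can be
// inspected after the fact.
static std::ostringstream debugStream;

static bool isDebugTypeEnabled(const std::string &type) {
  return enabledDebugTypes.count(type) != 0;
}

// Same debug type as the one set in SerializeToBlob.cpp.
#define DEBUG_TYPE "serialize-to-blob"

// Hypothetical analogue of LLVM_DEBUG: the body runs only when the current
// DEBUG_TYPE has been enabled.
#define SKETCH_DEBUG(...)                                                      \
  do {                                                                         \
    if (isDebugTypeEnabled(DEBUG_TYPE)) {                                      \
      __VA_ARGS__;                                                             \
    }                                                                          \
  } while (false)

// Models the step in the shared base class: because the print lives in the
// superclass, every subclass (cubin or hsaco) gets the same dump behaviour.
// Returns whatever was written to the debug stream by this call.
std::string dumpIsa(const std::string &moduleName, const std::string &isa) {
  debugStream.str("");
  SKETCH_DEBUG({
    debugStream << "ISA for module: " << moduleName << "\n";
    debugStream << isa << "\n";
  });
  return debugStream.str();
}

// Small usage example: nothing is printed until the debug type is enabled.
void exampleUsage() {
  enabledDebugTypes.clear();
  assert(dumpIsa("kernel_module", "s_endpgm").empty());
  enabledDebugTypes.insert("serialize-to-blob");
  assert(!dumpIsa("kernel_module", "s_endpgm").empty());
}
```

Keying the output on a dedicated debug type rather than the general `--debug` flag keeps the potentially very long assembly dump out of ordinary debug logs unless it is explicitly requested.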

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

krzysz00 created this revision. Jan 17 2022, 3:26 PM

Herald added a reviewer: bondhugula. Jan 17 2022, 3:26 PM

Herald added subscribers: sdasgup3, wenzhicui, wrengr and 21 others.

krzysz00 requested review of this revision. Jan 17 2022, 3:26 PM

Herald added a reviewer: herhut. Jan 17 2022, 3:26 PM

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

(Looks like this also pulled in createSerializeToHsacoPass which wasn't in the upstream codebase, apparently)

Harbormaster completed remote builds in B143877: Diff 400658. Jan 17 2022, 3:39 PM

mehdi_amini added inline comments. Jan 17 2022, 5:26 PM

mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp
32	In general we rely on people to disable threading for debugging. A global mutex for debugging purpose seems overkill to me.
87	Can we just use the usual Debugging facilities? LLVM_DEBUG({ llvm::dbgs() << targetISA << "\n"; llvm::dbgs().flush(); } );

krzysz00 added inline comments. Jan 18 2022, 8:40 AM

mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp
32	Fair enough, though this lock came from 1) the fact that, when I was debugging what LLVM was doing, I couldn't for the life of me get threading off in the interests of usable --print-after-all output, and 2) that, without a lock and with manually patching in an `llvm::outs() << isa << "\n";` (though in a slightly different spot), I'd get the output of the assembled program interleaved with the assembly. This might just be a case of me being overly paranoid wrt locks - I can remove.
87	> Can we just use the usual Debugging facilities?
	Unless there's a way to special-case these even further so that even `--debug` doesn't turn on the output ... probably not without killing the usability of `--debug` output for everything else, since you'll have many screenfuls of raw assembly in the middle of the usually much tamer debug logs. (On that note, the Tensorflow folks also use debug options for this type of print.)

Thanks for adding this. I have wanted this a couple of times, too, but never went as far as creating a diff for it.

mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp
87	You could set the `LLVM_DEBUG_TYPE` to something meaningful, e.g., 'serialize-to-blob' and then '--debug-only=serialize-to-blob' would do what you want. Using the general `--debug` is not helpful IMHO because it already creates too much output.

This will be quite useful - thanks!

mlir/include/mlir/Dialect/GPU/Passes.h
92	Slight rephrase to: Dump final generated instructions ... for a kernel to the debug stream ?
105–106	Terminate all comments with a full stop.

bondhugula added inline comments. Jan 18 2022, 3:01 PM

mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp
87	I too think `-debug-only=serialize-to-blob` should be fine. `dump-asm` isn't really a pass functionality option but a debugging one. It's useful either way though.

mehdi_amini added inline comments. Jan 18 2022, 6:00 PM

mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp
87	Alternatively, you can make it an option that would attach the ASM as an attribute to the op instead of printing it.

bondhugula added inline comments. Jan 19 2022, 7:42 AM

mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp
87	That would mess up the first line and also make it hard to navigate/find, etc. It's better in a way if the MLIR and the ASM are separate -- but for multiple cubins, it'll be good to have an easy way to identify which asm is for which.

Address review feedback, will reword commit message later

Harbormaster completed remote builds in B144366: Diff 401332. Jan 19 2022, 12:54 PM

mehdi_amini added inline comments. Jan 19 2022, 3:22 PM

mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp
87	The tradeoff to me is more about whether you need this in production or not.

I agree about the LLVM_DEBUG point, and have edited the code accordingly.

This definitely isn't something I'd see coming around in production, but it is pretty helpful when debugging weird compiler bugs.

I agree with the LLVM_DEBUG arguments and have modified the pass accordingly.

I definitely don't see this being something that gets used in production, but it is pretty helpful for debugging weird compiler failures and the like.

Thanks.

This revision is now accepted and ready to land. Jan 20 2022, 5:23 AM

Please fix the commit message though. I just noticed this now.

Change commit message

Add tags to commit message

Harbormaster completed remote builds in B144627: Diff 401695. Jan 20 2022, 12:04 PM

Closed by commit rG40aef79db0b0: [MLIR][GPU] Add debug output to enable dumping GPU assembly (authored by krzysz00). Jan 20 2022, 12:52 PM

This revision was automatically updated to reflect the committed changes.

krzysz00 added a commit: rG40aef79db0b0: [MLIR][GPU] Add debug output to enable dumping GPU assembly.

Diff 401753

mlir/include/mlir/Dialect/GPU/Passes.h

Show First 20 Lines • Show All 83 Lines • ▼ Show 20 Lines	protected:
Option<std::string> features{*this, "features",		Option<std::string> features{*this, "features",
::llvm::cl::desc("Target features")};		::llvm::cl::desc("Target features")};
Option<std::string> gpuBinaryAnnotation{		Option<std::string> gpuBinaryAnnotation{
*this, "gpu-binary-annotation",		*this, "gpu-binary-annotation",
llvm::cl::desc("Annotation attribute string for GPU binary"),		llvm::cl::desc("Annotation attribute string for GPU binary"),
llvm::cl::init(getDefaultGpuBinaryAnnotation())};		llvm::cl::init(getDefaultGpuBinaryAnnotation())};
};		};
} // namespace gpu		} // namespace gpu

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Registration		// Registration
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// Register pass to serialize GPU kernel functions to a CUBIN binary		/// Register pass to serialize GPU kernel functions to a CUBIN binary
/// annotation.		/// annotation.
void registerGpuSerializeToCubinPass();		void registerGpuSerializeToCubinPass();

/// Register pass to serialize GPU kernel functions to a HSAco binary		/// Register pass to serialize GPU kernel functions to a HSAco binary
/// annotation.		/// annotation.
void registerGpuSerializeToHsacoPass();		void registerGpuSerializeToHsacoPass();

		/// Create an instance of the GPU kernel function to HSAco binary serialization
		/// pass.
		std::unique_ptr<Pass> createGpuSerializeToHsacoPass(StringRef triple,
		StringRef arch,
		StringRef features,
		int optLevel);

/// Generate the code for registering passes.		/// Generate the code for registering passes.
#define GEN_PASS_REGISTRATION		#define GEN_PASS_REGISTRATION
#include "mlir/Dialect/GPU/Passes.h.inc"		#include "mlir/Dialect/GPU/Passes.h.inc"

} // namespace mlir		} // namespace mlir

#endif // MLIR_DIALECT_GPU_PASSES_H_		#endif // MLIR_DIALECT_GPU_PASSES_H_

mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp

Show All 15 Lines
#include "mlir/Pass/Pass.h"		#include "mlir/Pass/Pass.h"
#include "mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h"		#include "mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h"
#include "mlir/Target/LLVMIR/Export.h"		#include "mlir/Target/LLVMIR/Export.h"
#include "llvm/IR/LegacyPassManager.h"		#include "llvm/IR/LegacyPassManager.h"
#include "llvm/MC/TargetRegistry.h"		#include "llvm/MC/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"		#include "llvm/Support/TargetSelect.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"

		#include <string>

		#define DEBUG_TYPE "serialize-to-blob"

using namespace mlir;		using namespace mlir;

std::string gpu::getDefaultGpuBinaryAnnotation() { return "gpu.binary"; }		std::string gpu::getDefaultGpuBinaryAnnotation() { return "gpu.binary"; }

gpu::SerializeToBlobPass::SerializeToBlobPass(TypeID passID)		gpu::SerializeToBlobPass::SerializeToBlobPass(TypeID passID)
: OperationPass<gpu::GPUModuleOp>(passID) {}		: OperationPass<gpu::GPUModuleOp>(passID) {}

gpu::SerializeToBlobPass::SerializeToBlobPass(const SerializeToBlobPass &other)		gpu::SerializeToBlobPass::SerializeToBlobPass(const SerializeToBlobPass &other)
: OperationPass<gpu::GPUModuleOp>(other) {}		: OperationPass<gpu::GPUModuleOp>(other) {}

Optional<std::string>		Optional<std::string>
gpu::SerializeToBlobPass::translateToISA(llvm::Module &llvmModule,		gpu::SerializeToBlobPass::translateToISA(llvm::Module &llvmModule,
llvm::TargetMachine &targetMachine) {		llvm::TargetMachine &targetMachine) {
Show All 34 Lines	void gpu::SerializeToBlobPass::runOnOperation() {
Optional<std::string> maybeTargetISA =		Optional<std::string> maybeTargetISA =
translateToISA(llvmModule, targetMachine);		translateToISA(llvmModule, targetMachine);

if (!maybeTargetISA.hasValue())		if (!maybeTargetISA.hasValue())
return signalPassFailure();		return signalPassFailure();

std::string targetISA = std::move(maybeTargetISA.getValue());		std::string targetISA = std::move(maybeTargetISA.getValue());

		LLVM_DEBUG({
		llvm::dbgs() << "ISA for module: " << getOperation().getNameAttr() << "\n";
		llvm::dbgs() << targetISA << "\n";
		llvm::dbgs().flush();
		});

// Serialize the target ISA.		// Serialize the target ISA.
std::unique_ptr<std::vector<char>> blob = serializeISA(targetISA);		std::unique_ptr<std::vector<char>> blob = serializeISA(targetISA);
if (!blob)		if (!blob)
return signalPassFailure();		return signalPassFailure();

// Add the blob as module attribute.		// Add the blob as module attribute.
auto attr =		auto attr =
StringAttr::get(&getContext(), StringRef(blob->data(), blob->size()));		StringAttr::get(&getContext(), StringRef(blob->data(), blob->size()));
▲ Show 20 Lines • Show All 42 Lines • Show Last 20 Lines

mlir/lib/Dialect/GPU/Transforms/SerializeToHsaco.cpp

Show First 20 Lines • Show All 473 Lines • ▼ Show 20 Lines	PassRegistration<SerializeToHsacoPass> registerSerializeToHSACO(
LLVMInitializeAMDGPUTarget();		LLVMInitializeAMDGPUTarget();
LLVMInitializeAMDGPUTargetInfo();		LLVMInitializeAMDGPUTargetInfo();
LLVMInitializeAMDGPUTargetMC();		LLVMInitializeAMDGPUTargetMC();

return std::make_unique<SerializeToHsacoPass>("amdgcn-amd-amdhsa", "",		return std::make_unique<SerializeToHsacoPass>("amdgcn-amd-amdhsa", "",
"", 2);		"", 2);
});		});
}		}

		/// Create an instance of the GPU kernel function to HSAco binary serialization
		/// pass.
		std::unique_ptr<Pass> mlir::createGpuSerializeToHsacoPass(StringRef triple,
		StringRef arch,
		StringRef features,
		int optLevel) {
		return std::make_unique<SerializeToHsacoPass>(triple, arch, features,
		optLevel);
		}

#else // MLIR_GPU_TO_HSACO_PASS_ENABLE		#else // MLIR_GPU_TO_HSACO_PASS_ENABLE
void mlir::registerGpuSerializeToHsacoPass() {}		void mlir::registerGpuSerializeToHsacoPass() {}
#endif // MLIR_GPU_TO_HSACO_PASS_ENABLE		#endif // MLIR_GPU_TO_HSACO_PASS_ENABLE

This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][GPU] Add debug output to enable dumping GPU assembly
ClosedPublic
