This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][GPU] Add debug output to enable dumping GPU assembly
ClosedPublic

Authored by krzysz00 on Jan 17 2022, 3:26 PM.

Details

Summary
  • Set the DEBUG_TYPE of SerializeToBlob to serialize-to-blob
  • Add debug output to print the assembly or PTX for GPU modules before they are assembled and linked

Note that, as SerializeToBlob is a superclass of SerializeToCubin and
SerializeToHsaco, --debug-only=serialize-to-blom will dump the
intermediate compiler result for both of these passes.

In addition, if LLVM options such as --stop-after are used to control
the GPU kernel compilation process, the debug output will contain the
appropriate intermediate IR.

Diff Detail

Event Timeline

krzysz00 created this revision.Jan 17 2022, 3:26 PM
krzysz00 requested review of this revision.Jan 17 2022, 3:26 PM

(Looks like this also pulled in createSerializeToHsacoPass which wasn't in the upstream codebase, apparently)

mehdi_amini added inline comments.Jan 17 2022, 5:26 PM
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp
31

In general we rely on people to disable threading for debugging. A global mutex for debugging purpose seems overkill to me.

90

Can we just use the usual Debugging facilities?
LLVM_DEBUG({

  llvm::dbgs() << targetISA << "\n";
  llvm::dbgs().flush();
}

);

krzysz00 added inline comments.Jan 18 2022, 8:40 AM
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp
31

Fair enough, though this lock came from

  1. The fact that, when I was debugging what LLVM was doing, I couldn't for the life of me get threading off in the interests of usable --print-after-all output
  2. That, without a lock and with manually patching in an llvm::outs() << isa << "\n"; (though in a slightly different spot) I'd get the output of the assembled program interleaved with the assembly

This might just be a case of me being overly paranoid wrt locks - I can remove.

90

Can we just use the usual Debugging facilities?

Unless there's a way to special-case these even further so that even --debug doesn't turn on the output ... probably not without killing the usability of --debug output for everything else, since you'll have many screenfuls of raw assembly in the middle of the usually much tamer debug logs.

(On that note, the Tensorflow folks also use debug options for this type of print

Thanks for adding this. I have wanted this a couple of times, too, but never went as far as creating a diff for it.

mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp
90

You could set the LLVM_DEBUG_TYPE to something meaningful, e.g., 'serialize-to-blob' and then '--debug-only=serialize-to-blob' would do what you want. Using the general --debug is not helpful IMHO because it already creates too much output.

This will be quite useful - thanks!

mlir/include/mlir/Dialect/GPU/Passes.h
92

Slight rephrase to:

Dump final generated instructions ... for a kernel to the debug stream

?

111–112

Terminate all comments with a full stop.

bondhugula added inline comments.Jan 18 2022, 3:01 PM
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp
90

I too think -debug-only=serialize-to-blob should be fine. dump-asm isn't really a pass functionality option but a debugging one. It's useful either way though.

mehdi_amini added inline comments.Jan 18 2022, 6:00 PM
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp
90

Alternatively, you can make it an option that would attach the ASM as an attribute to the op instead of printing it.

bondhugula added inline comments.Jan 19 2022, 7:42 AM
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp
90

That would mess up the first line and also make it hard to navigate/find, etc. It's better in a way if the MLIR and the ASM are separate -- but for multiple cubins, it'll be good to have an easy way to identify which asm is for which.

krzysz00 updated this revision to Diff 401332.Jan 19 2022, 10:40 AM
  • Adress review feedback, will reword commit message later
mehdi_amini added inline comments.Jan 19 2022, 3:22 PM
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp
90

The tradeoff to me is more about whether you need this in production or not.

krzysz00 marked 7 inline comments as done.Jan 19 2022, 3:34 PM

I agree about the LLVM_DEBUG point, and have edited the code accordingly.

This definitely isn't something I'd see coming around in production, but it is pretty helpful when debugging weird compiler bugs.

I agree with the LLVM_DEBUG arguments and have modified the pass accordingly.

I definitely don't see this being something that gets used in production, but it is pretty helpful for debugging weird compiler failures and the like.

herhut accepted this revision.Jan 20 2022, 5:23 AM

Thanks.

This revision is now accepted and ready to land.Jan 20 2022, 5:23 AM

Please fix the commit message though. I just noticed this now.

krzysz00 updated this revision to Diff 401693.Jan 20 2022, 10:41 AM
krzysz00 retitled this revision from Add a dump-asm pass option to GPU serialization. to Add debug output to enable dumping GPU assembly.
krzysz00 edited the summary of this revision. (Show Details)

Change commit message

krzysz00 updated this revision to Diff 401695.Jan 20 2022, 10:44 AM
krzysz00 retitled this revision from Add debug output to enable dumping GPU assembly to [MLIR][GPU] Add debug output to enable dumping GPU assembly.

Add tags to commit message