This patch adds a basic block profile dump pass that runs right before
asmprint time and dumps basic block profile information so that cost
models can use this data during downstream analysis.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Pinging Rahman (at the request of Mircea) since this is supposed to integrate with the basic block section labels infrastructure currently being used for Propeller.
Currently keeping a raw_fd_ostream in the MachineFunctionPass. They're not supposed to keep state, so I'm not sure this is entirely idiomatic, but I didn't see any other obvious way to do it (might be missing something of course). Other than that should be good to go.
Move profile dump option out of LLVM_HAVE_TFLITE gate to prevent compilation
failure if building without TFLite.
You'll need to update the failing pass structure tests, when you do, can you put a comment there (in the tests) that this pass must be right before asmprinter to make sure nothing mutates MBB freqs?
llvm/include/llvm/CodeGen/Passes.h | ||
---|---|---|
582 | It's not really regalloc-tied, maybe createFinalProfileDumpPass (so "FinalProfileDump" being the pass)? | |
llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp | ||
56 ↗ | (On Diff #494795) | not regalloc. Maybe mbb-profile-dump? Also, can the comment say that matching this info with BBs afterwards, using e.g. llvm-objdump --symbolize-operands, requires the compilation be performed with -fbasic-block-sections=labels? |
122 ↗ | (On Diff #494795) | needs its own .cpp, it's general-purpose |
1190 ↗ | (On Diff #494795) | @rahmanl this is where we wanna make sure we get it right: is this the right API that would ensure the output obtained here (so right before AsmPrint) matches the BB #s you get if you -fbasic-block-sections=lables and then llvm-objdump --symbolize-operands - I realize the latter outputs BB<nr>, but other than that (i.e. ignoring the "BB" prefix) |
llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp | ||
---|---|---|
1190 ↗ | (On Diff #494795) | The BB symbols printed by llvm-objdump will match getNumber() here as long as this pass comes right before AsmPrinter and right after calls to RenumberBlocks() .-fbasic-block-sections will do this for you. But if you don't want to use the flag, you can renumber here. |
Rename pass to MBBProfileDump and split into a separate file from the
regalloc eviction advisor. Modifying pipeline tests is still TODO.
Update comment in TargetPassConfig.cpp
llvm/include/llvm/CodeGen/Passes.h | ||
---|---|---|
582 | I renamed the pass to MBBProfileDump and changed all the associated init/creation functions. Also moved all the code into MBBProfileDump.cpp. |
llvm/lib/CodeGen/MBBProfileDump.cpp | ||
---|---|---|
75 | This won't be pretty in Thin-LTO mode which can spawn multiple threads to perform codegen across various modules. Is this intentional? |
Add note in command line option help specifying that this option does not
support ThinLTO. Switch from putting everything in the LLVM namespace to
using an anonymous namespace.
llvm/lib/CodeGen/MBBProfileDump.cpp | ||
---|---|---|
75 | Yes. We currently aren't supporting ThinLTO as when working with the MLGO infrastructure (where this is intended to fit in), ThinLTO is done at a previous stage during corpus collection where we won't be specifying this flag. I've added a note into the CLI flag help to note that it's not currently supported with ThinLTO. |
Address comments, fix pipeline tests that changed after the new pass addition,
and switched from requiring a MachineBlockFrequencyInfo analysis to a
LazyMachineBlockFrequencyInfo Pass as it calls MBFI under the hood, is already
called at the end of the current pass pipeline on targets, and prevents extra
work from being done in the case no profile dump is desired (ie the flag is
not set).
I have not added comments to any of the pipeline passes since they seem to be
at least partially autogenerted and the dependency/timing info (that this
pass needs to be right before ASMPrinter time) is already available in the
target pass config. Will change if still desired however.
llvm/lib/CodeGen/MBBProfileDump.cpp | ||
---|---|---|
75 | Clarification: We aren't supporting in-process (ie multithreaded single process thinLTO) in the same compiler invocation with this flag, but all thinLTO cases should work with corpus collection, and distributed thinLTO should work as well since everything is per compiler invocation. |
llvm/lib/CodeGen/MBBProfileDump.cpp | ||
---|---|---|
26 | The sentence ends mid air, how about "Enabling this flag during in-process ThinLTO is not supported." | |
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll | ||
719 ↗ | (On Diff #495383) | Here and below for AMDGPU, it seems like the MBB Profile Dump should run in the next FunctionPass Manager. |
Move infrastructure from separate MachineFunctionPass to within AsmPrinter.
This should reduce maintenance, make it less likely that the pass ever gets
moved around in the pipeline in a manner that causes it to cease to function,
and doesn't need modificatio of the pass pipeline tests.
Thanks for doing this, looks so much simpler - basically ready to go imho after addressing the error handling comment
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp | ||
---|---|---|
363 ↗ | (On Diff #495722) | do this in doInitialize() because then you have a LLVMContext which you can use to emitError instead of errs() |
377 ↗ | (On Diff #495722) | well, for symmetry, do this in doFinalize. Although IIUC the dtor of raw_fd_ostream would close, too, so you could just skip this. |
1889 ↗ | (On Diff #495722) | nit: maybe test on if MBBProfileDumpFileOutput instead. |
The enable of this test without tflite broke tests on Mac: http://45.33.8.238/macm1/54722/step_11.txt
Please take a look and revert for now if it takes a while to fix.
(Saying this here since the other change doesn't have a phab link in its commit message.)
I'll send a patch in a moment. It's probably because the .ll is missing a triple. The test has no dependency on tflite, nor is it regalloc specific (albeit motivated by a regalloc and a mlgo training usecase).
@thakis can you confirm D143815 - on my mac (x86), the test passed even before the proposed patch, and I'm not able to glean from your bot what (if anything) could be different between my setup and yours - I looked into http://45.33.8.238/macm1/54722/log.txt and http://45.33.8.238/macm1/54722/, looking for a gn or gn config.
My setup is as vanilla as cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Release
It's not really regalloc-tied, maybe createFinalProfileDumpPass (so "FinalProfileDump" being the pass)?