This is an archive of the discontinued LLVM Phabricator instance.

[MLGO] Add BB Profile Dump Pass for Regalloc Case
ClosedPublic

Authored by aidengrossman on Feb 3 2023, 10:56 PM.

Details

Summary

This patch adds a basic block profile dump pass that runs right before
asmprint time and dumps basic block profile information so that cost
models can use this data during downstream analysis.

Diff Detail

Event Timeline

aidengrossman created this revision.Feb 3 2023, 10:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 3 2023, 10:56 PM
aidengrossman requested review of this revision.Feb 3 2023, 10:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 3 2023, 10:56 PM

Pinging Rahman (at the request of Mircea) since this is supposed to integrate with the basic block section labels infrastructure currently being used for Propeller.

Currently keeping a raw_fd_ostream in the MachineFunctionPass. They're not supposed to keep state, so I'm not sure this is entirely idiomatic, but I didn't see any other obvious way to do it (might be missing something of course). Other than that should be good to go.

Move profile dump option out of LLVM_HAVE_TFLITE gate to prevent compilation
failure if building without TFLite.

You'll need to update the failing pass structure tests, when you do, can you put a comment there (in the tests) that this pass must be right before asmprinter to make sure nothing mutates MBB freqs?

llvm/include/llvm/CodeGen/Passes.h
582 ↗(On Diff #494795)

It's not really regalloc-tied, maybe createFinalProfileDumpPass (so "FinalProfileDump" being the pass)?

llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp
56 ↗(On Diff #494795)

not regalloc. Maybe mbb-profile-dump?

Also, can the comment say that matching this info with BBs afterwards, using e.g. llvm-objdump --symbolize-operands, requires the compilation be performed with -fbasic-block-sections=labels?

122 ↗(On Diff #494795)

needs its own .cpp, it's general-purpose

1190 ↗(On Diff #494795)

@rahmanl this is where we wanna make sure we get it right: is this the right API that would ensure the output obtained here (so right before AsmPrint) matches the BB #s you get if you -fbasic-block-sections=lables and then llvm-objdump --symbolize-operands - I realize the latter outputs BB<nr>, but other than that (i.e. ignoring the "BB" prefix)

rahmanl added inline comments.Feb 6 2023, 11:44 AM
llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp
1190 ↗(On Diff #494795)

The BB symbols printed by llvm-objdump will match getNumber() here as long as this pass comes right before AsmPrinter and right after calls to RenumberBlocks() .-fbasic-block-sections will do this for you. But if you don't want to use the flag, you can renumber here.

Rename pass to MBBProfileDump and split into a separate file from the
regalloc eviction advisor. Modifying pipeline tests is still TODO.

aidengrossman marked an inline comment as done.

Update comment in TargetPassConfig.cpp

llvm/include/llvm/CodeGen/Passes.h
582 ↗(On Diff #494795)

I renamed the pass to MBBProfileDump and changed all the associated init/creation functions. Also moved all the code into MBBProfileDump.cpp.

rahmanl added inline comments.Feb 6 2023, 3:45 PM
llvm/lib/CodeGen/MBBProfileDump.cpp
74 ↗(On Diff #495288)

This won't be pretty in Thin-LTO mode which can spawn multiple threads to perform codegen across various modules. Is this intentional?

mtrofin added inline comments.Feb 6 2023, 3:48 PM
llvm/lib/CodeGen/MBBProfileDump.cpp
26 ↗(On Diff #495288)

anonymous namespace; then just define outside that, fully qualified, the createXYZPass function (i.e. llvm::createMBBProfileDumpPass) see llvm/lib/Target/X86/X86DiscriminateMemOps.cpp for example.

73 ↗(On Diff #495288)

nit: const auto &MBB

Add note in command line option help specifying that this option does not
support ThinLTO. Switch from putting everything in the LLVM namespace to
using an anonymous namespace.

aidengrossman marked an inline comment as done.Feb 6 2023, 5:28 PM
aidengrossman added inline comments.
llvm/lib/CodeGen/MBBProfileDump.cpp
74 ↗(On Diff #495288)

Yes. We currently aren't supporting ThinLTO as when working with the MLGO infrastructure (where this is intended to fit in), ThinLTO is done at a previous stage during corpus collection where we won't be specifying this flag. I've added a note into the CLI flag help to note that it's not currently supported with ThinLTO.

aidengrossman marked 2 inline comments as done.

Address comments, fix pipeline tests that changed after the new pass addition,
and switched from requiring a MachineBlockFrequencyInfo analysis to a
LazyMachineBlockFrequencyInfo Pass as it calls MBFI under the hood, is already
called at the end of the current pass pipeline on targets, and prevents extra
work from being done in the case no profile dump is desired (ie the flag is
not set).

I have not added comments to any of the pipeline passes since they seem to be
at least partially autogenerted and the dependency/timing info (that this
pass needs to be right before ASMPrinter time) is already available in the
target pass config. Will change if still desired however.

aidengrossman added inline comments.Feb 6 2023, 10:42 PM
llvm/lib/CodeGen/MBBProfileDump.cpp
74 ↗(On Diff #495288)

Clarification: We aren't supporting in-process (ie multithreaded single process thinLTO) in the same compiler invocation with this flag, but all thinLTO cases should work with corpus collection, and distributed thinLTO should work as well since everything is per compiler invocation.

mtrofin added inline comments.Feb 7 2023, 7:55 AM
llvm/lib/CodeGen/MBBProfileDump.cpp
25 ↗(On Diff #495383)

The sentence ends mid air, how about "Enabling this flag during in-process ThinLTO is not supported."

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
719 ↗(On Diff #495383)

Here and below for AMDGPU, it seems like the MBB Profile Dump should run in the next FunctionPass Manager.

Fix CLI flag help comment based on reviewer feedback.

Move infrastructure from separate MachineFunctionPass to within AsmPrinter.
This should reduce maintenance, make it less likely that the pass ever gets
moved around in the pipeline in a manner that causes it to cease to function,
and doesn't need modificatio of the pass pipeline tests.

Update commit message to better reflect recent changes.

Thanks for doing this, looks so much simpler - basically ready to go imho after addressing the error handling comment

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
363

do this in doInitialize() because then you have a LLVMContext which you can use to emitError instead of errs()

377

well, for symmetry, do this in doFinalize. Although IIUC the dtor of raw_fd_ostream would close, too, so you could just skip this.

1889

nit: maybe test on if MBBProfileDumpFileOutput instead.

Update based on reviewer feedback.

aidengrossman marked 3 inline comments as done.Feb 8 2023, 1:01 PM
mtrofin accepted this revision.Feb 8 2023, 1:54 PM
This revision is now accepted and ready to land.Feb 8 2023, 1:54 PM
This revision was landed with ongoing or failed builds.Feb 8 2023, 3:14 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Feb 11 2023, 3:50 AM

The enable of this test without tflite broke tests on Mac: http://45.33.8.238/macm1/54722/step_11.txt

Please take a look and revert for now if it takes a while to fix.

(Saying this here since the other change doesn't have a phab link in its commit message.)

The enable of this test without tflite broke tests on Mac: http://45.33.8.238/macm1/54722/step_11.txt

Please take a look and revert for now if it takes a while to fix.

(Saying this here since the other change doesn't have a phab link in its commit message.)

I'll send a patch in a moment. It's probably because the .ll is missing a triple. The test has no dependency on tflite, nor is it regalloc specific (albeit motivated by a regalloc and a mlgo training usecase).

The enable of this test without tflite broke tests on Mac: http://45.33.8.238/macm1/54722/step_11.txt

Please take a look and revert for now if it takes a while to fix.

(Saying this here since the other change doesn't have a phab link in its commit message.)

@thakis can you confirm D143815 - on my mac (x86), the test passed even before the proposed patch, and I'm not able to glean from your bot what (if anything) could be different between my setup and yours - I looked into http://45.33.8.238/macm1/54722/log.txt and http://45.33.8.238/macm1/54722/, looking for a gn or gn config.

My setup is as vanilla as cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Release