This is an archive of the discontinued LLVM Phabricator instance.

[OptRemark,LDist] RFC: Add hotness attribute
ClosedPublic

Authored by anemet on Jun 27 2016, 3:25 PM.

Details

Summary

This is the first set of changes implementing the RFC from
http://thread.gmane.org/gmane.comp.compilers.llvm.devel/98334

This is a cross-sectional patch; rather than implementing the hotness
attribute for all optimization remarks and all passes in a patch set, it
implements it for the 'missed-optimization' remark for Loop
Distribution. My goal is to shake out the design issues before scaling
it up to other types and passes.

Hotness is computed as an integer as the multiplication of the block
frequency with the function entry count. It's only printed in opt
currently since clang prints the diagnostic fields directly. E.g.:

remark: /tmp/t.c:3:3: loop not distributed: use -Rpass-analysis=loop-distribute for more info (hotness: 300)

A new API added is similar to emitOptimizationRemarkMissed. The
difference is that it additionally takes a code region that the
diagnostic corresponds to. From this, hotness is computed using BFI.
The new API is exposed via an analysis pass so that it can be made
dependent on LazyBFI. (Thanks to Hal for the analysis pass idea.)

This feature can all be enabled by setDiagnosticHotnessRequested in the
LLVM context. If this is off, LazyBFI is not calculated (D22141) so
there should be no overhead.

A new command-line option is added to turn this on in opt.

My plan is to switch all user of emitOptimizationRemark* to use this
module instead.

Diff Detail

Repository
rL LLVM

Event Timeline

anemet updated this revision to Diff 62026.Jun 27 2016, 3:25 PM
anemet retitled this revision from to [OptRemark] RFC: Add hotness attribute.
anemet updated this object.
anemet added a reviewer: hfinkel.
anemet added a subscriber: llvm-commits.
rcox2 added a subscriber: rcox2.Jun 29 2016, 1:46 PM
anemet updated this revision to Diff 62312.Jun 29 2016, 4:40 PM

Add a missing function comment

anemet updated this revision to Diff 62381.Jun 30 2016, 10:42 AM

Add a more convenient, alternative API for emitOptimizationRemarkMissed that
takes a loop and derives debug location and the IR Value internally.

anemet updated this object.Jun 30 2016, 10:47 AM
anemet updated this revision to Diff 62384.Jun 30 2016, 11:06 AM

Fix rebase glitch.

anemet updated this revision to Diff 62508.Jul 1 2016, 10:32 AM

BFI::getBlockProfileCount already multiplies the BB count with the function's
count. Update hotness calculation and test.

anemet updated this revision to Diff 62579.Jul 1 2016, 5:10 PM

Use getAnalysisIfAvailable to access BFI. If the pass does not currently use
BFI the original code wouldn't work. This does not trigger with LV but
porting Loop Distribution tripped on this.

@hfinkel it would be great if you could look at this. I have bunch of patches lined up after this with most of them post-commit-reviewable as long as you sign off on the design sketched out in this patch. Thanks!

A couple of typos spotted:

include/llvm/IR/DiagnosticInfo.h
425 ↗(On Diff #62579)

s/informatin/information/
s/this zero/this is zero/

anemet marked an inline comment as done.Jul 5 2016, 11:13 AM

Thanks, Michael!

hfinkel edited edge metadata.Jul 5 2016, 2:33 PM

Thanks for working on this.

Is there a reason that these should be free functions instead of just making an Analysis pass with these as member functions? Making it an actual analysis will make it easy to require BFI. I think the "get if available" will end up being confusing (sometimes the filtering will work, sometimes not, depending on what gets invalidated when - especially with the new pass manager where this will be dynamic). Side note: We might want to make BFI lazy (or compute-on-first-query instead of actually computing things in runOnFunction).

include/llvm/IR/DiagnosticInfo.h
426 ↗(On Diff #62579)

I think that we should differentiate between 0 and "no information". Maybe make this an Optional<uint64_t>?

lib/Analysis/OptimizationDiagnosticInfo.cpp
27 ↗(On Diff #62579)

V might not always be a BB. This is something you were planning to generalize later?

anemet added a comment.Jul 5 2016, 3:22 PM

Is there a reason that these should be free functions instead of just making an Analysis pass with these as member functions? Making it an actual analysis will make it easy to require BFI. I think the "get if available" will end up being confusing (sometimes the filtering will work, sometimes not, depending on what gets invalidated when - especially with the new pass manager where this will be dynamic). Side note: We might want to make BFI lazy (or compute-on-first-query instead of actually computing things in runOnFunction).

Thanks for feedback!

The main reason these are free function so that it's easy to transition from the current API which uses free function (i.e. no major reason ;-). But yes, I wasn't completely happy with the BFI dependence either. It was working for LV that already depends on BFI but it didn't work for LoopDist for example. I added a new command flag and made the BFI dependence conditional on that.

How would an analysis pass help this. It feels orthogonal because we can't unconditionally require the new analysis either because it would populate BFI. I actually think that the new PM will help here because we could then hopefully check F.getEntryCount to see if PGO is available and only query BFI then, no?

Thinking a bit more, having this packaged as a new analysis pass may help because right now I have this in LoopDist for example:

+ if (PassRemarksWithHotness) <------ this is the command line option
+ AU.addRequired<BlockFrequencyInfoWrapperPass>();

but when the emitOptimizationRemark* function is called, I don't know if BFI is actually available (i.e. whether the pass was updated with the change above) so I need to use getAnalysisIfAvailable. Having the Analysis pass would take care of this because it's all under the control of the analysis pass.

Regarding the BFI population, I was thinking of using the command line flag (-pass-remarks-with-hotness). Hopefully we can somehow detect if PGO is available and then enable the flag automatically at some higher level. As you say, lazy BFI would be helpful but it would be nice not to directly depend on that feature for this project.

What do you think?

Adam

include/llvm/IR/DiagnosticInfo.h
426 ↗(On Diff #62579)

I was undecided about this but you moved the needle now :). Will add.

lib/Analysis/OptimizationDiagnosticInfo.cpp
27 ↗(On Diff #62579)

Yes, hence the cast so that we'd assert for anything else right. I hope to refine this part of the design as more things get hooked up.

Is there a reason that these should be free functions instead of just making an Analysis pass with these as member functions? Making it an actual analysis will make it easy to require BFI. I think the "get if available" will end up being confusing (sometimes the filtering will work, sometimes not, depending on what gets invalidated when - especially with the new pass manager where this will be dynamic). Side note: We might want to make BFI lazy (or compute-on-first-query instead of actually computing things in runOnFunction).

Thanks for feedback!

The main reason these are free function so that it's easy to transition from the current API which uses free function (i.e. no major reason ;-). But yes, I wasn't completely happy with the BFI dependence either. It was working for LV that already depends on BFI but it didn't work for LoopDist for example. I added a new command flag and made the BFI dependence conditional on that.

How would an analysis pass help this. It feels orthogonal because we can't unconditionally require the new analysis either because it would populate BFI. I actually think that the new PM will help here because we could then hopefully check F.getEntryCount to see if PGO is available and only query BFI then, no?

Thinking a bit more, having this packaged as a new analysis pass may help because right now I have this in LoopDist for example:

+ if (PassRemarksWithHotness) <------ this is the command line option
+ AU.addRequired<BlockFrequencyInfoWrapperPass>();

but when the emitOptimizationRemark* function is called, I don't know if BFI is actually available (i.e. whether the pass was updated with the change above) so I need to use getAnalysisIfAvailable. Having the Analysis pass would take care of this because it's all under the control of the analysis pass.

Exactly. Looks like we both agree on packaging this as an analysis pass...

Regarding the BFI population, I was thinking of using the command line flag (-pass-remarks-with-hotness). Hopefully we can somehow detect if PGO is available and then enable the flag automatically at some higher level. As you say, lazy BFI would be helpful but it would be nice not to directly depend on that feature for this project.

What do you think?

I think we should add some global state (perhaps something that is set when setDiagnosticHandler is called, or via some nearby API) that indicates whether diagnostic "hotness" is requested. We should package these wrappers as an analysis pass, and that pass should have a hard requirement on BFI.

Regarding making BFI lazy, I think we probably need to do this, at least in the trivial sense: Add a Boolean to BFI indicating whether or not 'calculate' has been called, and add 'if (!Calculated) calculate()' to BFI::getBlockFreq, BFI::getBlockProfileCount and BFI::setBlockFreq, remove the call to calculate in runOnFunction. There is a more-complicated sense of making BFI lazy -- only computing frequencies for parts of the function as required -- but that's a larger project.

Then if we make no BFI calls, then requiring it is free. I think then the infrastructure will just work for us.

Adam

anemet added a comment.Jul 5 2016, 4:40 PM

Exactly. Looks like we both agree on packaging this as an analysis pass...

Yes, sorry if it wasn't completely clear. I was more and more convinced as I was replying...

I think we should add some global state (perhaps something that is set when setDiagnosticHandler is called, or via some nearby API) that indicates whether diagnostic "hotness" is requested. We should package these wrappers as an analysis pass, and that pass should have a hard requirement on BFI.

Regarding making BFI lazy, I think we probably need to do this, at least in the trivial sense: Add a Boolean to BFI indicating whether or not 'calculate' has been called, and add 'if (!Calculated) calculate()' to BFI::getBlockFreq, BFI::getBlockProfileCount and BFI::setBlockFreq, remove the call to calculate in runOnFunction. There is a more-complicated sense of making BFI lazy -- only computing frequencies for parts of the function as required -- but that's a larger project.

Then if we make no BFI calls, then requiring it is free. I think then the infrastructure will just work for us.

Yes this should work.

I guess what you're saying is that we can't make the pass conditionally dependent on BFI because we have no access to the LLVM context in getAnalysisUsage. So while this may work for a command-line option, it does not work for a global in the LLVM context sense?

Exactly. Looks like we both agree on packaging this as an analysis pass...

Yes, sorry if it wasn't completely clear. I was more and more convinced as I was replying...

I think we should add some global state (perhaps something that is set when setDiagnosticHandler is called, or via some nearby API) that indicates whether diagnostic "hotness" is requested. We should package these wrappers as an analysis pass, and that pass should have a hard requirement on BFI.

Regarding making BFI lazy, I think we probably need to do this, at least in the trivial sense: Add a Boolean to BFI indicating whether or not 'calculate' has been called, and add 'if (!Calculated) calculate()' to BFI::getBlockFreq, BFI::getBlockProfileCount and BFI::setBlockFreq, remove the call to calculate in runOnFunction. There is a more-complicated sense of making BFI lazy -- only computing frequencies for parts of the function as required -- but that's a larger project.

Then if we make no BFI calls, then requiring it is free. I think then the infrastructure will just work for us.

Yes this should work.

I guess what you're saying is that we can't make the pass conditionally dependent on BFI because we have no access to the LLVM context in getAnalysisUsage. So while this may work for a command-line option, it does not work for a global in the LLVM context sense?

Yes, this is my thought.

anemet added a comment.Jul 5 2016, 4:49 PM

Thanks very much for the feedback, Hal! I'll add a new review for the BFI changes and then update this one as well.

Adam

anemet updated this revision to Diff 63245.Jul 8 2016, 9:18 AM
anemet edited edge metadata.

Address Hal's comments: repackage as an analysis pass, use lazy BFI

anemet retitled this revision from [OptRemark] RFC: Add hotness attribute to [OptRemark,LV] RFC: Add hotness attribute.
anemet updated this object.
anemet updated this revision to Diff 63741.Jul 12 2016, 3:39 PM

Updated to use the LazyBFI analysis from D22141.

Also switched to make it work with Loop Distribution rather than the
Vectorizer. This way I can also test the laziness if the global flag is off.
(LV itself depends on BFI.)

anemet retitled this revision from [OptRemark,LV] RFC: Add hotness attribute to [OptRemark,LDist] RFC: Add hotness attribute.Jul 12 2016, 3:41 PM
anemet updated this object.

Hi @hfinkel,

I've updated this yesterday to reflect the commit of the LazyBFI pass. So this is ready to be looked at whenever you get a chance.

Thanks,
Adam

hfinkel accepted this revision.Jul 14 2016, 7:28 PM
hfinkel edited edge metadata.

LGTM. Thanks!

include/llvm/IR/LLVMContext.h
178 ↗(On Diff #63741)

diagnostic -> diagnostics

181 ↗(On Diff #63741)

diagnostic -> diagnostics

This revision is now accepted and ready to land.Jul 14 2016, 7:28 PM
anemet marked 2 inline comments as done.Jul 15 2016, 9:54 AM

Thanks very much for the review, Hal!

This revision was automatically updated to reflect the committed changes.