This is an archive of the discontinued LLVM Phabricator instance.

[Inliner,OptDiag] Add hotness attribute to opt diagnostics
ClosedPublic

Authored by anemet on Jul 22 2016, 11:58 AM.

Details

Summary

Because the inliner is not a function pass, it requires the work-around of
generating the OptimizationRemarkEmitter, and in turn BFI, on demand.
This will go away once the new pass manager is ready.

BFI is only computed inside ORE if the user has requested hotness information for optimization diagnostics (-pass-remarks-with-hotness at the 'opt' level). Thus there is no additional overhead without the flag.
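The on-demand scheme described in the summary can be sketched in plain C++. This is an illustrative pattern only: `RemarkEmitter` and `BlockFreqInfo` below are hypothetical stand-ins, not the actual LLVM `OptimizationRemarkEmitter` and `BlockFrequencyInfo` classes.

```cpp
#include <memory>

// Hypothetical stand-in for the expensive BFI analysis.
struct BlockFreqInfo {
  int ComputeCount = 0;
};

// Hypothetical stand-in for the emitter: BFI is built lazily, and only
// when hotness output was requested, so the flag-off path stays free.
class RemarkEmitter {
  bool HotnessRequested;
  std::unique_ptr<BlockFreqInfo> BFI;

public:
  explicit RemarkEmitter(bool Hotness) : HotnessRequested(Hotness) {}

  // Returns the analysis, computing it on first use; returns nullptr
  // (doing no work at all) when hotness was not requested.
  BlockFreqInfo *getBFI() {
    if (!HotnessRequested)
      return nullptr;
    if (!BFI) {
      BFI = std::make_unique<BlockFreqInfo>();
      ++BFI->ComputeCount; // the expensive computation would happen here
    }
    return BFI.get();
  }
};
```

The key property is that a compilation without the flag never touches the analysis at all, which is why the patch claims zero overhead in the default configuration.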

Diff Detail

Repository
rL LLVM

Event Timeline

anemet updated this revision to Diff 65120.Jul 22 2016, 11:58 AM
anemet retitled this revision from to [Inliner,OptDiag] Add hotness attribute to opt diagnostics.
anemet updated this object.
anemet added reviewers: hfinkel, davidxl.
anemet added a subscriber: llvm-commits.
anemet updated this object.Jul 22 2016, 12:15 PM
anemet updated this revision to Diff 65185.Jul 22 2016, 4:06 PM

Rebase on top of r276488.

eraman edited edge metadata.Jul 28 2016, 2:59 PM

This computes DominatorTree, LoopInfo, BPI and BFI for *every callsite* that is considered for inlining. Even though this is (I assume) turned off by default, it's still very expensive. Even in the new PM, this can be avoided only by incrementally updating the BFI of the caller after inlining a callee into it.

This computes DominatorTree, LoopInfo, BPI and BFI for *every callsite* that is considered for inlining. Even though this is (I assume) turned off by default, it's still very expensive.

Certainly it is off by default, and it's meant as a performance-analysis tool, so the budget is different. That said, we could, I guess, cache ORE per caller and only invalidate it once we inline. Does that sound reasonable to you?

Even in the new PM, this can be avoided only by incrementally updating the BFI of the caller after inlining a callee into it.

Sure but I thought that was required for PGO-based inlining anyway. I was hoping to piggyback on that.

This computes DominatorTree, LoopInfo, BPI and BFI for *every callsite* that is considered for inlining. Even though this is (I assume) turned off by default, it's still very expensive.

Certainly it is off by default, and it's meant as a performance-analysis tool, so the budget is different.

Yes, I understand that this is a reporting tool, so compilation time is not a major constraint. I was wondering whether this makes it practical for use in real applications. Perhaps it is. Have you run this on anything large (SPEC, for example) and measured the overhead? I don't have any objections to this patch if you have measured the overhead and found it reasonable.

That said, we could, I guess, cache ORE per caller and only invalidate it once we inline. Does that sound reasonable to you?

That'll certainly reduce the overhead in cases where inlining is considered but does not happen. I am not sure it is worth the additional complexity, though. If you go this route, please isolate the changes within ORE and invalidate this in emitOptimizationRemark.
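A minimal sketch of that per-caller caching idea, in plain C++ with hypothetical types (`Emitter` and `EmitterCache` are illustrative names, not LLVM interfaces): keep one emitter per caller across many callsite queries, and drop it once inlining actually changes the caller.

```cpp
#include <map>
#include <memory>
#include <string>

// Illustrative emitter; in the real patch this would own the lazily
// computed analyses (BFI etc.) for one caller.
struct Emitter {
  std::string Caller;
  explicit Emitter(std::string C) : Caller(std::move(C)) {}
};

class EmitterCache {
  std::map<std::string, std::unique_ptr<Emitter>> Cache;

public:
  // Reuse the cached emitter while the caller's IR is unchanged.
  Emitter *get(const std::string &Caller) {
    auto &Slot = Cache[Caller];
    if (!Slot)
      Slot = std::make_unique<Emitter>(Caller);
    return Slot.get();
  }

  // Once we inline into Caller, its analyses are stale: drop the entry
  // so the next query recomputes from scratch.
  void invalidate(const std::string &Caller) { Cache.erase(Caller); }
};
```

This saves recomputation for callsites that are evaluated but not inlined, which is exactly the case the reviewer notes above.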

Even in the new PM, this can be avoided only by incrementally updating the BFI of the caller after inlining a callee into it.

Sure but I thought that was required for PGO-based inlining anyway. I was hoping to piggyback on that.

Yes, that's true. I didn't mean to say that ORE has to specifically do this.

test/Transforms/Inline/optimization-remarks-with-hotness.ll
1 ↗(On Diff #65185)

The test case could be vastly simplified. The contents of the functions do not affect the test case.

This computes DominatorTree, LoopInfo, BPI and BFI for *every callsite* that is considered for inlining. Even though this is (I assume) turned off by default, it's still very expensive.

Certainly it is off by default, and it's meant as a performance-analysis tool, so the budget is different.

Yes, I understand that this is a reporting tool, so compilation time is not a major constraint. I was wondering whether this makes it practical for use in real applications. Perhaps it is. Have you run this on anything large (SPEC, for example) and measured the overhead? I don't have any objections to this patch if you have measured the overhead and found it reasonable.

No, I haven't run it with anything yet. I can keep this patch local for now and start doing the evaluation and go from there.

That said, we could, I guess, cache ORE per caller and only invalidate it once we inline. Does that sound reasonable to you?

That'll certainly reduce the overhead in cases where inlining is considered but does not happen. I am not sure it is worth the additional complexity, though. If you go this route, please isolate the changes within ORE and invalidate this in emitOptimizationRemark.

Yes, certainly. It would be good if as little of this as possible would spill over into the inliner.

anemet added inline comments.Jul 29 2016, 11:08 AM
test/Transforms/Inline/optimization-remarks-with-hotness.ll
1 ↗(On Diff #65185)

It is just a copy of optimization-remarks.ll further annotated with !prof metadata. Anyhow, I can simplify it.

anemet added a comment.Aug 2 2016, 9:32 PM

Yes, I understand that this is a reporting tool, so compilation time is not a major constraint. I was wondering whether this makes it practical for use in real applications. Perhaps it is. Have you run this on anything large (SPEC, for example) and measured the overhead? I don't have any objections to this patch if you have measured the overhead and found it reasonable.

I've now run this on SPEC2000_int and SPEC2006_int/fp, and the overhead seems reasonable with a release (no-asserts) compiler. The worst ones were perlbmk (both 2000 and 2006) and omnetpp, at around 30% slower. The differences trail off quickly after those. Also, this was done without LTO; LTO is probably worse.

Let me know if you have further questions.

Adam

anemet updated this revision to Diff 67187.Aug 8 2016, 9:38 AM
anemet edited edge metadata.

Simplified the test per @eraman's comments.

chandlerc accepted this revision.Aug 8 2016, 11:25 PM
chandlerc added a reviewer: chandlerc.
chandlerc added a subscriber: chandlerc.

LGTM w.r.t. the inliner. Please make sure the other heavy users of remarks are happy as well, since you are changing the output mechanism here.

I agree with the sentiment in the patch description that this is a bit gross, but it at least seems sufficiently isolated.

include/llvm/Analysis/OptimizationDiagnosticInfo.h
37–38 ↗(On Diff #67187)

This needs a proper doxygen comment, especially regarding the cost of doing this. And doubly especially that it is free unless the context has hotness emission specifically enabled. I had to read the implementation to understand why this is OK. =[

I'll also note that this applies to this entire file which seems ... remarkably devoid of doxygen API comments. Please document this interface and infrastructure in subsequent commits. =/
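For what it's worth, the kind of comment being asked for might look like the sketch below. The function name, signature, and body are made up for illustration; this is not the actual OptimizationDiagnosticInfo.h interface.

```cpp
#include <cstdint>

/// Compute the hotness (profile count) of a block, identified here by a
/// made-up index parameter.
///
/// Cost note, of the kind requested in review: this is free unless
/// hotness emission was specifically enabled on the context, because
/// the underlying block-frequency analysis only runs lazily on first
/// use.
uint64_t computeHotness(unsigned BlockIndex) {
  // Placeholder body; the point of this sketch is the comment style.
  return BlockIndex;
}
```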

156 ↗(On Diff #67187)

No need for '\brief'.

This revision is now accepted and ready to land.Aug 8 2016, 11:25 PM
davidxl accepted this revision.Aug 9 2016, 9:02 AM
davidxl edited edge metadata.

lgtm -- the compile time issue is temporary (and not affecting the default path).

anemet marked 2 inline comments as done.Aug 9 2016, 5:20 PM
This revision was automatically updated to reflect the committed changes.
anemet added inline comments.Aug 9 2016, 5:57 PM
include/llvm/Analysis/OptimizationDiagnosticInfo.h
37–38 ↗(On Diff #67187)

Actually, I think that only the class comment was missing. The member functions were heavily commented.

Fixed in rL278186.

Thanks to all of you for the reviews!