Page MenuHomePhabricator

Avoid inlining in throw statement
AbandonedPublic

Authored by junbuml on Sep 30 2015, 12:36 PM.

Details

Summary

It might be reasonably to avoid inlining CallSites invoked in
exception handling context so that we can reduce code size blow-up in
EH regions as well as indirectly increase inline opportunites for unwinding
functions containing exception handling code.

In this change, the NoInline attribute is added in CallSites
invoked specifically by the throw statement.

Diff Detail

Event Timeline

junbuml updated this revision to Diff 36136.Sep 30 2015, 12:36 PM
junbuml retitled this revision from to Avoid inlining in throw statement.
junbuml updated this object.
junbuml added subscribers: gberry, cfe-commits.
mcrosier edited edge metadata.Oct 1 2015, 7:45 AM

FWIW, a llvm based solution was discussed in http://reviews.llvm.org/D12979, but the clang solution is obviously more robust and easier to implement.

lib/CodeGen/CodeGenFunction.h
287

ture -> true

junbuml updated this revision to Diff 36551.Oct 5 2015, 2:22 PM
junbuml edited edge metadata.

Just minor cleaning. Please let me know any comment.

Is there any comment on this change? Note that with change, I observed about 6% performance improvement in spec2006/xalancbmk. I will address any comment to get this in.

rsmith added a subscriber: rsmith.Oct 7 2015, 1:18 PM

This seems like something that's much better handled by the inliner. We should presumably treat all the code that leads up to the throw as being cold, not just the invocation of the throw-expression itself, and it seems like it should be straightforward for the inliner to check whether a call invariably leads to an unreachable or an unwind.

Also, blindly marking the calls as noinline doesn't seem like a good idea at all. This should just factor into the inline cost heuristics, and shouldn't stop us inlining in cases where doing so is obviously going to cause a code size reduction.

Thanks Richard for the comment.

Initially, I intended to implement this in inliner by checking if a callsite is in exception handling regions. However, I decided not to implement this in inliner because this kind of check should be performed for all callsites if we implement it in inliner.

Instead of directly adding complexity in inliner, I implemented this in PruneEH.cpp (D12979) because this is very specific to exception handling regions. In this patch, I tried to mark all callsites invoked from throw statements as cold (noinline) by looking up users of cxa_throw() and cxa_allocate_exception(). We had many discussions and finally ended up to implement the same thing in clang to be more general and simpler as Hal suggested in D12979.

As you point out, it should be done by influencing inline cost heurisic, so I believe Attribute::Cold is the right attribute to be added here. However, as I FIXMEed, the current ColdThreshold is not tuned yet (r200898). So for now, I add both cold and noinline.

Regarding code size, I believe not inlining contractor calls in throw statements could be potentially more helpful for code size in many cases. If inlining very small callsites in throw statements could be issue, then we may be able to check if callee is smaller than some threshold to avoid adding the attributes (cold and noinline).

reames resigned from this revision.Oct 8 2015, 10:29 AM
reames removed a reviewer: reames.

Just want to hear if there is any opinion about moving this implementation back to PrunceEH.cpp like D12979 and adding the minimum callee size check to avoid marking noinlines on trivial constrictor calls. Please let me know any objection or suggestion.

I just want to ping one more time to see if there is any objection about the basic idea of this change. If the basic idea itself is acceptable, then I want to find the best way to get idea in.

Please let me know any new suggestion or any opinion about moving this implementation back to backend (maybe in PrunceEH.cpp) with the minimum callee size check to avoid blindly marking noinlines on all callsites. I will be happy to hear any opinion.

hfinkel edited edge metadata.Oct 27 2015, 2:03 PM

I just want to ping one more time to see if there is any objection about the basic idea of this change. If the basic idea itself is acceptable, then I want to find the best way to get idea in.

Please let me know any new suggestion or any opinion about moving this implementation back to backend (maybe in PrunceEH.cpp) with the minimum callee size check to avoid blindly marking noinlines on all callsites. I will be happy to hear any opinion.

It seems like we want two things here:

  1. Mark the exception-handling intrinsics as cold
  2. Have the inliner not inline things in cold regions

What is preventing us from doing this now?

Did you mean to add the Cold in the exception handling region itself instead of callsite in throw statements ?

Did you mean to add the Cold in the exception handling region itself instead of callsite in throw statements ?

We already have BranchProbabilityInfo::calcColdCallHeuristics, and so adding it to the callsite should be sufficient. Existing infrastructure should take care of the rest.

I see what you mean. Now, I doubt if BranchProbabilityInfo::calcColdCallHeuristics() set the Cold before inliner. As far as I check, BranchProbabilityInfo is executed after inliner.

Another issue is that even if we can add the Cold in callsites in exception handling regions before inliner, the default ColdThreshold (225) in the inliner is still not tuned (r200898), so I cannot see the expected performance improvement only with the Cold.

If we don't have any plan to tune the ColdThreshold in near future, we may need to use other ways to conservatively inline in exception handling regions. So, for example, in this patch I simply added both Cold and NoInline.

The basic idea of this change is to avoid inlining callsites invoked in exception handling regions (EHR) so that we can reduce code size blow-up in very cold regions and indirectly increase inline opportunities for functions containing exception handling code.

I think the best way to implement this is influencing to the inline heuristic by smoothly decreasing the threshold for the callsites in EHR. From this perspective, Attribute::Cold seems to be the right attribute to be added. However, the current ColdThreshold is still not tuned yet (r200898) and too high to conservatively perform inlining in EHR.

As a work-around, in my current implementation, both Cold and NoInline are added to the callsites. I understand that the NoInline is somewhat strong attribute to be added, but I don't think there is any negative impact on performance unless the execution logic really depends on exception handling flows, which is rare. The only downsides I can think of is the case where the callee is very small so that inlining it is profitable for size, but the impact must be minor.

If using NoInline is too strong to use, another work-around could be introducing a new attribute something like "ColdInEHR", and then we decrease the inline threshold for the callsites marked with this attribute. OptSizeThreshold(75) could be considered to be a candidate for the default threshold.

Unless there is a strong objection about the basic idea, I want to move forward and close this issue as soon as possible. Please let me know any opinion.

@chandlerc: Adding Chandler in case he has an opinion on how to move forward or how we could go about tuning the cold threshold.

rsmith added a comment.Nov 3 2015, 1:34 PM

I am not convinced that it's reasonable to put inlining heuristics into clang's IR generation. This will cause maintenance problems for the inliner in the future (anyone tuning the inlining heuristics and thresholds will need to be aware of this, and clang will behave differently from other frontends in this regard).

I don't really see a need for clang's IR to change in order to support the inliner in this regard -- the inliner can already determine that a call to a noreturn function can only exit via unwinding, and can already determine which code is only reachable via landingpads. That seems like enough information for all the cases you're addressing here. (If I remember correctly, we already have a heuristic that treats calls followed by unreachable as being cold; perhaps we could extend that to also look for invokes that branch to unreachable, if it doesn't already handle that case.) If we handle this at that level, we'll also handle cases such as a user-defined function whose purpose is to throw an exception, and we'll treat the code leading to the throw the same regardless of whether it's within the operand of the throw-expression.

noinline seems like too strong of a constraint for this case. There are calls that we really want to inline, even if they occur in cold regions (because doing so will reduce code size), and calls that we must inline for correctness (calls to always_inline functions). If the cold attribute isn't working properly, that's an inliner bug and should be fixed in the inliner, not worked around here.

If you want to change the inlining heuristics to make a different choice here, it seems to me that you should make that change in the inliner.

Thanks Richard for your comment !

If the frond-end is not a good to place for this, I think there are two places we can consider : inliner or prune-eh.

  1. In inliner, we can directly check if a CallSite branches an exception region, and then make getInlineThreshold() return a lower threshold.
  2. If we want to avoid adding the additional check in inliner, we can move back to PruneEH.cpp(D12979). If NoInline is too strong to use, then I want to suggest to introduce a new attribute and allow ininliner to check the new attribute and decide lower inline threshold for callsites in exception handling regions.

Please let me know any opinion or any better suggestion.

Inliner currently does not include analysis passes such as BPI and BFI. With the new pass manager, we should be able to hook up inliner with BFI (and we can handle throw in BFI). Before that, maybe we can add this as part of inlining analysis?

If we want to add a check for CallSites in EHRs in inliner, we may be able to borrow things done in BranchProbabilityInfo::calcColdCallHeuristics, but for exception handing intrinsics, not for cold, and make getInlineThreshold() return a lower threshold so that we can be conservative for calls in EHR.

Considering that inliner will be hooked with BPI / BFI with the new pass manager, as Hal mentioned above we may need to mark the exception handling intrinsics as cold so that we can allow BranchProbabilityInfo::calcColdCallHeuristics to set weight properly for blocks that branch to exception regions. I believe we can do this at prune-eh if front-end don’t do that, but it could be done later.

Just ping to see if there is any objection about adding the extra check for CallSites in EHRs in inliner. I will be happy to hear any opinion, suggestion, or objection.

junbuml abandoned this revision.Jan 12 2016, 8:03 AM

Abandon this based the last comment in D15289.