This is an archive of the discontinued LLVM Phabricator instance.

Avoid inlining in exception handling context
AbandonedPublic

Authored by junbuml on Sep 18 2015, 12:44 PM.

Details

Summary

It might be reasonably to avoid inlining CallSites invoked in
exception handling context so that we can reduce code size blow-up in
EH regions as well as indirectly increase inline opportunites for unwinding
functions containing exception handling code.

In this change, the NoInline attribute is added in CallSites
invoked specifically by the throw statement.

Diff Detail

Event Timeline

junbuml updated this revision to Diff 35124.Sep 18 2015, 12:44 PM
junbuml retitled this revision from to Avoid inlining in exception handling context.
junbuml updated this object.
hfinkel added inline comments.
lib/Transforms/IPO/PruneEH.cpp
326 ↗(On Diff #35124)

There's no need to require the bitcast here. What if you actually want an i8* as the exception type? You should just call stripPointerCasts instead.

junbuml updated this revision to Diff 35294.Sep 21 2015, 12:47 PM
junbuml marked an inline comment as done.

Addressed comments from Hal and make this check only once per module.

hfinkel added inline comments.Sep 23 2015, 7:27 AM
lib/Transforms/IPO/PruneEH.cpp
189 ↗(On Diff #35294)

unwind -> unwinds

190 ↗(On Diff #35294)

mark the -> mark them with the

193 ↗(On Diff #35294)

You should MarkNoInlineInEHR = true; here instead of in AddNoinlineForCSInEH (no need to do the CGSCC traversal here to call functions that don't do anything).

Also, to maintain the general pass model, I think you need to override the pass's 'virtual bool doFinalization(Module &)' function and reset the MarkNoInlineInEHR flag to false.

junbuml updated this revision to Diff 35503.Sep 23 2015, 8:15 AM
junbuml updated this object.
junbuml marked 3 inline comments as done.

Addressed Philip's comments.
Thanks Philip for the review.

Addressed comments from Hal.
Thanks Hal for the review!

lib/Transforms/IPO/PruneEH.cpp
326 ↗(On Diff #35124)

Thanks Hal for the review.
I refactor this part into a separate function to trace bitcast from __cxa_allocate_exception().

majnemer added inline comments.
lib/Transforms/IPO/PruneEH.cpp
47 ↗(On Diff #35503)

Please do not have a space between the variable and the parenthesis.

313 ↗(On Diff #35503)

You can use auto *BCInst to avoid repetition.

313–315 ↗(On Diff #35503)

Please use consistent bracing.

junbuml updated this revision to Diff 35516.Sep 23 2015, 9:07 AM
junbuml added reviewers: majnemer, hfinkel.
junbuml marked 3 inline comments as done.

Addressed comments from David.
Thanks David for the review !

hfinkel added inline comments.Sep 23 2015, 9:16 AM
lib/Transforms/IPO/PruneEH.cpp
79 ↗(On Diff #35516)

As I understand it, you want this flag to apply per-module, not per run over the call graph. So you need to do this in 'bool doFinalization(Module &)', not in 'doFinalization(CallGraph &CG)' which is called at the end of the call graph's runOnModule function. If there's no functional difference, then you don't need the flag at all.

majnemer edited edge metadata.Sep 23 2015, 9:16 AM

Out of curiosity, why not handle this as an inline cost heuristic? I imagine we'd want to inline very trivial constructors.

lib/Transforms/IPO/PruneEH.cpp
308 ↗(On Diff #35516)

NoInlie -> NoInline

358 ↗(On Diff #35516)

Likewise, you can use auto * here.

363 ↗(On Diff #35516)

And here.

junbuml updated this revision to Diff 35534.Sep 23 2015, 11:35 AM
junbuml edited edge metadata.
junbuml marked 3 inline comments as done.

I agree that using the NoInline attribute will prevent even a trivial constructor from being inlined. Initially, I intended to use Attribute::Cold, but by default, Attribute::Cold doesn't seem to have any impact in inline heuristic because the current ColdThreshold is not tuned yet (r200898).

In my opinion, considering that CallSites in exception handling region is very cold is not unreasonable even without PGO, and avoiding inlining in exception handling region may not have significant impacts on performance unless a program execution logic is really depend on exception handling flows.

For now, I can add FIXME mentioning that Attribute::Cold could be used when the inline cold-threshold is tune. We could continue discussing any other options if using NoInline attribute is to strong to use here.

junbuml added inline comments.Sep 23 2015, 11:36 AM
lib/Transforms/IPO/PruneEH.cpp
79 ↗(On Diff #35516)

As you mention, doFinalization(CallGraph &CG) is called at the end of runOnModule() of CGPassManager. Since doFinalization(CallGraph &CG) is called after all SCCs in the module has been processed, I believe this is called once per module.

hfinkel edited edge metadata.Sep 24 2015, 2:59 AM

For now, I can add FIXME mentioning that Attribute::Cold could be used when the inline cold-threshold is tune.

Please do. Actually, please add both attributes for now and make the FIXME to just use 'cold'. Even if the attribute does not do the right thing in the inliner, it does have the correct affect on MachineBlockPlacement, etc.

include/llvm/IR/CallSite.h
274 ↗(On Diff #35534)

Why are you changing this?

lib/Transforms/IPO/PruneEH.cpp
79 ↗(On Diff #35534)

Indeed, I agree.

junbuml updated this revision to Diff 35657.Sep 24 2015, 11:18 AM
junbuml edited edge metadata.
junbuml marked 5 inline comments as done.

Addressed Hal's comments

  • Add FIXME mentioning that Attribute::NoInline should be removed when the inline cold-threshold is tuned.
  • Add both Attribute::NoInline and Attribute::Cold in CallSites for now.
junbuml added inline comments.Sep 24 2015, 11:26 AM
include/llvm/IR/CallSite.h
274 ↗(On Diff #35534)

Both CallInst and InvokeInst have only setIsNoInline(); there is no setIsNoInline(bool) in both CallInst and InvokeInst. I believe it was bug.

hfinkel added inline comments.Sep 25 2015, 2:49 PM
lib/Transforms/IPO/PruneEH.cpp
203 ↗(On Diff #35657)

Why do you 'break;' here? If you do that, you'll only visit the first function in each call graph.

I think you want to have:

if (!MarkColdInEHR && SCCMightUnwind) {
  MarkColdInEHR = true;
   for (CallGraphSCC::iterator I = SCC.begin(), E = SCC.end(); I != E; ++I) {
     Function *F = (*I)->getFunction();
     if (!F)
       continue;

     AddColdForCSInEH(F);
  }
}
junbuml added inline comments.Sep 25 2015, 9:21 PM
lib/Transforms/IPO/PruneEH.cpp
203 ↗(On Diff #35657)

As I comment above, AddColdForCSInEH() does not need to be performed multiple times for each function. Since it handle all CallSites for __cxa_throw() in a module, it should be called only once per module. That's also why I have MarkColdInEHR .

hfinkel added inline comments.Sep 25 2015, 9:33 PM
lib/Transforms/IPO/PruneEH.cpp
203 ↗(On Diff #35657)

OIC, you don't really want to iterate over the call graph here at all; you're just trying to get the first function, any function, so that you can call getParent() to get the module.

Don't do that, just call CG.getModule().

Yes, right.
Okay I will use CG.getModule(). Thanks Hal!

junbuml updated this revision to Diff 35793.Sep 25 2015, 9:59 PM
hfinkel added inline comments.Sep 25 2015, 10:14 PM
lib/Transforms/IPO/PruneEH.cpp
362 ↗(On Diff #35793)

Why are you looking for users of cxa_throw? Can we make the assumption that all user of cxa_allocate_exception are essentially cold, and just look for the users of return values of that function?

junbuml marked 3 inline comments as done.Sep 25 2015, 10:39 PM
junbuml added inline comments.
lib/Transforms/IPO/PruneEH.cpp
362 ↗(On Diff #35793)

I'm trying to find all actual calls for __cxa_throw() by looking at the users of FnThrowE where an exception is thrown.

Here, I specifically check the exception allocated to be thrown (__cxa_throw()) in throw statement, which should be cold unless the program logic highly rely on throw statement.

When a exception handling class is created in throw statement,
the constructor must take the exception used in __cxa_throw() as its first parameter.

Please correct me if I miss something.

hfinkel added inline comments.Sep 28 2015, 1:31 PM
lib/Transforms/IPO/PruneEH.cpp
362 ↗(On Diff #35793)

I understand what you're doing, but my question is this: Is the call to cxa_allocate_exception itself cold? If so, perhaps that's all we need to find. Maybe looking for the cxa_throw calls is an unnecessary complication.

junbuml added inline comments.Sep 28 2015, 1:54 PM
lib/Transforms/IPO/PruneEH.cpp
362 ↗(On Diff #35793)

I believe cxa_allocate_exception() itself must be cold in almost all normal c++ program. But, I just wanted to make sure the exception allocated is actually used for throwing. If checking cxa_throw() doesn't make it too complicated, I believe there is no harm with it. I can also leave FIXME about it. Please let me know your thought!

hfinkel added inline comments.Sep 28 2015, 2:01 PM
lib/Transforms/IPO/PruneEH.cpp
362 ↗(On Diff #35793)

I believe cxa_allocate_exception() itself must be cold in almost all normal c++ program. But, I just wanted to make sure the exception allocated is actually used for throwing. If checking cxa_throw() doesn't make it too complicated, I believe there is no harm with it. I can also leave FIXME about it. Please let me know your thought!

Okay.

Let me also ask something I should have asked in the very beginning: Why are you doing this here and not in Clang? It seems that, in Clang, it would be simple to have a piece of state that records whether we're in a throw statement or a catch block, and marks the call sites appropriately.

junbuml added inline comments.Sep 28 2015, 2:21 PM
lib/Transforms/IPO/PruneEH.cpp
362 ↗(On Diff #35793)

I just wanted to place it somewhere exception handling specific pass, instead of directly touching inliner. If you believe Clang (ItaniumCXXABI.cpp ?) is better place I can do the same thing in there.

hfinkel added inline comments.Sep 28 2015, 2:43 PM
lib/Transforms/IPO/PruneEH.cpp
362 ↗(On Diff #35793)

This might be really off base, but here's what I was thinking about last night...

We're trying to mark all callsites within throw statements (and catch blocks) as cold (noinline). This seems analogous to me to how we handle setting instruction fast-math flags, where we give the IRBuilder a default state, and change it based on scope information.

  1. Give IRBuilder a new default state argument (along with DefaultFPMathTag/FMF) to indicate that all calls are cold (noinline).
  2. In Clang, update CodeGenFunction::EmitCXXThrowExpr in lib/CodeGen/CGException.cpp to set/reset this default IRBuilder state around the rest of what it does.
  3. To make the catch regions also have cold callsites, edit CodeGenFunction::ExitCXXTryStmt, and set/reset the default IRBuilder state around the loop over Handlers.

This seems like it might be both more general and simpler.

Thanks Hal for the suggestion. I will take a look at this approach and get back to you soon.

junbuml updated this revision to Diff 36107.Sep 30 2015, 8:19 AM
junbuml edited subscribers, added: cfe-commits; removed: llvm-commits.

In this clang change, I added a state flag (bool IsColdRegion) in CodeGenFunction and set/reset the flag in EmitCXXThrowExpr(). In EmitCall(), the NoInline attribute would be added if IsColdRegion is true. As of now, this change only handles throw statements because I don't have any data / test for catch blocks yet.

You should start a new differential for this, so that you can get a clean summary e-mail with a description sent to cfe-commits. There's some overlap, but you'll also get potentially-different reviewers here.

junbuml abandoned this revision.Sep 30 2015, 12:47 PM

Move this clang change in http://reviews.llvm.org/D13304