This is an archive of the discontinued LLVM Phabricator instance.

Extended support for native Windows C++ EH outlining
ClosedPublic

Authored by andrew.w.kaylor on Feb 25 2015, 12:36 PM.

Details

Summary

This patch adds landing block pre-analysis support, catch and cleanup handler pruning, shared and nested handler support and landing pad body replacement (with a preliminary form of the yet to be finalized llvm.eh.actions intrinsic).

I'd be happy to peel off parts of this for smaller change sets, but I think this is more or less ready for a proper review.

Diff Detail

Repository
rL LLVM

Event Timeline

andrew.w.kaylor retitled this revision from to Work-in-progress omnibus patch for native Windows C++ EH outlining.
andrew.w.kaylor updated this object.
andrew.w.kaylor edited the test plan for this revision. (Show Details)
andrew.w.kaylor added reviewers: rnk, majnemer, ABataev.
andrew.w.kaylor set the repository for this revision to rL LLVM.
andrew.w.kaylor added a subscriber: Unknown Object (MLST).
majnemer added inline comments.Feb 25 2015, 1:36 PM
lib/CodeGen/WinEHPrepare.cpp
92

I'd typedef SmallSet<BasicBlock *, 4> in case we'd like to change it later.

1452–1453

Please fold these lines together.

1470

Please reformat this.

1500–1503

Would this be simpler if you initialized your iterator to the instruction at BB->getFirstNonPHIOrDbg()?

1560–1561

Same line please.

majnemer added inline comments.Feb 25 2015, 1:36 PM
lib/CodeGen/WinEHPrepare.cpp
244–246

This can just be DeleteContainerPointers(Actions).

257

This can be a const method.

259–260

Can you use SmallVectorImpl instead of SmallVector<ActionHandler *, 4> here?

499–500

Please stick the else on the same line as the }

585–587

Please consistently use braces here.

588–589

This can be simplified to just auto *ParentInst = cast<Instruction>(ParentVal);

600–601

Same line please.

625

Is it ever OK for AllocaInsertPt to be null? If not, use cast instead of dyn_cast.

705–714

I'm guessing we are not going to want to always give them ExternalLinkage but I guess we can fix that later. Perhaps we should have a TODO?

820–822

Might be a little easier to read as:

if (!Store)
  continue;
1029–1030

This can simply be:

if (isa<LandingPadInst>(&Inst))
  continue;
1053–1054

Please fold these two lines together.

1071–1073

This can simply be:

if (match(&Inst, m_Intrinsic<Intrinsic::eh_endcatch>()))
  continue;
1075–1076

Please fold these two lines together.

1096–1097

Please fold these lines together.

1113–1114

Please fold these lines together.

1341–1342

Please fold these lines together.

1361–1362

Please fold these lines together.

1371–1372

You can use cast instead of dyn_cast here.

1383–1384

Please fold these lines together.

1402–1406

Please format this with clang-format.

1416

Please format this with clang-format.

lib/CodeGen/WinEHPrepare.cpp
705–714

That's a good point. I'm not sure I know all the implications of the various linkage options, but it sounds like PrivateLinkage is probably what we want here.

rnk edited edge metadata.Feb 25 2015, 3:03 PM

Some early feedback, I still need to look at mapLandingPadBlocks.

It looks like your analysis runs once per landing pad, so chained EH actions will still get visited twice. We might want to consider an up-front analysis on the whole function.

include/llvm/Transforms/Utils/Cloning.h
148–149

Can SkipInstruction do this also?

lib/CodeGen/WinEHPrepare.cpp
443–457

This can be just removeUnreachableBlocks(F). If we need more cleanup power, we can consider making simplifyFunctionCFG from lib/Transforms/Scalar/SimplifyCFGPass.cpp a utility.

1003–1009

Based on the comment, I think SimplifyCFG knows how to do this already. If you run SimplifyCFG() on each landing pad BB, you'll get the same behavior with none of this code.

You may need to run some other cleanup utility on the block first, though.

1317

Looks like you did that. :)

include/llvm/Transforms/Utils/Cloning.h
148–149

I actually originally coded it that way, but it seemed to me like I was overloading the meaning of SkipInstruction too much. I don't have strong feelings about it either way.

lib/CodeGen/WinEHPrepare.cpp
443–457

Sounds good. By the way, I forgot to add a comment mentioning this, but this is currently deleting too much because the indirect branch targets aren't filled in in the revised landing pads.

1003–1009

I saw the SimplifyResume function in SimplifyCFG. It wouldn't work as is here because it only handles resumes in the same block as the landing pad instruction. I considered moving some form of the logic here into SimplifyResume, but I'm not sure it really belongs there.

The case this is meant to handle looks like this:

catch14:                                          ; preds = %catch.dispatch11
  %exn15 = load i8** %exn.slot
  %18 = call i8* @llvm.eh.begincatch(i8* %exn15) #4
  %19 = bitcast i8* %18 to float*
  %20 = load float* %19, align 4
  store float %20, float* %f, align 4
  %21 = load float* %f, align 4
  invoke void @_Z12handle_floatf(float %21)
          to label %invoke.cont17 unwind label %lpad16

lpad16:                                           ; preds = %catch14
  %22 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
          cleanup
  %23 = extractvalue { i8*, i32 } %22, 0
  store i8* %23, i8** %exn.slot
  %24 = extractvalue { i8*, i32 } %22, 1
  store i32 %24, i32* %ehselector.slot
  call void @llvm.eh.endcatch() #4
  br label %eh.resume

eh.resume:                                        ; preds = %lpad16, %catch.dispatch11
  %exn20 = load i8** %exn.slot
  %sel21 = load i32* %ehselector.slot
  %lpad.val = insertvalue { i8*, i32 } undef, i8* %exn20, 0
  %lpad.val22 = insertvalue { i8*, i32 } %lpad.val, i32 %sel21, 1
  resume { i8*, i32 } %lpad.val22

This is the landing pad for an exception inside a catch handler, but in this case the landing pad doesn't do anything except continue the search. Since that happens automatically, we can just change the invoke in the catch handler to a call. But there are all kinds of things here that SimplifyResume doesn't know anything about.

There are two things here that make this not a candidate for simplification in SimplifyCFG. First, the fact that the endcatch call can be eliminated is very Windows specific. Second, the actual resume block is shared between two landing pads.

Arguably, we could teach SimplifyResume to identify the situations in which llvm.eh.endcatch can be dropped (currently always, but in the future maybe not). We'd also need to teach it how to consider multiple predecessors and decide which ones it can simplify.

majnemer added inline comments.Feb 25 2015, 3:39 PM
lib/CodeGen/WinEHPrepare.cpp
705–714

InternalLinkage is a bit nicer, it allows us to remove it if it becomes transitively unreferenced at link-time.

rnk added inline comments.Feb 25 2015, 3:52 PM
lib/CodeGen/WinEHPrepare.cpp
1003–1009

OK, this is specifically getting rid of unneeded llvm.eh.endcatch EH cleanups. Can we suppress this call to llvm.eh.endcatch in clang when using CxxFrameHandler3, or do we need it to stop outlining? If it doesn't help EH preparation, then it doesn't have much use.

lib/CodeGen/WinEHPrepare.cpp
1003–1009

Yeah, I guess you could say this is getting ride of the call to llvm.eh.endcatch since that's the only thing here that isn't scaffolding, but it also gets rid of the landing pad and turns an invoke into a call. The EH preparation doesn't need any of that.

I'd definitely like to have clang not omit code like this, but I think it still can turn out this way as a result of later optimizations. For instance, if there's a destructor to be called, clang would need to emit the invoke and cleanup, but if the object being destructed or even just the call to the destructor later gets eliminated then we're back in this situation, so I think WinEHPrepare needs to know how to do this.

rnk added a comment.Feb 25 2015, 4:18 PM

I'll take another look at this once the cleanup patch lands, since the diff will shrink a bit.

lib/CodeGen/WinEHPrepare.cpp
1003–1009

If some pass introduces a landingpad like this and simplifycfg doesn't zap it, that sounds like a phase-ordering bug in the optimizer. Also, once we take llvm.eh.endcatch out of the picture, we can just call SimplifyCFG to do this cleanup for us.

lib/CodeGen/WinEHPrepare.cpp
1003–1009

Unless there's something in SimplifyCFG other than SimplifyResume that would do this, we'd need to enhance SimplifyResume as I describe above. Currently it won't simplify if it finds anything at all other than debug info intrinsics between the landingpad and the resume.

Also, it just occurred to me that you were probably asking if llvm.eh.endcatch can be completely eliminated in all cases. I think I'd have to say "no" to that. I'm relying on either llvm.eh.endcatch or a resume instruction to mark the end of handler code. Without llvm.eh.endcatch we can just branch into normal code with no reliable way to tell we're done.

andrew.w.kaylor retitled this revision from Work-in-progress omnibus patch for native Windows C++ EH outlining to Extended support for native Windows C++ EH outlining.
andrew.w.kaylor updated this object.
andrew.w.kaylor edited the test plan for this revision. (Show Details)
andrew.w.kaylor edited edge metadata.

Rebased to incorporate the recently committed patches.
Refactored to eliminate duplicate code.
Updated the landingpad replacement to properly preserve exception argument variables and indirect branch targets.
Added a mapping scheme to eliminate duplicate parsing of blocks shared between landing pads.

rnk added a comment.Mar 5 2015, 2:29 PM

Some thoughts. I'll try to write up a more precise proposal for llvm.eh.actions.

lib/CodeGen/WinEHPrepare.cpp
68–71

You should do this and call .clear() at the end of runOnFunction(). My understanding is that FunctionPasses aren't recreated before and after each function.

125–130

Can you throw in a comment about how normally we only ever see at most one of these per landingpad?

402

Maybe do:

if (!HandlersOutlined)
  continue;
409–411

Would you say this was the "EH state" that we talked about, or something else?

439–447

What happens if these instructions still have uses? I think we'll assert on this code for example:

lpad:
  %ehvals = landingpad
  %x = load i32* %x.ptr
  ... ; normal EH dispatch
catch.int:
  store i32 %x, i32* %x.ptr

I think a cleaner way to do this would be:

  1. create a new basic block for the llvm.eh.actions call
  2. assert that there are no phi nodes on the old landingpad block (EH preparation should've demoted them)
  3. clone the LandingPadInst from the old one into the new one
  4. set the unwind edge of all incoming invokes to the new landingpad
  5. old landingpad is now trivially unreachable, allow removeUnreachableBlocks to do its thing

The advantage of this approach is that you never have to try to update the def-use graph.

566–607

Can you help me understand this case? Is it something like this?

void f() {
  HasCleanup o;
  int x = g(); // invoke value used in catch
  try {
    g();
  } catch (int) {
    use_x(x);
  }
}

I wonder if this is a bug in DemoteRegToStack that can be shown with opt -reg2mem.

649

Can this be simplified with PatternMatch::match()?

769–772

You might want StartBB->getFirstInsertionPt(). Also, consider this test case:

void use_it(int);
void g();
void f() {
  int g_calls = 0;
  try {
    g(); g_calls++;
    g(); g_calls++;
    g(); g_calls++;
  } catch (int) {
    use_it(g_calls);
  }
}

Using -O2, we get this interesting landingpad:

lpad:                                             ; preds = %invoke.cont1, %invoke.cont, %entry
  %g_calls.0 = phi i32 [ 2, %invoke.cont1 ], [ 1, %invoke.cont ], [ 0, %entry ]
  %1 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
          catch i8* bitcast (%rtti.TypeDescriptor2* @"\01??_R0H@8" to i8*)

We need to make sure that phi gets demoted. We can probably do something like assert(StartBB->begin() == StartBB()->getFirstNonPhi());

882

These two functions could be refactored to be parameterized over the vector of stores, either EHPtrStores of SelectorStores.

893–894

It's kind of amusing that the ehptr hasn't been used for anything yet... I wonder if there's any way to get MSVC to use the second argument to the catch handler.

919–926

This seems like an unexpected side effect for a function called "isEHPtrLoad".

1243

I'm OK doing this for now, this isn't true in general, so we should put a FIXME here. Consider something horrible like this:

void might_throw();
struct MyDtor {
  __forceinline ~MyDtor() {
    try { might_throw(); } catch (int) {}
  }
};
void f() {
  MyDtor o;
  might_throw();
}

After inlining, f will have a cleanup with an invoke in it.

1248

I still don't think we need this. I'd rather hold off on committing this and experiment with making the llvm.eh.endcatch() cleanup non-exceptional.

test/CodeGen/WinEH/cppeh-inalloca.ll
73 ↗(On Diff #21234)

Negative checks should generally try to be overly general, since small changes in the output can easily leave behind an extractvalue instruction that isn't named '%2', for example. In this case, I'd go for:

; CHECK-NOT: extractvalue { i8*, i32 }
; CHECK-NOT: br label

This basically says, there shouldn't be extractvalue or branches before we hit the llvm.eh.actions call.

test/CodeGen/WinEH/cppeh-nested-2.ll
50–51 ↗(On Diff #21234)

Should be gone as of this morning

I'll rebase to pull in your latest changes and clean this up as you've suggested.

So, do you think it's worth pushing this forward on this review, or would you rather I peel off a piece or two?

lib/CodeGen/WinEHPrepare.cpp
68–71

Yeah, I thought I had a reason for keeping the actions around, but after thinking about it some more, that isn't necessary.

125–130

Sure.

402

OK

409–411

Yes, this is EH state. I'll fix the name in the comment here.

439–447

Sounds good.

566–607

No, case you cite here doesn't do this.

The case this is meant to handle shows up in the cppeh-nonalloca-frame-values.ll test. Basically, any time a temporary value is computed in the normal code but only referenced in the catch code this happens. The source code for that test looks like this:

void test() {
  int NumExceptions = 0;
  int ExceptionVal[10];
  SomeData Data = { 0, 0 };

  for (int i = 0; i < 10; ++i) {
    try {
      may_throw();
      Data.a += i;
    }
    catch (int e) {
      ExceptionVal[NumExceptions] = e;
      ++NumExceptions;
      if (e == i)
        Data.b += e;
      else
        Data.a += e;
    }
    does_not_throw(NumExceptions);
  }
  dump(ExceptionVal, NumExceptions, Data);
}

The when this is compiled with optimization enabled, both i32 members of Data get initialized with a single i64 store. There's a GEP instruction in the entry block that gets the address of the 'b' memebr as %b, but %b is only used in the catch handling code.

What DemoteRegToStack is doing seems correct to me. If the value to be demoted has no uses then it just erases it and returns NULL. The trouble is caused by the fact that I'm doing this demotion after the catch handler code has been pruned. The only reference to the value was in the catch handler and the outlined handler is using a temporary alloca which is at this point waiting to be replaced by a framerecover call.

I think the right solution in this case is to sink the "computed" value into the catch handler. That could get a bit tricky in cases (probably very common) where values needed to by the value to be sunk aren't otherwise referenced in the handler, but I expect we can find a reasonable way to do it.

For instance in the case cited above, %b looks like this:

%b = getelementptr inbounds %struct.SomeData, %struct.SomeData* %tmpcast, i64 0, i32 1

Nothing in the catch handler references %tmpcast, so we'd need to pull it in. Right now we're creating a temporary alloca for %b in the value materializer and translating that to a framerecover call. I think if we recognize that %b is a non-alloca with no uses outside the catch handlers (I see this having the potential to go south here, but it would be easy enough for this case), then the materializer can create a temporary alloca for %tmpcast instead (if that hasn't been done already) and clone the %b instruction.

649

I'll look into that.

882

OK

893–894

Hard to say. It looks like it's just there to help out the debugger.

919–926

Yeah, I should probably rename it. I'm pretty sure I've seen this kind of thing happening often in code generated with optimizations disabled.

1243

I believe the MS runtime will terminate the process if an exception occurs in a cleanup handler. So I think we'd need to ignore the __forceinline attribute in that case.

1248

I'm fine with holding it back for now.

test/CodeGen/WinEH/cppeh-inalloca.ll
73 ↗(On Diff #21234)

Yeah, I was thinking about this yesterday as I was having to renumber temporary variables in some of the positive checks. These tests are probably going to be very brittle the way I've written them. At some point I want to go through and generalize them in a lot of ways. For now, I can certainly trim down the negative checks as you suggest.

test/CodeGen/WinEH/cppeh-nested-2.ll
50–51 ↗(On Diff #21234)

Woo hoo!

Incorporated feedback from previous review
Rebased to merge with latest trunk code
Abstracted value names in tests

rnk added a comment.Mar 10 2015, 6:16 PM

I think this is pretty close.

lib/CodeGen/WinEHPrepare.cpp
1620–1621

I think if we can resolve this, it'll greatly simplify the code and reduce the amount of repeated selector analysis.

1788

Does this do anything?

1805–1808

Don't we need to analyze the condition more to know which BB is the fallthrough side?

1822–1826

This seems like a lot of work, we end up mapping the same landingpads over and over in inner loops. Maybe we should have a map for each landing pad and just look them up inside the cloning directors?

1840–1842

I feel like this repeated stanza could be simplified to something like:

return createCleanupHandler(CleanupHandlerMap, BB);

void createCleanupHandler(CleanupHandlerMapTy &Map, BasicBlock *BB) {
  auto *Action = new CleanupHandler(BB);
  Map[BB] = Action;
  return Action;
}
lib/CodeGen/WinEHPrepare.cpp
1620–1621

Actually I think what I meant by this comment is already accomplished by the new handling of the catch and cleanup maps.

1788

No. I have no idea why that's there.

1805–1808

I don't think so. I believe earlier analysis of this block has determined that the block ends with a branch that is either unconditional or is predicated on the comparison of the result of an llvm.eh.typeid.for call and that the predicate is either ICMP_EQ or ICMP_NE, so just looking at the predicate tells us which successor corresponds to a selector match.

If the predicate is ICMP_EQ, then successor 0 is the catch block and successor 1 is the fall through, otherwise the reverse.

I can add some more comments to explain this. I can probably also clean this up a bit with PatternMatch.

1822–1826

You're right. There's no reason to create more than one LandingPadMapper for each landing pad.

1840–1842

Sure. We can do that.

Simplified cleanup handler mapping
Eliminated duplicate landing pad instruction mapping

rnk accepted this revision.Mar 11 2015, 3:40 PM
rnk edited edge metadata.

David said he'd take a look, but this looks good to me.

I want to patch it in and see if I can parameterize this a bit to work for SEH __finally block outlining. Exciting. :)

test/CodeGen/WinEH/cppeh-nonalloca-frame-values.ll
236 ↗(On Diff #21775)

FileCheck actually doesn't care about horizontal whitespace. A run of spaces and tabs will match any other run of spaces and tabs.

This revision is now accepted and ready to land.Mar 11 2015, 3:40 PM
majnemer accepted this revision.Mar 11 2015, 3:41 PM
majnemer edited edge metadata.

LGTM

This change set was implemented in r231981.