This is an archive of the discontinued LLVM Phabricator instance.

Extended support for native Windows C++ EH outlining
ClosedPublic

Authored by andrew.w.kaylor on Feb 25 2015, 12:36 PM.

Details

Summary

This patch adds landing block pre-analysis support, catch and cleanup handler pruning, shared and nested handler support and landing pad body replacement (with a preliminary form of the yet to be finalized llvm.eh.actions intrinsic).

I'd be happy to peel off parts of this for smaller change sets, but I think this is more or less ready for a proper review.

Diff Detail

Repository
rL LLVM

Event Timeline

andrew.w.kaylor retitled this revision from to Work-in-progress omnibus patch for native Windows C++ EH outlining.
andrew.w.kaylor updated this object.
andrew.w.kaylor edited the test plan for this revision. (Show Details)
andrew.w.kaylor added reviewers: rnk, majnemer, ABataev.
andrew.w.kaylor set the repository for this revision to rL LLVM.
andrew.w.kaylor added a subscriber: Unknown Object (MLST).
majnemer added inline comments.Feb 25 2015, 1:36 PM
lib/CodeGen/WinEHPrepare.cpp
92

I'd typedef SmallSet<BasicBlock *, 4> in case we'd like to change it later.

1118–1119

Please fold these lines together.

1136

Please reformat this.

1166–1169

Would this be simpler if you initialized your iterator to the instruction at BB->getFirstNonPHIOrDbg()?

1226–1227

Same line please.

majnemer added inline comments.Feb 25 2015, 1:36 PM
lib/CodeGen/WinEHPrepare.cpp
411–413

This can just be DeleteContainerPointers(Actions).

424

This can be a const method.

426–427

Can you use SmallVectorImpl instead of SmallVector<ActionHandler *, 4> here?

530–531

Please stick the else on the same line as the }

549–551

Please consistently use braces here.

552–553

This can be simplified to just auto *ParentInst = cast<Instruction>(ParentVal);

564–565

Same line please.

578

Is it ever OK for AllocaInsertPt to be null? If not, use cast instead of dyn_cast.

667–676

I'm guessing we are not going to want to always give them ExternalLinkage but I guess we can fix that later. Perhaps we should have a TODO?

762–764

Might be a little easier to read as:

if (!Store)
  continue;
924–925

This can simply be:

if (isa<LandingPadInst>(&Inst))
  continue;
948–949

Please fold these two lines together.

966–968

This can simply be:

if (match(&Inst, m_Intrinsic<Intrinsic::eh_endcatch>()))
  continue;
970–971

Please fold these two lines together.

991–992

Please fold these lines together.

1008–1009

Please fold these lines together.

1135–1136

Please fold these lines together.

1155–1156

Please fold these lines together.

1165–1166

You can use cast instead of dyn_cast here.

1177–1178

Please fold these lines together.

1196–1200

Please format this with clang-format.

1210

Please format this with clang-format.

lib/CodeGen/WinEHPrepare.cpp
667–676

That's a good point. I'm not sure I know all the implications of the various linkage options, but it sounds like PrivateLinkage is probably what we want here.

rnk edited edge metadata.Feb 25 2015, 3:03 PM

Some early feedback, I still need to look at mapLandingPadBlocks.

It looks like your analysis runs once per landing pad, so chained EH actions will still get visited twice. We might want to consider an up-front analysis on the whole function.

include/llvm/Transforms/Utils/Cloning.h
148–149

Can SkipInstruction do this also?

lib/CodeGen/WinEHPrepare.cpp
513–515

This can be just removeUnreachableBlocks(F). If we need more cleanup power, we can consider making simplifyFunctionCFG from lib/Transforms/Scalar/SimplifyCFGPass.cpp a utility.

898–904

Based on the comment, I think SimplifyCFG knows how to do this already. If you run SimplifyCFG() on each landing pad BB, you'll get the same behavior with none of this code.

You may need to run some other cleanup utility on the block first, though.

1111

Looks like you did that. :)

include/llvm/Transforms/Utils/Cloning.h
148–149

I actually originally coded it that way, but it seemed to me like I was overloading the meaning of SkipInstruction too much. I don't have strong feelings about it either way.

lib/CodeGen/WinEHPrepare.cpp
513–515

Sounds good. By the way, I forgot to add a comment mentioning this, but this is currently deleting too much because the indirect branch targets aren't filled in in the revised landing pads.

898–904

I saw the SimplifyResume function in SimplifyCFG. It wouldn't work as is here because it only handles resumes in the same block as the landing pad instruction. I considered moving some form of the logic here into SimplifyResume, but I'm not sure it really belongs there.

The case this is meant to handle looks like this:

catch14:                                          ; preds = %catch.dispatch11
  %exn15 = load i8** %exn.slot
  %18 = call i8* @llvm.eh.begincatch(i8* %exn15) #4
  %19 = bitcast i8* %18 to float*
  %20 = load float* %19, align 4
  store float %20, float* %f, align 4
  %21 = load float* %f, align 4
  invoke void @_Z12handle_floatf(float %21)
          to label %invoke.cont17 unwind label %lpad16

lpad16:                                           ; preds = %catch14
  %22 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
          cleanup
  %23 = extractvalue { i8*, i32 } %22, 0
  store i8* %23, i8** %exn.slot
  %24 = extractvalue { i8*, i32 } %22, 1
  store i32 %24, i32* %ehselector.slot
  call void @llvm.eh.endcatch() #4
  br label %eh.resume

eh.resume:                                        ; preds = %lpad16, %catch.dispatch11
  %exn20 = load i8** %exn.slot
  %sel21 = load i32* %ehselector.slot
  %lpad.val = insertvalue { i8*, i32 } undef, i8* %exn20, 0
  %lpad.val22 = insertvalue { i8*, i32 } %lpad.val, i32 %sel21, 1
  resume { i8*, i32 } %lpad.val22

This is the landing pad for an exception inside a catch handler, but in this case the landing pad doesn't do anything except continue the search. Since that happens automatically, we can just change the invoke in the catch handler to a call. But there are all kinds of things here that SimplifyResume doesn't know anything about.

There are two things here that make this not a candidate for simplification in SimplifyCFG. First, the fact that the endcatch call can be eliminated is very Windows specific. Second, the actual resume block is shared between two landing pads.

Arguably, we could teach SimplifyResume to identify the situations in which llvm.eh.endcatch can be dropped (currently always, but in the future maybe not). We'd also need to teach it how to consider multiple predecessors and decide which ones it can simplify.

majnemer added inline comments.Feb 25 2015, 3:39 PM
lib/CodeGen/WinEHPrepare.cpp
667–676

InternalLinkage is a bit nicer, it allows us to remove it if it becomes transitively unreferenced at link-time.

rnk added inline comments.Feb 25 2015, 3:52 PM
lib/CodeGen/WinEHPrepare.cpp
898–904

OK, this is specifically getting rid of unneeded llvm.eh.endcatch EH cleanups. Can we suppress this call to llvm.eh.endcatch in clang when using CxxFrameHandler3, or do we need it to stop outlining? If it doesn't help EH preparation, then it doesn't have much use.

lib/CodeGen/WinEHPrepare.cpp
898–904

Yeah, I guess you could say this is getting ride of the call to llvm.eh.endcatch since that's the only thing here that isn't scaffolding, but it also gets rid of the landing pad and turns an invoke into a call. The EH preparation doesn't need any of that.

I'd definitely like to have clang not omit code like this, but I think it still can turn out this way as a result of later optimizations. For instance, if there's a destructor to be called, clang would need to emit the invoke and cleanup, but if the object being destructed or even just the call to the destructor later gets eliminated then we're back in this situation, so I think WinEHPrepare needs to know how to do this.

rnk added a comment.Feb 25 2015, 4:18 PM

I'll take another look at this once the cleanup patch lands, since the diff will shrink a bit.

lib/CodeGen/WinEHPrepare.cpp
898–904

If some pass introduces a landingpad like this and simplifycfg doesn't zap it, that sounds like a phase-ordering bug in the optimizer. Also, once we take llvm.eh.endcatch out of the picture, we can just call SimplifyCFG to do this cleanup for us.

lib/CodeGen/WinEHPrepare.cpp
898–904

Unless there's something in SimplifyCFG other than SimplifyResume that would do this, we'd need to enhance SimplifyResume as I describe above. Currently it won't simplify if it finds anything at all other than debug info intrinsics between the landingpad and the resume.

Also, it just occurred to me that you were probably asking if llvm.eh.endcatch can be completely eliminated in all cases. I think I'd have to say "no" to that. I'm relying on either llvm.eh.endcatch or a resume instruction to mark the end of handler code. Without llvm.eh.endcatch we can just branch into normal code with no reliable way to tell we're done.

andrew.w.kaylor retitled this revision from Work-in-progress omnibus patch for native Windows C++ EH outlining to Extended support for native Windows C++ EH outlining.
andrew.w.kaylor updated this object.
andrew.w.kaylor edited the test plan for this revision. (Show Details)
andrew.w.kaylor edited edge metadata.

Rebased to incorporate the recently committed patches.
Refactored to eliminate duplicate code.
Updated the landingpad replacement to properly preserve exception argument variables and indirect branch targets.
Added a mapping scheme to eliminate duplicate parsing of blocks shared between landing pads.

rnk added a comment.Mar 5 2015, 2:29 PM

Some thoughts. I'll try to write up a more precise proposal for llvm.eh.actions.

lib/CodeGen/WinEHPrepare.cpp
72–75

You should do this and call .clear() at the end of runOnFunction(). My understanding is that FunctionPasses aren't recreated before and after each function.

126–131

Can you throw in a comment about how normally we only ever see at most one of these per landingpad?

462

Maybe do:

if (!HandlersOutlined)
  continue;
469–471

Would you say this was the "EH state" that we talked about, or something else?

499–507

What happens if these instructions still have uses? I think we'll assert on this code for example:

lpad:
  %ehvals = landingpad
  %x = load i32* %x.ptr
  ... ; normal EH dispatch
catch.int:
  store i32 %x, i32* %x.ptr

I think a cleaner way to do this would be:

  1. create a new basic block for the llvm.eh.actions call
  2. assert that there are no phi nodes on the old landingpad block (EH preparation should've demoted them)
  3. clone the LandingPadInst from the old one into the new one
  4. set the unwind edge of all incoming invokes to the new landingpad
  5. old landingpad is now trivially unreachable, allow removeUnreachableBlocks to do its thing

The advantage of this approach is that you never have to try to update the def-use graph.

556–570

Can you help me understand this case? Is it something like this?

void f() {
  HasCleanup o;
  int x = g(); // invoke value used in catch
  try {
    g();
  } catch (int) {
    use_x(x);
  }
}

I wonder if this is a bug in DemoteRegToStack that can be shown with opt -reg2mem.

625

Can this be simplified with PatternMatch::match()?

704–706

You might want StartBB->getFirstInsertionPt(). Also, consider this test case:

void use_it(int);
void g();
void f() {
  int g_calls = 0;
  try {
    g(); g_calls++;
    g(); g_calls++;
    g(); g_calls++;
  } catch (int) {
    use_it(g_calls);
  }
}

Using -O2, we get this interesting landingpad:

lpad:                                             ; preds = %invoke.cont1, %invoke.cont, %entry
  %g_calls.0 = phi i32 [ 2, %invoke.cont1 ], [ 1, %invoke.cont ], [ 0, %entry ]
  %1 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
          catch i8* bitcast (%rtti.TypeDescriptor2* @"\01??_R0H@8" to i8*)

We need to make sure that phi gets demoted. We can probably do something like assert(StartBB->begin() == StartBB()->getFirstNonPhi());

804–833

These two functions could be refactored to be parameterized over the vector of stores, either EHPtrStores of SelectorStores.

824–831

This seems like an unexpected side effect for a function called "isEHPtrLoad".

847–853

It's kind of amusing that the ehptr hasn't been used for anything yet... I wonder if there's any way to get MSVC to use the second argument to the catch handler.

974

I still don't think we need this. I'd rather hold off on committing this and experiment with making the llvm.eh.endcatch() cleanup non-exceptional.

1023

I'm OK doing this for now, this isn't true in general, so we should put a FIXME here. Consider something horrible like this:

void might_throw();
struct MyDtor {
  __forceinline ~MyDtor() {
    try { might_throw(); } catch (int) {}
  }
};
void f() {
  MyDtor o;
  might_throw();
}

After inlining, f will have a cleanup with an invoke in it.

test/CodeGen/WinEH/cppeh-inalloca.ll
68

Negative checks should generally try to be overly general, since small changes in the output can easily leave behind an extractvalue instruction that isn't named '%2', for example. In this case, I'd go for:

; CHECK-NOT: extractvalue { i8*, i32 }
; CHECK-NOT: br label

This basically says, there shouldn't be extractvalue or branches before we hit the llvm.eh.actions call.

test/CodeGen/WinEH/cppeh-nested-2.ll
51–52

Should be gone as of this morning

I'll rebase to pull in your latest changes and clean this up as you've suggested.

So, do you think it's worth pushing this forward on this review, or would you rather I peel off a piece or two?

lib/CodeGen/WinEHPrepare.cpp
72–75

Yeah, I thought I had a reason for keeping the actions around, but after thinking about it some more, that isn't necessary.

126–131

Sure.

462

OK

469–471

Yes, this is EH state. I'll fix the name in the comment here.

499–507

Sounds good.

556–570

No, case you cite here doesn't do this.

The case this is meant to handle shows up in the cppeh-nonalloca-frame-values.ll test. Basically, any time a temporary value is computed in the normal code but only referenced in the catch code this happens. The source code for that test looks like this:

void test() {
  int NumExceptions = 0;
  int ExceptionVal[10];
  SomeData Data = { 0, 0 };

  for (int i = 0; i < 10; ++i) {
    try {
      may_throw();
      Data.a += i;
    }
    catch (int e) {
      ExceptionVal[NumExceptions] = e;
      ++NumExceptions;
      if (e == i)
        Data.b += e;
      else
        Data.a += e;
    }
    does_not_throw(NumExceptions);
  }
  dump(ExceptionVal, NumExceptions, Data);
}

The when this is compiled with optimization enabled, both i32 members of Data get initialized with a single i64 store. There's a GEP instruction in the entry block that gets the address of the 'b' memebr as %b, but %b is only used in the catch handling code.

What DemoteRegToStack is doing seems correct to me. If the value to be demoted has no uses then it just erases it and returns NULL. The trouble is caused by the fact that I'm doing this demotion after the catch handler code has been pruned. The only reference to the value was in the catch handler and the outlined handler is using a temporary alloca which is at this point waiting to be replaced by a framerecover call.

I think the right solution in this case is to sink the "computed" value into the catch handler. That could get a bit tricky in cases (probably very common) where values needed to by the value to be sunk aren't otherwise referenced in the handler, but I expect we can find a reasonable way to do it.

For instance in the case cited above, %b looks like this:

%b = getelementptr inbounds %struct.SomeData, %struct.SomeData* %tmpcast, i64 0, i32 1

Nothing in the catch handler references %tmpcast, so we'd need to pull it in. Right now we're creating a temporary alloca for %b in the value materializer and translating that to a framerecover call. I think if we recognize that %b is a non-alloca with no uses outside the catch handlers (I see this having the potential to go south here, but it would be easy enough for this case), then the materializer can create a temporary alloca for %tmpcast instead (if that hasn't been done already) and clone the %b instruction.

625

I'll look into that.

804–833

OK

824–831

Yeah, I should probably rename it. I'm pretty sure I've seen this kind of thing happening often in code generated with optimizations disabled.

847–853

Hard to say. It looks like it's just there to help out the debugger.

974

I'm fine with holding it back for now.

1023

I believe the MS runtime will terminate the process if an exception occurs in a cleanup handler. So I think we'd need to ignore the __forceinline attribute in that case.

test/CodeGen/WinEH/cppeh-inalloca.ll
68

Yeah, I was thinking about this yesterday as I was having to renumber temporary variables in some of the positive checks. These tests are probably going to be very brittle the way I've written them. At some point I want to go through and generalize them in a lot of ways. For now, I can certainly trim down the negative checks as you suggest.

test/CodeGen/WinEH/cppeh-nested-2.ll
51–52

Woo hoo!

Incorporated feedback from previous review
Rebased to merge with latest trunk code
Abstracted value names in tests

rnk added a comment.Mar 10 2015, 6:16 PM

I think this is pretty close.

lib/CodeGen/WinEHPrepare.cpp
1122–1123

I think if we can resolve this, it'll greatly simplify the code and reduce the amount of repeated selector analysis.

1290

Does this do anything?

1307–1310

Don't we need to analyze the condition more to know which BB is the fallthrough side?

1324–1328

This seems like a lot of work, we end up mapping the same landingpads over and over in inner loops. Maybe we should have a map for each landing pad and just look them up inside the cloning directors?

1342–1344

I feel like this repeated stanza could be simplified to something like:

return createCleanupHandler(CleanupHandlerMap, BB);

void createCleanupHandler(CleanupHandlerMapTy &Map, BasicBlock *BB) {
  auto *Action = new CleanupHandler(BB);
  Map[BB] = Action;
  return Action;
}
lib/CodeGen/WinEHPrepare.cpp
1122–1123

Actually I think what I meant by this comment is already accomplished by the new handling of the catch and cleanup maps.

1290

No. I have no idea why that's there.

1307–1310

I don't think so. I believe earlier analysis of this block has determined that the block ends with a branch that is either unconditional or is predicated on the comparison of the result of an llvm.eh.typeid.for call and that the predicate is either ICMP_EQ or ICMP_NE, so just looking at the predicate tells us which successor corresponds to a selector match.

If the predicate is ICMP_EQ, then successor 0 is the catch block and successor 1 is the fall through, otherwise the reverse.

I can add some more comments to explain this. I can probably also clean this up a bit with PatternMatch.

1324–1328

You're right. There's no reason to create more than one LandingPadMapper for each landing pad.

1342–1344

Sure. We can do that.

Simplified cleanup handler mapping
Eliminated duplicate landing pad instruction mapping

rnk accepted this revision.Mar 11 2015, 3:40 PM
rnk edited edge metadata.

David said he'd take a look, but this looks good to me.

I want to patch it in and see if I can parameterize this a bit to work for SEH __finally block outlining. Exciting. :)

test/CodeGen/WinEH/cppeh-nonalloca-frame-values.ll
236

FileCheck actually doesn't care about horizontal whitespace. A run of spaces and tabs will match any other run of spaces and tabs.

This revision is now accepted and ready to land.Mar 11 2015, 3:40 PM
majnemer accepted this revision.Mar 11 2015, 3:41 PM
majnemer edited edge metadata.

LGTM

This change set was implemented in r231981.