This is an archive of the discontinued LLVM Phabricator instance.

New EH representation for MSVC compatibility
ClosedPublic

Authored by majnemer on Jul 8 2015, 1:30 PM.

Details

Summary

This introduces new instructions neccessary to implement MSVC-compatible
exception handling support. Most of the middle-end and none of the
back-end haven't been audited or updated to take them into account.

Diff Detail

Repository
rL LLVM

Event Timeline

majnemer updated this revision to Diff 29282.Jul 8 2015, 1:30 PM
majnemer retitled this revision from to New EH representation for MSVC compatibility.
majnemer updated this object.
majnemer added a subscriber: llvm-commits.
rnk added inline comments.Jul 8 2015, 3:26 PM
docs/LangRef.rst
5185–5186 ↗(On Diff #29282)

s/the function it is located in/its parent function/?

5187–5188 ↗(On Diff #29282)

s/look for an appropriate catch block in the caller/continue processing exception handling actions in the caller/?

Also, the personality function isn't really responsible for moving to the caller, since the caller might have a different EH personality.

5194 ↗(On Diff #29282)

s/constituent/preceding/?

5213–5214 ↗(On Diff #29282)

This seems to be missing the restriction that exactly one catchblock must unwind to a catchendblock.

5254 ↗(On Diff #29282)

s/transfer/transfers/

5304 ↗(On Diff #29282)

Maybe add the void example:

cleanupret void unwind to caller
5316 ↗(On Diff #29282)

The other instructions have the 'unwind to caller' form. Any reason not to list that here?

5323–5324 ↗(On Diff #29282)

s/attempts to transfer control to terminate the program/may decide to terminate the program/?

5327–5328 ↗(On Diff #29282)

s/if the terminateblock is an appropriate handler for the in-flight exception/if the personality routine decides not to terminate the program/?

5329–5331 ↗(On Diff #29282)

I think we can drop this sentence from the overview.

5339–5340 ↗(On Diff #29282)

Why? It might unwind to caller, consider a noexcept function.

8329 ↗(On Diff #29282)

I wonder if we should just do Itanium and MSVC examples for every instruction, since together they motivate why the instructions have the flexibility that they do.

lib/IR/Verifier.cpp
2411–2412 ↗(On Diff #29282)

Comment needs more edits for grammar:
Verify that the first non-PHI instruction of the unwind destination is an exception handling instruction.

lib/Transforms/Instrumentation/MemorySanitizer.cpp
2656–2674 ↗(On Diff #29282)

Our new instructions may be void, and we don't need to set shadow memory for void instructions. As written, this fills the DenseMaps with a bunch of useless values. Let's avoid that.

majnemer updated this revision to Diff 29296.Jul 8 2015, 6:19 PM
majnemer marked 6 inline comments as done.
  • Address Reid's review feedback.
majnemer marked 7 inline comments as done.Jul 8 2015, 6:20 PM

The latest revision should address these comments.

I'm happy to see this coming online! I have some inline questions about the details.

docs/LangRef.rst
4730–4734 ↗(On Diff #29296)

Missing cleanupret here

5123–5125 ↗(On Diff #29296)

What should the exception label look like for a catch that is unconditionally entered (like catch(...) in C++)? Or do front-ends need to translate such things to cleanups?

5140 ↗(On Diff #29296)

Is there also a restriction on the target of the exception label? E.g. that it must be a catchblock or catchendblock?

5144–5149 ↗(On Diff #29296)

I'd imagine we're going to want a word for this set of three restrictions; we have a few of these and I expect some optimizations will want to know that these are in effect without caring which specific block type this is. I think I saw the term "unsplittable block" used for this in one of the discussions on the dev alias, which I like because the restriction is very similar to that of an unsplittable edge (i.e. you can't put code in it).

I'm not sure what's the right way to express that in these documents. E.g. should it just say "A catch block is unsplittable" and there's a glossary or something that can spell out what that means? Having an explicit section discussing unsplittable blocks may be warranted since it may be a construct that many aren't used to.

I think my best suggestion for now is to add a bullet that says "A catch block is 'unsplittable':" and then the bullets you have become sub-bullets of that (and likewise for the other unsplittable blocks).

5184 ↗(On Diff #29296)

If we have

try {
   code;
   try {
     code;
   } catch (T e) {  // <--- catch 1
     MayThrow();
   }
   code;
} catch (T2 e2) {  // <---- catch 2
  code;
}

is the invoke for MayThrow supposed to target the catchbolck for catch2, or the catchendblock for catch1? If it's the catchendblock, having some verbiage here explaining that would be good.

5256 ↗(On Diff #29296)

Mention that this reads and writes memory?

5263 ↗(On Diff #29296)

That "unwind" shouldn't be there, should it?

5265 ↗(On Diff #29296)

nit: You introduce cleanupret before you introduce cleanupblock; that reads strangely.

5275–5276 ↗(On Diff #29296)

My impression from the RFC discussion was that cleanupret would also have a 'from' label indicating its cleanupblock (though I didn't see a reply to John's questions about that). What happened to that?

5289–5290 ↗(On Diff #29296)

This implies that all cleanupblocks in a given function have the same result type; should that be mentioned as a restriction on cleanupblock and/or checked in the verifier?

5296–5297 ↗(On Diff #29296)

Since cleanupblocks can nest, I think "one cleanupblock" would be more clear here than "the cleanupblock". Also s/transfered/transferred/.

8292–8293 ↗(On Diff #29296)

I think you're missing some words here ("personality function upon re-entry to the function." is a fragment).

include/llvm/IR/Instructions.h
3804 ↗(On Diff #29296)

s/invoke/catchblock/

3809 ↗(On Diff #29296)

s/invoke/catchblock/

lib/AsmParser/LLParser.cpp
4997 ↗(On Diff #29296)

s/cleanupret/terminateblock/

5025 ↗(On Diff #29296)

s/cleanupret/catchendblock/

lib/IR/Instruction.cpp
468–471 ↗(On Diff #29296)

Should there be a clause for TerminateBlock here?

lib/IR/Verifier.cpp
383–386 ↗(On Diff #29296)

Nothing to check for CatchRet or CleanupRet? I think you could check that the CleanupRet unwind block is a (non-Landingpad?) EH block.

2840 ↗(On Diff #29296)

Would be good to check that the unwind successor has an appropriate target (must be an EH block, can't be a LandingPad, I'm not sure if you're intending cleanupblock and/or terminateblock to be allowed).

2840 ↗(On Diff #29296)

Check that all predecessors have exceptional terminators? Likewise for the other new EH blocks.

2857 ↗(On Diff #29296)

Check here that this is the target of no more than one catchblock?

2857 ↗(On Diff #29296)

If there's an unwind successor, it must be a (non-Landingpad?) EHBlock.

2890 ↗(On Diff #29296)

If there's an unwind successor, it must be a (non-Landingpad?) EHBlock.

Also, is there any new UB that should be spelled out in the documentation? E.g. can you return (from the function) from within a catchblock (without first executing catchret), and likewise for cleanups? Unwind from a catchblock to somewhere other than its catchendblock? Catchret from a cleanupblock or cleanupret from a catchblock?

majnemer marked 5 inline comments as done.Jul 9 2015, 11:47 AM

Also, is there any new UB that should be spelled out in the documentation? E.g. can you return (from the function) from within a catchblock (without first executing catchret), and likewise for cleanups? Unwind from a catchblock to somewhere other than its catchendblock? Catchret from a cleanupblock or cleanupret from a catchblock?

Yes, those should all be UB. I'll add them to the LangRef.

docs/LangRef.rst
5123–5125 ↗(On Diff #29296)

The catchblock should unwind to a catchendblock in that case.

5140 ↗(On Diff #29296)

Yes.

5184 ↗(On Diff #29296)

MayThrow should unwind to the same catchendblock that "catch 1" will unwind to. That catchendblock would unwind to the outer catchblock.

5265 ↗(On Diff #29296)

cleanupblock isn't a terminator and the terminators are all documented together :/

lib/IR/Instruction.cpp
468–471 ↗(On Diff #29296)

I think TerminateBlock is more like a function-call which never returns and thus wouldn't throw.

lib/IR/Instruction.cpp
468–471 ↗(On Diff #29296)

The LangRef indicates that TerminateBlock might continue propagating the exception rather than never-return terminate the program (and it shouldn't need an unwind clause otherwise):

where a personality routine may decide to terminate the program

Control is transferred to the `exception` label if the personality routine decides not to terminate the program for the in-flight exception.

So in the event that a TerminateBlock unwinds to caller and chooses not to terminate the program, does that count as "throw" for these purposes? To me it seems analogous to the catchendblock case.

majnemer added inline comments.Jul 9 2015, 12:54 PM
lib/IR/Instruction.cpp
468–471 ↗(On Diff #29296)

Ah yes, I forgot the unwind to caller case.

majnemer marked 13 inline comments as done.Jul 9 2015, 1:51 PM
majnemer added inline comments.
docs/LangRef.rst
5275–5276 ↗(On Diff #29296)

We ended up believing we didn't need it.

5289–5290 ↗(On Diff #29296)

This is implicit in the cleanupblock semantics section because the personality function chooses the result type and there can only be one personality function for a function.

And yes, the verifier should check this.

majnemer updated this revision to Diff 29384.Jul 9 2015, 1:51 PM
majnemer marked an inline comment as done.
majnemer edited edge metadata.
  • Address Reid's review feedback.
  • Address Joseph's review comments.
JosephTremoulet added inline comments.Jul 9 2015, 6:04 PM
lib/IR/Verifier.cpp
2864–2866 ↗(On Diff #29384)

The lang ref says catchblock must unwind to catchblock or catchendblock, but the check here is more permissive (allows cleanupblock and terminateblock).

This revision was automatically updated to reflect the committed changes.

I accidentally committed this in r241888-r241891 and reverted those commits
in r241893. Apologies for the ensuing confusion.

docs/LangRef.rst
5123–5125 ↗(On Diff #29296)

That seems problematic. Doesn't that leave us without a way to indicate in the CFG that exceptions don't escape a catch-all? We'll definitely want to e.g. figure out when a handler has no incoming exceptions and can be removed (which could happen because there is an inner catch-all), or determine whether exceptions escape a function (to propagate that info up the call graph). Maybe EH-specific analyses like that could be expected to know about this and check whether a catchblock's arguments indicate catch-all for the given personality, but I expect we'll also want the information be visible to more generic analyses like reachability and dominance.
I'd like to offer a suggested change, but I'm not sure I fully understand what problem catchendblock is solving, so my first question would be what it's needed for and whether there's another way to solve that problem (the RFC just said "It is merely a placeholder to help reconstruct which invokes were part of the catch blocks of a try"; could that be computed by checking whether catchret is reachable from the invokes?). Barring that, it seems we'd want something like a catchallblock that isn't a terminator, doesn't have an unwind edge, and would have whatever the appropriate arguments are for a catchblock that catches everything; front-ends or an EH-aware transform could recognize when a catchblock catches everything and the catchendblock would have no other predecessors, and write the last catchblock as catchallblock (dropping the catchendblock) instead.

rnk added inline comments.Jul 10 2015, 8:26 AM
docs/LangRef.rst
5123–5125 ↗(On Diff #29296)

It turns out that you need this edge in order to generate the right __CxxFrameHandler3 tables when catch-all is involved. Consider:

try { throw 1; }
catch (...) {
  try { throw; }
  catch (...) { throw; }
  }
}

The inner catch-all block unwinds to the catchendblock of the outer catch. This affects our planned EH state numbering algorithm.

Basically, I think that only EH-aware transforms should be allowed to do this kind of CFG simplification.

docs/LangRef.rst
5123–5125 ↗(On Diff #29296)
  1. Regarding your specific example: Since the inner catch(...) re-throws, there's still unwind to out of the function (or at least the snippet); the inner catchendblock would have two predecessors, and I'm more concerned about cases where the artificial edge from the catchblock is the only predecessor of the catchendblock. Or in terms of the example, something more like this:
try { throw 1; }
catch (...) {
  try { throw; }
  catch (...) { /* nothing throws here */ }
  }
}

I'd like to be able to record that a function with that body doesn't unwind out of it.

  1. Maybe I'd need more details on your planned EH state numbering algorithm, but the idea would be that you'd identify anywhere that you clone a catch-all into a handler, and consider the catch-all a child of that handler instead of top-level. Would that be doable?
  1. I agree with this:

Basically, I think that only EH-aware transforms should be allowed to do this kind of CFG simplification.

So let's say I'm writing an EH-aware transform that knows my target has flexible EH reporting. What is the output of the CFG simplification that it performs? It seems to me that it doesn't have any valid way to simplify the CFG, that's the part that's concerning.

reames edited edge metadata.Jul 21 2015, 5:00 PM

Apparently, the contents of this review are now stale and future comments should go to D11097: New EH representation for MSVC compatibility. Mentioning this solely to avoid confusion on the part of future readers.