This is an archive of the discontinued LLVM Phabricator instance.

New EH representation for MSVC compatibility
ClosedPublic

Authored by majnemer on Jul 10 2015, 12:21 AM.

Details

Summary

This introduces new instructions neccessary to implement MSVC-compatible
exception handling support. Most of the middle-end and none of the
back-end haven't been audited or updated to take them into account.

Diff Detail

Repository
rL LLVM

Event Timeline

majnemer updated this revision to Diff 29430.Jul 10 2015, 12:21 AM
majnemer retitled this revision from to New EH representation for MSVC compatibility.
majnemer updated this object.
lib/IR/Verifier.cpp
187–188 ↗(On Diff #29430)

Could this be split out to check catches and cleanups separately? The CoreCLR's personality uses different types (cleanups are not provided a pointer to the in-flight exception).

majnemer added inline comments.Jul 11 2015, 4:34 PM
lib/IR/Verifier.cpp
187–188 ↗(On Diff #29430)

Sure, will do.

majnemer updated this revision to Diff 29517.Jul 11 2015, 10:29 PM
  • Let ehblock instructions have different result types

LGTM, I think all my comments have been addressed except for adding something about the new UB to the langref (and personally I'd be inclined to add a note to the section on catchendblock that it should be the unwind target of invokes between its catchblocks and their catchrets).

majnemer updated this revision to Diff 29942.Jul 16 2015, 2:00 PM
majnemer edited edge metadata.
  • Let ehblock instructions have different result types
  • Address Joseph's latest review comments.

Spotted a typo. Also, is a plain 'ret' allowed to implicitly close all catches/cleanups, or is it UB to return without catchret/cleanupret?

Thanks.

docs/LangRef.rst
5233–5234 ↗(On Diff #29942)

s/its catchendblock/a corresponding catchret/
s/a catchendblock/its catchendblock/

majnemer updated this revision to Diff 30074.Jul 17 2015, 11:42 PM
  • Let ehblock instructions have different result types
  • Address Joseph's latest review comments.
  • Fix typo spotted by Joseph

Spotted a typo.

Typo has been fixed.

Also, is a plain 'ret' allowed to implicitly close all catches/cleanups, or is it UB to return without catchret/cleanupret?

Executing that ret would be UB.

Also, is a plain 'ret' allowed to implicitly close all catches/cleanups, or is it UB to return without catchret/cleanupret?

Executing that ret would be UB.

Should the langref indicate this? I don't see anywhere that it currently does.

majnemer updated this revision to Diff 30215.Jul 20 2015, 5:01 PM
  • Add a comment for ret from catchblock/cleanupblock

Looks good to me.

reames edited edge metadata.Jul 21 2015, 3:52 PM

What is the relation between this patch and http://reviews.llvm.org/D11041? They seem to include some of the same code.

A couple of specific comments, but nothing approaching a full review.

docs/LangRef.rst
8421 ↗(On Diff #30215)

Is there anywhere in the docs that give context for why these instructions are needed and when they should be used instead of landingpad? If not, this should be done before checkin. Despite the discussion on llvmdev, I still don't feel like I understand the use case enough to really review the design.

include/llvm/IR/Instruction.h
392 ↗(On Diff #30215)

As mentioned elsewhere, I find the name "Block" confusing in this function name. Please change.

Actually, why is this a predicate on the instruction at all? Isn't this a property of the containing block?

lib/IR/Verifier.cpp
2819 ↗(On Diff #30215)

This looks like a fix for an existing deficiency in the verifier right? Not specific to the new instructions? If so, please submit this separately.

2867 ↗(On Diff #30215)

The property of being a single instruction is a block is repeated in several of these. It would make sense to have this checked in a common helper routine which includes the terminator checks as well.

2900 ↗(On Diff #30215)

Why not exit on 2nd one found?

2909 ↗(On Diff #30215)

The use of the name "isEHBlock" on an *Instruction* seems highly misleading. Please change this. That function name on a BasicBlock makes sense, but not on a Instruction.

2953 ↗(On Diff #30215)

TerminateBlock is a highly confusing name if this is EH specific. "Doesn't every block have a terminator?"

What is the relation between this patch and http://reviews.llvm.org/D11041? They seem to include some of the same code.

D11041 was an older version of this patch, I'd ignore it.

A couple of specific comments, but nothing approaching a full review.

docs/LangRef.rst
8421 ↗(On Diff #30215)

I don't think that the LangRef is the right place to discuss which set of EH-related IR is most appropriate for a frontend. I think it would make more sense in ExceptionHandling.rst or maybe a new file in the docs/Frontend directory.

We introduced these new instructions because of constraints imposed by our personality routine. For example:

  • we need to be able to statically determine which invokes occurred in a catch block.
  • it is impossible to lower @llvm.eh.typeid.for for our personality routine.

These constraints make landingpad/resume inappropriate, especially in the face of optimizations. These new instructions allow us maintain enough information in IR-form that we can lower it into a representation our personality will be happy with.

For example, we can easily find all the code for a try body's catch handlers by examining the unwind labels and operands of its relevant CatchBlockInsts and tracing execution to relevant CatchRetInsts. Such an operation would be impossible in the landingpad scheme because control flow can leave "exceptional" code back to "normal" code through a branch instruction.

include/llvm/IR/Instruction.h
392 ↗(On Diff #30215)

I'm all for changing the Block suffix given a reasonable suggestion.

It is on instruction because we have instructions called CatchBlockInst, CleanupBlockInst, etc. The predicate answers the question if it is one of these "FooBlock" instructions.

lib/IR/Verifier.cpp
2819 ↗(On Diff #30215)

Sure, removed.

2900 ↗(On Diff #30215)

Just to make the code simpler, it would be optimizing for a case that would never happen in practice.

2909 ↗(On Diff #30215)

The instructions have "Block" in the name, hence "Block" in the function's name.

2953 ↗(On Diff #30215)

I'm open to suggestions, it's the best name we could come up with.

@reames thinking about it a little more, would you be happier with CatchPadInst instead of CatchBlockInst, etc. and isEHPad instead of isEHBlock?

@reames thinking about it a little more, would you be happier with CatchPadInst instead of CatchBlockInst, etc. and isEHPad instead of isEHBlock?

That would help. Reusing block causes confusion with the widespread use of "block" to refer to a basic block in the code base.

More generally, consistency in the terminology would be a good thing. Let me lay out my current understanding:

  • A X-"pad" is a place the exceptional control flow might resume.
  • A catch is a specific type of "pad" which models a language/ABI level catch clause. A catch is exception type specific.
  • A cleanup is a specific type of "pad" which models scope exit without an explicit exception type.
  • Both catch and cleanup are statically scoped. There's a begin and end construct for each.

In this context, what is a terminateblock? It's clearly a "pad", but why is not just a cleanup?

On the topic of motivation, I still feel like I'm missing a lot. In particular, it feels like the catchpad, cleanuppad, and terminatepad are really close to what's already described by landingpad(*). I get that you need to prevent merging of catch blocks, but why isn't a single "pad fence" at the end of each clause which is unmergeable solve that problem?

  • We might end up changing how you describe a catch clause, but the idea is already there. You do seem to need different arguments then the existing catch clause bits.

p.s. For the record, I'm asking for my own understanding not because I oppose the proposal.

docs/LangRef.rst
8421 ↗(On Diff #30215)

I agree that the LangRef would be the wrong place for this. I don't really care where it ends up, but I do want to see the motivation documented.

In particular, when should the older scheme be used vs this one? Are you planning on deprecating the old mechanism? Is this essentially Windows only? I honestly couldn't answer any of those questions if asked today.

lib/IR/Verifier.cpp
2900 ↗(On Diff #30215)

I was only asking from the perspective of code clarity. Having the extra counter variable makes me thing it's going to be used for something.

rnk edited edge metadata.Jul 21 2015, 5:45 PM

@reames thinking about it a little more, would you be happier with CatchPadInst instead of CatchBlockInst, etc. and isEHPad instead of isEHBlock?

I like the pad terminology. The block terminology came about because we're really imbuing some special properties on the whole BasicBlock. Currently the landingpad instruction is the only instruction with the power to make an edge unsplittable, for example. Anyway, pad sounds good to me.

That would help. Reusing block causes confusion with the widespread use of "block" to refer to a basic block in the code base.

More generally, consistency in the terminology would be a good thing. Let me lay out my current understanding:

  • A X-"pad" is a place the exceptional control flow might resume.
  • A catch is a specific type of "pad" which models a language/ABI level catch clause. A catch is exception type specific.
  • A cleanup is a specific type of "pad" which models scope exit without an explicit exception type.
  • Both catch and cleanup are statically scoped. There's a begin and end construct for each.

In this context, what is a terminateblock? It's clearly a "pad", but why is not just a cleanup?

Currently, Clang expresses noexcept() as a catch-all clause that simply calls terminate. This is inefficient for both Itanium and MSVC because both LSDA tables can express noexcept with a single bit. We do it because it makes it possible to inline one noexcept function into another EH scope without teaching the inliner that it should transform some function attribute into LLVM IR that makes some C++ runtime function call.

Terminatepad solves this all by keeping data as data, allowing late IR preparation passes to expand the IR into code (catch-all or filter) or leave it alone if it can be encoded in the table.

On the topic of motivation, I still feel like I'm missing a lot. In particular, it feels like the catchpad, cleanuppad, and terminatepad are really close to what's already described by landingpad(*). I get that you need to prevent merging of catch blocks, but why isn't a single "pad fence" at the end of each clause which is unmergeable solve that problem?

  • We might end up changing how you describe a catch clause, but the idea is already there. You do seem to need different arguments then the existing catch clause bits.

Yeah, they are the same set of clauses. :) However, with the new instructions, we won't need to do impossibly error-prone pattern matching.

How would a "pad fence" statically indicate that after running this cleanup, the next EH action is that catch block over there? Right now we try to figure this out by walking the CFG and trying to find the next conditional branch based on a comparison of the EH selector value. I think the primary failure mode of WinEHPrepare today is that we inline a destructor containing control flow, and then this analysis falls over. I'd rather not have to do dataflow analysis to rediscover this very basic nesting property of the C++ source code.

Essentially, the new instructions are exactly that: a really strong "pad fence" that keeps all the data and control flow transfers that we want to know about grouped together.

majnemer updated this revision to Diff 30323.Jul 21 2015, 11:41 PM
majnemer edited edge metadata.
  • Let ehblock instructions have different result types
  • Address Joseph's latest review comments.
  • Fix typo spotted by Joseph
  • Add a comment for ret from catchblock/cleanupblock
  • Tidy-up some middle end pieces
  • Rename the new instructions from FooBlock to FooPad
lib/IR/BasicBlock.cpp
337 ↗(On Diff #30323)

Surprised this is just CatchEndPad; don't CatchPad and TerminatePad have the same restrictions?

majnemer added inline comments.Jul 22 2015, 9:41 AM
lib/IR/BasicBlock.cpp
337 ↗(On Diff #30323)

CatchPad can be split (for the purposes of SplitBlockPredecessors) by:

  1. Creating a new BB
  2. Cloning the CatchPad into the new BB
  3. Moving relevant PHI nodes to the new BB
  4. Hooking up relevant predecessors to unwind to the new BB

TerminatePad can be "split" using a similar mechanism.

The name of the function might need some work...

lib/IR/BasicBlock.cpp
337 ↗(On Diff #30323)

The name of the function might need some work...

Yes, I think the name was throwing me off. Looking at what SplitBlockPredecessors does, this makes sense. Which suggests the name canSplitPredecessors

CatchPad can be split (for the purposes of SplitBlockPredecessors) by:
1.Creating a new BB
2.Cloning the CatchPad into the new BB
3.Moving relevant PHI nodes to the new BB
4.Hooking up relevant predecessors to unwind to the new BB

Some questions about this:

  1. Will callers find this useful, or are they expecting that the new BB will be one they can insert instructions into prior to branching to the original BB?
  2. Do you need to add a step for inserting another BB, successor to the new catchpad, with a catchret?
  3. Could the same approach work equally well for catchendpad except changing #2 from "cloning the CatchPad into the new BB" to "inserting a new CatchPad into the new BB"? Is the problem that we don't have a general way to compute the arguments of a catch-all CatchPad?
majnemer updated this revision to Diff 30422.Jul 22 2015, 4:13 PM
  • Let ehblock instructions have different result types
  • Address Joseph's latest review comments.
  • Fix typo spotted by Joseph
  • Add a comment for ret from catchblock/cleanupblock
  • Tidy-up some middle end pieces
  • Rename the new instructions from FooBlock to FooPad
  • Rename isSplittable to canSplitPredecessors
majnemer updated this revision to Diff 30541.Jul 23 2015, 4:49 PM
  • Make canSplitPredecessors a little mroe conservative for now

@reames, can we iterate on the documentation in-tree now that we have D11565 out for review? It would be a big help.

majnemer updated this revision to Diff 30965.Jul 29 2015, 4:22 PM
  • Let ehblock instructions have different result types
  • Address Joseph's latest review comments.
  • Fix typo spotted by Joseph
  • Add a comment for ret from catchblock/cleanupblock
  • Tidy-up some middle end pieces
  • Rename the new instructions from FooBlock to FooPad
  • Rename isSplittable to canSplitPredecessors
  • Make canSplitPredecessors a little mroe conservative for now
  • Fixup for trunk
  • Inliner changes
rnk accepted this revision.Jul 31 2015, 9:58 AM
rnk edited edge metadata.

I think we've generated enough consensus around this design, and it's time to commit this and work incrementally in tree. I wrote up more motivation for the design in D11565.

This revision is now accepted and ready to land.Jul 31 2015, 9:58 AM
This revision was automatically updated to reflect the committed changes.