This introduces new instructions neccessary to implement MSVC-compatible
exception handling support. Most of the middle-end and none of the
back-end haven't been audited or updated to take them into account.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/IR/Verifier.cpp | ||
---|---|---|
187–188 ↗ | (On Diff #29430) | Could this be split out to check catches and cleanups separately? The CoreCLR's personality uses different types (cleanups are not provided a pointer to the in-flight exception). |
lib/IR/Verifier.cpp | ||
---|---|---|
187–188 ↗ | (On Diff #29430) | Sure, will do. |
LGTM, I think all my comments have been addressed except for adding something about the new UB to the langref (and personally I'd be inclined to add a note to the section on catchendblock that it should be the unwind target of invokes between its catchblocks and their catchrets).
- Let ehblock instructions have different result types
- Address Joseph's latest review comments.
Spotted a typo. Also, is a plain 'ret' allowed to implicitly close all catches/cleanups, or is it UB to return without catchret/cleanupret?
Thanks.
docs/LangRef.rst | ||
---|---|---|
5233–5234 ↗ | (On Diff #29942) | s/its catchendblock/a corresponding catchret/ |
- Let ehblock instructions have different result types
- Address Joseph's latest review comments.
- Fix typo spotted by Joseph
Typo has been fixed.
Also, is a plain 'ret' allowed to implicitly close all catches/cleanups, or is it UB to return without catchret/cleanupret?
Executing that ret would be UB.
Also, is a plain 'ret' allowed to implicitly close all catches/cleanups, or is it UB to return without catchret/cleanupret?
Executing that ret would be UB.
Should the langref indicate this? I don't see anywhere that it currently does.
What is the relation between this patch and http://reviews.llvm.org/D11041? They seem to include some of the same code.
A couple of specific comments, but nothing approaching a full review.
docs/LangRef.rst | ||
---|---|---|
8421 ↗ | (On Diff #30215) | Is there anywhere in the docs that give context for why these instructions are needed and when they should be used instead of landingpad? If not, this should be done before checkin. Despite the discussion on llvmdev, I still don't feel like I understand the use case enough to really review the design. |
include/llvm/IR/Instruction.h | ||
392 ↗ | (On Diff #30215) | As mentioned elsewhere, I find the name "Block" confusing in this function name. Please change. Actually, why is this a predicate on the instruction at all? Isn't this a property of the containing block? |
lib/IR/Verifier.cpp | ||
2819 ↗ | (On Diff #30215) | This looks like a fix for an existing deficiency in the verifier right? Not specific to the new instructions? If so, please submit this separately. |
2867 ↗ | (On Diff #30215) | The property of being a single instruction is a block is repeated in several of these. It would make sense to have this checked in a common helper routine which includes the terminator checks as well. |
2900 ↗ | (On Diff #30215) | Why not exit on 2nd one found? |
2909 ↗ | (On Diff #30215) | The use of the name "isEHBlock" on an *Instruction* seems highly misleading. Please change this. That function name on a BasicBlock makes sense, but not on a Instruction. |
2953 ↗ | (On Diff #30215) | TerminateBlock is a highly confusing name if this is EH specific. "Doesn't every block have a terminator?" |
D11041 was an older version of this patch, I'd ignore it.
A couple of specific comments, but nothing approaching a full review.
docs/LangRef.rst | ||
---|---|---|
8421 ↗ | (On Diff #30215) | I don't think that the LangRef is the right place to discuss which set of EH-related IR is most appropriate for a frontend. I think it would make more sense in ExceptionHandling.rst or maybe a new file in the docs/Frontend directory. We introduced these new instructions because of constraints imposed by our personality routine. For example:
These constraints make landingpad/resume inappropriate, especially in the face of optimizations. These new instructions allow us maintain enough information in IR-form that we can lower it into a representation our personality will be happy with. For example, we can easily find all the code for a try body's catch handlers by examining the unwind labels and operands of its relevant CatchBlockInsts and tracing execution to relevant CatchRetInsts. Such an operation would be impossible in the landingpad scheme because control flow can leave "exceptional" code back to "normal" code through a branch instruction. |
include/llvm/IR/Instruction.h | ||
392 ↗ | (On Diff #30215) | I'm all for changing the Block suffix given a reasonable suggestion. It is on instruction because we have instructions called CatchBlockInst, CleanupBlockInst, etc. The predicate answers the question if it is one of these "FooBlock" instructions. |
lib/IR/Verifier.cpp | ||
2819 ↗ | (On Diff #30215) | Sure, removed. |
2900 ↗ | (On Diff #30215) | Just to make the code simpler, it would be optimizing for a case that would never happen in practice. |
2909 ↗ | (On Diff #30215) | The instructions have "Block" in the name, hence "Block" in the function's name. |
2953 ↗ | (On Diff #30215) | I'm open to suggestions, it's the best name we could come up with. |
@reames thinking about it a little more, would you be happier with CatchPadInst instead of CatchBlockInst, etc. and isEHPad instead of isEHBlock?
That would help. Reusing block causes confusion with the widespread use of "block" to refer to a basic block in the code base.
More generally, consistency in the terminology would be a good thing. Let me lay out my current understanding:
- A X-"pad" is a place the exceptional control flow might resume.
- A catch is a specific type of "pad" which models a language/ABI level catch clause. A catch is exception type specific.
- A cleanup is a specific type of "pad" which models scope exit without an explicit exception type.
- Both catch and cleanup are statically scoped. There's a begin and end construct for each.
In this context, what is a terminateblock? It's clearly a "pad", but why is not just a cleanup?
On the topic of motivation, I still feel like I'm missing a lot. In particular, it feels like the catchpad, cleanuppad, and terminatepad are really close to what's already described by landingpad(*). I get that you need to prevent merging of catch blocks, but why isn't a single "pad fence" at the end of each clause which is unmergeable solve that problem?
- We might end up changing how you describe a catch clause, but the idea is already there. You do seem to need different arguments then the existing catch clause bits.
p.s. For the record, I'm asking for my own understanding not because I oppose the proposal.
docs/LangRef.rst | ||
---|---|---|
8421 ↗ | (On Diff #30215) | I agree that the LangRef would be the wrong place for this. I don't really care where it ends up, but I do want to see the motivation documented. In particular, when should the older scheme be used vs this one? Are you planning on deprecating the old mechanism? Is this essentially Windows only? I honestly couldn't answer any of those questions if asked today. |
lib/IR/Verifier.cpp | ||
2900 ↗ | (On Diff #30215) | I was only asking from the perspective of code clarity. Having the extra counter variable makes me thing it's going to be used for something. |
I like the pad terminology. The block terminology came about because we're really imbuing some special properties on the whole BasicBlock. Currently the landingpad instruction is the only instruction with the power to make an edge unsplittable, for example. Anyway, pad sounds good to me.
That would help. Reusing block causes confusion with the widespread use of "block" to refer to a basic block in the code base.
More generally, consistency in the terminology would be a good thing. Let me lay out my current understanding:
- A X-"pad" is a place the exceptional control flow might resume.
- A catch is a specific type of "pad" which models a language/ABI level catch clause. A catch is exception type specific.
- A cleanup is a specific type of "pad" which models scope exit without an explicit exception type.
- Both catch and cleanup are statically scoped. There's a begin and end construct for each.
In this context, what is a terminateblock? It's clearly a "pad", but why is not just a cleanup?
Currently, Clang expresses noexcept() as a catch-all clause that simply calls terminate. This is inefficient for both Itanium and MSVC because both LSDA tables can express noexcept with a single bit. We do it because it makes it possible to inline one noexcept function into another EH scope without teaching the inliner that it should transform some function attribute into LLVM IR that makes some C++ runtime function call.
Terminatepad solves this all by keeping data as data, allowing late IR preparation passes to expand the IR into code (catch-all or filter) or leave it alone if it can be encoded in the table.
On the topic of motivation, I still feel like I'm missing a lot. In particular, it feels like the catchpad, cleanuppad, and terminatepad are really close to what's already described by landingpad(*). I get that you need to prevent merging of catch blocks, but why isn't a single "pad fence" at the end of each clause which is unmergeable solve that problem?
- We might end up changing how you describe a catch clause, but the idea is already there. You do seem to need different arguments then the existing catch clause bits.
Yeah, they are the same set of clauses. :) However, with the new instructions, we won't need to do impossibly error-prone pattern matching.
How would a "pad fence" statically indicate that after running this cleanup, the next EH action is that catch block over there? Right now we try to figure this out by walking the CFG and trying to find the next conditional branch based on a comparison of the EH selector value. I think the primary failure mode of WinEHPrepare today is that we inline a destructor containing control flow, and then this analysis falls over. I'd rather not have to do dataflow analysis to rediscover this very basic nesting property of the C++ source code.
Essentially, the new instructions are exactly that: a really strong "pad fence" that keeps all the data and control flow transfers that we want to know about grouped together.
- Let ehblock instructions have different result types
- Address Joseph's latest review comments.
- Fix typo spotted by Joseph
- Add a comment for ret from catchblock/cleanupblock
- Tidy-up some middle end pieces
- Rename the new instructions from FooBlock to FooPad
lib/IR/BasicBlock.cpp | ||
---|---|---|
337 ↗ | (On Diff #30323) | Surprised this is just CatchEndPad; don't CatchPad and TerminatePad have the same restrictions? |
lib/IR/BasicBlock.cpp | ||
---|---|---|
337 ↗ | (On Diff #30323) | CatchPad can be split (for the purposes of SplitBlockPredecessors) by:
TerminatePad can be "split" using a similar mechanism. The name of the function might need some work... |
lib/IR/BasicBlock.cpp | ||
---|---|---|
337 ↗ | (On Diff #30323) |
Yes, I think the name was throwing me off. Looking at what SplitBlockPredecessors does, this makes sense. Which suggests the name canSplitPredecessors
Some questions about this:
|
- Let ehblock instructions have different result types
- Address Joseph's latest review comments.
- Fix typo spotted by Joseph
- Add a comment for ret from catchblock/cleanupblock
- Tidy-up some middle end pieces
- Rename the new instructions from FooBlock to FooPad
- Rename isSplittable to canSplitPredecessors
- Let ehblock instructions have different result types
- Address Joseph's latest review comments.
- Fix typo spotted by Joseph
- Add a comment for ret from catchblock/cleanupblock
- Tidy-up some middle end pieces
- Rename the new instructions from FooBlock to FooPad
- Rename isSplittable to canSplitPredecessors
- Make canSplitPredecessors a little mroe conservative for now
- Fixup for trunk
- Inliner changes
I think we've generated enough consensus around this design, and it's time to commit this and work incrementally in tree. I wrote up more motivation for the design in D11565.