This is an archive of the discontinued LLVM Phabricator instance.

CodeGen support for x86_64 SEH catch handlers in LLVM
ClosedPublic

Authored by rnk on Nov 17 2014, 5:27 PM.

Details

Summary

This adds ExceptionHandling::MSVC, used by the x86_64-pc-windows-msvc
triple. It assumes that filter functions have already been outlined in
either the frontend or the backend. Filter functions are used in place
of the landingpad catch clause type info operands. In catch clause
order, the first filter to return true will catch the exception.

The C specific handler table expects the landing pad to be split into
one block per handler, but LLVM IR uses a single landing pad for all
possible unwind actions. This patch papers over the mismatch by
synthesizing single instruction BBs for every catch clause to fill in
the EH selector that the landing pad block expects.

Missing functionality:

  • Accessing data in the parent frame from outlined filters
  • Cleanups (from __finally) are unsupported, as they will require outlining and parent frame access
  • Filter clauses are unsupported, as there's no clear analogue in SEH

In other words, this is the minimal set of changes needed to write IR to
catch arbitrary exceptions and resume normal execution.

Diff Detail

Repository
rL LLVM

Event Timeline

rnk updated this revision to Diff 16312.Nov 17 2014, 5:27 PM
rnk retitled this revision from to CodeGen support for x86_64 SEH catch handlers in LLVM.
rnk added a reviewer: rjmccall.
rnk updated this object.
rnk added a subscriber: Unknown Object (MLST).
rjmccall edited edge metadata.Nov 17 2014, 7:19 PM

I can't review LLVM CodeGen patches; you need a backend person for that. But one comment:

include/llvm/MC/MCAsmInfo.h
50 ↗(On Diff #16312)

Let's not call this just "MSVC" when we know that MSVC actually uses several different schemes.

I've made some specific comments on things that I found confusing.

Maybe this doesn't need to be entirely settled now, but it isn't clear to me what you imagine the IR to look like before outlining and how the transition from that to the outlined filter functions will happen. In particular, could you describe what would happen if the filter condition made reference to local variables in the function?

For instance, suppose I wanted a filter in the safe_div example that looked like this:

__except(GetExceptionCode() == EXCEPTION_INT_DIVIDE_BY_ZERO && *n == 0)

what would that look like?

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
248 ↗(On Diff #16312)

The new exception handling implementation doesn't use Itanium encoding, does it?

lib/CodeGen/AsmPrinter/EHStreamer.h
68 ↗(On Diff #16312)

This comment is now wrong. Also, does a null pointer still mean there is no landing pad?

lib/CodeGen/AsmPrinter/Win64Exception.cpp
113 ↗(On Diff #16312)

Will there eventually be another call added here if the personality function is _CxxFrameHandler or will that be considered a different exception handling type?

141 ↗(On Diff #16312)

The comment "Zero for __finally" applies to the "FilterOrFinally" member, right? That wasn't clear to me on first reading.

159 ↗(On Diff #16312)

Could you clarify what "as we would for DWARF" means? This comment doesn't really tell me what's happening here.

215 ↗(On Diff #16312)

The existing naming is a bit unfortunate. I take it this is referring to landingpad type filters, not SEH filters. At least in the Windows-specific code perhaps we can use a term like "type filters."

223 ↗(On Diff #16312)

Based on the comments above, this must be a function pointer, right? I'm confused by the references to TypeID.

226 ↗(On Diff #16312)

Are there two different ways to specify a __finally handler?

test/CodeGen/X86/seh-safe-div.ll
36 ↗(On Diff #16312)

Can you explain how the results from the filter function get propagated to the landingpad return value?

126 ↗(On Diff #16312)

I take it this function signature is dictated by the __C_specific_handler implementation. Can you document that and describe the parameters?

rnk added inline comments.Nov 18 2014, 3:04 PM
include/llvm/MC/MCAsmInfo.h
50 ↗(On Diff #16312)

That's true, but this model will handle all of them. MSVC has two main EH personalites: C++ EH and SEH. Both use outlined cleanups, and this EH model will be used to trigger that outlining in the backend. The language-specific tables will key off the name of the personality function instead of this enum.

Maybe I could call this OutlinedWinEH or something to make it clearer that it's more of a family?

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
248 ↗(On Diff #16312)

It's confusing. There are two "Itanium" things here, neither of which is the ISA. There is the EH chapter of the Itanium C++ ABI, documented here (http://mentorembedded.github.io/cxx-abi/abi-eh.html), and there is the encoding of the UNWIND_INFO opcodes in .xdata. Saleem added this enum, so maybe he knows more. Presumably what happened was that Microsoft reused the opcodes they were going to use for Itanium for x86_64.

lib/CodeGen/AsmPrinter/EHStreamer.h
68 ↗(On Diff #16312)

Woops. Yeah, null means "gap", which in the Itanium LSDA model requires emitting an entry in the call site table.

lib/CodeGen/AsmPrinter/Win64Exception.cpp
113 ↗(On Diff #16312)

Yes, other personality functions such as _CxxFrameHandler and _except_handler3 will be handled here under the same overall EH scheme. They have different tables, but a similar model that requires outlining.

141 ↗(On Diff #16312)

OK, I added some more text to the previous paragraph, as there isn't room inline here.

159 ↗(On Diff #16312)

I guess I meant "as we would for the Itanium LSDA" instead. This comment predated my attempt to separate DWARF CFI from the Itanium LSDA. Previously those concepts were conflated.

215 ↗(On Diff #16312)

How about "filter clauses in landing pads"?

223 ↗(On Diff #16312)

Yes, it's a function pointer, but the existing code consistently refers to this operand of the landing pad catch clause as the "type info" and it is assigned a "type id" returned from @llvm.eh.typeid.for(i8* @my_type_info). More comments...

226 ↗(On Diff #16312)

"catch i8* null" is a catch-all handler, not a __finally handler. The difference is that unwinding will continue through __finally, but catch-all handles the exception.

test/CodeGen/X86/seh-safe-div.ll
36 ↗(On Diff #16312)

This %ehptr is just zero for now. =/ It's probably in some live-in physical register (%eax?), though.

The result of the filter functions is reflected by the selector value, which indicates which filter function fired.

126 ↗(On Diff #16312)

I can do that here, but I was planning to do that in the not-yet implemented CodeGenPrep pass that needs to interact with the parameters.

rjmccall added inline comments.Nov 18 2014, 3:09 PM
include/llvm/MC/MCAsmInfo.h
50 ↗(On Diff #16312)

I meant more 32-bit vs. 64-bit EH, unless you think those will be implemented by the same backend scheme.

rnk added a comment.Nov 18 2014, 3:36 PM

I've made some specific comments on things that I found confusing.

Maybe this doesn't need to be entirely settled now, but it isn't clear to me what you imagine the IR to look like before outlining and how the transition from that to the outlined filter functions will happen. In particular, could you describe what would happen if the filter condition made reference to local variables in the function?

For instance, suppose I wanted a filter in the safe_div example that looked like this:

__except(GetExceptionCode() == EXCEPTION_INT_DIVIDE_BY_ZERO && *n == 0)

what would that look like?

I should post this to the RFC, but I expect the frontend to emit the filter expression code inline in the landingpad, conditional under typeid dispatch. The frontend should create stub filter functions that are filled in by the backend. The backend would trace out basic blocks starting at the landingpad assuming a constant selector value. Tracing would stop at @llvm.eh.seh.filter. Any function-ending terminator such as ret, resume, or unreachable would become unreachable in the outlined filter function. In the parent function, the oulined code would be deleted.

This has a weird interaction with __finally, because we can't outline the cleanup code into the filter function. I think this can be surmounted by changing clang to emit filter expressions before cleanups. This is actually correct under the two-stage unwinding rules given that all filter functions are called until one returns true, all cleanups are called until the final frame is reached, and then control is transferred to the recovering frame.

Here's example IR that I wrote manually:

define i32 @safe_div_filt0(i8* %eh_ptrs, i8* %rbp) {
  unreachable ; Stub function, filled in by SEHPrepare
}
define i32 @safe_div_filt1(i8* %eh_ptrs, i8* %rbp) {
  unreachable ; Stub function, filled in by SEHPrepare
}
declare void @cleanup()

define i32 @safe_div(i32* %n, i32* %d) {
entry:
  %r = alloca i32, align 4
  invoke void @try_body(i32* %r, i32* %n, i32* %d)
          to label %__try.cont unwind label %lpad

lpad:
  %vals = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__C_specific_handler to i8*)
          cleanup
          catch i8* bitcast (i32 (i8*, i8*)* @safe_div_filt0 to i8*)
          catch i8* bitcast (i32 (i8*, i8*)* @safe_div_filt1 to i8*)
  %eh_ptrs = extractvalue { i8*, i32 } %vals, 0
  %sel = extractvalue { i8*, i32 } %vals, 1
  %filt0_val = call i32 @llvm.eh.typeid.for(i8* bitcast (i32 (i8*, i8*)* @safe_div_filt0 to i8*))
  %is_filt0 = icmp eq i32 %sel, %filt0_val
  br i1 %is_filt0, label %filter0, label %eh.dispatch1

eh.dispatch1:
  %filt1_val = call i32 @llvm.eh.typeid.for(i8* bitcast (i32 (i8*, i8*)* @safe_div_filt1 to i8*))
  %is_filt1 = icmp eq i32 %sel, %filt1_val
  br i1 %is_filt1, label %filter1, label %__finally

filter0:  ; SEHPrepare traces out this basic block, after folding away preceding dispatch.
  %eh_ptrs_c.0 = bitcast i8* %eh_ptrs to i32**
  %eh_rec.0 = load i32** %eh_ptrs_c.0
  %eh_code.0 = load i32* %eh_rec.0
  ; EXCEPTION_ACCESS_VIOLATION = 0xC0000005
  %cmp.0 = icmp eq i32 %eh_code.0, 3221225477
  %filt0.res = zext i1 %cmp.1 to i32
  ; FILTER OUTLINING ENDS HERE
  call void @llvm.eh.seh.filter(i32 %filt0.res)
  br label %__finally

filter1:
  %eh_ptrs_c.1 = bitcast i8* %eh_ptrs to i32**
  %eh_rec.1 = load i32** %eh_ptrs_c.1
  %eh_code.1 = load i32* %eh_rec.1
  ; EXCEPTION_INT_DIVIDE_BY_ZERO = 0xC0000094
  %cmp.1 = icmp eq i32 %eh_code.1, 3221225620
  %filt1.res = zext i1 %cmp.1 to i32
  ; FILTER OUTLINING ENDS HERE
  call void @llvm.eh.seh.filter(i32 %filt0.res)
  br label %__finally

__finally:
  call void @cleanup()
  ; Redo eh typeid dispatch.
  br i1 %is_filt0, label %handler0, label %eh.dispatch2

eh.dispatch2:
  br i1 %is_filt1, label %handler1, label %eh.resume


handler0:
  call void @puts(i8* getelementptr ([27 x i8]* @str1, i32 0, i32 0))
  store i32 -1, i32* %r, align 4
  br label %__try.cont

handler1:
  call void @puts(i8* getelementptr ([29 x i8]* @str2, i32 0, i32 0))
  store i32 -2, i32* %r, align 4
  br label %__try.cont

eh.resume:
  resume { i8*, i32 } %vals

__try.cont:
  %safe_ret = load i32* %r, align 4
  ret i32 %safe_ret
}
rnk added inline comments.Nov 18 2014, 3:41 PM
include/llvm/MC/MCAsmInfo.h
50 ↗(On Diff #16312)

I expect the same scheme can handle both, but I won't be 100% sure until I try. Both schemes are fundamentally similar. The difference is that 64-bit has all this extra unwind information that makes it clear that the filter expression is a first-class function.

rnk edited reviewers, added: grosbach; removed: rjmccall.Nov 18 2014, 5:22 PM
rnk added a subscriber: rjmccall.

Hey Jim, John says I need a backend person. Want to take a look? :)

lib/CodeGen/AsmPrinter/Win64Exception.cpp
159 ↗(On Diff #16312)

Yeah, I understood that much. I just didn't know what it's actually doing. The comment doesn't seem to tell me anything that the name of the computeCallSiteTable function tells me.

This is computing the LabelStart and LabelEnd values for the table, right? Then whatever is going into FirstActions is ignored? I guess maybe the comment just confused me more than it helped.

215 ↗(On Diff #16312)

Yeah, that's clear.

226 ↗(On Diff #16312)

It looks like the same information is being written to the table in either case. How does that work?

rnk added inline comments.Nov 18 2014, 6:46 PM
lib/CodeGen/AsmPrinter/Win64Exception.cpp
226 ↗(On Diff #16312)

This should be __finally:

try_begin_label
try_end_label
finally_callback
0 // zero for landing pad label

This should be catch all:

try_begin_label
try_end_label
0 // no filter function
landing_pad_label

Clear how that is different?

rnk updated this revision to Diff 16390.Nov 19 2014, 10:33 AM
  • Add comments
lib/CodeGen/AsmPrinter/Win64Exception.cpp
132 ↗(On Diff #16312)

This comment isn't quite correct. EXCEPTION_DISPOSITION is an enumerated value:

typedef enum _EXCEPTION_DISPOSITION \{
    ExceptionContinueExecution, // 0
    ExceptionContinueSearch, // 1
    ExceptionNestedException, // 2
    ExceptionCollidedUnwind // 3
} EXCEPTION_DISPOSITION;

This is the type of the value returned by the MS runtime personality function. The values you are referring to are just defined constants:

#define EXCEPTION_EXECUTE_HANDLER       1
#define EXCEPTION_CONTINUE_SEARCH       0
#define EXCEPTION_CONTINUE_EXECUTION    -1

I made a mistake with this when I was experimenting with a custom personality function. I couldn't understand why I had to return EXCEPTION_EXECUTE_HANDLER (1) to get the OS to skip my frame and find a handler lower on the stack. It was because my personality function should have been returning ExceptionContinueSearch (1). You are, of course, correct that the filter function needs to return 1, 0 or -1.

In any event, I think it would be worth having the comment name the constants here. Their names are probably sufficient to document what they do.

223 ↗(On Diff #16312)

I'd be much happier if the code didn't refer to it as a type ID. At the very least the variables here could call it what it is with an explanation of why it pulled a function poitner of of an array of TypeIDs.

226 ↗(On Diff #16312)

OK, that makes sense, though it isn't clear from reading the code that a ClauseLabel of zero is an expected occurance.

I guess my question above should have been, "Are there two different ways to specify a catch all handler?" What I'm wondering about is the fact that if TypeID is zero the emitted filter/finally function is zero, and if TI (TypeInfos[TypeID - 1]) is zero the emitted filter/finally function is zero. I would have thought one of these conditions shouldn't happen.

rnk added a comment.Dec 3 2014, 5:22 PM

Ping? Eric, maybe you can take a look?

I applied this patch locally with the @llvm.frameallocate patch, and they actually work together in the way I would expect.

rnk added a reviewer: echristo.Dec 3 2014, 5:22 PM
rnk updated this revision to Diff 17124.Dec 10 2014, 2:40 PM
  • fix tests
rnk updated this revision to Diff 17513.Dec 19 2014, 1:45 PM
  • remove obsolete terminoloy about dwarf eh
echristo edited edge metadata.Dec 19 2014, 2:01 PM

<insert offline discussion>

So, as we've discussed:

a) split out the enum changes,
b) split out moving the padmap construction
c) rebase

The first two are fairly obvious and fine, we'll then get to the review of the latter.

-eric

rnk updated this revision to Diff 17521.Dec 19 2014, 2:46 PM
rnk edited edge metadata.
  • rebase
rnk edited reviewers, added: majnemer, chandlerc; removed: echristo, grosbach.Dec 22 2014, 4:32 PM

This patch is stripped down to just two things now:

  • Splitting the single landing pad BB into 1 per selector
  • Emitting the EH table expected by __C_specific_handler

This seems pretty mechanical and not very interesting from a design perspective. Looks good?

majnemer added inline comments.Dec 29 2014, 4:29 PM
lib/CodeGen/AsmPrinter/EHStreamer.cpp
587 ↗(On Diff #17521)

Huh, I never knew we had a 32-bit size limit on Itanium...

lib/CodeGen/AsmPrinter/EHStreamer.h
68 ↗(On Diff #17521)

There may be more than one.

This seems superfluous.

69 ↗(On Diff #17521)

s/zero/null/ ?

lib/CodeGen/AsmPrinter/Win64Exception.cpp
113 ↗(On Diff #17521)

It would probably be nice to twine the unexpected personality function's name.

130 ↗(On Diff #17521)

s/typeinfo/typeid/ ?

143 ↗(On Diff #17521)

This should be -1, no?

188 ↗(On Diff #17521)

What special purpose does zero have here?

230–231 ↗(On Diff #17521)

Is this an IR invariant?

235 ↗(On Diff #17521)

What special purpose does zero have here?

lib/CodeGen/AsmPrinter/Win64Exception.h
19 ↗(On Diff #17521)

Is there a reason behind this change?

rnk updated this revision to Diff 18110.Jan 13 2015, 1:21 PM
  • comments
lib/CodeGen/AsmPrinter/EHStreamer.h
69 ↗(On Diff #17521)

Sure.

lib/CodeGen/AsmPrinter/Win64Exception.cpp
113 ↗(On Diff #17521)

OK

143 ↗(On Diff #17521)

oops

188 ↗(On Diff #17521)

Zero is the cleanup typeid, negative numbers are for filters. So this skips filters if present. They don't have defined semantics with the SEH personality function.

230–231 ↗(On Diff #17521)

No. What should we do with these? We can't really implement filters with the SEH personality function.

235 ↗(On Diff #17521)

typeid zero is always a cleanup. This doesn't actually implement them correctly, though, it creates a catch-all.

lib/CodeGen/AsmPrinter/Win64Exception.h
19 ↗(On Diff #17521)

Removed

majnemer accepted this revision.Jan 13 2015, 2:25 PM
majnemer edited edge metadata.

LGTM if you add a predicate which breaks down what values of TypeID correspond to what SEH action.

We should also rename @llvm.eh.typeid.for to something less Itanium specific to enhance clarity, though such a change shouldn't block this code.

lib/CodeGen/AsmPrinter/Win64Exception.cpp
230–231 ↗(On Diff #17521)

As discussed offline, such entries should be skipped.

This revision is now accepted and ready to land.Jan 13 2015, 2:25 PM
This revision was automatically updated to reflect the committed changes.