
[Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 1
Needs ReviewPublic

Authored by tentzen on May 20 2020, 5:01 PM.

Details

Summary

This patch is the Part-1 (FE Clang) implementation of HW Exception handling.

This new feature adds support for hardware exceptions in Microsoft Windows SEH (Structured Exception Handling). This is the first step of the project; only the x86_64 target is enabled in this patch. The plan is to add the AArch64, ARM, and X86 targets later.

Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -feh-asynch, in addition to the usual -triple x86_64-windows, -fexceptions, and -fcxx-exceptions.

NOTE: Without -EHa or -feh-asynch, this patch is a no-diff change.

The rules for C code:
For C code, one way (the MSVC approach) to achieve SEH -EHa semantics is to follow three rules. First, no exception can move into or out of a _try region; i.e., no potentially faulting instruction can be moved across a _try boundary. Second, the order of exceptions for instructions directly under a _try must be preserved (this does not apply to those in callees). Finally, global states (local/global/heap variables) that can be read outside of the _try region must be updated in memory (not just in a register) before a subsequent exception occurs.
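The third rule can be illustrated with a portable Python analogy (Python exceptions are precise and stand in for hardware faults here; this is not SEH syntax, and the names are purely illustrative):

```python
# Python analogy for rule 3: a global update made inside a protected region
# must be committed (visible in memory) before a subsequent fault occurs,
# so that a handler reading it sees the updated value.
state = {"flag": 0}

def protected_region(buf):
    try:                      # stands in for an SEH _try
        state["flag"] = 1     # must be committed before the next line can fault
        return buf[0]         # may "fault" (IndexError stands in for an AV)
    except IndexError:        # stands in for the _except handler
        return state["flag"]  # the handler must observe the committed update

print(protected_region([]))   # faulting path: the handler sees flag == 1
```

If the compiler kept `state["flag"]` in a register past the faulting access, the handler would observe the stale value 0; the rule forbids exactly that.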

The impact on C++ code:
Although SEH is a feature for C code, -EHa has a profound effect on the C++ side as well. When a C++ function (in the same compilation unit, compiled with -EHa) is called by an SEH C function, a hardware exception that occurs in the C++ code can also be handled properly by an upstream SEH _try handler or a C++ catch(…). As such, when that happens in the middle of an object’s life scope, the dtor must be invoked the same way as for a C++ synchronous exception during the unwinding process.
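The required unwinding behavior can be sketched with a Python analogy (the context manager's __exit__ stands in for a C++ destructor, and the caught exception stands in for a hardware fault unwinding through C++ frames; none of this is the patch's actual code):

```python
log = []

class Guarded:
    """Stands in for a C++ object with a non-trivial destructor."""
    def __enter__(self):
        log.append("ctor")
        return self
    def __exit__(self, exc_type, exc, tb):
        log.append("dtor")    # must run even when a "hardware" fault unwinds
        return False          # do not swallow the exception

def seh_caller():
    try:                      # stands in for an upstream _try / catch(...)
        with Guarded():
            1 // 0            # the fault occurs mid-scope
    except ZeroDivisionError:
        log.append("handler")

seh_caller()
print(log)                    # ['ctor', 'dtor', 'handler']
```

The point of -EHa is that the "dtor" step must happen before the handler runs even when the exception originates from an arbitrary faulting instruction rather than a call.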

Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge to be added on memory/computation instructions (the previous iload/istore idea) so that the exception path is modeled precisely in the flow graph. However, tracking every single memory instruction and potentially faulting instruction can create many invokes, complicate the flow graph, and possibly cause a negative performance impact for downstream optimization and code generation. Making all optimizations aware of the new semantics is also a substantial effort.

This design does not intend to model the exception path at the instruction level. Instead, the proposed design tracks and reports EH state at the BLOCK level to reduce the complexity of the flow graph and minimize the performance impact on C++ code under the -EHa option.

One key element of this design is the ability to compute the State number at the block level. Our algorithm is based on the following rationale:

  • A _try scope is always a SEME (Single Entry Multiple Exits) region, as jumping into a _try is not allowed.
  • The single entry must start with a seh_try_begin() invoke carrying the correct State number, which is the initial state of the SEME region.
  • Through control flow, the state number is propagated into all blocks.
  • Side exits marked by seh_try_end() will unwind to the parent state based on the existing SEHUnwindMap[].
  • Side exits can ONLY jump into parent scopes (lower state numbers).
  • Thus, when a block inherits various states from its predecessors, the lowest State wins.
  • If some exits flow to unreachable, propagation on those paths terminates without affecting the remaining blocks.

For C++ code, an object lifetime region is usually a SEME, just like a SEH _try. However, there is one rare exception: jumping into a lifetime scope that has a dtor but no ctor is warned about, but allowed:

note: jump bypasses variable with a non-trivial destructor

In this case, the region is actually a MEME (Multiple Entries Multiple Exits) region. Our solution is to inject an eha_scope_begin() invoke in the side-entry block to ensure a correct State.
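The block-state computation described above can be sketched as a forward propagation over the CFG in which a block that receives different states from its predecessors keeps the lowest one. This is a toy model, not the patch's code: the names (succs, overrides, entry_state) and the dictionary-based CFG are assumptions made for illustration.

```python
from collections import deque

def propagate_states(succs, entry_block, entry_state, overrides):
    """Toy model of block-level SEH state propagation.

    succs:       block -> list of successor blocks (the CFG)
    entry_state: state set by the seh_try_begin()/eha_scope_begin() invoke
                 at the single entry of the SEME region
    overrides:   blocks whose state is pinned by an intrinsic, e.g. a
                 seh_try_end() side exit unwinding to a parent (lower) state
    """
    state = {entry_block: entry_state}
    work = deque([entry_block])
    while work:
        b = work.popleft()
        for s in succs.get(b, []):
            # a pinned side exit takes its parent state; otherwise the
            # successor inherits the predecessor's state
            new = overrides.get(s, state[b])
            # when several states meet, the lowest (outermost) one wins
            if s not in state or new < state[s]:
                state[s] = new
                work.append(s)
    return state

# A: entry of the _try (state 1); B stays inside the _try; C is a side exit
# back to the parent scope (state 0); D is reached from both B and C.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
print(propagate_states(cfg, "A", 1, {"C": 0}))
# {'A': 1, 'B': 1, 'C': 0, 'D': 0}
```

D illustrates the meet rule: it inherits state 1 from B and state 0 from C, and the lower state 0 wins, matching "side exits can only jump into parent scopes".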

Implementation:
Part-1: Clang implementation described below.

  • Two intrinsics are created to track C++ object scopes: eha_scope_begin() and eha_scope_end(). _scope_begin() is added immediately after ctor() is called and EHStack is pushed, so it must be an invoke, not a call. This also guarantees that an EH cleanup pad is created regardless of whether there is a call in the scope. _scope_end() is added before dtor(). These two intrinsics make the computation of block state possible in the downstream code-gen pass, even in the presence of ctor/dtor inlining.
  • Two intrinsics, seh_try_begin() and seh_try_end(), are added for C code to mark _try boundaries and to prevent exceptions from being moved across a _try boundary.
  • All memory instructions inside a _try are treated as 'volatile' to assure the 2nd and 3rd rules for C code above. This is slightly sub-optimal, but acceptable, as the amount of code directly under a _try is very small.

Part-2: LLVM implementation described below.

  • For both C++ and C code, the state of each block is computed in the BE at the same place (the WinEHPrepare pass) where all other EH tables/maps are calculated. In addition to _scope_begin & _scope_end, the computation of block state also relies on the existing state-tracking code (UnwindMap and InvokeStateMap).
  • For both C++ and C code, the state of each block with a potentially trapping instruction is marked and reported in the DAG instruction selection pass, the same place where the state for -EHsc (synchronous exceptions) is handled.
  • If the first instruction in a reported block scope can trap, a nop is injected before it. This nop is needed to accommodate the LLVM Windows EH implementation, in which the address in the IPToState table is offset by +1. (Note: the purpose of that offset is to ensure the return address of a call is in the same scope as the call address.)
  • The handler for catch(...) under -EHa must handle HW exceptions, so its 'adjectives' flag is reset (it cannot be IsStdDotDot (0x40), which catches only C++ exceptions).
  • Suppress the push/popTerminate() scope (from noexcept/nothrow) so that HW exceptions can be passed through.

Diff Detail

Event Timeline


This should likely be at least 3 patches: llvm middle-end, llvm codegen, clang.
Langref changes missing for new intrinsics.
Please post all patches with full context (-U99999)

tentzen marked 3 inline comments as done.May 22 2020, 3:25 PM

This should likely be at least 3 patches: llvm middle-end, llvm codegen, clang.
Langref changes missing for new intrinsics.
Please post all patches with full context (-U99999)

OK, makes sense. I will separate it to at least two patches, one Clang and one LLVM CodeGen as there is not much change in middle-end & Opt.
Thank you for pointing out the missing change in LangRef.rst. I was planning to update docs/ExceptionHandling.rst after this patch is accepted. Do you think this type of intrinsic should be described in LangRef?
thanks,

tentzen updated this revision to Diff 265812.May 22 2020, 3:47 PM
tentzen retitled this revision from [Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC option -EHa) to [Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 1.
tentzen edited the summary of this revision. (Show Details)
tentzen added a reviewer: lebedev.ri.

Per Roman Lebedev's feedback, divide the patch into Clang and LLVM.

lebedev.ri resigned from this revision.May 22 2020, 4:12 PM

This should likely be at least 3 patches: llvm middle-end, llvm codegen, clang.
Langref changes missing for new intrinsics.
Please post all patches with full context (-U99999)

I was planning to update docs/ExceptionHandling.rst after this patch is accepted. Do you think this type of intrinsic should be described in LangRef?
thanks,

It may be helpful (even for the reviewers) to first specify their behavior,
instead of writing that after-the-fact "backwardly" based on the implementation.

It may be helpful (even for the reviewers) to first specify their behavior,
instead of writing that after-the-fact "backwardly" based on the implementation.

For reviewers, the purpose of those intrinsics is described in the Summary section:

  • Two intrinsics are created to track C++ object scopes: eha_scope_begin() and eha_scope_end(). _scope_begin() is added immediately after ctor() is called and EHStack is pushed, so it must be an invoke, not a call. This also guarantees that an EH cleanup pad is created regardless of whether there is a call in the scope. _scope_end() is added before dtor(). These two intrinsics make the computation of block state possible in the downstream code-gen pass, even in the presence of ctor/dtor inlining.
  • Two intrinsics, seh_try_begin() and seh_try_end(), are added for C code to mark _try boundaries and to prevent exceptions from being moved across a _try boundary.
tentzen updated this revision to Diff 265819.May 22 2020, 5:06 PM
tentzen removed a reviewer: lebedev.ri.

fixed tidy warnings

It may be helpful (even for the reviewers) to first specify their behavior,
instead of writing that after-the-fact "backwardly" based on the implementation.

For reviewers, the purpose of those intrinsics is described in the Summary section:

Like the discussion we just had in the HWLoops patch, unless the behavior of new intrinsics/etc. is stated in the LangRef, they are unspecified.

  • Two intrinsics are created to track C++ object scopes: eha_scope_begin() and eha_scope_end(). _scope_begin() is added immediately after ctor() is called and EHStack is pushed, so it must be an invoke, not a call. This also guarantees that an EH cleanup pad is created regardless of whether there is a call in the scope. _scope_end() is added before dtor(). These two intrinsics make the computation of block state possible in the downstream code-gen pass, even in the presence of ctor/dtor inlining.
  • Two intrinsics, seh_try_begin() and seh_try_end(), are added for C code to mark _try boundaries and to prevent exceptions from being moved across a _try boundary.
tentzen updated this revision to Diff 266374.May 26 2020, 4:45 PM

update LangRef.rst for new intrinsics

Is there any feedback? thanks,

majnemer added inline comments.Jun 3 2020, 6:18 PM
clang/include/clang/AST/Stmt.h
1783

I think this should be isSideEntry to be consistent.

clang/lib/CodeGen/CGException.cpp
603–609

I think this logic should move into MicrosoftCXXABI::getCatchAllTypeInfo.

tentzen marked 3 inline comments as done.Jun 3 2020, 11:44 PM

thank you David. will update and submit a new patch shortly.

clang/lib/CodeGen/CGException.cpp
603–609

good point. thanks

tentzen updated this revision to Diff 268375.Jun 4 2020, 12:36 AM
tentzen marked an inline comment as done.
tentzen edited the summary of this revision. (Show Details)

update a couple of changes per David Majnemer's suggestion.

does it look good to you? or any more comment?
thanks

asmith accepted this revision.Sep 3 2020, 11:03 AM
This revision is now accepted and ready to land.Sep 3 2020, 11:03 AM
lebedev.ri requested changes to this revision.Sep 3 2020, 11:35 AM

I feel like this may be trying to do too many things at once.
Was there an RFC?
There are several patches here:

  1. langref
  2. llvm side of the patch (maybe should be in the previous patch)
  3. SideEntry changes - missing astdump changes, serialization/deserialization
  4. the rest of the owl
clang/lib/CodeGen/CGCleanup.cpp
777

Here and elsewhere: comments should start with a capital letter.

This revision now requires changes to proceed.Sep 3 2020, 11:35 AM

Yes there was an RFC and discussion and several requests for comments.
http://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html

Yes there was an RFC and discussion and several requests for comments.
http://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html

It is probably good to state so in the patch's description.
But as far as I can tell, the discussion died down without an explicit acceptance?

I've made some brief comments about the code, but I think I have much larger concerns here.

The first is whether LLVM really intends to satisfy the constraints necessary to make these exceptions work, which I don't think you've gotten clear consensus about at all. Unfortunately, llvm-dev is not an effective place to gather this kind of consensus. For a change of this magnitude, I think you need to be proactively gathering a group of interested parties to sort out whether this is something we should support, not just expecting people to reply to an RFC.

The second is that I think the concept of block-level tracking state / control flow is potentially really problematic for LLVM, and the entire design here seems contingent on it. Again, this is something we need to ensure consensus on, not something we can really just sign off on in code review.

The third is that I'm not really convinced that the way we handle cleanups in Clang today is sufficiently "atomic" in design to accommodate this. I need to think about it.

clang/include/clang/Basic/LangOptions.def
131

This is usually shortened as "async", but I'm not sure why it's being shortened at all. Also, hardware faults are synchronous. Also, it applies in all language modes.

clang/lib/CodeGen/CGCleanup.cpp
769

Readers of this code shouldn't have to remember what cl.exe command-line flags mean. Please call this something like RequiresSEHScopeEnd.

783

Please put braces around the outer if, and please be consistent about braces for the inner clauses.

1305

I think you should take the callee by value. And please rename this to not use "EHa".

clang/lib/CodeGen/CGDecl.cpp
2008 ↗(On Diff #268375)

This seems like a pretty random place to do this. Presumably this needs to be tied to basically any scope with EH protection?

Hi, John, thank you for reviewing this patch and providing feedback.
regarding your comments:

1. Early in the RFC thread, Reid K (a major contributor to the Windows SEH/EH support) agreed that "the value proposition is clear and large".
2. Could you elaborate on why you think "block-level tracking state / control flow is potentially really problematic for LLVM.."? Could you be more specific or provide some examples? A reminder that this design also relies on the existing EH framework to compute block state, and it has passed the MSVC SEH suite, which contains many complicated SEH cases.
3. I don't understand your concern that "the way we handle cleanups in Clang today is sufficiently "atomic" in design to accommodate this". From the cleanup perspective, there is no difference between HW exceptions and non-HW exceptions. The key to handling HW exceptions is to catch exceptions on the faulting / memory instructions. Once one is caught, the cleanup process is identical to that for a non-HW exception. Again, could you elaborate further?

Finally, regarding naming/options in the code, -EHa or "Asynchronous" is the terminology used by MSVC on Windows. Programmers who use/care about this feature should know it well, as it is the feature that matches MSVC on Windows.
Thanks,

Hi, John, thank you for reviewing this patch and providing feedback.
regarding your comments:

1. Early in the RFC thread, Reid K (a major contributor to the Windows SEH/EH support) agreed that "the value proposition is clear and large".

And I agree with him that the potential benefits are substantial. I'm just saying that one barely-trafficked RFC thread is not evidence of a community consensus.

2. Could you elaborate on why you think "block-level tracking state / control flow is potentially really problematic for LLVM.."? Could you be more specific or provide some examples?

As I understand it, you are creating implicit edges within the function to the active SEH handler (potentially part of the same function) from arbitrary places within covered basic blocks, turning basic blocks from a single-entry single-(internal-)exit representation to a single-entry multiple-exit representation. That is a huge change. How do you expect LLVM transformations to preserve or merge this information across blocks? How do you expect transformations to not move code between blocks in problematic ways, or reorder code within blocks in ways that will suddenly be visible to the optimizer? These are the standard objections to supporting features like this, that they have basically unspecified behavior and appear to work by good intentions and happy thoughts.

A reminder that this design also relies on the existing EH framework to compute block state, and it has passed the MSVC SEH suite, which contains many complicated SEH cases.

Does it pass under all combinations of optimization and code-generation settings? How do LLVM contributors ensure that this test suite continues to pass?

3. I don't understand your concern that "the way we handle cleanups in Clang today is sufficiently "atomic" in design to accommodate this". From the cleanup perspective, there is no difference between HW exceptions and non-HW exceptions. The key to handling HW exceptions is to catch exceptions on the faulting / memory instructions. Once one is caught, the cleanup process is identical to that for a non-HW exception. Again, could you elaborate further?

There is a major difference between HW exceptions and non-HW exceptions, which is that non-HW exceptions are observed only at call sites. It is quite easy to write code in IRGen that works because the code emitted at a certain point isn't totally arbitrary.

Finally, regarding naming/options in the code, -EHa or "Asynchronous" is the terminology used by MSVC on Windows. Programmers who use/care about this feature should know it well, as it is the feature that matches MSVC on Windows.

Yes, but it's not known well to all the compiler developers who are reading and maintaining the Clang codebase. I'm not saying you should rename the command-line flag, I'm asking that you not just say "EHa" in comments and identifiers.

Thank you for prompt reply again.

[rjmccall] And I agree with him that the potential benefits are substantial. I'm just saying that one barely-trafficked RFC thread is not evidence of a community consensus.

OK, thanks. It's good to know you are also supportive of adding this feature.

[rjmccall] As I understand it, you are creating implicit edges within the function to the active SEH handler (potentially part of the same function) from arbitrary places within covered basic blocks, turning basic blocks from a single-entry single-(internal-)exit representation to a single-entry multiple-exit representation. That is a huge change. How do you expect LLVM transformations to preserve or merge this information across blocks? How do you expect transformations to not move code between blocks in problematic ways, or reorder code within blocks in ways that will suddenly be visible to the optimizer? These are the standard objections to supporting features like this, that they have basically unspecified behavior and appear to work by good intentions and happy thoughts.

There is absolutely NO explicit edge added in this design. You misunderstood the term SEME here. It refers to a SEH scope region of many blocks that can have multiple blocks jumping out of the region, not multiple branches jumping out of a block. Since blocks in LLVM are already 'completely' connected by control flow (there is no lexical concept at all before layout), there is nothing we need to do to build this SEME region. For LLVM transformations/optimizations, the only constraint added is the volatile attribute on memory operations within a SEH region. As described earlier, it's a little sub-optimal, but given that the amount of code directly under a SEH scope is very small, the impact on overall code quality is minor. Volatile-ness is robust, as it's a fundamental attribute that is expected to be honored by the entire framework.

[rjmccall] Does it pass under all combinations of optimization and code-generation settings? How do LLVM contributors ensure that this test suite continues to pass?

Yes. This is just the first patch; there will be a second patch to land the LLVM code-gen part, then a third patch to add some test cases (e.g., xcpt4u.c from MSVC). Without the -EHa option, the patch is a zero-impact change for now.

[rjmccall] There is a major difference between HW exceptions and non-HW exceptions, which is that non-HW exceptions are observed only at call sites. It is quite easy to write code in IRGen that works because the code emitted at a certain point isn't totally arbitrary

I'm still not completely sure what you are referring to, so let me guess. Are you concerned that the cleanup mechanism of a SEH region may not be set up properly if there is no call instruction in the region? This is actually a good question. The answer is that the presence of the seh.try.begin() intrinsic ensures that a cleanup stack is established properly, because an EH edge is built for seh.try.begin() automatically. I hope this answers your concern.

[rjmccall] Yes, but it's not known well to all the compiler developers who are reading and maintaining the Clang codebase. I'm not saying you should rename the command-line flag, I'm asking that you not just say "EHa" in comments and identifiers.

Ok, no problem. more comments will be added.

Thank you for prompt reply again.

[rjmccall] And I agree with him that the potential benefits are substantial. I'm just saying that one barely-trafficked RFC thread is not evidence of a community consensus.

OK, thanks. It's good to know you are also supportive of adding this feature.

[rjmccall] As I understand it, you are creating implicit edges within the function to the active SEH handler (potentially part of the same function) from arbitrary places within covered basic blocks, turning basic blocks from a single-entry single-(internal-)exit representation to a single-entry multiple-exit representation. That is a huge change. How do you expect LLVM transformations to preserve or merge this information across blocks? How do you expect transformations to not move code between blocks in problematic ways, or reorder code within blocks in ways that will suddenly be visible to the optimizer? These are the standard objections to supporting features like this, that they have basically unspecified behavior and appear to work by good intentions and happy thoughts.

There is absolutely NO explicit edge added in this design.

I didn't say there was. I said that there was an *implicit* edge: from the memory access to the handler. It's precisely because that edge is implicit that it's problematic for analysis and optimization.

For LLVM transformations/optimizations, the only constraint added is the volatile attribute on memory operations within a SEH region.

When is this attribute applied? Because you can't just apply it at a single point; you also need to mark operations when they're copied/inlined into the region. We once had a similar attempt at handling exceptions with a setjmp-based ABI (not the "SJLJ" exceptions supported by the backends, something frontend-centric) that had persistent problems because of poor modeling in the IR, which it feels like this is doomed to repeat.

[rjmccall] Does it pass under all combinations of optimization and code-generation settings? How do LLVM contributors ensure that this test suite continues to pass?

Yes. This is just the first patch; there will be a second patch to land the LLVM code-gen part, then a third patch to add some test cases (e.g., xcpt4u.c from MSVC). Without the -EHa option, the patch is a zero-impact change for now.

Do you expect to maintain a CI bot that runs the full test suite continuously as changes go into LLVM? Because this sort of thing is extremely easy to break.

[rjmccall] There is a major difference between HW exceptions and non-HW exceptions, which is that non-HW exceptions are observed only at call sites. It is quite easy to write code in IRGen that works because the code emitted at a certain point isn't totally arbitrary

I'm still not completely sure what you are referring to, so let me guess. Are you concerned that the cleanup mechanism of a SEH region may not be set up properly if there is no call instruction in the region? This is actually a good question. The answer is that the presence of the seh.try.begin() intrinsic ensures that a cleanup stack is established properly, because an EH edge is built for seh.try.begin() automatically. I hope this answers your concern.

Does this happen whenever there are changes to the active cleanups, and not just when entering the scope? Since you're not modeling the control edge from the access to the handler, IRGen isn't naturally going to create a cleanup just because there's a memory access in a scope, and even if it did, the cleanup would appear to be unreachable.

That's really a major part of my concern, that the test suite might be passing because of some common structure to the tests which minimizes the impact of the poor modeling in IR. IR modeling inadequacies are not going to cause obvious problems if in practice the SEH scope is always entered in a function that doesn't do much inside the scope besides call a function, and then the trap happens with the callee. In fact, I would be a lot more confident in this feature if you actually forced the IR to model it that way, so that there's nothing but a by-fiat unoptimizable invoke within the scope. You could extract the function in the frontend and then, if necessary, inline it in a very late pass after any IR-level optimizations, although that would leave me still concerned about LLVM CodeGen's transformations.

But even doing that wouldn't help with code like the following, where the only operation within the scope of s2 is a memory access:

void test() {
  std::string s1 = <blah>;
  int *ip = get_invalid_ptr();
  std::string s2 = <blah>;
  *ip = 0;
}

[rjmccall] Yes, but it's not known well to all the compiler developers who are reading and maintaining the Clang codebase. I'm not saying you should rename the command-line flag, I'm asking that you not just say "EHa" in comments and identifiers.

Ok, no problem. more comments will be added.

Comments and identifiers. So, for example, the intrinsic should not be called eha_scope_begin.

Thank you for prompt reply again.

[rjmccall] And I agree with him that the potential benefits are substantial. I'm just saying that one barely-trafficked RFC thread is not evidence of a community consensus.

OK, thanks. It's good to know you are also supportive of adding this feature.

[rjmccall] As I understand it, you are creating implicit edges within the function to the active SEH handler (potentially part of the same function) from arbitrary places within covered basic blocks, turning basic blocks from a single-entry single-(internal-)exit representation to a single-entry multiple-exit representation. That is a huge change. How do you expect LLVM transformations to preserve or merge this information across blocks? How do you expect transformations to not move code between blocks in problematic ways, or reorder code within blocks in ways that will suddenly be visible to the optimizer? These are the standard objections to supporting features like this, that they have basically unspecified behavior and appear to work by good intentions and happy thoughts.

There is absolutely NO explicit edge added in this design.

I didn't say there was. I said that there was an *implicit* edge: from the memory access to the handler. It's precisely because that edge is implicit that it's problematic for analysis and optimization.

Oh, sorry, I misread it.
For HW exceptions, the 'implicit' edge is unavoidable unless an explicit approach (like the previous iload/istore) is employed. The iload approach was extensively discussed and evaluated when it was proposed. As far as I know, the concerns were that it could create many invokes, complicate the flow graph, and possibly result in a negative performance impact for downstream optimization and code generation. Making all optimizations aware of the new semantics is also substantial.
So this design applies an IMPLICIT approach. I respectfully disagree that it is problematic, because as long as the volatile attribute is honored, it is robust. Please see the C and C++ SEH rules stated in the patch description section.

For LLVM transformations/optimizations, the only constraint added is the volatile attribute on memory operations within a SEH region.

When is this attribute applied? Because you can't just apply it at a single point; you also need to mark operations when they're copied/inlined into the region. We once had a similar attempt at handling exceptions with a setjmp-based ABI (not the "SJLJ" exceptions supported by the backends, something frontend-centric) that had persistent problems because of poor modeling in the IR, which it feels like this is doomed to repeat.

This is applied only once, in the FE; consider it as if it came from the source code. So no, there is no other place we need to change. Per SEH semantics, inlined code is immune from the volatile constraint.
I don't think this is the same as the SJLJ story. I once looked into the SJLJ problem when I found every single test in the MSVC setjmp suite was broken; it was because the implicit back edge from longjmp (or a callee with longjmp) was not modeled properly. I actually have a simple idea that could fix the SJLJ problem robustly, but I have been bogged down in this SEH task.

[rjmccall] Does it pass under all combinations of optimization and code-generation settings? How do LLVM contributors ensure that this test suite continues to pass?

Yes. This is just the first patch; there will be a second patch to land the LLVM code-gen part, then a third patch to add some test cases (e.g., xcpt4u.c from MSVC). Without the -EHa option, the patch is a zero-impact change for now.

Do you expect to maintain a CI bot that runs the full test suite continuously as changes go into LLVM? Because this sort of thing is extremely easy to break.

Yes, as long as it is a bug/flaw in the design/implementation. New opts/analyses/tools that violate the basic volatile/cleanup/EH-framework behavior exposed by the SEH test cases are not included. Isn't this the policy of the Clang/LLVM community?

[rjmccall] There is a major difference between HW exceptions and non-HW exceptions, which is that non-HW exceptions are observed only at call sites. It is quite easy to write code in IRGen that works because the code emitted at a certain point isn't totally arbitrary

I'm still not completely sure what you are referring to, so let me guess. Are you concerned that the cleanup mechanism of a SEH region may not be set up properly if there is no call instruction in the region? This is actually a good question. The answer is that the presence of the seh.try.begin() intrinsic ensures that a cleanup stack is established properly, because an EH edge is built for seh.try.begin() automatically. I hope this answers your concern.

Does this happen whenever there are changes to the active cleanups, and not just when entering the scope? Since you're not modeling the control edge from the access to the handler, IRGen isn't naturally going to create a cleanup just because there's a memory access in a scope, and even if it did, the cleanup would appear to be unreachable.

That's really a major part of my concern, that the test suite might be passing because of some common structure to the tests which minimizes the impact of the poor modeling in IR. IR modeling inadequacies are not going to cause obvious problems if in practice the SEH scope is always entered in a function that doesn't do much inside the scope besides call a function, and then the trap happens with the callee. In fact, I would be a lot more confident in this feature if you actually forced the IR to model it that way, so that there's nothing but a by-fiat unoptimizable invoke within the scope.

Yes, this is exactly what we are doing. There is a seh.try.begin() injected for every level of SEH scope. In addition to marking the beginning of a SEH SEME region, it also guarantees a cleanup is set up. This seh.try.begin() intrinsic must be turned into an invoke (not a call) because the SEH state is not zero.

You could extract the function in the frontend and then, if necessary, inline it in a very late pass after any IR-level optimizations, although that would leave me still concerned about LLVM CodeGen's transformations.

But even doing that wouldn't help with code like the following, where the only operation within the scope of s2 is a memory access:

void test() {
  std::string s1 = <blah>;
  int *ip = get_invalid_ptr();
  std::string s2 = <blah>;
  *ip = 0;
}

For the C++ case under -EHa, a seh.try.begin() (an EhaScopeBegin()) is injected at the beginning of the scope if dtor cleanup is required.

[rjmccall] Yes, but it's not known well to all the compiler developers who are reading and maintaining the Clang codebase. I'm not saying you should rename the command-line flag, I'm asking that you not just say "EHa" in comments and identifiers.

Ok, no problem. More comments will be added.

Comments and identifiers. So, for example, the intrinsic should not be called eha_scope_begin.

There is absolutely NO explicit edge added in this design.

I didn't say there was. I said that there was an *implicit* edge: from the memory access to the handler. It's precisely because that edge is implicit that it's problematic for analysis and optimization.

Oh, sorry, I misread it.
For HW exceptions, the 'implicit' edge is unavoidable unless an explicit approach (like the previous iload/istore proposal) is employed. The iload approach was extensively discussed and evaluated when it was proposed. As far as I know, the concerns were that it could create many invokes, complicate the flow graph, and possibly hurt performance in downstream optimization and code generation. Making all optimizations aware of the new semantics would also be a substantial effort.

Hmm. I suppose it's not that different from the general problem with setjmp/longjmp. I think you'll still have representation problems here, but I'll address them below; I concede that in principle marking regions with an intrinsic that instructions can't necessarily be moved over is workable, and so you don't need to turn every faulting instruction into an invoke. You may need to mark the initial seh.try.begin with something like returns_twice or otherwise communicate that there's non-obvious control flow within that function.

So this design applies an implicit approach. I respectfully disagree that it's problematic: as long as the volatile attribute is honored, it's robust. Please see the C and C++ SEH rules stated in the patch description.

You also need to make sure that potentially-faulting instructions aren't moved *into* the _try region. I don't think that just making accesses with the _try volatile achieves that. Probably the easiest way to do this would be to outline the _try region through most of the LLVM pipeline and only inline it very late.

For LLVM transformations/optimizations, the only constraint added is the volatile attribute on memory operations within a SEH region.

When is this attribute applied? Because you can't just apply it at a single point; you also need to mark operations when they're copied/inlined into the region. We once had a similar attempt at handling exceptions with a setjmp-based ABI (not the "SJLJ" exceptions supported by the backends, something frontend-centric) that had persistent problems because of poor modeling in the IR, which it feels like this is doomed to repeat.

This is applied only once, in the FE; consider it as if it came from the source code. So no, there is no other place we need to change. Per SEH semantics, inlined code is exempt from the volatile constraint.

The SEH semantics say that faulting instructions not lexically within the _try can be re-ordered with each other, but it's not clear that it's okay to re-order them with instructions within the _try. If I had to guess, I would say that the expectation is that inlining is disabled within a _try scope.

I don't think this is the same as the SJLJ story. I once looked into the SJLJ problem when I found every single test in the MSVC setjmp suite was broken; it was because the implicit back edge from longjmp (or a callee with a longjmp) was not modeled properly. I actually have a simple idea that could fix the SJLJ problem robustly, but I've been bogged down in this SEH task.

The issue I'm describing was somewhat different from the general problems of using setjmp. The C setjmp/longjmp library functions don't guarantee that stores to non-volatile objects will be visible after the longjmp; that is, they require user cooperation in order to get consistency. This creates a problem when trying to use setjmp/longjmp directly in IR to implement language exceptions: while you can relatively easily mark all the accesses within a lexical try as volatile in the frontend, non-volatile accesses can be introduced into the function (e.g. by inlining) that won't be volatile, and then they potentially get lost if an exception is thrown. A simple example of this is something like:

void assign_and_throw(int *ip) {
  *ip = 5;
  throw some_exception();
}

void test() {
  int i;
  try {
    i = 0;
    assign_and_throw(&i);
  } catch (...) {
    printf("%d\n", i);
  }
}

The frontend marks the assignment to i in test as volatile, so the compiler must emit it as a real store, and so it will be observed in the catch handler after the call to longjmp done within the throw operation. And if no inlining is done, the compiler has no choice but to emit the assignment to *ip in assign_and_throw as a real store as well. But if the compiler inlines the call to assign_and_throw, that store to *ip becomes a non-volatile store directly to i, and since there's no direct control-flow path from there to a location that can read it, that store appears to be dead and can be discarded. That is allowed under the rules for setjmp — if the user wants to ensure that this store is visible, they would have to declare i as a volatile int and ip as a volatile int * — but of course not under the rules for throw/catch. This is why frontends cannot naively lower things like EH to setjmp without any influence on the inliner.

[rjmccall] Does it pass under all combinations of optimization and code-generation settings? How do LLVM contributors ensure that this test suite continues to pass?

Yes. This is just the first patch; there will be a second patch to land the LLVM code-gen part, then a third patch to add some test cases (e.g., xcpt4u.c from MSVC). Without the -EHa option, the patch is a zero-impact change for now.

Do you expect to maintain a CI bot that runs the full test suite continuously as changes go into LLVM? Because this sort of thing is extremely easy to break.

Yes, as long as it's a bug/flaw in the design/implementation. A new optimization/analysis/tool that violates the basic volatile/cleanup/EH-framework invariants exercised by the SEH test cases is not covered. Isn't this the policy of the Clang/LLVM community?

I think you're missing what I'm asking. If LLVM accepts this feature, it will become our collective responsibility as a project to keep it working. You have a large external correctness test suite for this feature. It does not sound like you intend to open-source that test suite; instead, you intend to extract a small number of unit tests from it and add those to the LLVM test suite. So I'm asking if you're at least going to have an external CI bot which will run the full test suite for this feature to ensure it hasn't been broken by the last day of commits. It does not seem reasonable to expect that the few unit tests you extract will themselves be sufficient to keep the feature tested.

For the C++ case under -EHa, a seh.try.begin() (an EhaScopeBegin()) is injected at the beginning of the scope if dtor cleanup is required.

Okay, I think I understand how this is supposed to work, thanks.

@rjmccall wrote:
I think you're missing what I'm asking. If LLVM accepts this feature, it will become our collective responsibility as a project to keep it working. You have a large external correctness test suite for this feature. It does not sound like you intend to open-source that test suite; instead, you intend to extract a small number of unit tests from it and add those to the LLVM test suite. So I'm asking if you're at least going to have an external CI bot which will run the full test suite for this feature to ensure it hasn't been broken by the last day of commits. It does not seem reasonable to expect that the few unit tests you extract will themselves be sufficient to keep the feature tested.

Microsoft's SEH tests are open source, and there was a suggestion in one of the threads on llvm-dev to run them on the buildbots. We can look into that as a follow-on to this patch.

https://lists.llvm.org/pipermail/llvm-dev/2019-November/136695.html
https://lists.llvm.org/pipermail/llvm-dev/2020-April/140614.html

That's great. I think that needs to be taken as more than just a suggestion, though; it's critical part of the testing story for this proposed feature.

There is absolutely NO explicit edge added in this design.

I didn't say there was. I said that there was an *implicit* edge: from the memory access to the handler. It's precisely because that edge is implicit that it's problematic for analysis and optimization.

Oh, sorry, I misread it.
For HW exceptions, the 'implicit' edge is unavoidable unless an explicit approach (like the previous iload/istore proposal) is employed. The iload approach was extensively discussed and evaluated when it was proposed. As far as I know, the concerns were that it could create many invokes, complicate the flow graph, and possibly hurt performance in downstream optimization and code generation. Making all optimizations aware of the new semantics would also be a substantial effort.

Hmm. I suppose it's not that different from the general problem with setjmp/longjmp. I think you'll still have representation problems here, but I'll address them below; I concede that in principle marking regions with an intrinsic that instructions can't necessarily be moved over is workable, and so you don't need to turn every faulting instruction into an invoke. You may need to mark the initial seh.try.begin with something like returns_twice or otherwise communicate that there's non-obvious control flow within that function.

OK, good to know. Thank you!
Other than blocking some optimizations, does the returns_twice attribute have other implications?

So this design applies an implicit approach. I respectfully disagree that it's problematic: as long as the volatile attribute is honored, it's robust. Please see the C and C++ SEH rules stated in the patch description.

You also need to make sure that potentially-faulting instructions aren't moved *into* the _try region. I don't think that just making accesses with the _try volatile achieves that. Probably the easiest way to do this would be to outline the _try region through most of the LLVM pipeline and only inline it very late.

The intrinsic is marked as having unknown memory access. Combined with the returns_twice you just suggested, isn't that sufficient to block potentially-faulting instructions from being moved across?

For LLVM transformations/optimizations, the only constraint added is the volatile attribute on memory operations within a SEH region.

When is this attribute applied? Because you can't just apply it at a single point; you also need to mark operations when they're copied/inlined into the region. We once had a similar attempt at handling exceptions with a setjmp-based ABI (not the "SJLJ" exceptions supported by the backends, something frontend-centric) that had persistent problems because of poor modeling in the IR, which it feels like this is doomed to repeat.

This is applied only once, in the FE; consider it as if it came from the source code. So no, there is no other place we need to change. Per SEH semantics, inlined code is exempt from the volatile constraint.

The SEH semantics say that faulting instructions not lexically within the _try can be re-ordered with each other, but it's not clear that it's okay to re-order them with instructions within the _try. If I had to guess, I would say that the expectation is that inlining is disabled within a _try scope.

Reordering is NOT okay for instructions 'directly' within a _try. 'Directly' here means code originally from the source, i.e., inlined code or code in callees is not included. Disabling inlining in general would work, but it is not necessary.

I don't think this is the same as the SJLJ story. I once looked into the SJLJ problem when I found every single test in the MSVC setjmp suite was broken; it was because the implicit back edge from longjmp (or a callee with a longjmp) was not modeled properly. I actually have a simple idea that could fix the SJLJ problem robustly, but I've been bogged down in this SEH task.

The issue I'm describing was somewhat different from the general problems of using setjmp. The C setjmp/longjmp library functions don't guarantee that stores to non-volatile objects will be visible after the longjmp; that is, they require user cooperation in order to get consistency. This creates a problem when trying to use setjmp/longjmp directly in IR to implement language exceptions: while you can relatively easily mark all the accesses within a lexical try as volatile in the frontend, non-volatile accesses can be introduced into the function (e.g. by inlining) that won't be volatile, and then they potentially get lost if an exception is thrown. A simple example of this is something like:

void assign_and_throw(int *ip) {
  *ip = 5;
  throw some_exception();
}

void test() {
  int i;
  try {
    i = 0;
    assign_and_throw(&i);
  } catch (...) {
    printf("%d\n", i);
  }
}

The frontend marks the assignment to i in test as volatile, so the compiler must emit it as a real store, and so it will be observed in the catch handler after the call to longjmp done within the throw operation. And if no inlining is done, the compiler has no choice but to emit the assignment to *ip in assign_and_throw as a real store as well. But if the compiler inlines the call to assign_and_throw, that store to *ip becomes a non-volatile store directly to i, and since there's no direct control-flow path from there to a location that can read it, that store appears to be dead and can be discarded. That is allowed under the rules for setjmp — if the user wants to ensure that this store is visible, they would have to declare i as a volatile int and ip as a volatile int * — but of course not under the rules for throw/catch.

As said above, the SEH semantics only apply to code "directly" under a SEH scope, for a few obvious reasons (e.g., indirect callees or callees not in the same module won't be visible). If faulting instructions are allowed to be reordered inside a callee, those instructions are still allowed to be reordered after they are inlined into a _try.
Volatile-ness here is primarily a reordering constraint, not a data-flow one. Per the LLVM EH framework, data defined in a _try will flow through an explicit EH edge into the handler. In this SEH design, there will be a seh.try.end() at every exit of the SEH region that ensures data defined in the _try scope flows to its uses in the handler.

This is why frontends cannot naively lower things like EH to setjmp without any influence on the inliner.

Why would you want to do that? LLVM EH offers a robust data-flow representation, while SJLJ data-flow modeling is incomplete.

[rjmccall] Does it pass under all combinations of optimization and code-generation settings? How do LLVM contributors ensure that this test suite continues to pass?

Yes. This is just the first patch; there will be a second patch to land the LLVM code-gen part, then a third patch to add some test cases (e.g., xcpt4u.c from MSVC). Without the -EHa option, the patch is a zero-impact change for now.

Do you expect to maintain a CI bot that runs the full test suite continuously as changes go into LLVM? Because this sort of thing is extremely easy to break.

Yes, as long as it's a bug/flaw in the design/implementation. A new optimization/analysis/tool that violates the basic volatile/cleanup/EH-framework invariants exercised by the SEH test cases is not covered. Isn't this the policy of the Clang/LLVM community?

I think you're missing what I'm asking. If LLVM accepts this feature, it will become our collective responsibility as a project to keep it working. You have a large external correctness test suite for this feature. It does not sound like you intend to open-source that test suite; instead, you intend to extract a small number of unit tests from it and add those to the LLVM test suite. So I'm asking if you're at least going to have an external CI bot which will run the full test suite for this feature to ensure it hasn't been broken by the last day of commits. It does not seem reasonable to expect that the few unit tests you extract will themselves be sufficient to keep the feature tested.

As Aaron replied, yes, the MS SEH (and EH) test suites are (or will be) open-sourced.
BTW, I have to point out that to pass the entire suite we also need a second, currently missing feature (local unwind) in place. As a heads-up, a proposed local-unwind design and implementation can be seen here (https://github.com/tentzen/llvm-project). We will bring it back to RFC once -EHa is accepted and landed.

For the C++ case under -EHa, a seh.try.begin() (an EhaScopeBegin()) is injected at the beginning of the scope if dtor cleanup is required.

Okay, I think I understand how this is supposed to work, thanks.

There is absolutely NO explicit edge added in this design.

I didn't say there was. I said that there was an *implicit* edge: from the memory access to the handler. It's precisely because that edge is implicit that it's problematic for analysis and optimization.

Oh, sorry, I misread it.
For HW exceptions, the 'implicit' edge is unavoidable unless an explicit approach (like the previous iload/istore proposal) is employed. The iload approach was extensively discussed and evaluated when it was proposed. As far as I know, the concerns were that it could create many invokes, complicate the flow graph, and possibly hurt performance in downstream optimization and code generation. Making all optimizations aware of the new semantics would also be a substantial effort.

Hmm. I suppose it's not that different from the general problem with setjmp/longjmp. I think you'll still have representation problems here, but I'll address them below; I concede that in principle marking regions with an intrinsic that instructions can't necessarily be moved over is workable, and so you don't need to turn every faulting instruction into an invoke. You may need to mark the initial seh.try.begin with something like returns_twice or otherwise communicate that there's non-obvious control flow within that function.

OK, good to know. Thank you!
Other than blocking some optimizations, does the returns_twice attribute have other implications?

So this design applies an implicit approach. I respectfully disagree that it's problematic: as long as the volatile attribute is honored, it's robust. Please see the C and C++ SEH rules stated in the patch description.

You also need to make sure that potentially-faulting instructions aren't moved *into* the _try region. I don't think that just making accesses with the _try volatile achieves that. Probably the easiest way to do this would be to outline the _try region through most of the LLVM pipeline and only inline it very late.

The intrinsic is marked as having unknown memory access. Combined with the returns_twice you just suggested, isn't that sufficient to block potentially-faulting instructions from being moved across?

I've changed my mind; you've mostly got edges in the place you need, and you don't need returns_twice. But see the comment below about unreachable ends of scopes; I don't know how to handle that well.

For LLVM transformations/optimizations, the only constraint added is the volatile attribute on memory operations within a SEH region.

When is this attribute applied? Because you can't just apply it at a single point; you also need to mark operations when they're copied/inlined into the region. We once had a similar attempt at handling exceptions with a setjmp-based ABI (not the "SJLJ" exceptions supported by the backends, something frontend-centric) that had persistent problems because of poor modeling in the IR, which it feels like this is doomed to repeat.

This is applied only once, in the FE; consider it as if it came from the source code. So no, there is no other place we need to change. Per SEH semantics, inlined code is exempt from the volatile constraint.

The SEH semantics say that faulting instructions not lexically within the _try can be re-ordered with each other, but it's not clear that it's okay to re-order them with instructions within the _try. If I had to guess, I would say that the expectation is that inlining is disabled within a _try scope.

Reordering is NOT okay for instructions 'directly' within a _try. 'Directly' here means code originally from the source, i.e., inlined code or code in callees is not included. Disabling inlining in general would work, but it is not necessary.
As said above, the SEH semantics only apply to code "directly" under a SEH scope, for a few obvious reasons (e.g., indirect callees or callees not in the same module won't be visible). If faulting instructions are allowed to be reordered inside a callee, those instructions are still allowed to be reordered after they are inlined into a _try.

I feel almost certain that this is wrong. Let me try to explain. It would be helpful to have a more precise statement of semantics than I can find online, but we can start from the basics.

"X and Y can be reordered" is how optimizer folks talk, but it's not generally how semantics are specified. As a general rule (ignoring sequence points and concurrency for simplicity), C and C++ require operations to be executed in a particular order on the abstract machine, and there are no rules explicitly sanctioning reordering. Instead, the "as if" rule gives the implementation broad leeway to do things differently from how they're specified on the abstract machine, as long as the difference cannot be observed by a valid program. Faults generally correspond to conditions that have undefined behavior under the language, and so the implementation may assume they do not happen in a valid program, and so the consequences of them happening cannot be observed by a valid program, and so faults can be ignored when deciding whether to reorder operations, which allows a lot of reordering.

_try should be understood as fully defining the behavior of faulting operations written within its scope so that they have the defined semantics of resuming execution in the _except clause. Because faults have the defined behavior of ending execution of the _try block, whether the fault occurred is observable, so the "as if" leeway stops applying. Thus, other operations with side-effects cannot be reordered with potentially faulty operations written within the _try, no more than they could be reordered with a break statement. That applies whether those other operations are written within the _try or not; it's just that potentially-faulty operations written within the _try must always be treated as having side-effects.

-EHa more generally should be understood as partially defining the behavior of faulting operations even if they are not written in a _try: if the operation faults, the behavior is defined so that scopes are unwound and control resumes at an _except clause if one is dynamically active, but this may be observed at an indeterminate point. It is hard to describe these semantics formally, but the intended rules for the implementation are pretty clear: potentially faulty operations outside of a _try can be reordered around each other or even moved to different scopes, as per normal optimization rules, but whenever those operations are executed, if they fault they must trigger an unwind and cause execution to resume at an _except clause if one is active.

So I think you cannot allow operations inlined from a call made within a _try to be reordered with operations written within the _try, or to happen outside the _try. Since LLVM doesn't promise not to reorder volatile and non-volatile stores to provably non-aliased locations, you cannot safely inline non-volatile stores within a _try. You can either disable inlining or mark the calls in some way that will cause the inliner to make any inlined stores volatile (and whatever else is necessary to ensure they will not be reordered).

Volatile-ness here is primarily a reordering constraint, not a data-flow one. Per the LLVM EH framework, data defined in a _try will flow through an explicit EH edge into the handler. In this SEH design, there will be a seh.try.end() at every exit of the SEH region that ensures data defined in the _try scope flows to its uses in the handler.

Where is the "explicit EH edge into the handler" from a faulting instruction? It seems like there's an EH edge to the innermost cleanup from the last cleanup scope you entered, but that's not the same as from a faulting instruction, and the next site that does have a potential EH edge is the end of the scope, which you have no guarantee is actually reachable via the ordinary CFG. So I think you actually *are* relying on volatile for its data-flow guarantees.

This is why frontends cannot naively lower things like EH to setjmp without any influence on the inliner.

Why would you want to do that? LLVM EH offers a robust data-flow representation, while SJLJ data-flow modeling is incomplete.

We don't, it's a bad representation for EH. But the problems seem very analogous to what you're dealing with here.

[rjmccall] Does it pass under all combinations of optimization and code-generation settings? How do LLVM contributors ensure that this test suite continues to pass?

Yes. This is just the first patch; there will be a second patch to land the LLVM code-gen part, then a third patch to add some test cases (e.g., xcpt4u.c from MSVC). Without the -EHa option, the patch is a zero-impact change for now.

Do you expect to maintain a CI bot that runs the full test suite continuously as changes go into LLVM? Because this sort of thing is extremely easy to break.

Yes, as long as it's a bug/flaw in the design/implementation. A new optimization/analysis/tool that violates the basic volatile/cleanup/EH-framework invariants exercised by the SEH test cases is not covered. Isn't this the policy of the Clang/LLVM community?

I think you're missing what I'm asking. If LLVM accepts this feature, it will become our collective responsibility as a project to keep it working. You have a large external correctness test suite for this feature. It does not sound like you intend to open-source that test suite; instead, you intend to extract a small number of unit tests from it and add those to the LLVM test suite. So I'm asking if you're at least going to have an external CI bot which will run the full test suite for this feature to ensure it hasn't been broken by the last day of commits. It does not seem reasonable to expect that the few unit tests you extract will themselves be sufficient to keep the feature tested.

As Aaron replied, yes, the MS SEH (and EH) test suites are (or will be) open-sourced.

Okay, great. We just need a firm commitment that those tests will be run continually on the main branches and during release qualification. If they're open-source, you may be able to add them to the extended LLVM test suite (not just the checked-in regression tests), which I think those things already happen for. Other people should be able to help you with the right process for that.

Reordering is NOT okay for instructions 'directly' within a _try. 'Directly' here means code originally from the source, i.e., inlined code or code in callees is not included. Disabling inlining in general would work, but it is not necessary.
As said above, the SEH semantics only apply to code "directly" under a SEH scope, for a few obvious reasons (e.g., indirect callees or callees not in the same module won't be visible). If faulting instructions are allowed to be reordered inside a callee, those instructions are still allowed to be reordered after they are inlined into a _try.

I feel almost certain that this is wrong. Let me try to explain. It would be helpful to have a more precise statement of semantics than I can find online, but we can start from the basics.

"X and Y can be reordered" is how optimizer folks talk, but it's not generally how semantics are specified. As a general rule (ignoring sequence points and concurrency for simplicity), C and C++ require operations to be executed in a particular order on the abstract machine, and there are no rules explicitly sanctioning reordering. Instead, the "as if" rule gives the implementation broad leeway to do things differently from how they're specified on the abstract machine, as long as the difference cannot be observed by a valid program. Faults generally correspond to conditions that have undefined behavior under the language, and so the implementation may assume they do not happen in a valid program, and so the consequences of them happening cannot be observed by a valid program, and so faults can be ignored when deciding whether to reorder operations, which allows a lot of reordering.

_try should be understood as fully defining the behavior of faulting operations written within its scope so that they have the defined semantics of resuming execution in the _except clause. Because faults have the defined behavior of ending execution of the _try block, whether the fault occurred is observable, so the "as if" leeway stops applying. Thus, other operations with side-effects cannot be reordered with potentially faulty operations written within the _try, no more than they could be reordered with a break statement. That applies whether those other operations are written within the _try or not; it's just that potentially-faulty operations written within the _try must always be treated as having side-effects.

-EHa more generally should be understood as partially defining the behavior of faulting operations even if they are not written in a _try: if the operation faults, the behavior is defined so that scopes are unwound and control resumes at an _except clause if one is dynamically active, but this may be observed at an indeterminate point. It is hard to describe these semantics formally, but the intended rules for the implementation are pretty clear: potentially faulty operations outside of a _try can be reordered around each other or even moved to different scopes, as per normal optimization rules, but whenever those operations are executed, if they fault they must trigger an unwind and cause execution to resume at an _except clause if one is active.

So I think you cannot allow operations inlined from a call made within a _try to be reordered with operations written within the _try, or to happen outside the _try. Since LLVM doesn't promise not to reorder volatile and non-volatile stores to provably non-aliased locations, you cannot safely inline non-volatile stores within a _try. You can either disable inlining or mark the calls in some way that will cause the inliner to make any inlined stores volatile (and whatever else is necessary to ensure they will not be reordered).

Oh, yes, you are right. I was saying that reordering among inlined faulty instructions is allowed; reordering an inlined instruction with any 'direct' volatile instruction should be prohibited. I overlooked that LLVM doesn't promise not to reorder volatile and non-volatile accesses. I will simply disable inlining into a _try scope. Thanks!

Volatile-ness here is primarily for the reordering constraint, not for data flow. Per the LLVM EH framework, data defined in a _try will flow through an explicit EH edge into the handler. In this SEH design, there will be a seh.try.end() at every exit of an SEH region that ensures data defined in the _try scope flows to its uses in the handler.

Where is the "explicit EH edge into the handler" from a faulting instruction? It seems like there's an EH edge to the innermost cleanup from the last cleanup scope you entered, but that's not the same as from a faulting instruction, and the next site that does have a potential EH edge is the end of the scope, which you have no guarantee is actually reachable via the ordinary CFG. So I think you actually *are* relying on volatile for its data-flow guarantees.

Oh, right, I take it back. Volatile-ness is used to ensure the 2nd (reordering) and 3rd (global-state) rules for C code as described in the summary of this patch. I was thinking about abnormal exits (like https://reviews.llvm.org/D77936#change-6bDmAmTUlS0o) in the _try-finally case, where dataflow is guaranteed to reach the _finally.

As Aaron replied, yes, the MS SEH (and EH) test suites are (or will be) open-sourced.

Okay, great. We just need a firm commitment that those tests will be run continually on the main branches and during release qualification. If they're open-source, you may be able to add them to the extended LLVM test suite (not just the checked-in regression tests), for which I think both of those things already happen.

Yes, that is the plan after -EHa Part II is done. But again, it won't be the entire suite without the support of Local-Unwind.

Okay. I think we're on the same page about representation now. If you can come up with a good replacement for "eha" in the intrinsic names, I think this is pretty much ready to go.

llvm/docs/LangRef.rst
11584

This is a very emission-centric description of the intrinsic. Maybe something like:

LLVM's ordinary exception-handling representation associates EH cleanups and
handlers only with ``invoke``s, which normally correspond only to call sites.  To
support arbitrary faulting instructions, it must be possible to recover the current
EH scope for any instruction.  Turning every operation in LLVM that could fault
into an ``invoke`` of a new, potentially-throwing intrinsic would require adding a
large number of intrinsics, impede optimization of those operations, and make
compilation slower by introducing many extra basic blocks.  These intrinsics can
be used instead to mark the region protected by a cleanup, such as for a local
C++ object with a non-trivial destructor.  ``llvm.eha.scope.begin`` is used to mark
the start of the region; it is aways called with ``invoke``, with the unwind block
being the desired unwind destination for any potentially-throwing instructions
within the region.  ``llvm.eha.scope.end`` is used to mark when the scope ends
and the EH cleanup is no longer required (e.g. because the destructor is being
called).

So, about that — how does this pairing actually work? I think I understand how it's *meant* to work, by just scanning the structure of the CFG. But Clang doesn't promise to emit blocks with that kind of structure, and in practice block structure gets pretty complicated when you have multiple edges out of a region and they have to go to destinations at different cleanup depths. I don't really understand how your analysis looks through that sort of thing.

Example:

#include <string>
void test(int x, int &state) {
  state = 0;
  std::string a;
  state = 1;
  if (x > 0) {
    state = 2;
    std::string b;
    state = 3;
    if (x > 10) {
      state = 4;
      return;
    }
    state = 5;
    std::string c;
    state = 6;
  }
  state = 7;
}

IIRC, the way Clang will emit this is something like:

void test(int x, int &state) {
  int jumpdest;
  state = 0;
  std::string a;
  state = 1;
  if (x > 0) {
    state = 2;
    std::string b;
    state = 3;
    if (x > 10) {
      state = 4;
      jumpdest = 0;
      goto destroy_b;
    }
    state = 5;
    std::string c;
    state = 6;
    c.~std::string();
    jumpdest = 1;
  destroy_b:
    b.~std::string();
    switch (jumpdest) {
    case 0: goto destroy_a;
    case 1: goto fallthrough;
    }
  fallthrough:
    ;
  }
  state = 7;
destroy_a:
  a.~std::string();
}

The point being that I'm not sure the stack-like relationship between these cleanup scopes that's present in the source actually reliably survives into IR. That makes me concerned about the implicit pairing happening with these intrinsics.

Okay. I think we're on the same page about representation now. If you can come up with a good replacement for "eha" in the intrinsic names, I think this is pretty much ready to go.

How about seh.cppscope.begin() and seh.cppscope.end()? Or any other suggestion?

This is a very emission-centric description of the intrinsic. Maybe something like:

Very good writing for LangRef.rst :-). Will update it with your wording. Thanks!

As for the pairing question, we simply utilize the existing cleanup mechanism you described, in which abnormal exits and normal fall-through are merged in a 'cleanup' pad. What this implementation does is inject a scope.begin right after a ctor and a scope.end right before the dtor in the cleanup pad. Using the same example, it will be something like below:

void test(int x, int &state) {
  int jumpdest;
  state = 0;
  std::string a;  
  **seh.cppscope.begin();  // for object a**
  state = 1;
  if (x > 0) {
    state = 2;
    std::string b;
    **seh.cppscope.begin();  // for b**
    state = 3;
    if (x > 10) {
      state = 4;
      jumpdest = 0;
      goto destroy_b;
    }
    state = 5;
    std::string c;
    **seh.cppscope.begin();  // for c**
    state = 6;
    **seh.cppscope.end(); // for c dtor**
    c.~std::string();
    jumpdest = 1;
  destroy_b:
    **seh.cppscope.end();   // for b dtor**
    b.~std::string();
    switch (jumpdest) {
    case 0: goto destroy_a;
    case 1: goto fallthrough;
    }
  fallthrough:
    ;
  }
  state = 7;
destroy_a:
  **seh.cppscope.end();  // for a dtor **
  a.~std::string();
}

Okay. I think we're on the same page about representation now. If you can come up with a good replacement for "eha" in the intrinsic names, I think this is pretty much ready to go.

How about seh.cppscope.begin() and seh.cppscope.end()? Or any other suggestion?

I like that better. Maybe even just seh.scope.begin? Or seh.cleanupscope.begin? Cleanups aren't *inherently* a C++ feature, it's just that C doesn't officially have them and MSVC doesn't really support any non-C++ languages that do.

As for the pairing question, we simply utilize the existing cleanup mechanism you described, in which abnormal exits and normal fall-through are merged in a 'cleanup' pad. What this implementation does is inject a scope.begin right after a ctor and a scope.end right before the dtor in the cleanup pad. Using the same example, it will be something like below:

void test(int x, int &state) {
  int jumpdest;
  state = 0;
  std::string a;  
  **seh.cppscope.begin();  // for object a**
  state = 1;
  if (x > 0) {
    state = 2;
    std::string b;
    **seh.cppscope.begin();  // for b**
    state = 3;
    if (x > 10) {
      state = 4;
      jumpdest = 0;
      goto destroy_b;
    }
    state = 5;
    std::string c;
    **seh.cppscope.begin();  // for c**
    state = 6;
    **seh.cppscope.end(); // for c dtor**
    c.~std::string();
    jumpdest = 1;
  destroy_b:
    **seh.cppscope.end();   // for b dtor**
    b.~std::string();
    switch (jumpdest) {
    case 0: goto destroy_a;
    case 1: goto fallthrough;
    }
  fallthrough:
    ;
  }
  state = 7;
destroy_a:
  **seh.cppscope.end();  // for a dtor **
  a.~std::string();
}

Okay. I feel fairly convinced that this will work for most cleanup situations. You might need to do some IR-testing work in Clang to make sure Clang calls seh.scope.begin the right way when emitting conditional cleanups:

  • We need to call it at some dominating point rather than at the current IP when we push the cleanup.
  • We need to call it in the right order for the cleanups we enter.

Conditional cleanups come up when we need a cleanup in code that's not unconditionally run during an expression evaluation, like in the 2nd or 3rd operand of ?: or in the RHS of &&.
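A plain-C++ illustration of the conditional-cleanup case (Guard, log_, and the function names are hypothetical, not part of the patch): the temporary in the second operand of ?: is constructed, and thus its cleanup is entered, only when the condition is true. This is exactly the situation where seh.scope.begin must be emitted at the right dominating point and in the right order.

```cpp
#include <string>
#include <vector>

// Records ctor/dtor events so the conditional cleanup is observable.
static std::vector<std::string> log_;

struct Guard {
    std::string name;
    explicit Guard(std::string n) : name(std::move(n)) { log_.push_back("ctor " + name); }
    ~Guard() { log_.push_back("dtor " + name); }
};

int use(const Guard &) { return 1; }

int conditional_cleanup(bool flag) {
    // The Guard temporary in the second operand is constructed (and so its
    // cleanup is pushed) only when `flag` is true; the dtor runs at the end
    // of the full-expression on that path only.
    return flag ? use(Guard("cond")) : 0;
}
```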

OK, great.
To summarize our discussion above, I will follow up with:

  • rename the intrinsic eha.scope.begin() to seh.scope.begin()
  • update LangRef.rst with a detailed explanation
  • disallow inlining under a direct SEH lexical scope
  • add a conditional-cleanup test case and make sure that seh.scope.begin() is emitted in the right place
  • add sideEntry label support in the AST dumper and de/serialization

Thank you for your thorough review and valuable feedback. I'm in the middle of something else; I will post another patch later.

Sounds good to me, thanks!

When do you plan to enable this on x86?

tentzen updated this revision to Diff 306287.Wed, Nov 18, 7:11 PM

Per review feedback from John McCall and others, this update includes:

  • Rename the intrinsic eha.scope.begin() to seh.scope.begin().
  • Update LangRef.rst with a detailed explanation.
  • Add a conditional-cleanup test case and move the emission of seh.scope.begin() into EHScopeStack::pushCleanup().
  • Add sideEntry label support in the AST dumper and de/serialization.

Also add tests for the ctor and dtor process of Windows ABI temporaries.

With your latest patch update, clang segfaults for me even when '-feh-asynch' is not passed.

tentzen updated this revision to Diff 306469.Thu, Nov 19, 10:37 AM

Fixed the clang crash when building without the '-feh-asynch' option.
EHStack's CGF field must be initialized even without -feh-asynch.

llvm/docs/LangRef.rst
11587

Typo: aways -> always

11591

Maybe add something here like

Any set of operations can then be confined to the region by reading their leaf inputs via volatile loads and writing their root outputs via volatile stores.

IIUC, for something like x = 1 / y, the volatile load of y and volatile store of x being things you can't move past these intrinsic calls is how you make sure the potentially-faulting divide stays in the right scope. IMO it's a bit confusing to talk about "arbitrary faulting instructions" being constrained without that extra bit.
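Joseph's point can be sketched in source form (illustrative names; the real constraint lives at the IR level, where the volatile accesses cannot be hoisted or sunk past the seh.try.begin/seh.try.end markers, pinning the divide between them):

```cpp
// Confine a potentially faulting operation (an integer divide) to a region
// by reading its leaf input through a volatile load and writing its root
// output through a volatile store. Because the volatile accesses cannot
// move past the region markers, the divide between them cannot escape
// the region either.
int confined_divide(int *out, const int *in) {
    const volatile int *vin = in;
    int y = *vin;               // volatile load of the leaf input
    int r = 1 / y;              // the potentially-faulting operation
    volatile int *vout = out;
    *vout = r;                  // volatile store of the root output
    return r;
}
```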

tentzen added inline comments.Thu, Nov 19, 12:03 PM
llvm/docs/LangRef.rst
11591

Very good point! Will clarify as you suggested.

tentzen updated this revision to Diff 306510.Thu, Nov 19, 12:54 PM

Per Joseph's feedback, further clarify the semantics of llvm.seh.try.begin and llvm.seh.try.end.

Do we need to consider FP exceptions in _try block?

clang/include/clang/Driver/Options.td
886

It's better to follow alphabetical order here and at line 1531.

clang/lib/CodeGen/CGCleanup.cpp
1287

Move the { to the same line as the else, and it's better to add curly braces around PopCleanupBlock();.

clang/test/CodeGen/windows-seh-EHa-TryInFinally.cpp
1

Should this be a C file? I saw LangRef says they are used for C functions.

llvm/docs/LangRef.rst
11534

Keep the same length as the line above.

11560

Keep the same length as the line above.

tentzen marked 2 inline comments as done.Thu, Nov 19, 10:17 PM

Do we need to consider FP exceptions in _try block?

Yes, FP exceptions are handled as long as FP exceptions are not disabled (e.g., via the _controlfp() runtime) and the FP exception code is filtered and handled via an __except() statement (e.g., __except(GetExceptionCode()==EXCEPTION_FLT_INEXACT_RESULT)).

clang/include/clang/Driver/Options.td
886

Really? I see some others are not in alphabetical order. It's natural to place it right beside fcxx_exceptions.

clang/test/CodeGen/windows-seh-EHa-TryInFinally.cpp
1

Yes, it's a C function that can be placed in a .cpp file.

Does it really work for the x86_64 architecture only? I have tried using this with an x86 build, and the produced binaries look like this should work on that platform as well, but I have not tried running the produced executables to see the results yet, tbh.

Does it really work for the x86_64 architecture only? I have tried using this with an x86 build, and the produced binaries look like this should work on that platform as well, but I have not tried running the produced executables to see the results yet, tbh.

This is just Part 1 (Clang) of SEH HW exception support. We need another patch for the LLVM part to make it work for x86_64. After that, we will need another two LLVM changes to enable it for X86 and Arm64.

Do we need to consider FP exceptions in _try block?

Yes, FP exceptions are handled as long as FP exceptions are not disabled (e.g., via the _controlfp() runtime) and the FP exception code is filtered and handled via an __except() statement (e.g., __except(GetExceptionCode()==EXCEPTION_FLT_INEXACT_RESULT)).

I see. If this is the case, you may need to assign FPE_Strict to the _try block to preserve the order of FP instructions.

Do we need to consider FP exceptions in _try block?

Yes, FP exceptions are handled as long as FP exceptions are not disabled (e.g., via the _controlfp() runtime) and the FP exception code is filtered and handled via an __except() statement (e.g., __except(GetExceptionCode()==EXCEPTION_FLT_INEXACT_RESULT)).

I see. If this is the case, you may need to assign FPE_Strict to the _try block to preserve the order of FP instructions.

Is FPE_Strict in Clang/LLVM equivalent to /fp:strict in VC++/Windows? -EHa does not automatically imply /fp:strict semantics on Windows. So for FP-sensitive cases, users can explicitly specify -EHa and -FPE_Strict together.