Page MenuHomePhabricator

[clang][compiler-rt] Exception escape out of an non-unwinding function is an undefined behaviour
Needs ReviewPublic

Authored by lebedev.ri on Nov 3 2022, 6:45 PM.

Details

Reviewers
rjmccall
morehouse
aaron.ballman
dvyukov
MaskRay
vsk
jyknight
efriedma
Group Reviewers
Restricted Project
Summary

I've stumbled into this in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=52989&q=label%3AProj-librawspeed
which manifested as an obscure leak, and originated from a seemingly simple refactoring:
https://github.com/darktable-org/rawspeed/commit/1fd09b9cffbddc65753eb523f7ba5528d35fe34d#diff-c832cc8366d36ca1165ecef7f4a256a2643ec09c4405a1238222a4529df619a1R172-R174

Reduced, this should look like: (actual repro in progress)

#include <vector>

std::vector<int> handle() {
  std::vector<int> v(42, 42); // this somehow leaks
  return v;
}

__attribute__((pure)) // double yikes
std::vector<int> footgun(int argc) {
  std::vector<int> v(24, 24);
  if(argc != 42)
    throw int(0); // yikes
  return v;
}

int main(int argc, char* argv[]) {
    try {
        auto v = handle();
        auto v2 = footgun(argc);
    } catch(...) {}
    return 0;
}

https://godbolt.org/z/zdavdKnfa

Surprisingly, we did not handle it before.
I'm not yet sure if we can pretty-print the actual exception on compiler-rt side.

(I may need to update a couple tests still)

Diff Detail

Unit TestsFailed

TimeTest
710 msx64 debian > AddressSanitizer-x86_64-linux.TestCases/Linux::interface_symbols_linux.cpp
Script: -- : 'RUN: at line 5'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -x c++-header -o - -E /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/asan/TestCases/Linux/../../../../lib/asan/asan_interface.inc | sed "s/INTERFACE_FUNCTION/\nINTERFACE_FUNCTION/g" > /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Linux/Output/interface_symbols_linux.cpp.tmp.asan_interface.inc
1,200 msx64 debian > UBSan-AddressSanitizer-x86_64.TestCases/Misc::exception-escape.cpp
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=address -m64 -fsanitize=exception-escape /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/ubsan/TestCases/Misc/exception-escape.cpp -O3 -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/ubsan/AddressSanitizer-x86_64/TestCases/Misc/Output/exception-escape.cpp.tmp
1,120 msx64 debian > UBSan-MemorySanitizer-x86_64.TestCases/Misc::exception-escape.cpp
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=memory -m64 -fsanitize=exception-escape /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/ubsan/TestCases/Misc/exception-escape.cpp -O3 -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/ubsan/MemorySanitizer-x86_64/TestCases/Misc/Output/exception-escape.cpp.tmp
890 msx64 debian > UBSan-Standalone-x86_64.TestCases/Misc::exception-escape.cpp
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -fsanitize=exception-escape /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/ubsan/TestCases/Misc/exception-escape.cpp -O3 -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/ubsan/Standalone-x86_64/TestCases/Misc/Output/exception-escape.cpp.tmp
1,690 msx64 debian > UBSan-ThreadSanitizer-x86_64.TestCases/Misc::exception-escape.cpp
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=thread -m64 -fsanitize=exception-escape /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/ubsan/TestCases/Misc/exception-escape.cpp -O3 -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/ubsan/ThreadSanitizer-x86_64/TestCases/Misc/Output/exception-escape.cpp.tmp

Event Timeline

lebedev.ri created this revision.Nov 3 2022, 6:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 3 2022, 6:45 PM
lebedev.ri requested review of this revision.Nov 3 2022, 6:45 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptNov 3 2022, 6:45 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

(also, this needs a bit more work around irgen, will look in a bit.)

MaskRay added a comment.EditedNov 3 2022, 9:15 PM

In your example, clang++ a.cc; ./a.out gives a libstdc++ error:

terminate called after throwing an instance of 'int'

libc++'s is similar.

footgun is nounwind (due to the GNU pure attribute), so Clang uses call instead of invoke and the function call is described by a call site entry with a zero action_record_offset (i.e. not a handler) in .gcc_except_table.
In _Unwind_RaiseException called by __cxa_throw, the missing exception handler causes __terminate.

g++ a.cc; ./a.out succeeds, because its footgun call is caught by catch (...). (Perhaps GCC doesn't have Clang's nounwind optimization.)

The patch doesn't implement the runtime correctly (I get a linker error) so I cannot try it out. How expensive is your instrumentation?
Does it work with -fno-exceptions intermediate functions? (Unwinding through such functions should fail as well). Note that this check can be done today,without any instrumentation: just use -fno-asynchronous-unwind-tables (for some targets which default to async unwind tables (aarch64,x86,etc)).
If the process installs a SIGABRT signal handler, the stack trace can be printed when __terminate calls abort.

In your example, clang++ a.cc; ./a.out gives a libstdc++ error:

terminate called after throwing an instance of 'int'

libc++'s is similar.

That's great, but just a symptom of misreduced testcase.
The whole problem is that in the original bug *no* abort happened at runtime,
the program terminated successfully, with a mysterious leak.

footgun is nounwind (due to the GNU pure attribute), so Clang uses call instead of invoke and the function call is described by a call site entry with a zero action_record_offset (i.e. not a handler) in .gcc_except_table.
In _Unwind_RaiseException called by __cxa_throw, the missing exception handler causes __terminate.

g++ a.cc; ./a.out succeeds, because its footgun call is caught by catch (...). (Perhaps GCC doesn't have Clang's nounwind optimization.)

The patch doesn't implement the runtime correctly (I get a linker error) so I cannot try it out. How expensive is your instrumentation?
Does it work with -fno-exceptions intermediate functions? (Unwinding through such functions should fail as well). Note that this check can be done today,without any instrumentation: just use -fno-asynchronous-unwind-tables (for some targets which default to async unwind tables (aarch64,x86,etc)).
If the process installs a SIGABRT signal handler, the stack trace can be printed when __terminate calls abort.

I find this comment non-welcoming and discouraging.
I just wanted to get something posted when i had something to post already. All of this needs a bit more time.

lebedev.ri updated this revision to Diff 473529.Nov 6 2022, 5:52 PM
lebedev.ri edited the summary of this revision. (Show Details)

In your example, clang++ a.cc; ./a.out gives a libstdc++ error:

terminate called after throwing an instance of 'int'

libc++'s is similar.

That's great, but just a symptom of misreduced testcase.
The whole problem is that in the original bug *no* abort happened at runtime,
the program terminated successfully, with a mysterious leak.

footgun is nounwind (due to the GNU pure attribute), so Clang uses call instead of invoke and the function call is described by a call site entry with a zero action_record_offset (i.e. not a handler) in .gcc_except_table.
In _Unwind_RaiseException called by __cxa_throw, the missing exception handler causes __terminate.

g++ a.cc; ./a.out succeeds, because its footgun call is caught by catch (...). (Perhaps GCC doesn't have Clang's nounwind optimization.)

FWIW, i now have a standalone reproducer (~3MiB filesize),
and without sanitizers there's no abort regardless of optimization level,
with sanitizers there's only the leak report, and no mention of the exception.

I've already reduced it, but something gone wrong, and for some inexplicable reason,
creduce succeeded with a repro that does not pass the interestingness test
(this is not an error in said interestingness test.) So i will need to re-reduce.

But as as i have expected, the problem will likely end up being more fundamental,
because, not unexpectedly, this new diagnostic does not show up either.

TLDR:

Macro thisisfine:

The patch doesn't implement the runtime correctly (I get a linker error) so I cannot try it out.

I believe this should now be fixed.
I did not update getRecoverableKind(), which, conveniently, duplicates Unrecoverable from SanitizerArgs.cpp...

How expensive is your instrumentation?
Does it work with -fno-exceptions intermediate functions? (Unwinding through such functions should fail as well). Note that this check can be done today,without any instrumentation: just use -fno-asynchronous-unwind-tables (for some targets which default to async unwind tables (aarch64,x86,etc)).
If the process installs a SIGABRT signal handler, the stack trace can be printed when __terminate calls abort.

As you can see, at least right now, this quite literally relies on
the existing CodeGenFunction::getTerminateLandingPad() handling infra,
so it is as cheap as it will be. But has the same bugs as it already has.

The obvious alternative implementation could be to, when emitting nounwing function body F,
emit it as a thunk with an an invoke of an actual function, but without nounwind,
but that is undesirable because we lose all source debug information.

This is UB, and it should be handled by UBSan, not somewhere else,
especially because that will result in best error messages.

I find this comment non-welcoming and discouraging.
I just wanted to get something posted when i had something to post already. All of this needs a bit more time.

I think improving diagnostic is useful but -fsanitize= is probably not a good place. Instrumenting call sites with callq __ubsan_handle_exception_escape@PLT wastes code size. The functionality is better handled somewhere in libc++abi personality related code with possible improvement to exception handling related LLVM IR.

void bar(); void foo() { bar(); }

clang++ -fno-exceptions -fsanitize=exception-escape -c b.cc does not instrument the potentially-throwing-and-propagating bar() so an error cannot be caught.
However, for -fno-exceptions code, instrumenting every call site is going to greatly increase code size and suppress optimizations.

I wish that I capture the compiler and runtime behavior well in https://maskray.me/blog/2020-12-12-c++-exception-handling-abi#compiler-behavior. https://discourse.llvm.org/t/catching-exceptions-while-unwinding-through-fno-exceptions-code/57151 is a proposal to improve the diagnostics.

Can someone from @Sanitizers please comment? :) @vitalybuka ? @glider ? @rsmith ?

I think improving diagnostic is useful but -fsanitize= is probably not a good place.

Instrumenting call sites with callq __ubsan_handle_exception_escape@PLT wastes code size.

Can you please explain why replacing one call with another somehow makes things worse?

The functionality is better handled somewhere in libc++abi personality related code

That would not be helpful, i'm not using libc++.
It seems obviously wrong to me to require every used C++ ABI library
to implement "some improvements", instead, you know,
the clang irgen + clang ubsan lib.

If we do this, we have native source location info for the location of the call site
out of which the exception has escaped, that causes the "abort".
I do not know if that will be possible with less native approach.

The reason i'm doing this, is because recently i had to deal with a few of these bugs,
and i never got any kind of backtrace, so the situation can't not be made better.

with possible improvement to exception handling related LLVM IR.

void bar(); void foo() { bar(); }

clang++ -fno-exceptions -fsanitize=exception-escape -c b.cc does not instrument the potentially-throwing-and-propagating bar() so an error cannot be caught.

i'm only looking at -fexceptions side of things, i'm not sure how
mixing -fno-exceptions and -fexceptions code is supposed to work.
It is not oblivious to me why not dealing with some other situation
means we can't deal with the situation we can deal with.

However, for -fno-exceptions code, instrumenting every call site is going to greatly increase code size and suppress optimizations.

Citation needed/define "greatly"?
We have a single "termination" landing pad per function,
and we seem to end up with just a single extra jmp per call.
https://godbolt.org/z/Td3jWM1oh

But yes, that's the known and expected nature of sanitization.

I wish that I capture the compiler and runtime behavior well in https://maskray.me/blog/2020-12-12-c++-exception-handling-abi#compiler-behavior. https://discourse.llvm.org/t/catching-exceptions-while-unwinding-through-fno-exceptions-code/57151 is a proposal to improve the diagnostics.

Thanks. That explains current behavior, but does not tell why it must be enshrined and never changed.

Can someone from @Sanitizers please comment? :) @vitalybuka ? @glider ? @rsmith ?

I think improving diagnostic is useful but -fsanitize= is probably not a good place.

@MaskRay Seems find to me. Why do you think so?

Instrumenting call sites with callq __ubsan_handle_exception_escape@PLT wastes code size.

Can you please explain why replacing one call with another somehow makes things worse?

Would be nice to see some measuraments? E.g. CTMark.

The functionality is better handled somewhere in libc++abi personality related code

That would not be helpful, i'm not using libc++.
It seems obviously wrong to me to require every used C++ ABI library
to implement "some improvements", instead, you know,
the clang irgen + clang ubsan lib.

If we do this, we have native source location info for the location of the call site
out of which the exception has escaped, that causes the "abort".
I do not know if that will be possible with less native approach.

The reason i'm doing this, is because recently i had to deal with a few of these bugs,
and i never got any kind of backtrace, so the situation can't not be made better.

with possible improvement to exception handling related LLVM IR.

void bar(); void foo() { bar(); }

clang++ -fno-exceptions -fsanitize=exception-escape -c b.cc does not instrument the potentially-throwing-and-propagating bar() so an error cannot be caught.

i'm only looking at -fexceptions side of things, i'm not sure how
mixing -fno-exceptions and -fexceptions code is supposed to work.

Handling this case would be usefull.

It is not oblivious to me why not dealing with some other situation
means we can't deal with the situation we can deal with.

However, for -fno-exceptions code, instrumenting every call site is going to greatly increase code size and suppress optimizations.

Citation needed/define "greatly"?
We have a single "termination" landing pad per function,
and we seem to end up with just a single extra jmp per call.
https://godbolt.org/z/Td3jWM1oh

But yes, that's the known and expected nature of sanitization.

I wish that I capture the compiler and runtime behavior well in https://maskray.me/blog/2020-12-12-c++-exception-handling-abi#compiler-behavior. https://discourse.llvm.org/t/catching-exceptions-while-unwinding-through-fno-exceptions-code/57151 is a proposal to improve the diagnostics.

Thanks. That explains current behavior, but does not tell why it must be enshrined and never changed.

clang/docs/ReleaseNotes.rst
877

I don't have a problem with adding a sanitizer like this. IIRC, though, there's a review process for adding new sanitizers; I don't remember if this is the normal RFC process or something more streamlined. Also, I know @jyknight is interested in a mode that checks -fno-exceptions more efficiently, which is of direct relevance to what you're trying here; that was discussed in a forums thread (https://discourse.llvm.org/t/rfc-add-call-unwindabort-to-llvm-ir/62543).

The implementation and tests here don't seem to match the stated problem, though. Trying to throw out of a noexcept function is actually well-defined — the function has to catch the exception and call std::terminate, and we do actually generate code that way. Your patch is just adding a check at the start of the handler we emit, which means it's not catching anything we're not already catching; also, again, this is a well-defined execution path, so it doesn't seem appropriate to trigger UBSan on it.

IIRC, there are really only two potential sources of UB with noexcept. The first is that you can declare a function noexcept(true) but implement it as noexcept(false), which is in a class of problems (ODR violations) that I think UBSan doesn't normally try to diagnose. The second is that I think it's possible to get a noexcept(true) function pointer for a noexcept(false) function, maybe via some chain of casts; that would be a little closer to what UBSan normally diagnoses.

Meanwhile, your original test case has nothing to do with noexcept. __attribute__((pure)) (like nothrow and -fno-exceptions) really does introduce a novel UB rule which it would be reasonable to ask the compiler to check under UBSan.

I think the right way to handle what you're trying to handle would be to introduce some different kind of scope, a "it's UB to unwind out of this" scope, and then push that scope around calls and implementations of functions that use one of the UB-to-throw attributes. You could also potentially push that scope around calls to nounwind functions if you want to catch those ODR / function-pointer cases.

clang/lib/CodeGen/CodeGenFunction.h
2028

I'm not sure why this has a default argument.

2035

The purpose of getInvokeDestImpl is to outline the slow path, and this definitely belongs in the slow path.

@vitalybuka @rjmccall thank you for taking a look!
Since posting this, there has been a conversation in #llvm with @jyknight,
where it was established that the situation is even worse than i though.
Essentially, there are two situations:

  1. exception escape out of non-unwinding function is guaranteed to cause program termination. That happens for:
    • the C++ noexcept
    • __attribute__((nothrow)) (for some non-obvious reason)
    • OpenMP region lowering (even though it's not mandated by the standard)
  2. exception escape out of non-unwinding function is wild UB, this is the semantics of
    • __attribute__((pure))/__attribute__((const)).

It is possible that __attribute__((nothrow)) should also be in UB category,
otherwise what benefit does it bring over C++'s noexcept?

So, i have correctly felt that there is a problem, but the fix is going in the wrong way.
While i still think that we should handle case 1. in UBSan, let's first deal with 2.

I don't have a problem with adding a sanitizer like this. IIRC, though, there's a review process for adding new sanitizers; I don't remember if this is the normal RFC process or something more streamlined.

For the record, i have added the entirety of -fsanitize=implicit-conversion (part of -fsanitize=integer)
and -fsanitize=alignment, and this is the first time i'm hearing of an RFC process.

Also, I know @jyknight is interested in a mode that checks -fno-exceptions more efficiently, which is of direct relevance to what you're trying here; that was discussed in a forums thread (https://discourse.llvm.org/t/rfc-add-call-unwindabort-to-llvm-ir/62543).

Yes, we've talked in #llvm about that a bit.

The implementation and tests here don't seem to match the stated problem, though. Trying to throw out of a noexcept function is actually well-defined — the function has to catch the exception and call std::terminate, and we do actually generate code that way. Your patch is just adding a check at the start of the handler we emit, which means it's not catching anything we're not already catching; also, again, this is a well-defined execution path, so it doesn't seem appropriate to trigger UBSan on it.

IIRC, there are really only two potential sources of UB with noexcept. The first is that you can declare a function noexcept(true) but implement it as noexcept(false), which is in a class of problems (ODR violations) that I think UBSan doesn't normally try to diagnose. The second is that I think it's possible to get a noexcept(true) function pointer for a noexcept(false) function, maybe via some chain of casts; that would be a little closer to what UBSan normally diagnoses.

Meanwhile, your original test case has nothing to do with noexcept. __attribute__((pure)) (like nothrow and -fno-exceptions) really does introduce a novel UB rule which it would be reasonable to ask the compiler to check under UBSan.

I think the right way to handle what you're trying to handle would be to introduce some different kind of scope, a "it's UB to unwind out of this" scope, and then push that scope around calls and implementations of functions that use one of the UB-to-throw attributes. You could also potentially push that scope around calls to nounwind functions if you want to catch those ODR / function-pointer cases.

Yes. Somehow i did not think that those attributes introduce UB, and not are just handled incorrectly,
and tried to solve this the wrong way around. That being said, this diff is not incorrect.
I would like to have a sanitizer for the "exception escape is guaranteed to cause program termination" mode,
because all of the bugs with that mode i had to deal with had rather bad diagnostic,
but that can be done afterwards.

Thanks!

lebedev.ri marked an inline comment as done.

@rjmccall thank you for taking a look!
I do believe this now does the right thing.

I would like to keep this patch minimal.
I have not looked into -fno-exceptions,
and i'm not sure about funclets.
I will need to double-check tests.

lebedev.ri marked an inline comment as done.Nov 15 2022, 5:55 PM
lebedev.ri added inline comments.
clang/lib/CodeGen/CodeGenFunction.h
2035

The problem is that we aggressively cache all the EH stuff.
I will take a second look, maybe this can be done less intrusively.

Update tests.

lebedev.ri marked an inline comment as done.Nov 16 2022, 12:47 PM
lebedev.ri added inline comments.
clang/lib/CodeGen/CGCall.cpp
5406–5415

@rjmccall
So there are two issues:

  1. getInvokeDest() isn't necessarily called just before creating an invoke, see e.g. -EHa handling in CodeGenFunction::PopCleanupBlock()
  2. Even if we ignore that, we need to do this for *every* invoke, not just those going to the our UB landing pad, consider: https://godbolt.org/z/qTeKor41a <- the invoke leads to a normal landing pad, yet we immediately rethrow the just-caught exception, and now end up in the UB landing pad.

So i'm not really seeing an alternative path here?

ping

clang/lib/CodeGen/CGCall.cpp
5406–5415

@rjmccall do you agree with my reasoning here?
Perhaps there is some other solution that i'm not seeing,
other than not having the source location?

ping. We need to get this going, reminder, this was caught because it caused a visible miscompilation.
@rjmccall are you satisfied with the way we emit srcloc info?
@aaron.ballman thoughts on the docs changes?

One unanswered question i have, is if we can get the exception's underlying C++ type as a string,
and print it. Also, is it useful to pass Exn into the handler? Can we do anything useful with it?

ping. We need to get this going, reminder, this was caught because it caused a visible miscompilation.

I'm sorry, I caught COVID-19 during the LLVM conference and have not had a lot of energy recently, plus last week was the Thanksgiving holiday in the US.

Please don't describe this as a miscompilation. It may not match what you want as a user, but what the the compiler is doing is not incorrect. What you're doing is adding a mitigation to try to protect against a source of UB that recently affected you, which is commendable but doesn't have the same level of urgency.

clang/lib/CodeGen/CGCall.cpp
5406–5415

I think there are four interesting questions posed by your example.

The first is whether we should make an effort to report the original source location for the call to maybe_throws instead of the rethrow location. To me, the answer is no, and the main reason is that the user is explicitly acknowledging the possibility of an exception and is (probably) trying to handle it. That means that the proximate source of programmer error is now that their attempt to handle it failed, not that an exception was raised in the first place. That might seem weird for your example specifically,

The second is what source location we should report for invokes that don't have an obvious source location. The answer is that I think we should always try to have a source location, even if it might be an imperfect one. Synthesized code almost always has a source location that triggered the synthesis. Cleanups are typically associated with some variable or temporary that itself should have a source location. Even things like __cxa_end_catch are associated with the catch clause. Hopefully UBSan has a flexible enough schema in how source locations are expressed to it for us to communicate these things down to the runtime, like "synthesized code at Foo.cpp:18" or "cleanup for variable at Bar.cpp:107".

The third is the more general question of whether we need to store a source location before every invoke. To ensure that this variable is meaningfully initialized, we need to store a source location before every invoke that can lead to a UB scope. But the variable is stateful, so it's equally important that we not emit stores that might overwrite the source location that we actually want to report. For example, we want the source location reported in your example to be the source location for the throw;, not something associated with the __cxa_end_catch cleanup. This basically boils down to not doing stores within a landing pad before it enters a catch handler. Fortunately, landing pad code like that should always be in a terminate scope, so I think you can just look at the current EH stack and see whether the landing pad can possibly reach a UB scope. (You might need some save-and-restore logic around @finally blocks in ObjC.)

The final question is whether we need to emit cleanups on paths that lead to UB scopes at all. In the simplest cases, I think we don't. The standard says that it's implementation-specified whether the stack is unwound on an exception path that leads to termination. That rule doesn't technically apply here, but only because we're dealing with a language extension that makes unwinding UB in the first place. Since we have room to choose the semantics, the standard gives us pretty strong cover to be just as aggressive about skipping cleanups in contexts where throwing is UB as it is in contexts where throwing terminates. That, in turn, means that the landing pad in these contexts will usually just be a UB scope with no cleanups in it. So even if we did need to worry about cleanups overwriting the source location, it usually won't be possible because we won't be running cleanups. In fact, if there's a catch-all or a terminate scope, the UB scope won't be reachable, so the only situation where we have to worry about cleanups being run before we reach a UB scope is if there are cleanups inside a *partial* catch clause, like catch (const std::exception &e). And in principle we have the information up front from the EH selector to know whether we're going to end up in a catch handler, so we could actually just check immediately at the start of the landing pad and then trigger UBSan.

@rjmccall thank you for taking a look!

ping. We need to get this going, reminder, this was caught because it caused a visible miscompilation.

I'm sorry, I caught COVID-19 during the LLVM conference and have not had a lot of energy recently,

I'm sorry.

plus last week was the Thanksgiving holiday in the US.

Please don't describe this as a miscompilation. It may not match what you want as a user, but what the the compiler is doing is not incorrect.

Right.

What you're doing is adding a mitigation to try to protect against a source of UB that recently affected you, which is commendable but doesn't have the same level of urgency.

I'm mainly worried that there are likely other "miscompilations" in the wild.
Also, after this, we might want to reevaluate which clang attribute does what,

clang/lib/CodeGen/CGCall.cpp
5406–5415

The final question is whether we need to emit cleanups on paths that lead to UB scopes at all. In the simplest cases, I think we don't. The standard says that it's implementation-specified whether the stack is unwound on an exception path that leads to termination. That rule doesn't technically apply here, but only because we're dealing with a language extension that makes unwinding UB in the first place. Since we have room to choose the semantics, the standard gives us pretty strong cover to be just as aggressive about skipping cleanups in contexts where throwing is UB as it is in contexts where throwing terminates. That, in turn, means that the landing pad in these contexts will usually just be a UB scope with no cleanups in it. So even if we did need to worry about cleanups overwriting the source location, it usually won't be possible because we won't be running cleanups. In fact, if there's a catch-all or a terminate scope, the UB scope won't be reachable, so the only situation where we have to worry about cleanups being run before we reach a UB scope is if there are cleanups inside a *partial* catch clause, like catch (const std::exception &e). And in principle we have the information up front from the EH selector to know whether we're going to end up in a catch handler, so we could actually just check immediately at the start of the landing pad and then trigger UBSan.

The more i stare at this. The more worried i am.
Is the idea that, when we are in termination/UB EH scope,
we no longer have to play by the RAII rules,
and destructors, for the moment, are no-ops and are side-effect free?

Consider:

#include <cstdlib>
#include <cstdio>

void will_throw() { throw int(42); }
void might_throw() { }

struct abort_on_dtor {
  abort_on_dtor() {}
  [[noreturn]] ~abort_on_dtor() { printf("in ~abort_on_dtor()\n"); abort(); }
};

void test0() noexcept {
  abort_on_dtor scope;
  will_throw();
  might_throw();
}

__attribute__ ((pure)) void test1() {
  abort_on_dtor scope;
  will_throw();
  might_throw();
}

int main(int argc, char* argv[]) {
  printf("argc = %i\n", argc);
  if(argc == 1)
    abort_on_dtor scope;
  if(argc == 2)
    test0();
  if(argc == 3)
    test1();
  return 0;
}

You'd think, if call to test0()/test1() unwinds,
we'd first destruct abort_on_dtor, and termination/UB no longer happens.
But i guess termination/UB time-traveled/"happened" the moment test0()/test1() unwinds?

@rjmccall thank you for taking a look!
Only addressing the final (out of 4) points.
Rest to follow, did not look into them yet.

clang/lib/CodeGen/CGCall.cpp
5406–5415

The final question is whether we need to emit cleanups on paths that lead to UB scopes at all. In the simplest cases

I believe, i have implemented these semantics, i.e.

  • no cleanups in UB EH scope, just like how we deal with Terminate scope
  • when in UB EH scope, and not sanitizing, all callees are implicitly nounwind
  • If catch() blocks don't catch the exception, and we get to UB scope, we go into unreachable block.

Did i understand you correctly?
(test suite does not pass, there are a few bugs to deal with)

The more i stare at this. The more worried i am. Is the idea that, when we are in termination/UB EH scope, we no longer have to play by the RAII rules, and destructors, for the moment, are no-ops and are side-effect free?

Termination is allowed to happen as soon as the implementation detects that unwinding will only avoid termination if a cleanup does something non-terminating, like infinite loop or abort. This does enable some amount of code-size optimization, but the main purpose (as I understand it) is to improve debugging of what's presumed to be a programmer error by allowing the runtime to trigger termination with the faulting context still intact instead of requiring the stack to be unwound just to, like, deallocate a temporary std::string ten frames up.

The important thing for us is that the standard explicitly blesses this presumption: it has been decided that C++ programmers should not be relying on destructors being called when their programs throw in a context where the implementation has no choice but to terminate, and if they really need unwinding to happen, they should be handling exceptions properly. I don't think that's an unreasonable rule. (Note that it is limited to failures to formally handle exceptions, not some sort of post-domination by std::terminate. If you catch an exception and then call std::terminate yourself, the rule does not apply, and the stack is required to be unwound to the catch.) Also, it's a rule that Clang already takes advantage of within terminate scopes, which are otherwise well-defined, so I do think it would be fairly absurd to act like these UB scopes need stricter rules about running cleanups.

lebedev.ri updated this revision to Diff 479869.Dec 3 2022, 2:10 PM

@rjmccall thank you for taking a look!

  1. Always add UB scope for nounwind functions
  2. When popping the EH scopes, do that in backwards order :)
  3. Ensure that check-clang actually passes (it wasn't in last patch)
  4. Correctly annotate call's in UB scope as nounwinding
  5. Now, this point is uncomfortably overlapped with the D138958: i believe, the attribute that specifies that unwind is UB, takes priority over the exception specification / noexcept.
  6. When looking into all this, an interesting question appeared: what about dynamic exception specification, EHFilterScope? If the outermost scope is UB, and exception is listed in EHFilterScope, then it will hit UB scope, and if it is not, then the function std::unexpected is called. The default function calls std::terminate, but it may be replaced by a user-provided function. Should we do something here?
  7. I'm a bit lost about your 3'rd point in https://reviews.llvm.org/D137381#inline-1339891, about not overriding the src loc with something that is irrelevant. I think i need a C++ example in order to handle this. Right now it seems like those calls wouldn't actually go throught CodeGenFunction::EmitCall(), and thus wouldn't cause a proble.
lebedev.ri added inline comments.
clang/lib/CodeGen/CGCall.cpp
5406–5415

Ok, trying to continue looking into this

The first is whether we should make an effort to report the original source location for the call to maybe_throws instead of the rethrow location. To me, the answer is no,

Sure, seems reasonable.

The second is what source location we should report for invokes that don't have an obvious source location. The answer is that I think we should always try to have a source location, even if it might be an imperfect one.

It seems like this point is non-actionable. We simply always emit the source location,
and if it is invalid, well, then that is a more general problem.

The third is the more general question of whether we need to store a source location before every invoke. To ensure that this variable is meaningfully initialized, we need to store a source location before every invoke that can lead to a UB scope. But the variable is stateful, so it's equally important that we not emit stores that might overwrite the source location that we actually want to report. For example, we want the source location reported in your example to be the source location for the throw;, not something associated with the __cxa_end_catch cleanup. This basically boils down to not doing stores within a landing pad before it enters a catch handler. Fortunately, landing pad code like that should always be in a terminate scope, so I think you can just look at the current EH stack and see whether the landing pad can possibly reach a UB scope. (You might need some save-and-restore logic around @finally blocks in ObjC.)

I've fixed source location for throw, but i'm not quite following the rest of the comment here.
Can you perhaps give a self-contained example where we'd end up with wrong line?
I'm not sure about ObjC, so i'll ignore that bit for a moment at least.

clang/test/CodeGen/windows-seh-EHa-CppCondiTemps.cpp
3

This test broke once we always started adding (outermost) UB scope for nounwind functions.
I don't quite get what is going wrong. It could be a bug in SEH handling.
Can someone who has some idea about that code take a look and suggest a fix?
@tentzen ?

tentzen added inline comments.Dec 4 2022, 6:39 PM
clang/test/CodeGen/windows-seh-EHa-CppCondiTemps.cpp
3

By definition, non-unwind function I think is for Synchronous EH. So this Sanitizer check should exclude Asynchronous EH functions, those with option -fasync-exceptions.

lebedev.ri added inline comments.Dec 5 2022, 5:46 AM
clang/test/CodeGen/windows-seh-EHa-CppCondiTemps.cpp
3

I do not understand.
If the function can unwind, then why is it marked nounwind?
This kind of thing is exactly what i was afraid of with those SEH patches.

efriedma added inline comments.Dec 5 2022, 9:45 AM
clang/test/CodeGen/windows-seh-EHa-CppCondiTemps.cpp
3

clang should not be marking functions "nounwind" in -fasync-exceptions mode; if it is, I'd consider that a bug. (I assume someone just forgot to add a check to some code that adds nounwind.)

lebedev.ri marked 2 inline comments as done.Dec 5 2022, 11:45 AM
lebedev.ri added inline comments.
clang/test/CodeGen/windows-seh-EHa-CppCondiTemps.cpp
3

My thoughts precisely. @tentzen please fix :)

tentzen added inline comments.Dec 6 2022, 8:42 PM
clang/test/CodeGen/windows-seh-EHa-CppCondiTemps.cpp
3

This is copied from LLVM reference manual:

nounwind
This function attribute indicates that the function never raises an exception. If the function does raise an exception, its runtime behavior is undefined. However, functions marked nounwind may still trap or generate asynchronous exceptions. Exception handling schemes that are recognized by LLVM to handle asynchronous exceptions, such as SEH, will still provide their implementation defined semantics.

So I think nounwind only implies no synchronous/software unwind, not HW traps etc.

lebedev.ri marked an inline comment as done.Dec 7 2022, 6:45 AM
lebedev.ri added inline comments.
clang/test/CodeGen/windows-seh-EHa-CppCondiTemps.cpp
3

Ok, good point.
But i'm still not quite sure why the test is getting miscompiled.

Last(?) ping of the year?

MaskRay added a comment.EditedMon, Jan 9, 10:10 PM

I am unfamiliar with Clang codegen so I am just commenting about what a user may feel about this feature.

compiler-rt/test/ubsan/TestCases/Misc/exception-escape.cpp cannot demonstrate the value of the new UBSan check -fsanitize=exception-escape, as the executable will fail without the option (Clang emits call instead of invoke for a nounwind function call, and there is no LSDA information, libgcc/libunwind will call terminate; it's trivial to get more diagnostic with a SIBABRT signal handler).
To demonstrate the value, footgun and its caller need to be in different TUs and the caller does not know footgun is nounwind.
In such a setup, I think people likely care more about -fno-exceptions/-fexceptions exception propagation instead of __attribute__((pure)) and find -fsanitize=exception-escape not do what it may advertise.

The more common -fno-exceptions/-fexceptions exception propagation dilemma can be detected today with -fno-asynchronous-unwind-tables.

I am unfamiliar with Clang codegen so I am just commenting about what a user may feel about this feature.

compiler-rt/test/ubsan/TestCases/Misc/exception-escape.cpp cannot demonstrate the value of the new UBSan check -fsanitize=exception-escape, as the executable will fail without the option (Clang emits call instead of invoke for a nounwind function call, and there is no LSDA information, libgcc/libunwind will call terminate; it's trivial to get more diagnostic with a SIBABRT signal handler).
To demonstrate the value, footgun and its caller need to be in different TUs and the caller does not know footgun is nounwind.
In such a setup, I think people likely care more about -fno-exceptions/-fexceptions exception propagation instead of __attribute__((pure)) and find -fsanitize=exception-escape not do what it may advertise.

The more common -fno-exceptions/-fexceptions exception propagation dilemma can be detected today with -fno-asynchronous-unwind-tables.

Thank you for taking a look. I'm afraid this review comment is not based on reality, so i'm forced to ignore it.
The whole point of the clang change is to *NOT* lower such calls that we "know" will not unwind to a call, but into an invoke, with sanitization in the landingpad.