This is an archive of the discontinued LLVM Phabricator instance.

StackProtector: add unwind cleanup paths for instrumentation.
ClosedPublic

Authored by t.p.northover on Feb 9 2023, 3:35 AM.

Details

Reviewers
xiangzhangllvm
Summary

This is a mitigation patch for https://bugs.chromium.org/p/llvm/issues/detail?id=30, where existing stack protection is skipped if a function is returned through by an unwinder rather than the normal call/return path. The recent patch D139254 added the ability to instrument a visible unwind path, at least in the IR case (I'm working on the SelectionDAG instrumentation too) but there are still invisible unwinds it can't reach.

So this patch adds logic to DwarfEHPrepare that goes through a function, converting any call that might throw into an invoke to a simple resume cleanup, and adding cleanup clauses to existing landingpads that lack them. Obviously we don't really want to do this if it's wasted effort, so I also exposed requiresStackProtector from the actual StackProtector code to skip the extra paths if they won't be used.

Diff Detail

Event Timeline

t.p.northover created this revision.Feb 9 2023, 3:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 9 2023, 3:35 AM
t.p.northover requested review of this revision.Feb 9 2023, 3:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 9 2023, 3:35 AM

Realised I hadn't updated the DomTree correctly first time round.

xiangzhangllvm added a comment.EditedFeb 21 2023, 10:19 PM

I took a quick look, Seems we can't reproduce the report for test in https://bugs.chromium.org/p/llvm/issues/detail?id=30 in latest llvm.

$./a.out
*** stack smashing detected ***: terminated
Aborted (core dumped)
$./a.out test
*** stack smashing detected ***: terminated
Aborted (core dumped)

or we mark "noreturn" for all the throw cases.
I'll take a deeper look on this patch. Anyway thanks first.

llvm/lib/CodeGen/DwarfEHPrepare.cpp
297

I just header the func __gxx_personality_v0 in libc++ will handle the exception.
Does all the callees in the caller (with personality attribute) may do invisible unwinds for exception ? So you visibly generate the cleanup resume (to let current stack protect can see it then generate protection code. )

t.p.northover added inline comments.Mar 9 2023, 1:40 AM
llvm/lib/CodeGen/DwarfEHPrepare.cpp
297

That's pretty much it.

Actually, even without a personality function callees might unwind through this function (that just needs uwtables or similar) but without a personality function we have no way to intercept the exception and check the stack is OK on the way back through.

xiangzhangllvm accepted this revision.Mar 9 2023, 5:42 PM
xiangzhangllvm added inline comments.
llvm/lib/CodeGen/DwarfEHPrepare.cpp
176

LGTM, just a question.
Do we will really generate a Landing Pad for try catch pairs without cleanup in real world ?
Should it report error earlier if without cleanup?
Anyway this patch enhance for them.

This revision is now accepted and ready to land.Mar 9 2023, 5:42 PM
xiangzhangllvm added inline comments.Mar 9 2023, 6:15 PM
llvm/lib/CodeGen/DwarfEHPrepare.cpp
176

Sorry, here only for personality function not visible try catch.

t.p.northover closed this revision.Mar 16 2023, 4:34 AM

Thanks, committed as 203b6f31bb71.

llvm/lib/CodeGen/DwarfEHPrepare.cpp
176

Do we will really generate a Landing Pad for try catch pairs without cleanup in real world ?

Yes, cleanup is only used if there's some code that needs to be run regardless of whether an exception is caught (e.g. C++ destructors). So for example this leads to a landingpad without cleanup:

void func();
void foo() {
  try {
    func();
  } catch (int i) {
  }
}
thakis added a subscriber: thakis.Mar 18 2023, 2:17 PM

It looks like this broke building libunwind. Building it now fails with (when building clang and lld and so on with this patch, and then building libunwind with just-built clang and lld and so on):

ld64.lld: error: undefined symbol: __gxx_personality_v0

I suppose this needs some flag to turn it off when building libunwind?

This is in a GN build. I didn't check if it's also broken in a cmake build as it's difficult (for me, at least) to do a build of libunwind that builds with just-build clang and lld. The GN build doesn't build libunwind with any -fstack-protector- or _FORTIFY_SOURCE things, but maybe it does something else silly that causes stack protection to fire when it shouldn't (…any ideas what this might be?). Or maybe stack protection is just enabled by default on darwin? But I don't see a fno-stack-protector flag in the libunwind cmake build either.

…hm, adding "-fno-stack-protector" to libunwind's cflags seems to unbreak the libunwind build. So I figure I'll add that in the GN build. The CMake build doesn't seem to explicitly add that, so I'm guessing libunwind is broken there too. Can you look at that?

(clang/cmake/caches/Apple-stage2.cmake globally adds -fno-stack-protector as far as I can tell, so the Apple bots probably don't show this if they use that cache file. But if you don't use that cache file, it's probably broken now?)

I filed https://github.com/llvm/llvm-project/issues/61501 for this.

hans added a subscriber: hans.Mar 31 2023, 10:25 AM

Heads up that we bisected test failures in Chromium (targeting iOS/x86_64 simulator) to this revision: https://bugs.chromium.org/p/chromium/issues/detail?id=1428624 Hopefully we'll be able to figure out what's going on next week, but if anyone else has seen problems too it would be interesting to know.

hans added a comment.Apr 4 2023, 5:15 AM

Heads up that we bisected test failures in Chromium (targeting iOS/x86_64 simulator) to this revision: https://bugs.chromium.org/p/chromium/issues/detail?id=1428624 Hopefully we'll be able to figure out what's going on next week, but if anyone else has seen problems too it would be interesting to know.

I still don't have a root cause or stand-alone repro, but I did notice that this kicks in even with clang's -fno-exceptions, which seems like it might be wrong?

hans added a comment.Apr 4 2023, 5:42 AM

Finally, here's a stand-alone reproducer for desktop: https://bugs.chromium.org/p/chromium/issues/detail?id=1428624#c11

Can we revert this until that's addressed?

Oh dear, thanks for investigating this, I didn't notice until I got an e-mail about the revert.

ObjC never fails to disappoint it seems; I'll see what I can do about accommodating it.

I think I'd like to reapply this one.

The __gxx_personality_v0 error only reproduces for me if I compile libunwind with -fexceptions (or perhaps more likely remove the LLVM default -fno-exceptions), and I think that's not something that should be possible. It conceptually introduces exactly that dependence from libunwind -> libc++abi lld is complaining about, which is in the wrong direction (and libunwind doesn't actually do exceptions anyway). I've been unable to get this to happen in an actual CMake build though, LLVM_ENABLE_EH doesn't get forwarded to the LLVM_EANBLE_RUNTIMES builds in my tests.

So if necessary I think we should add logic to libunwind to ensure -fno-exceptions, but I haven't updated the patch for now because I can't find a scenario where it triggers.

And, having read up a bit on ObjC lifetime semantics, I think the test-case there (and the Chromium one it's derived from, which is basically the same) is relying on lifetime optimizations that aren't actually guaranteed. In main:

int main() {
  NSObject *p = create();
  __weak NSObject *q = p;
  @autoreleasepool {
    p = nil;
  }
  printf("p = %p\n", p);
  printf("q = %p\n", q); // Both p and q should be nil.
  return 0;
}

create returns an object that's notionally got a retain count of 1 and a queued autorelease.

Before this change, the retain that was part of assigning to p cancelled out the autorelease (the call to _objc_retainAutoreleasedReturnValue was right after create) so it went into the autoreleasepool block with a count of 1 and nothing pending. Assigning nil decremented the refcount and freed the object (zeroing weak ref q).

After this change the retain actually happened (the invoke separated the call to create from _objc_retainAutoreleasedReturnValue). So instead so it went into the block with a refcount of 2 and an autorelease pending. Since the object wasn't deleted, the weak reference never got zeroed out (I assume the real runtime will drain all autoreleasing objects at some point, just not here). But it was only ever an opportunistic optimization.

It's notable that

  1. NSObject *p; p = create(); always failed to be optimized and q remains valid even now.
  2. This is only at -O0: we actually make a stronger effort to ensure an _objc_retainAutoreleasedReturnValue stays with its call at higher levels so the optimization would still happen there.

So I think the Chromium test needs updating in this case (though I don't really know what it was trying to do so I'm not sure how). @hans, what do you think? Is that feasible? Do you disagree with the reasoning?

hans added a comment.Jun 9 2023, 5:49 AM

So I think the Chromium test needs updating in this case (though I don't really know what it was trying to do so I'm not sure how). @hans, what do you think? Is that feasible? Do you disagree with the reasoning?

I'm certainly not an expert on ObjC lifetime management. Based on the original test (quoted in https://bugs.chromium.org/p/chromium/issues/detail?id=1428624#c4) it sounds like it's already sensitive to when exactly things get released. So if we're confident that the new behavior is also correct, I suppose the test should be updated.

But, I also notice that my reproducer works with -fno-exceptions, which is surprising to me. Why is this change also affecting -fno-exceptions code?

There's actually a valid reason to build libunwind with -fexceptions, which is getting functions like _Unwind_Resume to not be marked nounwind, which would otherwise cause invalid LSDA to be generated when you LTO libunwind together with other libraries. See https://github.com/llvm/llvm-project/issues/56825 and @jyknight's comment on it.

When we reland this, would there be any way to account for the -disable-check-noreturn-call flag added in D141556, to avoid the size increase for people who don't want to pay it? CC @nickdesaulniers, who also had size concerns that motivated D147975.