This is an archive of the discontinued LLVM Phabricator instance.

[StackProtector] don't check stack protector before calling nounwind functions
ClosedPublic

Authored by nickdesaulniers on Apr 10 2023, 3:32 PM.

Details

Summary

https://reviews.llvm.org/rGd656ae28095726830f9beb8dbd4d69f5144ef821
introduced a additional checks before calling noreturn functions in
response to this security paper related to Catch Handler Oriented
Programming (CHOP):
https://download.vusec.net/papers/chop_ndss23.pdf
See also:
https://bugs.chromium.org/p/llvm/issues/detail?id=30

This causes stack canaries to be inserted in C code which was
unexpected; we noticed certain Linux kernel trees stopped booting after
this (in functions trying to initialize the stack canary itself).
https://github.com/ClangBuiltLinux/linux/issues/1815

There is no point checking the stack canary like this when exceptions
are disabled (-fno-exceptions or function is marked noexcept) or for C
code. The GCC patch for this issue does something similar:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a25982ada523689c8745d7fb4b1b93c8f5dab2e7

Android measured a 2% regression in RSS as a result of d656ae280957 and
undid it globally:
https://android-review.googlesource.com/c/platform/build/soong/+/2524336

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptApr 10 2023, 3:32 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
nickdesaulniers requested review of this revision.Apr 10 2023, 3:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 10 2023, 3:32 PM
xiangzhangllvm accepted this revision.Apr 10 2023, 6:42 PM
xiangzhangllvm added inline comments.
llvm/lib/CodeGen/StackProtector.cpp
490

I +1 for this constraints to reduce code size.
For C level except (setjmp longjmp) they mainly set/read register status from function parameter not stack. So low risk.

This revision is now accepted and ready to land.Apr 10 2023, 6:42 PM

It seems the noreturn call do miss the stack canary checking. Is it the expected behaviour in security point of view?

I can confirm that this change fixes both panics on boot (the initial android-4.14-stable one and the one I found with objtool/core + Josh's pending series to mark more functions as __noreturn).

nickdesaulniers marked an inline comment as done.Apr 11 2023, 3:08 PM

It seems the noreturn call do miss the stack canary checking. Is it the expected behaviour in security point of view?

It's my opinion, and I acknowledge that folks can have a reasonable disagreement, but reading https://download.vusec.net/papers/chop_ndss23.pdf, I think noreturn is a red-herring. The attack relies on __cxa_throw being noreturn, and no stack check being emitted before that. Once your in a function, it's hard to know if the callers stack frame has been corrupted, so it's better to check the stack canary in the call before the caller.

I agree with this point: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58245#c6. Why don't we check the stack canary before non-noreturn calls? Why not before EVERY operation involving stack memory? performance

Whether the called function is noreturn or not is kind of irrelevant. Consider the following case:

void caller () {
  int arr [42] = {};
  int (*callee)() = get_callback();
  vuln();
  callee(arr);
}

callee may make control flow decisions based on stack allocated variables in caller. Those values may have been corrupted. (A traditional JOP attack) Does it matter if callee is noreturn? (I think not) So I think CHOP is a cute name, but it's just a JOP attack that finds gadgets in the C++ exception handling runtime.

My change undoes the excessive canary checking to C code (and noexcept C++ functions, or C++ code built with -fno-exceptions), but -fexceptions C++ code will have a penalty larger than just before calls to __cxa_throw, such as calls to explicitly marked noreturn functions. I think the penalty introduced in https://reviews.llvm.org/rGd656ae28095726830f9beb8dbd4d69f5144ef821 (and https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a25982ada523689c8745d7fb4b1b93c8f5dab2e7) could be reduced by simply checking the stack canary before a call to __cxa_throw alone (and whatever the equivalent is on non-Itanium ABI systems) since the CHOP paper demonstrates how powerful the C++ exception handling gadgets may be. I'm just trying to undo the unreasonable cost for C programs with this change.


Android undid https://reviews.llvm.org/rGd656ae28095726830f9beb8dbd4d69f5144ef821 here due to measured 2% RSS increase: https://android-review.git.corp.google.com/c/platform/build/soong/+/2524336. I will add that note to the commit message. Unless there's more feedback about this change, such as something I may be missing, I plan to commit this tomorrow and request a cherry-pick to the release/16.x branch.

nickdesaulniers edited the summary of this revision. (Show Details)
  • add note about Android

For this case

void caller () {
  int arr [42] = {};
  int (*callee)() = get_callback();
  vuln();
  callee(arr);
}

If callee is not noreturn function, caller has a chance to check the canary when it returns.

As a tradeoff of security, code size and performance, I don't object to land the patch. I am curious about the noreturn function that is caught in your case. Could you show some examples of those noreturn function?

I think noreturn is a red-herring

noreturn without nounwind serves as a hint that the callee unconditionally throws an exception. (The only other alternatives are that the callee unconditionally calls abort(), or unconditionally spins in an infinite loop; such functions are relatively rare in most codebases.) But it's not really a strong heuristic; functions that are noreturn aren't guaranteed to throw, and functions that aren't noreturn frequently throw.

Longer-term, it might make sense to consider encoding information about the stack canary into DWARF data; if the unwinder can check the canary itself, that catches the relevant cases without a bunch of inline code.

Android undid https://reviews.llvm.org/rGd656ae28095726830f9beb8dbd4d69f5144ef821 here due to measured 2% RSS increase: https://android-review.git.corp.google.com/c/platform/build/soong/+/2524336. I will add that note to the commit message. Unless there's more feedback about this change, such as something I may be missing, I plan to commit this tomorrow and request a cherry-pick to the release/16.x branch.

The non-internal version of this link is https://android-review.googlesource.com/c/platform/build/soong/+/2524336. You should update it in the Summary as well.

One other thing to point out here is that we measured D143637 internally as having an even larger size increase than D139254. D143637's been reverted for now, but when it's put up again we probably want a similar check there and/or an option to turn that off as well.

I'm a little nervous about the assumptions that we don't need to consider for C code. There're other exception behaviors like SJLJ, signal and Windows SEH. IIRC, both SJLJ and SEH will do stack unwind as well.
And I think they are more risky somehow, because they can catch up hardware exceptions like dividend by 0, segment fault or illegal instructions. These exceptions are relative eaiser for an attacker to trigger than a normal C++ exception.

SJLJ, signal

In practice, most noreturn functions in C are going to be calls to abort()-like functions, not longjmp()-like functions. And it doesn't really make sense to check the stack canary before we call abort(); the program is going to die before we touch any corrupted memory.

We could consider trying to encode more information into the IR, I guess. In a lot of cases, we know a "noreturn nounwind" function is abort()-like, and not longjmp()-like; we just don't bother recording it anywhere because nothing cared before this.

And I think they are more risky somehow, because they can catch up hardware exceptions like dividend by 0, segment fault or illegal instructions. These exceptions are relative eaiser for an attacker to trigger than a normal C++ exception.

Unwinding on divide-by-zero is a Windows-specific thing, as far as I know. It's possible we want to do something different on Windows? But there are codesize costs here, and nobody has done any research on the Windows-specific side of this.

we noticed certain Linux kernel trees stopped booting after this (in functions trying to initialize the stack canary itself).

The kernel should probably disable stack canaries for the functions in question?

We could consider trying to encode more information into the IR, I guess. In a lot of cases, we know a "noreturn nounwind" function is abort()-like, and not longjmp()-like; we just don't bother recording it anywhere because nothing cared before this.

In IR if a function call is within try{} block, front-end should generate invoke instruction, so we may just check if the callsite is invoke instruction.

SJLJ, signal

In practice, most noreturn functions in C are going to be calls to abort()-like functions, not longjmp()-like functions. And it doesn't really make sense to check the stack canary before we call abort(); the program is going to die before we touch any corrupted memory.

We could consider trying to encode more information into the IR, I guess. In a lot of cases, we know a "noreturn nounwind" function is abort()-like, and not longjmp()-like; we just don't bother recording it anywhere because nothing cared before this.

And I think they are more risky somehow, because they can catch up hardware exceptions like dividend by 0, segment fault or illegal instructions. These exceptions are relative eaiser for an attacker to trigger than a normal C++ exception.

Unwinding on divide-by-zero is a Windows-specific thing, as far as I know. It's possible we want to do something different on Windows? But there are codesize costs here, and nobody has done any research on the Windows-specific side of this.

we noticed certain Linux kernel trees stopped booting after this (in functions trying to initialize the stack canary itself).

The kernel should probably disable stack canaries for the functions in question?

Thanks @efriedma! Makes sense to me. Seem only Windows may have problem here especially given MSVC supports __declspec(noreturn) too. It's not a big deal since we can check eh-asynch from module flag, but I don't think we need to do it in this patch.

We could consider trying to encode more information into the IR, I guess. In a lot of cases, we know a "noreturn nounwind" function is abort()-like, and not longjmp()-like; we just don't bother recording it anywhere because nothing cared before this.

In IR if a function call is within try{} block, front-end should generate invoke instruction, so we may just check if the callsite is invoke instruction.

But the C code do not have try{} block.
And I recheck the longjmp(ptr env, int val), if the attacker re-write the parameter env and prepare the “meticulously prepared” reg data in the flaky env , it is possible to attack.

We could consider trying to encode more information into the IR, I guess. In a lot of cases, we know a "noreturn nounwind" function is abort()-like, and not longjmp()-like; we just don't bother recording it anywhere because nothing cared before this.

In IR if a function call is within try{} block, front-end should generate invoke instruction, so we may just check if the callsite is invoke instruction.

But the C code do not have try{} block.
And I recheck the longjmp(ptr env, int val), if the attacker re-write the parameter env and prepare the “meticulously prepared” reg data in the flaky env , it is possible to attack.

I think the point is noreturn. If it is a rare case using longjmp in a noreturn function, I think it's fine to not do much thing for it.

LuoYuanke added inline comments.Apr 12 2023, 2:48 AM
llvm/lib/CodeGen/StackProtector.cpp
490

Maybe check isa<InvokeInst>(CB) whose semantics (https://llvm.org/docs/LangRef.html#invoke-instruction) define normal control flow and exception control flow. But that may increase the code size for c++ code.

For this case

void caller () {
  int arr [42] = {};
  int (*callee)() = get_callback();
  vuln();
  callee(arr);
}

If callee is not noreturn function, caller has a chance to check the canary when it returns.

The issue is that if callee is not noreturn, vuln may have overwritten the value of callee (if it was on the stack) and the stack canary is not checked BEFORE transferring control flow to an arbitrary location. But if callee was noreturn we would? Echoing the sentiment of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58245#c6, that lack of symmetry seems odd.

Consider another example:

void caller () {
  int data [42] = {};
  int (*callback)() = get_callback();
  vuln();
  callee(callback, data);
}

So the stack gets corrupted within caller's frame, possibly modifying callback if it was on the stack. caller gets a stack canary, but the check is after the return of callee to caller. In the meantime, callee assumes the stack data is not corrupted and invokes callback, which may be arbitrary attacker code now due to vuln, and might not actually ever return where any such stack canary would be checked. Oops!

In these two examples, nothing is noreturn, and yet you can still hijack control flow, so why treat noreturn functions specially? This is my interpretation of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58245#c6 and I agree with the sentiment.

Once a caller has had its stack frame corrupted, literally any stack references passed to a callee are dangerous to use to make control flow decisions. I also don't think that the callee can detect such corruption of the parent's/caller's frame.

Another issue is that a stack canary is weak indicator that the stack has been tampered with. You can maliciously modify stack memory without overwriting the stack slot containing the canary, or if the canary was leaked it can be rewritten in the correct slot. So adding more and more and more stack canary checks is nearly meaningless work (IMO).

As a tradeoff of security, code size and performance, I don't object to land the patch. I am curious about the noreturn function that is caught in your case. Could you show some examples of those noreturn function?

The kernel code is basically:

void start_kernel () {
  ...
  initialize_the_stack_canary();
  do_rest();
}
void do_rest () {
  ...
  while (1);
}

LLVM:

  1. infers that do_rest is implicitly noreturn.
  2. decides that start_kernel must now have a stack canary

At runtime:

  1. start_kernel writes an uninitialized canary value into a stack slot.
  2. initialize_the_stack_canary changes the value of canary (globally, not in the stack slot)
  3. canary reloaded and compared against the value in the stack slot, which was uninitialized garbage
  4. __stack_chk_fail -> panic -> machine fails to boot at startup. ---

The kernel should probably disable stack canaries for the functions in question?

Yes, I plan to do that in addition to this change. But it's important to demonstrate to the kernel community that we're working on this problem from both angles. Because kernel developers will ask "what is your toolchain smoking?" (rightfully) when they see this.

For the kernel, I will have to:

  1. add support for __attribute__((no_stack_protector)). This doesn't exist for all supported toolchain versions (https://docs.kernel.org/process/changes.html) particularly GCC only gained support in gcc-11 which is relatively recent.
  2. backport this to all supported kernel versions (https://kernel.org/)

It is infeasible (IMO) to move start_kernel to its own TU and build that function with -fno-stack-protector; the function attributes give us the ideal granularity, but aren't available everywhere we need to support (temporal versions of kernel sources, and temporal versions of supported toolchains).


I think noreturn is a red-herring

noreturn without nounwind serves as a hint that the callee unconditionally throws an exception. (The only other alternatives are that the callee unconditionally calls abort(), or unconditionally spins in an infinite loop; such functions are relatively rare in most codebases.) But it's not really a strong heuristic; functions that are noreturn aren't guaranteed to throw, and functions that aren't noreturn frequently throw.

Is that a vote of confidence for or against this patch?

Longer-term, it might make sense to consider encoding information about the stack canary into DWARF data; if the unwinder can check the canary itself, that catches the relevant cases without a bunch of inline code.

DWARF does encode location lists which IIRC may contain which stack slot certain variables reside in for certain ranges of the program counter. I could imagine a special new tag to say "here's where the stack canary is."

What happens if code is built with -g0? Does that mean such info is not available? Or does C++ always emit such metadata?


Android undid https://reviews.llvm.org/rGd656ae28095726830f9beb8dbd4d69f5144ef821 here due to measured 2% RSS increase: https://android-review.git.corp.google.com/c/platform/build/soong/+/2524336. I will add that note to the commit message. Unless there's more feedback about this change, such as something I may be missing, I plan to commit this tomorrow and request a cherry-pick to the release/16.x branch.

The non-internal version of this link is https://android-review.googlesource.com/c/platform/build/soong/+/2524336. You should update it in the Summary as well.

Good catch, will do.


We could consider trying to encode more information into the IR, I guess. In a lot of cases, we know a "noreturn nounwind" function is abort()-like, and not longjmp()-like; we just don't bother recording it anywhere because nothing cared before this.

In IR if a function call is within try{} block, front-end should generate invoke instruction, so we may just check if the callsite is invoke instruction.

It's a good idea, but I don't think it's correct. Consider the following example:

void foo ();
void bar ();


void baz () {
    foo();
    try {
        bar();
    } catch (int e) {}
}

invoke is only used in baz when calling bar, but not when calling foo. If foo calls __cxa_throw, it will do so via call not invoke (and so would bar). So foo (and bar) would not check the stack canary before calling __cxa_throw, which is what the CHOP attack paper was concerned with. Is my understanding precise?


Unless there's more feedback about this change, such as something I may be missing, I plan to commit this tomorrow and request a cherry-pick to the release/16.x branch.

Based on the feedback, I will hold this for one more day (will update the broken link in the commit message though).

nickdesaulniers edited the summary of this revision. (Show Details)

I think noreturn is a red-herring

noreturn without nounwind serves as a hint that the callee unconditionally throws an exception. (The only other alternatives are that the callee unconditionally calls abort(), or unconditionally spins in an infinite loop; such functions are relatively rare in most codebases.) But it's not really a strong heuristic; functions that are noreturn aren't guaranteed to throw, and functions that aren't noreturn frequently throw.

Is that a vote of confidence for or against this patch?

For, I guess. It's fundamentally a heuristic, and I don't think the cost/benefit works out for diagnosing "noreturn nounwind" functions.

Longer-term, it might make sense to consider encoding information about the stack canary into DWARF data; if the unwinder can check the canary itself, that catches the relevant cases without a bunch of inline code.

DWARF does encode location lists which IIRC may contain which stack slot certain variables reside in for certain ranges of the program counter. I could imagine a special new tag to say "here's where the stack canary is."

What happens if code is built with -g0? Does that mean such info is not available? Or does C++ always emit such metadata?

I was thinking alongside DWARF unwind (which is not debug info; it's the info necessary to do EH unwind). I haven't thought through how it would be encoded.

we noticed certain Linux kernel trees stopped booting after this (in functions trying to initialize the stack canary itself).

The kernel should probably disable stack canaries for the functions in question?

Sent this series: https://lore.kernel.org/llvm/20230412-no_stackp-v1-0-46a69b507a4b@google.com/

(as an addition to this patch; not a substitute to or in preference of, see also Android disabling the previous behavior globally)

ojeda added a subscriber: ojeda.Apr 12 2023, 1:13 PM
hiraditya added inline comments.Apr 12 2023, 4:29 PM
llvm/lib/CodeGen/StackProtector.cpp
491

nit: tab

I was thinking alongside DWARF unwind (which is not debug info; it's the info necessary to do EH unwind). I haven't thought through how it would be encoded.

The unwind table is loaded to memory by dynamic loader? That would increase the binary size as well.

For this case

void caller () {
  int arr [42] = {};
  int (*callee)() = get_callback();
  vuln();
  callee(arr);
}

If callee is not noreturn function, caller has a chance to check the canary when it returns.

The issue is that if callee is not noreturn, vuln may have overwritten the value of callee (if it was on the stack) and the stack canary is not checked BEFORE transferring control flow to an arbitrary location. But if callee was noreturn we would? Echoing the sentiment of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58245#c6, that lack of symmetry seems odd.

Consider another example:

void caller () {
  int data [42] = {};
  int (*callback)() = get_callback();
  vuln();
  callee(callback, data);
}

So the stack gets corrupted within caller's frame, possibly modifying callback if it was on the stack. caller gets a stack canary, but the check is after the return of callee to caller. In the meantime, callee assumes the stack data is not corrupted and invokes callback, which may be arbitrary attacker code now due to vuln, and might not actually ever return where any such stack canary would be checked. Oops!

In these two examples, nothing is noreturn, and yet you can still hijack control flow, so why treat noreturn functions specially? This is my interpretation of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58245#c6 and I agree with the sentiment.

LLVM stack protector is not good at cover this 2 cases, the function pointers are in local variable area, But the position of LLVM stack guard are between local variable area and return address.
If the attacker only need to "cover" the local variable area on stack, the stack protector didn't work.

Once a caller has had its stack frame corrupted, literally any stack references passed to a callee are dangerous to use to make control flow decisions. I also don't think that the callee can detect such corruption of the parent's/caller's frame.

Another issue is that a stack canary is weak indicator that the stack has been tampered with. You can maliciously modify stack memory without overwriting the stack slot containing the canary, or if the canary was leaked it can be rewritten in the correct slot. So adding more and more and more stack canary checks is nearly meaningless work (IMO).

Very correct !So let's focus on what this patch mainly want to do ? It can't cover all stack attack case, it mainly to reduce the most common stack attack--attacker cover the stack from local variable area to fixed stack slot (return address or parameter address)--LLVM stack protect can work well because stack guard happens to be in this area.

As a tradeoff of security, code size and performance, I don't object to land the patch. I am curious about the noreturn function that is caught in your case. Could you show some examples of those noreturn function?

The kernel code is basically:

void start_kernel () {
  ...
  initialize_the_stack_canary();
  do_rest();
}
void do_rest () {
  ...
  while (1);
}

LLVM:

  1. infers that do_rest is implicitly noreturn.
  2. decides that start_kernel must now have a stack canary

At runtime:

  1. start_kernel writes an uninitialized canary value into a stack slot.
  2. initialize_the_stack_canary changes the value of canary (globally, not in the stack slot)
  3. canary reloaded and compared against the value in the stack slot, which was uninitialized garbage
  4. __stack_chk_fail -> panic -> machine fails to boot at startup. ---

It is belong to user code change the stack guard value, it should not use stack protect.

The kernel should probably disable stack canaries for the functions in question?

Yes, I plan to do that in addition to this change. But it's important to demonstrate to the kernel community that we're working on this problem from both angles. Because kernel developers will ask "what is your toolchain smoking?" (rightfully) when they see this.

For the kernel, I will have to:

  1. add support for __attribute__((no_stack_protector)). This doesn't exist for all supported toolchain versions (https://docs.kernel.org/process/changes.html) particularly GCC only gained support in gcc-11 which is relatively recent.
  2. backport this to all supported kernel versions (https://kernel.org/)

It is infeasible (IMO) to move start_kernel to its own TU and build that function with -fno-stack-protector; the function attributes give us the ideal granularity, but aren't available everywhere we need to support (temporal versions of kernel sources, and temporal versions of supported toolchains).

In my eye, Linux kernel should pay more attention on performance. Maybe we shouldn't build Linux kernel with stack-protector at all, especially if it has a lot of initialize_the_stack_canary() like functions.

I still +1 for this patch.

It is belong to user code change the stack guard value, it should not use stack protect.

I think these are distinct (multiple) canaries for distinct stacks. The kernel has its own canary value that may be per-CPU IIUC (so has to be initialized for each CPU that is onlined). This canary is not shared with userspace AFAIK. Userspace has its own canary values for the userspace stacks, sometimes thread local, sometimes a single global per process.

In my eye, Linux kernel should pay more attention on performance. Maybe we shouldn't build Linux kernel with stack-protector at all, especially if it has a lot of initialize_the_stack_canary() like functions.

The use of stack protectors to protect the Linux kernel is configurable, default on, and may be disabled. The stack canaries are initialized once per CPU when the CPU is brough online IIUC, but I haven't compared each architecture's implementation.

I still +1 for this patch.

Backport requested to clang-16: https://github.com/llvm/llvm-project/issues/62119. Thanks for the review and discussions.