Page MenuHomePhabricator

[libunwind] Add configuration to disable sigreturn frame check
AbandonedPublic

Authored by smeenai on Apr 8 2022, 5:33 PM.

Details

Reviewers
compnerd
danielkiss
rprichard
Group Reviewers
Restricted Project
Summary

https://reviews.llvm.org/D90898 added a check to libunwind for a
sigreturn frame on Linux aarch64. The check operates by reading from
memory, and the patch notes the possibility of segfaults occurring as a
result. We're observing such segfaults internally (possibly due to
invalid unwind info in libraries which are out of our control). Add a
configuration option to omit this sigreturn check, for users who are
willing to give up the ability to unwind through sigreturn frames in
return for avoiding potential segfaults.

The alternative would be to check memory before attempting to access it,
and that's the approach taken by nongnu libunwind [1]. There were
concerns raised on the original review about the syscalls needed for
this (e.g. pipe) not being available because of OS security
configurations, and there's also the overhead incurred by the memory
checking, so I'm inclined to go with the current approach for now.

[1] https://github.com/libunwind/libunwind/blob/e07b43c02d5cf1ea060c018fdf2e2ad34b7c7d80/src/aarch64/Ginit.c#L187

Diff Detail

Event Timeline

smeenai created this revision.Apr 8 2022, 5:33 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptApr 8 2022, 5:33 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
smeenai requested review of this revision.Apr 8 2022, 5:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 8 2022, 5:33 PM
smeenai edited the summary of this revision. (Show Details)Apr 8 2022, 5:34 PM

Thanks for the patch.

libunwind/CMakeLists.txt
91

Please an option here like this.

smeenai updated this revision to Diff 421966.Apr 11 2022, 10:46 AM

Add CMake option

smeenai marked an inline comment as done.Apr 11 2022, 10:46 AM
smeenai updated this revision to Diff 421967.Apr 11 2022, 10:48 AM

Use suggested wording for CMake option, which I missed earlier

We're observing such segfaults internally (possibly due to invalid unwind info in libraries which are out of our control).

It would be interesting to know whether Clang does anything wrong here. @chill is improving such stuff.

(Without signal trampoline frame recognization, it could be bad for crash reporting: https://maskray.me/blog/2022-04-10-unwinding-through-signal-handler#glibc-aarch64 )

smeenai added a comment.EditedApr 11 2022, 3:56 PM

We're observing such segfaults internally (possibly due to invalid unwind info in libraries which are out of our control).

It would be interesting to know whether Clang does anything wrong here. @chill is improving such stuff.

(Without signal trampoline frame recognization, it could be bad for crash reporting: https://maskray.me/blog/2022-04-10-unwinding-through-signal-handler#glibc-aarch64 )

Unfortunately, we're only observing this crash on a small subset of devices in the wild, and I haven't had any luck reproducing it locally, so I can't say whether Clang is producing invalid unwind info or there's some system libraries on those particular devices with issues.

Our crash reporting uses a different unwinding flow which won't be affected by the change here. This configuration will only affect the unwinding used for exceptions, and we're willing to try not being able to unwind through sigreturn frames for that scenario.

The "fix" (that's still subject to time-of-check-to-time-of-use issues) would be to see if memory is readable before accessing it, like nongnu libunwind does, but there were concerns raised in the original review about the system calls required not being accessible because of security configurations (https://reviews.llvm.org/D90898#2384775), plus the overhead of the checking. Do you have any ideas for that?

... that's still subject to time-of-check-to-time-of-use issues

Using /proc/self/maps would be subject to TOCTOU, but I think most methods wouldn't, e.g.:

  • Open /proc/self/mem and pread() the address. This seems strictly better than /proc/self/maps?
  • Create a pipe using pipe(), write() the bytes into the pipe buffer and read() them back out. I believe a Linux pipe buffer is guaranteed to be big enough (>= 8 bytes).
  • process_vm_readv

I wonder if security configurations are a problem. Maybe I should experiment on an Android build.

... that's still subject to time-of-check-to-time-of-use issues

Using /proc/self/maps would be subject to TOCTOU, but I think most methods wouldn't, e.g.:

  • Open /proc/self/mem and pread() the address. This seems strictly better than /proc/self/maps?
  • Create a pipe using pipe(), write() the bytes into the pipe buffer and read() them back out. I believe a Linux pipe buffer is guaranteed to be big enough (>= 8 bytes).
  • process_vm_readv

I wonder if security configurations are a problem. Maybe I should experiment on an Android build.

With all those methods, there's still the chance (however unlikely) that the address is readable at the time of the check but somehow becomes unreadable by the time we perform the actual read, right? I doubt it matters much in practice though.

For the pipe, since we'll be checking for validity multiple times, I imagine we'll need to empty out the buffer at some point. libunwind seems to use mincore if available and the pipe as a fallback, but I'm not really understanding how mincore would work here, since a page could be readable even though it's not currently resident.

Using /proc/self/maps would be subject to TOCTOU, but I think most methods wouldn't, e.g.:

  • Open /proc/self/mem and pread() the address. This seems strictly better than /proc/self/maps?
  • Create a pipe using pipe(), write() the bytes into the pipe buffer and read() them back out. I believe a Linux pipe buffer is guaranteed to be big enough (>= 8 bytes).
  • process_vm_readv

I wonder if security configurations are a problem. Maybe I should experiment on an Android build.

With all those methods, there's still the chance (however unlikely) that the address is readable at the time of the check but somehow becomes unreadable by the time we perform the actual read, right? I doubt it matters much in practice though.

The unwinder needs to read two aligned 4-byte words at the target PC. I was thinking that it could replace the ordinary reads with system calls. e.g. For the pipe trick, write the target 8 bytes into a pipe and then read them back out. If multiple threads share a pipe fd, then they'd need the synchronize their use of the pipe.

I wonder if we'd need to close the pipe fd for dlclose. On Android, the unwinder is linked statically into app shared objects.

For the pipe, since we'll be checking for validity multiple times, I imagine we'll need to empty out the buffer at some point. libunwind seems to use mincore if available and the pipe as a fallback, but I'm not really understanding how mincore would work here, since a page could be readable even though it's not currently resident.

Yeah, mincore doesn't seem to do what's needed here.

Using /proc/self/maps would be subject to TOCTOU, but I think most methods wouldn't, e.g.:

  • Open /proc/self/mem and pread() the address. This seems strictly better than /proc/self/maps?
  • Create a pipe using pipe(), write() the bytes into the pipe buffer and read() them back out. I believe a Linux pipe buffer is guaranteed to be big enough (>= 8 bytes).
  • process_vm_readv

I wonder if security configurations are a problem. Maybe I should experiment on an Android build.

With all those methods, there's still the chance (however unlikely) that the address is readable at the time of the check but somehow becomes unreadable by the time we perform the actual read, right? I doubt it matters much in practice though.

The unwinder needs to read two aligned 4-byte words at the target PC. I was thinking that it could replace the ordinary reads with system calls. e.g. For the pipe trick, write the target 8 bytes into a pipe and then read them back out. If multiple threads share a pipe fd, then they'd need the synchronize their use of the pipe.

I wonder if we'd need to close the pipe fd for dlclose. On Android, the unwinder is linked statically into app shared objects.

For the pipe, since we'll be checking for validity multiple times, I imagine we'll need to empty out the buffer at some point. libunwind seems to use mincore if available and the pipe as a fallback, but I'm not really understanding how mincore would work here, since a page could be readable even though it's not currently resident.

Yeah, mincore doesn't seem to do what's needed here.

https://github.com/abseil/abseil-cpp/blob/master/absl/debugging/internal/address_is_readable.cc has gone through several iterations. We can use rt_sigprocmask.

Using /proc/self/maps would be subject to TOCTOU, but I think most methods wouldn't, e.g.:

  • Open /proc/self/mem and pread() the address. This seems strictly better than /proc/self/maps?
  • Create a pipe using pipe(), write() the bytes into the pipe buffer and read() them back out. I believe a Linux pipe buffer is guaranteed to be big enough (>= 8 bytes).
  • process_vm_readv

I wonder if security configurations are a problem. Maybe I should experiment on an Android build.

With all those methods, there's still the chance (however unlikely) that the address is readable at the time of the check but somehow becomes unreadable by the time we perform the actual read, right? I doubt it matters much in practice though.

The unwinder needs to read two aligned 4-byte words at the target PC. I was thinking that it could replace the ordinary reads with system calls. e.g. For the pipe trick, write the target 8 bytes into a pipe and then read them back out. If multiple threads share a pipe fd, then they'd need the synchronize their use of the pipe.

I wonder if we'd need to close the pipe fd for dlclose. On Android, the unwinder is linked statically into app shared objects.

For the pipe, since we'll be checking for validity multiple times, I imagine we'll need to empty out the buffer at some point. libunwind seems to use mincore if available and the pipe as a fallback, but I'm not really understanding how mincore would work here, since a page could be readable even though it's not currently resident.

Yeah, mincore doesn't seem to do what's needed here.

https://github.com/abseil/abseil-cpp/blob/master/absl/debugging/internal/address_is_readable.cc has gone through several iterations. We can use rt_sigprocmask.

@rprichard, do you know if this would work for Android? It has the TOCTOU issue, but I imagine it's much simpler than having to manage and synchronize the pipe fd, and we could live with the TOCTOU in practice.

https://github.com/abseil/abseil-cpp/blob/master/absl/debugging/internal/address_is_readable.cc has gone through several iterations. We can use rt_sigprocmask.

@rprichard, do you know if this would work for Android? It has the TOCTOU issue, but I imagine it's much simpler than having to manage and synchronize the pipe fd, and we could live with the TOCTOU in practice.

I'm glad @MaskRay found this -- I think it will probably work, and it seems better than assuming the PC is readable.

I see rt_procsigmask listed in bionic/libc/SYSCALLS.TXT, and I don't see it mentioned in any of the bionic/libc/SECCOMP*.txt files. I think that means seccomp is allowing the system call. I looked at kernel/signal.c, and AFAICT it's not doing any security checks that could be a problem. Bionic itself uses rt_sigprocmask for (at least) spawning new processes, creating/exiting threads, TLS-related locking, POSIX timers, and abort(). I think any seccomp-like blocking of rt_sigprocmask would have to be very targeted, so I think the syscall is probably allowed everywhere on Android.

It is assuming that the kernel will validate the address before validating the how. The kernel has a principle of not breaking userland -- is there a more specific guarantee we can rely on? e.g. The code has this comment:

// This strategy depends on Linux implementation details,
// so we rely on the test to alert us if it stops working.

The kernel source is structured as two wrappers around sigprocmask, SYSCALL_DEFINE4(rt_sigprocmask, ...) and COMPAT_SYSCALL_DEFINE4(rt_sigprocmask, ...). The wrappers copy user memory to/from kernel memory before calling sigprocmask, so it makes sense that they would validate the address first.

https://github.com/abseil/abseil-cpp/blob/master/absl/debugging/internal/address_is_readable.cc has gone through several iterations. We can use rt_sigprocmask.

@rprichard, do you know if this would work for Android? It has the TOCTOU issue, but I imagine it's much simpler than having to manage and synchronize the pipe fd, and we could live with the TOCTOU in practice.

I'm glad @MaskRay found this -- I think it will probably work, and it seems better than assuming the PC is readable.

I see rt_procsigmask listed in bionic/libc/SYSCALLS.TXT, and I don't see it mentioned in any of the bionic/libc/SECCOMP*.txt files. I think that means seccomp is allowing the system call. I looked at kernel/signal.c, and AFAICT it's not doing any security checks that could be a problem. Bionic itself uses rt_sigprocmask for (at least) spawning new processes, creating/exiting threads, TLS-related locking, POSIX timers, and abort(). I think any seccomp-like blocking of rt_sigprocmask would have to be very targeted, so I think the syscall is probably allowed everywhere on Android.

It is assuming that the kernel will validate the address before validating the how. The kernel has a principle of not breaking userland -- is there a more specific guarantee we can rely on? e.g. The code has this comment:

// This strategy depends on Linux implementation details,
// so we rely on the test to alert us if it stops working.

The kernel source is structured as two wrappers around sigprocmask, SYSCALL_DEFINE4(rt_sigprocmask, ...) and COMPAT_SYSCALL_DEFINE4(rt_sigprocmask, ...). The wrappers copy user memory to/from kernel memory before calling sigprocmask, so it makes sense that they would validate the address first.

I talked with cferris today (the maintainer of Android's libunwindstack), and I think LLVM's libunwind could just use process_vm_readv, which is more straightforward and would also avoid the TOCTOU issue.

process_vm_readv is used by libunwindstack, and cferris wasn't aware of any situation where process_vm_readv was blocked.

process_vm_readv is presumably slower than rt_procsigmask, but that's probably acceptable? It's only used when the unwinder can't find DWARF unwind info for a PC.

process_vm_readv requires a 3.2 kernel or newer, but this is syscall only needed for arm64, and arm64 support wasn't added until a later kernel version (3.7, it appears?).

I think the kernel special-cases process_vm_readv so it doesn't need ptrace permissions when reading the calling process's address space. When mm == current->mm, the kernel skips the ptrace_may_access call:

smeenai abandoned this revision.May 6 2022, 1:19 PM

https://github.com/abseil/abseil-cpp/blob/master/absl/debugging/internal/address_is_readable.cc has gone through several iterations. We can use rt_sigprocmask.

@rprichard, do you know if this would work for Android? It has the TOCTOU issue, but I imagine it's much simpler than having to manage and synchronize the pipe fd, and we could live with the TOCTOU in practice.

I'm glad @MaskRay found this -- I think it will probably work, and it seems better than assuming the PC is readable.

I see rt_procsigmask listed in bionic/libc/SYSCALLS.TXT, and I don't see it mentioned in any of the bionic/libc/SECCOMP*.txt files. I think that means seccomp is allowing the system call. I looked at kernel/signal.c, and AFAICT it's not doing any security checks that could be a problem. Bionic itself uses rt_sigprocmask for (at least) spawning new processes, creating/exiting threads, TLS-related locking, POSIX timers, and abort(). I think any seccomp-like blocking of rt_sigprocmask would have to be very targeted, so I think the syscall is probably allowed everywhere on Android.

It is assuming that the kernel will validate the address before validating the how. The kernel has a principle of not breaking userland -- is there a more specific guarantee we can rely on? e.g. The code has this comment:

// This strategy depends on Linux implementation details,
// so we rely on the test to alert us if it stops working.

The kernel source is structured as two wrappers around sigprocmask, SYSCALL_DEFINE4(rt_sigprocmask, ...) and COMPAT_SYSCALL_DEFINE4(rt_sigprocmask, ...). The wrappers copy user memory to/from kernel memory before calling sigprocmask, so it makes sense that they would validate the address first.

I talked with cferris today (the maintainer of Android's libunwindstack), and I think LLVM's libunwind could just use process_vm_readv, which is more straightforward and would also avoid the TOCTOU issue.

process_vm_readv is used by libunwindstack, and cferris wasn't aware of any situation where process_vm_readv was blocked.

process_vm_readv is presumably slower than rt_procsigmask, but that's probably acceptable? It's only used when the unwinder can't find DWARF unwind info for a PC.

process_vm_readv requires a 3.2 kernel or newer, but this is syscall only needed for arm64, and arm64 support wasn't added until a later kernel version (3.7, it appears?).

I think the kernel special-cases process_vm_readv so it doesn't need ptrace permissions when reading the calling process's address space. When mm == current->mm, the kernel skips the ptrace_may_access call:

Thanks for all the research and info here! I'm abandoning this since the process_vm_readv route seems like a much better path. (I likely won't have time to take a swing at that for a couple of weeks, but I'm also happy for anyone else to try it if they have the time, of course.)

D126343 implements the process_vm_readv approach