This is an archive of the discontinued LLVM Phabricator instance.

libhwasan interceptor ABI intercept longjmp/setjmp
ClosedPublic

Authored by mmalcomson on Oct 16 2019, 8:34 AM.

Details

Summary

The hwasan interceptor ABI doesn't have interceptors for longjmp and setjmp.
This patch introduces them.

We require the size of the jmp_buf on the platform to be at least as large as the jmp_buf in our implementation.
To enforce this we compile hwasan_type_test.cpp that ensures a compile time failure if this is not true.

Tested on both GCC and clang using an AArch64 virtual machine.

Diff Detail

Event Timeline

mmalcomson created this revision.Oct 16 2019, 8:34 AM

Added a test in the hwasan testsuite.

eugenis added inline comments.Oct 17 2019, 3:25 PM
compiler-rt/lib/hwasan/hwasan_interceptors.cpp
224

This won't work on Android, please add && !SANITIZER_ANDROID.

Just FYI, this is what we do there:
https://android.googlesource.com/platform/bionic/+/919dc05d66b129ad6f34fad95322efb6de245754/libc/arch-arm64/bionic/setjmp.S#214

271

Why can't this asm be replaced with REAL(longjmp)(env, retval) ?

mmalcomson added inline comments.Oct 18 2019, 3:27 AM
compiler-rt/lib/hwasan/hwasan_interceptors.cpp
224

Can do, but I don't fully understand -- could you explain why a bit more?

I was thinking that the Android "platform ABI" would handle longjmp, but the Android "interceptor ABI" would have the same missing feature as any other interceptor ABI.

I would have guessed that the feature check in the bionic sources would be essentially checking for the platform ABI -- since if using the interceptor ABI bionic wouldn't know that hwasan is being used.

271

As I understand it, the order that registers are stored in the jmp_buf structure is undefined by any architecture ABI.
Hence different longjmp/setjmp implementations are likely to have a different order of registers (and a quick comparison of the bionic and glibc setjmps seem to confirm this).

Because of this, if using REAL(longjmp), we'd need to also use REAL(setjmp), and when using REAL(setjmp) we wouldn't know where the stack pointer is in the jmp_buf.

Also, some libc's (like glibc) mangle their stack pointer to improve security, and we wouldn't be able to read it even if we did know where it was (e.g. if hard-coded offset different for each platform we compile for or only targeting one libc).

mmalcomson added inline comments.Oct 18 2019, 7:08 AM
compiler-rt/lib/hwasan/hwasan_interceptors.cpp
224

Ah ... I figure the reason it won't work for Android is to do with the dynamic loader feature mentioned in https://reviews.llvm.org/D55986 .

Hence a setjmp could be from the libc while the longjmp is from the interceptor ... correct?

eugenis added inline comments.Oct 18 2019, 5:01 PM
compiler-rt/lib/hwasan/hwasan_interceptors.cpp
224

Yeah, something like that is possible, but only when hwasan is loaded late (ex. through dlopen). I don't think we want to support that use case anymore.

One reason this would not work on Android is that the symbol for sigsetjmp in bionic is actually called sigsetjmp, and not __sigsetjmp. Also, this code assumes that jmp_buf in the target libc is at least as large as hwasan definition. It is likely to be case, but not guaranteed.

Anyway, I agree that this approach is better than the alternative. Could you fix it to work on Android, too? This is the only AArch64 testing environment we have on the buildbot. I see a few other problems when trying to build, like missing definition for __sigset_t.

271

OK, this is a good argument. We would not want to detect libc version / change this code every time they invent a new jmp_buf mangling scheme.

357

These are not needed if you are not calling through to REAL(func).

compiler-rt/lib/hwasan/hwasan_setjmp.S
2

Please add a license header.

compiler-rt/test/hwasan/TestCases/longjmp-setjmp-interception.c
6

This should say something like

REQUIRES: aarch64-target-arch

I've tried to get this working for Android.

I've built for android (and fixed the build warnings), but have not yet managed to test at runtime on Android (latest hurdle was running a too-old version of Android for the TLS slots to match).

I'm putting what I have up now on the off-chance someone is willing to run the test on their development environment.

If not I'll continue to work on getting an Android environment working with hwasan -- I'm sure it will just take time -- the only difficulty is inexperience with it.

Cheers ;-)

I've checked this on Android, the test passes.
Indeed, you'd need Android 10, or an AOSP master branch build from 2019 to run this.

Compiler-rt tests can be difficult to setup on Android. LLVM has an unofficial GN build system where this works out of the box, check llvm/utils/gn/README.rst if you feel like it; setting android_ndk_path and android_serial_for_testing should be enough to get the tests running, but you would also need to update the build file under llvm/utils/gn/secondary to add the new source file to a list.

compiler-rt/lib/hwasan/hwasan_interceptors.cpp
243

To make sure that we do not overflow the platform's jmp_buf, could you move this definition to ex. hwasan.h, then add a compile-time check that its sizeof is <= than the system's? See CHECK_TYPE_SIZE macro in sanitizer_platform_limits_posix.h.

We don't allow system includes in this file because then the signatures of the interceptor wrappers would need to match system headers exactly, and that's not always possible across multiple libc versions.

258

We use CamelCase w/o underscores for internal functions.

compiler-rt/lib/hwasan/hwasan_setjmp.S
99

NO_EXEC_STACK_DIRECTIVE should go below #ifdef - otherwise you might get an empty source file that suppressed noexecstack.

Clang assembler defaults to non-executable stack when targeting Android, but other platforms may not be so lucky.

mmalcomson edited the summary of this revision. (Show Details)

Have addressed raised issues.

Still not managed to test on Android myself (but getting closer).

This revision is now accepted and ready to land.Oct 29 2019, 11:10 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Oct 30 2019, 8:22 AM
thakis added inline comments.
compiler-rt/lib/hwasan/CMakeLists.txt
19

Looks like this list was sorted alphabetically so far. Want to maintain that?

mmalcomson added inline comments.Oct 31 2019, 3:07 AM
compiler-rt/lib/hwasan/CMakeLists.txt
19

Ah! Yes, I would like to maintain that.
I'm not familiar with LLVM contribution guidelines, could someone do this without needing to create a new review?
(for reference, I don't have commit rights)

eugenis added inline comments.Oct 31 2019, 10:57 AM
compiler-rt/lib/hwasan/CMakeLists.txt
19

No problem, I've pushed a fix.

According to https://github.com/google/sanitizers/issues/1244, there is a non-interceptable _setjmp in __libc_start_main that is later jumped to in pthread_exit.
It seems to break this approach.
Any idea what to do? Detect that jmpbuf is not a hwasan jmpbuf and bail out? This is happening at the very end of a thread's life, so hopefully it should not matter that the stack in not untagged.

According to https://github.com/google/sanitizers/issues/1244, there is a non-interceptable _setjmp in __libc_start_main that is later jumped to in pthread_exit.
It seems to break this approach.
Any idea what to do? Detect that jmpbuf is not a hwasan jmpbuf and bail out? This is happening at the very end of a thread's life, so hopefully it should not matter that the stack in not untagged.

Well that's unfortunate ...
I'd thought I'd checked that any non-interceptable setjmps or longjmps would come in pairs so that a buffer set by glibc would be read by glibc.

(FWIW this particular setjmp can be intercepted in glibc 2.31 -- so hopefully as time goes on this will be less of a problem).

I think your idea of just avoiding buffers that aren't from HWASAN makes sense -- especially since the interceptor mode is not the main focus.

Just to set expectations: I expect I'll only find time to fix this next month (pretty busy in the near future).

jdanek added a subscriber: jdanek.May 13 2020, 1:11 PM

(FWIW this particular setjmp can be intercepted in glibc 2.31 -- so hopefully as time goes on this will be less of a problem).

I am the reporter of the linked GitHub issue. Are you saying that if I just built with glibc 2.31, without any change to clang, then I would not see this instance of the setjmp error?

I believe I am on 2.31 already:

bash
$ ldd c/examples/send-ssl 
        linux-vdso.so.1 (0x0000ffff8dd01000)
        libclang_rt.hwasan-aarch64.so => /home/ubuntu/llvm/install/lib/clang/11.0.0/lib/linux/libclang_rt.hwasan-aarch64.so (0x0000ffff8d2e0000)
        libqpid-proton-proactor.so.1 => /home/ubuntu/repos/qpid-proton-0.31.0/clang/c/libqpid-proton-proactor.so.1 (0x0000ffff8d200000)
        libqpid-proton-core.so.10 => /home/ubuntu/repos/qpid-proton-0.31.0/clang/c/libqpid-proton-core.so.10 (0x0000ffff8ce4e000)
        libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000ffff8ce1e000)
        libssl.so.1.1 => /lib/aarch64-linux-gnu/libssl.so.1.1 (0x0000ffff8cd84000)
        libcrypto.so.1.1 => /lib/aarch64-linux-gnu/libcrypto.so.1.1 (0x0000ffff8caf7000)
        libsasl2.so.2 => /lib/aarch64-linux-gnu/libsasl2.so.2 (0x0000ffff8cacc000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffff8c95b000)
        libstdc++.so.6 => /lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000ffff8c779000)
        libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000ffff8c754000)
        libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000ffff8c740000)
        /lib/ld-linux-aarch64.so.1 (0x0000ffff8dcd1000)
        libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffff8c695000)
ubuntu@ubuntu:~/repos/qpid-proton-0.31.0/clang$ /lib/aarch64-linux-gnu/libc.so.6 --version
GNU C Library (Ubuntu GLIBC 2.31-0ubuntu9) stable release version 2.31.
Copyright (C) 2020 Free Software Foundation, Inc.

(FWIW this particular setjmp can be intercepted in glibc 2.31 -- so hopefully as time goes on this will be less of a problem).

I am the reporter of the linked GitHub issue. Are you saying that if I just built with glibc 2.31, without any change to clang, then I would not see this instance of the setjmp error?

I believe I am on 2.31 already:

Oh that's interesting -- thanks for pointing it out.

Could you just double check objdump your libc -- if it still has a non-ptl symbol then it might be that there was something strange in my testing environment, and that in the wild this is usually non-interceptable.

Interestingly, the reason I added the __libc_longjmp interceptor (lines 282-287 in hwasan_interceptors.cpp in the patch above) was because I was seeing a jump buffer mismatch at pthread_exit in the exact opposite way of what you're seeing now (setjmp in libc_start_main was intercepted and longjmp in unwind_stop was not).
If you have the time, could you remove that interceptor, rebuild, and re-test in your environment? The output from that should give some useful information.

When I look into this I'll have to understand when glibc makes that symbol internal, and why my testing environment ended up with a dynamic one.

I still expect the best option would be to avoid non-hwasan jump buffers, as eugenis suggested, but it might be better to have avoid intercepting that symbol if that's what works without a fallback.

I believe I am on 2.31 already:

Could you just double check objdump your libc -- if it still has a non-ptl symbol then it might be that there was something strange in my testing environment, and that in the wild this is usually non-interceptable.

I don't actually know how to check for plt symbols. Let me just try few things, maybe I got lucky

$ readelf -Wa /lib/aarch64-linux-gnu/libc-2.31.so | grep setjmp
   653: 0000000000036ce0     8 FUNC    GLOBAL DEFAULT   13 setjmp@@GLIBC_2.17
  1031: 0000000000036cf0     8 FUNC    GLOBAL DEFAULT   13 _setjmp@@GLIBC_2.17
  1083: 0000000000036d00    88 FUNC    GLOBAL DEFAULT   13 __sigsetjmp@@GLIBC_2.17
    Name: setjmp
$ readelf -Wa /lib/aarch64-linux-gnu/libc-2.31.so | grep longjmp
   245: 00000000000e2c28    68 FUNC    GLOBAL DEFAULT   13 __longjmp_chk@@GLIBC_2.17
   314: 0000000000036db0    68 FUNC    WEAK   DEFAULT   13 _longjmp@@GLIBC_2.17
  1020: 0000000000036db0    68 FUNC    WEAK   DEFAULT   13 longjmp@@GLIBC_2.17
  1220: 0000000000036db0    68 FUNC    WEAK   DEFAULT   13 siglongjmp@@GLIBC_2.17
  1670: 0000000000036db0    68 FUNC    GLOBAL DEFAULT   13 __libc_siglongjmp@@GLIBC_PRIVATE
  2155: 0000000000036db0    68 FUNC    GLOBAL DEFAULT   13 __libc_longjmp@@GLIBC_PRIVATE
    Name: longjmp
    Name: longjmp_target

and, with more context before 24050 at https://gist.github.com/jiridanek/c1171bdcd59e6ce499255176e8211e8c#file-objdump-d-grep

$ objdump -d /lib/aarch64-linux-gnu/libc-2.31.so | grep __libc_start_main -A50 | grep setjmp
   24050:       94004b28        bl      36cf0 <_setjmp@@GLIBC_2.17>

and objdump -d -s -j .plt -j .got.plt /lib/aarch64-linux-gnu/libc-2.31.so at https://gist.github.com/jiridanek/c1171bdcd59e6ce499255176e8211e8c#file-objdump-d-s-j-plt-j-got-plt

If you have the time, could you remove that interceptor, rebuild, and re-test in your environment? The output from that should give some useful information.

I got essentially the same results as I quoted in https://github.com/google/sanitizers/issues/1245. Some tag-mismatches on mostly global variables (which I believe to be false positive reports) and the sigsegv. I did not get the assertion error for GetTagFromPointer(dst) == 0 any more.

Let me know when you get the time to look into this. I'll find myself better hardware to build clang more quickly, and would gladly help with testing. We could possibly pick some test project, a Python and Ruby module written in C, which does not have other compile dependencies, and is simple to compile.