This is an archive of the discontinued LLVM Phabricator instance.

[RISCV][ASAN] support code for architecture-specific parts of asan
ClosedPublic

Authored by EccoTheDolphin on Sep 12 2020, 6:17 PM.

Details

Summary

[9/11] patch series to port ASAN for riscv64

Depends On D87579

Diff Detail

Event Timeline

EccoTheDolphin created this revision.Sep 12 2020, 6:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 12 2020, 6:17 PM
Herald added subscribers: Restricted Project, evandro, luismarques and 11 others. · View Herald Transcript
EccoTheDolphin requested review of this revision.Sep 12 2020, 6:17 PM
luismarques requested changes to this revision.Sep 14 2020, 6:15 AM
luismarques added inline comments.
compiler-rt/lib/asan/asan_mapping.h
82

Nit: RISC-V.

86

Nit: missing period before "High"?

compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
310–312

The comment and the value were copied from AArch64, but they are incorrect for RISC-V.
The size of struct pthread that I see when I build glibc on riscv64 Linux is 1936.

compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h
74–76

Since you copied the size of struct pthread from AArch64 and it was incorrect, I suspect that here the same might have happened... :-)

This revision now requires changes to proceed.Sep 14 2020, 6:15 AM
eugenis added inline comments.
compiler-rt/lib/asan/asan_mapping.h
227

missing indent

compiler-rt/lib/crt/crtbegin.c
55 ↗(On Diff #291435)

crtbegin change is unrelated, please move it to a different revision

compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
716

This is a bit too long, please define SANITIZER_RISCV64 (or some other name) in sanitizer_platform.h, similar to SANITIZER_X32.

compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h
74–76

There is a compile-time test for this in sanitizer_platform_limits_linux.cpp

compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
716

I assume that this is not just about this file?
If yes, then can we address this as part of this commit without modifications of the diffs before (less than) D87580

compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
310–312

well, actually when I build glibc for RISCV the size I get is 1792 (so the value we use is off by 4 bytes for whatever reason.

I don't remember how exactly this value was calculated originally (but I definitely believe that I've checked it somehow). Moreover, I don't know/remember an effective way to do that :).

This time I did is as follows (I modified nptl sources and re-compiled):

diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 670cb8ffe6..6d50fa76f1 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -169,6 +169,9 @@ stack_list_add (list_t *elem, list_t *list)
 static struct pthread *
 get_cached_stack (size_t *sizep, void **memp)
 {
+  int array[sizeof (struct pthread)];
+  __builtin_printf("%d", &array);
+
   size_t size = *sizep;
   struct pthread *result = NULL;
   list_t *entry;

GCC gives me the following error:

warning: format '%d' expects argument of type 'int', but argument 2 has type 'int (*)[1792]'

Hacky? Yes!. But we have the result ).

This is for glibc-2.29.

The target compiler (riscv64-unknown-linux-gnu, 9.2.0) uses the following (default) options:

 '-std=gnu11' '-fgnu89-inline' '-g' '-O2' '-Wall' '-Wwrite-strings' '-Wundef' '-fmerge-all-constants' '-frounding-math' '-fno-stack-protector' '-Wstrict-prototypes' '-Wold-style-definition' '-fmath-errno' '-ftls-model=initial-exec' \
'-march=rv64gc' '-mabi=lp64d
compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h
74–76

Apparently, these definitions were merged in https://reviews.llvm.org/D66870. (git show 977205b595cd65fbeb4a045daf34990d8e8a5efd, by Sam Elliott <selliott@lowrisc.org>)

The value for struct_kernel_stat_sz is the correct one (I've double checked on qemu). I have doubts about struct_kernel_stat64_sz though since it looks like that this struct is not used on rv64. Linux kernel have the following header:

// from arch/riscv/include/generated/uapi/asm/stat.h
#include <asm-generic/stat.h>

// include/uapi/asm-generic/stat.h
/* This matches struct stat64 in glibc2.1. Only used for 32 bit. */
#if BITS_PER_LONG != 64 || defined(ARCH_WANT_STAT64)
struct stat64 {
...
#endif

So given that this is for rv64 only, probably it would be more correct to define it as 0?

74–76

This test passes, though it does not perform checks for stat64, we have the following there:

#if defined(i386)
COMPILER_CHECK(struct_kernel_stat64_sz == sizeof(struct stat64));
#endif

factored-out crt-related change

corrected indentation to comply with existing code style

EccoTheDolphin edited the summary of this revision. (Show Details)Sep 20 2020, 11:39 PM
EccoTheDolphin marked 5 inline comments as done.
EccoTheDolphin added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
716
EccoTheDolphin marked 2 inline comments as done.Sep 20 2020, 11:42 PM
eugenis added inline comments.Sep 21 2020, 1:02 PM
compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h
74–76

OK, this makes sense.

eugenis added inline comments.Sep 21 2020, 1:07 PM
compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
716

I'd prefer if the patches were changed to use the new macro from the start. It would produce cleaner git history.

addressing comments

EccoTheDolphin edited the summary of this revision. (Show Details)Sep 22 2020, 11:41 AM
EccoTheDolphin marked an inline comment as done.Sep 22 2020, 11:43 AM
EccoTheDolphin added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
716

modified the order of patches. now the patch series starts with https://reviews.llvm.org/D87997, gradually adopting the new define.

eugenis added inline comments.Sep 22 2020, 1:03 PM
compiler-rt/lib/sanitizer_common/sanitizer_common_syscalls.inc
2300

remove "defined". SANITIZER_* macros are not defined/undefined, they are 1/0.

EccoTheDolphin marked an inline comment as done.Sep 22 2020, 1:33 PM
EccoTheDolphin added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_common_syscalls.inc
2300

remove "defined". SANITIZER_* macros are not defined/undefined, they are 1/0.

Yes. You are correct. Quite an embarrassing error. Will fix.

Actually, one way to fix it would be to avoid defining zero value. However, this will violate the current coding conventions, so I guess it's better to update the patchset once again.

eugenis added inline comments.Sep 22 2020, 1:36 PM
compiler-rt/lib/sanitizer_common/sanitizer_common_syscalls.inc
2300

Right. This way is shorter and the compiler will complain if you forget to include the header, instead of assuming that the property is false (when the macro is undefined).

addressing review comments

EccoTheDolphin marked 2 inline comments as done.Sep 22 2020, 8:26 PM
luismarques added inline comments.Sep 23 2020, 6:26 AM
compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
310–312

well, actually when I build glibc for RISCV the size I get is 1792 (so the value we use is off by 4 bytes for whatever reason.

What do you mean? You have 1772 in the patch, and you got 1792, I'm confused by the "off by 4 bytes" comment.

warning: format '%d' expects argument of type 'int', but argument 2 has type 'int (*)[1792]'

Hacky? Yes!. But we have the result ).

This is for glibc-2.29.

Ha! Hey, whatever works, right? I had used a static assert, but it was with a more recent version of glibc.

EccoTheDolphin added inline comments.Sep 23 2020, 7:05 AM
compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
310–312

What do you mean? You have 1772 in the patch, and you got 1792, I'm confused by the "off by 4 bytes" comment.

The original version of the patch had 1776.

Ha! Hey, whatever works, right? I had used a static assert, but it was with a more recent version of glibc.

By a more recent version of gcc - you mean 2.30+?
If this is the case - do you have any recommendations how we could handle that? Should I add some kind of #ifdef?

rebasing + addressing formatting issues

luismarques added inline comments.Sep 24 2020, 4:31 AM
compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
310–312

By a more recent version of gcc - you mean 2.30+?

The earliest version that I can compile in my RISC-V machine (with GCC 10.2.1) is 2.31, which has the same size as you reported (1792). 2.32 has a size of 1936.

If this is the case - do you have any recommendations how we could handle that? Should I add some kind of #ifdef?

You can probably do something similar to the code above: call GetLibcVersion and return a value that depends on the obtained version.

EccoTheDolphin added inline comments.Sep 24 2020, 4:36 AM
compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
310–312

By a more recent version of gcc - you mean 2.30+?

The earliest version that I can compile in my RISC-V machine (with GCC 10.2.1) is 2.31, which has the same size as you reported (1792). 2.32 has a size of 1936.

Wow! Thanks a lot!

You can probably do something similar to the code above: call GetLibcVersion and return a value that depends on the obtained version.

Will do, thanks.

addressed review comments

EccoTheDolphin marked an inline comment as done.Sep 27 2020, 3:44 PM
EccoTheDolphin added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
310–312

You can probably do something similar to the code above: call GetLibcVersion and return a value that depends on the obtained version.

done. I've double-checked the value for 2.32 - it is 1936 for me too. The versions below 2.29 are marked as untested.

EccoTheDolphin marked an inline comment as done.

RISCV64 does not use stat64, so the corresponding size constant is set to "0"

EccoTheDolphin marked an inline comment as done.Sep 27 2020, 3:50 PM

@luismarques, this commit diff has a mark that a change was requested... I believe that we've addressed known issues. If you have no further objections or comments could you remove the mark?

luismarques accepted this revision.Oct 1 2020, 3:44 PM

For versions 2.29 -- 2.31 you have val = 1172. Is that actually correct? I thought you had earlier obtained a value of 1772 for 2.29, so maybe that's a typo?
Other than that issue, LGTM. Just be sure to verify/adjust that.

This revision is now accepted and ready to land.Oct 1 2020, 3:44 PM

For versions 2.29 -- 2.31 you have val = 1172. Is that actually correct? I thought you had earlier obtained a value of 1772 for 2.29, so maybe that's a typo?
Other than that issue, LGTM. Just be sure to verify/adjust that.

yes, that's a typo :(. Will fix before committing. Thanks

rebasing + fixed typo

The complex ThreadDescriptorSize is only needed by LeakSanitizer for allocations only referenced by thread-specific data keys.
I'll check whether my D98926 sticks, if yes, try alternative solutions (e.g. intercept pthread_setspecific) to make thread-specific data keys less hacky.

Ultimately a lot of complexity of GetTls/ThreadDescriptorSize is because we don't have glibc APIs to get static/dynamic TLS boundaries.
If enough people complain loudly (https://groups.google.com/g/address-sanitizer/c/BfwYD8HMxTM/m/xmAY98RPBwAJ), chime in on https://sourceware.org/bugzilla/show_bug.cgi?id=16291 !
They may prioritize that feature request:)

lenary removed a subscriber: lenary.Mar 26 2021, 12:00 PM

It seems that Asan cannot work correctly for RISC-V now.
When I use '-fsanitize=address' to compile the program and then run on Qemu, I get
AddressSanitizer: CHECK failed: sanitizer_allocator_primary32.h:292 "((res)) < ((kNumPossibleRegions))" (0x40016, 0x40000) (tid=80622)

<empty stack>

It seems that Asan cannot work correctly for RISC-V now.
When I use '-fsanitize=address' to compile the program and then run on Qemu, I get
AddressSanitizer: CHECK failed: sanitizer_allocator_primary32.h:292 "((res)) < ((kNumPossibleRegions))" (0x40016, 0x40000) (tid=80622)

<empty stack>

From your description, I assume you're running user-mode QEMU. The ASAN address mappings are system-specific. Running the user-mode QEMU is going to call your host machine's kernel, and the address mappings aren't going to be compatible. So I guess that behavior is to be expected. You should instead run a full riscv64-linux-gnu system under qemu-system-riscv64 to use ASAN with RISC-V.

joshua-arch1 added a comment.EditedApr 25 2022, 8:03 PM

It seems that Asan cannot work correctly for RISC-V now.
When I use '-fsanitize=address' to compile the program and then run on Qemu, I get
AddressSanitizer: CHECK failed: sanitizer_allocator_primary32.h:292 "((res)) < ((kNumPossibleRegions))" (0x40016, 0x40000) (tid=80622)

<empty stack>

From your description, I assume you're running user-mode QEMU. The ASAN address mappings are system-specific. Running the user-mode QEMU is going to call your host machine's kernel, and the address mappings aren't going to be compatible. So I guess that behavior is to be expected. You should instead run a full riscv64-linux-gnu system under qemu-system-riscv64 to use ASAN with RISC-V.

Can the problem that ASAN is unable to run in user-mode QEMU be avoided by doing something in QEMU? Maybe we can add some options in QEMU for system-specific address mappings?
Also, if I remove the comparison between 'res' and 'kNumPossibleRegions' (CHECK_GE(res, kNumPossibleRegions)) in sanitizer_allocator_primary32.h, I will get another error:
AddressSanitizer: CHECK failed: sanitizer_allocator_dlsym.h:36 "((internal_allocator()->FromPrimary(ptr))) != (0)" (0x0, 0x0) (tid=100987)

<empty stack>

Is it stil because I did not run system-mode QEMU?

It seems that Asan cannot work correctly for RISC-V now.
When I use '-fsanitize=address' to compile the program and then run on Qemu, I get
AddressSanitizer: CHECK failed: sanitizer_allocator_primary32.h:292 "((res)) < ((kNumPossibleRegions))" (0x40016, 0x40000) (tid=80622)

<empty stack>

From your description, I assume you're running user-mode QEMU. The ASAN address mappings are system-specific. Running the user-mode QEMU is going to call your host machine's kernel, and the address mappings aren't going to be compatible. So I guess that behavior is to be expected. You should instead run a full riscv64-linux-gnu system under qemu-system-riscv64 to use ASAN with RISC-V.

I'm still wondering what causes the address mappings in ASAN incompatible with QEMU user-mode, since all the mappings are based on virtual memory.

The mappings are defined such that the address range where the kernel might place the main executable image do not overlap with the shadow region. This can be different in qemu.

The mappings are defined such that the address range where the kernel might place the main executable image do not overlap with the shadow region. This can be different in qemu.

Did you mean the kernel can guarantee that the address range of the executable image does not overlap with the asan shadow region, but the qemu user-mode cannot? As far as I know, asan is not kasan, and the kernel does not need to do a lot of things for asan. How can the kernel guarantee addresses do not overlap?

joshua-arch1 added a comment.EditedMay 8 2022, 11:11 PM

It seems that Asan cannot work correctly for RISC-V now.
When I use '-fsanitize=address' to compile the program and then run on Qemu, I get
AddressSanitizer: CHECK failed: sanitizer_allocator_primary32.h:292 "((res)) < ((kNumPossibleRegions))" (0x40016, 0x40000) (tid=80622)

<empty stack>

From your description, I assume you're running user-mode QEMU. The ASAN address mappings are system-specific. Running the user-mode QEMU is going to call your host machine's kernel, and the address mappings aren't going to be compatible. So I guess that behavior is to be expected. You should instead run a full riscv64-linux-gnu system under qemu-system-riscv64 to use ASAN with RISC-V.

It seems that ASAN can be used with AArch64 just under user-mode QEMU.

There is a range of addresses where the kernel may place the main executable. It has been extended once in the past (ASLR). ASan mapping is a compile-time constant, so it must work with any location within that range. I suspect that for QEMU this range may be different.

Anyway, some architectures use dynamic shadow allocation at runtime. IMHO this is a much better solution, even if it costs some CPU cycles (5% or whatever).

There is a range of addresses where the kernel may place the main executable. It has been extended once in the past (ASLR). ASan mapping is a compile-time constant, so it must work with any location within that range. I suspect that for QEMU this range may be different.

Anyway, some architectures use dynamic shadow allocation at runtime. IMHO this is a much better solution, even if it costs some CPU cycles (5% or whatever).

I basically know what you meant by saying 'some architectures use dynamic shadow allocation at runtime', but that is not what Asan wants. Asan's advantage is to use compile-time instrumentation with a direct address mapping to shadow memory. I'm still wondering how AArch64 can just be run on user-mode Qemu with Asan compile-time instrumentation. Can we also do anything for RISCV?

We've had enough problems with fixed mapping that I prefer dynamic shadow address for anything going forward. Also, fixed mapping address becomes an ABI constant, which causes issues with prebuilt libraries.

I'm still wondering how AArch64 can just be run on user-mode Qemu with Asan compile-time instrumentation.

Maybe the chosen shadow address happens to be compatible with qemu? You'll need to study the possible memory layouts to answer this.

joshua-arch1 added a comment.EditedMay 25 2022, 12:26 AM

It seems that even if I run on hardware, ASAN cannot work for riscv64. It says

==265==ERROR: AddressSanitizer: SEGV on unknown address 0x00081ffd1560 (pc 0x0000000108a8 bp 0x003fffe8ab70 sp 0x003fffe8aaf0 T0)
==265==The signal is caused by a READ memory access.

luismarques added a comment.EditedMay 25 2022, 10:14 AM

==265==ERROR: AddressSanitizer: SEGV on unknown address 0x00081ffd1560 (pc 0x0000000108a8 bp 0x003fffe8ab70 sp 0x003fffe8aaf0 T0)
==265==The signal is caused by a READ memory access.

The current (and original) ASan RISC-V implementation assumes sv39. Linux did not support sv48 when the RISC-V ASan port was merged. Check if you are running with sv48 (cat /proc/cpu). If so, that is to be expected. Of course, adding sv48 support is welcome.

joshua-arch1 added a comment.EditedMay 26 2022, 5:20 AM

==265==ERROR: AddressSanitizer: SEGV on unknown address 0x00081ffd1560 (pc 0x0000000108a8 bp 0x003fffe8ab70 sp 0x003fffe8aaf0 T0)
==265==The signal is caused by a READ memory access.

The current (and original) ASan RISC-V implementation assumes sv39. Linux did not support sv48 when the RISC-V ASan port was merged. Check if you are running with sv48 (cat /proc/cpu). If so, that is to be expected. Of course, adding sv48 support is welcome.

I have checked the satp register to convince myself that I was just running with sv39. I also spent a few hours debugging the up-to-date sanitizer code, but got nothing. Has anyone encounted with this error before?

I have checked the satp register to convince myself that I was just running with sv39. I also spent a few hours debugging the up-to-date sanitizer code, but got nothing. Has anyone encounted with this error before?

Thanks for that work. I think I will be able to look into this around the end of this week or the start of the next one.

I have checked the satp register to convince myself that I was just running with sv39. I also spent a few hours debugging the up-to-date sanitizer code, but got nothing. Has anyone encounted with this error before?

Thanks for that work. I think I will be able to look into this around the end of this week or the start of the next one.

I've checked that indeed things are broken in main but they weren't broken with an older commit. I'll bisect and investigate this. My current RV64 setup is slow so the bisection will take a while.

This comment was removed by joshua-arch1.

I have checked the satp register to convince myself that I was just running with sv39. I also spent a few hours debugging the up-to-date sanitizer code, but got nothing. Has anyone encounted with this error before?

Thanks for that work. I think I will be able to look into this around the end of this week or the start of the next one.

I've checked that indeed things are broken in main but they weren't broken with an older commit. I'll bisect and investigate this. My current RV64 setup is slow so the bisection will take a while.

How is this work going? If you find the commit that led to this failure, maybe I can also help fix it.

How is this work going? If you find the commit that led to this failure, maybe I can also help fix it.

I was going to reply earlier but then I ran into reproducibility issues. Long story short, I had reproduced the failure but during the bisection process it stopped reproducing. I'll provide more details when I can.

How is this work going? If you find the commit that led to this failure, maybe I can also help fix it.

I was going to reply earlier but then I ran into reproducibility issues. Long story short, I had reproduced the failure but during the bisection process it stopped reproducing. I'll provide more details when I can.

What specific failure are you encounted with currently? For me, I got the deadly signal error (SEGV on unknown address) for all the cases. Is this issue just for riscv or for all the targets? I'm trying to modify the logic between deadly signal error and generic error like heap-use-after-free. I'm not sure whether it is feasible.

What specific failure are you encounted with currently? For me, I got the deadly signal error (SEGV on unknown address) for all the cases. Is this issue just for riscv or for all the targets? I'm trying to modify the logic between deadly signal error and generic error like heap-use-after-free. I'm not sure whether it is feasible.

It varies. A few are SEGV on unknown address, others are failed CHECKs, others are the cases where the sanitizer was supposed to trip but it trips in a different way than the test expected, etc.

The ASan tests seem to fail when I do a check-all but (so far) not when I do check-asan, which explains why running just the ASan tests to bisect the issue was a dead end. Running the sanitizer-common tests (check-sanitizer) seems to reliably reproduce errors, which helps. I haven't yet delved into the errors themselves, I was just running these experiments as a background task. I'll check some possible sources of non-determinism (e.g. ASLR) and then try to actually delve into the failures. If you can share more about your test setup that might help.

This comment was removed by joshua-arch1.
This comment was removed by joshua-arch1.

There is a range of addresses where the kernel may place the main executable. It has been extended once in the past (ASLR). ASan mapping is a compile-time constant, so it must work with any location within that range. I suspect that for QEMU this range may be different.

Anyway, some architectures use dynamic shadow allocation at runtime. IMHO this is a much better solution, even if it costs some CPU cycles (5% or whatever).

I have been delving into ASLR these days and it will indeed cause the asan shadow memory range to interleave with an existing memory mapping. But I'm still wondering why this conflict only occurs under user-mode qemu rather than system-mode qemu or hardware.

There is a range of addresses where the kernel may place the main executable. It has been extended once in the past (ASLR). ASan mapping is a compile-time constant, so it must work with any location within that range. I suspect that for QEMU this range may be different.

Anyway, some architectures use dynamic shadow allocation at runtime. IMHO this is a much better solution, even if it costs some CPU cycles (5% or whatever).

I now understand what the kernel does with the address mapping, but why user-mode qemu is different? What's the difference when qemu is doing an address mapping?

Are you testing on user-QEMU or on hardware? Failed CHEKCS are caused by user-QEMU incompatibility. It will not affect the checker results. As far as I am concerneed, what we expect is to pass all the programs under gcc/testsuite/gcc.dg/asan.

I ran the LLVM sanitizer tests under a variety of configurations, in both qemu-system-riscv64 (Fedora) and a SiFive Unmatched (Ubuntu) PC.
While there were a few failures for the real RISC-V machine, it seems like the system-qemu configuration is where the problems really reproduce.
I also double-checked my previous regression investigation. To make a long story short, I just need to find the time to actually delve into the failing tests and see what the root cause of the issues is because those automated investigations (bisections, etc.) didn't really reveal anything particularly useful.

Are there any rv32 asan support currently? I know rv32 linux abi is not working, but I don't think it will influence the use of rv32 asan.

Are there any rv32 asan support currently? I know rv32 linux abi is not working, but I don't think it will influence the use of rv32 asan.

Hi @joshua-arch1 , no there is no support for rv32. At the time these patches were created rv32 was not even evaluated/considered. I can't say for sure if there are any issues with introducing such support.

Are there any rv32 asan support currently? I know rv32 linux abi is not working, but I don't think it will influence the use of rv32 asan.

Hi @joshua-arch1 , no there is no support for rv32. At the time these patches were created rv32 was not even evaluated/considered. I can't say for sure if there are any issues with introducing such support.

So do you have any plans to support rv32 asan? I suppose porting can be done independant of rv32 linux.

How can I get the size of struct pthread in sanitizer_linux_libcdep.cpp?