This is an archive of the discontinued LLVM Phabricator instance.

[RISCV][Asan] Use dynamic shadow offset to make it work on different width of virtual-memory system.
Needs RevisionPublic

Authored by kito-cheng on Dec 12 2022, 3:03 AM.

Details

Summary

The original asan port was support Sv39 only, however Sv48 support has
landed to linux kernel[1] for a while, so we are trying to make it support
both Sv39 and Sv48, the key point is we need to use a dynamic offset for the
shadow region, this bring extra runtime overhead, but compare to other
overhead in ASan, this should be acceptable.

[1] https://www.phoronix.com/news/Linux-5.17-RISC-V-sv48

Diff Detail

Event Timeline

kito-cheng created this revision.Dec 12 2022, 3:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 12 2022, 3:03 AM
kito-cheng requested review of this revision.Dec 12 2022, 3:03 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptDec 12 2022, 3:03 AM
Herald added subscribers: llvm-commits, Restricted Project, pcwang-thead, eopXD. · View Herald Transcript

Changes:

Fix marco condition.

craig.topper edited the summary of this revision. (Show Details)Dec 12 2022, 1:32 PM
shabnam4b added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_platform.h
310

Just out of curiosity, why can't we modify the code for SV57 here which I assume works for Sv48 and SV39? Maybe there is no way to test it?
I ran this patch on Qemu as well as Docker (Qemu user) and it worked.
The only thing on testing with docker was when the code does not have a bug, it prints the output but it generates a log for a fatal error:

root@74c2d5eadeed:/llvm-project/build# ./nobug
string is: Hello world!
==76543==LeakSanitizer has encountered a fatal error.
==76543==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==76543==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
shabnam4b added a comment.EditedJan 7 2023, 3:10 PM

Folks, please let me know in case I can do anything to make this patch merged as soon as possible. I can run tests with Qemu, docker and an unmatched board.

This revision is now accepted and ready to land.Jan 9 2023, 3:09 PM
MaskRay added a subscriber: MaskRay.Jan 9 2023, 3:48 PM

The original asan port was support Sv39 only, however Sv48 support has landed to linux kernel[1] for a while, so we are trying to make it support both Sv39 and Sv48,

Supporting too many address space bits can introduce significant hidden maintenance burden and performance implication, e.g. using SizeClassAllocator32 for LeakSanitizer (standalone or asan/hwasan integrated) is extremely slow.
See D137666 that AArch64 msan dropped <48-bit VMA. A line needs to be drawn.

The original asan port was support Sv39 only, however Sv48 support has landed to linux kernel[1] for a while, so we are trying to make it support both Sv39 and Sv48,

Supporting too many address space bits can introduce significant hidden maintenance burden and performance implication, e.g. using SizeClassAllocator32 for LeakSanitizer (standalone or asan/hwasan integrated) is extremely slow.
See D137666 that AArch64 msan dropped <48-bit VMA. A line needs to be drawn.

+1
Also overhead of dynamic shadow is quite significant, it dependents on hardware. So it would be nice if you can just drop smaller address spaces.

Supporting too many address space bits can introduce significant hidden maintenance burden and performance implication, e.g. using SizeClassAllocator32 for LeakSanitizer (standalone or asan/hwasan integrated) is extremely slow.
See D137666 that AArch64 msan dropped <48-bit VMA. A line needs to be drawn.

As I know there is still many RISC-V core are implement with Sv39 for now, we might able to drop support in future but it's not good timing for now I think.


Will defer few more days to commit for continue the discussion.

compiler-rt/lib/sanitizer_common/sanitizer_platform.h
310

Just out of curiosity, why can't we modify the code for SV57 here which I assume works for Sv48 and SV39? Maybe there is no way to test it?

In theory: yes, we can modify it to Sv57 and should work with Sv48 and Sv39, but we have no way to test due to the RISC-V linux didn't support that - and we don't have HW to test that too.

The only thing on testing with docker was when the code does not have a bug, it prints the output but it generates a log for a fatal error:

The message seems cause by docker.

Supporting too many address space bits can introduce significant hidden maintenance burden and performance implication, e.g. using SizeClassAllocator32 for LeakSanitizer (standalone or asan/hwasan integrated) is extremely slow.
See D137666 that AArch64 msan dropped <48-bit VMA. A line needs to be drawn.

As I know there is still many RISC-V core are implement with Sv39 for now, we might able to drop support in future but it's not good timing for now I think.


Will defer few more days to commit for continue the discussion.

Ok, thanks for checking! Could you please merge this PR if there is no issue left?

With the current (unpatched) LLVM tree on an RV64 SV39 host I get this ASAN test failure (for check-asan):

  • AddressSanitizer-riscv64-linux :: TestCases/memcmp_test.cpp

With this patch on an RV64 SV48 host I get these test failures:

  • AddressSanitizer-riscv64-linux :: TestCases/Linux/syscalls.cpp
  • AddressSanitizer-riscv64-linux :: TestCases/Posix/bcmp_test.cpp
  • AddressSanitizer-riscv64-linux :: TestCases/strncpy-overflow.cpp

I'll need to double-check if those results are actually correct but...
@kito-cheng Does that match your test results? If not, what do you observe? If it matches, have you investigated the new test failures?

compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
1119

The "default is 47-bit for RISCV64" is confusing, after listing "32, 39, 48 and 57". By that do you mean that Linux defaults to SV48? (if so fix 47->48). Or that the sanitizer defaults to a 1 << 47 userspace address range? Or both, or something else?

With the current (unpatched) LLVM tree on an RV64 SV39 host I get this ASAN test failure (for check-asan):

  • AddressSanitizer-riscv64-linux :: TestCases/memcmp_test.cpp

With this patch on an RV64 SV48 host I get these test failures:

  • AddressSanitizer-riscv64-linux :: TestCases/Linux/syscalls.cpp
  • AddressSanitizer-riscv64-linux :: TestCases/Posix/bcmp_test.cpp
  • AddressSanitizer-riscv64-linux :: TestCases/strncpy-overflow.cpp

I'll need to double-check if those results are actually correct but...
@kito-cheng Does that match your test results? If not, what do you observe? If it matches, have you investigated the new test failures?

Just back from Chinese new year vocation :P let me check with my local test environment again, thanks for testing and reporting this.

compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
1119

That's my stupid copy-and-paste error from AArch64, and my intention is support multiple address space layout.

Sorry for not updating for a while, we/SiFive gonna to seeking a ultimate solution to work on each different Sv* system, my exception is we still need dynamic shadow sentinel which is what we did for this patch, but it not work well as @luismarques report, so I would like to defer this patch landing for now, and it become high priority work to SiFive, we've allocated dedicated engineer resources to improve the ASan including the ASan for scalable vector stuff and RVV intrinsic, the both work are targeting end of Q2.

hauhsu added a subscriber: hauhsu.EditedJun 14 2023, 10:36 PM

Hi all,
I just published 3 more reviews:
https://reviews.llvm.org/D152895
https://reviews.llvm.org/D152990
https://reviews.llvm.org/D152991


With the 3 new patches, the check-lsan passes in our sv39, sv48, sv57 QEMU system mode test environments.


The common check-asan failure cases in the 3 virtual memory modes are

AddressSanitizer-riscv64-linux :: TestCases/Linux/syscalls.cpp
AddressSanitizer-riscv64-linux :: TestCases/Posix/bcmp_test.cpp
AddressSanitizer-riscv64-linux :: TestCases/memcmp_test.cpp
AddressSanitizer-riscv64-linux :: TestCases/log-path_test.cpp

The first 3 failures are because the CHECK statements check specific function names in call stack, like

// CHECK: AddressSanitizer: stack-buffer-overflow  (v)
// CHECK: {{#1.*bcmp}}    <= frame1                            
// CHECK: {{#2.*main}}     <= frame2

But the function QuickCheckForUnpoisonedRegion() is not inlined, so the call stack looks a little bit different:

#0 0x2aac31a59c in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long) .../llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:804:7
#1 0x2aac31a9aa in QuickCheckForUnpoisonedRegion .../llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:850:10
#2 0x2aac31a9aa in bcmp .../llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:817:7
#3 0x2aac3bbca8 in main .../llvm-project/compiler-rt/test/asan/TestCases/Posix/bcmp_test.cpp:13:13
#4 0x3f886e36c0  (/lib/libc.so.6+0x266c0)
#5 0x3f886e376c in __libc_start_main (/lib/libc.so.6+0x2676c)
#6 0x2aac306d52 in _start (.../llvm-project/build/test/asan/RISCV64LinuxConfig/TestCases/Posix/Output/bcmp_test.cpp.tmp+0x26d52)

https://reviews.llvm.org/D152991 looses the check statement to make the tests pass.

For AddressSanitizer-riscv64-linux :: TestCases/log-path_test.cpp failure, one of the test tries to set the log path to /INVALID and expects error.
But since in our QEMU test environment uses root, the path is actually valid.


There are 3 additional check-asan failures for sv57:

AddressSanitizer-riscv64-linux :: TestCases/Linux/allocator_oom_test.cpp
AddressSanitizer-riscv64-linux :: TestCases/Linux/auto_memory_profile_test.cpp
AddressSanitizer-riscv64-linux :: TestCases/Linux/quarantine_size_mb.cpp

We are still investigating them. But since the tests seems not so critical and sv57 is rather little users, we decide to propose the reviews for now.

evandro removed a subscriber: evandro.Jun 20 2023, 1:12 PM
vitalybuka requested changes to this revision.Jul 27 2023, 5:42 PM

What is the status of this patch?

This revision now requires changes to proceed.Jul 27 2023, 5:42 PM

@vitalybuka this patch is waiting D152895 for make sure LSan is work fine on both Sv39 and Sv48, and D152990 could make it work on Sv57.

This patch has tested ASan on Sv39 and Sv48 by @hauhsu, and all failures found by @luismarques has been addressed by D152991 + D152895.

If I'm reading this right, anything built with ASAN will fail on sv48 or sv57? We're defaulting the mmap() hints to sv48 because we didn't have any failing workloads there (Hotspot and Go fail sv57), but if this fails we should probably default to sv37. Is there a reproducer bouncing around we can put in kselftests?

If I'm reading this right, anything built with ASAN will fail on sv48 or sv57? We're defaulting the mmap() hints to sv48 because we didn't have any failing workloads there (Hotspot and Go fail sv57), but if this fails we should probably default to sv37. Is there a reproducer bouncing around we can put in kselftests?

I guess you might confused by SANITIZER_MMAP_RANGE_SIZE, it a parameter for internal memory allocator, set for larger Sv is always works on smaller Sv but less memory efficiency.

And the original reason fail on the Sv48/Sv57 is because most setting is Sv sensitive (kRISCV64_ShadowOffset64), and we try make it into Sv insensitive by trading few more runtime performance.