[9/11] patch series to port ASAN for riscv64
Depends On D87579
Differential D87580
[RISCV][ASAN] support code for architecture-specific parts of asan
Authored by EccoTheDolphin on Sep 12 2020, 6:17 PM.
Event Timeline
@luismarques, this diff is still marked as having a change requested... I believe that we've addressed the known issues. If you have no further objections or comments, could you remove the mark?

For versions 2.29 -- 2.31 you have val = 1172. Is that actually correct? I thought you had earlier obtained a value of 1772 for 2.29, so maybe that's a typo?

The complex ThreadDescriptorSize is only needed by LeakSanitizer for allocations only referenced by thread-specific data keys. Ultimately, a lot of the complexity of GetTls/ThreadDescriptorSize is because we don't have glibc APIs to get static/dynamic TLS boundaries.

It seems that ASan cannot work correctly for RISC-V now.
<empty stack>

From your description, I assume you're running user-mode QEMU. The ASan address mappings are system-specific. Running user-mode QEMU is going to call your host machine's kernel, and the address mappings aren't going to be compatible. So I guess that behavior is to be expected. You should instead run a full riscv64-linux-gnu system under qemu-system-riscv64 to use ASan with RISC-V.

Can the problem that ASan is unable to run in user-mode QEMU be avoided by doing something in QEMU? Maybe we can add some options in QEMU for system-specific address mappings?
<empty stack>
Is it still because I did not run system-mode QEMU?

I'm still wondering what makes the ASan address mappings incompatible with QEMU user-mode, since all the mappings are based on virtual memory.

The mappings are defined such that the address range where the kernel might place the main executable image does not overlap with the shadow region. This can be different in QEMU.

Did you mean the kernel can guarantee that the address range of the executable image does not overlap with the ASan shadow region, but QEMU user-mode cannot?
As far as I know, ASan is not KASAN, and the kernel does not need to do a lot of things for ASan. How can the kernel guarantee that addresses do not overlap?

There is a range of addresses where the kernel may place the main executable. It has been extended once in the past (ASLR). The ASan mapping is a compile-time constant, so it must work with any location within that range. I suspect that for QEMU this range may be different. Anyway, some architectures use dynamic shadow allocation at runtime. IMHO this is a much better solution, even if it costs some CPU cycles (5% or whatever).

I basically know what you meant by saying 'some architectures use dynamic shadow allocation at runtime', but that is not what ASan wants. ASan's advantage is that it uses compile-time instrumentation with a direct address mapping to shadow memory. I'm still wondering how AArch64 can just be run on user-mode QEMU with ASan compile-time instrumentation. Can we also do anything for RISC-V?

We've had enough problems with fixed mapping that I prefer a dynamic shadow address for anything going forward. Also, a fixed mapping address becomes an ABI constant, which causes issues with prebuilt libraries.
Maybe the chosen shadow address happens to be compatible with QEMU? You'll need to study the possible memory layouts to answer this.

It seems that even if I run on hardware, ASan cannot work for riscv64. It says
==265==ERROR: AddressSanitizer: SEGV on unknown address 0x00081ffd1560 (pc 0x0000000108a8 bp 0x003fffe8ab70 sp 0x003fffe8aaf0 T0)

The current (and original) ASan RISC-V implementation assumes sv39. Linux did not support sv48 when the RISC-V ASan port was merged. Check if you are running with sv48 (cat /proc/cpuinfo). If so, that is to be expected. Of course, adding sv48 support is welcome.

I have checked the satp register to convince myself that I was just running with sv39. I also spent a few hours debugging the up-to-date sanitizer code, but got nothing. Has anyone encountered this error before?

Thanks for that work. I think I will be able to look into this around the end of this week or the start of the next one.

I've checked that indeed things are broken in main but they weren't broken with an older commit. I'll bisect and investigate this. My current RV64 setup is slow so the bisection will take a while.

This comment was removed by joshua-arch1.

How is this work going? If you find the commit that led to this failure, maybe I can also help fix it.

I was going to reply earlier but then I ran into reproducibility issues. Long story short, I had reproduced the failure but during the bisection process it stopped reproducing. I'll provide more details when I can.

What specific failure are you encountering currently? For me, I got the deadly signal error (SEGV on unknown address) for all the cases. Is this issue just for riscv or for all the targets? I'm trying to modify the logic between the deadly signal error and generic errors like heap-use-after-free. I'm not sure whether it is feasible.

It varies.
A few are SEGV on unknown address, others are failed CHECKs, others are cases where the sanitizer was supposed to trip but it trips in a different way than the test expected, etc. The ASan tests seem to fail when I do a check-all but (so far) not when I do check-asan, which explains why running just the ASan tests to bisect the issue was a dead end. Running the sanitizer-common tests (check-sanitizer) seems to reliably reproduce errors, which helps. I haven't yet delved into the errors themselves; I was just running these experiments as a background task. I'll check some possible sources of non-determinism (e.g. ASLR) and then try to actually delve into the failures. If you can share more about your test setup, that might help.

This comment was removed by joshua-arch1.

This comment was removed by joshua-arch1.

I have been delving into ASLR these days, and it can indeed cause the ASan shadow memory range to interleave with an existing memory mapping. But I'm still wondering why this conflict only occurs under user-mode QEMU rather than system-mode QEMU or hardware.

I now understand what the kernel does with the address mapping, but why is user-mode QEMU different? What's the difference when QEMU is doing an address mapping?

I ran the LLVM sanitizer tests under a variety of configurations, both in qemu-system-riscv64 (Fedora) and on a SiFive Unmatched (Ubuntu) PC.

Is there any rv32 ASan support currently? I know the rv32 Linux ABI is not working, but I don't think it will influence the use of rv32 ASan.

Hi @joshua-arch1, no, there is no support for rv32. At the time these patches were created, rv32 was not even evaluated/considered. I can't say for sure if there are any issues with introducing such support.

So do you have any plans to support rv32 ASan? I suppose porting can be done independent of rv32 Linux.
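
For readers following the fixed-mapping discussion in this thread, here is a minimal sketch of why a compile-time shadow offset constrains where other mappings may live. It is not the code from this patch. The offset `kExampleShadowOffset` is a hypothetical value picked only for illustration, the names `Range`, `ShadowRange`, `Overlaps` and `ImageIsCompatible` are made up for this example, and the Sv39 user-space limit of 2^38 is an assumption about the address-space layout.

```cpp
#include <cassert>
#include <cstdint>

// ASan's default granularity: 8 application bytes map to 1 shadow byte.
constexpr unsigned kShadowScale = 3;

// HYPOTHETICAL shadow offset, chosen only for illustration. It is not
// claimed to be the constant used by the riscv64 port.
constexpr uint64_t kExampleShadowOffset = 0x1000000000ULL;

// Assumption: with Sv39, user space is the lower half of a 39-bit virtual
// address space, so user addresses lie below 2^38.
constexpr uint64_t kSv39UserTop = 1ULL << 38;

// Half-open address range [begin, end).
struct Range {
  uint64_t begin;
  uint64_t end;
};

// Fixed mapping: shadow address = (application address >> scale) + offset.
constexpr uint64_t MemToShadow(uint64_t addr, uint64_t offset) {
  return (addr >> kShadowScale) + offset;
}

// Shadow region needed to cover every application address below user_top.
constexpr Range ShadowRange(uint64_t offset, uint64_t user_top) {
  return Range{MemToShadow(0, offset), MemToShadow(user_top - 1, offset) + 1};
}

// True if the two half-open ranges share at least one address.
constexpr bool Overlaps(Range a, Range b) {
  return a.begin < b.end && b.begin < a.end;
}

constexpr Range kShadow = ShadowRange(kExampleShadowOffset, kSv39UserTop);
static_assert(kShadow.begin == 0x1000000000ULL, "shadow starts at the offset");
static_assert(kShadow.end == 0x1800000000ULL, "shadow spans 2^38 / 8 bytes");

// A (hypothetical) image mapping is usable only if it avoids the shadow.
constexpr bool ImageIsCompatible(Range image) {
  return !Overlaps(image, kShadow);
}
```

Because the offset is baked in at compile time, any loader (the kernel, or an emulator doing its own placement) that puts a mapping inside `kShadow` makes the layout unusable, which is the trade-off against a dynamic shadow address raised in the comments above.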
Nit: RISC-V.