This is a segmentation fault in INTERCEPTOR function on a special edge
case of strstr libc call. When 'Haystack'(main string to be examined) is
NULL and 'needle'(sub-string to be searched in 'Haystack') is an empty
string then it hits a SEGV while using sanitizers and as a 'string not
found' case otherwise.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
90 ms | x64 debian > LLVM.Bindings/Go::go.test |
Event Timeline
Nice find!
Can you please extend llvm-project/compiler-rt/test/sanitizer_common/TestCases/strstr.c
compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc | ||
---|---|---|
594 | Could you fix instead internal_strstr and StrstrCheck? |
I have added the fix to internal_strstr and extended the test case well. In libc's implementation of strstr, the empty needle string check is done at the very beginning and returns haystack if true. In accordance to that, I have added the fix in INTERCEPTOR for an early check rather than StrstrCheck where string length is being calculated. Fixing StrstrCheck would require changing the behavior of internal_strlen and that would not be consistent with the behavior of libc's implementation of strlen.
hm, according to this it's Undefined Behavior
Sanitizer should probably crash in this case, at least with strict_string_checks
https://en.cppreference.com/w/c/string/byte/strstr
If this version is OK, I will land it?
compiler-rt/test/sanitizer_common/TestCases/strstr.c | ||
---|---|---|
2 | why this one is here? |
Instead, why should the sanitizer not diagnose the use of undefined behavior?
It would be good for the sanitizer to emit a diagnostic about "undefined behavior" (probably a warning). Code that leads to undefined behavior is generally not portable and is a bad programing practice that should be avoided.
compiler-rt/lib/sanitizer_common/sanitizer_libc.cpp | ||
---|---|---|
221–223 | Isn't checking for needle[0] efficient than calling a function? |
SEGV in this case should be straightforward
but if we check here but not in the rest of COMMON_INTERCEPTOR_READ_STRING usage, it will be inconsistent
compiler-rt/lib/sanitizer_common/sanitizer_libc.cpp | ||
---|---|---|
221–223 | most likely case is != 0 and internal_strlen is going to be inlined anyway producing same code |
I verified the changes and it looks good to me. Please go ahead with the commit. Thanks.
Hi, we've noticed a breakage in Fuchsia's canaries for LLVM. This commit seems to be responsible, as it fails on the strstr.c tests or aarch64-linux.
It seems to affect all sanitizers in our build. Sorry this took so long to report, but other breakages hid this for most of the day.
If this is not easy to address, can you revert until a fix is ready?
Script: -- : 'RUN: at line 1'; /b/s/w/ir/x/w/staging/llvm_build/./bin/clang -gline-tables-only -fsanitize=address --unwindlib=libunwind -static-libgcc -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta -funwind-tables --sysroot=/b/s/w/ir/x/w/cipd/linux -ldl /b/s/w/ir/x/w/llvm-project/compiler-rt/test/sanitizer_common/TestCases/strstr.c -o /b/s/w/ir/x/w/staging/llvm_build/runtimes/runtimes-aarch64-unknown-linux-gnu-bins/compiler-rt/test/sanitizer_common/asan-aarch64-Linux/Output/strstr.c.tmp && /b/s/w/ir/x/w/staging/llvm_build/runtimes/runtimes-aarch64-unknown-linux-gnu-bins/compiler-rt/test/sanitizer_common/asan-aarch64-Linux/Output/strstr.c.tmp 2>&1 -- Exit Code: 1 Command Output (stdout): -- AddressSanitizer:DEADLYSIGNAL ================================================================= ==30792==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0xffff9ad9c328 bp 0xfffff746b950 sp 0xfffff746b950 T0) ==30792==The signal is caused by a READ memory access. ==30792==Hint: address points to the zero page. libunwind: Unsupported .eh_frame_hdr version #0 0xffff9ad9c328 in strstr (/lib/aarch64-linux-gnu/libc.so.6+0x7c328) (BuildId: 53f40c1d2f3739ae017dcdcef1a17314786e3709) #1 0x26f8c8 in strstr ../staging/llvm_build/runtimes/runtimes-aarch64-unknown-linux-gnu-bins/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc #2 0xffff9ad406dc in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x206dc) (BuildId: 53f40c1d2f3739ae017dcdcef1a17314786e3709) #3 0x25a554 in _start (/b/s/w/ir/x/w/staging/llvm_build/runtimes/runtimes-aarch64-unknown-linux-gnu-bins/compiler-rt/test/sanitizer_common/asan-aarch64-Linux/Output/strstr.c.tmp+0x25a554) (BuildId: 5196c7df40b5e4c2) AddressSanitizer can not provide additional info. SUMMARY: AddressSanitizer: SEGV (/lib/aarch64-linux-gnu/libc.so.6+0x7c328) (BuildId: 53f40c1d2f3739ae017dcdcef1a17314786e3709) in strstr ==30792==ABORTING --
Hi, any updates on the fix so far? Is there anything I can do to get this patch committed?
Hi, I closed the issue on github, since we hadn't seen movement here or the issue show up at for a month. I was just reporting the breakage, and didn't investigate further after the revert. Maybe check with @vitalybuka if you need more support/information?
I think this should show up in any config when compiling w/ asan. So, AFICT just building for a Linux x64 target should reproduce the SEGV for you. The most important bit of Fuchsia's toolchain build is that we use a build a stage 2 compiler, so building clang, and using that just built clang + runtimes should give you something very close to our config if you're having trouble reproducing.
Could you fix instead internal_strstr and StrstrCheck?