This is an archive of the discontinued LLVM Phabricator instance.

SIGSEGV in Sanitizer INTERCEPTOR of strstr function.
AcceptedPublic

Authored by RitanyaB on Dec 17 2021, 12:06 AM.

Details

Summary

This is a segmentation fault in INTERCEPTOR function on a special edge
case of strstr libc call. When 'Haystack'(main string to be examined) is
NULL and 'needle'(sub-string to be searched in 'Haystack') is an empty
string then it hits a SEGV while using sanitizers and as a 'string not
found' case otherwise.

Diff Detail

Event Timeline

RitanyaB created this revision.Dec 17 2021, 12:06 AM
RitanyaB requested review of this revision.Dec 17 2021, 12:06 AM
Herald added a subscriber: Restricted Project. · View Herald TranscriptDec 17 2021, 12:06 AM
vitalybuka added a subscriber: vitalybuka.

Nice find!

Can you please extend llvm-project/compiler-rt/test/sanitizer_common/TestCases/strstr.c

compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc
592

Could you fix instead internal_strstr and StrstrCheck?

RitanyaB updated this revision to Diff 395810.Dec 22 2021, 3:33 AM
RitanyaB edited the summary of this revision. (Show Details)

Nice find!

Can you please extend llvm-project/compiler-rt/test/sanitizer_common/TestCases/strstr.c

I have added the fix to internal_strstr and extended the test case well. In libc's implementation of strstr, the empty needle string check is done at the very beginning and returns haystack if true. In accordance to that, I have added the fix in INTERCEPTOR for an early check rather than StrstrCheck where string length is being calculated. Fixing StrstrCheck would require changing the behavior of internal_strlen and that would not be consistent with the behavior of libc's implementation of strlen.

vitalybuka edited the summary of this revision. (Show Details)

update

vitalybuka accepted this revision.Dec 22 2021, 4:12 PM
vitalybuka edited the summary of this revision. (Show Details)
vitalybuka edited the summary of this revision. (Show Details)

hm, according to this it's Undefined Behavior
Sanitizer should probably crash in this case, at least with strict_string_checks
https://en.cppreference.com/w/c/string/byte/strstr

If this version is OK, I will land it?

compiler-rt/test/sanitizer_common/TestCases/strstr.c
2

why this one is here?

This revision is now accepted and ready to land.Dec 22 2021, 4:13 PM

hm, according to this it's Undefined Behavior
Sanitizer should probably crash in this case, at least with strict_string_checks
https://en.cppreference.com/w/c/string/byte/strstr

Instead, why should the sanitizer not diagnose the use of undefined behavior?
It would be good for the sanitizer to emit a diagnostic about "undefined behavior" (probably a warning). Code that leads to undefined behavior is generally not portable and is a bad programing practice that should be avoided.

soumitra added inline comments.Dec 22 2021, 9:19 PM
compiler-rt/lib/sanitizer_common/sanitizer_libc.cpp
225–227

Isn't checking for needle[0] efficient than calling a function?
It also matches the corresponding implementation in glibc.

hm, according to this it's Undefined Behavior
Sanitizer should probably crash in this case, at least with strict_string_checks
https://en.cppreference.com/w/c/string/byte/strstr

Instead, why should the sanitizer not diagnose the use of undefined behavior?
It would be good for the sanitizer to emit a diagnostic about "undefined behavior" (probably a warning). Code that leads to undefined behavior is generally not portable and is a bad programing practice that should be avoided.

SEGV in this case should be straightforward
but if we check here but not in the rest of COMMON_INTERCEPTOR_READ_STRING usage, it will be inconsistent

compiler-rt/lib/sanitizer_common/sanitizer_libc.cpp
225–227

most likely case is != 0 and internal_strlen is going to be inlined anyway producing same code

hm, according to this it's Undefined Behavior
Sanitizer should probably crash in this case, at least with strict_string_checks
https://en.cppreference.com/w/c/string/byte/strstr

If this version is OK, I will land it?

I verified the changes and it looks good to me. Please go ahead with the commit. Thanks.

This revision was landed with ongoing or failed builds.Jan 5 2022, 12:12 AM
This revision was automatically updated to reflect the committed changes.

Hi, we've noticed a breakage in Fuchsia's canaries for LLVM. This commit seems to be responsible, as it fails on the strstr.c tests or aarch64-linux.

It seems to affect all sanitizers in our build. Sorry this took so long to report, but other breakages hid this for most of the day.

If this is not easy to address, can you revert until a fix is ready?

Script:
--
: 'RUN: at line 1';      /b/s/w/ir/x/w/staging/llvm_build/./bin/clang  -gline-tables-only -fsanitize=address  --unwindlib=libunwind -static-libgcc -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta  -funwind-tables --sysroot=/b/s/w/ir/x/w/cipd/linux  -ldl /b/s/w/ir/x/w/llvm-project/compiler-rt/test/sanitizer_common/TestCases/strstr.c -o /b/s/w/ir/x/w/staging/llvm_build/runtimes/runtimes-aarch64-unknown-linux-gnu-bins/compiler-rt/test/sanitizer_common/asan-aarch64-Linux/Output/strstr.c.tmp &&  /b/s/w/ir/x/w/staging/llvm_build/runtimes/runtimes-aarch64-unknown-linux-gnu-bins/compiler-rt/test/sanitizer_common/asan-aarch64-Linux/Output/strstr.c.tmp 2>&1
--
Exit Code: 1

Command Output (stdout):
--
AddressSanitizer:DEADLYSIGNAL
=================================================================
==30792==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0xffff9ad9c328 bp 0xfffff746b950 sp 0xfffff746b950 T0)
==30792==The signal is caused by a READ memory access.
==30792==Hint: address points to the zero page.
libunwind: Unsupported .eh_frame_hdr version
    #0 0xffff9ad9c328 in strstr (/lib/aarch64-linux-gnu/libc.so.6+0x7c328) (BuildId: 53f40c1d2f3739ae017dcdcef1a17314786e3709)
    #1 0x26f8c8 in strstr ../staging/llvm_build/runtimes/runtimes-aarch64-unknown-linux-gnu-bins/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc
    #2 0xffff9ad406dc in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x206dc) (BuildId: 53f40c1d2f3739ae017dcdcef1a17314786e3709)
    #3 0x25a554 in _start (/b/s/w/ir/x/w/staging/llvm_build/runtimes/runtimes-aarch64-unknown-linux-gnu-bins/compiler-rt/test/sanitizer_common/asan-aarch64-Linux/Output/strstr.c.tmp+0x25a554) (BuildId: 5196c7df40b5e4c2)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib/aarch64-linux-gnu/libc.so.6+0x7c328) (BuildId: 53f40c1d2f3739ae017dcdcef1a17314786e3709) in strstr
==30792==ABORTING

--
vitalybuka reopened this revision.Jan 5 2022, 9:26 PM

I am going to revert and investigate tomorrow.

This revision is now accepted and ready to land.Jan 5 2022, 9:26 PM

Hi, any updates on the fix so far? Is there anything I can do to get this patch committed?

Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2022, 3:02 AM

Hi, I closed the issue on github, since we hadn't seen movement here or the issue show up at for a month. I was just reporting the breakage, and didn't investigate further after the revert. Maybe check with @vitalybuka if you need more support/information?

I think this should show up in any config when compiling w/ asan. So, AFICT just building for a Linux x64 target should reproduce the SEGV for you. The most important bit of Fuchsia's toolchain build is that we use a build a stage 2 compiler, so building clang, and using that just built clang + runtimes should give you something very close to our config if you're having trouble reproducing.