This is an archive of the discontinued LLVM Phabricator instance.

[sanitizers] Fix interception of GLibc regexec
ClosedPublic

Authored by arichardson on Feb 9 2021, 8:09 AM.

Details

Summary

Previously, on GLibc systems, the interceptor was calling compat_regexec
(regexec@GLIBC_2.2.5) insead of the newer
regexec (regexec@GLIBC_2.3.4).
The __compat_regexec strips the REG_STARTEND flag but does not report an
error if other flags are present. This can result in infinite loops for
programs that use REG_STARTEND to find all matches inside a buffer (since
ignoring REG_STARTEND means that the search always starts from the first
character).

The underlying issue is that GLibc's dlsym(RTLD_NEXT, ...) appears to
always return the oldest versioned symbol instead of the default. This
means it does not match the behaviour of dlsym(RTLD_DEFAULT, ...) or the
behaviour documented in the manpage.

It appears a similar issue was encountered with realpath and worked around
in rG77ef78a0a5dbaa364529bd05ed7a7bd9a71dd8d4.

See also https://sourceware.org/bugzilla/show_bug.cgi?id=14932 and
https://sourceware.org/bugzilla/show_bug.cgi?id=1319.

Fixes https://github.com/google/sanitizers/issues/1371

Diff Detail

Event Timeline

arichardson requested review of this revision.Feb 9 2021, 8:09 AM
arichardson created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 9 2021, 8:09 AM
arichardson edited the summary of this revision. (Show Details)Feb 9 2021, 8:32 AM
vitalybuka added inline comments.Feb 17 2021, 2:10 AM
compiler-rt/test/sanitizer_common/TestCases/Posix/regex_startend.cpp
2

Test can be simplified by removing FileCheck and just using asserts

  • simplify test using abort()

Note: incremental ninja check-sanitizer builds will fail this test since the change to the .inc does not trigger a rebuild of the runtime libraries. CMake bug?

arichardson marked an inline comment as done.Feb 17 2021, 2:34 AM
  • simplify test using abort()

Note: incremental ninja check-sanitizer builds will fail this test since the change to the .inc does not trigger a rebuild of the runtime libraries. CMake bug?

it's possible

compiler-rt/test/sanitizer_common/TestCases/Posix/regex_startend.cpp
27–40

I assumed you can remove printfs and most of this ifs and use asserts

arichardson added inline comments.Feb 17 2021, 3:31 AM
compiler-rt/test/sanitizer_common/TestCases/Posix/regex_startend.cpp
27–40

Yeah I can definitely do this, but I find the prints quite useful when things do go wrong. Happy to change it if you prefer,

vitalybuka added inline comments.Feb 17 2021, 3:33 AM
compiler-rt/test/sanitizer_common/TestCases/Posix/regex_startend.cpp
27–40
assert(expected != nullptr);
assert((size_t)matchlen == strlen(expected));
assert(strncmp(matchstart, expected, matchlen) == 0);
marxin added a subscriber: marxin.Feb 18 2021, 11:46 PM
marxin added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc
7801

I think this is problematic because there are targets that don't define regexec@GLIBC_2.3.4.
Please see my target selection in https://reviews.llvm.org/D95864.

arichardson added inline comments.Feb 19 2021, 2:43 AM
compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc
7801

I've added logic to do an unversioned fallback. To me this seems a bit more maintainable than encoding the target list (since that needs to be updated for every new architecture).

vitalybuka accepted this revision.Mar 5 2021, 12:19 PM
vitalybuka added a reviewer: marxin.
marxin accepted this revision.Mar 8 2021, 1:26 AM
This revision is now accepted and ready to land.Mar 8 2021, 1:26 AM
This revision was landed with ongoing or failed builds.Mar 8 2021, 2:54 AM
This revision was automatically updated to reflect the committed changes.
ro added a subscriber: ro.Mar 9 2021, 2:45 AM

The new test FAILs on Solaris:

SanitizerCommon-asan-i386-SunOS :: Posix/regex_startend.cpp
SanitizerCommon-ubsan-i386-SunOS :: Posix/regex_startend.cpp
SanitizerCommon-ubsan-x86_64-SunOS :: Posix/regex_startend.cpp

with

/vol/llvm/src/llvm-project/dist/compiler-rt/test/sanitizer_common/TestCases/Posix/regex_startend.cpp:22:44: error: use of undeclared identifier 'REG_STARTEND'
  int rv = regexec(preg, string, 1, match, REG_STARTEND);

This is no wonder given Linux regex(3) states

REG_STARTEND
       Use   pmatch[0]   on   the   input   string,  starting  at  byte
       pmatch[0].rm_so and ending before  byte  pmatch[0].rm_eo.   This
       allows  matching  embedded  NUL  bytes and avoids a strlen(3) on
       large strings.  It does not use nmatch on input,  and  does  not
       change REG_NOTBOL or REG_NEWLINE processing.  This flag is a BSD
       extension, not present in POSIX.

If the test requires glibc targets, it should be restricted to such.

In D96348#2613433, @ro wrote:

The new test FAILs on Solaris:

SanitizerCommon-asan-i386-SunOS :: Posix/regex_startend.cpp
SanitizerCommon-ubsan-i386-SunOS :: Posix/regex_startend.cpp
SanitizerCommon-ubsan-x86_64-SunOS :: Posix/regex_startend.cpp

with

/vol/llvm/src/llvm-project/dist/compiler-rt/test/sanitizer_common/TestCases/Posix/regex_startend.cpp:22:44: error: use of undeclared identifier 'REG_STARTEND'
  int rv = regexec(preg, string, 1, match, REG_STARTEND);

This is no wonder given Linux regex(3) states

REG_STARTEND
       Use   pmatch[0]   on   the   input   string,  starting  at  byte
       pmatch[0].rm_so and ending before  byte  pmatch[0].rm_eo.   This
       allows  matching  embedded  NUL  bytes and avoids a strlen(3) on
       large strings.  It does not use nmatch on input,  and  does  not
       change REG_NOTBOL or REG_NEWLINE processing.  This flag is a BSD
       extension, not present in POSIX.

If the test requires glibc targets, it should be restricted to such.

It is a common extension that is present on all BSDs, macOS, Linux and possibly other operating systems. It think the best solution would be to add UNSUPPORTED: solaris or wrap the entire test in #ifdef REG_STARTEND and add a dummy main method?

ro added a comment.Mar 9 2021, 2:52 AM

It is a common extension that is present on all BSDs, macOS, Linux and possibly other operating systems. It think the best solution would be to add UNSUPPORTED: solaris or wrap the entire test in #ifdef REG_STARTEND and add a dummy main method?

It may be common, but it's nonstandard and certainly not univeral: I've just checked AIX 7.2 which doesn't have it either. I believe wrapping in #ifdef REG_STARTEND is better than repeatedly adjusting UNSUPPORTED clauses, provided the test can work on every target that has REG_STARTEND.

In D96348#2613450, @ro wrote:

It is a common extension that is present on all BSDs, macOS, Linux and possibly other operating systems. It think the best solution would be to add UNSUPPORTED: solaris or wrap the entire test in #ifdef REG_STARTEND and add a dummy main method?

It may be common, but it's nonstandard and certainly not univeral: I've just checked AIX 7.2 which doesn't have it either. I believe wrapping in #ifdef REG_STARTEND is better than repeatedly adjusting UNSUPPORTED clauses, provided the test can work on every target that has REG_STARTEND.

The test will work on all targets where the host libc supports REG_STARTEND (unless of course there is a bug related to sanitizer interception as was the case on x86_64 glibc).

In D96348#2613450, @ro wrote:

It is a common extension that is present on all BSDs, macOS, Linux and possibly other operating systems. It think the best solution would be to add UNSUPPORTED: solaris or wrap the entire test in #ifdef REG_STARTEND and add a dummy main method?

It may be common, but it's nonstandard and certainly not univeral: I've just checked AIX 7.2 which doesn't have it either. I believe wrapping in #ifdef REG_STARTEND is better than repeatedly adjusting UNSUPPORTED clauses, provided the test can work on every target that has REG_STARTEND.

Done now as D98425.

I would like to point to an issue reported on s390x:
https://github.com/google/sanitizers/issues/1390

MaskRay added a subscriber: MaskRay.EditedMay 27 2022, 12:49 PM

The underlying issue is that GLibc's dlsym(RTLD_NEXT, ...) appears to always return the oldest versioned symbol instead of the default. This means it does not match the behaviour of dlsym(RTLD_DEFAULT, ...) or the behaviour documented in the manpage.

I fixed https://sourceware.org/bugzilla/show_bug.cgi?id=14932 today, so we shall not need versioned interceptors in the future (glibc >= 2.36).
The current versioned interceptors are still needed for glibc<2.36 compatibility.

Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2022, 12:49 PM
Herald added a subscriber: luke957. · View Herald Transcript