This is an archive of the discontinued LLVM Phabricator instance.

[asan] Always skip first object from dl_iterate_phdr
ClosedPublic

Authored by mcf on Feb 10 2022, 9:38 PM.

Details

Summary

All platforms return the main executable as the first dl_phdr_info.
FreeBSD, NetBSD, Solaris, and Linux-musl place the executable name
in the dlpi_name field of this entry. It appears that only Linux-glibc
uses the empty string.

To make this work generically on all platforms, unconditionally
skip the first object (like is currently done for FreeBSD and NetBSD).
This fixes first DSO detection on Linux-musl. It also would likely
fix detection on Solaris/Illumos if it were to gain PIE support
(since dlpi_addr would not be NULL).

Additionally, only skip the Linux VDSO on linux.

Finally, use the empty string as the "seen first dl_phdr_info"
marker rather than (char *)-1. If there was no other object, we
would try to dereference it for a string comparison.

Diff Detail

Event Timeline

mcf created this revision.Feb 10 2022, 9:38 PM
mcf requested review of this revision.Feb 10 2022, 9:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2022, 9:38 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript

Does this fix any test for Linux musl? If yes, please add them to the description.

I needed this patch to use asan on my system (linux-musl) without ASAN_OPTIONS=verify_asan_link_order=false. Otherwise I get the following error:

==23138==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.

even though it is first in the initial library list.

I've only tested this on linux-musl, but I checked dl_iterate_phdr behavior on FreeBSD, NetBSD, OpenIndiana, and Debian. Only Debian used the empty string for the first object; all the others used the executable name. dlpi_addr depended on whether the executable was PIE or not.

OK with me.

As an aside in practice FreeBSD returns the main executable as the first object but this is not guaranteed / documented at present; I have a FreeBSD bug (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199943) open with a request to do so.

MaskRay accepted this revision.Feb 11 2022, 11:04 AM

Looks great!

This revision is now accepted and ready to land.Feb 11 2022, 11:04 AM

I needed this patch to use asan on my system (linux-musl) without ASAN_OPTIONS=verify_asan_link_order=false. Otherwise I get the following error:

==23138==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.

even though it is first in the initial library list.

I've only tested this on linux-musl, but I checked dl_iterate_phdr behavior on FreeBSD, NetBSD, OpenIndiana, and Debian. Only Debian used the empty string for the first object; all the others used the executable name. dlpi_addr depended on whether the executable was PIE or not.

Perhaps you are using -shared-libsan / -shared-libasan on Linux musl and see the spurious ASan runtime does not come first in initial library list; ?

vitalybuka accepted this revision.Feb 11 2022, 12:48 PM
mcf added a comment.Feb 11 2022, 2:34 PM

Perhaps you are using -shared-libsan / -shared-libasan on Linux musl and see the spurious ASan runtime does not come first in initial library list; ?

Aha, that explains it. I actually hit this downstream with gcc, which defaults to -shared-libasan. Indeed, without this patch, if I pass -static-libasan it works, and if I use clang with -shared-libasan, it fails same way as gcc.

This revision was automatically updated to reflect the committed changes.

@mcf This is causing a buildbot failure, please can you take a look: https://lab.llvm.org/buildbot/#/builders/57/builds/14988

thakis added a subscriber: thakis.Feb 12 2022, 11:04 AM

Something in llvmorg-15-init-1237-gaff11542..llvmorg-15-init-1410-g103e1d93 made TestCases/Linux/asan_dlopen_test.cpp fail (in both 32-bit and 64-bit on our bots): https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket/8822456541383185153/+/u/package_clang/stdout?format=raw

This looks like a likely candidate.

Ok, RKSimon reported the same 7.5 ago.

Reverted in c07bbbcef9117cdef249fcea2ae4c9f32ae50ae3 for now.

mcf added a comment.Feb 13 2022, 3:08 AM

I'm having trouble reproducing this test failure. Does anyone have any advice? I'm testing on Debian sid.

In particular, I'm interested in the output of compiler-rt/test/asan/TestCases/Linux/asan_dlopen_test.cpp with ASAN_OPTIONS=verbosity=2.

I'm not exactly sure how this patch could incorrectly suppress the warning compared to the original code. Maybe if a dl_phdr_info besides the first one is a null pointer or the empty string?

ccross added a subscriber: ccross.Apr 7 2022, 3:10 PM

With some help from thakis to identify the environment used on the chromium buildbots (debian xenial), I've reproduced the chromium buildbot failures. On xenial glibc has been modified to return a blank dlpi_name for the vdso entry in a patch titled "Horrible workaround for horribly broken software", which was a fix for https://bugzilla.redhat.com/show_bug.cgi?id=737223. This diff on top of your patch fixes a manual run of the test code for me:

diff --git a/compiler-rt/lib/asan/asan_linux.cpp b/compiler-rt/lib/asan/asan_linux.cpp
index 42de45b63271..d1e838278f10 100644
--- a/compiler-rt/lib/asan/asan_linux.cpp
+++ b/compiler-rt/lib/asan/asan_linux.cpp
@@ -143,6 +143,12 @@ static int FindFirstDSOCallback(struct dl_phdr_info *info, size_t size,
   // Ignore vDSO
   if (internal_strncmp(info->dlpi_name, "linux-", sizeof("linux-") - 1) == 0)
     return 0;
+  // Some old versions of glibc patched by distributors include a "fix" for
+  // https://bugzilla.redhat.com/show_bug.cgi?id=737223 that returns an empty
+  // name for the vDSO entry.  The main program also has an empty name in
+  // glibc, but that was already handled by ignoring the first entry.
+  if (!info->dlpi_name || info->dlpi_name[0] == 0)
+    return 0;
 #    endif
 
   *name = info->dlpi_name;
Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2022, 3:10 PM
Herald added a subscriber: StephenFan. · View Herald Transcript
MaskRay added a comment.EditedApr 7 2022, 3:31 PM

With some help from thakis to identify the environment used on the chromium buildbots (debian xenial), I've reproduced the chromium buildbot failures. On xenial glibc has been modified to return a blank dlpi_name for the vdso entry in a patch titled "Horrible workaround for horribly broken software", which was a fix for https://bugzilla.redhat.com/show_bug.cgi?id=737223. This diff on top of your patch fixes a manual run of the test code for me:

diff --git a/compiler-rt/lib/asan/asan_linux.cpp b/compiler-rt/lib/asan/asan_linux.cpp
index 42de45b63271..d1e838278f10 100644
--- a/compiler-rt/lib/asan/asan_linux.cpp
+++ b/compiler-rt/lib/asan/asan_linux.cpp
@@ -143,6 +143,12 @@ static int FindFirstDSOCallback(struct dl_phdr_info *info, size_t size,
   // Ignore vDSO
   if (internal_strncmp(info->dlpi_name, "linux-", sizeof("linux-") - 1) == 0)
     return 0;
+  // Some old versions of glibc patched by distributors include a "fix" for
+  // https://bugzilla.redhat.com/show_bug.cgi?id=737223 that returns an empty
+  // name for the vDSO entry.  The main program also has an empty name in
+  // glibc, but that was already handled by ignoring the first entry.
+  if (!info->dlpi_name || info->dlpi_name[0] == 0)
+    return 0;
 #    endif
 
   *name = info->dlpi_name;

Ubuntu 16.04 (xenial) which has exceeded "End of standard support" on April 2021?
I think the comment can be more accurate to mention which last Ubuntu version has this hack so that we can remove the hack when it is no longer relevant.

PS: if xenial is the last version shipping the hack, personally I think the build bot probably needs to be upgraded and I am unsure it should still be supported (or perhaps just unsupporting the dlopen mode for the two two dlopen tests). @thakis

ccross added a comment.Apr 7 2022, 4:17 PM

Ubuntu 16.04 (xenial) which has exceeded "End of standard support" on April 2021?
I think the comment can be more accurate to mention which last Ubuntu version has this hack so that we can remove the hack when it is no longer relevant.

PS: if xenial is the last version shipping the hack, personally I think the build bot probably needs to be upgraded and I am unsure it should still be supported (or perhaps just unsupporting the dlopen mode for the two two dlopen tests). @thakis

It's also in CentOS 7 (and I assume RHEL 7), which isn't EOL until 2024.

mcf added a comment.Apr 7 2022, 4:18 PM

Thanks for tracking that down, Colin!

It seems that prior to glibc 2.15, it used the empty string for the vDSO, and some distributions have patched that old behavior into their glibc to work around a bug in the nvidia binary drivers.

Since this is how glibc used to behave (even if such versions are very old), I think your amended patch looks good.

mcf reopened this revision.Apr 7 2022, 5:10 PM
This revision is now accepted and ready to land.Apr 7 2022, 5:10 PM
mcf updated this revision to Diff 421368.Apr 7 2022, 5:11 PM

Handle vDSO entry with empty name

MaskRay accepted this revision.Apr 7 2022, 5:17 PM

Thanks for checking that this is not only Ubuntu specific.
If glibc upstream fixed this at version X, I think it it would be great to replace "Some old versions of glibc" with the information, but I'm not blocking this patch if the information is not easy to obtain.

mcf updated this revision to Diff 421370.Apr 7 2022, 5:28 PM

Mention first glibc version that changed vDSO dlpi_name behavior.

This revision was automatically updated to reflect the committed changes.