This is an archive of the discontinued LLVM Phabricator instance.

[MSAN] Mark dlerror.cc expected failure for MIPS
ClosedPublic

Authored by mohit.bhakkad on Feb 22 2016, 4:52 AM.

Details

Summary

In mips64, we are facing same issue that of AArch64

It is giving error in external library, but as per I understand
MSan won't instrument glibc and other external libraries.
Could someone throw some light on it?

Diff Detail

Repository
rL LLVM

Event Timeline

mohit.bhakkad retitled this revision from to [MSAN] Mark dlerror.cc expected failure for MIPS .
mohit.bhakkad updated this object.
mohit.bhakkad added reviewers: eugenis, kcc, samsonov.
mohit.bhakkad set the repository for this revision to rL LLVM.
samsonov accepted this revision.Feb 22 2016, 11:17 AM
samsonov edited edge metadata.

This test checks that dlerror() unpoisons the string it returns. glibc is indeed not instrumented, so some implementations could trigger false positives by calling strcmp() on internal strings are not known to be fully initialized.

I'm fine with excluding this test on MIPS.

This revision is now accepted and ready to land.Feb 22 2016, 11:17 AM
This revision was automatically updated to reflect the committed changes.

I'm dealing with the same thing on PPC64, and I've debugged the issue. Here's the explanation:

  1. dlopen calls to ld.so
  2. ld.so fails to load a library
  3. ld.so malloc's a buffer for the error message. malloc is external to ld.so, so the usual symbol resolution happens, and our interceptor is called. The memory is poisoned.
  4. ld.so memcpy's the error string to the buffer. ld.so includes its own copy of memcpy (for bootstraping), which doesn't go through the usual dynamic linking. Hence the buffer remains poisoned, even though it was just initialized.
  5. dlerror is called. dlerror implementation is directly in libdl.so.
  6. dlerror calls strcmp on the above buffer. Since libdl.so doesn't have bootstrapping issues, this uses normal dynamic binding - calling msan's interceptor.
  7. __intercept_strcmp notices the (wrongly) poisoned buffer, and explodes.

As you might have noticed, it works just fine on x86_64. This is because on x86_64, gcc optimizes the strcmp in libdl.so by converting it to repz cmpsb, avoiding an actual function call. Since AArch64/MIPS64/PowerPC64 don't have an optimized strcmp, they don't dodge the bullet. Basically, x86 works by accident, and will stop working if you compile glibc with -O0...

As for fixing this issue - the only thing I can think of is to have a TLS variable __msan_in_libcall that's set for the duration of dlerror and other funny calls, and checked in interceptors, which will then disable all error checking. Not sure how workable that is. Thoughts?

eugenis edited edge metadata.Apr 18 2016, 1:45 PM

Thank you for the investigation!
We already have what you describe as "__msan_in_libcall". It's called MSanInterceptorContext::in_interceptor_scope and ::in_interceptor_scope in msan_interceptors.cc. At the moment it applies to the common interceptors, but not to msan-specific interceptors; dlerror is msan-specific.
I think it would be a good idea to extend the MsanInterceptorContext logic from COMMON_INTERCEPTOR_ENTER to all the (legacy) msan interceptors. Among other things, this would make ENSURE_MSAN_INITED() and "if (msan_init_is_running)" lines in most interceptors unnecessary.

Thank you for the investigation!
We already have what you describe as "__msan_in_libcall". It's called MSanInterceptorContext::in_interceptor_scope and ::in_interceptor_scope in msan_interceptors.cc. At the moment it applies to the common interceptors, but not to msan-specific interceptors; dlerror is msan-specific.
I think it would be a good idea to extend the MsanInterceptorContext logic from COMMON_INTERCEPTOR_ENTER to all the (legacy) msan interceptors. Among other things, this would make ENSURE_MSAN_INITED() and "if (msan_init_is_running)" lines in most interceptors unnecessary.

Huh, that actually sounds simpler than I thought. I'll wait with commiting D19205 and try that first.