This is an archive of the discontinued LLVM Phabricator instance.

Only use __cxa_demangle on C++ mangled names
AbandonedPublic

Authored by dim on Jul 28 2016, 2:35 PM.

Details

Reviewers
emaste
rnk
Summary

Fix the Symbolizer.DemangleSwiftAndCXX test on FreeBSD, where one should
not call __cxa_demangle() with an identifier that is not a C++ mangled
name. This would translate "foo" to "float", causing the test to fail.

To prevent this, test if the symbol starts with "_Z" before calling
__cxa_demangle().

Diff Detail

Event Timeline

dim updated this revision to Diff 66010.Jul 28 2016, 2:35 PM
dim retitled this revision from to Only use __cxa_demangle on C++ mangled names.
dim updated this object.
dim added reviewers: rnk, emaste.
dim added a subscriber: llvm-commits.
dim updated this revision to Diff 66024.Jul 28 2016, 2:51 PM

Scratch that initial version, if the name to be demangled does not start
with "_Z", we should not return nullptr, but the unmodified name
instead.

rnk edited edge metadata.Jul 28 2016, 3:39 PM

On Mac and 32-bit mingw C++ mangled names may have an extra underscore prefix, so the name will start with "__Z". You should also handle that case.

emaste edited edge metadata.EditedJul 28 2016, 5:22 PM

There is a QOI issue with FreeBSD's (actually libcxxrt's, actually ELF Tool Chain's) __cxa_demangle, but there's also something not quite right with the sanitizer as a consumer.

@dim reports that with this change another test fails, expecting 3Foo to demangle to foo. But, the sanitizer tests have:

// Check that the rest demangles properly.
EXPECT_STREQ("f1(char*, int)", DemangleSwiftAndCXX("_Z2f1Pci"));
EXPECT_STREQ("foo", DemangleSwiftAndCXX("foo"));

This suggests to me we're expecting to be able to pass in C++ mangled symbols (_Z3foo), C++ mangled type names (3foo) and unmangled C symbols (foo). But that's not really possible, since some strings are both valid C symbols and C++ type names. For example, f should demangle to either f or float depending on whether it's a C symbol or C++ type name.

dim added a comment.Jul 29 2016, 1:46 AM

So when we fix DemangleCXXABI() to only really demangle identifiers starting with "_Z", some UBSan tests start failing, for example:

FAIL: UBSan-ASan-i386 :: TestCases/TypeCheck/vptr-virtual-base.cpp (27899 of 27987)
******************** TEST 'UBSan-ASan-i386 :: TestCases/TypeCheck/vptr-virtual-base.cpp' FAILED ********************
Script:
--
/home/dim/obj/llvm-asan-276903/./bin/clang --driver-mode=g++ -fsanitize=address -m32 -frtti -fsanitize=vptr -fno-sanitize-recover=vptr -g /share/dim/src/llvm/asantest/projects/compiler-rt/test/ubsan/TestCases/TypeCheck/vptr-virtual-base.cpp -O3 -o /usr/home/dim/obj/llvm-asan-276903/projects/compiler-rt/test/ubsan/AddressSanitizer-i386/TestCases/TypeCheck/Output/vptr-virtual-base.cpp.tmp
not  /usr/home/dim/obj/llvm-asan-276903/projects/compiler-rt/test/ubsan/AddressSanitizer-i386/TestCases/TypeCheck/Output/vptr-virtual-base.cpp.tmp 2>&1 | FileCheck /share/dim/src/llvm/asantest/projects/compiler-rt/test/ubsan/TestCases/TypeCheck/vptr-virtual-base.cpp
--
Exit Code: 1

Command Output (stderr):
--
/share/dim/src/llvm/asantest/projects/compiler-rt/test/ubsan/TestCases/TypeCheck/vptr-virtual-base.cpp:16:17: error: expected string not found in input
 // CHECK-NEXT: [[PTR]]: note: object is of type 'Foo'
                ^
<stdin>:2:1: note: scanning from here
0xbfbfebb0: note: object is of type '3Foo'
^
<stdin>:2:1: note: with variable "PTR" equal to "0xbfbfebb0"
0xbfbfebb0: note: object is of type '3Foo'
^
<stdin>:2:11: note: possible intended match here
0xbfbfebb0: note: object is of type '3Foo'
          ^

The "object is of type '3Foo'" message is emitted from UBSan's HandleDynamicTypeCacheMiss(), where it calls TypeName(DTI.getMostDerivedTypeName()) to get the string representation of the type. This ultimately calls into getDynamicTypeInfoFromVtable(), and apparently there is no addition of "_Z" to the type name there.

If we would add _Z to a the output of getMostDerivedTypeName(), I think a lot of other tests might also fall over, so I am unsure what the best approach is for fixing this.

On Mac and 32-bit mingw C++ mangled names may have an extra underscore prefix, so the name will
start with "__Z". You should also handle that case.

Does looking at the status code returned from __cxa_demangle helps in determining if we are dealing with a valid mangled name? (https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.3/a01696.html)

dim added a comment.Jul 31 2016, 8:06 AM

Actually the mangled names do not always have to start with "_Z". https://mentorembedded.github.io/cxx-abi/abi.html#demangler says:

mangled-name is a pointer to a null-terminated array of characters. It may be either an external name, i.e. with a "_Z" prefix, or an internal NTBS mangling, e.g. of a type for type_info.

E.g. "_Z" is for external names, but without the prefix __cxa_demangle() should also demangle valid names like 3Foo properly. This is indeed a QoI issue in FreeBSD's demangler, which is provided by libcxxrt.

For now, the easiest "fix" might be to disable the one test that fails due to this, until we find some way of fixing this in libcxxrt (or switch to libcxxabi ;) ). Or is this a too-easy copout?

dim abandoned this revision.Jul 31 2016, 9:05 AM

Abandoned in favor of D23001 for now.