Page MenuHomePhabricator

[Symbolize] Demangle Rust symbols
ClosedPublic

Authored by tmiasko on Sep 28 2021, 3:30 PM.

Details

Summary

Extract and reuse demangling functionality common to llvm-cxxfilt,
LLVMSymbolizer::DemangleName and llvm::demangle.

From functional perspective it introduces following changes:

  • Supports Rust symbols in LLVMSymbolizer::DemangleName.
  • Accepts Rust symbols with an extra underscore in llvm::demangle.

Diff Detail

Unit TestsFailed

TimeTest
2,850 msx64 debian > AddressSanitizer-x86_64-linux.TestCases::strcmp.c
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/asan/TestCases/strcmp.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Output/strcmp.c.tmp

Event Timeline

tmiasko created this revision.Sep 28 2021, 3:30 PM
tmiasko requested review of this revision.Sep 28 2021, 3:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2021, 3:30 PM
dblaikie added inline comments.Sep 28 2021, 3:48 PM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
654–659

Can/should we make this somehow a generic piece of code that could be used here and in llvm-cxxfilt, so it's only written in one place? (eg: there's patches to add D demangling going in at the moment & so would be handy if adding support only had to go in one place rather than 2 or more)

Speaking of the D demangling support - I was hoping it could be broken down into pieces like the Rust work you did (really appreciate that, btw) - I thought the original Rust proposal was a fully featured single patch that then got broken down, am I remembering that correctly? I couldn't find the original single patch version - if you happen to have a pointer to it, I'd appreciate it for comparison/understanding of how things went there.

tmiasko added inline comments.Sep 29 2021, 3:27 AM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
654–659

Reusing the llvm::demangle might be an option, but looking back in discussion from D68133, so far code here was intentionally using microsoftDemangle directly with different options. Maybe at least llvm-cxxfilt could use llvm::demangle? But then again llvm-cxxfilt, its not demangling MSVC symbols right now, and I don't have particular insight whether it should and if so how exactly.

The single patch version of Rust demangling support was in D99981.

dblaikie added inline comments.Sep 30 2021, 1:27 PM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
654–659

seems OK to me to use a generic demangling for all these tools - but at least llvm-cxxfilt and symbolize here are currently both doing everything except MSVC mangling - so maybe some "unix-y demangle" function would abstract these two callers at least.

Cross referencing to: D110577 - that patch and this one are both adding a case to two places that seem like they should be kept in sync, and I'd like to see that done probably by this patch, so that 110577 doesn't have to touch two places.

jhenderson added a subscriber: jhenderson.

Nothing to add myself, but +1 to reducing code duplication around this.

There are many other tools that do symbol demangling too (e.g. llvm-objdump, llvm-nm, llvm-readobj). Maybe it should be shared with those too somehow?

tmiasko added inline comments.Oct 1 2021, 1:52 AM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
654–659

The function here is handling MSVC mangling, though. Hence, my comment about an earlier attempt to use llvm::demangle in this function.

dblaikie added inline comments.Oct 4 2021, 10:57 AM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
654–659

ah, sorry, right, misread. I think it'd be fine/perhaps helpful to add llvm-cxxfilt support for MSVC demangling too (though maybe it's too ambiguous for a general "filter any string starting with '?' to try to demangle it") - but if not, at least the first part of LLVMSymbolizer::DemangleName could defer to a function that wraps up the itanium-ish demanglings over C++, Rust, and pending D support. Returning some "it wasn't mangled in any way we recognize" value, which in this caller would then fallback to the MSVC case.

tmiasko updated this revision to Diff 378804.Oct 11 2021, 2:37 PM
tmiasko edited the summary of this revision. (Show Details)

Extract and reuse common demangling functionality.

tmiasko added inline comments.Oct 11 2021, 2:39 PM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
654–659

I extracted a common demangling code into a new function.

I tried to limit the functional changes to demangling of Rust symbols only. The
code could be unified a bit further with additional functional changes, but I
would rather not attempt to do this here.

The current function name is non-great and I would welcome suggestions.

Why the functional change to allow double underscore Rust symbols?

llvm/lib/Demangle/Demangle.cpp
28

I'm not sure you need the llvm qualifiers, since you're already in the llvm namespace?

31–32

Previously you didn't need this second call. I'm not sure I follow why you need it now?

44

Do we have access to llvm::Optional here? Not sure, as I know there are some layering issues specific to the demangler.

If we do, I think returning an optional string would be better than passing in a reference.

tmiasko updated this revision to Diff 378911.Oct 12 2021, 1:08 AM
tmiasko marked 3 inline comments as done.

Remove unnecessary qualifiers

llvm/lib/Demangle/Demangle.cpp
31–32

The llvm::demangle and llvm-cxxfilt differ in a treatment of the leading underscore. For Itanium mangling llvm-cxxfilt accepts only symbols starting with _Z or ___Z, but has an option to strip an extra underscore first. On the other hand llvm::demangle accepts all variants by default, i.e., not only _Z and __Z but also __Z and ____Z.

To retain the existing behaviour while sharing the code that checks mangling prefixes, the new function expects symbols without an extra underscore. Thus when first attempt at demangling fails, the llvm::demangle tries once again with the leading underscore stripped.

The implementation strategy also subtly changed, since now responsibility for stripping the extra underscore moved from the inside of itaniumDemangle to the outside.

44

The demangle library currently doesn't have any dependencies on other LLVM components. Otherwise the optional string would be my preferred choice here as well.

Why the functional change to allow double underscore Rust symbols?

To support symbols with extra decoration on Mach-O for code paths that use llvm::demangle and expect it to strip the extra underscore. One example would be llvm-objdump.

jhenderson accepted this revision.Oct 15 2021, 1:01 AM

I think this all looks okay to me, but I'd appreciate @dblaikie confirming too.

This revision is now accepted and ready to land.Oct 15 2021, 1:01 AM
dblaikie accepted this revision.Oct 15 2021, 10:41 AM

Looks OK - might be beneficial to split it up into the refactor commit first (pulling out the functionality from llvm::demangle), then the additional functionality (reusing that in LLVMSymbolizer).