Page MenuHomePhabricator

[llvm-cxxfilt] Split and demangle stdin input on certain non-alphanumerics.
ClosedPublic

Authored by mattd on Feb 19 2019, 3:53 PM.

Details

Summary

This patch attempts to replicate GNU c++-filt behavior when splitting stdin input for demangling.

Previously, cxx-filt would split input only on spaces. Each delimited item is then demangled.
From what I have tested, GNU c++filt also splits input on any character that does not make
up the mangled name (notably commas, but also a large set of non-alphanumeric characters).

This patch splits stdin input on any character that does not belong to the Itanium mangling
format (since Itanium is currently the only supported format in llvm-cxxfilt).

This is an update to PR39990

Diff Detail

Repository
rL LLVM

Event Timeline

mattd created this revision.Feb 19 2019, 3:53 PM

Previously we would have emitted Foo [[:space:]] . right?

llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp
125 ↗(On Diff #187457)

Shouldn't this be const auto &

mattd added a comment.Feb 19 2019, 4:49 PM

Previously we would have emitted Foo [[:space:]] . right?

Thanks for the feedback. Previously we emitted a literal space character, ' ', between the mangled terms that were split on [[:space:]]. This means that tabs would get replaced with single spaces when reassembling the demangled string. With this patch, we now delimit on a variety of non-space delimiters, and capture the delimiter. This patch correctly reassembles the string using the captured delimiter... not just a single space.

mattd marked 2 inline comments as done.Feb 19 2019, 5:15 PM
mattd added inline comments.
llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp
125 ↗(On Diff #187457)

Yes it should be const. I have corrected my local copy.

jhenderson added inline comments.Feb 20 2019, 2:25 AM
llvm/test/tools/llvm-cxxfilt/delimiters.test
30 ↗(On Diff #187457)

It may be worth adding one more demanglable string after this one here, to show that multiple delimiters can be handled and it doesn't completely stop demangling.

llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp
89 ↗(On Diff #187457)

Nit: this and similar lines below are valid uses of auto.

114 ↗(On Diff #187457)

I take it c++filt does not split on those characters?

mattd marked 2 inline comments as done.Feb 20 2019, 8:55 AM
mattd added inline comments.
llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp
114 ↗(On Diff #187457)

It does but in specific cases depending on if the character is a prefix or suffix to the word. For instance

echo 'Test: ._Z1F _Z1F. Done' | c++filt
Test: .F _Z1F. Done

A similar result is produced when using '$' I feel that the safest thing to do is to just let those implementation specific characters be part of the legal set of allowed characters. In fact, we already handle '.' in the suffix position, via the DotSuffix handler in include/llvm/Demangle/ItaniumDemangle.h

mattd updated this revision to Diff 188407.Feb 26 2019, 10:14 AM
mattd marked 2 inline comments as done.
  • Added a more complicated test.
  • Made a few variables 'auto' instead of their iterator type.
compnerd accepted this revision.Feb 26 2019, 4:23 PM
This revision is now accepted and ready to land.Feb 26 2019, 4:23 PM
This revision was automatically updated to reflect the committed changes.