This is an archive of the discontinued LLVM Phabricator instance.

[libcxxabi][demangler] Use an AST to represent the demangled name
ClosedPublic

Authored by erik.pilkington on Jul 7 2017, 3:39 PM.

Details

Summary

This patch changes cxa_demangle.cpp to use an AST instead of manipulating std::strings to produce a demangled name. This is good because:

  • Performance: 3.7x faster to demangle symbols in LLVM
  • Code size: 40% reduction for a release build on my machine
  • Better readability: Now the formatting code is separated from the parsing code

This is probably correctish because:

  • The fuzzer doesn't seem to have any problems with this patch.
  • Demangles symbols in llvm identically to old demangler

Sorry for the mega-patch, I couldn't figure out a sane way of dividing this up further.

Please see a recent thread on llvm-dev for more context: http://lists.llvm.org/pipermail/llvm-dev/2017-June/114448.html

NOTE: this patch depends on: https://reviews.llvm.org/D35158

Thanks for taking a look!
Erik

Diff Detail

Event Timeline

erik.pilkington created this revision.Jul 7 2017, 3:39 PM

Forgot to add context lines!

mehdi_amini added inline comments.
src/cxa_demangle.cpp
43

If this is supposed to be *the* ultimate LLVM demangler, can we follow LLVM coding standard?

mehdi_amini added inline comments.Jul 7 2017, 6:14 PM
src/cxa_demangle.cpp
86

Doc
(same for non trivial APIs)

124

Doc

172

Doc

238

Why is one pure virtual and not the other? Why isn't the stream const?

243

= default;

EricWF edited edge metadata.Jul 7 2017, 7:48 PM

This patch causes test_demangle.pass.cpp to fail with UBSan.

Standard Error:
--
/home/eric/workspace/libcxxabi/src/cxa_demangle.cpp:113:44: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:47:14: note: nonnull attribute specified here
    #0 0x7ff62aae12fb in __cxxabiv1::(anonymous namespace)::stream::operator+=(__cxxabiv1::(anonymous namespace)::string_ref) /home/eric/workspace/libcxxabi/src/cxa_demangle.cpp:113:9
    #1 0x7ff62aaf9494 in __cxxabiv1::(anonymous namespace)::lambda_type_name::print_left(__cxxabiv1::(anonymous namespace)::stream&) const /home/eric/workspace/libcxxabi/src/cxa_demangle.cpp:1135:11
    #2 0x7ff62aae711a in __cxxabiv1::(anonymous namespace)::node::print(__cxxabiv1::(anonymous namespace)::stream&) const /home/eric/workspace/libcxxabi/src/cxa_demangle.cpp:233:9
    #3 0x7ff62aae711a in __cxxabiv1::(anonymous namespace)::qualified_name::print_left(__cxxabiv1::(anonymous namespace)::stream&) const /home/eric/workspace/libcxxabi/src/cxa_demangle.cpp:837
    #4 0x7ff62aaf0da3 in __cxxabiv1::(anonymous namespace)::node::print(__cxxabiv1::(anonymous namespace)::stream&) const /home/eric/workspace/libcxxabi/src/cxa_demangle.cpp:233:9
    #5 0x7ff62aaf0da3 in __cxxabiv1::(anonymous namespace)::node_array::print_with_seperator(__cxxabiv1::(anonymous namespace)::stream&, __cxxabiv1::(anonymous namespace)::string_ref) const /home/eric/workspace/libcxxabi/src/cxa_demangle.cpp:266
    #6 0x7ff62aafa85d in __cxxabiv1::(anonymous namespace)::template_params::print_left(__cxxabiv1::(anonymous namespace)::stream&) const /home/eric/workspace/libcxxabi/src/cxa_demangle.cpp:901:16
    #7 0x7ff62aafaab1 in __cxxabiv1::(anonymous namespace)::node::print(__cxxabiv1::(anonymous namespace)::stream&) const /home/eric/workspace/libcxxabi/src/cxa_demangle.cpp:233:9
    #8 0x7ff62aafaab1 in __cxxabiv1::(anonymous namespace)::name_with_template_args::print_left(__cxxabiv1::(anonymous namespace)::stream&) const /home/eric/workspace/libcxxabi/src/cxa_demangle.cpp:926
    #9 0x7ff62aae70b9 in __cxxabiv1::(anonymous namespace)::node::print(__cxxabiv1::(anonymous namespace)::stream&) const /home/eric/workspace/libcxxabi/src/cxa_demangle.cpp:233:9
    #10 0x7ff62aae70b9 in __cxxabiv1::(anonymous namespace)::qualified_name::print_left(__cxxabiv1::(anonymous namespace)::stream&) const /home/eric/workspace/libcxxabi/src/cxa_demangle.cpp:834
    #11 0x7ff62aafb102 in __cxxabiv1::(anonymous namespace)::node::print(__cxxabiv1::(anonymous namespace)::stream&) const /home/eric/workspace/libcxxabi/src/cxa_demangle.cpp:233:9
    #12 0x7ff62aafb102 in __cxxabiv1::(anonymous namespace)::top_level_function_decl::print_left(__cxxabiv1::(anonymous namespace)::stream&) const /home/eric/workspace/libcxxabi/src/cxa_demangle.cpp:679
    #13 0x7ff62aad779f in __cxxabiv1::(anonymous namespace)::node::print(__cxxabiv1::(anonymous namespace)::stream&) const /home/eric/workspace/libcxxabi/src/cxa_demangle.cpp:233:9
    #14 0x7ff62aad779f in __cxa_demangle /home/eric/workspace/libcxxabi/src/cxa_demangle.cpp:6313
    #15 0x4218a8 in test() /home/eric/workspace/libcxxabi/test/test_demangle.pass.cpp:29682:24
    #16 0x422204 in main /home/eric/workspace/libcxxabi/test/test_demangle.pass.cpp:29761:9
    #17 0x7ff629f8c82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #18 0x403a78 in _start (/home/eric/workspace/build-libcxxabi/test/Output/test_demangle.pass.cpp.exe+0x403a78)

Other than that I don't see any issues with this change. Thanks for working on this.

In this new patch:

  • Fix UB @EricWF pointed out; memmove(something, nullptr, 0) was being called.
  • Add some comments
  • rename stream -> output_stream

Thanks!
Erik

erik.pilkington marked 4 inline comments as done.Jul 8 2017, 1:23 PM
erik.pilkington added inline comments.
src/cxa_demangle.cpp
43

I would like if this followed LLVM conventions too, but this file is already written following this style and leaving it in some middle state would be ugly. All of libcxx[abi] follows this convention too, so this isn't a problem that is isolated to this file.

238

The second isn't pure virtual because its only implemented by nodes that have a part that goes on the right of the declarator, such as array types or functions types, but every node should implements print_left. The stream is where the AST is printed into, so making it const doesn't make sense. I renamed it to output_stream and wrote a comment to make this more clear in the new patch.

dexonsmith edited edge metadata.Jul 12 2017, 1:34 PM

This LGTM. Mehdi, do you have any other concerns?

src/cxa_demangle.cpp
43

I agree. I'd be fine with clang-formatting the entire project, but that seems independent from this change.

mehdi_amini added inline comments.Jul 12 2017, 1:49 PM
src/cxa_demangle.cpp
43

I'm not talking about pure "clang-format" but also naming for instance.

I would like if this followed LLVM conventions too, but this file is already written following this style and leaving it in some middle state would be ugly.

Right, but in the meantime you're adding a significant amount of "debt".

All of libcxx[abi] follows this convention too, so this isn't a problem that is isolated to this file.

This file is "special": it is shared (duplicated...) with LLVM.

ruiu edited edge metadata.Jul 12 2017, 1:50 PM

Looks like this demangler's design is similar to my demangler for Microsoft name mangling scheme (https://reviews.llvm.org/D34667). Is that a coincidence? Both demanglers create AST, stringize it using print_left/print_right (I named them write_pre/write_post), and use custom memory allocator. Looks like both demangler can share more code once both land.

Do you think you can avoid STL containers so that your demangler don't need any destructors? I observed that that makes my demangler faster.

This LGTM. Mehdi, do you have any other concerns?

No other concern (haven't looked further for any either)

dexonsmith added a subscriber: mclow.lists.

+mclow.lists

src/cxa_demangle.cpp
43

This is a patch for libcxxabi. Duplicating the file in LLVM is what created technical debt, and that file has already diverged from libcxxabi. That's where the bug is.

AFAICT, there is no requirement for LLVM subprojects to use LLVM naming schemes, and since libcxx and libcxxabi are implementing STL facilities, it's reasonable for them to use STL naming conventions (even in private implementations that aren't exposed to users).

(It would also be reasonable to follow LLVM naming conventions in private implementations, but that's not the current practice, and it would certainly inhibit code readability here to do so just for one type.)

Perhaps a compromise would be to rename string_ref to string_view, so that it sounds more like the equivalent STL type than the equivalent LLVM type.

@mclow.lists, would you like to weigh in as code owner here? Should the naming scheme for new types in libcxxabi private implementations follow LLVM coding conventions, or libcxxabi coding conventions?

Looks like this demangler's design is similar to my demangler for Microsoft name mangling scheme (https://reviews.llvm.org/D34667). Is that a coincidence? Both demanglers create AST, stringize it using print_left/print_right (I named them write_pre/write_post), and use custom memory allocator. Looks like both demangler can share more code once both land.

Yep, that was coincidental. I glanced at your patch and I think we could end up sharing some code here, which would be really neat.

Do you think you can avoid STL containers so that your demangler don't need any destructors? I observed that that makes my demangler faster.

The AST doesn't have destruction now for this reason, I used a bump pointer to allocate the AST. Looks like your patch followed this strategy too!

+1 for moving this file to LLVM's internal style.

src/cxa_demangle.cpp
43

I'm not talking about pure "clang-format" but also naming for instance.

+1 to that.

Rebase. I don't think the issue of purging underscores from this file/libcxx should block this, if we want to discuss that cfe-dev would probably be the place. I agree that it would be nice to clang-format this file, would anyone have any problems with me committing that after this? I found the git-blame of this file to be pretty useless, almost every line just points to r184097.
Are there any other thoughts on this change?
Thanks,
Erik

Rebase. I don't think the issue of purging underscores from this file/libcxx should block this, if we want to discuss that cfe-dev would probably be the place. I agree that it would be nice to clang-format this file, would anyone have any problems with me committing that after this? I found the git-blame of this file to be pretty useless, almost every line just points to r184097.
Are there any other thoughts on this change?

I suppose you may as well pick a new name for "string_ref" that doesn't have underscores, since it's a new type, if someone is going to follow up to drop underscores. Perhaps "StringView"?

s/string_ref/StringView

Use LLVM naming conventions for all the new stuff in this patch.

dexonsmith accepted this revision.Jul 27 2017, 2:17 PM

Use LLVM naming conventions for all the new stuff in this patch.

Thanks for renaming everything. LGTM!

This revision is now accepted and ready to land.Jul 27 2017, 2:17 PM
This revision was automatically updated to reflect the committed changes.