This is an archive of the discontinued LLVM Phabricator instance.

[libcxxabi][demangler] Improve handling of variadic templates
ClosedPublic

Authored by erik.pilkington on Jan 9 2018, 3:19 PM.

Details

Summary

This patch fixes some longstanding bugs in the demangler to do with it's handling of variadic templates.

Currently the demangler handles variadic templates by expanding them into the Names stack. A parse_* function can succeed and add 0 or more Node*s to the stack. Each production that can consume a variadic template was responsible for looping over all the names it parsed, and transforming the stack. Although its possible for a pack to appear in almost every production, most productions didn't implement this loop (all but pointer and reference types). The current demangler also doesn't respect "Dp" and "sp" productions, which correspond to pack expansions in type and expression contexts. It ignores these pack expansions, and just eagerly expands the pack in the nearest context where it would be possible. For example, we misdemangle the following:

template <class> struct unary {};
template <class... Ts> void f(unary<Ts>...) {}
template void f<int, char, float>(unary<int>, unary<char>, unary<float>);
// clang mangles to _Z1fIJicfEEvDp5unaryIT_E

The demangler misdemangles this symbol into:

void f<int, char, float>(unary<int, char, float>)

I think the best way to handle the Dp & sp productions is to parse an unexpanded AST, then expand it whenever we encounter the Dp/sp. One way of implementing this might be to do a recursive TreeTransform-like traversal of the AST, but we would have to do a bunch of allocations for all the expanded nodes and it would be pretty complex. This patch does the expanding during printing, by reusing the unexpanded AST. This also automatically gives us support for PackExpansions everywhere in the grammar. Most of the churn in this patch has to do with propagating up the the pack size and some other metadata as the AST is being constructed.

This patch also introduces a nice feature: Every parse_* function now either fails, or returns exactly 1 Node. I'm going to use this to clean up the parser a bit.

Thanks for taking a look!
Erik

Diff Detail

Repository
rL LLVM

Event Timeline

erik.pilkington created this revision.Jan 9 2018, 3:19 PM
erik.pilkington added inline comments.Jan 9 2018, 3:27 PM
src/cxa_demangle.cpp
130 ↗(On Diff #129094)

This caching API is being removed because it is no longer valid: if a node has an unexpanded parameter pack it can generate multiple different strings.

1599 ↗(On Diff #129094)

This type was an optimization that flattened the previously 2d vector of substitutions (index * pack) into two 1d vectors. Now that packs are represented with a single Node*, we can replace it with a simple vector.

test/test_demangle.pass.cpp
29607 ↗(On Diff #129094)

I just generated this symbol by hand because I couldn't get clang to do it without crashing. Turns out I forgot the Dp!

dexonsmith requested changes to this revision.Jan 22 2018, 11:08 AM

Thanks for working on this!

src/cxa_demangle.cpp
260–261 ↗(On Diff #129094)

Why is this behind #if 0? Should you just use something like LLVM_DUMP_METHOD?

604 ↗(On Diff #129094)

I'd rather style changes like this were separated out into NFC commits (pre or post), since they make it hard to see what actually changed.

1412–1414 ↗(On Diff #129094)

This seems like a super-common sequence in constructors of subclasses of Node. Can you pull out a common subclass of Node (say, CachingNode)?

test/test_demangle.pass.cpp
29607 ↗(On Diff #129094)

Why was it crashing? (Did you file a bug?)

This revision now requires changes to proceed.Jan 22 2018, 11:08 AM
erik.pilkington marked 2 inline comments as done.

In this new patch:

  • Make the cached values use a 3-way bool type, and default it to false. This simplifies all the Node ctors.
  • Remove some minor style fixes to clean up the diff.
src/cxa_demangle.cpp
260–261 ↗(On Diff #129094)

Do you mean conditionally defined based on NDEBUG and marked attribute((noinline,used)) for debuggers? I was just using it as a quick & simple little helper for debugging.

1412–1414 ↗(On Diff #129094)

I had a hard time finding a nice way of abstracting the cache from the individual ctors, they all need to set it up based on their parameters. The new patch just defaults to setting the three cached values to false, with each ctor overriding when necessary. This gets rid of a lot of the duplication.

test/test_demangle.pass.cpp
29607 ↗(On Diff #129094)

Yep, I filed https://bugs.llvm.org/show_bug.cgi?id=33092 to track this at the time. I didn't really look too far into the crash, but it seems pretty low-priority.

dexonsmith accepted this revision.Jan 29 2018, 2:17 PM

LGTM once you clean up the #if 0.

src/cxa_demangle.cpp
260–261 ↗(On Diff #129094)

Yup, those two things sound great to me. You might need some platform-specific logic to define the attributes correctly.

I figured it was a quick and simple helper, like the dump() methods throughout LLVM. Here's why I think you should change it:

  • I imagine this will be useful in the future, too, and being able to call dump() from the debugger without recompiling is a great feature.
  • #if 0s render as comments in some editors, and then bitrot because no one compiles them.
This revision is now accepted and ready to land.Jan 29 2018, 2:17 PM
This revision was automatically updated to reflect the committed changes.