This is an archive of the discontinued LLVM Phabricator instance.

[libcxxabi][demangler] Improve representation of substitutions/templates
ClosedPublic

Authored by erik.pilkington on Aug 7 2017, 1:44 PM.

Details

Summary

This patch changes the demangler so that it represents substitutions/templates more linearly. Previously, substitions were represented by a vector<vector<Node*>> and template parameter substitutions by a vector<vector<vector<Node*>>>! I wrote a comment in SubTable describing how this is done.
This patch gives a 20%-25% time reduction to demangle the symbols in LLVM, a 25% reduction in code size for the demangler on my machine, and makes the demangler a lot easier to read, IMO.

Thanks for taking a look!
Erik

Diff Detail

Event Timeline

erik.pilkington created this revision.Aug 7 2017, 1:44 PM
dexonsmith requested changes to this revision.Aug 7 2017, 5:09 PM

This looks like a great improvement. I've littered the patch with nit-picks, but my main concern is that there aren't any unit tests for the new data structures. I wonder if there's a hacky way to set that up...

src/cxa_demangle.cpp
1460–1461

This seems like a facility that should/could be unit tested (in C++).

I realize there's no precedent for that for the demangler. Perhaps this file could be #included by a test file:

// unittest/PODSmallVectorTest.h
#include "../src/cxa_demangle.cpp"

namespace {

TEST(PODSmallVectorTest, ...) {
}

} // namespace

Thoughts?

1462

Please add an assertion string, such as "T is required to be a plain-old data type".

1499–1501

This is identical to code in the move constructor. Separate it out into a helper?

1511–1517

Helper?

1527–1535

Should we assert when nullptr is returned from std::malloc or std::realloc?

1540–1542

Assertions on popping/dropping past empty?

1549–1550

Assertions on invalid access?

1562

As I read this patch, I keep reading "sub" as a prefix. Can we just spell this out as "SubstitutionTable"?

Also, this probably deserves unit tests as well.

1577–1580

Can/should this be implemented using pushPack() and pushSubIntoPack()?

1585

Do you need to assert that pushPack has been called at least once (i.e., !PackIndices.empty())?

1595

Perhaps NodeRange would be more clear.

1596–1599

Please be consistent with pointer style in this patch.

  • I suspect the rest of the file attaches to the type, i.e., Node** First, which is why I didn't comment earlier on, e.g., Node* Entry.
  • LLVM style is to attach to the name, like here.

I have a slight preference for switching toward LLVM style, but if the file consistently uses pointers attached to types, that's fine too. (Just be consistent within the patch.)

1602

This could perhaps use a comment. At least, inside the function to explain the math.

1603

Once you add this assertion to operator[] you can delete it here.

1604

Should there be some assertion on Subs.size() here? Perhaps, if so, this should be &Subs[PackIndices[N]], so that Subs.operator[]() handles it for you. Same below.

1605–1607

Does this deserve its own helper function (e.g., packEnd())? (I haven't read ahead to see if this logic is duplicated anywhere; probably not a separate function if it's not duplicated.)

1621–1631

How much have you thought about these sizes (32, 32, and 4)? Any evidence to back up these specific choices?

This revision now requires changes to proceed.Aug 7 2017, 5:09 PM

Oh, also found a couple of things you should likely split into prep commits to simplify this patch.

src/cxa_demangle.cpp
1575–1577
  • Why not rename names as well?
  • Please rename these in a separate prep commit to minimize noise on the final patch.
erik.pilkington edited edge metadata.
erik.pilkington marked 14 inline comments as done.

Address review comments.
Thanks!
Erik

src/cxa_demangle.cpp
1575–1577

Sure, r310415.

1604

We can't use operator[] here because if all we have is an empty parameter pack then &Subs[PackIndices[N]] needs to point the the end() iterator, which is out of bounds. The new patch adds an assert though.

1621–1631

Yep, 32 is enough for the Subs table/names to never overflow when demangling the symbols in LLVM, and 4 seldom overflows TemplateParams. TemplateParams is temporary held on the stack in parse_template_args(), so I wanted it to be smaller.

dexonsmith accepted this revision.Aug 9 2017, 10:43 AM

I have a few more nitpicks; LGTM after you fix those.

src/cxa_demangle.cpp
1621–1631

Sounds great. Please add comments to that effect.

1626–1630

This assertion won't be meaningful in the case that matters: when PackIndices[N + 1] is larger than Substitutions.size(). In that case, End is in the middle of nowhere, and the result of End <= Substitutions.end() is undefined. A smart optimizer would fold that to true.

You should assert PackIndices[N + 1] <= Substitutions.size().

5195–5199

This looks really hairy. This is the only (non-recursive) caller of parse_template_arg... it doesn't seem reasonable for parse_template_arg to modify template params, if we're just going to reset them after without looking.

Please add a FIXME to sort this out (at least to document the logic, if not to refactor it somehow).

5207–5209

Early continue instead of else, if you leave the logic like this.

I'm a little skeptical that this structure is better than before, though, since there's some non-trivial duplicated code. I'll let you decide.

This revision is now accepted and ready to land.Aug 9 2017, 10:43 AM
This revision was automatically updated to reflect the committed changes.