This is an archive of the discontinued LLVM Phabricator instance.

[libcxxabi][demangler] Improve representation of substitutions/templates
ClosedPublic

Authored by erik.pilkington on Aug 7 2017, 1:44 PM.

Details

Summary

This patch changes the demangler so that it represents substitutions/templates more linearly. Previously, substitions were represented by a vector<vector<Node*>> and template parameter substitutions by a vector<vector<vector<Node*>>>! I wrote a comment in SubTable describing how this is done.
This patch gives a 20%-25% time reduction to demangle the symbols in LLVM, a 25% reduction in code size for the demangler on my machine, and makes the demangler a lot easier to read, IMO.

Thanks for taking a look!
Erik

Diff Detail

Repository
rL LLVM

Event Timeline

erik.pilkington created this revision.Aug 7 2017, 1:44 PM
dexonsmith requested changes to this revision.Aug 7 2017, 5:09 PM

This looks like a great improvement. I've littered the patch with nit-picks, but my main concern is that there aren't any unit tests for the new data structures. I wonder if there's a hacky way to set that up...

src/cxa_demangle.cpp
1460–1461 ↗(On Diff #110080)

This seems like a facility that should/could be unit tested (in C++).

I realize there's no precedent for that for the demangler. Perhaps this file could be #included by a test file:

// unittest/PODSmallVectorTest.h
#include "../src/cxa_demangle.cpp"

namespace {

TEST(PODSmallVectorTest, ...) {
}

} // namespace

Thoughts?

1462 ↗(On Diff #110080)

Please add an assertion string, such as "T is required to be a plain-old data type".

1499–1501 ↗(On Diff #110080)

This is identical to code in the move constructor. Separate it out into a helper?

1511–1517 ↗(On Diff #110080)

Helper?

1527–1535 ↗(On Diff #110080)

Should we assert when nullptr is returned from std::malloc or std::realloc?

1540–1542 ↗(On Diff #110080)

Assertions on popping/dropping past empty?

1549–1550 ↗(On Diff #110080)

Assertions on invalid access?

1562 ↗(On Diff #110080)

As I read this patch, I keep reading "sub" as a prefix. Can we just spell this out as "SubstitutionTable"?

Also, this probably deserves unit tests as well.

1577–1580 ↗(On Diff #110080)

Can/should this be implemented using pushPack() and pushSubIntoPack()?

1585 ↗(On Diff #110080)

Do you need to assert that pushPack has been called at least once (i.e., !PackIndices.empty())?

1595 ↗(On Diff #110080)

Perhaps NodeRange would be more clear.

1596–1599 ↗(On Diff #110080)

Please be consistent with pointer style in this patch.

  • I suspect the rest of the file attaches to the type, i.e., Node** First, which is why I didn't comment earlier on, e.g., Node* Entry.
  • LLVM style is to attach to the name, like here.

I have a slight preference for switching toward LLVM style, but if the file consistently uses pointers attached to types, that's fine too. (Just be consistent within the patch.)

1602 ↗(On Diff #110080)

This could perhaps use a comment. At least, inside the function to explain the math.

1603 ↗(On Diff #110080)

Once you add this assertion to operator[] you can delete it here.

1604 ↗(On Diff #110080)

Should there be some assertion on Subs.size() here? Perhaps, if so, this should be &Subs[PackIndices[N]], so that Subs.operator[]() handles it for you. Same below.

1605–1607 ↗(On Diff #110080)

Does this deserve its own helper function (e.g., packEnd())? (I haven't read ahead to see if this logic is duplicated anywhere; probably not a separate function if it's not duplicated.)

1621–1631 ↗(On Diff #110080)

How much have you thought about these sizes (32, 32, and 4)? Any evidence to back up these specific choices?

This revision now requires changes to proceed.Aug 7 2017, 5:09 PM

Oh, also found a couple of things you should likely split into prep commits to simplify this patch.

src/cxa_demangle.cpp
1575–1577 ↗(On Diff #110080)
  • Why not rename names as well?
  • Please rename these in a separate prep commit to minimize noise on the final patch.
erik.pilkington edited edge metadata.
erik.pilkington marked 14 inline comments as done.

Address review comments.
Thanks!
Erik

src/cxa_demangle.cpp
1575–1577 ↗(On Diff #110080)

Sure, r310415.

1604 ↗(On Diff #110080)

We can't use operator[] here because if all we have is an empty parameter pack then &Subs[PackIndices[N]] needs to point the the end() iterator, which is out of bounds. The new patch adds an assert though.

1621–1631 ↗(On Diff #110080)

Yep, 32 is enough for the Subs table/names to never overflow when demangling the symbols in LLVM, and 4 seldom overflows TemplateParams. TemplateParams is temporary held on the stack in parse_template_args(), so I wanted it to be smaller.

dexonsmith accepted this revision.Aug 9 2017, 10:43 AM

I have a few more nitpicks; LGTM after you fix those.

src/cxa_demangle.cpp
1626–1630 ↗(On Diff #110281)

This assertion won't be meaningful in the case that matters: when PackIndices[N + 1] is larger than Substitutions.size(). In that case, End is in the middle of nowhere, and the result of End <= Substitutions.end() is undefined. A smart optimizer would fold that to true.

You should assert PackIndices[N + 1] <= Substitutions.size().

5195–5199 ↗(On Diff #110281)

This looks really hairy. This is the only (non-recursive) caller of parse_template_arg... it doesn't seem reasonable for parse_template_arg to modify template params, if we're just going to reset them after without looking.

Please add a FIXME to sort this out (at least to document the logic, if not to refactor it somehow).

5207–5209 ↗(On Diff #110281)

Early continue instead of else, if you leave the logic like this.

I'm a little skeptical that this structure is better than before, though, since there's some non-trivial duplicated code. I'll let you decide.

1621–1631 ↗(On Diff #110080)

Sounds great. Please add comments to that effect.

This revision is now accepted and ready to land.Aug 9 2017, 10:43 AM
This revision was automatically updated to reflect the committed changes.