This is an archive of the discontinued LLVM Phabricator instance.

Move Itanium demangler implementation into a header file and add visitation support.
ClosedPublic

Authored by rsmith on Aug 17 2018, 3:58 PM.

Details

Summary

This transforms the Itanium demangler into a generic reusable library that can
be used to build, traverse, and transform Itanium mangled name trees.

This is in preparation for adding a canonicalizing demangler, which
cannot live in the Demangle library for layering reasons. In order to
keep the diffs simpler, this patch moves more code to the new header
than is strictly necessary: in particular, all of the printLeft /
printRight implementations can be moved to the implementation file.
(And indeed we could make them non-virtual now if we wished, and remove
the vptr from Node.)

All nodes are now included in the Kind enumeration, rather than omitting
some of the Expr nodes, and the three different floating-point literal
node types now have distinct Kind values.

As a proof of concept for the visitation / matching mechanism, this
patch implements a Node dumping facility on top of it, replacing the
prior mechanism that produced the pretty-printed output rather than a
tree dump. Sample dump output:

FunctionEncoding(
  NameType("int"),
  NameWithTemplateArgs(
    NestedName(
      NameWithTemplateArgs(
        NameType("A"),
        TemplateArgs(
          {NameType("B")})),
      NameType("f")),
    TemplateArgs(
      {NameType("int")})),
  {},
  <null>,
  QualConst, FunctionRefQual::FrefQualLValue)

As a next step, it would make sense to move the LLVM high-level interface to
the demangler (the itaniumDemangler function and ItaniumPartialDemangler class)
into the Support library, and implement them in terms of the Demangle library.
This would allow the libc++abi demangler implementation to be an identical copy
of the llvm Demangle library, and would allow the LLVM implementation to reuse
LLVM components such as llvm::BumpPtrAllocator, but we'll need to decide how to
coordinate that with the MS ABI demangler, so I'm not doing that in this patch.

No functionality change intended other than the behavior of dump().

Diff Detail

Event Timeline

rsmith created this revision.Aug 17 2018, 3:58 PM

This seems like a good direction to me! +1 for dropping the vptr from Node, I think I'll try to do that once all the dust settles from these changes. I think you forgot to git add the new ItaniumDemangle.cpp into the diff, though.

include/llvm/Demangle/ItaniumDemangle.h
1

Nit: you forgot the "-*- C++ -*-"

144

Shouldn't this be match(Fn &&)? Otherwise, someone using std::function would get needlessly bad performance by default.

1033

Why did you delete this?

1136

Do you think it makes more sense from a library perspective to move the special-case rule to the implementation of the canonicalizer?

1416–1418

May as well get rid of this base class now. (If you don't mind!)

1769

I think you forgot to add a file that contained these definitions?

dlj added inline comments.Aug 17 2018, 5:20 PM
include/llvm/Demangle/ItaniumDemangle.h
29

Could you document the ordering semantics here?

I *think* it's most-to-least derived, but if I'm wrong... well, there you go. :-)

1121

Could you add an example of where this type appears?

1262

Isn't this just std::allocator? (Assuming it's for Sa)

rsmith marked 3 inline comments as done.Aug 17 2018, 5:23 PM
rsmith added inline comments.
include/llvm/Demangle/ItaniumDemangle.h
144

I'm following the convention used by the C++ standard library: function objects are passed by value by default, and users use std::ref to have them passed by copy instead. (Pass-by-value is likely better when copying is acceptable and the function object is small or empty.)

That said, I forgot the std::refs in the dump() implementation (fixed now). Perhaps that's a good argument to not follow the standard library convention.

1033

Oops, regex gone astray.

1136

Here are four example use cases for the visit/match functionality:

  1. AST dumping. As this patch shows, ForwardTemplateReference is a special case. (We want to visit the resolved node where possible, not the template parameter index, and need to deal with infinite recursion.)
  2. AST pretty-printing. (This is really the same as case 1.)
  3. Canonicalizing allocation. We don't want to canonicalize ForwardTemplateReference based on constructor arguments, because two nodes with the same constructor arguments do not have the same meaning in general.
  4. Cloning / tree transformation. We do not want to clone ForwardTemplateReference by calling the constructor with the same constructor arguments, because that would create an unresolved node.

Every reasonable visitor I've considered wants to special-case ForwardTemplateReference. It is fundamentally different from the other nodes, because it is not immutable after construction, so there is no way to produce a set of constructor arguments that produce an identical node.

So I think it's better to force every caller of match to think about this case by not providing a default implementation. Maybe adding a deleted version would be better though, to make it easier to see that this is intentional when the client hits the inevitable build break? I'll do that.

1769

This patch doesn't delete the existing lib/Demangle/ItaniumDemangle.cpp; this is still present there. Phabricator seems to hide its contents by default, though, so maybe that's why you're not seeing it?

lib/Demangle/ItaniumDemangle.cpp
240–241

(The BumpPointerAllocator implementation is over here. A diff algorithm somewhere seems to have got confused, though; this code is unchanged from the prior version in this file.)

rsmith updated this revision to Diff 161360.Aug 17 2018, 5:26 PM
rsmith marked an inline comment as done.

Address Erik's review comments.

rsmith added inline comments.Aug 17 2018, 5:36 PM
include/llvm/Demangle/ItaniumDemangle.h
29

There's no ordering constraints. As a matter of cleanliness, keeping these in the same order as the class definitions might be nice, but nothing relies on that. After Erik's request to remove the Expr class, there isn't any interesting hierarchy here either -- these classes all derived directly from Node.

1121

Sure, why not. (I'll do this in a separate commit.)

1262

Yes, that sure looks like a (pre-existing) bug! As it happens, this is unreachable, since we don't ever form an ExpandedSpecialSubstitution for SpecialSubKind::allocator. (The SpecialSubKind::string case is also wrong and also unreachable.)

I'll fix this in a separate commit.

dlj accepted this revision.Aug 17 2018, 5:57 PM

Looks fine for my comments.

include/llvm/Demangle/ItaniumDemangle.h
1969

Would it make sense to update this comment now that Expr is gone?

(I think this is what confused me about the ordering in FOR_EACH_NODE_KIND.)

This revision is now accepted and ready to land.Aug 17 2018, 5:57 PM
rsmith updated this revision to Diff 161365.Aug 17 2018, 6:42 PM

Update visit comment now there is no class hierarchy to speak of.

rsmith marked an inline comment as done.Aug 17 2018, 6:42 PM
rsmith edited the summary of this revision. (Show Details)
erik.pilkington accepted this revision.Aug 20 2018, 9:36 AM

LGTM too, thanks for doing this!

include/llvm/Demangle/ItaniumDemangle.h
28

Don't remove this in the libcxxabi version, or enclose the #include in namespace {}!. You were probably planning on doing that, but just to be sure.

144

Oh, okay, I guess it makes more sense to follow the standard conventions here then.

1136

Okay, I agree then, that's pretty convincing.

1262

Oops! Since this was unreachable, we should just remove these cases from the switch.

1769

Oh, right, my mistake!

lib/Demangle/ItaniumDemangle.cpp
63–65

This should be defined in an anon namespace, right?

236–238

I think this comment is too high: initializeOutputStream() and BumpPointerAllocator are both used in the libcxxabi version.

This revision was automatically updated to reflect the committed changes.
rsmith marked 2 inline comments as done.
rsmith added inline comments.Aug 20 2018, 1:15 PM
lib/Demangle/ItaniumDemangle.cpp
236–238

Right, sorry. I was thinking about what I plan to do next, which is to make the LLVM version use llvm::BumpPtrAllocator, but until / unless that happens I agree.

xbolva00 added inline comments.
llvm/trunk/include/llvm/Demangle/ItaniumDemangle.h
609 ↗(On Diff #161541)

Hi Richard, this code looks suspicious. Is this correct?

PVS warns:
/home/xbolva00/LLVM/llvm-project/llvm/include/llvm/Demangle/ItaniumDemangle.h 620 err V769 The 'SecondChar' pointer in the '++ SecondChar' expression equals nullptr. The resulting value is senseless and it should not be used.

Herald added a project: Restricted Project. · View Herald TranscriptNov 2 2019, 8:40 AM
Herald added a subscriber: dexonsmith. · View Herald Transcript
llvm/trunk/include/llvm/Demangle/ItaniumDemangle.h
609 ↗(On Diff #161541)

(This is my fault, Richard just copied this)

This is sorta fine, this done to distinguish between an empty StringView with nullptr first and last and an empty NodeOrString. The pointers are never loaded from, but I believe this is still UB. That being said, I think this class is over engineered and we'd probably be better off putting the StringView in a NameType in cases where we need to have either a Node or a String.

xbolva00 added inline comments.Nov 2 2019, 12:00 PM
llvm/trunk/include/llvm/Demangle/ItaniumDemangle.h
609 ↗(On Diff #161541)

Thanks for info. Can you add code comment here for now?

llvm/trunk/include/llvm/Demangle/ItaniumDemangle.h
609 ↗(On Diff #161541)

I'll just eradicate it when I'm back at my desk on Monday, shouldn't be too tricky. IIRC this is only used in a few places.

include/llvm/Demangle/ItaniumDemangle.h