This is an archive of the discontinued LLVM Phabricator instance.

[Demangle] Add support for D anonymous symbols
ClosedPublic

Authored by ljmf00 on Nov 19 2021, 7:14 PM.

Details

Summary
Anonymous symbols are represented by 0 in the mangled symbol. We should skip
them in order to represent the demangled name correctly, otherwise demangled
names like `demangle..anon` can happen.

Diff Detail

Event Timeline

ljmf00 created this revision.Nov 19 2021, 7:14 PM
ljmf00 requested review of this revision.Nov 19 2021, 7:14 PM
dblaikie accepted this revision.Nov 20 2021, 1:19 PM

Looks OK, though the "anonymous" in the mangled name tests seems misleading - the "anonymous" isn't itself, anonymous, right? IT's followed by anonymous symbol (0), yeah? So maybe change that text/name - might be clearer.

Alternatively, wouldn't this fall out naturally from parseIdentifier if parseIdentifier returned Mangled (rather than nullptr) when Len == 0? Seems that'd be simpler/tidier?

This revision is now accepted and ready to land.Nov 20 2021, 1:19 PM
ljmf00 updated this revision to Diff 389876.Nov 25 2021, 1:20 PM

Looks OK, though the "anonymous" in the mangled name tests seems misleading - the "anonymous" isn't itself, anonymous, right? IT's followed by anonymous symbol (0), yeah? So maybe change that text/name - might be clearer.

Yeah, I understand that this could be misleading. I changed anonymous to test.

Alternatively, wouldn't this fall out naturally from parseIdentifier if parseIdentifier returned Mangled (rather than nullptr) when Len == 0? Seems that'd be simpler/tidier?

No, it would not fall naturally since backreferencing symbols may happen (in the future, not yet implemented). As is, it would fall naturally since e.g. 00035 would be decoded as 35, but in the case where 0Q..., a dot would be appended anyway and logic would be needed later to avoid it. The reason for doing this early is also for performance reasons since other checks are not needed such as whether it is a template or a backreference. Other expensive checks should be avoided like when a number decoding overflows.

The reason for doing this early is also for performance reasons since other checks are not needed such as whether it is a template or a backreference. Other expensive checks should be avoided like when a number decoding overflows.

I wouldn't mind avoiding that sort of thing on the first pass & implementing it separately for optimization if/when/with data to show it's valuable, but no worries - if it complicates the code (such as avoiding the ".") that's enough justification for me not to go the other way that I was suggesting. Thanks for explaining!

This revision was automatically updated to reflect the committed changes.