This is an archive of the discontinued LLVM Phabricator instance.

[Demangle] Add support for D simple single qualified names
ClosedPublic

Authored by ljmf00 on Oct 8 2021, 8:21 AM.

Details

Summary
This patch adds support for simple single qualified names that includes
internal mangled names and normal symbol names.

Diff Detail

Event Timeline

ljmf00 created this revision.Oct 8 2021, 8:21 AM
ljmf00 requested review of this revision.Oct 8 2021, 8:21 AM

This patch still looks pretty long - perhaps it could be split up. Could support a single identifier first, then multiple identifiers (qualifiers) and then the LName things? (could be one at a time, but might be easy enough to check them all together there if they're really regularly structured)

llvm/lib/Demangle/DLangDemangle.cpp
385–386

Might be easier to read with a StringSwitch (could make a StringRef from the Mnagled pointer and the length - maybe pass as StringRef here, rather than as the decomposed components) rather than a switch over length + strncmps?

391–392

Remove else-after-return(s)

423–429

This appears to be unstested

ljmf00 added inline comments.Oct 11 2021, 5:19 PM
llvm/lib/Demangle/DLangDemangle.cpp
385–386

I guess you are missing the point of this switch. Maybe some comments can be added to cover this behaviour but the point here is: the length passed into parseLName is the length of the identifier name but not of the mangled name evaluated inside strncmp. This is specifically needed, because, for other special names, e.g. initializers (__initZ), a Z is suffixed indicating it is a special symbol. In order to verify that, extra length to accommodate the 'Z' char is needed.

The StringSwitch is not applicable here because of that, otherwise would be a good fit. I see optimizations that can be done, however, which is:

  1. Use integer literals on strncmp instead of being calculated in runtime
  2. Use a second inner switch to make the compiler optimize it with jump tables
423–429

Thanks, this actually needs to be on another patch, since this is technically part of the function mangling. It also includes this and ~this (they are not being tested here, but on https://reviews.llvm.org/D111423 although they should be all tested on https://reviews.llvm.org/D110576) I'm going to update the path when tests file get rearranged (due to the discussion on https://reviews.llvm.org/D111414)

ljmf00 added inline comments.Oct 13 2021, 8:22 AM
llvm/lib/Demangle/DLangDemangle.cpp
385–386

Well, I realized that the first option is not doable here and the second one, from what I tested on clang 13, the optimizer can take what I suggested easily. https://godbolt.org/z/5dorfsWq9
If you have any other suggestions here that can increase readability, I would appreciate discussing them :)

ljmf00 updated this revision to Diff 388265.Nov 18 2021, 10:51 AM
ljmf00 edited the summary of this revision. (Show Details)

I changed the patch to avoid else-after-return, use // comment style, remove untested code, use ++i over i++ when possible, and use the OutputBuffer.

@dblaikie Can you rereview now?

Still seems a bit long in one patch/hard to verify that all the cases are covered by tests, etc. Could you break this up further - maybe implementing only the most basic, single 'parseIdentifier' case, then the iteration, then the anonymous symbols, then the lnames?

ljmf00 updated this revision to Diff 388675.Nov 19 2021, 7:07 PM
ljmf00 edited the summary of this revision. (Show Details)

Still seems a bit long in one patch/hard to verify that all the cases are covered by tests, etc. Could you break this up further - maybe implementing only the most basic, single 'parseIdentifier' case, then the iteration, then the anonymous symbols, then the lnames?

I have to disagree here a bit, but I'm going to do it anyway since I really want to go forward with this and don't lose time discussing this nit. I think parsing LName special cases could be indeed in another patch, but either anonymous symbols and identifier iteration are really little things. About anonymous symbols, we are talking about 5 lines of code that are trivial:

// Skip over anonymous symbols.
    if (*Mangled == '0') {
      do
        ++Mangled;
      while (*Mangled == '0');

      continue;
    }

And the iteration with an if for adding the dot is a simple do while dependent on isSymbolName which currently just checks if the next char is a digit or not before parsing another node, which is like 10 lines of code.

I'm going to also separate the fake parent symbols, which is not yet in the spec. See here for context https://github.com/dlang/dlang.org/pull/3124 .

ljmf00 retitled this revision from [Demangle] Add support for D simple qualified names to [Demangle] Add support for D simple single qualified names.Nov 19 2021, 7:08 PM
This revision was not accepted when it landed; it landed in state Needs Review.Nov 29 2021, 4:06 PM
This revision was automatically updated to reflect the committed changes.