This is an archive of the discontinued LLVM Phabricator instance.

[Demangle] Add support for D types back referencing
ClosedPublic

Authored by ljmf00 on Oct 8 2021, 8:26 AM.

Details

Summary
This patch adds support for type back referencing, allowing demangling of
compressed mangled symbols with repetitive types.

Signed-off-by: Luís Ferreira <contact@lsferreira.net>

Diff Detail

Event Timeline

ljmf00 created this revision.Oct 8 2021, 8:26 AM
ljmf00 requested review of this revision.Oct 8 2021, 8:26 AM
dblaikie added inline comments.Oct 10 2021, 10:11 AM
llvm/unittests/Demangle/DLangDemangleTest.cpp
19–20

Might be worth some comments explaining what each test is intended to cover. Using similar phrasing to the comments in the implementation source would help cross reference (eg: is this last one meant to test the "LastBackref" bailout, or the "parseType" failure case? (or something else)

20

/might/ be easier to follow if each case is tested in a separate mangled name? (helps to see which things are varying, which things are the same, so the intention of each test is clearer?)

Hmm, yeah, I'm not sure I follow what this is testing. It's got several backrefs, perhaps where one would suffice? (similarly with the previous ABC.ABC.ABC.a test, for what it's worth - not quite following that either) & they look like they'd all parse through parseQualified... 8demangle, 4ABCi, Qf, Qh, 1a, Qh - looks all like that would be resolved by parseIdentifier (calling the base case of parseLName for the numerically prefixed ones, and the parseSymbolBackref for the Q ones) called in parseQualified,

I added comments to the test cases and fixed C like comments, ++I instead of I++ and other requested changes on previous patches. Can you please re-review it?

llvm/unittests/Demangle/DLangDemangleTest.cpp
20

Well, I added an overcomplicated test. I simplified it to use only one back reference, since referencing backreferences are not in the specification. The idea behind this backreference concept is to calculate the position offset to the current position but backwards, by converting the difference of a char ASCII value with 'a' or 'A' to the desired shifted position and relocating the pointer with that value, so that Ret points to that reference. For the current test case example _D8demangle4ABCi1aQd Qd is a backreference for position 4, counting from d char, so decoding it points to i. Since i is a valid type sequence, it succeeded.

I added an invalid test case with a comment explaining it. LastBackRef is currently being used to check if the LastBackRef is the same as the currently decoded backreference, and therefore detecting a recursive backreference. This needs to be checked since a type can technically point to a letter. This logic can be later extended when pointing to non-recursive backreferences is a thing (I'm currently discussing this and doing some test cases to implement this in the official spec and the reference compiler) otherwise, checking for the 'Q' letter would be sufficient.

dblaikie accepted this revision.Jan 11 2022, 6:07 PM

Looks good, thanks!

llvm/unittests/Demangle/DLangDemangleTest.cpp
20

Might be handy to include some of the explanation of the _D8demangle4ABCi1aQd from this discussion in a comment in the test

This revision is now accepted and ready to land.Jan 11 2022, 6:07 PM
ljmf00 updated this revision to Diff 399183.Jan 11 2022, 7:25 PM

I also added another test to include invalid back reference position with _D8demangle3fooQXXXx.

llvm/unittests/Demangle/DLangDemangleTest.cpp
20

Added the suggested explanation on both patches. Updated the previous one as well.

This revision was landed with ongoing or failed builds.Jan 12 2022, 2:02 PM
This revision was automatically updated to reflect the committed changes.
RKSimon added inline comments.
llvm/lib/Demangle/DLangDemangle.cpp
312

@ljmf00 coverity is warning that it looks like we fail to restore LastBackref if we early-return here - is that correct?

dblaikie added inline comments.Feb 28 2022, 9:10 AM
llvm/lib/Demangle/DLangDemangle.cpp
312

Out of curiosity, do you have the full/actual Coverity error? Is it detecting based on the "Save" prefix that the variable is intended to save/restore?

I'm guessing on this codepath we're probably on an error path anyway, and it doesn't matter if it's restored?

But @ljmf00 could you confirm that that's the case and/or that this path (a LastBackref value, and a bail-out through this early return) is tested/behaving as desired?

This is the coverity dump (if you have access its at https://scan.coverity.com/projects/llvm):

292const char *Demangler::parseTypeBackref(const char *Mangled) {
293  // A type back reference always points to a letter.
294  //    TypeBackRef:
295  //        Q NumberBackRef
296  //        ^
297  const char *Backref;
298
299  // If we appear to be moving backwards through the mangle string, then
300  // bail as this may be a recursive back reference.
   1. Condition Mangled - this->Str >= this->LastBackref, taking false branch.
301  if (Mangled - Str >= LastBackref)
302    return nullptr;
303
   2. save: Saving non-local this->LastBackref in local SaveRefPos.
304  int SaveRefPos = LastBackref;
   3. modify: Modifying non-local this->LastBackref.
305  LastBackref = Mangled - Str;
306
307  // Get position of the back reference.
308  Mangled = decodeBackref(Mangled, Backref);
309
310  // Can't decode back reference.
   4. Condition Backref == NULL, taking true branch.
311  if (Backref == nullptr)
   CID 1469001 (#1 of 1): Failure to restore non-local value (MISSING_RESTORE)5. end_of_scope: Value of non-local this->LastBackref that was saved in SaveRefPos is not restored as it was along other paths.
312    return nullptr;
313
314  // TODO: Add support for function type back references.
315  Backref = parseType(Backref);
316
   A1. restore_example: The original value of non-local this->LastBackref was restored here.
317  LastBackref = SaveRefPos;
318
319  if (Backref == nullptr)
320    return nullptr;
321
322  return Mangled;
323}
324
ljmf00 added inline comments.Feb 28 2022, 10:32 AM
llvm/lib/Demangle/DLangDemangle.cpp
312

I don't see the need to restore such variable, as this will abort the demangler anyway. parse functions doesn't usually restore the demangler context, only decode functions, if needed. Can you clarify why this triggered your attention to this? I also requested access to coverity, is that possible?

RKSimon added inline comments.Feb 28 2022, 11:20 AM
llvm/lib/Demangle/DLangDemangle.cpp
312

I'm working my way back in time reviewing warnings - this was first detected at 2022/1/14