This is an archive of the discontinued LLVM Phabricator instance.

[MS Demangler] Demangle symbols in function local scopes
ClosedPublic

Authored by zturner on Jul 29 2018, 10:45 AM.

Details

Summary

There are a couple of issues you run into when you start getting into more complex names, especially with regards to function local statics. When you've got something like:

int x() {
  static int n = 0;
  return n;
}

Then this needs to demangle to something like

int `int __cdecl x()'::`1'::n

The nested mangled symbols (e.g. int __cdecl x() in the above example) also share state with regards to back-referencing, so we need to be able to re-use the demangler in the middle of demangling a symbol while sharing back-ref state.

To make matters more complicated, there are a lot of ambiguities when demangling a symbol's qualified name, because a function local scope pattern (usually something like ?1??name?) looks suspiciously like many other possible things that can occur, such as ?1 meaning the second back-ref and disambiguating these cases is rather interesting. the ?1? in a local scope pattern is actually a special case of the more general pattern of ? + <encoded number> + ?, where "encoded number" can itself have embedded @ symbols, which is a common delimeter in mangled names. So we have to take care during the disambiguation, which is the reason for the overly complicated isLocalScopePattern function in this patch.

I've added some pretty obnoxious tests to exercise all of this, which exposed several other problems related to back-referencing, so those are fixed here as well. Finally, I've uncommented some tests that were previously marked as FIXME, since now these work.

Diff Detail

Repository
rL LLVM

Event Timeline

zturner created this revision.Jul 29 2018, 10:45 AM
majnemer added inline comments.Jul 29 2018, 2:33 PM
llvm/lib/Demangle/MicrosoftDemangle.cpp
424–441 ↗(On Diff #157890)

Do you think it makes sense to share this logic w/ Demangler::demangleNumber? Maybe add a tryDemangleNumber which returned the remaining portion of the string, the number, and whether or not an error occurred?

zturner added a subscriber: rnk.Jul 29 2018, 2:56 PM

I had thought of a tryDemangleNumber function but it didn’t really fit the
style of code . (Ie I don’t really do that anywhere else so it felt a
little out of place). I can still do that if you think it’s better, I don’t
have a strong preference

I had thought of a tryDemangleNumber function but it didn’t really fit the
style of code . (Ie I don’t really do that anywhere else so it felt a
little out of place). I can still do that if you think it’s better, I don’t
have a strong preference

SGTM, LGTM w/ a nit.

llvm/lib/Demangle/MicrosoftDemangle.cpp
432 ↗(On Diff #157890)

What prohibits the string from being empty after this point?

majnemer accepted this revision.Jul 29 2018, 6:09 PM
This revision is now accepted and ready to land.Jul 29 2018, 6:09 PM
zturner added inline comments.Jul 29 2018, 7:10 PM
llvm/lib/Demangle/MicrosoftDemangle.cpp
432 ↗(On Diff #157890)

I tested it out and it seems like it could be. At least, undname will turn ?M@?@??L@@YAHXZ@4HA into this:

int `int __cdecl L(void)'::`0'::M

But I've never seen this kind of mangled name generated in practice. Do you know how to get it to occur?

majnemer added inline comments.Jul 29 2018, 7:32 PM
llvm/lib/Demangle/MicrosoftDemangle.cpp
432 ↗(On Diff #157890)

AFAIK, it is not possible to generate.

This revision was automatically updated to reflect the committed changes.
rnk added a comment.Jul 30 2018, 10:44 AM

Something like tryDemangleNumber would be like a recursive descent parser. What you're doing looks kind of like lookahead, where you scan forward to see if a pattern matches, and continue if it does. Depending on how complicated this code gets, we may eventually want to do less or more lookahead.