
[MS Demangler] Demangle string literals

Authored by zturner on Aug 15 2018, 1:45 PM.



String literals are pretty interesting. When undname encounters a string literal, it just prints `string'. For example:

$ undname ??_C@_0CF@LABBIIMO@012345678901234567890123456789AB@
Microsoft (R) C++ Name Undecorator
Copyright (C) Microsoft Corporation. All rights reserved.

Undecoration of :- "??_C@_0CF@LABBIIMO@012345678901234567890123456789AB@"
is :- "`string'"

I wanted to do better and print the actual string. In the example above, I want to print const char * {"012345678901234567890123456789AB"...} (where the ... indicates that the string was truncated).

There are a couple of gotchas here.

  1. The mangling is not unique. It is possible to produce, for example, a const char *, a const char16_t *, and a const char32_t * that all mangle the same.
  2. The mangling is lossy. It encodes at most 32 bytes (for char, char16_t, and char32_t) or 32 characters (for wchar_t), so anything after that is truncated.
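To make the second point concrete, here is a minimal sketch of how the header of a mangled string literal could be read to detect truncation. It is not the code from this patch. It assumes the usual layout `??_C@_`, then `0` (narrow) or `1` (wchar_t), then the total byte length as a Microsoft-encoded number (a single digit `0`-`9` meaning 1-10, or hex nibbles `A`-`P` terminated by `@`), and it assumes a 2-byte wchar_t. `StringLiteralHeader` and `parseStringLiteralHeader` are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper type for illustration; not part of the demangler.
struct StringLiteralHeader {
  bool IsWide = false;     // true when the literal is a wchar_t string
  uint64_t ByteLength = 0; // total size in bytes, terminator included

  // At most 32 bytes (narrow) or 32 two-byte characters (wide) are encoded.
  bool isTruncated() const { return ByteLength > (IsWide ? 64u : 32u); }
};

// Reads the prefix, the wide flag, and the encoded length.
// Returns false if the input does not look like a string literal symbol.
inline bool parseStringLiteralHeader(const std::string &Mangled,
                                     StringLiteralHeader &Out) {
  const std::string Prefix = "??_C@_";
  if (Mangled.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  size_t Pos = Prefix.size();

  // '0' marks a narrow literal, '1' marks a wchar_t literal.
  if (Pos >= Mangled.size() || (Mangled[Pos] != '0' && Mangled[Pos] != '1'))
    return false;
  Out.IsWide = Mangled[Pos] == '1';
  ++Pos;

  if (Pos >= Mangled.size())
    return false;

  // A single decimal digit encodes the values 1 through 10.
  char C = Mangled[Pos];
  if (C >= '0' && C <= '9') {
    Out.ByteLength = static_cast<uint64_t>(C - '0') + 1;
    return true;
  }

  // Otherwise: hex nibbles spelled 'A'..'P', terminated by '@'.
  uint64_t Value = 0;
  size_t Digits = 0;
  for (; Pos < Mangled.size(); ++Pos, ++Digits) {
    char D = Mangled[Pos];
    if (D == '@') {
      if (Digits == 0)
        return false;
      Out.ByteLength = Value;
      return true;
    }
    if (D < 'A' || D > 'P')
      return false;
    Value = (Value << 4) | static_cast<uint64_t>(D - 'A');
  }
  return false; // ran out of input before the '@' terminator
}
```

For the symbol shown earlier, the length field `CF@` decodes to 0x25 = 37 bytes, which is more than the 32 bytes actually present in the mangled name, so the printed string has to be marked as truncated.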

The biggest challenge is inferring the character byte size, i.e. whether the string is a char, char16_t, or char32_t string. (The mangling scheme uniquely identifies wchar_t, so that case is not a problem.) We employ some heuristics based on counting null terminators or embedded nulls and make a best-effort guess. It works well for all the cases I tested, but it will probably break down for Unicode strings. That said, even if it fails to deduce the character type correctly, it just falls back to printing the string as const char * with escaped characters. So the output won't be incorrect, just sub-optimal.
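As a rough illustration of the kind of heuristic described above (a sketch only, not the code in this diff), the guess could look something like the following. `guessCharByteSize` is a hypothetical helper here, `Bytes` is assumed to hold the already-decoded bytes from the mangled name, `TotalBytes` is assumed to be the full length recorded in the mangling, and the one-third and two-thirds thresholds are assumptions chosen for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: guess whether the decoded bytes are 1-, 2-, or 4-byte
// characters. Falls back to 1 (plain char) whenever the evidence is weak.
inline unsigned guessCharByteSize(const std::vector<uint8_t> &Bytes,
                                  uint64_t TotalBytes) {
  // An odd total size cannot be a whole number of 2- or 4-byte characters.
  if (TotalBytes % 2 == 1)
    return 1;

  const size_t N = Bytes.size();

  if (N == TotalBytes) {
    // Not truncated, so the terminator is present: count trailing zero bytes.
    size_t Trailing = 0;
    while (Trailing < N && Bytes[N - 1 - Trailing] == 0)
      ++Trailing;
    if (Trailing >= 4 && TotalBytes % 4 == 0)
      return 4;
    if (Trailing >= 2)
      return 2;
    return 1;
  }

  // Truncated, so the terminator is gone: count embedded zero bytes instead.
  // Mostly-ASCII text in wider characters has many zero padding bytes.
  const size_t Nulls =
      static_cast<size_t>(std::count(Bytes.begin(), Bytes.end(), 0));
  if (TotalBytes % 4 == 0 && Nulls * 3 >= N * 2) // at least 2/3 zeros
    return 4;
  if (Nulls * 3 >= N) // at least 1/3 zeros
    return 2;
  return 1;
}
```

A guess like this is easy to fool: a char string with several embedded nulls, or a char16_t string made mostly of non-ASCII code units, will be misclassified, which is why falling back to const char * with escapes matters.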

You can see some sample output by examining the test file I checked in, which was ported from clang/test/CodeGenCXX/mangle-ms-string-literals.cpp.

Event Timeline

zturner created this revision. Aug 15 2018, 1:45 PM
rnk accepted this revision. Aug 15 2018, 1:54 PM


This revision is now accepted and ready to land. Aug 15 2018, 1:54 PM
This revision was automatically updated to reflect the committed changes.