In AIX OS, function entry label are begin with '.', it can not be decoded currently.
we support to decode the name in this patch for all OS.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp | ||
---|---|---|
78 | Wouldn't it make more sense to introduce a StripDot variable similar to StripUnderscore? | |
90 | I'm not sure I understand the interaction of all the options and demangling methods, but if nonMicrosoftDemangle() returns false, maybe the potential leading dot should be restored for the rest of this function. | |
186 | Set StripDot here if appropriate |
llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp | ||
---|---|---|
78 | we do not do StripDot, we just demangle the symbol begin with "dot" if there is. (and this is xcoff specific option) I checked AIX clangtana cxxfilt, it do not has an option for "demangling the symbol begin with 'dot') . So I think we also do not need an option here and not need to introduce a new variable "StripDot". | |
90 | if (nonMicrosoftDemangle(DecoratedStr, Result)) return false. Result = Undecorated ? DotPrefix + Prefix + Undecorated : Mangled; free(Undecorated); not sure whether I answer your comment or not? | |
186 |
|
Please remember to upload your diff with full context.
llvm/test/tools/llvm-cxxfilt/delimiters.test | ||
---|---|---|
35 ↗ | (On Diff #498080) | I don't think this test should be changing, except possibly to mark it unsupported on AIX (since we don't want the behaviour changing otherwise). You could have an AIX version of the test that shows the behaviour difference, I guess. |
llvm/test/tools/llvm-cxxfilt/with-dot-prefix-aix.test | ||
5 ↗ | (On Diff #498080) | |
llvm/test/tools/llvm-cxxfilt/with-dot-prefix-non-aix.test | ||
1 ↗ | (On Diff #498080) | |
2 ↗ | (On Diff #498080) | UNSUPPORTED: system-aix is perhaps more idiomatic. |
4 ↗ | (On Diff #498080) | |
llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp | ||
78 | I'm ambivalent about whether there's a --strip-dot option (it could be on by default for AIX, and off otherwise). The advantage with having it is that it would be possible to demangle a binary designed for AIX OS on a different platform. Also "lable" -> "label". |
llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp | ||
---|---|---|
78 | I checked the gnu cxxfilt (in linux OS), it can demangle ._Z3foov to .foo() as you mention in your patch https://reviews.llvm.org/D63722?id=206236#inline-567977 I am not sure whether we can enable to decode the name prefixed with '.' by default without any new option for all OS platform ? @james Henderson |
llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp | ||
---|---|---|
78 | sorry , above comment should be " I checked the gnu c++filt (in linux OS), it can demangle ._Z3foov to .foo() as you mention in your patch ......." @jhenderson |
llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp | ||
---|---|---|
78 |
I would imagine it would work similarly to how --strip-underscore works, which IIRC is intended to help on Mac. On non-AIX OS, the --strip-dot option would be off by default, so that users don't have the behaviour unless they opt into it. On AIX OS it would be on by default, so users don't have to remember to specify it. Testing would be slightly complicated: you'd need to have testing that shows that the option is off/on by default on the respective OS types, with the rest of the tests having to handle it being in either state. You'd also need to consider how --strip-underscore interacts with it. |
llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp | ||
---|---|---|
78 | I disagree. As Digger pointed out, GNU c++filt works on Linux OS to decode .Z3foov to .foo(). --strip-underscore would replace __Z3foov (with two underscores) with foo() and not _foo(). The desired behaviour here is to retain the leading dot in the input. It seems to me that GNU c++filt simply has a default tokenization policy that does not consume leading . characters. I see no reason why llvm-cxxfilt should remain incompatible with GNU c++filt on the treatment of leading .s. |
So if default c++filt behaviour is to strip and then re-add a leading dot, on all OSes, I'm fine with llvm-cxxfilt changing to support that behaviour, in which case this patch needs to be made platform agnostic (i.e. tests should work on all platforms).
One thing to look at though is whether a dot in the middle of a name is considered a delimiter or not in c++filt. We should match their behaviour in this case.
Understood. I agree the patch needs such an update. I would prefer to think of it as processing a leading dot as "normal text" and not part of a symbol name, but the strip+re-add conception of the behaviour leads to the same result.
One thing to look at though is whether a dot in the middle of a name is considered a delimiter or not in c++filt. We should match their behaviour in this case.
I had observed that _Z3f.0v becomes f.0() (including on AIX), hence my characterization that GNU c++filt does not consume leading dots as part of the input symbol name.
Given that the demangling code (not the llvm-cxxfilt code though) is used in various places in LLVM, it's important this dot behaviour goes in the right place. I assume therefore that the dot stripping behaviour belongs in LLVMDemangle, on the basis that I expect tools like llvm-objdump should be able to handle the dot too.
I agree this should be explicitly addressed. For reference, the GNU implementation of __cxa_demangle does not accept the leading dot (so the dot handling is at a level closer to the tools).
I think it is difficult to implement the functionality in LLVMDemangle, for example, if I implement the functionality in the function llvm::nonMicrosoftDemangle(const char *MangledName, std::string &Result)
it will cause the symbol name "_._Z3f.0v" to be demangled as ".f.0()" in some place.
for example ,in the file Demangle.cpp
std::string llvm::demangle(const std::string &MangledName) { std::string Result; const char *S = MangledName.c_str(); if (nonMicrosoftDemangle(S, Result)) return Result; if (S[0] == '_' && nonMicrosoftDemangle(S + 1, Result)) return Result; .... }
but gnu c++filt demangle above as
[zhijian@krypton src]$ c++filt _._Z3f.0v
_._Z3f.0v
Would adding a if (S[0] == '.' && nonMicrosoftDemangle(S + 1, Result)) case there be appropriate?
if I change something like as your suggestion
std::string llvm::demangle(const std::string &MangledName) { std::string Result; const char *S = MangledName.c_str(); if (nonMicrosoftDemangle(S, Result)) return Result; if (S[0] == '_' && S[1] != '.' && nonMicrosoftDemangle(S + 1, Result)) return Result; if (char *Demangled = microsoftDemangle(S, nullptr, nullptr)) { Result = Demangled; std::free(Demangled); return Result; } return MangledName; } bool llvm::nonMicrosoftDemangle(const char *MangledName, std::string &Result) { char *Demangled = nullptr; // Not consider the prefix dot as part of the demangled symbol name. if (MangledName[0] == '.') { ++MangledName; Result = "."; } if (isItaniumEncoding(MangledName)) Demangled = itaniumDemangle(MangledName, nullptr, nullptr, nullptr); else if (isRustEncoding(MangledName)) Demangled = rustDemangle(MangledName); else if (isDLangEncoding(MangledName)) Demangled = dlangDemangle(MangledName); if (!Demangled) return false; Result += Demangled; std::free(Demangled); return true;
it will be have problem in the function demangleMachO llvm-nm.cpp
static std::optional<std::string> demangle(StringRef Name) { std::string Demangled; if (nonMicrosoftDemangle(Name.str().c_str(), Demangled)) return Demangled; return std::nullopt; } static std::optional<std::string> demangleMachO(StringRef Name) { if (!Name.empty() && Name[0] == '_') Name = Name.drop_front(); return demangle(Name); }
the function demangleMachO still will decode
symbol name "_._Z3f.0v" to be demangled as ".f.0()"
so my suggestion is putting the removing and adding prefix 'dot' back in demangle() function of in each llvm tools. (as my current patch implement).
My opinion hasn't changed that it's a mistake to put the dot demangling in llvm-cxxfilt specifically. That will mean you'll need to make the same change in llvm-nm, llvm-readelf, llvm-objdump and probably other tools too. Code duplication is bad. The next tool to be added might not realise that they can't just use the main demangler library as-is to do their name demangling.
If there is tool code that conflicts with that behaviour (such as the llvm-nm mach-o code), perhaps it's that tool code that should change instead, or perhaps we need to enhance the demangler library a little further, for example by adding an optional "strip underscore" behaviour to it. Adding @MaskRay as a reviewer in case he's got a different opinion.
llvm/test/tools/llvm-cxxfilt/delimiters.test | ||
---|---|---|
36 ↗ | (On Diff #518451) | That being said, this isn't the right place for these new test cases. The only thing that should change in this test is related to the ._Z3Foo string - see below. This test is about testing which characters are treated as delimiters, i.e. characters that separate sequences of names that are to be demangled individually. It is not about testing how things are demangled in the presence of ".". You should add that testing in a new test file. With the change in behaviour of leading dot prefixes, this test itself doesn't actually test the intended behaviour. The intention was to show that a dot was treated as part of the mangled name, rather than a separator between the previous character sequence (in this case simply the preceding space), and the subsequent mangled name (_Z3Foo). If the dot was treated as a delimiter, the _Z3Foo would have been demangled, resulting in the final part of the output being _Z3Foo$ .Foo. As you can see, this is indistinguishable from the case where ._Z3Foo as a whole is demangled (to .Foo). Instead, this bit of the bit of the check should be modified such that the new code can't demangle it still, but if instead the dot was treated as delimiter, it could be. For example, you could have two valid name manglings separated by a single ".", e.g. _Z3Foo._Z3Bar. If the dot were a delimiter, that would result in an output of Foo.Bar, but since it isn't, the original input should be emitted to the output (i.e. _Z3Foo._Z3Bar). I hope that makes sense! |
llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp | ||
77 | Nit: "Do not consider" Also applies to the patch title. |
I think it is difficult to llvm::demangle for the llvm-cxxfilt , since there is option `--strip-underscore` , please see the comment in https://reviews.llvm.org/D110664#inline-1063385 , the llvm-nm, llvm-readelf, llvm-objdump can share the same llvm::demangle()
std::string llvm::demangle(std::string_view MangledName) {
std::string Result; if (nonMicrosoftDemangle(MangledName, Result)) return Result; if (starts_with(MangledName, '_') && nonMicrosoftDemangle(MangledName.substr(1), Result)) return Result; if (char *Demangled = microsoftDemangle(MangledName, nullptr, nullptr)) { Result = Demangled; std::free(Demangled); } else { Result = MangledName; } return Result;
}
llvm/test/tools/llvm-cxxfilt/delimiters.test | ||
---|---|---|
36 ↗ | (On Diff #518451) |
._Z3Foo is a symbol name in AIX , but the ._Z3Foo is not a mangle name ,only _Z3Foo is a mangle name, we put the dot in front of the _Z3Foo to generate the entry label symbol of function in AIX.
according to the https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling , the dot maybe part of symbol (but in the middle of the mangled name, not the front of the mangle name). So we have to do some special for the front dot of symbol name for AIX. |
- there is option --strip-underscore in the llvm-cxxfilt, it is difficult to use std::string llvm::demangle(std::string_view MangledName) for the llvm-cxxfilt , please see the https://reviews.llvm.org/D110664#inline-1063385
- even I add a new parameter to function std::string llvm::demangle(std::string_view MangledName) as std::string llvm::demangle(std::string_view MangledName,bool StripUnderscore=true)
But the function std::string llvm::demangle() has the functionailty microsoftDemangle(MangledName, nullptr, nullptr) which do not need by the static std::string demangle(const std::string &Mangled) of llvm-cxxfile.cpp
- In the static std::string demangle(const std::string &Mangled) of llvm-cxxfile.cpp ,if demangle do not success, it still to do
if nonMicrosoftDemangle not successful, it will continue to do
if (Types) Undecorated = itaniumDemangle(DecoratedStr); if (!Undecorated && starts_with(DecoratedStr, "__imp_")) { Prefix = "import thunk for "; Undecorated = itaniumDemangle(DecoratedStr.substr(6)); }
but std::string llvm::demangle(std::string_view MangledName) do not return a bool to indicated demangle is successful or not.
- llvm-nm, llvm-readelf, llvm-objdump can use the same std::string llvm::demangle(std::string_view MangledName) , I will create as seperate NFC patch for the llvm-nm, llvm-readelf, llvm-objdump. (https://reviews.llvm.org/D159539).
Could you not just modify nonMicrosoftDemangle to handle the dot prefix? That would mean this works for all tools, including llvm-cxxfilt, rather than needing this and the separate D159539 patch. I'm assuming that a dot prefix in front of a Microsoft mangled name doesn't make sense...
llvm/test/tools/llvm-cxxfilt/delimiters.test | ||
---|---|---|
36 ↗ | (On Diff #518451) |
You're arguing about semantics here. As far as I'm concerned, the dot is part of the mangled name. The demangling code removes it, demangles the rest of the name, and then readds it, but it's still part of the mangled name (just not part of the Itanium mangling). But regardless of the exact terminology, it's clear you didn't understand my comment. We both agree that the dot is part of the name (whether it's part of the mangled name or not is irrelevant in this context). What's important to this test is that it isn't a delimiter (i.e. a character that separates a string into two separate names) and the rest of my comment and explanation still stands: your change in behaviour has rendered this part of the test as no longer testing what it wanted to test and therefore the test needs updating as I already explained (and you need proper testing of the dot demangling in a different test, which I see you've added). |
llvm/test/tools/llvm-cxxfilt/prefix-dot.test | ||
2 | I suggest we use the phrase "dot prefix" rather than "prefix dot" in the test name and comment as it's more natural. Same goes in other places in this patch. | |
4–5 | I don't think you need this many strings to show the behaviour. One case where it can be successfully demangled and one case where it cannot be demangled I think should be sufficient. |
as I mention before, If I modify in `nonMicrosoftDemangle' as
bool llvm::nonMicrosoftDemangle(const char *MangledName, std::string &Result) { char *Demangled = nullptr; // Not consider the prefix dot as part of the demangled symbol name. if (MangledName[0] == '.') { ++MangledName; Result = "."; } if (isItaniumEncoding(MangledName)) Demangled = itaniumDemangle(MangledName, nullptr, nullptr, nullptr); else if (isRustEncoding(MangledName)) Demangled = rustDemangle(MangledName); else if (isDLangEncoding(MangledName)) Demangled = dlangDemangle(MangledName); if (!Demangled) return false; Result += Demangled; std::free(Demangled); return true;
The llvm-nm or llvm-cxxfilt for machO will demangle "_._Z3f.0v" as ".f.0()"
but
[zhijian@krypton src]$ /opt/at15.0/bin/c++filt _._Z3f.0v
_._Z3f.0v
I'm assuming that a dot prefix in front of a Microsoft mangled name doesn't make sense...
in llvm-cxxfilt.cpp, the static std::string demangle(const std::string &Mangled) do not invoke microsoftDemangle
Honestly, I don't think I have an issue with this behaviour. --strip-underscore should literally just drop the leading underscore before demangling the rest of the string, and therefore the same should happen for tools where the underscore stripping behaviour is implicit. Whilst I agree there's no use-case for extra leading underscores (a Mach-O thing) and leading dots (XCOFF), the interaction of the option just seems natural to me. However, I don't really care about it, so if it's easier to do it the other way (i.e. only supporting an extra leading undescore, or a dot), I'm not going to fight about it.
Aside: the example you provided with GNU c++filt is not applicable, because you didn't specify the --strip-underscore option.
llvm/lib/Demangle/Demangle.cpp | ||
---|---|---|
27 ↗ | (On Diff #557248) | Please mark this false with a /*StripDot=*/ label |
52 ↗ | (On Diff #557248) | |
llvm/test/tools/llvm-cxxfilt/delimiters.test | ||
36 ↗ | (On Diff #518451) | This test still needs updating. |
llvm/test/tools/llvm-cxxfilt/dot-prefix.test | ||
2 ↗ | (On Diff #557248) | |
llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp | ||
75 | As we're not actually stripping the dot (it's restored at the end), I recommend naming this CanHaveLeadingDot or something to that effect. Same goes in the nonMicrosoftDemangle function. |
Thanks for the update. Doing the following looks good to me.
bool CanHaveLeadingDot = true; if (StripUnderscore) if (DecoratedStr[0].consume_front("_")) // simplified CanHaveLeadingDot = false; std::string Result; if (nonMicrosoftDemangle(DecoratedStr, Result, CanHaveLeadingDot))
DecoratedStr is object of class std::string_view, it do not have API consume_front https://en.cppreference.com/w/cpp/string/basic_string_view
One remaining comment, then looks good to me
For reference, somebody else (possibly @shchenz?) put up https://github.com/llvm/llvm-project/pull/67389. They appear to be working on xcoff too, so it may be worth liaising with them/getting them involved with other reviews too.
llvm/test/tools/llvm-cxxfilt/delimiters.test | ||
---|---|---|
35 ↗ | (On Diff #557337) | As per my previous comments, I think I'd prefer this to be _Z3Foo._Z3Bar, in keeping with the existing patterns above, which should then be identical in the output (i.e. _Z3Foo._Z3Bar). |
For reference, somebody else (possibly @shchenz?) put up https://github.com/llvm/llvm-project/pull/67389. They appear to be working on xcoff too, so it may be worth liaising with them/getting them involved with other reviews too.
Thanks @jhenderson. The implementation LGTM.
llvm/test/tools/llvm-cxxfilt/delimiters.test | ||
---|---|---|
35 ↗ | (On Diff #557337) | without the patch or with the patch llvm-cxxfilt has the same behavior as bash> llvm-cxxfilt _Z3Foo._Z3Bar Foo (._Z3Bar) so I do not use the _Z3Foo._Z3Bar as your suggestion. |
I suggest we use the phrase "dot prefix" rather than "prefix dot" in the test name and comment as it's more natural. Same goes in other places in this patch.