This is an archive of the discontinued LLVM Phabricator instance.

[llvm-cxxfilt] Support Microsoft demangling format
AcceptedPublic

Authored by thieta on Nov 23 2022, 11:08 AM.

Details

Summary

This refactors the shared demangle() api to handle formats
and then expose that directly in llvm-cxxfilt.

New code should probably use the demangle() function that specifies
what format should be used.

Diff Detail

Event Timeline

thieta created this revision.Nov 23 2022, 11:08 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: hiraditya. · View Herald Transcript
thieta requested review of this revision.Nov 23 2022, 11:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 23 2022, 11:08 AM

New code should probably use the demangle() function that specifies what format should be used.

Not sure I agree with this direction - I'd think that callers that know which format they want to use can/should use APIs that already implement only that format, rather than passing an enum and then switching over the enum to get back to the statically known kind?

So I'd have figured llvm-cxxfilt could have the switch-over-mangling-kind, but the lower level APIs in libDemangle wouldn't have an enum-based API, instead having dedicated functions for different demanglings. (though, arguably, the presence of demangleNonWindows goes against that design/direction)

Depends if most callers are likely to know their mangling or not? Got a sense of that, or what your plans are for this new API, beyond llvm-cxxfilt?

llvm/include/llvm/Demangle/Demangle.h
36

This seems weird to have an entry in this list that is a grouping of other entries in the list?

New code should probably use the demangle() function that specifies what format should be used.

Not sure I agree with this direction - I'd think that callers that know which format they want to use can/should use APIs that already implement only that format, rather than passing an enum and then switching over the enum to get back to the statically known kind?
So I'd have figured llvm-cxxfilt could have the switch-over-mangling-kind, but the lower level APIs in libDemangle wouldn't have an enum-based API, instead having dedicated functions for different demanglings. (though, arguably, the presence of demangleNonWindows goes against that design/direction)

I guess I can see your point. I think the problem I was running into when trying to get cxxfilt to demangle Microsoft symbols was that it was most of the API's where behaving differently and I was not able to use the common function since I was not able to limit the demanglers used if you wanted to specify the format. My instinct was to make the shared API easier and more flexible, but maybe this code should just live in cxxfilt tool instead.

Depends if most callers are likely to know their mangling or not? Got a sense of that, or what your plans are for this new API, beyond llvm-cxxfilt?

I looked around at other uses for demangle() and the specific demangle functions:

  • llvm::demangle() is used in CodeGenAction, MarkupFilter, ASAN and llvm-cxxfilt, where only cxxfilt seems to care about the demangling format.
  • itaniumDemangle() is used all over the source code
  • dlang and rust demangle functions are only called by cxxfilt and lldb.
  • microsoftDemangle() is used in cxxfilt and undname (which is cxxfilt for windows).
  • nonMicrosoftDemangle() is used in cxxfilt, llvm-nm and symbolize

With this information - I think I am in your camp here, better to move the format enum switching to cxxfilt. We can always revisit and rework the common API if more people have a use for it later.

thieta updated this revision to Diff 477693.Nov 24 2022, 12:17 AM

Move enum switching logic to llvm-cxxfilt after review comments.

thieta added inline comments.Nov 24 2022, 12:25 AM
llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp
80

I am not super thrilled about copying the isXXXEncoding functions from Demangle.cpp - but they where static in there and not sure we want to expose those API's since they don't seem 100% complete. But let me know what you think.

jhenderson added inline comments.Nov 25 2022, 12:35 AM
llvm/test/tools/llvm-cxxfilt/microsoft-format.test
2–4

Nit: these should all end in full stops like code comments.

llvm/tools/llvm-cxxfilt/Opts.td
8

Seems like there's a lot of unrelated noise in this file?

22–28

Docs should be updated with the new --format values.

llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp
53

enum class maybe to avoid polluting the global namespace?

54–59
80

I'm not sure why not being 100% complete translates into "don't expose them, duplicate them". Surely that means when the remaining parts are added, one or other of these locations will likely get missed and we'll end up with a mismatch?

92
93
96–97

It would probably make more sense for MangledName to be const char * as the input parameter.

126–131

I'm concerned that if we add other encodings, this list will easily rot, so that auto doesn't detect it. What was wrong with the nonMicrosoftDemangle from before?

132

non_microsoft is out of date?

(Also missing trailing full stop).

139–141

rather than duplicating the is* functions, what if the demangle functions returned some failure that indicated "this isn't formatted for this mangling scheme" (sort of "pass, I can't demangle this, but I'm not saying it's incorrect, it's just not suitable for this scheme/algorithm/function")?

thieta updated this revision to Diff 478568.Nov 29 2022, 6:40 AM
thieta marked 9 inline comments as done.

Several issues addressed in review.

thieta added inline comments.Nov 29 2022, 6:51 AM
llvm/tools/llvm-cxxfilt/Opts.td
8

Yes - I ran clang-format on it. But I should have done it as a NFC instead.

Reverted the clang-format stuff.

llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp
80

Exposed them for now.

126–131

Fair point. I moved this function from Demangle.cpp when it we didn't want to change the API there so I guess it just followed along. I changed it to use nonMicrosoftDemangle now.

I still think it would be nice if the API of Demangle.cpp was a bit more coherent, but that will be for another day.

jhenderson accepted this revision.Nov 29 2022, 6:51 AM

No more comments from me, but please wait for @dblaikie to be happy.

llvm/docs/CommandGuide/llvm-cxxfilt.rst
45

Nit

This revision is now accepted and ready to land.Nov 29 2022, 6:51 AM
jhenderson requested changes to this revision.Nov 29 2022, 6:52 AM

Actually, don't you need tests to show that the new --format options work as expected?

This revision now requires changes to proceed.Nov 29 2022, 6:52 AM

Actually, don't you need tests to show that the new --format options work as expected?

You mean testing rust, dlang and itanium explicitly? I can definitely add that.

Actually, don't you need tests to show that the new --format options work as expected?

You mean testing rust, dlang and itanium explicitly? I can definitely add that.

Yep, that's what I meant (I think we have other testing that already tests that we can demangle these name types, so it should be fairly simple to ensure the option values are acceptable too).

thieta updated this revision to Diff 478585.Nov 29 2022, 7:14 AM

Added format testing

thieta updated this revision to Diff 478586.Nov 29 2022, 7:15 AM

Comment spelling and nit fix

jhenderson accepted this revision.EditedNov 29 2022, 7:16 AM

Looks good. Please wait for @dblaikie.

This revision is now accepted and ready to land.Nov 29 2022, 7:16 AM
thieta marked an inline comment as done.Nov 29 2022, 7:16 AM
thieta updated this revision to Diff 478607.Nov 29 2022, 8:04 AM

Added release note.

rather than duplicating the is* functions, what if the demangle functions returned some failure that indicated "this isn't formatted for this mangling scheme" (sort of "pass, I can't demangle this, but I'm not saying it's incorrect, it's just not suitable for this scheme/algorithm/function")?

Ping on this ^?

Ping on this ^?

Ah sorry - I thought I replied to this, but I just replied in my head.

Yes, I like this idea, but it does feel like we are getting into the territory where we have to refactor the API to demangle since the demangle functions have different return values and handling.

I am happy to do this, but I am not sure it's part of this diff. IMHO it's probably better to land this and then refactor the demangle API in its own diff.

Ping on this ^?

Ah sorry - I thought I replied to this, but I just replied in my head.

Yes, I like this idea, but it does feel like we are getting into the territory where we have to refactor the API to demangle since the demangle functions have different return values and handling.

I am happy to do this, but I am not sure it's part of this diff. IMHO it's probably better to land this and then refactor the demangle API in its own diff.

I'd rather avoid the technical debt of duplicating the is* functions only to figure out how to remove them later - the refactoring could go in before this patch to make the APIs amenable to this sort of usage.

llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp
120

Adding recursion in here is a bit quirky (& calling this generic function only to ask for one particular mangling, when there's direct functions for specific manglings) - could this be another call to microsoftDemangle directly here?

Ping on this ^?

Ah sorry - I thought I replied to this, but I just replied in my head.

Yes, I like this idea, but it does feel like we are getting into the territory where we have to refactor the API to demangle since the demangle functions have different return values and handling.

I am happy to do this, but I am not sure it's part of this diff. IMHO it's probably better to land this and then refactor the demangle API in its own diff.

I'd rather avoid the technical debt of duplicating the is* functions only to figure out how to remove them later - the refactoring could go in before this patch to make the APIs amenable to this sort of usage.

Alright, I am fine with that. What approach would you prefer:

doing something in Demangle.cpp that wraps the current demangle functions.

Changing the different demangle functions to all use a similar api?

I am thinking we could make the functions return an optional and all take a const char* as input. Then we can return a false optional if it's not the right format or if it fails. Or we make it possible to return multiple errors so we know if it failed because of the format or something else (not sure this is that applicable since there can't be that many failure states). Any other thoughts?

I think the basic idea would be to maybe leave the existing xxxDemangle functions in place, as wrappers around more verbose functions with more precise error handling. (specifically for itaniumDemangle - it'd probably be good for the underlying function to not support fallback to type demangling, it'll be easier to fail out from there & the itaniumDemangle could rall in again and ask specifically for type demangling (there's at least one caller that's doing this already anyway))

So I guess, for instance, llvm::itaniumDemangle calls, say, llvm::itaniumDemangleWithError that returns Expected<char*> - then llvm::itaniumDemangle maps failures to nullptr (well, does the fallback to type name demangling if possible, then fails) as before, but llvm-cxxfilt can inspect the error and check if it's some particular custom Error introduced to communicate unrecognized name. Then fallback to the next one, and so on and so forth.

Does that sound feasible? @jhenderson does that sound reasonable? (got other ideas about how this should all be factored?)

To a comment earlier about duplicating/exposing the is**Encoding:

not sure we want to expose those API's since they don't seem 100% complete.

What did you mean by that?

(I personally like the idea of not having separate is* functions and then do* functions when the do* has to do the same work anyway & could report is/isn't in the process)

Does that sound feasible? @jhenderson does that sound reasonable? (got other ideas about how this should all be factored?)

Yeah, that all sounds sensible to me. I think I recall it bugging me that there's no easy way of telling whether demangling failed because it was e.g. Itanium (based on the leading characters), but not a valid such encoding, so having the ability to report an Error to the caller to help with this would certainly be good. The only issue with this approach is that I recall at one point that the demangling library is I recall something to do with the layering of the demangler library is such that it is depended upon by the Support library, and not the other way around. Alternatively, it might have been an issue with the fact that (in theory) Demangle.h is a copy of somewhere else, IIRC (there's a comment to this effect near the top of the file). I don't know if that's still an issue though.

Worst case, we could return some sort of enum error code or provide a callback function of some kind instead in place of llvm::Error.

Does that sound feasible? @jhenderson does that sound reasonable? (got other ideas about how this should all be factored?)

Yeah, that all sounds sensible to me. I think I recall it bugging me that there's no easy way of telling whether demangling failed because it was e.g. Itanium (based on the leading characters), but not a valid such encoding, so having the ability to report an Error to the caller to help with this would certainly be good. The only issue with this approach is that I recall at one point that the demangling library is I recall something to do with the layering of the demangler library is such that it is depended upon by the Support library, and not the other way around. Alternatively, it might have been an issue with the fact that (in theory) Demangle.h is a copy of somewhere else, IIRC (there's a comment to this effect near the top of the file). I don't know if that's still an issue though.

Worst case, we could return some sort of enum error code or provide a callback function of some kind instead in place of llvm::Error.

Ooooh, right. Yeah, that sounds like it's the case to me. So... so, std::error_code? (I thought maybe we had an LLVM error code-ish thing too, but I don't recall, maybe it's been fully replaced by llvm::Error, if it did exist) Pick a blessed value to communicate "this input isn't handled here, maybe someone else knows what it is" as distinct from "this input is mine, but it's broken".