This refactors the shared demangle() api to handle formats
and then expose that directly in llvm-cxxfilt.
New code should probably use the demangle() function that specifies
what format should be used.
Differential D138595
[llvm-cxxfilt] Support Microsoft demangling format thieta on Nov 23 2022, 11:08 AM. Authored by
Details
This refactors the shared demangle() api to handle formats New code should probably use the demangle() function that specifies
Diff Detail
Event TimelineComment Actions
Not sure I agree with this direction - I'd think that callers that know which format they want to use can/should use APIs that already implement only that format, rather than passing an enum and then switching over the enum to get back to the statically known kind? So I'd have figured llvm-cxxfilt could have the switch-over-mangling-kind, but the lower level APIs in libDemangle wouldn't have an enum-based API, instead having dedicated functions for different demanglings. (though, arguably, the presence of demangleNonWindows goes against that design/direction) Depends if most callers are likely to know their mangling or not? Got a sense of that, or what your plans are for this new API, beyond llvm-cxxfilt?
Comment Actions I guess I can see your point. I think the problem I was running into when trying to get cxxfilt to demangle Microsoft symbols was that it was most of the API's where behaving differently and I was not able to use the common function since I was not able to limit the demanglers used if you wanted to specify the format. My instinct was to make the shared API easier and more flexible, but maybe this code should just live in cxxfilt tool instead.
I looked around at other uses for demangle() and the specific demangle functions:
With this information - I think I am in your camp here, better to move the format enum switching to cxxfilt. We can always revisit and rework the common API if more people have a use for it later.
Comment Actions rather than duplicating the is* functions, what if the demangle functions returned some failure that indicated "this isn't formatted for this mangling scheme" (sort of "pass, I can't demangle this, but I'm not saying it's incorrect, it's just not suitable for this scheme/algorithm/function")?
Comment Actions Actually, don't you need tests to show that the new --format options work as expected? Comment Actions Yep, that's what I meant (I think we have other testing that already tests that we can demangle these name types, so it should be fairly simple to ensure the option values are acceptable too). Comment Actions Ah sorry - I thought I replied to this, but I just replied in my head. Yes, I like this idea, but it does feel like we are getting into the territory where we have to refactor the API to demangle since the demangle functions have different return values and handling. I am happy to do this, but I am not sure it's part of this diff. IMHO it's probably better to land this and then refactor the demangle API in its own diff. Comment Actions I'd rather avoid the technical debt of duplicating the is* functions only to figure out how to remove them later - the refactoring could go in before this patch to make the APIs amenable to this sort of usage.
Comment Actions Alright, I am fine with that. What approach would you prefer: doing something in Demangle.cpp that wraps the current demangle functions. Changing the different demangle functions to all use a similar api? I am thinking we could make the functions return an optional and all take a const char* as input. Then we can return a false optional if it's not the right format or if it fails. Or we make it possible to return multiple errors so we know if it failed because of the format or something else (not sure this is that applicable since there can't be that many failure states). Any other thoughts? Comment Actions I think the basic idea would be to maybe leave the existing xxxDemangle functions in place, as wrappers around more verbose functions with more precise error handling. (specifically for itaniumDemangle - it'd probably be good for the underlying function to not support fallback to type demangling, it'll be easier to fail out from there & the itaniumDemangle could rall in again and ask specifically for type demangling (there's at least one caller that's doing this already anyway)) So I guess, for instance, llvm::itaniumDemangle calls, say, llvm::itaniumDemangleWithError that returns Expected<char*> - then llvm::itaniumDemangle maps failures to nullptr (well, does the fallback to type name demangling if possible, then fails) as before, but llvm-cxxfilt can inspect the error and check if it's some particular custom Error introduced to communicate unrecognized name. Then fallback to the next one, and so on and so forth. Does that sound feasible? @jhenderson does that sound reasonable? (got other ideas about how this should all be factored?) To a comment earlier about duplicating/exposing the is**Encoding:
What did you mean by that? (I personally like the idea of not having separate is* functions and then do* functions when the do* has to do the same work anyway & could report is/isn't in the process) Comment Actions Yeah, that all sounds sensible to me. I think I recall it bugging me that there's no easy way of telling whether demangling failed because it was e.g. Itanium (based on the leading characters), but not a valid such encoding, so having the ability to report an Error to the caller to help with this would certainly be good. The only issue with this approach is that I recall at one point that the demangling library is I recall something to do with the layering of the demangler library is such that it is depended upon by the Support library, and not the other way around. Alternatively, it might have been an issue with the fact that (in theory) Demangle.h is a copy of somewhere else, IIRC (there's a comment to this effect near the top of the file). I don't know if that's still an issue though. Worst case, we could return some sort of enum error code or provide a callback function of some kind instead in place of llvm::Error. Comment Actions Ooooh, right. Yeah, that sounds like it's the case to me. So... so, std::error_code? (I thought maybe we had an LLVM error code-ish thing too, but I don't recall, maybe it's been fully replaced by llvm::Error, if it did exist) Pick a blessed value to communicate "this input isn't handled here, maybe someone else knows what it is" as distinct from "this input is mine, but it's broken". |
Nit