This is an archive of the discontinued LLVM Phabricator instance.

Move llvm-objdump demangle function into demangler library
ClosedPublic

Authored by jhenderson on Jan 15 2019, 7:41 AM.

Details

Summary

I'm currently working on https://bugs.llvm.org/show_bug.cgi?id=40054. In order to avoid duplication of the demangling code in llvm-objdump, this change moves that code into the demangler library ready for reuse in llvm-readobj. The new function in the library provides memory management via std::string, and also calls both the Itanium and Microsoft demanglers, as per the old behaviour. One minor change is that the microsoftDemangle function will always be called if the input string does not start with "_Z", since the pre-existing "?" check is unnecessary (the microsoftDemangle function does this same check).

One aspect I'm uncertain on is the usage of std::string in Demangle.h. If this is an issue, I would welcome suggestions for an alternative location for this function.

Diff Detail

Repository
rL LLVM

Event Timeline

jhenderson created this revision.Jan 15 2019, 7:41 AM

Hi, thanks for working on this! This seems like a useful function.

include/llvm/Demangle/Demangle.h
42

MangledName

lib/Demangle/Demangle.cpp
19–20

nit: Maybe just explicitly write this as std::string llvm::demangle(...) { instead of the namespace?

24

You should probably also send strings that start with "___Z" to the itanium demangler.

27

nit: may as well just pass in nullptr instead of Demangled.

tools/llvm-objdump/llvm-objdump.cpp
532

StringRef::data doesn't guarantee a null-terminator. This may not be a problem in practice here, but to be safe you should call demangle(Name.str().c_str()). Alternatively, making demangle take a std::string probably wouldn't be the end of the world, since it isn't the "high performance" demangling API anyways since it copies the result buffer (which is likely much larger than then the input).

2086

(ditto)

grimar added inline comments.Jan 16 2019, 1:11 AM
lib/Demangle/Demangle.cpp
21

This line does not seem to be clang-formatted, but I also think it is not useful.
In case MangledName is null, you'll have a crash anyways, the crash is an obvious assert by itself.
From what I saw in LLVM's code, assert is more commonly used to check the more complex logic flows, so I would remove it.

24

This patch just moves the existent code, I would not add any additional functionality in it.

27

The same.

tools/llvm-objdump/llvm-objdump.cpp
531

Not sure what is the point to have this assert either. I would remove it. If you want to add an additional check like this, it should be something that would report an error in release build then I think.

jhenderson marked 7 inline comments as done.

Address review comments:

  • Switch to .str().c_str() to ensure null-termination instead of assert
  • clang-format
  • Fix doc comment
  • Pass in nullptr into demangle functions explicitly
  • change namespacing style
jhenderson added inline comments.
lib/Demangle/Demangle.cpp
21

This protects us against undefined behaviour in strncmp below. It may not be a crash if nullptr is passed in.

24

I agree, though I don't mind it being in a follow-up change, if desired. The main advantage of this change as is, is that it is a pure refactor (there might be some minor differences in performance, but the end behaviour should be identical to what llvm-objdump previously did).

27

In this case, passing in nullptr is identical to passing in Demangled. It's safe to change.

tools/llvm-objdump/llvm-objdump.cpp
532

In this particular case, that was the purpose of the earlier assert. However, in the other two places, it's not straightforward to construct an assert that checks: although we know that the string tables in practice are null-terminated currently (and checked as such), this is not easily proven in this area.

I think being safe is important, so I'll change to the .str().c_str() suggestion, and if there's a clear performance issue, it can be revisited later.

grimar added inline comments.Jan 16 2019, 3:19 AM
lib/Demangle/Demangle.cpp
21

But this does not protect us from anything in release builds.
Should it be

if (!MangledName)
  llvm_unreachable(...);

then?

grimar added inline comments.Jan 16 2019, 3:37 AM
lib/Demangle/Demangle.cpp
21

Seems you can just change the argument to const std::string&?

But this does not protect us from anything in release builds.

Isn't that the whole point of asserts? "Catch the things that are not supposed to happen in debug builds so that you don't have to pay the cost of checking them in release builds."

jhenderson marked an inline comment as done.Jan 16 2019, 3:55 AM

But this does not protect us from anything in release builds.

Isn't that the whole point of asserts? "Catch the things that are not supposed to happen in debug builds so that you don't have to pay the cost of checking them in release builds."

Exactly. The code isn't supposed to get into this state in the first place (it would be a programmer error if it did), so we can assume it isn't in our release builds.

lib/Demangle/Demangle.cpp
21

I'd rather not do that, if I can avoid it. If the user already knows their string is a null-terminated string (e.g. because it's a string literal), they can just pass in the relevant pointer, without being forced to incur the cost of copying to a std::string. In other situations, I'd use a StringRef, to avoid the copy, but that would introduce a dependency on the Support library, which we can't do here.

OK, I have no more comments.

erik.pilkington accepted this revision.Jan 16 2019, 10:23 AM

LGTM, thanks!

lib/Demangle/Demangle.cpp
24

Sure, doing this in a follow-up is fine.

This revision is now accepted and ready to land.Jan 16 2019, 10:23 AM
lib/Demangle/Demangle.cpp
21

For the record, I would also prefer doing const string & parameter. The cost of copying the demangled string is already going to be much more than the input string, so I don't think the performance argument is that great. Your call though.

Use const std::string &. This allowed removal of the assert, the previously needed cassert and cstring headers (I'm using std::string::compare instead of strcmp), and the .str().c_str() calls. Also rebased.

grimar accepted this revision.Jan 17 2019, 3:00 AM

LGTM.

The build of llvm-undname failed with my latest version, due to an ambiguous function name, so I've just renamed the function in that tool from demangle to msDemangle.

grimar accepted this revision.Jan 17 2019, 6:45 AM

This is probably fine.

This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Jan 17 2019, 12:23 PM
thakis added inline comments.
llvm/trunk/lib/Demangle/Demangle.cpp
18 ↗(On Diff #182285)

FWIW Itanium symbols can have up to 4 leading underscores. For example, ___Z10blocksNRVOv_block_invoke is "___Z10blocksNRVOv_block_invoke" according to lib/Demangle.

llvm/trunk/lib/Demangle/Demangle.cpp
18 ↗(On Diff #182285)

This is handled in a follow-up: https://reviews.llvm.org/D56855

thakis added inline comments.Jan 17 2019, 12:26 PM
llvm/trunk/lib/Demangle/Demangle.cpp
18 ↗(On Diff #182285)

And your _Z3fooi test is __Z3fooi on darwin, which you probably want to support as well.

llvm/trunk/lib/Demangle/Demangle.cpp
18 ↗(On Diff #182285)

Yeah, we probably want to just do the right thing there. That requires changes to ManglingParser::parse() first though.

llvm/trunk/lib/Demangle/Demangle.cpp
18 ↗(On Diff #182285)

Fixed this in r351481.