This is an archive of the discontinued LLVM Phabricator instance.

Initial commit for the llvm-dsymutil tool.
ClosedPublic

Authored by friss on Nov 12 2014, 5:17 PM.

Details

Summary

This initial version is only able to parse debug maps contained
in mach-o binaries.

Apart from the general review, I'm especially interested in comments
around the build system plumbing.

Nick, the very simple DebugMap object that is present is this version
is meant to be filled with information during the binary link and
then passed to a DwarfLinker object as a separate step. Does that
look like the best flow for you, or would you prefer something more
'incremental'?

The big question around this tool is how the in-tree testing will be
performed. This revision doesn't include a test, but there will be
one in the initial commit. I plan to commit very simple small binaries
for the and then to extend that binary pool when adding new features.
Eevn if I plan to reuse the binaries as much as I can for the various
development steps, I fear that this won't scale in the long run. I'm
open to ideas. Is there some policy about storing binaries in the LLVM
repo?

The current approach has several known shortcomings even for the simple
task it performs. For example, it doesn't handle static libraries, and
it keeps all the object files open at the same time. This initial commit
just provides a base for the incremental updates that will improve that
situation.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
kledzik added inline comments.Nov 12 2014, 6:18 PM
tools/dsymutil/DebugMap.h
40–42

This could be replaced by a begin()/end() which return an iterator that dereferences to a "const DebugMapObject &", to be nicer for clients.

49

It looks like "Filename" is the same as ObjectFile getFileName().

The existing debug map tracks the source (ex: /path/foo.c) path as well as the object (ex: /path/foo.o) path. Does this DebugMap need both? or just the object path?

59–60

Are these the address in the .o file and address in the final binary? Can the names be clearer.

What is the rule for tentative definitions which don't have an address in the .o file? Maybe OrigAddress could be an llvm::Optional<>?

62

What is "Used"? When would it be false?

71

Can this be more specific (e.g. getSourcePath() or getObjectFilePath()).

friss added a comment.Nov 12 2014, 7:17 PM

Thanks for the initial review! Some explanations and further questions inline:

tools/dsymutil/DebugMap.h
49

I think it only needs the object path, I haven't seen the source entries of the debug map being used except for detecting the end-of-scope of the current object file. The source file doesn't make much sense anyway as IIRC it is the parent of the object file scope and in an LTO scenario you end up with only one source file.

50

I'll add that in a followup commit

53–57

I'm not sure I'll use that actual type, but it seemed like some handy information to have (even if only for sanity checking that I'm not patch code addresses with data information). I actually use the Unknown type as an indicator that the Symbol hasn't been inserted in the map yet and I assert if a client tries to add 2 symbols with the same name in the same object file.

59–60

I agree that the names could be clearer.

What do you mean with "tentative definitions"? The OrigAddress gets filled with the value from the object file symbol table.

62

As you point out bellow, I add every symbol from the object file symbol table to the list of symbols. Then the ones that are present in the binary's debug map are marked Used.

I don't think libObject has a fast name -> symbol lookup, but I could have missed it. If such a feature exists, then I can populate the map with only symbols from the debug map and drop the used flag. An alternative would be to use a temporary StringMap with the whole object file symbol table in it, but this seemed a bit wasteful as I expect most of the object file symbol table to end up in the debug map anyway.

73–76

Of course adding a addSymbol(name, objectAddress, binaryAddress) is fine. The current API requires the OwningBinary to be created by the client, and that is bad. As I pointed out in the review message, we need to move away from that scheme because it might simply not be possible to hold all the objects in memory together. I think the final API will just use strings as object file identifiers (allowing the lib.a(objectfile) syntax for archives), and some smart OwningBinary caching will be performed in the DebugMap object or in the DwarfLInker itself.

81

I commented on that above. The map will be populated, but with unused symbols which won't be retained if they are not marked used by a relocateSymbol(). Or am I missing your point about the lld usecase?

friss updated this revision to Diff 16252.Nov 14 2014, 3:31 PM
friss edited edge metadata.

Simplified the DebugMap a lot. The way its used in the debug map
parser should be closer to what lld would like to do. i

Re: testing, I would prefer to include binaries because I like the
cross-platform testing even if the tool doesn't get used there. I
was going to include a test for the map parsing functionality and
some error handling in this review when I relized that ld64 would
embedded full absolute paths in the debug map. This will obviously
cause problems for testing in the testsuite. I'll defer that to
another update of this patch (I'll try to see if I can implement
and hack around --oso-prepend-path, and if not I'll switch to
compiling .ll files and testing on dwarwin systems only.).

A few drive-by comments.

tools/Makefile
36

I'd prefer binary name (llvm-dsymutil) to match the directory name. I'm not aware of existing policy, though.

tools/dsymutil/DebugMap.cpp
25

Can you just use StringMap::insert() and use its return value to check if the key is already there?

32

why not outs()? (here and below). Or, better, you can probably take

llvm::raw_ostream &OS

as an input parameter.

56

see note about llvm::raw_ostream above.

tools/dsymutil/DebugMap.h
62

llvm::DebugMap is too generic, and I can imagine it can potentially conflict with some other class. Consider introducing a namespace for dsymutil-specific classes.

tools/dsymutil/DwarfLinker.cpp
15

s/Filename/OutputFilename (to match ctor declaration).

tools/dsymutil/MachODebugMapParser.cpp
19

Can you make use of ObjectFile::createMachOObjectFile() here?

53

This line is not needed, you already call resetParserState() above.

tools/dsymutil/MachODebugMapParser.h
53 ↗(On Diff #16252)

newDebugMapObject() doesn't actually return anything, which is slightly confusing, especially when you call

return newDebugMapObject()

below.

friss added a comment.Nov 19 2014, 6:21 PM

Thanks for taking the time to review that! I'll address most of your comments for the next update. Some comments inline:

tools/Makefile
36

This is in fact on purpose. The binary is currently called llvm-dsymutil so that it doesn't interfere with the system dsymutil. When the work will be complete enough to actually replace the system dsymutil, I want to rename the binary to just dsymutil so that it takes precedence when clang invokes dsymutil. I preferred to name the directory after the expected end-result rather than renaming the directory later.

tools/dsymutil/DebugMap.cpp
25

I like the simplicity of operator[] more. I'm not sure that using insert() would simplify anything especially as its return value would be used only in Assert builds.

32

AFAIK, dump() without argument is more canonical in LLVM, and it isn't usually even compiled in in non-debug builds.
However, you are right that in this case as I use it to emit user-visible diagnostics, this convention doesn't apply.

tools/dsymutil/DebugMap.h
62

I can rename that to DSYMDebugMap, although I find that this ties it a bit too much to Darwin. I can't be sure, but I'd expect the code using the DebugMap (as opposed to the code producing it) to be fairly generic. Not that it matters for the current unique useless though.

tools/dsymutil/MachODebugMapParser.cpp
19

I think the quantity of code would be the same because to use it I have to handle the MemoryBuffer creation myself. I'm not sure which one is preferable.

tools/dsymutil/MachODebugMapParser.h
53 ↗(On Diff #16252)

Yeah the name dates back to a version where it would actually return the allocated object. I'll change that to switchToNewDebugMapObject(Name). Do you have an issue with the return voidReturningFunc(); idiom? I kinda like it when it allows to do early exits without having to brace the if body.

samsonov added inline comments.Nov 19 2014, 9:47 PM
tools/Makefile
36

Yep, makes sense.

tools/dsymutil/DebugMap.cpp
25

You can make addSymbol() return true iff the symbol wasn't there already, and assert (or print meaningful warning/error) in the caller.

32

LLVM classes often has dump() and print(llvm::raw_ostream), with former simply calling the latter.

tools/dsymutil/MachODebugMapParser.h
53 ↗(On Diff #16252)

I'm neutral to this idiom, as long as function name doesn't imply that it's supposed to return the object :)

friss updated this revision to Diff 16904.Dec 3 2014, 5:50 PM
  • Added tests for the debug map parsing
  • Added the -oso-prepend-path dsymutil option (used for tests)
  • Addressed Alexey's feedback

If there are no further comments, I'll push that in a couple of days.

friss added a comment.Dec 9 2014, 10:17 AM

ping!

I committed this briefly this morning considering that I could count your silence as approval, but Chandler told me not to do so. Sorry about that, I really didn't mean to force anything through.

My patch's short but intense in-tree live has produced one bot failure: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/19112
It seems the llvm-dsymutil tool hasn't been built at all there, thus I must be missing something in the build system hooks. I tried cmake and configure builds in Debug/Release before committing, so I'm not sure what I screwed up.

Thanks!

In D6242#14, @friss wrote:

ping!

Sorry about that, will take a look at the revised patch shortly.

I committed this briefly this morning considering that I could count your silence as approval, but Chandler told me not to do so. Sorry about that, I really didn't mean to force anything through.

My patch's short but intense in-tree live has produced one bot failure: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/19112
It seems the llvm-dsymutil tool hasn't been built at all there, thus I must be missing something in the build system hooks. I tried cmake and configure builds in Debug/Release before committing, so I'm not sure what I screwed up.

You should probably add "llvm-dsymutil" to test/lit.cfg, next to all the other llvm-XXX tools listed there.

Thanks!

samsonov added inline comments.Dec 9 2014, 2:05 PM
tools/dsymutil/DebugMap.cpp
28

IIRC this might not work on MSVC, need to double-check.

38

Looks like SymbolMapping is lightweight enough. This code can be faster if you just use

std::vector<std::pair<StringRef, SymbolMapping>> Entries;

fill it with single scan over Symbols, and then std::sort it and print it.

tools/dsymutil/DwarfLinker.h
35 ↗(On Diff #16904)

See the note for MachODebugMapParser class below - why not expose only a standalone function

bool linkDwarf(StringRef OutputFilename, const DebugMap &DM);

in a header?

tools/dsymutil/MachODebugMapParser.cpp
19

I think that using the more specific factory function would be more appropriate than calling createBinary(), and manually inspecting its output, making sure we return the same error code as Object library, etc.

53

Please use sys::path::append() here, so that you don't have to bother if oso-prepend-path should or should not end with a slash.

72
if (auto Error = ...)
  return Error;
161

Why do you initialize it lazily, instead of just calling loadMainBinarySymbols right after you have initialized MainOwningBinary?

182

Can/should we do smth. similar for loadCurrentObjectFileSymbols function?

tools/dsymutil/MachODebugMapParser.h
27 ↗(On Diff #16904)

Do you actually plan to move this object to LLVM library in future? For now, I don't see why it's convenient or desirable to expose this object in a header. You can't really do anything with it - just construct it, and then call method parse() once.

What is more, you probably don't want to have the instance of MainOwningBinary object lying around in memory after parse() returned. Can you simply expose a function

ErrorOr<std::unique_ptr<DebugMap>> parseDebugMap(StringRef BinaryPath, StringRef PathPrefix = "");

and call the whole MachODebugMapParser an "mach-specific implementation detail", hidden in the .cpp file?

34 ↗(On Diff #16904)

Is there a reason against making it a constructor argument (which can default to empty string)?

tools/dsymutil/dsymutil.cpp
20

Do you use it here?

32

Flag description could be helpful here.

66

If InputFile is stdin ("-"), this will be "-.dwarf"

friss added a comment.Dec 9 2014, 3:05 PM
In D6242#15, @samsonov wrote:
In D6242#14, @friss wrote:

ping!

Sorry about that, will take a look at the revised patch shortly.

No worries, I wasn't really expecting anyone to do further review :-)

I committed this briefly this morning considering that I could count your silence as approval, but Chandler told me not to do so. Sorry about that, I really didn't mean to force anything through.

My patch's short but intense in-tree live has produced one bot failure: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/19112
It seems the llvm-dsymutil tool hasn't been built at all there, thus I must be missing something in the build system hooks. I tried cmake and configure builds in Debug/Release before committing, so I'm not sure what I screwed up.

You should probably add "llvm-dsymutil" to test/lit.cfg, next to all the other llvm-XXX tools listed there.

I don't think this will solve it. If I'm looking at the right place, then this seems to just ensure that we call the tool with its build directory added. Looking at the build log I have the feeling that llvm-dsymutil wasn't built at all. I spotted that I used the 'dsymutil' name in LLVMBuild.txt, maybe that's it. I'm just surprised that it isn't break in any of my builds.

Thanks a bunch!

tools/dsymutil/DebugMap.cpp
28

You remember that you made me introduce this use of insert? :-) It'd be much appreciated if you have a way to verify that.

38

Why do you think it would be faster? locality?

tools/dsymutil/DwarfLinker.h
35 ↗(On Diff #16904)

That's actually a very good idea. In the code I developed since this was written, I struggled to keep the implementation details of DwarfLinker hidden, and that would make it trivial.

The choice is a bit more complicated than for MachODebugParser (which is really private) because we might want to expose the interface for lld's consumption at some point. And the free function approach might not be flexible enough depending on how much tunables I add in the linking phase.

I'll do it anyway, we can always revisit that later.

tools/dsymutil/MachODebugMapParser.cpp
53

will do

72

yep

161

Because it is not necessarily used. It is strictly needed only for common globals (this patch doesn't treat the commons differently from other globals, so the symbol table is read if you have any kind of global... which I guess you always have). Yeah, this optimization looks at least premature, if not totally useless. I'll drop it.

182

No because symbols from the object file are always useful. The debug map contains names associated with linked addresses (with the exception of common globals that need to be looked up in the binary), these linked addresses need to be correlated with the object file addresses that we always lookup in the object file's symbol table (and any defined symbol is of interest).

tools/dsymutil/MachODebugMapParser.h
27 ↗(On Diff #16904)

As I said above, good idea. MachODebugParser should never get public anyway it is truly an implementation detail of dsymutil. When lld will want to use dsymutil's code, it will build its DebugMap objects itself anyway.

34 ↗(On Diff #16904)

No real reason except that I prefer to use constructors for major properties and setters for optional/debug stuff. I can change that, it's not like the constructor will get confusing with one additional argument.

tools/dsymutil/dsymutil.cpp
20

llvm_shutdown_obj is defined there.

32

I have a description in my tree, must have missed merging that patch in the initial submission.

66

hehe, nice catch. I'm on the edge as to what to do with this: hardcoding the output file (a.out.dwarf?) or simply refusing to do that (and maybe allowing it later by adding a -o option).

friss added inline comments.Dec 9 2014, 3:16 PM
tools/dsymutil/MachODebugMapParser.cpp
19

The code I actually have in my tree moved to using createMachOObjectFile. I'll see how to change that in the initial version.

In D6242#17, @friss wrote:
In D6242#15, @samsonov wrote:
In D6242#14, @friss wrote:

ping!

Sorry about that, will take a look at the revised patch shortly.

No worries, I wasn't really expecting anyone to do further review :-)

I committed this briefly this morning considering that I could count your silence as approval, but Chandler told me not to do so. Sorry about that, I really didn't mean to force anything through.

My patch's short but intense in-tree live has produced one bot failure: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/19112
It seems the llvm-dsymutil tool hasn't been built at all there, thus I must be missing something in the build system hooks. I tried cmake and configure builds in Debug/Release before committing, so I'm not sure what I screwed up.

You should probably add "llvm-dsymutil" to test/lit.cfg, next to all the other llvm-XXX tools listed there.

I don't think this will solve it. If I'm looking at the right place, then this seems to just ensure that we call the tool with its build directory added. Looking at the build log I have the feeling that llvm-dsymutil wasn't built at all. I spotted that I used the 'dsymutil' name in LLVMBuild.txt, maybe that's it. I'm just surprised that it isn't break in any of my builds.

Ah, you should also add llvm-dsymutil to LLVM_TEST_DEPENDS in test/CMakeLists.txt.

Thanks a bunch!

tools/dsymutil/DebugMap.cpp
38

Yes. But as you're comparing StringRef's, there are still indirection, so maybe it's not a big deal.

tools/dsymutil/dsymutil.cpp
66

Pipe the output to stdout? :) a.out.dwarf looks reasonable as well.

friss added a comment.Dec 9 2014, 3:59 PM
In D6242#19, @samsonov wrote:
In D6242#17, @friss wrote:
In D6242#15, @samsonov wrote:
In D6242#14, @friss wrote:

ping!

Sorry about that, will take a look at the revised patch shortly.

No worries, I wasn't really expecting anyone to do further review :-)

I committed this briefly this morning considering that I could count your silence as approval, but Chandler told me not to do so. Sorry about that, I really didn't mean to force anything through.

My patch's short but intense in-tree live has produced one bot failure: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/19112
It seems the llvm-dsymutil tool hasn't been built at all there, thus I must be missing something in the build system hooks. I tried cmake and configure builds in Debug/Release before committing, so I'm not sure what I screwed up.

You should probably add "llvm-dsymutil" to test/lit.cfg, next to all the other llvm-XXX tools listed there.

I don't think this will solve it. If I'm looking at the right place, then this seems to just ensure that we call the tool with its build directory added. Looking at the build log I have the feeling that llvm-dsymutil wasn't built at all. I spotted that I used the 'dsymutil' name in LLVMBuild.txt, maybe that's it. I'm just surprised that it isn't break in any of my builds.

Ah, you should also add llvm-dsymutil to LLVM_TEST_DEPENDS in test/CMakeLists.txt.

That must be it!

tools/dsymutil/dsymutil.cpp
66

I considered that, of course, but a dsym bundle consists of a filesystem hierarchy and it is supposed to be the default output mode of dsymutil. I could fallback to a flat output file in that situation, though.

I'll just hardcode a.out for now.

friss updated this revision to Diff 17125.Dec 10 2014, 3:35 PM
  • Initial dsymutil tool commit.
  • Add llvm-dsymutil to the test config files.
  • Fix llvm-dsymtool name.
  • Use a standard constructor for SymbolMapping.
  • Change the default input file from '-' to a.out
  • Use less indirections in DebugMapObject::print()
  • Use ObjectFile::createMachOObjectFile instead of createBinary().
  • Do not repeat the llvm::cl namespace over and over.
  • Add a description to the oso-prepend-path option.
  • Move the high-level interface from classes to free functions.
  • Remove useless lazy loading.
  • Better error handling style.
  • Use sys::path::append()

The free functions in the llvm namespace look a bit strange. Do we have a policy about that? Should I put this into a dsymutil namespace (and thus maybe also the DebugMap class) ?

friss updated this revision to Diff 17127.Dec 10 2014, 3:45 PM
  • clang-format the initial submission.
friss added a comment.Dec 11 2014, 7:54 AM

I uploaded a ew version there, but the mail must have been lost in the phab update mail black hole.

http://reviews.llvm.org/D6242

samsonov accepted this revision.Dec 11 2014, 3:14 PM
samsonov added a reviewer: samsonov.

LGTM. I have a handful of comments, feel free to submit after addressing them.

tools/dsymutil/DebugMap.cpp
38

Can you write range-based for loop here instead?

42

You don't need a comparator here - all keys in StringMap are different, and std::pair<> objects are compared lexicographically by default, which is what you need.

tools/dsymutil/DebugMap.h
21

prepend with LLVM_TOOLS_ ?

62

consider wrapping it in one more namespace (like llvm::dsymuitl) (probably not a big deal, though).

tools/dsymutil/DwarfLinker.cpp
15

This class does nothing for now. You can just delete it from this patch, and add it when it will actually do some work.

tools/dsymutil/MachODebugMapParser.cpp
216

Shouldn't this be a local variable defined inside for-loop with no default value? (as it will be initialized by Sym.getSection()) ?

tools/dsymutil/dsymutil.cpp
34

"names of object files"?

tools/dsymutil/dsymutil.h
16

LLVM_TOOLS_DSYMUTIL_DSYMUTIL_H

25

Shouldn't you #include DebugMap.h here to use DebugMap in unique_ptr?

This revision is now accepted and ready to land.Dec 11 2014, 3:14 PM
friss added a comment.Dec 11 2014, 3:45 PM

Thanks! I'll address your last comments before commit.

Couple of answers below:

tools/dsymutil/DebugMap.cpp
42

I didn't know pairs provided lexicographical relational operators, thanks for the tip!

tools/dsymutil/MachODebugMapParser.cpp
216

section_iterator doesn't provide a default constructor, and I didn't want to all Binary.section_end() for every iteration. The default value is there only to be able to construct the variable, its value will be overwritten every time it will actually be used.

tools/dsymutil/dsymutil.cpp
34

What about "Specify a directory to prepend to the paths of object files.". I find 'name' a bit too vague (is it only the basename?).

tools/dsymutil/dsymutil.h
25

Good point, I suppose that this didn't show up because every user of dsymutil.h is also a DebugMap.h user, and 'DebugMap' comes before 'dsymutil' in a sorted include list.

friss added inline comments.Dec 11 2014, 4:34 PM
tools/dsymutil/DebugMap.cpp
42

Unfortunately, the default comparators require the member types to be comparable. I don't see a need to put an operator< in SymbolMapping, thus I'll leave the lambda comparator there.

friss closed this revision.Dec 16 2014, 2:04 PM

Landed in r224134