This is an archive of the discontinued LLVM Phabricator instance.

Extend 'target symbols add' to set symbol file for a given module
ClosedPublic

Authored by eugene on Jul 18 2017, 9:25 PM.

Details

Summary

Now -shlib flag can be provided alongside with names of symbols files:

(lldb) target symbols add --shlib stripper-lib.so unstripper-lib.so

This is helpful when default matching mechanisms by name and UUID
can't find a module, and the user needs to explicitly specify
which module the given symbol file belongs to.

Diff Detail

Repository
rL LLVM

Event Timeline

eugene created this revision.Jul 18 2017, 9:25 PM
labath edited edge metadata.

I think Jim (or maybe Greg?) should take a look at this.

Right now I'll just comment on the test.

packages/Python/lldbsuite/test/linux/add-symbols/Makefile
1 ↗(On Diff #107244)

This is a very non-standard Makefile. As it is now, it will fail if running the test against android.

I think you should be able to replace the strip command with a LD_EXTRAS += -Wl,--build-id=none. This should allow you to use Makefile.rules and be more portable. Also, I'd recommend setting this up in a way that you don't overwrite the a.out compiler output, as that will make the rules simpler. I'm thinking of something like this (untested) snippet:

LEVEL= ???
C_SOURCES := a.c
LD_EXTRAS += -Wl,--build-id=none

include $(LEVEL)/Makefile.rules

b.out: a.out
        $(OBJCOPY) --strip-debug a.out b.out #Have the test use b.out as the main executable

all: b.out
clean::
  rm -rf b.out

Btw, will this work correctly if we are cross-debugging from a mac? (because then we will use the fancy DownloadObjectAndSymbolFile implementation)

Btw, will this work correctly if we are cross-debugging from a mac? (because then we will use the fancy DownloadObjectAndSymbolFile implementation)

Yeah, you're right. I'll move this logic to the CommandObjectTarget.cpp itself.

jingham requested changes to this revision.Jul 19 2017, 11:13 AM

This seems like an okay change. You want to make sure that passing the file name and UUID into AddModuleSymbols doesn't override a UUID mismatch against the symbol file however you find it. It looks like that function does the right thing, but can you add a test for this?

You also need to fix up the error messages (around line 4270.) At present they assume that the uuid & file option's are mutually exclusive. It's disconcerting to specify two things and have the error mention only one.

This is more Greg's area than mine, so he should also give it the okay.

This revision now requires changes to proceed.Jul 19 2017, 11:13 AM
clayborg requested changes to this revision.Jul 19 2017, 11:58 AM

So we should just make sure this works:

(lldb) target symbols add --shlib a.out debug_binaries/a.out

The --uuid option is there to help you match up without having to specify the file path. If I am understanding correctly, "a.out" has no UUID or build ID, but "debug_binaries/a.out" does? In that case, there is no need to specify the --uuid option. If the UUID do match, then you could just do:

(lldb) target symbols add debug_binaries/a.out

It would extract the UUID from "debug_binaries/a.out" and then match it to the file. So the "--shlib" option is there to point you in the right direction. When the UUIDs don't match, we should make sure that "target symbols add --shlib a.out debug_binaries/a.out" by recognizing that "a.out" doesn't have a valid UUID so the UUID doesn't need to match in order for "debug_binaries/a.out" to be an acceptable match for it as a symbol file. This might make some of the changes above go away and move them to the ModuleSpec matching code.

So we should just make sure this works:

(lldb) target symbols add --shlib a.out debug_binaries/a.out

The --uuid option is there to help you match up without having to specify the file path. If I am understanding correctly, "a.out" has no UUID or build ID, but "debug_binaries/a.out" does?

In this case, neither of the files has a build id/UUID. However, the tricky part here is that in this case we "invent" a UUID by doing a checksum of the whole file. This is done to support the .gnu_debuglink https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html method of finding symbols. It works like this:

  • if a file has a gnu_debuglink section, we set the UUID to the crc inside that section
  • if a file does not have a gnu_debuglink section, we *assume* it will be used as a gnu_debuglink target, and also set the UUID to the crc we compute from the whole file contents.

This works fine for the gnu_debuglink case, as then the rest of the machinery will match the files based on their UUIDs. However, it does not work for the general case without the debug link, as then the files will end up with two different "UUID"s (because they are based on the file content), and everything will consider them as different.

The solution that would seem best to me here would be to leave the UUID of files which don't have a build-id as empty, as the gnu_debuglink is not really a UUID -- it's main function is to contain a file path, and the crc just acts as a sanity check. This should make the logic for matching two uuid-less files saner, but it would require us to implement the gnu_debuglink method in a different way (which I think is fine, as the current way seems pretty hacky, and causes us to often compute crcs of large files needlessly). I have no idea how much of our existing code would blow up if we started having files without a uuid though.

What do you think about that?

So we should just make sure this works:

(lldb) target symbols add --shlib a.out debug_binaries/a.out

The --uuid option is there to help you match up without having to specify the file path. If I am understanding correctly, "a.out" has no UUID or build ID, but "debug_binaries/a.out" does?

In this case, neither of the files has a build id/UUID. However, the tricky part here is that in this case we "invent" a UUID by doing a checksum of the whole file. This is done to support the .gnu_debuglink https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html method of finding symbols. It works like this:

  • if a file has a gnu_debuglink section, we set the UUID to the crc inside that section
  • if a file does not have a gnu_debuglink section, we *assume* it will be used as a gnu_debuglink target, and also set the UUID to the crc we compute from the whole file contents.

This works fine for the gnu_debuglink case, as then the rest of the machinery will match the files based on their UUIDs. However, it does not work for the general case without the debug link, as then the files will end up with two different "UUID"s (because they are based on the file content), and everything will consider them as different.

The solution that would seem best to me here would be to leave the UUID of files which don't have a build-id as empty, as the gnu_debuglink is not really a UUID -- it's main function is to contain a file path, and the crc just acts as a sanity check. This should make the logic for matching two uuid-less files saner, but it would require us to implement the gnu_debuglink method in a different way (which I think is fine, as the current way seems pretty hacky, and causes us to often compute crcs of large files needlessly). I have no idea how much of our existing code would blow up if we started having files without a uuid though.

What do you think about that?

The UUID is assumed to be a unique identifier that isn't based on the entire file contents, but on the content the the important bits (.text, .data, etc). On Mac, the mach-o files have a LC_UUID which is generated by carefully doing and MD5 checksum on only the bits that don't change due to debug info or strings from debug info. This means a binary that is built in /tmp/a or in /tmp/b with debug info will result in the same UUID. When we create a stand alone debug info file, this UUID gets copied into it.

One other thing we could try is to add a type to the lldb_private::UUID class that specifies the type of the UUID:

namespace lldb_private {
class UUID {
public:
  enum Type {
    eTypeBuildID,
    eTypeFileCRC, // CRC of the entire file contents
    eTypeDebugInfoCRC, // CRC of the debug info
  };
};

If I understood your comments above, one file would have a UUID::eTypeBuildID type and the other would have either a UUID::eTypeFileCRC or UUID::eTypeDebugInfoCRC. Then we teach the ObjectFile subclasses to properly set the UUID type. When comparing UUID values, we then could note that the types are not the same and do something more intelligent. Like when adding a symbol file, we could ignore the UUID mismatch if the UUID::Type doesn't match and just go on filename. But we would honor the mismatch if the UUID::Type was the same.

So we should just make sure this works:

(lldb) target symbols add --shlib a.out debug_binaries/a.out

I like this, and I'm going to implement it. Pavel is right and the way UUID is set for modules without build-id is kinda strange, but changing it seems like a big and risky change.
Actually "target symbols add" already has logic that if UUID doesn't match it solely uses filenames to match debug symbols, so being able to separately set name for both executable and debug modules seems very logical.

One other idea if we go with the UUID::Type solution: if we have UUIDs that are different, we could ask the ObjectFile (ObjectFileELF) to see if it thinks the files go together. Not sure if there is anything in the ELF file that would allow us to do that. In Mach-O we have all definitions for the .text and .data sections, just not the bytes. So it would be easy to compare the file address and file sizes of the sections that are in both to see if they match. Something like:

bool ObjectFile::FilesAreRelated(ObjectFile &rhs);

In Mach-O it could run through the segments and compare the ones in common to ensure things match.

eugene updated this revision to Diff 107628.Jul 20 2017, 8:05 PM
eugene edited edge metadata.
eugene retitled this revision from Extend 'target symbols add' to load symbols from a given file by UUID. to Extend 'target symbols add' to set symbol file for a given module.
eugene edited the summary of this revision. (Show Details)

Found a way to implement what I need much easily and switched to using file names instead of UUID.

Please take another look.

eugene marked an inline comment as done.Jul 20 2017, 8:06 PM
clayborg requested changes to this revision.Jul 21 2017, 9:11 AM

Looks fine, just add an error that checks that when the --file options is used, that only one argument (symbol file) is specified.

source/Commands/CommandObjectTarget.cpp
4302–4305 ↗(On Diff #107628)

We should verify that if the file option is set, then there is only one symbol file argument as it wouldn't make sense to do:

(lldb) target symbols add -- file a.out a.out.symbols b.out.symbols
This revision now requires changes to proceed.Jul 21 2017, 9:11 AM
eugene updated this revision to Diff 107725.Jul 21 2017, 3:18 PM
eugene edited edge metadata.

Added error message when more than one symbol file is provided.

eugene marked an inline comment as done.Jul 21 2017, 3:18 PM

Greg, are you with me checking this in?

Greg, are you with me checking this in?

Sure thing

clayborg accepted this revision.Jul 24 2017, 2:52 PM
This revision was automatically updated to reflect the committed changes.