This is an archive of the discontinued LLVM Phabricator instance.

ELF: Add --build-id-link-dir=DIR switch
Needs Review | Public

Authored by phosek on Sep 8 2018, 3:20 PM.

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

mcgrathr created this revision. Sep 8 2018, 3:20 PM
ruiu added a subscriber: ruiu. Sep 10 2018, 9:04 AM

This seems like a new feature proposal, and we haven't discussed this before. It's not clear to me why you have to do this inside the linker rather than a post-processing tool. Could you please elaborate about why you want to add a new option?

> This seems like a new feature proposal, and we haven't discussed this before. It's not clear to me why you have to do this inside the linker rather than a post-processing tool. Could you please elaborate about why you want to add a new option?

The .build-id/xx/xxx.debug lookup protocol is already used by various tools, such as debuggers (LLDB, for example, implements this lookup) and Linux packagers. It's also supported by other ELF tools such as elfutils. We'd like to use it in Fuchsia as well and integrate it into our build system.
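To make the layout concrete, here is a minimal sketch of the mapping from a build-id to its path under the root directory. The function name `build_id_debug_path` and the validation rules are assumptions made for illustration; only the xx/xxx.debug shape comes from the description above.

```python
from pathlib import PurePosixPath


def build_id_debug_path(root: str, build_id_hex: str) -> PurePosixPath:
    """Map a hex build-id to ROOT/xx/rest.debug.

    The first byte (two hex digits) names the subdirectory and the
    remaining digits name the file. Hypothetical helper, not LLD code.
    """
    hex_id = build_id_hex.lower()
    is_hex = all(c in "0123456789abcdef" for c in hex_id)
    # Require whole bytes and at least two of them, so the file name is not empty.
    if len(hex_id) < 4 or len(hex_id) % 2 != 0 or not is_hex:
        raise ValueError(f"not a usable hex build-id: {build_id_hex!r}")
    return PurePosixPath(root) / hex_id[:2] / (hex_id[2:] + ".debug")
```

For example, `build_id_debug_path(".build-id", "abcdef0123")` yields `.build-id/ab/cdef0123.debug`.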

The problem is that determining the build-id after linking is not straightforward. You can use a tool like llvm-readobj, but that tool doesn't have machine-readable output, so you need to parse its text output, which is error-prone and adds overhead. It also means that in our build system we'd need to add an extra step or wrap linking in a script, which complicates things, especially if you want to do it portably across Linux, Windows and macOS.
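As a rough illustration of what a post-link step has to do when it reads the build-id itself instead of parsing tool output, the sketch below walks ELF note records and picks out the GNU build-id note. It assumes the caller has already extracted the raw bytes of the note section (locating that section in the ELF file is not shown), that notes use 4-byte alignment, and that the function name `find_build_id` is purely hypothetical.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type of the GNU build-id note


def _align4(n: int) -> int:
    """Round n up to the next multiple of 4 (assumed note alignment)."""
    return (n + 3) & ~3


def find_build_id(notes: bytes, little_endian: bool = True):
    """Return the build-id as lowercase hex from raw note bytes, or None."""
    fmt = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(notes):
        # Each note starts with three 32-bit words: namesz, descsz, type.
        namesz, descsz, note_type = struct.unpack_from(fmt, notes, offset)
        name_start = offset + 12
        desc_start = name_start + _align4(namesz)
        desc_end = desc_start + descsz
        if desc_end > len(notes):
            return None  # truncated note; give up rather than guess
        name = notes[name_start:name_start + namesz]
        if note_type == NT_GNU_BUILD_ID and name == b"GNU\0":
            return notes[desc_start:desc_end].hex()
        offset = desc_start + _align4(descsz)
    return None
```

Even this small parser still needs per-platform glue around it to find the note section and to run after every link, which is the extra build step described above.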

Initially we considered adding a flag to lld that would write the build-id into a file, which would allow something like ld.lld --build-id-file=>(id=$(</dev/fd/0); mkdir -p build-id/${id:0:2} && ln -f {{output}} build-id/${id:0:2}/${id:2}), but this again is not portable and won't work on Windows. So we instead tried prototyping support for linking the output file directly into the .build-id/xx/xxx.debug layout to see how complicated it would be, and it turned out to be pretty straightforward, as you can see here (I have an even simpler version that eliminates the hex representation computation by using llvm::toHex instead).
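For comparison, the shell pipeline above can be sketched as a portable script. This is only an illustration of the mkdir/ln steps, not the patch itself: the function name `link_into_build_id_dir` is hypothetical, the build-id is assumed to be already known as a hex string, and the link name mirrors the one-liner (no .debug suffix).

```python
import os


def link_into_build_id_dir(output_path: str, build_id_hex: str, root: str) -> str:
    """Hard-link output_path to ROOT/xx/rest and return the link path."""
    hex_id = build_id_hex.lower()
    if len(hex_id) < 4 or any(c not in "0123456789abcdef" for c in hex_id):
        raise ValueError(f"not a usable hex build-id: {build_id_hex!r}")
    subdir = os.path.join(root, hex_id[:2])
    os.makedirs(subdir, exist_ok=True)  # same effect as `mkdir -p`
    link_path = os.path.join(subdir, hex_id[2:])
    try:
        os.remove(link_path)  # emulate `ln -f`: drop a stale link first
    except FileNotFoundError:
        pass
    os.link(output_path, link_path)  # hard link: one file, two names
    return link_path
```

A build system would still have to obtain the build-id and invoke this after every link, which is the extra step that doing the work inside the linker avoids.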

Would this be an acceptable addition? It would simplify our build system, and this solution should work across all platforms without any extra effort. If this is fine with you, I'll update the change and also write some tests.

phosek commandeered this revision. Sep 10 2018, 4:45 PM
phosek updated this revision to Diff 164767.
phosek edited reviewers, added: mcgrathr; removed: phosek.
ruiu added a comment. Sep 10 2018, 4:59 PM

I think I don't understand the use case of the feature yet.

  • If you hard-link two files, they have identical contents (strictly speaking, there is only one file with two filenames). If a debugger can find an executable with debug info in the .build-id/xx/xxxxx directory, it should be able to find it in the executable that's being debugged. So, how does it work?
  • If the only problem is that llvm-objdump's output is not machine-readable, you can add a new option to llvm-objdump to print out a build-id, can't you?

> I think I don't understand the use case of the feature yet.
>
>   • If you hard-link two files, they have identical contents (strictly speaking, there is only one file with two filenames). If a debugger can find an executable with debug info in the .build-id/xx/xxxxx directory, it should be able to find it in the executable that's being debugged. So, how does it work?

We would strip the binary, and the stripped copy is what gets executed. This applies not only to executables but also to shared libraries. When the debugger connects to a process, it finds all the ELF files mapped into memory and reads their build-ids; from there it needs a way to map those back to the files that contain the debug information. On Fuchsia, we always cross-compile and produce a system image that contains all the stripped binaries, but we need to keep the debugging binaries around so we can debug or symbolize them remotely. Ideally we would then point all debugging tools at the .build-id root to look up the binaries by build-id.
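The tool-side lookup described here can be sketched roughly as follows. It assumes the debugger or symbolizer has already collected the build-ids of the mapped modules as hex strings and that files are stored as xx/xxx.debug under the root; `resolve_debug_files` is a hypothetical name, not an existing API.

```python
import os


def resolve_debug_files(root, build_ids):
    """Map each hex build-id to its file under ROOT, or None if absent."""
    found = {}
    for build_id in build_ids:
        hex_id = build_id.lower()
        # First byte picks the subdirectory; the rest names the file.
        candidate = os.path.join(root, hex_id[:2], hex_id[2:] + ".debug")
        found[build_id] = candidate if os.path.isfile(candidate) else None
    return found
```

A `None` entry means no unstripped file was published for that module, so the tool would fall back to whatever symbols the stripped binary still has.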

>   • If the only problem is that llvm-objdump's output is not machine-readable, you can add a new option to llvm-objdump to print out a build-id, can't you?

We considered it, but that would be really complicated: llvm-objdump and llvm-readobj don't have a reasonable in-memory representation that can easily be serialized into a machine-readable format (e.g. JSON); those tools are optimized for printing formatted output. This solution would also still require some post-processing, and we would need to ensure that it works on all platforms.