This is an archive of the discontinued LLVM Phabricator instance.

ELF: Implement --build-id.
ClosedPublic

Authored by ruiu on Mar 11 2016, 10:10 AM.

Details

Summary

This patch implements --build-id. After the linker creates an output file
in the memory buffer, it computes the FNV1 hash of the resulting file
and set the hash to the .note section as a build-id.

GNU ld and gold have the same feature, but their default choice of the
hash function is different. Their default is SHA1.

We made a deliberate choice to not use a secure hash function for the
sake of performance. Computing a secure hash is slow -- for example,
MD5 throughput is usually 400 MB/s or so. SHA1 is slower than that.

As a result, if you pass --build-id to gold, then the linker becomes about
10% slower than that without the option. We observed a similar degradation
in an experimental implementation of build-id for LLD. On the other hand,
we observed only 1-2% performance degradation with the FNV hash.

Since build-id is not for digital certificate or anything, we think that
a very small probability of collision is acceptable.

We considered using other signals such as using input file timestamps as
inputs to a secure hash function. But such signals would have an issue
with build reproducibility (if you build a binary from the same source
tree using the same toolchain, the build id should become the same.)

GNU linkers accepts --build-id=<style> option where style is one of
"MD5", "SHA1", or an arbitrary hex string. That option is out of scope
of this patch.

Diff Detail

Event Timeline

ruiu updated this revision to Diff 50445.Mar 11 2016, 10:10 AM
ruiu retitled this revision from to ELF: Implement --build-id..
ruiu updated this object.
ruiu added reviewers: emaste, rafael, silvas.
ruiu added a subscriber: llvm-commits.
ruiu updated this object.Mar 11 2016, 10:17 AM
emaste added inline comments.Mar 11 2016, 11:38 AM
ELF/Options.td
15

gold accepted both -build-id and --build-id

ruiu updated this revision to Diff 50457.Mar 11 2016, 11:44 AM
  • accept both --build-id and -build-id.
emaste edited edge metadata.Mar 11 2016, 12:28 PM

One issue, if possible the build-id note should be in a loadable note segment and in the first page of the binary. The intent is that it will be copied into a core file so that the original binary can be identified.

from readelf -S, gold:

[ 3] .note.gnu.build-i NOTE             0000000000400248  00000248
     0000000000000024  0000000000000000   A       0     0     4

patched lld:

[34] .note.gnu.build-i NOTE             0000000000000000  00003d41
     0000000000000018  0000000000000000           0     0     0
ruiu updated this revision to Diff 50466.Mar 11 2016, 12:29 PM
ruiu edited edge metadata.
  • .note.gnu.build-id should have SHF_ALLOC attribute. Otherwise I think the core file wouldn't contain the build ID.
  • Use NT_GNU_BUILD_ID constant.
ruiu added a comment.Mar 11 2016, 12:30 PM

I think I've just addressed that!

ruiu added a comment.Mar 11 2016, 12:33 PM

Does it have to be in the first page of a binary?

emaste added inline comments.Mar 11 2016, 12:33 PM
ELF/OutputSections.cpp
1543

Where is this defined (maybe I haven't updated something)?

../tools/lld/ELF/OutputSections.cpp:1562:23: error: use of undeclared identifier 'NT_GNU_BUILD_ID'
  write32<E>(Buf + 8, NT_GNU_BUILD_ID); // Type (NT_GNU_BUILD_ID = 3)
ruiu added a subscriber: ruiu.Mar 11 2016, 12:37 PM

Please synchronize your LLVM repository. (Or replace NT_GNU_BUILD_ID with
3.)

Does it have to be in the first page of a binary?

From https://fedoraproject.org/wiki/Releases/FeatureBuildId:
The build ID note is normally near the beginning of the image and will be in the first page unless there are an unreasonable number of phdrs or other notes. (The kernel should not deal with anything more complex than a simple rule like the first page, so the odd binary with its notes in the wrong places will just lose.)

It is preferred, but not required. And for matching debug files at least it won't matter.

In my hello world I see:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000010270  00000270
       0000000000000015  0000000000000000   A       0     0     1
  [ 2] .note.tag         NOTE             0000000000010288  00000288
       0000000000000030  0000000000000000   A       0     0     4
  [ 3] .rodata           PROGBITS         00000000000102b8  000002b8
       0000000000000001  0000000000000000 AMS       0     0     1
  [ 4] .eh_frame         X86_64_UNWIND    00000000000102c0  000002c0
       0000000000000080  0000000000000000   A       0     0     8
  [ 5] .rodata           PROGBITS         0000000000010340  00000340
       000000000000000f  0000000000000000   A       0     0     1
  [ 6] .note.gnu.build-i NOTE             000000000001034f  0000034f
       0000000000000018  0000000000000000   A       0     0     0

and it seems like we should move it immediately after .note.tag.

emaste accepted this revision.Mar 11 2016, 12:48 PM
emaste edited edge metadata.

This looks fine to me. We can revisit hash types, sections included in the hash, and the note location in the object file in subsequent patches.

This revision is now accepted and ready to land.Mar 11 2016, 12:48 PM
ruiu closed this revision.Mar 11 2016, 12:57 PM

Committed as r263292.