This is not ready for submission.
This patch implements --build-id in a naive way. After the linker creates
an output file in the memory buffer, it computes the md5sum of the resulting
file and writes the hash into the .note section as a build-id.
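To make the approach concrete, here is a minimal sketch in Python rather than the patch's actual code. The function name write_build_id, the offset parameter, and the step that zeroes the slot before hashing are all assumptions made for illustration; the real patch may locate and fill the .note section differently.

```python
import hashlib

BUILD_ID_SIZE = 16  # length of an MD5 digest in bytes


def write_build_id(image: bytearray, offset: int) -> bytes:
    """Hash the whole in-memory image and store the digest at `offset`.

    `image` stands in for the linker's output buffer and `offset` for the
    position of the build-id payload inside the .note section (both are
    hypothetical names, not taken from the patch).
    """
    end = offset + BUILD_ID_SIZE
    if offset < 0 or end > len(image):
        raise ValueError("build-id slot lies outside the image")
    # Assumption: zero the slot first so the hash never depends on
    # whatever bytes happened to be there before.
    image[offset:end] = bytes(BUILD_ID_SIZE)
    digest = hashlib.md5(image).digest()
    image[offset:end] = digest
    return digest
```

The cost is one full pass over the output file after everything else is written, which is where the extra link time comes from.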
The problem is that the performance impact is too large. Computing a secure
hash is a computationally intensive task (our md5sum function seems a bit too
slow, though). Here are some numbers (milliseconds to link, before -> after this patch):
LLD: 713.78 -> 883.44 (+23.8%)
Scylla: 5005.64 -> 5437.84 (+8.6%)
Even if we replace MD5 with a faster hash function, the overhead cannot come
down to an "okay" range, which is, say, +3%.
Do you guys have any opinions on this? I have a few ideas.
- Use CRC32 instead of MD5. A non-secure hash should be much faster.
- Make clang calculate a secure hash for each section. This basically moves the workload from the linker to the compiler, but we can use the hash for ICF in the linker, so it might be an overall win.
- Compute a build-id from input files' timestamps. This makes builds non-reproducible if you touch a file, so I don't think we want this.
- Build-id is not needed for program execution in general. So we may be able to let the linker exit as soon as it's done with linking, and backfill the hash value in a background process kicked off by the linker. (It's probably an unrealistic plan, though.)
- Do not compute a build-id in the linker at all, and instead let the user pass --build-id=<hash value>.
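To get a rough feel for the first idea before touching the linker, one could time MD5 against CRC32 on a buffer the size of a typical output file. The sketch below uses Python's standard library rather than LLD's own hash code, so the absolute numbers will not match the figures above; the helper names and the 64 MiB buffer size are arbitrary assumptions.

```python
import hashlib
import time
import zlib


def md5_build_id(data: bytes) -> bytes:
    """16-byte build-id candidate from MD5."""
    return hashlib.md5(data).digest()


def crc32_build_id(data: bytes) -> bytes:
    """4-byte build-id candidate from CRC32 (not collision resistant)."""
    return zlib.crc32(data).to_bytes(4, "big")


def best_time_ms(fn, data: bytes, repeats: int = 5) -> float:
    """Return the fastest of `repeats` runs of fn(data), in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


if __name__ == "__main__":
    data = bytes(64 * 1024 * 1024)  # stand-in for a 64 MiB output file
    for name, fn in (("md5", md5_build_id), ("crc32", crc32_build_id)):
        print(f"{name}: {best_time_ms(fn, data):.2f} ms")
```

Note that a CRC32 build-id is only 32 bits wide, so collisions between unrelated builds are far more likely than with a 128-bit hash; whether that is acceptable is part of the question.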