Page MenuHomePhabricator

[lld] Enable a watermark of loadable sections to be generated and placed in a note section
Needs ReviewPublic

Authored by chrisjackson on Aug 19 2019, 9:56 AM.

Details

Summary

Add a '--watermark' flag to lld that enables an xxhash of loadable sections to be placed in a note section, 'note.llvm.watermark'. Then we can determine if any loadable sections have been modified since the ELF was linked.

We have selected xxhash to minimise the overhead of calculating the watermark. The functionality already provided for the GNU build Id has been reused where possible. The hash value provided by the watermark must be unaffected by stripping of debug data or symbols. As buildId hashes the entire ELF, it is not suitable.

By ensuring loadable sections have not changed since link-time, we can have confidence that they are compliant with the system ABI. This helps to ensure that changes to system software will not unexpectedly cause the ELF to execute incorrectly. If additional tooling is being used to modify the ELF this would indicate functionality that is lacking in our toolchain and is desired by users.

Diff Detail

Event Timeline

chrisjackson created this revision.Aug 19 2019, 9:56 AM
davidb added a subscriber: davidb.Aug 19 2019, 10:09 AM
chrisjackson edited the summary of this revision. (Show Details)Aug 20 2019, 2:37 AM
chrisjackson added a reviewer: ruiu.
ruiu added a comment.Aug 20 2019, 2:56 AM

IIRC lld's --build-id={md5,sha1} was slow at first but after we made them a tree-hash to utilize mutli-cores, its cost became negligible. Have you considered taking that approach? This watermark hash is probably not a thing that people attack by the collision attack, so it might not have to be a cryptographically-safe hash, but still cryptographically-safe hash function has nice properties compared to non-crypto ones.

I suggest adding a dumper to llvm-readobj and then adding a test to test/ELF/partition-notes.s

lld/ELF/Driver.cpp
924

See args.hasFlag above.

lld/ELF/SyntheticSections.cpp
342

Delete lld::elf::.

347

5 -> 8, otherwise it is incorrect to use watermarkBuf = buf + 20;

354

Delete lld::elf::

Delete llvm::

lld/ELF/Writer.cpp
614

Backfill .note.llvm.watermark section content. This is similar to .note.gnu.build-id.

2732

inline the only use of the variable

2735

Delete parts

2754

if (first >= last) might be better (WriteHash asserts first < last though I haven't found a case where first can be equal to last)

lld/test/ELF/watermark.s
2

generated placed?

6

-triple=x86_64 %s -o %t.o (this is generic, not Linux specific). Use .o for object files.

15

Consider llvm-readelf -S. Its output is concise.

16

llvm-readelf -x .note.llvm.watermark (Prefer llvm-readelf -x over llvm-objdump -s`)

24

[048C]

MaskRay added a comment.EditedAug 20 2019, 8:45 AM

Another problem is that .note.gnu.build-id is SHF_ALLOC and included in a PT_LOAD segment. When computing watermark, you probably don't want to include its contents. So you should compute build-id and watermark first before you write.

By ensuring loadable sections have not changed since link-time, we can have confidence that they are compliant with the system ABI. This helps to ensure that changes to system software will not unexpectedly cause the ELF to execute incorrectly.

I think I want to hear a bit more about the motivation and how you'd use this feature. If this is used to measure ABI compliance, isn't it too strict? Different optimizations, adding/deleting .note* sections, shuffling .dynsym contents, linker -O1/-O2 (affecting SHF_MERGE sections), etc will change the watermark. Basically you can only do non-PT_LOAD-influencing things like upgrading linkers (lld has an embedded string in .comment), --discard-locals, --strip-debug, etc. I don't see how those things can help you share resources between two links. Those things can also be easily done with a post-processing tool (llvm-objcopy?) Basically, linkers do a lossy transformation. If a feature can be implemented without being affected by the lossy transformation, we should think carefully if it is really the business of the linker.

grimar added a subscriber: grimar.Aug 20 2019, 11:57 PM

Another problem is that .note.gnu.build-id is SHF_ALLOC and included in a PT_LOAD segment. When computing watermark, you probably don't want to include its contents. So you should compute build-id and watermark first before you write.

What target are you looking at? I thought generally note sections aren't allocated and are usually safe to remove. If they were included in a segment, then that would be a PT_NOTE segment that isn't nested in a PT_LOAD segment.

Another problem is that .note.gnu.build-id is SHF_ALLOC and included in a PT_LOAD segment. When computing watermark, you probably don't want to include its contents. So you should compute build-id and watermark first before you write.

What target are you looking at? I thought generally note sections aren't allocated and are usually safe to remove. If they were included in a segment, then that would be a PT_NOTE segment that isn't nested in a PT_LOAD segment.

Actually nevermind. I've just found a few example of scripts placing the build-id into text so the value can be accessed at runtime.

MaskRay added a comment.EditedAug 21 2019, 4:47 AM

Another problem is that .note.gnu.build-id is SHF_ALLOC and included in a PT_LOAD segment. When computing watermark, you probably don't want to include its contents. So you should compute build-id and watermark first before you write.

What target are you looking at? I thought generally note sections aren't allocated and are usually safe to remove. If they were included in a segment, then that would be a PT_NOTE segment that isn't nested in a PT_LOAD segment.

Actually nevermind. I've just found a few example of scripts placing the build-id into text so the value can be accessed at runtime.

Unless you mark a SHF_ALLOC output section as NOLOAD or discard it, it should be included in at least one PT_LOAD segment. ELF spec says: "SHF_ALLOC - The section occupies memory during process execution."
SHT_NOTE sections are usually SHF_ALLOC. This is because 1) many are inspected at runtime 2) many are expected to be dumped to core (this applies to .note.ABI-tag .note.gnu.build-id .note.tag ...). Non-SHF_ALLOC SHT_NOTE section exist in the wild, but they are rare, e.g. GHC uses .debug-ghc-link-info (compiler/main/SysTools/ExtraObj.hs). (May be more common with https://fedoraproject.org/wiki/Toolchain/Watermark .gnu.build.attributes) The computation of .note.llvm.watermark cannot depend on the contents of .note.gnu.build-id so I suggested computing both before writing the contents.

I hope some of you can answer my motivation/justification question above. Whether or not this change is justified to be made into LLD, I think you'll need a llvm-readobj change to dump the note. Also, as I noted above, the llvm-readobj change will help testing the linker feature. You can add the include/llvm/BinaryFormat/ELF.h change to that llvm-readobj patch. It may be worth a llvm-objcopy change, too. One justification is that this watermark can be used to verify llvm-objcopy does not break things (does not alter SHF_ALLOC sections).

MaskRay added a subscriber: peter.smith.

My first reaction is that this seems to be quite a bit of a platform specific feature to build into the linker, it could also make the platform dependent on LLD if this didn't also get into binutils or other ELF linkers. An alternative approach which I believe has been used in other platforms before (for example https://people.freebsd.org/~tmm/elfcksum.c) is to reserve some empty space in the binary that an external tool can post-process to write any checksum/hash etc that you want into it. This is not as convenient but would be compatible with other linkers and not require a LLVM specific extension to ELF.

If this were to go in I think you'd also need to update:

Thanks for the detailed explanation, mostly tally's up with what I've just read. One thing

Non-SHF_ALLOC SHT_NOTE section exist in the wild, but they are rare

I don't think this case should be considered rare (I actually thought the opposite case was rare...). The ELF spec describes SHT_NOTE section to not have any defining attributes (SHF_ALLOC), and most embedded/memory critical systems won't want notes in memory, so I think this is very much target dependent.

Whether or not this change is justified to be made into LLD, I think you'll need a llvm-readobj change to dump the note.

Namely, please add another case here: https://github.com/llvm/llvm-project/blob/master/llvm/tools/llvm-readobj/ELFDumper.cpp#L4495 (and similarly for LLVMStyle::printNotes). I imagine you can repurpose the getGNUBuildId method.

It may be worth a llvm-objcopy change, too. One justification is that this watermark can be used to verify llvm-objcopy does not break things (does not alter SHF_ALLOC sections).

Interesting idea. What patch are you suggesting here?

lld/ELF/SyntheticSections.cpp
347

http://www.sco.com/developers/gabi/1998-04-29/ch5.pheader.html#note_section
I think 5 is the correct value for namesz; padding exists in the note but is not included in the value of namesz

348

Content -> Descriptor

edd added a comment.Aug 22 2019, 8:51 AM

Hi all,

Chris is out of the office until the 28th. I'm sure he'll do what he can to address the technical concerns and gaps raised so far upon his return, but I'll try to field some of the questions regarding intent until then.

I think I want to hear a bit more about the motivation and how you'd use this feature.

We are primarily concerned about the situation where a studio produces a game for PlayStation that depends on an incidental detail of the OS. The scenario we must avoid whenever possible is an update to the OS interfering with the intended operation of a game. Fixing that is a costly process. We put a lot of effort in to making sure our tools do what they can to avoid this situation.

So given an ELF with a watermark and the ability to recalculate the watermark (perhaps with llvm-readelf), we can detect when additional transformations were applied post-link. Such transformations may break some of the invariants that we have been careful to establish and maintain in our supported tooling and workflows. There are many ways in which our users could accidentally introduce fragility, so it isn't water-tight or all-encompassing, but detecting this one situation is nevertheless useful to us as we may decide to explore what transformations were applied to seek (mutual) reassurance or identify a gap in our SDK offering.

IIRC lld's --build-id={md5,sha1} was slow at first but after we made them a tree-hash to utilize mutli-cores, its cost became negligible. Have you considered taking that approach? This watermark hash is probably not a thing that people attack by the collision attack, so it might not have to be a cryptographically-safe hash, but still cryptographically-safe hash function has nice properties compared to non-crypto ones.

We don't really need any kind of cryptographic guarantees as the watermark is not intended to be part of a strict gating process. We probably would have considered crc32 if it was already available, but our experiments have shown that xxHash adds negligible overhead. Actually, md5 or sha1 may also be fine but we had no need to explore them given the existence of xxHash.

What really is important is that we only have the content of PT_LOADs contribute to the watermark. This is because we would like to be able to recalculate the watermark post-link via a tool and get the same value back, even if the ELF has since been stripped of metadata (DWARF, .symtab, etc).

Another problem is that .note.gnu.build-id is SHF_ALLOC and included in a PT_LOAD segment. When computing watermark, you probably don't want to include its contents. So you should compute build-id and watermark first before you write.

We would like to have two PT_NOTEs, one for use by the OS and another for use by tooling. This is achieved by linker scripts. The second PT_NOTE is outside of any PT_LOAD and this is where the watermark would be housed. As described above, a required property of the watermark is that it can be recalculated by an external tool to infer whether or not the loadable parts of the ELF have been modified post-link. By having the watermark outside of any PT_LOAD, it is simpler for the external tool to recalculate. For a similar reason, it would actually be better in our case to have the watermark calculated after the build ID value has been "filled-in", as the build ID is inside a PT_LOAD.

(Adjusting the code to better accommodate other layouts is certainly something worth considering. I'm just explaining how we intend to make use of it).

My first reaction is that this seems to be quite a bit of a platform specific feature to build into the linker, it could also make the platform dependent on LLD if this didn't also get into binutils or other ELF linkers.

This is true, but we have a very similar (although admittedly not identical) feature in our existing proprietary linker and we have a requirement that our customers use the linker supplied with our SDK.

An alternative approach which I believe has been used in other platforms before (for example https://people.freebsd.org/~tmm/elfcksum.c) is to reserve some empty space in the binary that an external tool can post-process to write any checksum/hash etc that you want into it. This is not as convenient but would be compatible with other linkers and not require a LLVM specific extension to ELF.

The existence of the watermark is a requirement on PlayStation. Indeed, we would like to avoid mandating an easily-forgotten post-link step. More to the point, adding the watermark via a post-link step would mean post-link modifications could be made before the watermark is added, which rather defeats the point (that may not have been too clear before - sorry).

Thanks,
Edd

pcc added a subscriber: pcc.Aug 23 2019, 12:31 PM
pcc added inline comments.
lld/ELF/Writer.cpp
2747

I guess the right thing to do in the case of multiple partitions would be to compute a separate hash for each partition. But this can always be changed later since the partitions feature is experimental.

2749

Should this exclude the ELF headers if present in a segment? The header fields e_shoff, e_shnum and e_shstrndx can and likely must be rewritten by strip and other tools.

This update applies suggested source corrections, enables readobj to output the note and adds logic to writeWatermark() to exclude the ELF Header and Program Header Table (PHT) from the watermark. In this diff the PHT can only be excluded if it is the first or last segment. If it is neither, an error is emitted.

This patch introduces a new ELF extension with a new tag: NT_LLVM_WATERMARK. Last times such ELF extensions were made accompanying RFCs were posted:

https://lists.llvm.org/pipermail/llvm-dev/2019-February/130583.html
https://lists.llvm.org/pipermail/llvm-dev/2019-March/131004.html

I still have some concerns about the extension, especially its generality on the widely used ELF platforms.

We would like to have two PT_NOTEs, one for use by the OS and another for use by tooling. This is achieved by linker scripts. The second PT_NOTE is outside of any PT_LOAD and this is where the watermark would be housed. As described above, a required property of the watermark is that it can be recalculated by an external tool to infer whether or not the loadable parts of the ELF have been modified post-link. By having the watermark outside of any PT_LOAD, it is simpler for the external tool to recalculate. For a similar reason, it would actually be better in our case to have the watermark calculated after the build ID value has been "filled-in", as the build ID is inside a PT_LOAD.

Sorry, I am not following. Do you have a concrete readelf -Sl dump to help my understanding?
By linking a trivial executable with ld.lld a.o --watermark -o a, what I see is:

 PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x000118 0x000118 R   0x8
 LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x000174 0x000174 R   0x1000
 GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
 NOTE           0x000158 0x0000000000200158 0x0000000000200158 0x00001c 0x00001c R   0x4

Section to Segment mapping:
 Segment Sections...
  00
  01     .note.llvm.watermark
  02
  03     .note.llvm.watermark

.note.llvm.watermark is still included in a PT_LOAD segment. I think you probably meant:

  • .note.gnu.build-id is included in a PT_LOAD and a PT_NOTE
  • .note.llvm.watermark is included in another PT_NOTE

There is also a question why .note.llvm.watermark should be flagged SHF_ALLOC if it is not supposed to be inspected at runtime.

lld/ELF/InputFiles.cpp
1027
% ld.lld -r --watermark a.o -o b.o
ld.lld: error: Unable to apply watermark because no PT_LOAD segments were found!                               
ld.lld: ../projects/lld/ELF/Writer.cpp:2690: void WriteHash(std::vector<uint8_t> &, const size_t, const size_t,
size_t, const lld::elf::BuildIdKind): Assertion `first < last' failed.
lld/ELF/Writer.cpp
2761

error: missing 'typename' prior to dependent type name 'ELFType<llvm::support::big, false>::Ehdr'

ruiu added a comment.Sep 9 2019, 10:39 PM

It seems the main reason why you guys wanted to avoid an external tool is that it is too easy to forget to run the tool after link. But that can be fixed easily by writing a shell script as ld.lld which invokes the real ld.lld and a watermarking tool. Or, you could make a change to lld so that, after creating an executable file and before existing, lld invokes a watermarking tool on a file that the linker just created (in this configuration, the watermarking tool can either run in the same process or as a child process). Have you considered that approach? I think it is fine to add this feature directly to lld if it is convenient, but I'd like to explore other possibilities before we make a decision.

Besides that, there are a few technical concerns in this patch as below:

  • lld guarantees that the same build id will be computed only when the resulting output file (except the build id part itself) and the linker version are the same. We didn't guarantee that different versions of lld compute the build id in the same way. Actually, we have tweaked hash functions and the strategy for tree hash several times. This contract seems too weak for your use case -- for your use case, we need to guarantee that the way how we compute a hash value doesn't change over time. So, we need to make sure that the current way of hash computation is something that we want to maintain like forever.

    If watermarking doesn't have to be fast (e.g. users only have to do this for release binaries), consider using a simple non-tree hash function.
  • Maybe you should add a version field or something to the note section so that we can change a hash function or something in the future.
  • The watermark feature is to make sure that the program image loaded to memory hasn't changed since its file is created. In that sense, the hash function seems a bit too fragile. If you move a segment within an ELF file, you'd have to change the file offset field of a program header, but the memory image won't change by doing that, so in the sense of watermarking, I don't think it should be considered a change.

It seems the main reason why you guys wanted to avoid an external tool is that it is too easy to forget to run the tool after link. But that can be fixed easily by writing a shell script as ld.lld which invokes the real ld.lld and a watermarking tool. Or, you could make a change to lld so that, after creating an executable file and before existing, lld invokes a watermarking tool on a file that the linker just created (in this configuration, the watermarking tool can either run in the same process or as a child process). Have you considered that approach? I think it is fine to add this feature directly to lld if it is convenient, but I'd like to explore other possibilities before we make a decision.

Besides that, there are a few technical concerns in this patch as below:

  • lld guarantees that the same build id will be computed only when the resulting output file (except the build id part itself) and the linker version are the same. We didn't guarantee that different versions of lld compute the build id in the same way. Actually, we have tweaked hash functions and the strategy for tree hash several times. This contract seems too weak for your use case -- for your use case, we need to guarantee that the way how we compute a hash value doesn't change over time. So, we need to make sure that the current way of hash computation is something that we want to maintain like forever.

    If watermarking doesn't have to be fast (e.g. users only have to do this for release binaries), consider using a simple non-tree hash function.
  • Maybe you should add a version field or something to the note section so that we can change a hash function or something in the future.
  • The watermark feature is to make sure that the program image loaded to memory hasn't changed since its file is created. In that sense, the hash function seems a bit too fragile. If you move a segment within an ELF file, you'd have to change the file offset field of a program header, but the memory image won't change by doing that, so in the sense of watermarking, I don't think it should be considered a change.

Unfortunately the shell available is not particularly capable and because performance is critical we are reluctant to pay the penalty for invocation of an external tool in this way. However, we are going to experiment with invoking a process with lld directly and examine the performance differences.

Of course we would like to maintain the watermark between versions of lld, so we will decouple the buildId and watermark functionality. The inclusion of a version in the note is a good idea should changes be needed in future.

Indeed the order of segments within the ELF file could change without affecting the image loaded to memory. We will modify the watermark computation so that it is agnostic to this ordering.