This is an archive of the discontinued LLVM Phabricator instance.

Paths

lld/ELF/Config.h
lld/ELF/Driver.cpp
lld/ELF/InputFiles.cpp
lld/ELF/Options.td
lld/ELF/SyntheticSections.h
lld/ELF/SyntheticSections.cpp
lld/ELF/Writer.cpp
lld/test/ELF/watermark.s
llvm/include/llvm/BinaryFormat/ELF.h
llvm/include/llvm/Object/Watermark.h
llvm/lib/Object/CMakeLists.txt
llvm/lib/Object/Watermark.cpp
llvm/test/Object/watermark.test

Differential D66426

[lld] Enable a watermark of loadable sections to be generated and placed in a note section
Abandoned, Public

Authored by chrisjackson on Aug 19 2019, 9:56 AM.


Details

Reviewers

espindola
jhenderson
edd
andrewng
ruiu
MaskRay
bd1976llvm

Summary

Add a '--watermark' flag to lld that enables an xxHash of loadable sections to be placed in a note section, '.note.llvm.watermark'. We can then determine whether any loadable sections have been modified since the ELF was linked.

We have selected xxhash to minimise the overhead of calculating the watermark. The functionality already provided for the GNU build Id has been reused where possible. The hash value provided by the watermark must be unaffected by stripping of debug data or symbols. As buildId hashes the entire ELF, it is not suitable.

By ensuring loadable sections have not changed since link-time, we can have confidence that they are compliant with the system ABI. This helps to ensure that changes to system software will not unexpectedly cause the ELF to execute incorrectly. If additional tooling is being used to modify the ELF this would indicate functionality that is lacking in our toolchain and is desired by users.

Event Timeline

chrisjackson created this revision. Aug 19 2019, 9:56 AM

Herald added a reviewer: espindola. Aug 19 2019, 9:56 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, MaskRay, arichardson, emaste.

chrisjackson added reviewers: jhenderson, edd, andrewng. Aug 19 2019, 9:57 AM

davidb added a subscriber: davidb. Aug 19 2019, 10:09 AM

chrisjackson edited the summary of this revision. Aug 20 2019, 2:37 AM

chrisjackson added a reviewer: ruiu.

IIRC lld's --build-id={md5,sha1} was slow at first, but after we made them a tree hash to utilize multiple cores, its cost became negligible. Have you considered taking that approach? This watermark hash is probably not something people would attack with a collision attack, so it might not have to be a cryptographically safe hash, but a cryptographically safe hash function still has nice properties compared to non-cryptographic ones.
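For context on the tree-hash suggestion, here is a minimal sketch of the general technique, not lld's implementation: split the input into fixed-size chunks, hash the chunks independently (so they can run on multiple cores), then hash the concatenation of the chunk digests. SHA-1 and the 1 MiB default chunk size are illustrative assumptions. Note that the chunk size is part of the hash's definition: changing it changes the result, which matters for any hash that must stay stable across linker versions.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def tree_hash(data: bytes, chunk_size: int = 1 << 20) -> bytes:
    """Two-level tree hash: SHA-1 over the concatenated SHA-1 digests of
    fixed-size chunks. chunk_size is an illustrative parameter; changing it
    changes the result, so it is part of the hash's definition."""
    view = memoryview(data)
    chunks = [view[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [view]
    # Chunks are independent, so they can be hashed concurrently. CPython's
    # hashlib releases the GIL for large buffers, letting threads overlap.
    with ThreadPoolExecutor() as pool:
        digests = list(pool.map(lambda chunk: hashlib.sha1(chunk).digest(), chunks))
    return hashlib.sha1(b"".join(digests)).digest()


sample = bytes(range(256)) * 64  # 16 KiB of sample data
print(tree_hash(sample, chunk_size=4096).hex())
```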

I suggest adding a dumper to llvm-readobj and then adding a test to test/ELF/partition-notes.s

lld/ELF/Driver.cpp
969	See `args.hasFlag` above.
lld/ELF/SyntheticSections.cpp
339	Delete `lld::elf::`.
344	5 -> 8, otherwise it is incorrect to use `watermarkBuf = buf + 20;`
351	Delete `lld::elf::`. Delete `llvm::`.
lld/ELF/Writer.cpp
624	Backfill .note.llvm.watermark section content. This is similar to .note.gnu.build-id.
2764	inline the only use of the variable
2767	Delete `parts`
2771	`if (first >= last)` might be better (WriteHash asserts `first < last` though I haven't found a case where first can be equal to last)
lld/test/ELF/watermark.s
2	`generated placed`?
6	`-triple=x86_64 %s -o %t.o` (this is generic, not Linux specific). Use .o for object files.
15	Consider `llvm-readelf -S`. Its output is concise.
16	`llvm-readelf -x .note.llvm.watermark` (prefer `llvm-readelf -x` over `llvm-objdump -s`)
24	[048C]

Another problem is that .note.gnu.build-id is SHF_ALLOC and included in a PT_LOAD segment. When computing watermark, you probably don't want to include its contents. So you should compute build-id and watermark first before you write.

By ensuring loadable sections have not changed since link-time, we can have confidence that they are compliant with the system ABI. This helps to ensure that changes to system software will not unexpectedly cause the ELF to execute incorrectly.

I think I want to hear a bit more about the motivation and how you'd use this feature. If this is used to measure ABI compliance, isn't it too strict? Different optimizations, adding/deleting .note* sections, shuffling .dynsym contents, linker -O1/-O2 (affecting SHF_MERGE sections), etc will change the watermark. Basically you can only do non-PT_LOAD-influencing things like upgrading linkers (lld has an embedded string in .comment), --discard-locals, --strip-debug, etc. I don't see how those things can help you share resources between two links. Those things can also be easily done with a post-processing tool (llvm-objcopy?) Basically, linkers do a lossy transformation. If a feature can be implemented without being affected by the lossy transformation, we should think carefully if it is really the business of the linker.

grimar added a subscriber: grimar. Aug 20 2019, 11:57 PM

In D66426#1637540, @MaskRay wrote:

Another problem is that .note.gnu.build-id is SHF_ALLOC and included in a PT_LOAD segment. When computing watermark, you probably don't want to include its contents. So you should compute build-id and watermark first before you write.

What target are you looking at? I thought generally note sections aren't allocated and are usually safe to remove. If they were included in a segment, then that would be a PT_NOTE segment that isn't nested in a PT_LOAD segment.

In D66426#1639065, @davidb wrote:

What target are you looking at? I thought generally note sections aren't allocated and are usually safe to remove. If they were included in a segment, then that would be a PT_NOTE segment that isn't nested in a PT_LOAD segment.

Actually, never mind. I've just found a few examples of scripts placing the build-id into text so the value can be accessed at runtime.

In D66426#1639069, @davidb wrote:

Actually, never mind. I've just found a few examples of scripts placing the build-id into text so the value can be accessed at runtime.

Unless you mark a SHF_ALLOC output section as NOLOAD or discard it, it should be included in at least one PT_LOAD segment. ELF spec says: "SHF_ALLOC - The section occupies memory during process execution."
SHT_NOTE sections are usually SHF_ALLOC. This is because 1) many are inspected at runtime, and 2) many are expected to be included in core dumps (this applies to .note.ABI-tag, .note.gnu.build-id, .note.tag, ...). Non-SHF_ALLOC SHT_NOTE sections exist in the wild, but they are rare, e.g. GHC uses .debug-ghc-link-info (compiler/main/SysTools/ExtraObj.hs). (They may become more common with https://fedoraproject.org/wiki/Toolchain/Watermark .gnu.build.attributes.) The computation of .note.llvm.watermark cannot depend on the contents of .note.gnu.build-id, so I suggested computing both before writing the contents.

I hope some of you can answer my motivation/justification question above. Whether or not this change is justified to be made into LLD, I think you'll need a llvm-readobj change to dump the note. Also, as I noted above, the llvm-readobj change will help testing the linker feature. You can add the include/llvm/BinaryFormat/ELF.h change to that llvm-readobj patch. It may be worth a llvm-objcopy change, too. One justification is that this watermark can be used to verify llvm-objcopy does not break things (does not alter SHF_ALLOC sections).

MaskRay added a reviewer: MaskRay. Aug 21 2019, 4:47 AM

MaskRay added a subscriber: peter.smith.

MaskRay added a subscriber: rupprecht. Aug 21 2019, 4:50 AM

ikudrin added a subscriber: ikudrin. Aug 21 2019, 5:14 AM

My first reaction is that this seems to be quite a bit of a platform specific feature to build into the linker, it could also make the platform dependent on LLD if this didn't also get into binutils or other ELF linkers. An alternative approach which I believe has been used in other platforms before (for example https://people.freebsd.org/~tmm/elfcksum.c) is to reserve some empty space in the binary that an external tool can post-process to write any checksum/hash etc that you want into it. This is not as convenient but would be compatible with other linkers and not require a LLVM specific extension to ELF.

If this were to go in I think you'd also need to update:

https://llvm.org/docs/Extensions.html with the format of .note.llvm.watermark
The help and docs for LLD.

Thanks for the detailed explanation; it mostly tallies with what I've just read. One thing

In D66426#1639109, @MaskRay wrote:

Non-SHF_ALLOC SHT_NOTE sections exist in the wild, but they are rare

I don't think this case should be considered rare (I actually thought the opposite case was rare...). The ELF spec describes SHT_NOTE section to not have any defining attributes (SHF_ALLOC), and most embedded/memory critical systems won't want notes in memory, so I think this is very much target dependent.

Whether or not this change is justified to be made into LLD, I think you'll need a llvm-readobj change to dump the note.

Namely, please add another case here: https://github.com/llvm/llvm-project/blob/master/llvm/tools/llvm-readobj/ELFDumper.cpp#L4495 (and similarly for LLVMStyle::printNotes). I imagine you can repurpose the getGNUBuildId method.
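What such a dumper has to do is walk the note stream. As a rough sketch of that walk (not llvm-readobj's code, and assuming a little-endian ELF with 4-byte note alignment, the common case for SHT_NOTE), each entry is a 12-byte header (namesz, descsz, type) followed by the name and the descriptor, each padded to a multiple of 4 bytes. The function names are hypothetical.

```python
import struct
from typing import Iterator, Tuple


def align4(n: int) -> int:
    """Round n up to the next multiple of 4 (note alignment assumed here)."""
    return (n + 3) & ~3


def iter_notes(section: bytes) -> Iterator[Tuple[bytes, int, bytes]]:
    """Yield (name, type, descriptor) for each note in a 4-byte-aligned,
    little-endian note section. The name is returned without its NUL."""
    off = 0
    while off + 12 <= len(section):
        namesz, descsz, ntype = struct.unpack_from("<III", section, off)
        off += 12
        name = section[off:off + namesz].rstrip(b"\0")
        off += align4(namesz)  # skip the name and its padding
        desc = section[off:off + descsz]
        off += align4(descsz)  # skip the descriptor and its padding
        if off > len(section):
            raise ValueError("truncated note")
        yield name, ntype, desc
```

A dumper would then switch on (name, type), e.g. to print a GNU build ID or a watermark descriptor as hex.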

It may be worth a llvm-objcopy change, too. One justification is that this watermark can be used to verify llvm-objcopy does not break things (does not alter SHF_ALLOC sections).

Interesting idea. What patch are you suggesting here?

lld/ELF/SyntheticSections.cpp
344	http://www.sco.com/developers/gabi/1998-04-29/ch5.pheader.html#note_section I think 5 is the correct value for namesz; padding exists in the note but is not included in the value of namesz
345	Content -> Descriptor
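To make the namesz point concrete, here is a minimal sketch of a single note's layout. It assumes the name is "LLVM" plus a NUL (an inference from namesz = 5, not stated in the thread), a little-endian target, an 8-byte hash descriptor, and a placeholder type value (the patch's actual NT_LLVM_WATERMARK value is not given here). With namesz = 5, the name is padded to 8 bytes, so the descriptor starts at 12 + 8 = 20, consistent with `watermarkBuf = buf + 20`. The resulting 28-byte (0x1c) note also matches the NOTE segment size in the readelf dump quoted later in the thread.

```python
import struct

NT_LLVM_WATERMARK_PLACEHOLDER = 0x1234  # hypothetical; not the value used by the patch


def align4(n: int) -> int:
    return (n + 3) & ~3


def build_watermark_note(hash64: int, name: bytes = b"LLVM\0") -> bytes:
    """Lay out one ELF note: 12-byte header, padded name, padded descriptor."""
    namesz = len(name)  # counts the NUL terminator but not the padding
    desc = struct.pack("<Q", hash64)
    header = struct.pack("<III", namesz, len(desc), NT_LLVM_WATERMARK_PLACEHOLDER)
    return header + name.ljust(align4(namesz), b"\0") + desc.ljust(align4(len(desc)), b"\0")


note = build_watermark_note(0x0123456789ABCDEF)
namesz, descsz, _ = struct.unpack_from("<III", note)
desc_offset = 12 + align4(namesz)
print(namesz, desc_offset, len(note))  # 5 20 28
```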

Hi all,

Chris is out of the office until the 28th. I'm sure he'll do what he can to address the technical concerns and gaps raised so far upon his return, but I'll try to field some of the questions regarding intent until then.

In D66426#1637540, @MaskRay wrote:

I think I want to hear a bit more about the motivation and how you'd use this feature.

We are primarily concerned about the situation where a studio produces a game for PlayStation that depends on an incidental detail of the OS. The scenario we must avoid whenever possible is an update to the OS interfering with the intended operation of a game. Fixing that is a costly process. We put a lot of effort into making sure our tools do what they can to avoid this situation.

So given an ELF with a watermark and the ability to recalculate the watermark (perhaps with llvm-readelf), we can detect when additional transformations were applied post-link. Such transformations may break some of the invariants that we have been careful to establish and maintain in our supported tooling and workflows. There are many ways in which our users could accidentally introduce fragility, so it isn't water-tight or all-encompassing, but detecting this one situation is nevertheless useful to us as we may decide to explore what transformations were applied to seek (mutual) reassurance or identify a gap in our SDK offering.

In D66426#1636845, @ruiu wrote:

IIRC lld's --build-id={md5,sha1} was slow at first, but after we made them a tree hash to utilize multiple cores, its cost became negligible. Have you considered taking that approach? This watermark hash is probably not something people would attack with a collision attack, so it might not have to be a cryptographically safe hash, but a cryptographically safe hash function still has nice properties compared to non-cryptographic ones.

We don't really need any kind of cryptographic guarantees as the watermark is not intended to be part of a strict gating process. We probably would have considered crc32 if it was already available, but our experiments have shown that xxHash adds negligible overhead. Actually, md5 or sha1 may also be fine but we had no need to explore them given the existence of xxHash.

What really is important is that we only have the content of PT_LOADs contribute to the watermark. This is because we would like to be able to recalculate the watermark post-link via a tool and get the same value back, even if the ELF has since been stripped of metadata (DWARF, .symtab, etc).

In D66426#1637540, @MaskRay wrote:

Another problem is that .note.gnu.build-id is SHF_ALLOC and included in a PT_LOAD segment. When computing watermark, you probably don't want to include its contents. So you should compute build-id and watermark first before you write.

We would like to have two PT_NOTEs, one for use by the OS and another for use by tooling. This is achieved by linker scripts. The second PT_NOTE is outside of any PT_LOAD and this is where the watermark would be housed. As described above, a required property of the watermark is that it can be recalculated by an external tool to infer whether or not the loadable parts of the ELF have been modified post-link. By having the watermark outside of any PT_LOAD, it is simpler for the external tool to recalculate. For a similar reason, it would actually be better in our case to have the watermark calculated after the build ID value has been "filled-in", as the build ID is inside a PT_LOAD.

(Adjusting the code to better accommodate other layouts is certainly something worth considering. I'm just explaining how we intend to make use of it).

In D66426#1639239, @peter.smith wrote:

My first reaction is that this seems to be quite a bit of a platform specific feature to build into the linker, it could also make the platform dependent on LLD if this didn't also get into binutils or other ELF linkers.

This is true, but we have a very similar (although admittedly not identical) feature in our existing proprietary linker and we have a requirement that our customers use the linker supplied with our SDK.

In D66426#1639239, @peter.smith wrote:

An alternative approach which I believe has been used in other platforms before (for example https://people.freebsd.org/~tmm/elfcksum.c) is to reserve some empty space in the binary that an external tool can post-process to write any checksum/hash etc that you want into it. This is not as convenient but would be compatible with other linkers and not require a LLVM specific extension to ELF.

The existence of the watermark is a requirement on PlayStation. Indeed, we would like to avoid mandating an easily-forgotten post-link step. More to the point, adding the watermark via a post-link step would mean post-link modifications could be made before the watermark is added, which rather defeats the point (that may not have been too clear before - sorry).

Thanks,
Edd

pcc added a subscriber: pcc. Aug 23 2019, 12:31 PM

pcc added inline comments.

lld/ELF/Writer.cpp
2764	I guess the right thing to do in the case of multiple partitions would be to compute a separate hash for each partition. But this can always be changed later since the partitions feature is experimental.
2766	Should this exclude the ELF headers if present in a segment? The header fields e_shoff, e_shnum and e_shstrndx can and likely must be rewritten by strip and other tools.

This update applies suggested source corrections, enables readobj to output the note and adds logic to writeWatermark() to exclude the ELF Header and Program Header Table (PHT) from the watermark. In this diff the PHT can only be excluded if it is the first or last segment. If it is neither, an error is emitted.
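One way to read the first-or-last restriction: an excluded range (the ELF header or the PHT) can be trimmed off the front or back of a segment's file range, but not carved out of its middle, since that would split the range in two. A minimal sketch of that rule follows; it is an interpretation, not the patch's code, and the names are hypothetical.

```python
from typing import Tuple

Range = Tuple[int, int]  # half-open [start, end) file offsets


def trim(seg: Range, excluded: Range) -> Range:
    """Remove `excluded` from the front or back of `seg`.

    Raises ValueError if `excluded` lies strictly inside `seg`, mirroring the
    error described for a PHT that is neither first nor last.
    """
    start, end = seg
    ex_start, ex_end = excluded
    if ex_end <= start or ex_start >= end:
        return seg  # no overlap, nothing to trim
    if ex_start <= start:
        return (min(max(start, ex_end), end), end)  # trim the front
    if ex_end >= end:
        return (start, ex_start)  # trim the back
    raise ValueError("excluded range is neither at the start nor the end")


# Layout from the readelf dump later in this thread:
# LOAD [0x0, 0x174), ELF header [0x0, 0x40), PHDR [0x40, 0x158).
rest = trim(trim((0x0, 0x174), (0x0, 0x40)), (0x40, 0x158))
print(hex(rest[0]), hex(rest[1]))  # 0x158 0x174
```

In that layout the remaining [0x158, 0x174) is exactly the 0x1c-byte NOTE holding .note.llvm.watermark itself, and trimming that too leaves an empty range, which is the `first >= last` case a caller should skip rather than assert on.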

Herald added a subscriber: seiya. Sep 6 2019, 3:39 AM

This patch introduces a new ELF extension with a new tag: NT_LLVM_WATERMARK. The last times such ELF extensions were made, accompanying RFCs were posted:

https://lists.llvm.org/pipermail/llvm-dev/2019-February/130583.html
https://lists.llvm.org/pipermail/llvm-dev/2019-March/131004.html

I still have some concerns about the extension, especially its generality on the widely used ELF platforms.

We would like to have two PT_NOTEs, one for use by the OS and another for use by tooling. This is achieved by linker scripts. The second PT_NOTE is outside of any PT_LOAD and this is where the watermark would be housed. As described above, a required property of the watermark is that it can be recalculated by an external tool to infer whether or not the loadable parts of the ELF have been modified post-link. By having the watermark outside of any PT_LOAD, it is simpler for the external tool to recalculate. For a similar reason, it would actually be better in our case to have the watermark calculated after the build ID value has been "filled-in", as the build ID is inside a PT_LOAD.

Sorry, I am not following. Do you have a concrete readelf -Sl dump to help my understanding?
By linking a trivial executable with ld.lld a.o --watermark -o a, what I see is:

 PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x000118 0x000118 R   0x8
 LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x000174 0x000174 R   0x1000
 GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
 NOTE           0x000158 0x0000000000200158 0x0000000000200158 0x00001c 0x00001c R   0x4

Section to Segment mapping:
 Segment Sections...
  00
  01     .note.llvm.watermark
  02
  03     .note.llvm.watermark

.note.llvm.watermark is still included in a PT_LOAD segment. I think you probably meant:

.note.gnu.build-id is included in a PT_LOAD and a PT_NOTE
.note.llvm.watermark is included in another PT_NOTE

There is also a question why .note.llvm.watermark should be flagged SHF_ALLOC if it is not supposed to be inspected at runtime.

lld/ELF/InputFiles.cpp

986

% ld.lld -r --watermark a.o -o b.o
ld.lld: error: Unable to apply watermark because no PT_LOAD segments were found!
ld.lld: ../projects/lld/ELF/Writer.cpp:2690: void WriteHash(std::vector<uint8_t> &, const size_t, const size_t, size_t, const lld::elf::BuildIdKind): Assertion `first < last' failed.

lld/ELF/Writer.cpp

2778

error: missing 'typename' prior to dependent type name 'ELFType<llvm::support::big, false>::Ehdr'

It seems the main reason why you guys wanted to avoid an external tool is that it is too easy to forget to run the tool after linking. But that can be fixed easily by writing a shell script named ld.lld which invokes the real ld.lld and then a watermarking tool. Or, you could make a change to lld so that, after creating an executable file and before exiting, lld invokes a watermarking tool on the file that the linker just created (in this configuration, the watermarking tool can run either in the same process or as a child process). Have you considered that approach? I think it is fine to add this feature directly to lld if it is convenient, but I'd like to explore other possibilities before we make a decision.
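The wrapper alternative could look roughly like this sketch: run the real linker, and only if it succeeds, run a watermarking tool on the output file. Both command lines are hypothetical placeholders supplied by the caller, the output-path parsing covers only the `-o X` and `--output=X` spellings, and falling back to a.out assumes the linker's default output name.

```python
import subprocess
import sys
from typing import List, Optional


def output_path(args: List[str]) -> Optional[str]:
    """Find the output file in a linker command line (only `-o X` and `--output=X`)."""
    for i, arg in enumerate(args):
        if arg == "-o" and i + 1 < len(args):
            return args[i + 1]
        if arg.startswith("--output="):
            return arg[len("--output="):]
    return None


def link_then_watermark(linker: List[str], tool: List[str], args: List[str]) -> int:
    """Run the linker; if it succeeds, run the watermarking tool on its output."""
    rc = subprocess.call(linker + args)
    if rc != 0:
        return rc  # propagate the linker failure without watermarking
    return subprocess.call(tool + [output_path(args) or "a.out"])
```

A wrapper installed under the linker's name would call, for example, `sys.exit(link_then_watermark(real_linker_cmd, tool_cmd, sys.argv[1:]))`. The cost of this design is an extra process launch and a re-read of the output file.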

Besides that, there are a few technical concerns in this patch as below:

lld guarantees that the same build id will be computed only when the resulting output file (except the build id part itself) and the linker version are the same. We didn't guarantee that different versions of lld compute the build id in the same way. Actually, we have tweaked hash functions and the strategy for tree hash several times. This contract seems too weak for your use case: for your use case, we need to guarantee that the way we compute a hash value doesn't change over time. So, we need to make sure that the current way of hash computation is something that we want to maintain essentially forever.

If watermarking doesn't have to be fast (e.g. users only have to do this for release binaries), consider using a simple non-tree hash function.
Maybe you should add a version field or something to the note section so that we can change a hash function or something in the future.
The watermark feature is to make sure that the program image loaded to memory hasn't changed since its file is created. In that sense, the hash function seems a bit too fragile. If you move a segment within an ELF file, you'd have to change the file offset field of a program header, but the memory image won't change by doing that, so in the sense of watermarking, I don't think it should be considered a change.

In D66426#1664254, @ruiu wrote:

It seems the main reason why you guys wanted to avoid an external tool is that it is too easy to forget to run the tool after linking. But that can be fixed easily by writing a shell script named ld.lld which invokes the real ld.lld and then a watermarking tool. Or, you could make a change to lld so that, after creating an executable file and before exiting, lld invokes a watermarking tool on the file that the linker just created (in this configuration, the watermarking tool can run either in the same process or as a child process). Have you considered that approach? I think it is fine to add this feature directly to lld if it is convenient, but I'd like to explore other possibilities before we make a decision.

Besides that, there are a few technical concerns in this patch as below:

lld guarantees that the same build id will be computed only when the resulting output file (except the build id part itself) and the linker version are the same. We didn't guarantee that different versions of lld compute the build id in the same way. Actually, we have tweaked hash functions and the strategy for tree hash several times. This contract seems too weak for your use case: for your use case, we need to guarantee that the way we compute a hash value doesn't change over time. So, we need to make sure that the current way of hash computation is something that we want to maintain essentially forever.

If watermarking doesn't have to be fast (e.g. users only have to do this for release binaries), consider using a simple non-tree hash function.

Maybe you should add a version field or something to the note section so that we can change a hash function or something in the future.

The watermark feature is to make sure that the program image loaded to memory hasn't changed since its file is created. In that sense, the hash function seems a bit too fragile. If you move a segment within an ELF file, you'd have to change the file offset field of a program header, but the memory image won't change by doing that, so in the sense of watermarking, I don't think it should be considered a change.

Unfortunately the shell available is not particularly capable and because performance is critical we are reluctant to pay the penalty for invocation of an external tool in this way. However, we are going to experiment with invoking a process with lld directly and examine the performance differences.

Of course we would like to maintain the watermark between versions of lld, so we will decouple the buildId and watermark functionality. The inclusion of a version in the note is a good idea should changes be needed in future.
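A versioned descriptor can be as simple as a version word ahead of the hash, so that a checker refuses, rather than misreports, a watermark it does not know how to recompute. The layout below is purely hypothetical (a 4-byte version, 4 reserved bytes keeping the 64-bit hash 8-byte aligned, then the hash), little-endian.

```python
import struct

WATERMARK_DESC = struct.Struct("<IIQ")  # hypothetical: version, reserved, 64-bit hash
KNOWN_VERSIONS = {1}  # hypothetical: version 1 = the current hash algorithm


def pack_desc(version: int, hash64: int) -> bytes:
    """Build a descriptor; the reserved word is written as zero."""
    return WATERMARK_DESC.pack(version, 0, hash64)


def unpack_desc(desc: bytes) -> int:
    """Return the hash, or raise if the descriptor's version is unknown."""
    version, _reserved, hash64 = WATERMARK_DESC.unpack(desc)
    if version not in KNOWN_VERSIONS:
        raise ValueError(f"unsupported watermark version {version}")
    return hash64
```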

Indeed the order of segments within the ELF file could change without affecting the image loaded to memory. We will modify the watermark computation so that it is agnostic to this ordering.

@ruiu observed that the watermark computation was reliant on the segment ordering in the ELF file. This ordering can change without affecting the loadable image. Therefore, we now apply an ordering based on the segment's virtual address when calculating the watermark.
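The ordering change can be illustrated with a small sketch: select the PT_LOAD program headers, sort them by p_vaddr, and feed each segment's address, size and file bytes to the hash. Because p_offset is used only to locate the bytes, moving a segment within the file does not change the result. blake2b with an 8-byte digest stands in for xxHash (which is not in Python's standard library), and mixing p_vaddr and p_filesz into the hash is a choice of this sketch, not necessarily the patch's.

```python
import hashlib
import struct
from typing import List, NamedTuple

PT_LOAD = 1  # program header type from the ELF specification


class Phdr(NamedTuple):
    p_type: int
    p_offset: int
    p_vaddr: int
    p_filesz: int


def watermark(image: bytes, phdrs: List[Phdr]) -> int:
    """Hash PT_LOAD contents in virtual-address order, independent of file order."""
    loads = sorted((p for p in phdrs if p.p_type == PT_LOAD), key=lambda p: p.p_vaddr)
    if not loads:
        raise ValueError("no PT_LOAD segments")  # e.g. a relocatable (-r) output
    h = hashlib.blake2b(digest_size=8)  # stand-in for a 64-bit xxHash
    for p in loads:
        h.update(struct.pack("<QQ", p.p_vaddr, p.p_filesz))
        h.update(image[p.p_offset:p.p_offset + p.p_filesz])
    return int.from_bytes(h.digest(), "little")
```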

@MaskRay our linker scripts prevent the watermark note section from being present in PT_LOAD segments. Logic has been added to prevent the section from being used in the watermark computation if it overlaps with a PT_LOAD. SHF_ALLOC has also been removed.

The readobj functionality for the new note section has been moved to D70316.

We carried out extensive tests using an external process to compute the watermark but unfortunately the penalty for accessing the file even when in the system file cache was greater than we are willing to pay.

Herald added a subscriber: mgrang. Nov 19 2019, 5:58 AM

I think this feature is worth more attention. There may be someone who wants to start using this once it's landed, and I'd like to make sure that we satisfy their needs. Do you mind if I ask you to start a thread on llvm-dev to propose this feature? I think that even a comment like "we'll use this" is valuable.

lld/ELF/Writer.cpp
2755	Do you think you can move the new code to watermark.{cpp,h} and add a file comment to explain (1) what this is and (2) how the watermark is computed?

In D66426#1664187, @MaskRay wrote:
This patch introduces a new ELF extension with a new tag: NT_LLVM_WATERMARK. The last times such ELF extensions were made, accompanying RFCs were posted:

https://lists.llvm.org/pipermail/llvm-dev/2019-February/130583.html
https://lists.llvm.org/pipermail/llvm-dev/2019-March/131004.html

I still have some concerns about the extension, especially its generality on the widely used ELF platforms.

We would like to have two PT_NOTEs, one for use by the OS and another for use by tooling. This is achieved by linker scripts. The second PT_NOTE is outside of any PT_LOAD and this is where the watermark would be housed. As described above, a required property of the watermark is that it can be recalculated by an external tool to infer whether or not the loadable parts of the ELF have been modified post-link. By having the watermark outside of any PT_LOAD, it is simpler for the external tool to recalculate. For a similar reason, it would actually be better in our case to have the watermark calculated after the build ID value has been "filled-in", as the build ID is inside a PT_LOAD.

Sorry, I am not following. Do you have a concrete readelf -Sl dump to help my understanding?
By linking a trivial executable with ld.lld a.o --watermark -o a, what I see is:
 PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x000118 0x000118 R   0x8
 LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x000174 0x000174 R   0x1000
 GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
 NOTE           0x000158 0x0000000000200158 0x0000000000200158 0x00001c 0x00001c R   0x4

Section to Segment mapping:
 Segment Sections...
  00
  01     .note.llvm.watermark
  02
  03     .note.llvm.watermark
.note.llvm.watermark is still included in a PT_LOAD segment. I think you probably meant:

.note.gnu.build-id is included in a PT_LOAD and a PT_NOTE

.note.llvm.watermark is included in another PT_NOTE

There is also a question why .note.llvm.watermark should be flagged SHF_ALLOC if it is not supposed to be inspected at runtime.

I'll provide a readelf -Sl dump that illustrates the note-to-segment mapping.

In D66426#1752900, @ruiu wrote:

I think this feature is worth more attention. There may be someone who wants to start using this once it's landed, and I'd like to make sure that we satisfy their needs. Do you mind if I ask you to start a thread on llvm-dev to propose this feature? I think that even a comment like "we'll use this" is valuable.

Of course, I will post to llvm-dev.

chrisjackson marked an inline comment as done. Nov 20 2019, 8:37 AM

chrisjackson added inline comments.

lld/ELF/Writer.cpp
2755	I have begun work on a revision with the watermarking code in a separate library so that it can be shared with the utility that checks the watermark.

aganea added a subscriber: aganea. Nov 21 2019, 5:28 AM

JonChesterfield added a subscriber: JonChesterfield. Nov 27 2019, 3:10 AM

I sympathize with the requirement to tell whether a binary has been edited after the link step. E.g. one could then raise an error from the loader.

Writing 8 bytes to a known location in the binary can't achieve that. Whatever post link modification is performed will recalculate and update the hash in the binary. If I understand correctly, the plan is to enhance a llvm binary utility to conveniently perform this updating, or at least to provide the 8 bytes to be written into the known location by any other tool.

So I see the cost - lld and other tools get more complicated - and I see the requirement - but I can't see how the proposed change meets the requirement.

In D66426#1761510, @JonChesterfield wrote:

I sympathize with the requirement to tell whether a binary has been edited after the link step. E.g. one could then raise an error from the loader.

Writing 8 bytes to a known location in the binary can't achieve that. Whatever post link modification is performed will recalculate and update the hash in the binary. If I understand correctly, the plan is to enhance a llvm binary utility to conveniently perform this updating, or at least to provide the 8 bytes to be written into the known location by any other tool.

So I see the cost - lld and other tools get more complicated - and I see the requirement - but I can't see how the proposed change meets the requirement.

A post-link modification could recalculate and update the hash, but this would only occur in a deliberate attempt to subvert the watermark mechanism. The watermark is not intended to detect all, e.g. nefarious, post-link modifications. It is not a security feature.

In D66426#1761836, @chrisjackson wrote:

A post-link modification could recalculate and update the hash, but this would only occur in a deliberate attempt to subvert the watermark mechanism

I think it follows that this patch only detects accidental modifications to the binary that occur after linking. That seems to put it in the realm of network transmission errors, disk bit rot, optical media errors and so forth.

In which case, why only guard a subset of the binary, instead of computing a sha256 of all the compiled artifacts and checking that at install/network copy time? Then there is again no linker patch required.

Unless this is intended to catch people who deliberately change the binary, but lack the skills to then update the hash, which is surely vanishingly few people. Fewer when provided with convenient tools to recalculate the hash.

@chrisjackson You replied via email, so there is no record on Phabricator. I am attaching your response below.

In D66426#1766527, @JonChesterfield wrote:

In D66426#1761836, @chrisjackson wrote:

A post-link modification could recalculate and update the hash, but this would only occur in a deliberate attempt to subvert the watermark mechanism

I think it follows that this patch only detects accidental modifications to the binary that occur after linking. That seems to put it in the realm of network transmission errors, disk bit rot, optical media errors and so forth.

In which case, why only guard a subset of the binary, instead of computing a sha256 of all the compiled artifacts and checking that at install/network copy time? Then there is again no linker patch required.

Unless this is intended to catch people who deliberately change the binary, but lack the skills to then update the hash, which is surely vanishingly few people. Fewer when provided with convenient tools to recalculate the hash.

@chrisjackson wrote:
The watermark is intended to detect changes in the loadable image of the binary, not all of the ELF file e.g. ignore debug data. As you've stated, it is there to detect post-link modifications to the loadable segments.

https://lists.llvm.org/pipermail/llvm-dev/2019-November/137319.html

The whole point of the watermark is to show that no post-link modifications have been made, and if the watermark itself is added post-link, it does not achieve this aim: someone could either deliberately or accidentally add a step prior to the watermarking happening.

I am still confused. What I infer from the sentence is that strip/llvm-strip is still allowed. To make .note.llvm.watermark survive strip/llvm-strip, you place it into a PT_NOTE segment. So post-link modification is still possible, then why can't you use another tool to compute the watermark and append a section? In my comment, there are some other questions that are not answered. I have suggested an approach that will not slow down the whole build time.

In D66426#1767580, @MaskRay wrote:

The whole point of the watermark is to show that no post-link modifications have been made, and if the watermark itself is added post-link, it does not achieve this aim: someone could either deliberately or accidentally add a step prior to the watermarking happening.

I am still confused. What I infer from the sentence is that strip/llvm-strip is still allowed. To make .note.llvm.watermark survive strip/llvm-strip, you place it into a PT_NOTE segment. So post-link modification is still possible, then why can't you use another tool to compute the watermark and append a section? In my comment, there are some other questions that are not answered. I have suggested an approach that will not slow down the whole build time.

(For clarity, @chrisjackson and I are on the same team, and I've been helping him with this). The problem with any post-link external tool used to create the watermark is that it doesn't prevent something happening between the link step and the watermarking step. For example, this would not be detected: 1) Do link; 2) Make a modification to the .data section; 3) Run the watermark tool.

Stripping (and other things that don't affect the loadable image) is allowed, because it doesn't affect the loadable image. The aim of the watermark is to detect loadable data changes.
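The distinction between the loadable image and the rest of the file can be illustrated with a small C++ sketch. It is not the patch's code: the `Segment` struct is a hypothetical, simplified stand-in for real program headers, FNV-1a stands in for the xxHash mentioned in the summary, and the function names are made up for illustration. It hashes only the bytes covered by PT_LOAD segments and, for contrast, every byte of the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified program header with just the fields this sketch
// needs (a real tool would read Elf64_Phdr entries from the file).
struct Segment {
  uint32_t type;     // p_type
  uint64_t offset;   // p_offset: where the segment's bytes start in the file
  uint64_t fileSize; // p_filesz: how many file bytes the segment covers
};

constexpr uint32_t kPtLoad = 1; // PT_LOAD in the ELF gABI
constexpr uint32_t kPtNote = 4; // PT_NOTE in the ELF gABI

constexpr uint64_t kFnvOffsetBasis = 0xcbf29ce484222325ULL;
constexpr uint64_t kFnvPrime = 0x100000001b3ULL;

// 64-bit FNV-1a, used here as a stand-in for xxHash. Continues from state h.
uint64_t fnv1a(const uint8_t *data, size_t len, uint64_t h = kFnvOffsetBasis) {
  for (size_t i = 0; i < len; ++i) {
    h ^= data[i];
    h *= kFnvPrime;
  }
  return h;
}

// Hash only the file bytes covered by PT_LOAD segments, in program header
// order. Bytes outside those ranges (e.g. non-allocated debug sections,
// symbol tables, the section header table) do not contribute, so removing
// them leaves the result unchanged as long as the PT_LOAD ranges themselves
// are not moved or altered.
uint64_t loadableWatermark(const std::vector<uint8_t> &file,
                           const std::vector<Segment> &phdrs) {
  uint64_t h = kFnvOffsetBasis;
  for (const Segment &seg : phdrs) {
    if (seg.type != kPtLoad)
      continue;
    // Skip malformed entries; a real tool should report them instead.
    if (seg.offset > file.size() || seg.fileSize > file.size() - seg.offset)
      continue;
    h = fnv1a(file.data() + seg.offset, seg.fileSize, h);
  }
  return h;
}

// For contrast: a hash over every byte of the file, similar in spirit to
// the whole-file build ID described in the summary.
uint64_t wholeFileHash(const std::vector<uint8_t> &file) {
  return fnv1a(file.data(), file.size());
}
```

With this sketch, changing a byte outside every PT_LOAD range leaves `loadableWatermark` unchanged while `wholeFileHash` changes, and changing a byte inside a PT_LOAD range changes both. Hashing segment ranges rather than sections keeps the result independent of section-level bookkeeping that strip tools rewrite.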

Build times should include the watermarking process, since it is part of creating a release build (just as, e.g., llvm-strip should be). Thus, saying that an external tool will not slow down build times is incorrect.

In my comment, there are some other questions that are not answered. I have suggested an approach that will not slow down the whole build time.

I think you are referring to adding this to llvm-objcopy, yes? I think my above point should address this.

Watermark functionality is now placed in a separate source file and header in the Object library, which are used by Writer.cpp and SyntheticSections.cpp.

Herald added subscribers: hiraditya, mgorny. Dec 5 2019, 11:21 AM

In D66426#1768793, @jhenderson wrote:

(For clarity, @chrisjackson and I are on the same team, and I've been helping him with this). The problem with any post-link external tool used to create the watermark is that it doesn't prevent something happening between the link step and the watermarking step. For example, this would not be detected: 1) Do link; 2) Make a modification to the .data section; 3) Run the watermark tool.

Stripping (and other things that don't affect the loadable image) is allowed, because it doesn't affect the loadable image. The aim of the watermark is to detect loadable data changes.

As I understand it, the scenario is:

1) Do link; 2) Run the watermark tool to append .note.llvm.watermark; 3) Release SDK; 4) Downstream vendors modify .data and ship to end users; 5) End users verify that .note.llvm.watermark does not match the computed watermark of the loadable contents.

The build steps before 3) are all controlled. The process should ensure there is no modification to .data between 1) and 2). How do you guarantee that a linker-side feature can prevent modification? How can you prevent the following:

1) Do link and generate .note.llvm.watermark in one step; 1.5) Modify .data; 2) Run the watermark tool to update .note.llvm.watermark.

Build times should include the watermarking process, since that is part of creating a release build (just as e.g. llvm-strip etc should be too). Thus, saying that an external tool will not slow down build times is incorrect.

The watermark tool can append .note.llvm.watermark to the executable. It just has to rewrite the section header table at the end of the ELF (usually a few hundred bytes). This is not slower than a built-in linker feature.

In my comment, there are some other questions that are not answered. I have suggested an approach that will not slow down the whole build time.

I think you are referring to adding this to llvm-objcopy, yes? I think my above point should address this.

In D66426#1773537, @MaskRay wrote:

As I understand it, the scenario is:

1) Do link; 2) Run the watermark tool to append .note.llvm.watermark; 3) Release SDK; 4) Downstream vendors modify .data and ship to end users; 5) End users verify that .note.llvm.watermark does not match the computed watermark of the loadable contents.

The build steps before 3) are all controlled. The process should ensure there is no modification to .data between 1) and 2). How do you guarantee that a linker-side feature can prevent modification? How can you prevent the following:

1) Do link and generate .note.llvm.watermark in one step; 1.5) Modify .data; 2) Run the watermark tool to update .note.llvm.watermark.

This is one motivation for not doing the watermarking in an external tool. That being said, as @chrisjackson has said on more than one occasion, this isn't intended to be a security feature, so we are not attempting to detect a malicious attacker. I suppose downstream LLD producers could add a local salt to the watermark to make it more secure, should they so choose.

Build times should include the watermarking process, since that is part of creating a release build (just as e.g. llvm-strip etc should be too). Thus, saying that an external tool will not slow down build times is incorrect.

The watermark tool can append .note.llvm.watermark to the executable. It just has to rewrite the section header table at the end of the ELF (usually a few hundred bytes). This is not slower than a built-in linker feature.

It really doesn't matter which is slower: if the watermarking is done by an external tool rather than by the linker, things can be done post-link, and therefore the thing being watermarked is not the output of the linker. The main goal of this is to show that there have been no changes to the loadable part of the linker output, whether controlled by the user of the linker or not.

I'm not sure whether you've missed how the process is expected to go. Here's a recap:

1) The user calls the linker with --watermark specified, producing output.elf with a watermark.
2) Optionally, a user might choose to make modifications that affect only non-loadable data, e.g. llvm-objcopy --strip-all output.elf.
3) A user (probably a different user) runs a tool to validate that the watermark is still correct. If it is not, they can report it to the producer of output.elf.

Any attempt to modify e.g. .data between steps 1 and 3, either accidentally or deliberately, will be detected, unless the user explicitly tries to defeat the watermarking, which, as mentioned, is not something this feature on its own tries to detect. An external tool that does the watermarking post-link (i.e. after step 1) would let users make modifications before the watermark is computed; those are exactly the modifications the watermark is meant to detect, and in that case it would not detect them.
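Step 3 needs a checker that reads the stored watermark back out of the note and compares it with a recomputed value. Below is a minimal C++ sketch of that side, under stated assumptions: little-endian data, a single note record already extracted from its section, and a note name of "LLVM" with type 1, both hypothetical values chosen for illustration. The functions `parseNote` and `watermarkMatches` are made up for this sketch. It follows the gABI note layout, where the descriptor starts at 12 plus namesz rounded up to a multiple of 4, so a namesz of 5 still places the descriptor at offset 20 (the point debated in the inline comments on `write32(buf, 5)`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// One ELF note record as laid out by the gABI: 4-byte namesz, descsz and
// type fields, then the name and the descriptor, each padded to 4 bytes.
struct Note {
  std::string name; // excludes the terminating NUL counted in namesz
  uint32_t type;
  std::vector<uint8_t> desc;
};

// Assumes little-endian (ELFDATA2LSB) input.
static uint32_t readLE32(const uint8_t *p) {
  return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
         uint32_t(p[3]) << 24;
}

static size_t alignTo4(size_t n) { return (n + 3) & ~size_t(3); }

// Parse the first note record in buf; returns std::nullopt if truncated.
std::optional<Note> parseNote(const std::vector<uint8_t> &buf) {
  if (buf.size() < 12)
    return std::nullopt;
  uint32_t namesz = readLE32(buf.data());
  uint32_t descsz = readLE32(buf.data() + 4);
  uint32_t type = readLE32(buf.data() + 8);
  // namesz does not include the padding, so the descriptor starts after the
  // padded name: namesz == 5 ("LLVM\0") puts it at offset 12 + 8 == 20.
  size_t descOff = 12 + alignTo4(namesz);
  if (descOff > buf.size() || descsz > buf.size() - descOff)
    return std::nullopt;
  Note note;
  note.name.assign(reinterpret_cast<const char *>(buf.data() + 12),
                   namesz > 0 ? namesz - 1 : 0);
  note.type = type;
  note.desc.assign(buf.begin() + descOff, buf.begin() + descOff + descsz);
  return note;
}

// Step 3: compare the stored watermark with one recomputed from the file.
bool watermarkMatches(const Note &note, const std::vector<uint8_t> &recomputed) {
  return note.desc == recomputed;
}
```

A real checker would locate the note through the section or program headers and compute `recomputed` from the loadable segments; later in this thread, that role was proposed for llvm-readobj (D70316).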

Modified extractSegments() in Watermark.h.

As I understand it, the scenario is:

1) Do link; 2) Run the watermark tool to append .note.llvm.watermark; 3) Release SDK; 4) Downstream vendors modify .data and ship to end users; 5) End users verify that .note.llvm.watermark does not match the computed watermark of the loadable contents.

The build steps before 3) are all controlled. The process should ensure there is no modification to .data between 1) and 2). How do you guarantee that a linker-side feature can prevent modification? How can you prevent the following:

1) Do link and generate .note.llvm.watermark in one step; 1.5) Modify .data; 2) Run the watermark tool to update .note.llvm.watermark.

This is one motivation for not doing the watermarking in an external tool. That being said, as @chrisjackson has said on more than one occasion, this isn't intended to be a security feature, so we are not attempting to detect a malicious attacker. I suppose downstream LLD producers could add a local salt to the watermark to make it more secure, should they so choose.

This is actually going to be an interesting problem. Do your users make post-link modifications to executables intentionally or by accident? If it's intentional, you are starting an arms race: they'll catch up by adding an --update-watermark option or something similar to their tools, so that the watermark is updated whenever a binary is modified, which defeats the point of this change.

In D66426#1778936, @ruiu wrote:

This is actually going to be an interesting problem. Do your users make post-link modifications to executables intentionally or by accident? If it's intentional, you are starting an arms race: they'll catch up by adding an --update-watermark option or something similar to their tools, so that the watermark is updated whenever a binary is modified, which defeats the point of this change.

The watermark is intended as a safety measure for several post-link scenarios:

- A user deliberately modifies a loadable segment but is unaware that they shouldn't.
- A user accidentally modifies a loadable segment.
- A tool somewhere in the build system has unexpected behaviour that modifies a loadable segment.

If a user intentionally modifies a loadable segment and updates the watermark, that is nefarious behaviour, which the watermark is not intended to prevent.

Sorry for asking too many questions, but how do you verify that a watermark matches the contents? Are you creating a new command?

In D66426#1781107, @ruiu wrote:

Sorry for asking too many questions, but how do you verify that a watermark matches the contents? Are you creating a new command?

@chrisjackson has proposed it as an option to add to llvm-readobj (see D70316).

jhenderson mentioned this in D70316: [llvm-readobj] Allow printing of the watermark note section proposed in D66426. Dec 12 2019, 2:22 AM

Partially guarding against a user accidentally or incompetently modifying a binary isn't sufficiently useful to justify adding code to lld in my opinion.

In the spirit of (belated) full disclosure, I'm following this patch because I recognise a similar feature from a proprietary linker and it makes me sad to see it replicated.

However, I haven't contributed any code to lld, so will bow out at this point. It's not my call.

Ping

Hello @ruiu and @MaskRay, can I help with any more technical queries concerning this proposal, or concerning the associated D70316, which enables llvm-readobj to print watermark note sections and compute watermarks?

ping

Apologies for the late reply.

http://lists.llvm.org/pipermail/llvm-dev/2019-November/137108.html did not evoke many responses. I take the lack of responses on the RFC, together with the questions from @JonChesterfield and @ruiu, to mean that people still doubt whether this feature is generic enough to benefit the community, rather than being a feature used in a very specific scenario that could easily be replaced by another tool. I have also asked some other folks but haven't received a positive reaction yet. I concur with what @ruiu said. I feel this just escalates an arms race, which may not be necessary in the first place if the process can be improved (why can the build system modify contents after linking and before a binary manipulation tool like llvm-objcopy runs?).

I think this feature, as it stands, is not quite justified as a linker feature. Adding it to llvm-objcopy is probably fine, but I hope we can aim for something bigger. The statement "compliant with the system ABI" is pretty vague. Its defined meaning here is: "if we are byte identical, then we are ABI compliant." This is obviously too strong and does not reflect reality. If we want to make sure external symbols (part of the ABI) do not change, we can use something like interface shared objects. Fedora is doing something more generic (see https://fedoraproject.org/wiki/Toolchain/Watermark). If we want to add an LLVM watermark, I hope we can make it more generally useful.

In D66426#1814738, @MaskRay wrote:

Apologies for the late reply.

http://lists.llvm.org/pipermail/llvm-dev/2019-November/137108.html did not evoke many responses. I take the lack of responses on the RFC, together with the questions from @JonChesterfield and @ruiu, to mean that people still doubt whether this feature is generic enough to benefit the community, rather than being a feature used in a very specific scenario that could easily be replaced by another tool.

While the current applicability of this feature does appear to be limited, given the lack of responses, I don't think the proposed feature is alone in having a limited userbase. For example, I'm not sure how widespread the use of ELF partitioning is, and it is much more intrusive (to be clear, I'm not suggesting partitioning should not be part of lld). The feature must be part of the linker, because an external tool does not prevent a modification post-link (see @edd's earlier comment), which would defeat the purpose of the watermark.

It was my understanding that I had the code owner's blessing for the watermarking feature but of course this may have changed.

I have also asked some other folks but haven't received a positive reaction yet. I concur with what @ruiu said. I feel this just escalates an arms race, which may not be necessary in the first place if the process can be improved (why can the build system modify contents after linking and before a binary manipulation tool like llvm-objcopy runs?).

There cannot be an arms race as this is not a security feature. I think perhaps this feature is best thought of as build-id for PT_LOAD segments. It provides a tool for detecting build systems that act in this questionable manner. Why the build systems behave this way would be determined and corrected afterwards. Would a change in name to something like 'loadable-build-id' cause less confusion?

I think this feature, as it stands, is not quite justified as a linker feature. Adding it to llvm-objcopy is probably fine, but I hope we can aim for something bigger.
The statement "compliant with the system ABI" is pretty vague. Its defined meaning here is: "if we are byte identical, then we are ABI compliant." This is obviously too strong and does not reflect reality. If we want to make sure external symbols (part of the ABI) do not change, we can use something like interface shared objects. Fedora is doing something more generic (see https://fedoraproject.org/wiki/Toolchain/Watermark). If we want to add an LLVM watermark, I hope we can make it more generally useful.

While this is a very strong statement, it is true. We don't think that the watermark feature is a perfect ABI compliance checking utility (indeed perhaps referring to the ABI was a mistake on my part), but it does enforce the requirements @edd highlighted.

ping

It would seem that there just isn't sufficient interest in this, so I'm going to mark this revision as abandoned if there are no further comments.

chrisjackson abandoned this revision. Feb 19 2020, 1:39 AM

Revision Contents

lld/ELF/Config.h (1 line)
lld/ELF/Driver.cpp (1 line)
lld/ELF/InputFiles.cpp (12 lines)
lld/ELF/Options.td (4 lines)
lld/ELF/SyntheticSections.h (20 lines)
lld/ELF/SyntheticSections.cpp (19 lines)
lld/ELF/Writer.cpp (54 lines)
lld/test/ELF/watermark.s (24 lines)
llvm/include/llvm/BinaryFormat/ELF.h (1 line)
llvm/include/llvm/Object/Watermark.h (84 lines)
llvm/lib/Object/CMakeLists.txt (1 line)
llvm/lib/Object/Watermark.cpp (96 lines)
llvm/test/Object/watermark.test (109 lines)

Diff 232819

lld/ELF/Config.h

@@ ... @@ struct Configuration {
   bool tocOptimize;
   bool undefinedVersion;
   bool useAndroidRelrTags = false;
   bool warnBackrefs;
   bool warnCommon;
   bool warnIfuncTextrel;
   bool warnMissingEntry;
   bool warnSymbolOrdering;
+  bool watermark;
   bool writeAddends;
   bool zCombreloc;
   bool zCopyreloc;
   bool zGlobal;
   bool zHazardplt;
   bool zIfuncNoplt;
   bool zInitfirst;
   bool zInterpose;

lld/ELF/Driver.cpp

@@ ... @@ static void readConfigs(opt::InputArgList &args) {
   config->unresolvedSymbols = getUnresolvedSymbolPolicy(args);
   config->warnBackrefs =
       args.hasFlag(OPT_warn_backrefs, OPT_no_warn_backrefs, false);
   config->warnCommon = args.hasFlag(OPT_warn_common, OPT_no_warn_common, false);
   config->warnIfuncTextrel =
       args.hasFlag(OPT_warn_ifunc_textrel, OPT_no_warn_ifunc_textrel, false);
   config->warnSymbolOrdering =
       args.hasFlag(OPT_warn_symbol_ordering, OPT_no_warn_symbol_ordering, true);
+  config->watermark = args.hasFlag(OPT_watermark, OPT_no_watermark, false);
MaskRay (unsubmitted inline comment): See `args.hasFlag` above.
   config->zCombreloc = getZFlag(args, "combreloc", "nocombreloc", true);
   config->zCopyreloc = getZFlag(args, "copyreloc", "nocopyreloc", true);
   config->zGlobal = hasZOption(args, "global");
   config->zGnustack = getZGnuStack(args);
   config->zHazardplt = hasZOption(args, "hazardplt");
   config->zIfuncNoplt = hasZOption(args, "ifunc-noplt");
   config->zInitfirst = hasZOption(args, "initfirst");
   config->zInterpose = hasZOption(args, "interpose");

lld/ELF/InputFiles.cpp

@@ ... @@ static uint32_t readAndFeatures(ObjFile<ELFT> *obj, ArrayRef<uint8_t> data) {
   while (!data.empty()) {
     // Read one NOTE record.
     if (data.size() < sizeof(Elf_Nhdr))
       fatal(toString(obj) + ": .note.gnu.property: section too short");

     auto *nhdr = reinterpret_cast<const Elf_Nhdr *>(data.data());
     if (data.size() < nhdr->getSize())
       fatal(toString(obj) + ": .note.gnu.property: section too short");

     Elf_Note note(*nhdr);
     if (nhdr->n_type != NT_GNU_PROPERTY_TYPE_0 || note.getName() != "GNU") {
       data = data.slice(nhdr->getSize());
       continue;
     }

     uint32_t featureAndType = config->emachine == EM_AARCH64
                                   ? GNU_PROPERTY_AARCH64_FEATURE_1_AND
@@ ... @@ InputSectionBase *ObjFile<ELFT>::createInputSection(const Elf_Shdr &sec) {
   // files contain definitions of symbol "__x86.get_pc_thunk.bx" in linkonce
   // sections. Drop those sections to avoid duplicate symbol errors.
   // FIXME: This is glibc PR20543, we should remove this hack once that has been
   // fixed for a while.
   if (name == ".gnu.linkonce.t.__x86.get_pc_thunk.bx" ||
       name == ".gnu.linkonce.t.__i686.get_pc_thunk.bx")
     return &InputSection::discarded;

-  // If we are creating a new .build-id section, strip existing .build-id
-  // sections so that the output won't have more than one .build-id.
+  // If we are creating a new .build-id section or watermark, strip existing
+  // sections so that the output won't have more than one.
   // This is not usually a problem because input object files normally don't
-  // have .build-id sections, but you can create such files by
-  // "ld.{bfd,gold,lld} -r --build-id", and we want to guard against it.
+  // have .build-id sections or watermark, but you can create such files by
+  // "ld.{bfd,gold,lld} -r --build-id/--watermark", and we want to guard against it.
MaskRay (unsubmitted inline comment):
% ld.lld -r --watermark a.o -o b.o
ld.lld: error: Unable to apply watermark because no PT_LOAD segments were found!
ld.lld: ../projects/lld/ELF/Writer.cpp:2690: void WriteHash(std::vector<uint8_t> &, const size_t, const size_t, size_t, const lld::elf::BuildIdKind): Assertion `first < last' failed.
-  if (name == ".note.gnu.build-id" && config->buildId != BuildIdKind::None)
+  if (name == ".note.gnu.build-id" && config->buildId != BuildIdKind::None || name == ".note.llvm.watermark")
     return &InputSection::discarded;

   // The linker merges EH (exception handling) frames and creates a
   // .eh_frame_hdr section for runtime. So we handle them with a special
   // class. For relocatable outputs, they are just passed through.
   if (name == ".eh_frame" && !config->relocatable)
     return make<EhInputSection>(*this, sec, name);

lld/ELF/Options.td

@@ ... @@ defm wrap: Eq<"wrap", "Use wrapper functions for symbol">,
   MetaVarName<"<symbol>=<symbol>">;

 def z: JoinedOrSeparate<["-"], "z">, MetaVarName<"<option>">,
   HelpText<"Linker option extensions">;

 def visual_studio_diagnostics_format : F<"vs-diagnostics">,
   HelpText<"Format diagnostics for Visual Studio compatibility">;

+defm watermark : B<"watermark",
+    "Enable the computation of a hash for loadable sections",
+    "Disable the computation of a hash for loadable sections">;
+
 // Aliases
 def: Separate<["-"], "f">, Alias<auxiliary>, HelpText<"Alias for --auxiliary">;
 def: F<"call_shared">, Alias<Bdynamic>, HelpText<"Alias for --Bdynamic">;
 def: F<"dy">, Alias<Bdynamic>, HelpText<"Alias for --Bdynamic">;
 def: F<"dn">, Alias<Bstatic>, HelpText<"Alias for --Bstatic">;
 def: F<"non_shared">, Alias<Bstatic>, HelpText<"Alias for --Bstatic">;
 def: F<"static">, Alias<Bstatic>, HelpText<"Alias for --Bstatic">;
 def: Flag<["-"], "d">, Alias<define_common>, HelpText<"Alias for --define-common">;

lld/ELF/SyntheticSections.h

Show All 19 Lines
#ifndef LLD_ELF_SYNTHETIC_SECTIONS_H		#ifndef LLD_ELF_SYNTHETIC_SECTIONS_H
#define LLD_ELF_SYNTHETIC_SECTIONS_H		#define LLD_ELF_SYNTHETIC_SECTIONS_H

#include "DWARF.h"		#include "DWARF.h"
#include "EhFrame.h"		#include "EhFrame.h"
#include "InputSection.h"		#include "InputSection.h"
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
#include "llvm/MC/StringTableBuilder.h"		#include "llvm/MC/StringTableBuilder.h"
		#include "llvm/Object/Watermark.h"
#include "llvm/Support/Endian.h"		#include "llvm/Support/Endian.h"
#include <functional>		#include <functional>

namespace lld {		namespace lld {
namespace elf {		namespace elf {
class Defined;		class Defined;
struct PhdrEntry;		struct PhdrEntry;
class SymbolTableBaseSection;		class SymbolTableBaseSection;
▲ Show 20 Lines • Show All 133 Lines • ▼ Show 20 Lines	public:
void writeTo(uint8_t *buf) override;		void writeTo(uint8_t *buf) override;
size_t getSize() const override { return headerSize + hashSize; }		size_t getSize() const override { return headerSize + hashSize; }
void writeBuildId(llvm::ArrayRef<uint8_t> buf);		void writeBuildId(llvm::ArrayRef<uint8_t> buf);

private:		private:
uint8_t *hashBuf;		uint8_t *hashBuf;
};		};

		// .note.llvm-watermark section.
		class WatermarkSection : public SyntheticSection {
		llvm::watermark::Watermarker watermarker;
		static const unsigned headerSize = 20;
		public:
		WatermarkSection();
		void writeTo(uint8_t *buf) override;
		size_t getSize() const override {
		return headerSize + watermarker.getVersionSize() +
		watermarker.getHashSize();
		}
		void writeWatermark(llvm::ArrayRef<uint8_t> buf);
		llvm::watermark::Watermarker& getWatermarker() { return watermarker; }

		private:
		uint8_t *watermarkBuf;
		};

// BssSection is used to reserve space for copy relocations and common symbols.		// BssSection is used to reserve space for copy relocations and common symbols.
// We create three instances of this class for .bss, .bss.rel.ro and "COMMON",		// We create three instances of this class for .bss, .bss.rel.ro and "COMMON",
// that are used for writable symbols, read-only symbols and common symbols,		// that are used for writable symbols, read-only symbols and common symbols,
// respectively.		// respectively.
class BssSection final : public SyntheticSection {		class BssSection final : public SyntheticSection {
public:		public:
BssSection(StringRef name, uint64_t size, uint32_t alignment);		BssSection(StringRef name, uint64_t size, uint32_t alignment);
void writeTo(uint8_t *) override {		void writeTo(uint8_t *) override {
▲ Show 20 Lines • Show All 944 Lines • ▼ Show 20 Lines	struct Partition {
EhFrameSection *ehFrame;		EhFrameSection *ehFrame;
GnuHashTableSection *gnuHashTab;		GnuHashTableSection *gnuHashTab;
HashTableSection *hashTab;		HashTableSection *hashTab;
RelocationBaseSection *relaDyn;		RelocationBaseSection *relaDyn;
RelrBaseSection *relrDyn;		RelrBaseSection *relrDyn;
VersionDefinitionSection *verDef;		VersionDefinitionSection *verDef;
SyntheticSection *verNeed;		SyntheticSection *verNeed;
VersionTableSection *verSym;		VersionTableSection *verSym;
		WatermarkSection *watermark;

unsigned getNumber() const { return this - &partitions[0] + 1; }		unsigned getNumber() const { return this - &partitions[0] + 1; }
};		};

extern Partition *mainPart;		extern Partition *mainPart;

inline Partition &SectionBase::getPartition() const {		inline Partition &SectionBase::getPartition() const {
assert(isLive());		assert(isLive());
[34 unchanged lines not shown]

lld/ELF/SyntheticSections.cpp

[330 unchanged lines not shown]	void BuildIdSection::writeTo(uint8_t *buf) {
hashBuf = buf + 16;		hashBuf = buf + 16;
}		}

void BuildIdSection::writeBuildId(ArrayRef<uint8_t> buf) {		void BuildIdSection::writeBuildId(ArrayRef<uint8_t> buf) {
assert(buf.size() == hashSize);		assert(buf.size() == hashSize);
memcpy(hashBuf, buf.data(), hashSize);		memcpy(hashBuf, buf.data(), hashSize);
}		}

		WatermarkSection::WatermarkSection()
		MaskRay: Delete `lld::elf::`.
		:SyntheticSection(0x00, SHT_NOTE, 4, ".note.llvm.watermark")
		{}

		void WatermarkSection::writeTo(uint8_t *buf) {
		write32(buf, 5); // Name size
		MaskRay: 5 -> 8, otherwise it is incorrect to use `watermarkBuf = buf + 20;`
		rupprecht: http://www.sco.com/developers/gabi/1998-04-29/ch5.pheader.html#note_section I think 5 is the correct value for namesz; padding exists in the note but is not included in the value of namesz.
		write32(buf + 4, watermarker.getVersionSize() +
		rupprecht: Content -> Descriptor
		watermarker.getHashSize()); // Descriptor size
		write32(buf + 8, NT_LLVM_WATERMARK); // Type
		memcpy(buf + 12, "LLVM\0\0\0", 8); // Name string, NUL-padded to 8 bytes
		write32(buf + 20, watermarker.getVersion()); // Version
		watermarkBuf = buf + 20 + watermarker.getVersionSize();
		}
		MaskRay: Delete `lld::elf::`. Delete `llvm::`.

		void WatermarkSection::writeWatermark(ArrayRef<uint8_t> buf) {
		assert(buf.size() == watermarker.getHashSize());
		memcpy(watermarkBuf, buf.data(), watermarker.getHashSize());
		}

BssSection::BssSection(StringRef name, uint64_t size, uint32_t alignment)		BssSection::BssSection(StringRef name, uint64_t size, uint32_t alignment)
: SyntheticSection(SHF_ALLOC \| SHF_WRITE, SHT_NOBITS, alignment, name) {		: SyntheticSection(SHF_ALLOC \| SHF_WRITE, SHT_NOBITS, alignment, name) {
this->bss = true;		this->bss = true;
this->size = size;		this->size = size;
}		}

EhFrameSection::EhFrameSection()		EhFrameSection::EhFrameSection()
: SyntheticSection(SHF_ALLOC, SHT_PROGBITS, 1, ".eh_frame") {}		: SyntheticSection(SHF_ALLOC, SHT_PROGBITS, 1, ".eh_frame") {}
[3,327 unchanged lines not shown]
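`WatermarkSection::writeTo` and `writeWatermark` emit a standard ELF note: three 4-byte words (namesz, descsz, type), the NUL-padded name, then the descriptor (a 4-byte version followed by the 8-byte hash), giving the 32-byte section checked in watermark.s. As rupprecht notes, namesz counts the 5 bytes of "LLVM\0" but not the padding. A minimal standalone sketch of the same byte layout, assuming a little-endian target (as in the x86-64 test) and the version 1 / 8-byte hash sizes used by `Watermarker`; `buildWatermarkNote` is a hypothetical helper, not part of lld:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Append the low NumBytes bytes of V to Out in little-endian order.
static void putLE(std::vector<uint8_t> &Out, uint64_t V, int NumBytes) {
  for (int I = 0; I < NumBytes; ++I)
    Out.push_back(static_cast<uint8_t>(V >> (8 * I)));
}

// Hypothetical helper: builds the bytes WatermarkSection writes (little-endian).
std::vector<uint8_t> buildWatermarkNote(uint32_t Version, uint64_t Hash) {
  const uint32_t NtLlvmWatermark = 4; // NT_LLVM_WATERMARK in this patch
  const char Name[] = "LLVM";         // 4 characters + NUL = 5 bytes
  std::vector<uint8_t> Note;
  putLE(Note, sizeof(Name), 4);                   // n_namesz = 5 (no padding)
  putLE(Note, sizeof(Version) + sizeof(Hash), 4); // n_descsz = 12
  putLE(Note, NtLlvmWatermark, 4);                // n_type
  Note.insert(Note.end(), Name, Name + sizeof(Name));
  while (Note.size() % 4 != 0) // pad the name to a 4-byte boundary (5 -> 8)
    Note.push_back(0);
  putLE(Note, Version, 4); // descriptor: version first...
  putLE(Note, Hash, 8);    // ...then the 64-bit hash
  return Note;
}
```

With version 1 and the hash bytes from the test's hex dump, this reproduces the 20-byte header plus 12-byte descriptor that `getSize()` reports.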

lld/ELF/Writer.cpp

[69 unchanged lines not shown]	private:
void checkSections();		void checkSections();
void fixSectionAlignments();		void fixSectionAlignments();
void openFile();		void openFile();
void writeTrapInstr();		void writeTrapInstr();
void writeHeader();		void writeHeader();
void writeSections();		void writeSections();
void writeSectionsBinary();		void writeSectionsBinary();
void writeBuildId();		void writeBuildId();
		void writeWatermark();

std::unique_ptr<FileOutputBuffer> &buffer;		std::unique_ptr<FileOutputBuffer> &buffer;

void addRelIpltSymbols();		void addRelIpltSymbols();
void addStartEndSymbols();		void addStartEndSymbols();
void addStartStopSymbols(OutputSection *sec);		void addStartStopSymbols(OutputSection *sec);

uint64_t fileSize;		uint64_t fileSize;
[294 unchanged lines not shown]	if (!part.name.empty()) {
add(part.programHeaders);		add(part.programHeaders);
}		}

if (config->buildId != BuildIdKind::None) {		if (config->buildId != BuildIdKind::None) {
part.buildId = make<BuildIdSection>();		part.buildId = make<BuildIdSection>();
add(part.buildId);		add(part.buildId);
}		}

		if (config->watermark) {
		part.watermark = make<WatermarkSection>();
		add(part.watermark);
		}

part.dynStrTab = make<StringTableSection>(".dynstr", true);		part.dynStrTab = make<StringTableSection>(".dynstr", true);
part.dynSymTab = make<SymbolTableSection<ELFT>>(*part.dynStrTab);		part.dynSymTab = make<SymbolTableSection<ELFT>>(*part.dynStrTab);
part.dynamic = make<DynamicSection<ELFT>>();		part.dynamic = make<DynamicSection<ELFT>>();
if (config->androidPackDynRelocs)		if (config->androidPackDynRelocs)
part.relaDyn = make<AndroidPackedRelocationSection<ELFT>>(relaDynName);		part.relaDyn = make<AndroidPackedRelocationSection<ELFT>>(relaDynName);
else		else
part.relaDyn =		part.relaDyn =
make<RelocationSection<ELFT>>(relaDynName, config->zCombreloc);		make<RelocationSection<ELFT>>(relaDynName, config->zCombreloc);
[203 unchanged lines not shown]	if (!config->oFormatBinary) {
if (config->zSeparate != SeparateSegmentKind::None)		if (config->zSeparate != SeparateSegmentKind::None)
writeTrapInstr();		writeTrapInstr();
writeHeader();		writeHeader();
writeSections();		writeSections();
} else {		} else {
writeSectionsBinary();		writeSectionsBinary();
}		}

// Backfill .note.gnu.build-id section content. This is done at last		// Backfill .note.gnu.build-id section content. This is done late
// because the content is usually a hash value of the entire output file.		// because the content is usually a hash value of the entire output file.
writeBuildId();		writeBuildId();

		// Backfill .note.llvm.watermark section content. Like .note.gnu.build-id,
		// this is done after all other sections have been written.
		if (config->watermark)
		writeWatermark();

if (errorCount())		if (errorCount())
return;		return;

// Handle -Map and -cref options.		// Handle -Map and -cref options.
		MaskRay: Backfill .note.llvm.watermark section content. This is similar to .note.gnu.build-id.
writeMapFile();		writeMapFile();
writeCrossReferenceTable();		writeCrossReferenceTable();
if (errorCount())		if (errorCount())
return;		return;

if (auto e = buffer->commit())		if (auto e = buffer->commit())
error("failed to write to the output file: " + toString(std::move(e)));		error("failed to write to the output file: " + toString(std::move(e)));
}		}
[1,673 unchanged lines not shown]	static uint64_t computeFileOffset(OutputSection *os, uint64_t off) {
// The first section in a PT_LOAD has to have congruent offset and address		// The first section in a PT_LOAD has to have congruent offset and address
// modulo the maximum page size.		// modulo the maximum page size.
if (os->ptLoad && os->ptLoad->firstSec == os)		if (os->ptLoad && os->ptLoad->firstSec == os)
return alignTo(off, os->ptLoad->p_align, os->addr);		return alignTo(off, os->ptLoad->p_align, os->addr);

// File offsets are not significant for .bss sections other than the first one		// File offsets are not significant for .bss sections other than the first one
// in a PT_LOAD. By convention, we keep section offsets monotonically		// in a PT_LOAD. By convention, we keep section offsets monotonically
// increasing rather than setting to zero.		// increasing rather than setting to zero.
if (os->type == SHT_NOBITS)		if (os->type == SHT_NOBITS)
return off;		return off;

// If the section is not in a PT_LOAD, we just have to align it.		// If the section is not in a PT_LOAD, we just have to align it.
if (!os->ptLoad)		if (!os->ptLoad)
return alignTo(off, os->alignment);		return alignTo(off, os->alignment);

// If two sections share the same PT_LOAD the file offset is calculated		// If two sections share the same PT_LOAD the file offset is calculated
// using this formula: Off2 = Off1 + (VA2 - VA1).		// using this formula: Off2 = Off1 + (VA2 - VA1).
OutputSection *first = os->ptLoad->firstSec;		OutputSection *first = os->ptLoad->firstSec;
[423 unchanged lines not shown]	case BuildIdKind::Uuid:
break;		break;
default:		default:
llvm_unreachable("unknown BuildIdKind");		llvm_unreachable("unknown BuildIdKind");
}		}
for (Partition &part : partitions)		for (Partition &part : partitions)
part.buildId->writeBuildId(buildId);		part.buildId->writeBuildId(buildId);
}		}

		template <class ELFT> void Writer<ELFT>::writeWatermark() {
		ruiu: Do you think you can move the new code to watermark.{cpp,h} and add file comment to explain (1) what this is and (2) how watermark is computed?
		chrisjackson (author, done): I have begun work on a revision with the watermarking code in a separate library so that it can be shared with the utility that checks the watermark.
		if (!mainPart->watermark \|\| !mainPart->watermark->getParent())
		return;

		watermark::Watermarker &w = mainPart->watermark->getWatermarker();

		std::vector<watermark::Segment> watermarkSegments =
		w.extractSegmentInfo<lld::elf::PhdrEntry *>(mainPart->phdrs);

		if (watermarkSegments.empty()) {
		MaskRay: inline the only use of the variable
		pcc: I guess the right thing to do in the case of multiple partitions would be to compute a separate hash for each partition. But this can always be changed later since the partitions feature is experimental.
		error("failed to compute watermark: no PT_LOAD segments were found");
		return;
		}

		pcc: Should this exclude the ELF headers if present in a segment? The header fields e_shoff, e_shnum and e_shstrndx can and likely must be rewritten by strip and other tools.
		size_t programHeaderTableOffset = 0;
		MaskRay: Delete `parts`
		size_t programHeaderTableSize = 0;

		auto It = std::find_if(
		mainPart->phdrs.begin(), mainPart->phdrs.end(),
		MaskRay: `if (first >= last)` might be better (WriteHash asserts `first < last` though I haven't found a case where first can be equal to last)
		[](const PhdrEntry *pHdr) { return pHdr->p_type == PT_PHDR; });

		if (It != mainPart->phdrs.end()) {
		programHeaderTableOffset = (*It)->p_offset;
		programHeaderTableSize = (*It)->p_filesz;
		}

		MaskRay: error: missing 'typename' prior to dependent type name 'ELFType<llvm::support::big, false>::Ehdr'
		Expected<std::vector<uint8_t>> watermark = w.computeWatermark(
		watermarkSegments, sizeof(typename ELFT::Ehdr), programHeaderTableOffset,
		programHeaderTableSize, Out::bufferStart);

		if (!watermark) {
		error("failed to compute watermark: " +
		llvm::toString(watermark.takeError()));
		return;
		}

		mainPart->watermark->writeWatermark(*watermark);
		}

template void createSyntheticSections<ELF32LE>();		template void createSyntheticSections<ELF32LE>();
template void createSyntheticSections<ELF32BE>();		template void createSyntheticSections<ELF32BE>();
template void createSyntheticSections<ELF64LE>();		template void createSyntheticSections<ELF64LE>();
template void createSyntheticSections<ELF64BE>();		template void createSyntheticSections<ELF64BE>();

template void writeResult<ELF32LE>();		template void writeResult<ELF32LE>();
template void writeResult<ELF32BE>();		template void writeResult<ELF32BE>();
template void writeResult<ELF64LE>();		template void writeResult<ELF64LE>();
template void writeResult<ELF64BE>();		template void writeResult<ELF64BE>();

} // namespace elf		} // namespace elf
} // namespace lld		} // namespace lld
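`writeWatermark` gathers three things before hashing: the PT_LOAD file ranges, the PT_PHDR range (if present) so the program header table can be excluded, and an error when no PT_LOAD segment exists. The sketch below restates that gathering step without lld types, written in C++17 for brevity. `PhdrStub`, `FileRange`, `WatermarkInputs` and `collectWatermarkInputs` are hypothetical stand-ins (PT_LOAD = 1 and PT_PHDR = 6 are the standard ELF values), and returning `std::nullopt` instead of calling lld's `error()` is a choice of the sketch:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Minimal stand-in for lld's PhdrEntry: only the fields writeWatermark reads.
struct PhdrStub {
  uint32_t p_type;
  uint64_t p_offset;
  uint64_t p_filesz;
};

constexpr uint32_t PtLoad = 1; // PT_LOAD
constexpr uint32_t PtPhdr = 6; // PT_PHDR

struct FileRange {
  uint64_t Offset;
  uint64_t Size;
};

struct WatermarkInputs {
  std::vector<FileRange> LoadSegments; // PT_LOAD ranges, in header order
  FileRange PhdrTable{0, 0};           // PT_PHDR range; {0, 0} if absent
};

std::optional<WatermarkInputs>
collectWatermarkInputs(const std::vector<const PhdrStub *> &Phdrs) {
  WatermarkInputs In;
  for (const PhdrStub *P : Phdrs)
    if (P->p_type == PtLoad)
      In.LoadSegments.push_back({P->p_offset, P->p_filesz});
  // Nothing to hash: report it to the caller and stop here.
  if (In.LoadSegments.empty())
    return std::nullopt;

  auto It = std::find_if(Phdrs.begin(), Phdrs.end(), [](const PhdrStub *P) {
    return P->p_type == PtPhdr;
  });
  if (It != Phdrs.end())
    In.PhdrTable = {(*It)->p_offset, (*It)->p_filesz};
  return In;
}
```

Stopping on the empty case keeps the caller from computing and writing a hash over no data.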

lld/test/ELF/watermark.s

This file was added.

				## Test that a watermark is placed in the correct section with the correct
				## alignment when using --watermark. Check also that the watermark can
				MaskRay: `generated placed`?
				## be disabled with --no-watermark and that it is disabled by default.

				# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t
				# RUN: ld.lld %t -o %t.default
				MaskRay: `-triple=x86_64 %s -o %t.o` (this is generic, not Linux specific). Use .o for object files.
				# RUN: llvm-readelf -S %t.default \| FileCheck -check-prefix=NOWATERMARK %s
				# RUN: ld.lld --no-watermark %t -o %t.nowatermark
				# RUN: llvm-readelf -S %t.nowatermark \| FileCheck -check-prefix=NOWATERMARK %s

				# NOWATERMARK-NOT: .note.llvm.watermark

				# RUN: ld.lld --watermark %t -o %t.watermark
				# RUN: llvm-readelf -x .note.llvm.watermark %t.watermark \| FileCheck --strict-whitespace -check-prefix=CONTENT %s
				# RUN: llvm-readelf -S %t.watermark \| FileCheck -check-prefix=SECTION %s
				MaskRay: Consider `llvm-readelf -S`. Its output is concise.

				MaskRay: `llvm-readelf -x .note.llvm.watermark` (Prefer `llvm-readelf -x` over `llvm-objdump -s`)
				# SECTION: .note.llvm.watermark NOTE {{[0-9a-f]+}} {{[0-9a-f]+}} 000020 00 0 0 4
				# CONTENT: Hex dump of section '.note.llvm.watermark':
				# CONTENT-NEXT: 05000000 0c000000 04000000 4c4c564d ............LLVM
				# CONTENT-NEXT: 00000000 01000000 f9ceaa42 d8d7016d ...........B...m

				.globl _start
				_start:
				nop
				MaskRay: [048C]
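On this little-endian target, the CONTENT lines decode as namesz = 5, descsz = 12, type = 4 (`NT_LLVM_WATERMARK`), the name `LLVM` padded to 8 bytes, version 1, and the hash bytes `f9ceaa42 d8d7016d`, which read as the little-endian 64-bit value 0x6d01d7d842aacef9. A small sketch that performs this decoding from the hex words alone (ASCII column removed); `parseHexWords` and `readLE` are illustrative helpers, not part of any LLVM tool:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Turn the hex words printed by `llvm-readelf -x` (bytes in file order)
// back into a byte vector.
std::vector<uint8_t> parseHexWords(const std::string &Dump) {
  std::vector<uint8_t> Bytes;
  std::istringstream In(Dump);
  std::string Word;
  while (In >> Word)
    for (size_t I = 0; I + 1 < Word.size(); I += 2)
      Bytes.push_back(
          static_cast<uint8_t>(std::stoul(Word.substr(I, 2), nullptr, 16)));
  return Bytes;
}

// Read an N-byte little-endian integer starting at byte offset Off.
uint64_t readLE(const std::vector<uint8_t> &B, size_t Off, size_t N) {
  uint64_t V = 0;
  for (size_t I = 0; I < N; ++I)
    V |= static_cast<uint64_t>(B[Off + I]) << (8 * I);
  return V;
}
```

Offsets follow the note layout: header words at 0, 4 and 8, name at 12, version at 20, hash at 24.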

llvm/include/llvm/BinaryFormat/ELF.h

[1,414 unchanged lines not shown]	enum : unsigned {
NT_FILE = 0x46494c45,		NT_FILE = 0x46494c45,
NT_PRXFPREG = 0x46e62b7f,		NT_PRXFPREG = 0x46e62b7f,
NT_SIGINFO = 0x53494749,		NT_SIGINFO = 0x53494749,
};		};

// LLVM-specific notes.		// LLVM-specific notes.
enum {		enum {
NT_LLVM_HWASAN_GLOBALS = 3,		NT_LLVM_HWASAN_GLOBALS = 3,
		NT_LLVM_WATERMARK = 4,
};		};

// GNU note types		// GNU note types
enum {		enum {
NT_GNU_ABI_TAG = 1,		NT_GNU_ABI_TAG = 1,
NT_GNU_HWCAP = 2,		NT_GNU_HWCAP = 2,
NT_GNU_BUILD_ID = 3,		NT_GNU_BUILD_ID = 3,
NT_GNU_GOLD_VERSION = 4,		NT_GNU_GOLD_VERSION = 4,
[144 unchanged lines not shown]

llvm/include/llvm/Object/Watermark.h

This file was added.

				//===- Watermark.h ----------------------------------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
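				// This file declares the Watermarker class, which calculates a watermark
				// (an xxHash64-based hash) of the loadable segments of an ELF file. Clients
				// extract the PT_LOAD segments from the program headers with
				// extractSegmentInfo() and then call computeWatermark() with a pointer to
				// the ELF file buffer. The ELF header and the program header table are
				// excluded from the hash. The result is stored in .note.llvm.watermark.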
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_WATERMARK_H
				#define LLVM_WATERMARK_H

				#include "llvm/ADT/ArrayRef.h"
				#include "llvm/BinaryFormat/ELF.h"
				#include "llvm/Support/Endian.h"
				#include "llvm/Support/Errc.h"
				#include "llvm/Support/Error.h"
				#include "llvm/Support/ErrorHandling.h"

				#include "llvm/Support/Parallel.h"
				#include "llvm/Support/xxhash.h"

				#include <algorithm>
				#include <numeric>
				#include <tuple>

				namespace llvm {
				namespace watermark {
				/// Structure for the information necessary to include a segment in the
				/// watermark computation.
				struct Segment {
				size_t Offset;
				size_t Size;

				Segment(size_t Offset, size_t Size) : Offset(Offset), Size(Size) {}
				};

				class Watermarker {
				const uint32_t Version = 1u;
				const size_t HashSize = 8;

				public:
				/// Extracts the information required to calculate the watermark.
				template <typename PHdr,
				std::enable_if_t<!std::is_pointer<PHdr>::value, int> = 0>
				std::vector<Segment> extractSegmentInfo(llvm::ArrayRef<PHdr> ProgramHeaders) {
				std::vector<Segment> SegmentInfo;
				for (const PHdr &pHdr : ProgramHeaders)
				if (pHdr.p_type == PT_LOAD)
				SegmentInfo.emplace_back(pHdr.p_offset, pHdr.p_filesz);

				return SegmentInfo;
				}

				template <typename PHdr,
				std::enable_if_t<std::is_pointer<PHdr>::value, int> = 0>
				std::vector<Segment> extractSegmentInfo(llvm::ArrayRef<PHdr> ProgramHeaders) {
				std::vector<Segment> SegmentInfo;
				for (const PHdr pHdr : ProgramHeaders)
				if (pHdr->p_type == PT_LOAD)
				SegmentInfo.emplace_back(pHdr->p_offset, pHdr->p_filesz);
				return SegmentInfo;
				}

				Watermarker() = default;
				size_t getHashSize() const { return HashSize; }
				size_t getVersion() const { return Version; }
				size_t getVersionSize() const { return sizeof(Version); }

				/// Compute the watermark, omitting the ELF header and the program header
				/// table. InputSegments is modified in place to exclude those ranges.
				llvm::Expected<std::vector<uint8_t>>
				computeWatermark(std::vector<Segment> &InputSegments, size_t ElfHeaderSize,
				size_t ProgramHeaderTableOffset,
				size_t ProgramHeaderTableSize, const uint8_t *Data);
				};
				} // namespace watermark
				} // namespace llvm
				#endif // LLVM_WATERMARK_H
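The two `extractSegmentInfo` overloads differ only in whether each program header is accessed by value or through a pointer, and are selected with `std::enable_if_t`. LLVM did not yet require C++17 when this patch was written, but under C++17 a single template using `if constexpr` could cover both cases. A standalone sketch with a hypothetical `Phdr` struct, a local copy of `Segment`, and `std::vector` in place of `llvm::ArrayRef`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

constexpr uint32_t PtLoad = 1; // PT_LOAD in the ELF specification

// Hypothetical program header type with the fields the patch reads.
struct Phdr {
  uint32_t p_type;
  uint64_t p_offset;
  uint64_t p_filesz;
};

struct Segment {
  size_t Offset;
  size_t Size;
  Segment(size_t Offset, size_t Size) : Offset(Offset), Size(Size) {}
};

// Returns a reference to the header whether Entry is a header or a pointer.
template <typename T> const auto &asHeader(const T &Entry) {
  if constexpr (std::is_pointer_v<T>)
    return *Entry;
  else
    return Entry;
}

// One template replaces the two enable_if-constrained overloads.
template <typename T>
std::vector<Segment> extractSegmentInfo(const std::vector<T> &ProgramHeaders) {
  std::vector<Segment> SegmentInfo;
  for (const T &Entry : ProgramHeaders) {
    const auto &PHdr = asHeader(Entry);
    if (PHdr.p_type == PtLoad)
      SegmentInfo.emplace_back(PHdr.p_offset, PHdr.p_filesz);
  }
  return SegmentInfo;
}
```

The trade-off is a C++17 dependency in exchange for one code path; with the C++14 baseline of the time, the `enable_if` pair in the patch is the straightforward choice.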

llvm/lib/Object/CMakeLists.txt

[18 unchanged lines not shown]	add_llvm_component_library(LLVMObject
ObjectFile.cpp		ObjectFile.cpp
RecordStreamer.cpp		RecordStreamer.cpp
RelocationResolver.cpp		RelocationResolver.cpp
SymbolicFile.cpp		SymbolicFile.cpp
SymbolSize.cpp		SymbolSize.cpp
TapiFile.cpp		TapiFile.cpp
TapiUniversal.cpp		TapiUniversal.cpp
WasmObjectFile.cpp		WasmObjectFile.cpp
		Watermark.cpp
WindowsMachineFlag.cpp		WindowsMachineFlag.cpp
WindowsResource.cpp		WindowsResource.cpp
XCOFFObjectFile.cpp		XCOFFObjectFile.cpp

ADDITIONAL_HEADER_DIRS		ADDITIONAL_HEADER_DIRS
${LLVM_MAIN_INCLUDE_DIR}/llvm/Object		${LLVM_MAIN_INCLUDE_DIR}/llvm/Object

DEPENDS		DEPENDS
intrinsics_gen		intrinsics_gen
llvm_vcsrevision_h		llvm_vcsrevision_h
)		)

llvm/lib/Object/Watermark.cpp

This file was added.

				#include "llvm/ADT/ArrayRef.h"
				#include "llvm/Object/Watermark.h"

				namespace llvm {
				namespace watermark {

				static std::vector<llvm::ArrayRef<uint8_t>> split(llvm::ArrayRef<uint8_t> Arr,
				size_t ChunkSize) {
				std::vector<ArrayRef<uint8_t>> Ret;
				while (Arr.size() > ChunkSize) {
				Ret.push_back(Arr.take_front(ChunkSize));
				Arr = Arr.drop_front(ChunkSize);
				}
				if (!Arr.empty())
				Ret.push_back(Arr);
				return Ret;
				}

				static void computeHash(llvm::MutableArrayRef<uint8_t> HashDest, size_t HashSize,
				llvm::ArrayRef<uint8_t> Data) {
				const size_t ChunkSize = 1024 * 1024;
				std::vector<ArrayRef<uint8_t>> InputChunks = split(Data, ChunkSize);
				std::vector<uint8_t> ChunkHashes(InputChunks.size() * HashSize);

				for_each_n(llvm::parallel::par, (size_t)0, InputChunks.size(), [&](size_t i) {
				llvm::support::endian::write64le(ChunkHashes.data() + i * HashSize,
				xxHash64(InputChunks[i]));
				});

				llvm::support::endian::write64le(HashDest.data(), xxHash64(ChunkHashes));
				}

				static void omitRangeFromSegments(std::vector<Segment> &InputSegments,
				size_t RangeFirst, size_t RangeSize) {
				size_t RangeLast = RangeFirst + RangeSize;

				for (unsigned I = 0; I < InputSegments.size(); I++) {
				Segment *pHdr = &InputSegments[I];
				size_t SegmentFirst = pHdr->Offset;
				size_t SegmentLast = SegmentFirst + pHdr->Size;

				if (RangeFirst >= SegmentLast \|\| RangeLast <= SegmentFirst)
				continue;

				if (RangeFirst >= SegmentFirst) {
				size_t SegmentSize = pHdr->Size;
				pHdr->Size = RangeFirst - SegmentFirst;

				// The range lies strictly inside the segment: keep the tail as a new segment.
				if (RangeLast < SegmentLast) {
				size_t Offset = RangeLast;
				size_t Size = SegmentSize - pHdr->Size - RangeSize;
				InputSegments.emplace(InputSegments.begin() + I + 1, Offset, Size);
				}
				} else { // The range starts before the segment: drop the overlapping head.
				pHdr->Offset = std::min(SegmentLast, RangeLast);
				pHdr->Size = (RangeLast >= SegmentLast) ? 0 : SegmentLast - RangeLast;
				}
				}
				}

				llvm::Expected<std::vector<uint8_t>>
				Watermarker::computeWatermark(std::vector<Segment> &InputSegments,
				size_t ElfHeaderSize,
				size_t ProgramHeaderTableOffset,
				size_t ProgramHeaderTableSize, const uint8_t *Data) {

				// Ensure we don't include the program header
				// table or ELF header, as these may be
				// altered by tools such as objcopy.
				omitRangeFromSegments(InputSegments, (size_t) 0, ElfHeaderSize);

				if (ProgramHeaderTableSize > 0)
				omitRangeFromSegments(InputSegments, ProgramHeaderTableOffset,
				ProgramHeaderTableSize);

				std::vector<uint8_t> InputSegmentWatermarks(InputSegments.size() * HashSize);

				for_each_n(
				llvm::parallel::par, (size_t) 0, InputSegments.size(), [&](size_t I) {
				Segment Seg = InputSegments[I];

				if (Seg.Size > 0) {
				llvm::ArrayRef<uint8_t> SegmentData(Data + Seg.Offset,
				Seg.Size);
				computeHash(llvm::MutableArrayRef<uint8_t>(
				&InputSegmentWatermarks[I * HashSize], HashSize),
				HashSize, SegmentData);
				}
				});

				std::vector<uint8_t> FinalWatermark(HashSize);
				computeHash(FinalWatermark, HashSize,
				llvm::ArrayRef<uint8_t>(InputSegmentWatermarks.data(), InputSegmentWatermarks.size()));

				return FinalWatermark;
				}

				} // namespace watermark
				} // namespace llvm
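`omitRangeFromSegments` handles three overlap cases: no overlap, a range starting inside the segment (trim, and split off a tail if the range ends early), and a range starting before the segment (drop the overlapping head). A standalone restatement using a plain `std::vector` (`omitRange` is an illustrative name). Note that a range beginning exactly at a segment's start leaves a zero-size segment followed by the remaining tail; `computeWatermark` only hashes segments with `Size > 0`, so that segment's 8-byte slot stays zero-filled:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Segment {
  size_t Offset;
  size_t Size;
};

// Removes the byte range [RangeFirst, RangeFirst + RangeSize) from every
// segment it overlaps. As in the patch, segments emptied by the removal are
// kept with Size == 0 rather than erased.
void omitRange(std::vector<Segment> &Segs, size_t RangeFirst, size_t RangeSize) {
  const size_t RangeLast = RangeFirst + RangeSize;
  for (size_t I = 0; I < Segs.size(); ++I) {
    const size_t SegFirst = Segs[I].Offset;
    const size_t SegLast = SegFirst + Segs[I].Size;
    if (RangeFirst >= SegLast || RangeLast <= SegFirst)
      continue; // No overlap.
    if (RangeFirst >= SegFirst) {
      // Keep only the part before the range.
      Segs[I].Size = RangeFirst - SegFirst;
      if (RangeLast < SegLast) {
        // The range ends inside the segment: keep the tail separately.
        Segs.insert(Segs.begin() + I + 1, Segment{RangeLast, SegLast - RangeLast});
        ++I; // The tail starts at RangeLast, so it cannot overlap the range.
      }
    } else {
      // The range starts before the segment: drop the overlapping head.
      Segs[I].Offset = std::min(SegLast, RangeLast);
      Segs[I].Size = RangeLast >= SegLast ? 0 : SegLast - RangeLast;
    }
  }
}
```

For example, removing a 64-byte ELF header from a segment at offset 0 yields `{0, 0}` followed by `{64, Size - 64}`.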

llvm/test/Object/watermark.test

This file was added.

				## Ensure that the watermark calculation depends only on PT_LOAD segments,
				## that both the ELF header and program header table can be modified without
				## affecting the watermark, and that changing the order of the segments changes
				## the watermark.

				# RUN: yaml2obj --docnum=1 %s > %t.1
				# RUN: llvm-readobj --compute-watermark %t.1 \| FileCheck %s --check-prefix=SAME-WATERMARK
				# RUN: yaml2obj --docnum=2 %s > %t.2
				# RUN: llvm-readobj --compute-watermark %t.2 \| FileCheck %s --check-prefix=SAME-WATERMARK

				--- !ELF
				FileHeader:
				Class: ELFCLASS64
				Data: ELFDATA2LSB
				Type: ET_EXEC
				Machine: EM_X86_64
				Entry: 0x0000000000400000
				Sections:
				- Name: .fill1
				Type: SHT_PROGBITS
				Size: 4
				Address: 0x100
				Content: aaaaaaaa
				- Name: .fill2
				Type: SHT_PROGBITS
				Size: 4
				Address: 0x200
				Content: bbbbbbbb
				ProgramHeaders:
				- Type: PT_LOAD
				FileSize: 4
				VAddr: 0x100
				Sections:
				- Section: .fill1
				- Type: PT_LOAD
				FileSize: 4
				VAddr: 0x200
				Sections:
				- Section: .fill2

				--- !ELF
				FileHeader:
				Class: ELFCLASS64
				Data: ELFDATA2LSB
				Type: ET_NONE
				Machine: EM_X86_64
				Entry: 0x0000000000400000
				Sections:
				- Name: .fill1
				Type: SHT_PROGBITS
				Content: aaaaaaaa
				- Name: .fill2
				Type: SHT_PROGBITS
				Content: bbbbbbbb
				- Name: .fill3
				Type: SHT_PROGBITS
				Content: cccccccc
				ProgramHeaders:
				- Type: PT_LOAD
				FileSize: 4
				Sections:
				- Section: .fill1
				- Type: PT_NOTE
				FileSize: 4
				Sections:
				- Section: .fill3
				- Type: PT_LOAD
				FileSize: 4
				Sections:
				- Section: .fill2

				# SAME-WATERMARK: Computed loadable segments watermark {
				# SAME-WATERMARK: Version: 1
				# SAME-WATERMARK: Value: 0x1237491EA4CA8E6F

				# RUN: yaml2obj --docnum=3 %s > %t.3
				# RUN: llvm-readobj --compute-watermark %t.3 \| FileCheck %s --check-prefix=ORDER-WATERMARK

				--- !ELF
				FileHeader:
				Class: ELFCLASS64
				Data: ELFDATA2LSB
				Type: ET_EXEC
				Machine: EM_X86_64
				Entry: 0x0000000000400000
				Sections:
				- Name: .fill1
				Type: SHT_PROGBITS
				Size: 4
				Address: 0x200
				Content: aaaaaaaa
				- Name: .fill2
				Type: SHT_PROGBITS
				Size: 4
				Address: 0x100
				Content: bbbbbbbb
				ProgramHeaders:
				- Type: PT_LOAD
				FileSize: 4
				Sections:
				- Section: .fill2
				- Type: PT_LOAD
				FileSize: 4
				Sections:
				- Section: .fill1

				# ORDER-WATERMARK: Computed loadable segments watermark {
				# ORDER-WATERMARK: Version: 1
				# ORDER-WATERMARK: Value: 0xF1231786169E8DBB
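ORDER-WATERMARK differs from SAME-WATERMARK because `computeWatermark` stores one hash per PT_LOAD segment in program header order and then hashes that concatenation, so swapping which segment comes first changes the result. Bytes outside PT_LOAD segments (such as `.fill3` in the PT_NOTE segment) never reach the hash. A sketch of that two-level structure, with 64-bit FNV-1a swapped in for `xxHash64` purely so the example is self-contained; the values it produces will not match the test's expected output, and `fnv1a64`, `chunkedHash` and `sketchWatermark` are illustrative names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for llvm::xxHash64: 64-bit FNV-1a. Not the hash the patch uses.
uint64_t fnv1a64(const uint8_t *Data, size_t Len) {
  uint64_t H = 0xcbf29ce484222325ULL; // FNV-1a offset basis
  for (size_t I = 0; I < Len; ++I) {
    H ^= Data[I];
    H *= 0x100000001b3ULL; // FNV-1a 64-bit prime
  }
  return H;
}

void appendLE64(std::vector<uint8_t> &Out, uint64_t V) {
  for (int I = 0; I < 8; ++I)
    Out.push_back(static_cast<uint8_t>(V >> (8 * I)));
}

// Mirrors computeHash(): hash fixed-size chunks, then hash the chunk hashes.
uint64_t chunkedHash(const std::vector<uint8_t> &Data, size_t ChunkSize) {
  std::vector<uint8_t> ChunkHashes;
  for (size_t Off = 0; Off < Data.size(); Off += ChunkSize) {
    size_t Len = std::min(ChunkSize, Data.size() - Off);
    appendLE64(ChunkHashes, fnv1a64(Data.data() + Off, Len));
  }
  return fnv1a64(ChunkHashes.data(), ChunkHashes.size());
}

// Mirrors computeWatermark() after header exclusion: one hash slot per
// segment (left zero for empty segments), in order, then hashed again.
uint64_t sketchWatermark(const std::vector<std::vector<uint8_t>> &Segments,
                         size_t ChunkSize = 1024 * 1024) {
  std::vector<uint8_t> SegmentHashes;
  for (const auto &Seg : Segments) {
    if (Seg.empty()) {
      appendLE64(SegmentHashes, 0);
      continue;
    }
    appendLE64(SegmentHashes, chunkedHash(Seg, ChunkSize));
  }
  return chunkedHash(SegmentHashes, ChunkSize);
}
```

Hashing per-segment digests rather than raw bytes is what lets the patch hash segments and 1 MiB chunks in parallel while keeping the result order-sensitive.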

Revision Contents

Diff 232819

lld/ELF/Config.h

lld/ELF/Driver.cpp

lld/ELF/InputFiles.cpp

lld/ELF/Options.td

lld/ELF/SyntheticSections.h

lld/ELF/SyntheticSections.cpp

lld/ELF/Writer.cpp

lld/test/ELF/watermark.s

llvm/include/llvm/BinaryFormat/ELF.h

llvm/include/llvm/Object/Watermark.h

llvm/lib/Object/CMakeLists.txt

llvm/lib/Object/Watermark.cpp

llvm/test/Object/watermark.test
