This is an archive of the discontinued LLVM Phabricator instance.

[lld][ELF] Provide optional hidden symbols for build ID
Needs ReviewPublic

Authored by phosek on Mar 20 2020, 12:26 AM.

Details

Summary

This change introduces a new optional hidden symbols build_id_start
and
build_id_end which point at the contents of the .note.gnu.build-id
section (if enabled). This is useful in instrumentation runtimes such as
profile where we want to merge profiles produced by the same module.
build ID can be used to uniquely identify modules but there's no easy
way to access it at runtime from the module itself. The build_id_start
and
build_id_end symbols make this possible by providing an access to
the .note.gnu.build-id section content.

Diff Detail

Event Timeline

phosek created this revision.Mar 20 2020, 12:26 AM

I think we should be a bit more careful proposing a new reserved symbol (symbol assignment even in the absence of a SECTIONS command) and think whether it will be useful after 5 years.

In FreeBSD, their linker script defines:

.note.gnu.build-id : {
  PROVIDE (__build_id_start = .);
  *(.note.gnu.build-id)
  PROVIDE (__build_id_end = .);
}

We need __build_id_end because the size of .build-id can evolve. While I think sha1 is good enough for many use cases, I can imagine someone may want more bits.

OK, I can imagine there are use cases that an external linker script is not desired. There is another perfect alternative: parse PT_NOTE and extract NT_GNU_BUILD_ID.

lld/test/ELF/build-id.s
9

Don't repeat lld command.

10

llvm-objdump -h output does not match GNU objdump. This has been a headache for many projects. llvm-readelf -S -s is usually a bette replacement.

I think if you provide symbols for the section bounds, they should be __start_note_gnu_build_id and __stop_note_gnu_build_id. But that doesn't seem enough of an improvement over parsing phdrs from __ehdr_start to merit the feature IMHO. Consumers still have to parse (or worse yet, assume) the note headers. What would make things trivial for consumers is a pair of symbols giving exactly the bounds of the *payload*, i.e. just the build ID itself (and presenting its variable size in a simple manner).

grimar added a subscriber: grimar.Mar 23 2020, 12:51 AM
phosek updated this revision to Diff 281624.Jul 29 2020, 9:00 AM
phosek marked an inline comment as done.
phosek retitled this revision from [lld][ELF] Provide optional hidden symbol for build ID to [lld][ELF] Provide optional hidden symbols for build ID.
phosek edited the summary of this revision. (Show Details)

I have updated the patch to introduce a pair of __build_id_start and __build_id_end which point at the beginning and the end of the payload so users of these symbols don't need to parse the note headers which simplifies usage.

MaskRay added a comment.EditedJul 29 2020, 9:49 PM

I have updated the patch to introduce a pair of __build_id_start and __build_id_end which point at the beginning and the end of the payload so users of these symbols don't need to parse the note headers which simplifies usage.

@mcgrathr's comment https://reviews.llvm.org/D76482#1934097 has not been addressed. I think you can achieve your goal without any new code:

SECTIONS {
  .note.gnu.build-id : {
    __start_note_gnu_build_id = .;
    *(.note.gnu.build-id)
    __stop_note_gnu_build_id = .;
  }
} INSERT BEFORE .dynsym;

INSERT BEFORE|AFTER is special. They are not considered an external linker script (ld.bfd -T a.x --verbose).

If you still want to go down this route, I hope we can see an attempt to sell this idea on binutils side.

I have updated the patch to introduce a pair of __build_id_start and __build_id_end which point at the beginning and the end of the payload so users of these symbols don't need to parse the note headers which simplifies usage.

@mcgrathr's comment https://reviews.llvm.org/D76482#1934097 has not been addressed. I think you can achieve your goal without any new code:

SECTIONS {
  .note.gnu.build-id : {
    __start_note_gnu_build_id = .;
    *(.note.gnu.build-id)
    __stop_note_gnu_build_id = .;
  }
} INSERT BEFORE .dynsym;

INSERT BEFORE|AFTER is special. They are not considered an external linker script (ld.bfd -T a.x --verbose).

If you still want to go down this route, I hope we can see an attempt to sell this idea on binutils side.

In the latest patch __build_id_start and __build_id_end point to the payload, not beginning and end of the section, so users don't have to parse the note header, which is what @mcgrathr mentioned in his comment, or is there something else you have in mind? This is also something you cannot replicate with the linker script, with the linker you can point at the beginning and the end of the section, but not the payload.

I have updated the patch to introduce a pair of __build_id_start and __build_id_end which point at the beginning and the end of the payload so users of these symbols don't need to parse the note headers which simplifies usage.

@mcgrathr's comment https://reviews.llvm.org/D76482#1934097 has not been addressed. I think you can achieve your goal without any new code:

SECTIONS {
  .note.gnu.build-id : {
    __start_note_gnu_build_id = .;
    *(.note.gnu.build-id)
    __stop_note_gnu_build_id = .;
  }
} INSERT BEFORE .dynsym;

INSERT BEFORE|AFTER is special. They are not considered an external linker script (ld.bfd -T a.x --verbose).

If you still want to go down this route, I hope we can see an attempt to sell this idea on binutils side.

In the latest patch __build_id_start and __build_id_end point to the payload, not beginning and end of the section, so users don't have to parse the note header, which is what @mcgrathr mentioned in his comment, or is there something else you have in mind? This is also something you cannot replicate with the linker script, with the linker you can point at the beginning and the end of the section, but not the payload.

Doesn't my INSERT BEFORE example point the symbols at the beginning and the end of the section? You can use HIDDEN or PROVIDE_HIDDEN if you want them to be hidden.

I have updated the patch to introduce a pair of __build_id_start and __build_id_end which point at the beginning and the end of the payload so users of these symbols don't need to parse the note headers which simplifies usage.

@mcgrathr's comment https://reviews.llvm.org/D76482#1934097 has not been addressed. I think you can achieve your goal without any new code:

SECTIONS {
  .note.gnu.build-id : {
    __start_note_gnu_build_id = .;
    *(.note.gnu.build-id)
    __stop_note_gnu_build_id = .;
  }
} INSERT BEFORE .dynsym;

INSERT BEFORE|AFTER is special. They are not considered an external linker script (ld.bfd -T a.x --verbose).

If you still want to go down this route, I hope we can see an attempt to sell this idea on binutils side.

In the latest patch __build_id_start and __build_id_end point to the payload, not beginning and end of the section, so users don't have to parse the note header, which is what @mcgrathr mentioned in his comment, or is there something else you have in mind? This is also something you cannot replicate with the linker script, with the linker you can point at the beginning and the end of the section, but not the payload.

Doesn't my INSERT BEFORE example point the symbols at the beginning and the end of the section? You can use HIDDEN or PROVIDE_HIDDEN if you want them to be hidden.

We want the opposite though, we want these symbols to point directly at the payload to avoid having to parse the note header.

In the latest patch __build_id_start and __build_id_end point to the payload, not beginning and end of the section, so users don't have to parse the note header, which is what @mcgrathr mentioned in his comment, or is there something else you have in mind? This is also something you cannot replicate with the linker script, with the linker you can point at the beginning and the end of the section, but not the payload.

Doesn't my INSERT BEFORE example point the symbols at the beginning and the end of the section? You can use HIDDEN or PROVIDE_HIDDEN if you want them to be hidden.

We want the opposite though, we want these symbols to point directly at the payload to avoid having to parse the note header.

Is this request a bit too ad-hoc? The header is always 16 bytes (BuildIdSection::headerSize). You can write PROVIDE_HIDDEN(__buildid_start = . + 16);

The proposed solution with an input linker script is a clever kludge, but it's a kludge. There's no way to make that actually general, since there is no particular named section that you can correctly assume is present in all kinds of binaries you might be linking. The kludge could be locally tailored for different cases, but that is even worse. Since the section and layout we're talking about here is entirely synthesized by the linker, it makes a lot more sense to have the linker support convenience features for using that format directly than to require users to build detailed kludges baking in the details of the linker's built-in synthesizing behavior.

MaskRay added a subscriber: psmith.Jul 30 2020, 2:03 PM

The proposed solution with an input linker script is a clever kludge, but it's a kludge. There's no way to make that actually general, since there is no particular named section that you can correctly assume is present in all kinds of binaries you might be linking. The kludge could be locally tailored for different cases, but that is even worse. Since the section and layout we're talking about here is entirely synthesized by the linker, it makes a lot more sense to have the linker support convenience features for using that format directly than to require users to build detailed kludges baking in the details of the linker's built-in synthesizing behavior.

This may worth a discussion on binutils. There can be several alternative syntax which may meet your needs.

The proposed solution with an input linker script is a clever kludge, but it's a kludge. There's no way to make that actually general, since there is no particular named section that you can correctly assume is present in all kinds of binaries you might be linking. The kludge could be locally tailored for different cases, but that is even worse. Since the section and layout we're talking about here is entirely synthesized by the linker, it makes a lot more sense to have the linker support convenience features for using that format directly than to require users to build detailed kludges baking in the details of the linker's built-in synthesizing behavior.

This may worth a discussion on binutils. There can be several alternative syntax which may meet your needs.

One concrete use case we have is in runtimes. We would like for the runtime to use the build ID as the name of the output file because everything in our infrastructure is based on build ID as a way to uniquely identify modules. These runtimes are typically static libraries that are inserted onto the link line by the driver, e.g. if you set -fprofile-instr-generate driver will automatically put libclang_rt.profile.a onto your link line. With your suggested solution, this is going to fail at link time unless the user manually adds the linker script as link input. We could also ship that linker script with the compiler and modify the driver to always add it for you just like it does for the runtime itself. We would need to repeat this for every runtime like profile, XRay, sanitizers, etc. That seems like a worse solution that this patch and I don't think alternative syntaxes are going to help here, the linker script solution is just inconvenient.

It is a pity that the section name is not a valid C identifier otherwise start_ and stop_ would have been generated automatically.

Although less convenient to the programmer using LLD, using the section bounds at least allows someone to fake the symbols with a simple fragment of linker script, thus permitting deployment of the program on older versions of the linker and not needing custom support in ld.bfd:

.note.gnu.build-id : {
  PROVIDE (__build_id_start = .);
  *(.note.gnu.build-id)
  PROVIDE (__build_id_end = .);
}

From the comments above I guess at least __build_id_start could be set to PROVIDE (__build_id_start = . + 16); but that isn't as intuitive.

One final suggestion if the symbols are going to surround the build id payload is to make that explicit in the name, for example __build_id_payload_start.

phosek added a comment.EditedAug 18 2020, 3:36 PM

D102039 is an example use case for this feature.

Herald added a project: Restricted Project. · View Herald TranscriptNov 14 2023, 12:15 PM