This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Add --keep-section to expose linkerscript KEEP directive as a linker flag
Needs Revision (Public)

Authored by christylee on Jul 31 2020, 4:13 PM.

Details

Summary

--gc-sections throws away all unreferenced sections, but we sometimes need to keep some of them. Although it's possible to use a linker script with the KEEP directive, that is often cumbersome for large repositories where each binary might have its own linker script. Exposing the KEEP directive as a linker flag also aids quick experimentation and iteration.

Event Timeline

christylee created this revision. Jul 31 2020, 4:13 PM
christylee requested review of this revision. Jul 31 2020, 4:13 PM
christylee edited the summary of this revision. Jul 31 2020, 4:16 PM

--gc-sections throws away all unreferenced sections, but we sometimes need to keep some of them.

If an input section defines a non-local symbol, you can use -u to retain the section.

Although it's possible to use a linker script with the KEEP directive, it is often cumbersome for large repositories where each binary might have its own linker script.

-T can be specified multiple times; the linker scripts are essentially concatenated. What might be inconvenient is that once a linker script is specified (unless it uses INSERT AFTER|BEFORE), it is considered an external linker script (GNU ld --verbose), and several built-in rules (!hasSectionsCommand) are disabled. If such requests are common, we should communicate with binutils to agree on a syntax.

For our use case, the input sections sometimes only define local symbols.

One of the complaints we got is that linker scripts are intrusive to build systems and difficult to experiment with. Keeping sections via a linker flag would be more lightweight than a full linker script.

Given that we already have --keep-unique to keep a symbol from being folded during ICF, I feel that adding --keep-section would be analogous.
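
A toy model may make the -u limitation concrete. The sketch below is not LLD's implementation: the Section class, the live_sections function, and all section and symbol names are hypothetical, and it assumes unique section names for simplicity. It treats the entry symbol, -u symbols, and explicitly kept sections as GC roots, and shows that a section defining only local symbols cannot be reached through -u.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    name: str
    global_syms: set = field(default_factory=set)  # non-local symbols defined here
    local_syms: set = field(default_factory=set)   # local symbols defined here
    refs: list = field(default_factory=list)       # names of sections referenced from here


def live_sections(sections, entry, undefined=(), keep=()):
    """Return the names of sections that survive a toy --gc-sections pass."""
    by_name = {s.name: s for s in sections}
    # Assumption: only non-local symbols can be named by the entry point or -u.
    by_global = {sym: s.name for s in sections for sym in s.global_syms}
    roots = [by_global[sym] for sym in (entry, *undefined) if sym in by_global]
    # A hypothetical keep list marks sections as roots by name instead of by symbol.
    roots += [name for name in keep if name in by_name]

    live, work = set(), list(roots)
    while work:
        name = work.pop()
        if name in live:
            continue
        live.add(name)
        work.extend(by_name[name].refs)
    return live


sections = [
    Section(".text.main", global_syms={"main"}, refs=[".text.helper"]),
    Section(".text.helper", global_syms={"helper"}),
    Section(".data.exported", global_syms={"exported_table"}),
    Section(".data.private", local_syms={"private_table"}),
]

# -u finds exported_table but has no way to name the local private_table.
print(sorted(live_sections(sections, "main",
                           undefined=["exported_table", "private_table"])))
# Keeping by section name retains the local-only section.
print(sorted(live_sections(sections, "main", keep=[".data.private"])))
```

In this model, keeping by section name is the only root that reaches .data.private, which is the gap the proposed flag is meant to fill.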

wenlei added a subscriber: wenlei. Jul 31 2020, 8:50 PM
psmith added a comment. Aug 1 2020, 5:30 AM

I do have some sympathy with wanting a command-line option to keep an individual section, as it has been useful in Arm's proprietary linker, although this is mainly due to the convenience of not having to create or modify another file.

One observation I'd make about the proposed implementation is that it looks like it only implements a subset of the Linker Script KEEP command that supports precise matches of a section name. Quoting from the GNU linker manual:

When link-time garbage collection is in use (`--gc-sections'), it is often useful to mark sections that should not be eliminated. This is accomplished by surrounding an input section's wildcard entry with KEEP(), as in KEEP(*(.init)) or KEEP(SORT_BY_NAME(*)(.ctors)).

That permits the full power of the input section description to discriminate via object, as there can be many sections with the same name. Wildcards can sometimes be useful too.

I think it would be worth considering a richer interface for keep-sections. In Arm's proprietary linker we permitted a similar syntax as for the equivalent linker script. This did mean quoting or escaping parentheses but did provide equivalence in what could be achieved.
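
To illustrate the difference between an exact section-name match and a full input section description, here is a rough sketch. It assumes that Python's fnmatch globbing is close enough to linker-script wildcards for illustration (the two are not identical); is_kept and the file and section names are hypothetical.

```python
from fnmatch import fnmatchcase


def is_kept(file_name, section_name, keep_patterns):
    """keep_patterns holds (file_glob, section_glob) pairs, modelled on
    the linker-script form KEEP(file_glob(section_glob))."""
    return any(
        fnmatchcase(file_name, file_glob) and fnmatchcase(section_name, section_glob)
        for file_glob, section_glob in keep_patterns
    )


# (object file, input section) pairs; several objects share the name .foo.
inputs = [("a.o", ".foo"), ("b.o", ".foo"), ("b.o", ".foo.cold"), ("crt1.o", ".init")]

# Exact section name from any file, roughly KEEP(*(.foo)).
exact = [("*", ".foo")]
# Only sections from b.o, with a wildcard on the name, roughly KEEP(b.o(.foo*)).
per_file = [("b.o", ".foo*")]

print([pair for pair in inputs if is_kept(*pair, exact)])
print([pair for pair in inputs if is_kept(*pair, per_file)])
```

An exact-name flag can only express the first pattern; the second needs the file and wildcard parts of the input section description.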

Updated to use the script parser so we can allow all KEEP semantics

MaskRay added a comment. Edited Aug 13 2020, 4:11 PM

A SECTIONS command without INSERT AFTER/BEFORE is treated as an external linker script and can change the default layout decisions. I also sympathize with the users, but I am wary of adding these non-orthogonal features. The recent D76482 (__build_id_start = .) and this patch make me think of output section descriptions (a fragment of a SECTIONS command) which do not affect section layout.

@psmith @grimar I think we should probably start a conversation with binutils about such a feature. If they see a need as well, we will have common ground, which would be great. They need to be given the opportunity to take part in the decision, to reduce the risk that they create a similar but incompatible feature in the future.

One idea is:

OVERRIDE SECTIONS {
  .foo : { KEEP(*(.foo)) }
  .bar : { KEEP(*(.bar)) }
  sym = .;   /* symbol assignments are disallowed */
}

The output section descriptions will override .foo & .bar in the external linker script. If the external linker script does not describe .foo or .bar, the command will change the orphan sections.

Another syntax:

SECTIONS {
  .foo : { KEEP(*(.foo)) }
  .bar : { KEEP(*(.bar)) }
} REPLACE .foo;

It is a possibility although I think we'd have to think pretty hard about the possible edge cases and how to explain what the limitations are. For example:

  • What are the semantics? For a linker script I'd expect that we'd replace the OutputSection description if it existed. I'm not sure about the orphan case, I guess we'd want to insert it at the same place as if it were an Orphan.
  • LLD doesn't have a default linker script, and is not transparent about the default non-script case. It might be harder than it looks to override its behaviour. I guess we could implement something like --verbose that spits out a linker script that gets close, but keeping it up to date would be challenging if we couldn't derive it.
  • Could the replace break the default non-script case in any way? Would we use the non-script code-path or the script code-path? I guess this is possible with INSERT as well.

Sounds like it could do with a prototype to flush out the problems. Although no harm in asking binutils about the concept.
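
One possible reading of the replace-or-orphan semantics discussed above, as a toy sketch only. The apply_override function and the section descriptions are hypothetical, and appending unmatched sections at the end is a simplification standing in for real orphan placement, which depends on more than the section name.

```python
def apply_override(script, overrides):
    """script and overrides are ordered lists of
    (output_section_name, description) pairs."""
    pending = dict(overrides)
    result = []
    for name, description in script:
        # Replace in place when the base script already describes this section.
        result.append((name, pending.pop(name, description)))
    # Assumption: sections the base script does not describe are treated like
    # orphans; this sketch simply appends them in the order they were given.
    result.extend((name, desc) for name, desc in overrides if name in pending)
    return result


base = [(".text", "*(.text*)"), (".foo", "*(.foo)"), (".data", "*(.data*)")]
overrides = [(".foo", "KEEP(*(.foo))"), (".bar", "KEEP(*(.bar))")]

for name, description in apply_override(base, overrides):
    print(f"{name} : {{ {description} }}")
```

Even this small model has to pick an answer for the orphan case, which is the kind of edge case a prototype would need to settle.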

MaskRay added a subscriber: phosek. Aug 18 2020, 5:16 PM

I created https://sourceware.org/bugzilla/show_bug.cgi?id=26404 for the discussion on the syntax.

@christylee @phosek You may consider starting a thread on binutils@sourceware.org to expedite their processing:)

Like I mentioned in D76482, I'm not a fan of this solution. The problem is that the runtimes are typically static libraries, e.g. libclang_rt.profile.a, which means that we would need to start distributing a linker script with each runtime and ensure that this linker script is always going to be used with that runtime. That's going to require additional complexity in the Clang driver. This is far less elegant than what we proposed in D76482 which is contained entirely in the linker.

MaskRay requested changes to this revision. Jul 28 2021, 8:43 PM

D103303 OVERWRITE_SECTIONS can be used instead.

This revision now requires changes to proceed. Jul 28 2021, 8:43 PM