Page MenuHomePhabricator

CodeGen: support an extension to pass linker options on ELF
ClosedPublic

Authored by compnerd on Dec 5 2017, 12:21 PM.

Details

Summary

Introduce an extension to support passing linker options to the linker.
These would be ignored by older linkers, but newer linkers which support
this feature would be able to process the linker.

Emit a special discarded section .linker-option. The content of this
section is a pair of strings (key, value). The key is a type identifier for
the parameter. This allows for an argument free parameter that will be
processed by the linker with the value being the parameter. As an example,
lib identifies a library to be linked against, traditionally the -l
argument for Unix-based linkers with the parameter being the library name.

Thanks to James Henderson, Cary Coutant, Rafael Espinolda, Sean Silva
for the valuable discussion on the design of this feature.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ruiu added a comment.Dec 13 2017, 1:56 PM

If you would make it perfectly clear how to parse an embedded string, I'd be fine with the single string approach rather than the key-value approach. But currently it seems the new feature gives too much freedom on how to use it.

Besides the space issue that I commented below, I think you need to define the character code of the embedded string. That's important because pathnames may contain non-ASCII characters particularly on Windows. I'd recommend you always store strings in UTF-8.

test/Feature/elf-linker-options.ll
7

Imagine that these two strings are pathnames. Pathnames can contain any character other than \0. That mean you can't distinguish "spaced options" from "spaced" and "options, which is bad. You should allow only one string.

Yes, gABI specifies pointer sized alignment, but that is not what currently occurs. Everything is 4-byte aligned due to a mis-interpretation of the specification long ago, and now it has fossilized into the reality (yay for compatibility!).

I'm going to drop my concern here, since other note sections do this. Also, it looks like the direction of conversation is going in the direction of creating a new section type for 64-bit notes.

Thinking more about this, while Im still not a fan of the split argument proposal, I think that is something that isn't needed for this part of the implementation. That will be a separate change to clang which will actually process the pragma and generate the associated metadata. This implementation should be able to accommodate both.

How would you distinguish between a key/value pair versus two separate options in this case?

include/llvm/BinaryFormat/ELF.h
1384

Assuming this is the first LLVM-vendor note, maybe 1? If not, what are the other note type values?

test/Feature/elf-linker-options.ll
7

The way I see it, this is intended to be interpreted as a single option. However, I think this demonstrates why a key-value pair is important.

tools/llvm-readobj/ELFDumper.cpp
3464–3469

Ok, if this is the common pattern, that's fine. I think that maps just look a little nicer to construct, since you don't need to define a new struct.

3473

I don't think you need to explicitly convert to a string here.

compnerd added inline comments.Dec 14 2017, 6:03 PM
include/llvm/BinaryFormat/ELF.h
1384

I used 1 originally as well. However, the note values themselves are not valued by owner, but in a global pool. We have values from various places like 0x53494749 which is if the siginfo_t is present in the core file that the ELF file represents. Or did I manage to confuse myself when I looked at the values?

test/Feature/elf-linker-options.ll
7

Yes, this is a single value. " spaced option". I don't remember what the issue was with trying to merge the entire spaced option into a single string. But, that would simplify the logic here. I think that we should go ahead and expect that the frontend will generate a single entry for that. The key/value wouldn't help with this if the spaces were removed in the MDNode itself.

tools/llvm-readobj/ELFDumper.cpp
3473

True, the implicit constructor would work. But, it is nicer to be explicit I think.

jhenderson added inline comments.Dec 15 2017, 1:53 AM
include/llvm/BinaryFormat/ELF.h
1384

According to the ELF gABI, the note type values are vendor-defined, so it doesn't matter that they're in a global pool (looking at ELF.h, we already support some duplicate values like NT_FREEBSD_PROCSTAT_VMMAP/NT_AMD_AMDGPU_HSAMETADATA, both with value 10). I'm not seeing a problem with sharing this value still? Consumers should always test the vendor type first.

I'm not sure where you're getting the 0x53494749 from, or its relevance here.

compnerd added inline comments.Dec 15 2017, 10:31 AM
include/llvm/BinaryFormat/ELF.h
1384

Ah, in that case, Ill go ahead and change to 1. There's no need to not use an arbitrary value then. That was the one that showed up on my screen when I was looking for the values for various notes. I had looked at the values and thought that the values were all in a single pool and that there was no pattern to any of them.

compnerd added inline comments.Dec 15 2017, 11:07 AM
test/Feature/elf-linker-options.ll
7

Wait, I forgot that I modelled this after the existing MachO and COFF behaviors. I think that changing the behavior of this documented interface is less desirable. If we want to have a new way to pass along the options, that is fine, but should be a separate change that unifies the behavior across all three object file formats.

compnerd updated this revision to Diff 127398.Dec 18 2017, 11:38 AM

Address various comments, fix test

compnerd marked 2 inline comments as done.Dec 18 2017, 11:39 AM
ruiu added inline comments.Dec 18 2017, 1:38 PM
lib/CodeGen/TargetLoweringObjectFileImpl.cpp
97

nit: delete the blank line.

tools/llvm-readobj/ELFDumper.cpp
3464–3478

Since this can be written in three lines like this, this code seems a bit too over-designed to me.

if (NT == ELF::NT_LLVM_LINKER_OPTIONS)
  return "NT_LLVM_LINKER_OPTIONS (linker options)";
return ("unknown note type (0x" + Twine::utohexstr(NT) + ")").str();
3558–3560

Using a switch for one case and one default seems odd. Regular if statement would be better.

3563

Remove the trailing space in the output.

3565

option -> Option

compnerd added inline comments.Dec 18 2017, 3:18 PM
lib/CodeGen/TargetLoweringObjectFileImpl.cpp
97

UGH, I swear I had deleted this. I hate this blank line, it keeps coming back :-(.

tools/llvm-readobj/ELFDumper.cpp
3464–3478

It wouldn't format it the same IIRC. This is also something which is effectively the same pattern as the rest of the file. If it does, then doing that in a follow up seems better.

3558–3560

It allows for adding new items in the future. It also is maintaining symmetry with the rest of the print[Owner]Note functions.

3563

Good catch.

3565

mmhm.

compnerd updated this revision to Diff 127422.Dec 18 2017, 3:20 PM

Address @ruiu's comments

jhenderson added inline comments.Dec 19 2017, 1:52 AM
lib/CodeGen/TargetLoweringObjectFileImpl.cpp
106–107

I don't like the magic number 5 here. I know it's the length of the vendor string in the first instance, but it would be easy for somebody to miss updating this if the vendor name changes. Perhaps you could do something like:

const char *Name = "LLVM";
Streamer.EmitIntValue(sizeof(Name), 4);
...
Streamer.EmitBytes(Name);

What's the purpose of the "+5" in the descsz field? I assume this is to do with the additional null and version fields, and if so, it would be good for this to be explained somehow.

109–110

I'm confused as to why you have to explicitly write a null character here instead of the code automatically using .asciz (like it does for the option string).

test/tools/llvm-readobj/elf-directives.test
23–24 ↗(On Diff #127422)

I don't think you need AddressAlign or Content fields here. In fact, do you need the section at all?

29–32 ↗(On Diff #127422)

Is this important to the test?

33–38 ↗(On Diff #127422)

Do you need Symbols/DynamicSymbols at all?

compnerd added inline comments.Dec 19 2017, 10:58 AM
lib/CodeGen/TargetLoweringObjectFileImpl.cpp
106–107

Sure, I can extract that name. Sure, I can add a comment for the size calculation.

109–110

I wish that it would add the trailing null, but it just does not. This happens in the other backends as well where we are forced to write the trailing null.

test/tools/llvm-readobj/elf-directives.test
23–24 ↗(On Diff #127422)

No, I suppose I don't need the section at all. I just did a simple obj2yaml.

29–32 ↗(On Diff #127422)

No, .note.GNU-stack is not needed for this test.

33–38 ↗(On Diff #127422)

Nope, the .file symbol is not needed in this test either.

compnerd updated this revision to Diff 127576.Dec 19 2017, 11:16 AM

Address @jhenderson's suggestions.

Okay, I have no more comments about this, aside from still being somewhat concerned about how this is supposed to be consumed by the linker. Please wait for @ruiu's further input on this.

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
109

Nit: this comment needs a full stop.

109–110

Okay, sounds like this is effectively a (mostly harmless) bug in the underlying code somewhere. No worries.

silvas added a subscriber: silvas.Dec 20 2017, 1:59 PM

It looks like this thread has evolved into a design discussion. It isn't clear to me that we all agree on what problem should be solved. Saleem, it seems like you have some interest in having the format be open-ended, while many others (myself included) seem to want this to be as locked-down as possible with a narrow set of really clearly defined operations. Saleem, could you maybe start an RFC on llvm-dev so we can have a higher-level design discussion? Off the top of my head, there are a couple things that have been mentioned in this thread but not really resolved:

  • Should we discuss this with bfd and gold? If not, why not?
  • I know the PlayStation proprietary linker has support for a similar feature (as James has been discussing) and IIRC there were private patches in the compiler for it. Is it in scope for this change to allow deleting those private patches from the PlayStation compiler?
  • How does this interact with the similar functionality on COFF and MachO? Is it in scope to have a common IR representation for the three platforms?
  • From my reading of the spec, SHT_NOTE is optional. But presumably information like extra library paths is actually semantically necessary for obtaining a correct link and must not be dropped. This also is related to the question of representing this as metadata in the IR.

having the format be open-ended

By this I mean passing through arbitrary linker options without the compiler having to understand the semantics of the section contents.

  • I know the PlayStation proprietary linker has support for a similar feature (as James has been discussing) and IIRC there were private patches in the compiler for it. Is it in scope for this change to allow deleting those private patches from the PlayStation compiler?

James would be more in tune with what the PS4 linker supports than I am, but in the compiler we don't do arbitrary linker options. We support specifying libraries (a-la pragma comment lib) and dllimport/dllexport names. The proposed patch could be used for the former but I think not the latter. It's also not clear whether our mechanism for dllimport/dllexport is optimal, but it keeps us away from all the angst about trying to pervert ELF visibility into supporting it.

It looks like this thread has evolved into a design discussion. It isn't clear to me that we all agree on what problem should be solved. Saleem, it seems like you have some interest in having the format be open-ended, while many others (myself included) seem to want this to be as locked-down as possible with a narrow set of really clearly defined operations. Saleem, could you maybe start an RFC on llvm-dev so we can have a higher-level design discussion? Off the top of my head, there are a couple things that have been mentioned in this thread but not really resolved:

Sure, I can start a thread there. However, I really was hoping to get this in for the release. It really would help with swift.

  • Should we discuss this with bfd and gold? If not, why not?

Sure, if we can actually emit this, I think that bringing it up to them is reasonable, and good.

  • I know the PlayStation proprietary linker has support for a similar feature (as James has been discussing) and IIRC there were private patches in the compiler for it. Is it in scope for this change to allow deleting those private patches from the PlayStation compiler?

Yes, I want this to be sufficiently generic that they would be able to rely on the functionality provided here to implement their support. However, in either case, I don't think that the backend should be involved in understanding the options at all. This is not the proper change to discuss that IMO.

  • How does this interact with the similar functionality on COFF and MachO? Is it in scope to have a common IR representation for the three platforms?

I'm not sure what you mean by interact with them. It is implementing the same functionality. Yes, the implementation here is designed to ensure that the IR representation is identical across the three platforms. Right now, the COFF and MachO IR level representation is the same, and I did ensure that the same held for ELF. The content of the node is what changes across the implementations, because the option is linker dependent.

  • From my reading of the spec, SHT_NOTE is optional. But presumably information like extra library paths is actually semantically necessary for obtaining a correct link and must not be dropped. This also is related to the question of representing this as metadata in the IR.

Yes, that is correct, SHT_NOTE support is optional. That is why this is designed to be represented as a note. It allows linkers which do not support the feature to still work with the object file and still ensure that the object content does not make it into the final binary (it is supposed to be dropped). AIUI, any conforming implementation will drop the note even within a whole archive case.

compnerd added a subscriber: rnk.Jan 3 2018, 9:58 PM
compnerd updated this revision to Diff 130558.Jan 18 2018, 9:35 PM
compnerd edited the summary of this revision. (Show Details)
compnerd edited the summary of this revision. (Show Details)

I'm going to bring up the encoding again of this section. It's perfectly reasonable for users to have non-ASCII characters in their path (imagine e.g. Japanese developers), and whilst the back-end probably doesn't care (I assume the front-end will convert to whatever encoding we choose), readobj will print garbage if fed e.g. a UTF-8 string with non-ASCII characters. I'm happy to defer this point to not block this stage going in, since I don't think we have agreement on this yet (I'd support UTF-8 FWIW), but I think it needs to be addressed at the point when front-end support is being added.

Otherwise, I'm happy with this, barring the small suggestions I've made.

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
99

It would be good to add a comment documenting the section format (and purpose?) here, even though it is simple.

test/Feature/elf-linker-options.ll
7–9

Suggestion: Add a space to one of these to show that spaces are not special in any way.

test/tools/llvm-readobj/elf-linker-options.ll
7–8

As with the other test, I'd suggest adding a space to one of these.

tools/llvm-readobj/ELFDumper.cpp
4091

clang-format?

4095

I got the impression that there was a bit of a debate about whether the library search path option should be allowed or not, so it may be a good idea to remove this one for now. I think the "lib" option is uncontroversial though.

ruiu added inline comments.Jan 19 2018, 1:41 PM
test/Feature/elf-linker-options.ll
6–8

"path" and "rpath" look too real and misleading. Can you use meaningless words such as foo or bar so that this test doesn't look like a guidance?

test/tools/llvm-readobj/elf-linker-options.ll
7–8

Ditto

tools/llvm-readobj/ELFDumper.cpp
4085–4086

It is one of the design principles of ELF that section names are not significant. Section should be identified by type and not by name. Can you define a new section type and use it?

4093–4095

I prefer just printing these strings as-is instead of converting "lib" to "library" and "path" to "library search path".

compnerd added inline comments.Jan 20 2018, 1:01 PM
tools/llvm-readobj/ELFDumper.cpp
4085–4086

As long as we are sure that this won't cause any issues with the BFD and gold linkers, I would be fine with that. Unfortunately, compatibility needs to be considered here.

compnerd marked 8 inline comments as done.Jan 20 2018, 2:35 PM
compnerd added inline comments.
lib/CodeGen/TargetLoweringObjectFileImpl.cpp
99

Given that none of the other file formats document the format here, I think that it is better documented in the linker or in the separate documentation (we do cover LLVM extensions). I've done the latter.

tools/llvm-readobj/ELFDumper.cpp
4091

Dunno how that happened.

compnerd updated this revision to Diff 130782.Jan 20 2018, 2:36 PM
compnerd marked 2 inline comments as done.

address comments

One more thing: one of the tests should check that the section header in the object is correct, i.e. has correct type and flags.

docs/Extensions.rst
235

I think a quick example of how this would look when written in assembly would be a good idea, similar to the other code-blocks in this file.

tools/llvm-readobj/ELFDumper.cpp
4085–4086

I agree that a new type would be nice. I did some playing around with gold and BFD, and gold can handle unknown section types without complaint, whereas BFD cannot. The exception is that BFD accepts section types reserved for user applications (between SHT_LOUSER and SHT_HIUSER), and those reserved for OS-specific sections (i.e. in the SHT_LOOS to SHT_HIOS) (there are irrelevant exceptions based on the flags). Neither of these look like ideal solutions.

Maybe this is worth discussion on the gABI and/or BFD mailing list, on the more general topic of "we want to extend ELF, at least for our linker/compiler pair. What section type should we use?"

compnerd added inline comments.Jan 22 2018, 12:13 PM
docs/Extensions.rst
235

Sure, I can add that.

tools/llvm-readobj/ELFDumper.cpp
4085–4086

I can create a custom section type in the reserved range I suppose.

compnerd updated this revision to Diff 131160.Jan 23 2018, 3:21 PM

Update documentation

Looks okay to me, although we still need a decision on the section type.

@ruiu - have you had any further thoughts on this?

docs/Extensions.rst
236

SHF_PROGBITS -> SHT_PROGBITS (or whatever type we end up using)

240

Nit: either "equivalently emitted in ..." or "equivalent to the following raw assembly:"

tools/llvm-readobj/ELFDumper.cpp
4085–4086

My problem is that I don't know which reserved range is the right one to use. There doesn't seem to be a "vendor" reserved range. LLVM isn't an OS, Processor, or final application, which are what the three current ranges refer to.

compnerd updated this revision to Diff 131315.Jan 24 2018, 11:03 AM

Update documentation further, add new linker type, update tools and tests.

compnerd marked 2 inline comments as done.Jan 24 2018, 11:13 AM
compnerd added inline comments.
tools/llvm-readobj/ELFDumper.cpp
4085–4086

I just added it to the same area as the ODR table extension.

compnerd updated this revision to Diff 131319.Jan 24 2018, 11:13 AM

tweak documentation

jhenderson accepted this revision.Jan 25 2018, 1:25 AM

Okay, LGTM, with one nit fix in the documentation. Thanks.

I'd like @ruiu to confirm he's happy with the section type before this goes in.

docs/Extensions.rst
240

Reminder of this nit. Please fix :)

This revision is now accepted and ready to land.Jan 25 2018, 1:25 AM
ruiu added inline comments.Jan 26 2018, 3:01 PM
docs/Extensions.rst
228

The section name is not significant -- you should say a section with SHT_LLVM_LINKER_OPTIONS type (which is usually named .linker-options though name is not significant.)

230

Specifying the encoding is important if a path contains non-ASCII characters. "null-terminated C-strings" are a bit redundant because it means C-strings or null-terminated strings. So I'd update

The strings are encoded as null-terminated UTF-8 strings.
241

This document should also cover the options that we want to use at the moment, e.g. "lib" and "path", and let people who want to add a new option to update this document, so that this document becomes a virtual standard doc for this feature. Otherwise, there's a risk that people start to add arbitrary options and this feature gets out of control.

compnerd marked 3 inline comments as done and an inline comment as not done.Jan 27 2018, 9:38 AM
compnerd closed this revision.Jan 30 2018, 8:31 AM

SVN r323783

ruiu added a comment.Jan 30 2018, 10:12 AM

I was actually waiting for you to update this patch. You didn't address my comments.

Sorry, seems that something went wrong with the update on phabricator. I did update the documentation, I was under the impression that you were okay with the change beyond the documentation change.