This is an archive of the discontinued LLVM Phabricator instance.

CodeGen: support an extension to pass linker options on ELF
ClosedPublic

Authored by compnerd on Dec 5 2017, 12:21 PM.

Details

Summary

Introduce an extension to support passing linker options to the linker.
These would be ignored by older linkers, but newer linkers which support
this feature would be able to process the linker.

Emit a special discarded section .linker-option. The content of this
section is a pair of strings (key, value). The key is a type identifier for
the parameter. This allows for an argument free parameter that will be
processed by the linker with the value being the parameter. As an example,
lib identifies a library to be linked against, traditionally the -l
argument for Unix-based linkers with the parameter being the library name.

Thanks to James Henderson, Cary Coutant, Rafael Espinolda, Sean Silva
for the valuable discussion on the design of this feature.

Diff Detail

Repository
rL LLVM

Event Timeline

compnerd created this revision.Dec 5 2017, 12:21 PM
jakehehrlich added inline comments.Dec 5 2017, 12:31 PM
lib/CodeGen/TargetLoweringObjectFileImpl.cpp
103

Shouldn't the strings be separated by null characters instead?

compnerd updated this revision to Diff 125609.Dec 5 2017, 2:06 PM

Tweak test

compnerd added inline comments.Dec 5 2017, 2:11 PM
lib/CodeGen/TargetLoweringObjectFileImpl.cpp
103

No, this is the space for the "spaced option". See the test :-).

jhenderson requested changes to this revision.Dec 6 2017, 2:54 AM
jhenderson added a reviewer: jhenderson.
jhenderson added subscribers: ruiu, rafael, jhenderson.

Context missing. The SHT_NOTE section has a well-defined format, which this doesn't appear to conform to. If you want this to conform to a NOTE section format, you'll need to change the struct somehow. See http://www.sco.com/developers/gabi/latest/ch5.pheader.html#note_section. You could just make this non-NOTE (e.g. SHT_PROGBITS), and have the linker rely on only the name (in which case I'd just go with name it ".linker_options", or similar).

As for the format, can I suggest a couple of changes:

  1. Make the version field a ULEB, rather than a .long, to save space. Given we're using variable length strings, there's no particular reason to have fixed width fields here, I'd say.
  2. Assuming the "Reserved" field is meant to be the "null string", why not have an empty string as the last array member, rather than a uin32_t? Or a count/length field in the struct, which says how many options there are (or how big the total size is)? (Am I right in understanding the purpose of the "Reserved" field?)

We have a similar structure in our proprietary linker. One of the things it allows us to do is insert libraries immediately after the object file on the command-line. Where is it expected that the linker adds these command-lines? If immediately after the object file, what is the linker expected to do in -r links, where the sections contents would normally be concatenated?

Could you please add comments in the code, describing the structure and its purpose. Also, how are you planning on documenting this more widely?

I've also added @ruiu and @rafael to the subscribers, for their input from an LLD perspective.

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
103

.ascii directives, as emitted here, are not null terminated. You need to use .asciz. As things stand the output will look like " spaced option nospace" if you look at the binary, which means the linker won't know how to split it up.

This revision now requires changes to proceed.Dec 6 2017, 2:54 AM

This could be embedded within the note structure. Because we want this section to be discarded by default, having a PROGBITS section seems like a bad idea to me, but if someone else has a better way to accomplish this, I'm not tied to this.

The version field is for future enhancements. I suppose we can make it a ULEB, but, does a total of 8-bytes matter that much? (even in a ULEB encoding, that only saves us 6-bytes, in the object file, not the final binary, which should not preserve this field).

No, sorry, the reserved field is for future enhancements; you are right, that I forgot to null terminate the array.

I would be interested in the format of the structure in your linker; it is possible that you have a better format specified or that this version could be improved before we implement it.

Right now, I expect that most ELF linkers would process the command line arguments after all the objects have been loaded. In such a scenario, the most likely place for the addition will be the end of the command line, which is fine. For -r links, I would expect that the sections would be preserved across all the object files and re-emitted into the relocatable linked object. The final consumption of the object will process the concatenated options and then discard the note.

This could be embedded within the note structure. Because we want this section to be discarded by default, having a PROGBITS section seems like a bad idea to me, but if someone else has a better way to accomplish this, I'm not tied to this.

Okay, that makes sense. We have control over the linker scripts our proprietary linker uses, so we use /DISCARD/ to achieve this with our own version of the structure. Since there's no guarantee that a linker will recognise this new section, I'm happy for it to stay as SHT_NOTE, but it must fit the SHT_NOTE format, or tools won't be able to interpret it properly.

Should we explicitly reserve a range of version numbers (e.g. 0xF0000000-0xFFFFFFFF) to allow for vendor-specific versions? That way there's no risk of people using a number in their local LLVM ports which will clash with something done upstream. Also, it might be wise to explicitly reserve version 0 and/or some other range, to allow us to do crazier things in the future that we can't anticipate just yet.

The version field is for future enhancements. I suppose we can make it a ULEB, but, does a total of 8-bytes matter that much? (even in a ULEB encoding, that only saves us 6-bytes, in the object file, not the final binary, which should not preserve this field).

If we are staying with SHT_NOTE, I think a 4/8 byte field is okay, since it fits better with the alignment requirements anyway (pedantic - it would save 7 bytes for a low version, not 6).

No, sorry, the reserved field is for future enhancements; you are right, that I forgot to null terminate the array.

If we are using a version field, we don't need additional reserved fields, since if we need them in the future, we can just bump the version number.

I would be interested in the format of the structure in your linker; it is possible that you have a better format specified or that this version could be improved before we implement it.

Our format is essentially this struct:

struct linker_cmd {
  uint16_t type;
  uint16_t size;
  uint8_t *data;
};

The data field's interpretation depends on the type field, and is often in the form of a sub-structure, containing information related to that particular linker command (e.g. the name of the library to be added). We don't need a version field, because the type field has the same sort of effect - if we wanted to change the format for a specific command, we would just change the type field. Looking at the NOTE format, there is already a type field, which achieves the same purpose.

Having a size field allows the linker to skip over data for types of command it doesn't recognise - this suggests to me that it might be wise to add a size field to the proposed structure as well. Indeed using the NOTE format would give us this for free again, via the descsz field. It would allow linkers to ignore specific versions of commands that it doesn't support, but still allow it to continue reading the rest of the section, in the event that there are multiple commands concatenated together. A size of zero means no data, and the command effectively becomes a flag (a bit like the idea behind .note.GNU-stack).

In summary from that, our structure has properties that are similar to SHT_NOTE's format, but without the alignment requirements or name field. I think for your proposal, it allows you to drop the Version field, in favour of the type field that comes with the section. Type 1 could simply be null-terminated strings that are added to the command-line (maybe with a flag field - see below).

Right now, I expect that most ELF linkers would process the command line arguments after all the objects have been loaded. In such a scenario, the most likely place for the addition will be the end of the command line, which is fine.

The linkers that I know of process files one at a time, so I think either adding the switch after that object has been loaded, or at the end are both viable possibilities. Restricting it to adding it to the end loses flexibility in what can be done here. Some options toggle behaviour for individual or groups of files, so it doesn't make sense for them to be added late. Also, in the library use case we currently have, as noted above, this would result in different behaviour to what we currently require, because it could result in the libraries being added in a different order. I think we could/should do something about this. One idea I had was to add a "flags" field, or something that allows to toggle the behaviour between one of three modes:

  1. Add immediately before the object containing the switch (note, I'm not sure linkers will necessarily be able to support this, because switches could change how the object is loaded, and thus the new section might otherwise not be read in at all deliberately)
  2. Add immediately after the object in question, so that all future objects are affected by it.
  3. Add at the end of the link line. What happens here for options that should affect the reading in of object files?
  4. At the start of the link line, so that it affects all objects. I'm not convinced that this one is feasible but I'm adding it for completeness - for example what happens if an archive member includes this kind of directive?

For -r links, I would expect that the sections would be preserved across all the object files and re-emitted into the relocatable linked object. The final consumption of the object will process the concatenated options and then discard the note.

I suspect that the behaviour would need to be switch-specific. For some switches, it will be fine (indeed, required) to wait until the final link to process them. However, for others, it may affect how the intermediate -r link is supposed to work. In extreme cases, the switch could have an affect on both intermediate and final links (and of course, there's nothing stopping there being multiple intermediate links along the way). Perhaps another indication that a flag field might be necessary?

One other question that a colleague mentioned is what is the encoding of the command strings? UTF-8?

This could be embedded within the note structure. Because we want this section to be discarded by default, having a PROGBITS section seems like a bad idea to me, but if someone else has a better way to accomplish this, I'm not tied to this.

Okay, that makes sense. We have control over the linker scripts our proprietary linker uses, so we use /DISCARD/ to achieve this with our own version of the structure. Since there's no guarantee that a linker will recognise this new section, I'm happy for it to stay as SHT_NOTE, but it must fit the SHT_NOTE format, or tools won't be able to interpret it properly.

Should we explicitly reserve a range of version numbers (e.g. 0xF0000000-0xFFFFFFFF) to allow for vendor-specific versions? That way there's no risk of people using a number in their local LLVM ports which will clash with something done upstream. Also, it might be wise to explicitly reserve version 0 and/or some other range, to allow us to do crazier things in the future that we can't anticipate just yet.

Yes, I like the reservation for the vendor range. Yes, how about reserving the range 0x80000000-0x8fffffff for crazy things in the future? We could start with version 1, I'm not tied to 0.

The version field is for future enhancements. I suppose we can make it a ULEB, but, does a total of 8-bytes matter that much? (even in a ULEB encoding, that only saves us 6-bytes, in the object file, not the final binary, which should not preserve this field).

If we are staying with SHT_NOTE, I think a 4/8 byte field is okay, since it fits better with the alignment requirements anyway (pedantic - it would save 7 bytes for a low version, not 6).

Counting is hard! Yeah, I think the 4-byte field makes it easier and keeps things aligned.

No, sorry, the reserved field is for future enhancements; you are right, that I forgot to null terminate the array.

If we are using a version field, we don't need additional reserved fields, since if we need them in the future, we can just bump the version number.

Good point. Lets just remove that and go with the null string terminator.

I would be interested in the format of the structure in your linker; it is possible that you have a better format specified or that this version could be improved before we implement it.

Our format is essentially this struct:

struct linker_cmd {
  uint16_t type;
  uint16_t size;
  uint8_t *data;
};

The data field's interpretation depends on the type field, and is often in the form of a sub-structure, containing information related to that particular linker command (e.g. the name of the library to be added). We don't need a version field, because the type field has the same sort of effect - if we wanted to change the format for a specific command, we would just change the type field. Looking at the NOTE format, there is already a type field, which achieves the same purpose.

Having a size field allows the linker to skip over data for types of command it doesn't recognise - this suggests to me that it might be wise to add a size field to the proposed structure as well. Indeed using the NOTE format would give us this for free again, via the descsz field. It would allow linkers to ignore specific versions of commands that it doesn't support, but still allow it to continue reading the rest of the section, in the event that there are multiple commands concatenated together. A size of zero means no data, and the command effectively becomes a flag (a bit like the idea behind .note.GNU-stack).

In summary from that, our structure has properties that are similar to SHT_NOTE's format, but without the alignment requirements or name field. I think for your proposal, it allows you to drop the Version field, in favour of the type field that comes with the section. Type 1 could simply be null-terminated strings that are added to the command-line (maybe with a flag field - see below).

Yeah, I guess we can identify the "type" using the name .note.linker-options and use the type field as the version field.

Right now, I expect that most ELF linkers would process the command line arguments after all the objects have been loaded. In such a scenario, the most likely place for the addition will be the end of the command line, which is fine.

The linkers that I know of process files one at a time, so I think either adding the switch after that object has been loaded, or at the end are both viable possibilities. Restricting it to adding it to the end loses flexibility in what can be done here. Some options toggle behaviour for individual or groups of files, so it doesn't make sense for them to be added late. Also, in the library use case we currently have, as noted above, this would result in different behaviour to what we currently require, because it could result in the libraries being added in a different order. I think we could/should do something about this. One idea I had was to add a "flags" field, or something that allows to toggle the behaviour between one of three modes:

  1. Add immediately before the object containing the switch (note, I'm not sure linkers will necessarily be able to support this, because switches could change how the object is loaded, and thus the new section might otherwise not be read in at all deliberately)
  2. Add immediately after the object in question, so that all future objects are affected by it.
  3. Add at the end of the link line. What happens here for options that should affect the reading in of object files?
  4. At the start of the link line, so that it affects all objects. I'm not convinced that this one is feasible but I'm adding it for completeness - for example what happens if an archive member includes this kind of directive?

I don't think that we need to restrict the processing, but prescribing the means is a good idea. We also want to ensure that we do this well in lld. Adding the flags enables that, so, perfectly good with that.

For -r links, I would expect that the sections would be preserved across all the object files and re-emitted into the relocatable linked object. The final consumption of the object will process the concatenated options and then discard the note.

I suspect that the behaviour would need to be switch-specific. For some switches, it will be fine (indeed, required) to wait until the final link to process them. However, for others, it may affect how the intermediate -r link is supposed to work. In extreme cases, the switch could have an affect on both intermediate and final links (and of course, there's nothing stopping there being multiple intermediate links along the way). Perhaps another indication that a flag field might be necessary?

Hmm, yeah, having a flags field is reasonable, so lets go ahead and repurpose the version to flags and the notes type to version.

One other question that a colleague mentioned is what is the encoding of the command strings? UTF-8?

UTF-8 if we must, I would prefer that we be more restrictive and keep ASCII as I don't think that any of the tools (clang/lld) really treat options as UTF-8 per se, but rather as ASCII.

ruiu added a comment.Dec 8 2017, 4:02 PM

Do you mind if I ask you explain the motivation of adding this feature?

COFF linkers support somewhat similar feature. For COFF, you can embed a string containing linker command line options in a .drctive section, and if a linker decide to link that object file, it handles the command line options as if they were given via the command line. The feature is used, for example, to add "/lib" option (-l in Unix) to the command line automatically, so that you don't need to remember which libraries you have to add to the final link command line. I understand that it is sometimes useful.

As I have implemented the feature to COFF lld, I know how it works and have learned a lot from it. From that perspective, I'd like to note that the feature is often too powerful and tricky. We don't want object files to change some basic behavior of the linker via an implicit way, but with that feature, you can do that. (Even if you wouldn't do that, someone who find the feature would become very "creative" and do something very tricky with it.) For example, do you want an object file to add the --relocatable option to the command line, or do you want to allow an object file to add more object files to be linked? I'd think the answer is no. Also, some command line options cannot be implemented correctly in the first place. For example, lld processes all linker scripts before linking any object files. If you try to add a linker script using --script=<filename>, it wouldn't work because it's too late to do that.

In addition to that, the feature would make the linker's behavior more unpredictable to the user because the final command line options depend on object files the linker would pull out of static archives. Imagine that two object files contain --as-needed and --no-as-needed. The final output would depend on what order these files are pulled out of the archive and linked.

Overall, I don't think we really want to allow *all* features, because it is extremely powerful and too easy to become out of control. I'd strongly prefer adding a limited set of features in a more controlled manner.

Sure, the intent is that only a subset of features would be used. But, controlling which subset of the features shouldn't be limited in the implementation I think. This is mostly meant to serve as a means for adding equivalent functionality for linking that exists elsewhere. Im thinking of things like library search paths and linked libraries are generally pretty useful to be able to embed in a static library (as well as objects). One place where this would be immediately useful would be swift and Linux where currently the linker directives are added into a special section, a special tool will then post-process object files to extract and construct response files to pass along the options. All of this seems unnecessary and bulky when all the other object files support a clean way to support this.

ruiu added a comment.Dec 11 2017, 10:20 AM

Allowing a limited set of flags or features, such as adding a library path, is probably fine, but I'd strongly oppose to allow embedding arbitrary linker flags to object files. That would open up a can of warms as I described in the previous message.

It also means that it is harder to add features in the future for things which may not yet exist. As an example, there is no equivalent to this option today, but it would be pretty nice to have: -reexport-l<ibrary>. This would be equivalent to forwarders in COFF and LC_REEXPORT in the MachO. There is no ELF equivalent, but, were we to add that to the linker, this would now require changing the entire toolchain rather than forwarding the one single option.

compnerd updated this revision to Diff 126476.Dec 11 2017, 4:38 PM
compnerd edited edge metadata.

Address implementation details pointed out by @jakehehrlich

ruiu added a comment.Dec 11 2017, 4:48 PM

It also means that it is harder to add features in the future for things which may not yet exist. As an example, there is no equivalent to this option today, but it would be pretty nice to have: -reexport-l<ibrary>. This would be equivalent to forwarders in COFF and LC_REEXPORT in the MachO. There is no ELF equivalent, but, were we to add that to the linker, this would now require changing the entire toolchain rather than forwarding the one single option.

I don't think that's true. You can still keep the compiler agnostic of the options embedded in object files.

My point is, if you want to add a feature to embed linker options to object files, you need to clearly define the semantics of each linker command line option when it appears in an object file instead of in the command line. In lld, we have a clear distinction of "command-line processing pass" and "file reading pass". Many linker options don't make sense if they appear in the middle of linking, especially when they are included in object files that were pulled out from archive files. if you expect that linkers could just do what they "naturally" should do for embedded options, I think I have to say that that's a false assumption. What I'm thinking "natural" could be very different from yours. So I think you need to define what is the correct behavior for each command line option for the new use case.

Ah, my concern is in keeping the handling in the compiler agnostic. I don't mind discussing the aspects of the options. I think that the two options that really matter are -L and -l. I think that adjusting the linker search path should apply to all future library searches. For the -l, the ideal location would be after the object itself. But, I don't think that should really gate this infrastructure level support.

ruiu added a comment.Dec 11 2017, 10:04 PM

Yeah. As long as you are limiting the options that you can use in the embedded option string small and define them clearly, I'm basically fine with the basic idea of the embedded option itself.

As to the file format, does this patch produce a single string containing multiple options? If so, please consider emitting one string for each command line option. In COFF, an object file has only one string containing all embedded options, and parsing that string using the exact same rule as the shell (in this case DOS command prompt) would use is unnecessarily complicated and error-prone. We shouldn't repeat the same mistake in ELF.

The current elf gABI says that we should be using 8-byte alignment for 64-bit objects, not 4-byte alignment, so I think that this code needs updating for that case. However, I gather from an ongoing discussion on the gABI mailing lists that this has been inconsistently applied, and I haven't looked to see what LLVM does for other .note sections, so I would advise doing whatever is consistent with other LLVM-emitted .note sections.

I have two possible thoughts based on @ruiu's comments, which are effectively competing approaches:

  1. Why not explicitly define now types for the specific use cases already known about, e.g. one for adding libraries, and one for adding library paths? Then, instead of this being arbitrary option strings, the desc becomes the name of the library/path that should be added.
  2. Linkers could simply reject/ignore any directives received that make no sense for that specific linker (they could make more sense for other linkers, depending on their design).

I personally prefer option 1), as it better matches what our proprietary linker does, and will make it easier to port things! Also, it makes things a little easier to control, I think. Option 2) though is perhaps more flexible for different linkers to support - it may be, for example, that some options make perfect sense for some linkers and not for others, though there's no reason why LLD and other linkers couldn't just reject/ignore unrecognised/unsupported types. Option 2) also allows for the compiler to remain agnostic, as you described, which leads me onto my next question:

How do you anticipate end-users making use of this functionality? Is it via command-line options? Source code additions? Something else that I haven't thought of? I think the mechanism used might inform the design choice above. For example, if the mechanism is to do it via different kinds of pragmas (e.g. #pragma comment(lib, "mylib.a" and #pragma comment(libpath, "/some/library/path"), I think option 1) starts to make more sense again - to increase the set of options that this supports would require adding new pragma options, so it requires modifying the compiler anyway. If, on the other hand, it's more like a single pragma that provides a raw command-line (e.g. #pragma comment(linker, "-lmylib.a -L/some/lib/path", or better yet, two different pragma lines one for each option, to avoid having to parse the string), then clearly option 2) makes more sense. In either case, the linker is likely to need updating to support the new option that is desired, since we want to restrict what the linker will attempt to use, or we could get very broken links.

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
119

For a valid note section, you also need to align the end of the desc field.

test/Feature/elf-linker-options.ll
2

Could we have a 64-bit test-case, as well, please?

@ruiu I had the same concern and thats why I am doing one option per C-style string with an array that is null terminated to represent the options. If there are options with spaces they will be a string onto themselves. That is in the test itself (two options: "spaced option" and "nospace").

@jakehehrlich LLVM currently does 4-byte alignment and this code is following suit. My reading of the gABI was that it was 4-byte on 32-bit, 8-byte on 64-bit. Personally, I think that we should pass all options through and let the linker decide which option(s) it permits and rejects. There are options that I would prefer that the linker flat out reject (e.g. -nopie). This allows us to remain agnostic to the various different linkers and allows someone to implement extensions in a fairly easy manner. For end users, I was thinking more the #pragma comment(linker, "...") approach (again, in line with option 2). I think that source code annotations would be better and more obvious that they are adding this. In the short term, even having IR level support and lld support would mean that we could actually start using this for swift immediately :-).

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
119

Oh, right, forgot about that, good catch!

test/Feature/elf-linker-options.ll
2

Yes, yes we should have one of those, I'll add one.

compnerd updated this revision to Diff 126585.Dec 12 2017, 11:05 AM
compnerd edited the summary of this revision. (Show Details)

Add llvm-readobj decoding support, tweak the structure layout a bit.

compnerd marked 3 inline comments as done.Dec 12 2017, 11:05 AM
ruiu added a comment.Dec 12 2017, 1:05 PM

@ruiu I had the same concern and thats why I am doing one option per C-style string with an array that is null terminated to represent the options. If there are options with spaces they will be a string onto themselves. That is in the test itself (two options: "spaced option" and "nospace").

Could you push it further and make the options key-value pairs instead of just a sequence of strings? In the current form, there's an ambiguity such as "-l /foo/bar" and "-l/foo/bar", and parsing them correctly in the linker is tedious and error-prone (the former could mean a path starting with a space character, for example). The value can be empty if the option doesn't take an argument.

I don't think that parsing the options on the LLVM side is a good idea. What if the linker uses a non-GNU style driver? What if there are multiple ways of specifying the option (joined vs separate)? This is something that really should be pushed further down. Can you even do -l library with the GNU driver?

ruiu added a comment.Dec 12 2017, 1:52 PM

I don't think that parsing the options on the LLVM side is a good idea. What if the linker uses a non-GNU style driver? What if there are multiple ways of specifying the option (joined vs separate)? This is something that really should be pushed further down. Can you even do -l library with the GNU driver?

I'd make users pass a key-value pair instead of a single string. A key-value pair is actually more straightforward than passing a single string for something like #pragma comment(lib, "foo"), no?

I don't think we should accept more than one way of doing one thing. If we allow -l, we should allow only -l and shouldn't allow any other form of the same option including --library=foo, -library=foo or -library foo`.

I was thinking more along the lines of #pragma comment(linker, "-liberty").

ruiu added a comment.Dec 12 2017, 2:55 PM

I was thinking more along the lines of #pragma comment(linker, "-liberty").

I think #pragma linker_option("-l", "iberty") would be better.

@ruiu, that explicitly has the problem that Im trying to avoid: having to modify the compiler for any new option.

ruiu added a comment.Dec 12 2017, 4:00 PM

@ruiu, that explicitly has the problem that Im trying to avoid: having to modify the compiler for any new option.

Why? To compilers, both key and value are opaque tokens which they just store into an object file as a key-value pair.

I think you're getting mixed up between me and Jake :-)

LLVM currently does 4-byte alignment and this code is following suit. My reading of the gABI was that it was 4-byte on 32-bit, 8-byte on 64-bit.

But this code doesn't do what the gABI says? I would expect 8-byte note sections on 64-bit ELF, such as x86_64. Anyway, as noted, there's quite a bit of ongoing discussion on the gABI mailing list, so maybe it's best to wait on a final decision on format until that concludes?

@ruiu's proposal works for me. There is one new pragma defined "linker_command", which takes two options, one being the switch ("-l"), and the other the values for that switch ("iberty"), with an empty string being used ("") for switches with no argument, if we ever decide they are supported in the future, and similarly and empty string being used for the switch for new positional arguments (again I'm not saying we should necessarily support this, but the option will be there). Each linker will then be able to accept/reject/ignore the switch, without having to go through complicated parsing routines. I would hope it should be possible to pass it to the existing command line parsing library and find out which switch it corresponds to, somehow, but I don't know that area so well. The compiler does not have to change at all after the pragma is put in, even if new options are desired.

include/llvm/BinaryFormat/ELF.h
1384 ↗(On Diff #126585)

This number seems a bit arbitrary. Any particular reason for this value specifically?

test/Feature/elf-linker-options.ll
20

I'm confused. Why is this .p2align 3 and not .p2align 2?

tools/llvm-readobj/ELFDumper.cpp
3464–3469

Why not make this a proper map of some kind (e.g. DenseMap), so that you can use a straightforward lookup?

Yes, gABI specifies pointer sized alignment, but that is not what currently occurs. Everything is 4-byte aligned due to a mis-interpretation of the specification long ago, and now it has fossilized into the reality (yay for compatibility!).

Thinking more about this, while Im still not a fan of the split argument proposal, I think that is something that isn't needed for this part of the implementation. That will be a separate change to clang which will actually process the pragma and generate the associated metadata. This implementation should be able to accommodate both.

include/llvm/BinaryFormat/ELF.h
1384 ↗(On Diff #126585)

No, there is no logical reasoning for this value. Would another value be preferable?

test/Feature/elf-linker-options.ll
20

Because I can't copy and paste properly. This should be .p2align 2.

tools/llvm-readobj/ELFDumper.cpp
3464–3469

It was the pattern that I had started when I was adding support for other note types in llvm-readobj. This is just following that pattern. Is a DenseMap really better than the linear lookup for a very small number of items (<8)? If so, I can do that in a subsequent change.

ruiu added a comment.Dec 13 2017, 1:56 PM

If you would make it perfectly clear how to parse an embedded string, I'd be fine with the single string approach rather than the key-value approach. But currently it seems the new feature gives too much freedom on how to use it.

Besides the space issue that I commented below, I think you need to define the character code of the embedded string. That's important because pathnames may contain non-ASCII characters particularly on Windows. I'd recommend you always store strings in UTF-8.

test/Feature/elf-linker-options.ll
7

Imagine that these two strings are pathnames. Pathnames can contain any character other than \0. That mean you can't distinguish "spaced options" from "spaced" and "options, which is bad. You should allow only one string.

Yes, gABI specifies pointer sized alignment, but that is not what currently occurs. Everything is 4-byte aligned due to a mis-interpretation of the specification long ago, and now it has fossilized into the reality (yay for compatibility!).

I'm going to drop my concern here, since other note sections do this. Also, it looks like the direction of conversation is going in the direction of creating a new section type for 64-bit notes.

Thinking more about this, while Im still not a fan of the split argument proposal, I think that is something that isn't needed for this part of the implementation. That will be a separate change to clang which will actually process the pragma and generate the associated metadata. This implementation should be able to accommodate both.

How would you distinguish between a key/value pair versus two separate options in this case?

include/llvm/BinaryFormat/ELF.h
1384 ↗(On Diff #126585)

Assuming this is the first LLVM-vendor note, maybe 1? If not, what are the other note type values?

test/Feature/elf-linker-options.ll
7

The way I see it, this is intended to be interpreted as a single option. However, I think this demonstrates why a key-value pair is important.

tools/llvm-readobj/ELFDumper.cpp
3464–3469

Ok, if this is the common pattern, that's fine. I think that maps just look a little nicer to construct, since you don't need to define a new struct.

3473

I don't think you need to explicitly convert to a string here.

compnerd added inline comments.Dec 14 2017, 6:03 PM
include/llvm/BinaryFormat/ELF.h
1384 ↗(On Diff #126585)

I used 1 originally as well. However, the note values themselves are not valued by owner, but in a global pool. We have values from various places like 0x53494749 which is if the siginfo_t is present in the core file that the ELF file represents. Or did I manage to confuse myself when I looked at the values?

test/Feature/elf-linker-options.ll
7

Yes, this is a single value. " spaced option". I don't remember what the issue was with trying to merge the entire spaced option into a single string. But, that would simplify the logic here. I think that we should go ahead and expect that the frontend will generate a single entry for that. The key/value wouldn't help with this if the spaces were removed in the MDNode itself.

tools/llvm-readobj/ELFDumper.cpp
3473

True, the implicit constructor would work. But, it is nicer to be explicit I think.

jhenderson added inline comments.Dec 15 2017, 1:53 AM
include/llvm/BinaryFormat/ELF.h
1384 ↗(On Diff #126585)

According to the ELF gABI, the note type values are vendor-defined, so it doesn't matter that they're in a global pool (looking at ELF.h, we already support some duplicate values like NT_FREEBSD_PROCSTAT_VMMAP/NT_AMD_AMDGPU_HSAMETADATA, both with value 10). I'm not seeing a problem with sharing this value still? Consumers should always test the vendor type first.

I'm not sure where you're getting the 0x53494749 from, or its relevance here.

compnerd added inline comments.Dec 15 2017, 10:31 AM
include/llvm/BinaryFormat/ELF.h
1384 ↗(On Diff #126585)

Ah, in that case, Ill go ahead and change to 1. There's no need to not use an arbitrary value then. That was the one that showed up on my screen when I was looking for the values for various notes. I had looked at the values and thought that the values were all in a single pool and that there was no pattern to any of them.

compnerd added inline comments.Dec 15 2017, 11:07 AM
test/Feature/elf-linker-options.ll
7

Wait, I forgot that I modelled this after the existing MachO and COFF behaviors. I think that changing the behavior of this documented interface is less desirable. If we want to have a new way to pass along the options, that is fine, but should be a separate change that unifies the behavior across all three object file formats.

compnerd updated this revision to Diff 127398.Dec 18 2017, 11:38 AM

Address various comments, fix test

compnerd marked 2 inline comments as done.Dec 18 2017, 11:39 AM
ruiu added inline comments.Dec 18 2017, 1:38 PM
lib/CodeGen/TargetLoweringObjectFileImpl.cpp
97

nit: delete the blank line.

tools/llvm-readobj/ELFDumper.cpp
3464–3478

Since this can be written in three lines like this, this code seems a bit too over-designed to me.

if (NT == ELF::NT_LLVM_LINKER_OPTIONS)
  return "NT_LLVM_LINKER_OPTIONS (linker options)";
return ("unknown note type (0x" + Twine::utohexstr(NT) + ")").str();
3558–3560

Using a switch for one case and one default seems odd. Regular if statement would be better.

3563

Remove the trailing space in the output.

3565

option -> Option

compnerd added inline comments.Dec 18 2017, 3:18 PM
lib/CodeGen/TargetLoweringObjectFileImpl.cpp
97

UGH, I swear I had deleted this. I hate this blank line, it keeps coming back :-(.

tools/llvm-readobj/ELFDumper.cpp
3464–3478

It wouldn't format it the same IIRC. This is also something which is effectively the same pattern as the rest of the file. If it does, then doing that in a follow up seems better.

3558–3560

It allows for adding new items in the future. It also is maintaining symmetry with the rest of the print[Owner]Note functions.

3563

Good catch.

3565

mmhm.

compnerd updated this revision to Diff 127422.Dec 18 2017, 3:20 PM

Address @ruiu's comments

jhenderson added inline comments.Dec 19 2017, 1:52 AM
lib/CodeGen/TargetLoweringObjectFileImpl.cpp
106–107

I don't like the magic number 5 here. I know it's the length of the vendor string in the first instance, but it would be easy for somebody to miss updating this if the vendor name changes. Perhaps you could do something like:

const char *Name = "LLVM";
Streamer.EmitIntValue(sizeof(Name), 4);
...
Streamer.EmitBytes(Name);

What's the purpose of the "+5" in the descsz field? I assume this is to do with the additional null and version fields, and if so, it would be good for this to be explained somehow.

109–110

I'm confused as to why you have to explicitly write a null character here instead of the code automatically using .asciz (like it does for the option string).

test/tools/llvm-readobj/elf-directives.test
23–24 ↗(On Diff #127422)

I don't think you need AddressAlign or Content fields here. In fact, do you need the section at all?

29–32 ↗(On Diff #127422)

Is this important to the test?

33–38 ↗(On Diff #127422)

Do you need Symbols/DynamicSymbols at all?

compnerd added inline comments.Dec 19 2017, 10:58 AM
lib/CodeGen/TargetLoweringObjectFileImpl.cpp
106–107

Sure, I can extract that name. Sure, I can add a comment for the size calculation.

109–110

I wish that it would add the trailing null, but it just does not. This happens in the other backends as well where we are forced to write the trailing null.

test/tools/llvm-readobj/elf-directives.test
23–24 ↗(On Diff #127422)

No, I suppose I don't need the section at all. I just did a simple obj2yaml.

29–32 ↗(On Diff #127422)

No, .note.GNU-stack is not needed for this test.

33–38 ↗(On Diff #127422)

Nope, the .file symbol is not needed in this test either.

compnerd updated this revision to Diff 127576.Dec 19 2017, 11:16 AM

Address @jhenderson's suggestions.

Okay, I have no more comments about this, aside from still being somewhat concerned about how this is supposed to be consumed by the linker. Please wait for @ruiu's further input on this.

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
109

Nit: this comment needs a full stop.

109–110

Okay, sounds like this is effectively a (mostly harmless) bug in the underlying code somewhere. No worries.

silvas added a subscriber: silvas.Dec 20 2017, 1:59 PM

It looks like this thread has evolved into a design discussion. It isn't clear to me that we all agree on what problem should be solved. Saleem, it seems like you have some interest in having the format be open-ended, while many others (myself included) seem to want this to be as locked-down as possible with a narrow set of really clearly defined operations. Saleem, could you maybe start an RFC on llvm-dev so we can have a higher-level design discussion? Off the top of my head, there are a couple things that have been mentioned in this thread but not really resolved:

  • Should we discuss this with bfd and gold? If not, why not?
  • I know the PlayStation proprietary linker has support for a similar feature (as James has been discussing) and IIRC there were private patches in the compiler for it. Is it in scope for this change to allow deleting those private patches from the PlayStation compiler?
  • How does this interact with the similar functionality on COFF and MachO? Is it in scope to have a common IR representation for the three platforms?
  • From my reading of the spec, SHT_NOTE is optional. But presumably information like extra library paths is actually semantically necessary for obtaining a correct link and must not be dropped. This also is related to the question of representing this as metadata in the IR.

having the format be open-ended

By this I mean passing through arbitrary linker options without the compiler having to understand the semantics of the section contents.

  • I know the PlayStation proprietary linker has support for a similar feature (as James has been discussing) and IIRC there were private patches in the compiler for it. Is it in scope for this change to allow deleting those private patches from the PlayStation compiler?

James would be more in tune with what the PS4 linker supports than I am, but in the compiler we don't do arbitrary linker options. We support specifying libraries (a-la pragma comment lib) and dllimport/dllexport names. The proposed patch could be used for the former but I think not the latter. It's also not clear whether our mechanism for dllimport/dllexport is optimal, but it keeps us away from all the angst about trying to pervert ELF visibility into supporting it.

It looks like this thread has evolved into a design discussion. It isn't clear to me that we all agree on what problem should be solved. Saleem, it seems like you have some interest in having the format be open-ended, while many others (myself included) seem to want this to be as locked-down as possible with a narrow set of really clearly defined operations. Saleem, could you maybe start an RFC on llvm-dev so we can have a higher-level design discussion? Off the top of my head, there are a couple things that have been mentioned in this thread but not really resolved:

Sure, I can start a thread there. However, I really was hoping to get this in for the release. It really would help with swift.

  • Should we discuss this with bfd and gold? If not, why not?

Sure, if we can actually emit this, I think that bringing it up to them is reasonable, and good.

  • I know the PlayStation proprietary linker has support for a similar feature (as James has been discussing) and IIRC there were private patches in the compiler for it. Is it in scope for this change to allow deleting those private patches from the PlayStation compiler?

Yes, I want this to be sufficiently generic that they would be able to rely on the functionality provided here to implement their support. However, in either case, I don't think that the backend should be involved in understanding the options at all. This is not the proper change to discuss that IMO.

  • How does this interact with the similar functionality on COFF and MachO? Is it in scope to have a common IR representation for the three platforms?

I'm not sure what you mean by interact with them. It is implementing the same functionality. Yes, the implementation here is designed to ensure that the IR representation is identical across the three platforms. Right now, the COFF and MachO IR level representation is the same, and I did ensure that the same held for ELF. The content of the node is what changes across the implementations, because the option is linker dependent.

  • From my reading of the spec, SHT_NOTE is optional. But presumably information like extra library paths is actually semantically necessary for obtaining a correct link and must not be dropped. This also is related to the question of representing this as metadata in the IR.

Yes, that is correct, SHT_NOTE support is optional. That is why this is designed to be represented as a note. It allows linkers which do not support the feature to still work with the object file and still ensure that the object content does not make it into the final binary (it is supposed to be dropped). AIUI, any conforming implementation will drop the note even within a whole archive case.

compnerd added a subscriber: rnk.Jan 3 2018, 9:58 PM
compnerd updated this revision to Diff 130558.Jan 18 2018, 9:35 PM
compnerd edited the summary of this revision. (Show Details)
compnerd edited the summary of this revision. (Show Details)

I'm going to bring up the encoding again of this section. It's perfectly reasonable for users to have non-ASCII characters in their path (imagine e.g. Japanese developers), and whilst the back-end probably doesn't care (I assume the front-end will convert to whatever encoding we choose), readobj will print garbage if fed e.g. a UTF-8 string with non-ASCII characters. I'm happy to defer this point to not block this stage going in, since I don't think we have agreement on this yet (I'd support UTF-8 FWIW), but I think it needs to be addressed at the point when front-end support is being added.

Otherwise, I'm happy with this, barring the small suggestions I've made.

lib/CodeGen/TargetLoweringObjectFileImpl.cpp
99

It would be good to add a comment documenting the section format (and purpose?) here, even though it is simple.

test/Feature/elf-linker-options.ll
7–9

Suggestion: Add a space to one of these to show that spaces are not special in any way.

test/tools/llvm-readobj/elf-linker-options.ll
7–8

As with the other test, I'd suggest adding a space to one of these.

tools/llvm-readobj/ELFDumper.cpp
4091

clang-format?

4095

I got the impression that there was a bit of a debate about whether the library search path option should be allowed or not, so it may be a good idea to remove this one for now. I think the "lib" option is uncontroversial though.

ruiu added inline comments.Jan 19 2018, 1:41 PM
test/Feature/elf-linker-options.ll
6–8

"path" and "rpath" look too real and misleading. Can you use meaningless words such as foo or bar so that this test doesn't look like a guidance?

test/tools/llvm-readobj/elf-linker-options.ll
7–8

Ditto

tools/llvm-readobj/ELFDumper.cpp
4085–4086

It is one of the design principles of ELF that section names are not significant. Section should be identified by type and not by name. Can you define a new section type and use it?

4093–4095

I prefer just printing these strings as-is instead of converting "lib" to "library" and "path" to "library search path".

compnerd added inline comments.Jan 20 2018, 1:01 PM
tools/llvm-readobj/ELFDumper.cpp
4085–4086

As long as we are sure that this won't cause any issues with the BFD and gold linkers, I would be fine with that. Unfortunately, compatibility needs to be considered here.

compnerd marked 8 inline comments as done.Jan 20 2018, 2:35 PM
compnerd added inline comments.
lib/CodeGen/TargetLoweringObjectFileImpl.cpp
99

Given that none of the other file formats document the format here, I think that it is better documented in the linker or in the separate documentation (we do cover LLVM extensions). I've done the latter.

tools/llvm-readobj/ELFDumper.cpp
4091

Dunno how that happened.

compnerd updated this revision to Diff 130782.Jan 20 2018, 2:36 PM
compnerd marked 2 inline comments as done.

address comments

One more thing: one of the tests should check that the section header in the object is correct, i.e. has correct type and flags.

docs/Extensions.rst
235

I think a quick example of how this would look when written in assembly would be a good idea, similar to the other code-blocks in this file.

tools/llvm-readobj/ELFDumper.cpp
4085–4086

I agree that a new type would be nice. I did some playing around with gold and BFD, and gold can handle unknown section types without complaint, whereas BFD cannot. The exception is that BFD accepts section types reserved for user applications (between SHT_LOUSER and SHT_HIUSER), and those reserved for OS-specific sections (i.e. in the SHT_LOOS to SHT_HIOS) (there are irrelevant exceptions based on the flags). Neither of these look like ideal solutions.

Maybe this is worth discussion on the gABI and/or BFD mailing list, on the more general topic of "we want to extend ELF, at least for our linker/compiler pair. What section type should we use?"

compnerd added inline comments.Jan 22 2018, 12:13 PM
docs/Extensions.rst
235

Sure, I can add that.

tools/llvm-readobj/ELFDumper.cpp
4085–4086

I can create a custom section type in the reserved range I suppose.

compnerd updated this revision to Diff 131160.Jan 23 2018, 3:21 PM

Update documentation

Looks okay to me, although we still need a decision on the section type.

@ruiu - have you had any further thoughts on this?

docs/Extensions.rst
236

SHF_PROGBITS -> SHT_PROGBITS (or whatever type we end up using)

240

Nit: either "equivalently emitted in ..." or "equivalent to the following raw assembly:"

tools/llvm-readobj/ELFDumper.cpp
4085–4086

My problem is that I don't know which reserved range is the right one to use. There doesn't seem to be a "vendor" reserved range. LLVM isn't an OS, Processor, or final application, which are what the three current ranges refer to.

compnerd updated this revision to Diff 131315.Jan 24 2018, 11:03 AM

Update documentation further, add new linker type, update tools and tests.

compnerd marked 2 inline comments as done.Jan 24 2018, 11:13 AM
compnerd added inline comments.
tools/llvm-readobj/ELFDumper.cpp
4085–4086

I just added it to the same area as the ODR table extension.

compnerd updated this revision to Diff 131319.Jan 24 2018, 11:13 AM

tweak documentation

jhenderson accepted this revision.Jan 25 2018, 1:25 AM

Okay, LGTM, with one nit fix in the documentation. Thanks.

I'd like @ruiu to confirm he's happy with the section type before this goes in.

docs/Extensions.rst
240

Reminder of this nit. Please fix :)

This revision is now accepted and ready to land.Jan 25 2018, 1:25 AM
ruiu added inline comments.Jan 26 2018, 3:01 PM
docs/Extensions.rst
228

The section name is not significant -- you should say a section with SHT_LLVM_LINKER_OPTIONS type (which is usually named .linker-options though name is not significant.)

230

Specifying the encoding is important if a path contains non-ASCII characters. "null-terminated C-strings" are a bit redundant because it means C-strings or null-terminated strings. So I'd update

The strings are encoded as null-terminated UTF-8 strings.
241

This document should also cover the options that we want to use at the moment, e.g. "lib" and "path", and let people who want to add a new option to update this document, so that this document becomes a virtual standard doc for this feature. Otherwise, there's a risk that people start to add arbitrary options and this feature gets out of control.

compnerd marked 3 inline comments as done and an inline comment as not done.Jan 27 2018, 9:38 AM
compnerd closed this revision.Jan 30 2018, 8:31 AM

SVN r323783

ruiu added a comment.Jan 30 2018, 10:12 AM

I was actually waiting for you to update this patch. You didn't address my comments.

Sorry, seems that something went wrong with the update on phabricator. I did update the documentation, I was under the impression that you were okay with the change beyond the documentation change.