This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Implement Dependent Libraries Feature
ClosedPublic

Authored by bd1976llvm on Apr 4 2019, 10:17 AM.

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

bd1976llvm created this revision.Apr 4 2019, 10:17 AM
ormris added a subscriber: ormris.Apr 4 2019, 10:49 AM
smeenai edited reviewers, added: ruiu; removed: rui314.Apr 4 2019, 11:38 AM
smeenai added a subscriber: smeenai.

(adding the correct Phabricator account for Rui)

ruiu added a comment.Apr 4 2019, 4:07 PM

First round of review for the lld part -- this patch generally looking good. Nice and compact and should be useful. I think I like this.

lld/ELF/Config.h
124 ↗(On Diff #193740)

We use the same name for a variable and a command line, thus IgnoreDeplibs.

lld/ELF/Driver.cpp
763 ↗(On Diff #193740)

Maybe something like -autolink/-no-autolink is more conventional.

1500 ↗(On Diff #193740)

nit: 0u → 0

lld/ELF/InputFiles.cpp
400 ↗(On Diff #193740)

Please add a function comment, e.g.

An ELF object file may contain an `.autolink` section. If exists, the section contains a library name such as `m` for libm. This function resolves a given name as if it were given via -l. This ELF extension is called autolinking and currently unique to LLVM and lld.
401 ↗(On Diff #193740)

Let's flip the condition and return early.

654 ↗(On Diff #193740)

Flip the condition and return early.

657 ↗(On Diff #193740)

Instead of allowing multiple names in a single .autolink section, why don't you create multiple .autolink sections?

pcc added inline comments.Apr 4 2019, 4:09 PM
lld/ELF/Config.h
124 ↗(On Diff #193740)

Sort alphabetically.

lld/ELF/InputFiles.cpp
406 ↗(On Diff #193740)

searchLibrary contains the support for handling flags like -l:foo; I guess if you don't want that behaviour here then it will need to be factored out of searchLibrary.

llvm/docs/Extensions.rst
291 ↗(On Diff #193740)

*specifying

llvm/include/llvm/Object/IRSymtab.h
138 ↗(On Diff #193740)

I think the version number needs to be incremented.

157 ↗(On Diff #193740)

Maybe just Range<Str>?

llvm/lib/LTO/LTOModule.cpp
653 ↗(On Diff #193740)

This could use a range for loop.

I think it will be worth adding some user level documentation, whether in this patch or another, for the pragma and how autolinking is expecting to work. My apologies I've not had time to go through the code in detail.

llvm/docs/Extensions.rst
290 ↗(On Diff #193740)

"allows for embedding a set" could probably be worded a bit more simply. For example:
This section contains strings specifying libraries to be added to the link by the linker.

llvm/docs/LangRef.rst
6084 ↗(On Diff #193740)

I prefer the wording of your opening sentence, "Some targets support embedding of" is it worth synchronising?

llvm/include/llvm/BinaryFormat/ELF.h
841 ↗(On Diff #193740)

This is the same value as has been proposed in another ELF extension in https://reviews.llvm.org/D60242 , May be worth coordinating here. It is probably worth us having a central place to document the LLVM specific ELF extensions as we are accumulating enough of them.

pcc added inline comments.Apr 4 2019, 8:28 PM
llvm/include/llvm/BinaryFormat/ELF.h
841 ↗(On Diff #193740)

I don't have any existing dependencies on the value that I've chosen. If Ben has no dependencies of his own, I reckon that whoever ends up committing first can just get this value, and the next person will get the next one. I think this can be our general policy.

Do you think that https://llvm.org/docs/Extensions.html#elf-dependent is not sufficient as a place to document our extensions?

peter.smith added inline comments.Apr 4 2019, 8:33 PM
llvm/include/llvm/BinaryFormat/ELF.h
841 ↗(On Diff #193740)

I think it could be, now that I see it fully expanding with some of the extensions mentioning the elf terminology. I think we should mention the SHT_... and hex value so that someone doesn't have to dig for that in the source code.

Am I mistaken in that this effectively prevents the use of options to handle frameworks? That really is a strong requirement and should be part of this patch.

clang/lib/CodeGen/CodeGenModule.cpp
451 ↗(On Diff #193740)

I don't think its worth the shortening here, this is only going to be seen in the IR, so, lets spell it out (llvm.dependent.libraries or llvm.dependent-libraries sound good to me).

Am I mistaken in that this effectively prevents the use of options to handle frameworks? That really is a strong requirement and should be part of this patch.

How so? The .linker-options implementation hasn't been modified (and is still needed for autolinking in clang 5 modules). The change in the fronted is that --dependent-lib and "comment lib" pragmas use .deplibs for ELF. Can you provide more details on your concerns?

The early check for ELF is the problem. I care very little about .linker-options itself as long as the ability to ensure that framework linkage is provided (i.e. -framework ..., and '-L ...` can be passed along as options to the linker).

bd1976llvm added a comment.EditedApr 5 2019, 10:42 AM

The early check for ELF is the problem.

Apologies, I still don't understand this.. probably because I am a bit out of my depth modifying Clang. As this implementation is ELF only I though that the early check for ELF is appropriate. In order to support framework linking on ELF, you are going to have to re-implement the framework options in ELF specific code, no?

I care very little about .linker-options itself as long as the ability to ensure that framework linkage is provided (i.e. -framework ..., and '-L ...` can be passed along as options to the linker).

Maybe we were not on the same page after the RFC then. I am only attempting to support "comment lib" pragmas here. If we want to support framework linking with this approach the best I can offer is to change the handleDepLib function to:

static void processDepLib(StringRef Specifier, const InputFile *F) {

if (Config->DepLibs)
  if (fs::exists(Specifier))
    Driver->addFile(Specifier, /*WithLOption=*/false);
  else if (Optional<std::string> S = findFromSearchPaths(Specifier))
    Driver->addFile(*S, /*WithLOption=*/true);
  else if (Optional<std::string> S = searchLibrary(Specifier))
    Driver->addFile(*S, /*WithLOption=*/true);
  else if (Optional<std::string> S = searchFramework(Specifier))
    Driver->addFile(*S, /*WithLOption=*/true);
  else
    error(toString(F) + ": unable to find library from dependent library specifier: " + Specifier);

}

We could modify this function to warn/error if it finds two matching libraries for a specifier to *perhaps" address some of you earlier concerns.

This dependent libraries proposal doesn't handle -L options at all. If you need that and fine control over framework linking then we will need to put in .linker-options support into LLD; but, it is not clear exactly how that should work and I think it is another change distinct from this one.

This commit message is rather bare-bones -- I hope that the ultimate commit message will be updated to contain more than just a reference to the RFC thread.

lld/ELF/InputFiles.cpp
402 ↗(On Diff #193740)

This bit seems unfortunate. "-lfoo" won't search for a file named "foo", and having this code do so does not seem like a good idea. I'd want this feature to work just like -l.

Actually, I think it'd be better to preserve the "-l" prefix in the deplibs section. (I'd note that even if libs in the search path are spelled "-lfoo", that doesn't mean we need accept any other options)

If we want to support loading a filename which not on the search path (but...do we, even? That seems like a kinda scary feature to me...), then it'd be obvious how to spell that: "libfoo.a" and "/baz/bar/foo.a" would open files directly, vs "-lfoo" and "-l:foo.a" would search for them on the search path.

bd1976llvm marked an inline comment as done.Apr 5 2019, 2:29 PM
bd1976llvm added inline comments.
lld/ELF/InputFiles.cpp
402 ↗(On Diff #193740)

What you have described is the behavior of the INPUT linker script command: https://sourceware.org/binutils/docs/ld/File-Commands.html#File-Commands.

We have carefully designed this "dependent libraries" feature to avoid mapping "comment lib" pragmas to --library command line options. My reasons:

  • I don't want the compiler doing string processing to try to satisfy the command line parsing behaviour of the target linker.
  • The syntax is #pragma comment(lib, ...) not #pragma linker-option(library, ...) i.e. the only thing this (frankly rather bizarre) syntax definitely implies is that the argument is related to libraries (and comments ¯\_(ツ)_/¯); it is a bit of a stretch to interpret "comment lib" pragmas as mapping directly to "specifying an additional --library command line option".
  • I want to avoid GNUism's and use a "general" mechanism. MSVC source code compatibility is a usecase.
  • It is important to support loading a filename which not on the search path. For example I have seen developers use the following: #pragma comment(lib, FILE"\\..\\foo.lib")

I would like the following to work on for ELF:

#pragma comment(lib, "foo") => add libfoo.a to the link.
#pragma comment(lib, "foo.lib") => add foo.lib to the link
#pragma comment(lib, "c:\\foo.lib") => add c:\foo.lib to the link

The way the code is now these will work on both LLD and MSVC (and I think it will work for Apple's linker as well although I haven't checked). In addition, we have been careful to come up with a design that can be implemented by the GNU linkers... with the cost that some complicated links will not be possible to do via autolinking; in these (IMO rare) cases developers will need to use the command line directly instead.

What I am trying to accomplish is to try to keep #pragma comment(lib, "foo") compatible across linkers. Otherwise we will be in a situation where you will have to have the equivalent of...

#ifdef COFF_LIBRARIES:
#pragma comment(lib, "/DEFAULTLIB:foo.lib")
#'elif ELF_LIBRARIES_GNU_STYLE_LINKER:
#pragma comment(lib, "-lfoo")
#'elif DARWIN_FRAMEWORKS:
#pragma comment(lib, "-framework foo")
#esle
#error Please specify linking model
#endif

... in your source code (or the moral equivalent somewhere else). We can avoid this.

Also note that I am not proposing to remove the .linker-options extension. We can add support for .linker-options to LLD in the future if we find a usecase where developers want pragmas that map directly to the linkers command line options for ELF. If this is a usecase then, in the future, we can implement support for #pragma comment(linker, ....) pragmas that lower to .linker-options; #pragma comment(lib, ...) pragmas can continue to lower to .deplibs.

MaskRay added inline comments.Apr 5 2019, 10:54 PM
lld/ELF/InputFiles.cpp
658 ↗(On Diff #193740)

ELF/EhFrame.cpp readString has an example using llvm::find(D, '\0') to prevent out-of-bounds read in case the section is corrupted.

663 ↗(On Diff #193740)

Early return fixes fallthrough issue here.

jyknight added inline comments.Apr 6 2019, 3:13 PM
lld/ELF/InputFiles.cpp
402 ↗(On Diff #193740)

It just seems to me a really bad idea to have a situation where specifying #pragma comment(lib, "foo") can either open a file named "foo" in the current directory, or search for libfoo.{a,so} in the library search path.

That is pretty much guaranteed to cause problems due to unintentional filename collision in the current-directory.

bd1976llvm marked an inline comment as done.Apr 8 2019, 7:01 AM
bd1976llvm added inline comments.
lld/ELF/InputFiles.cpp
402 ↗(On Diff #193740)

Linkers already have the behaviour of finding the first matching file in the library search paths, and gnu-ld at least seems to add "." implicitly as the first library search path. So, developers already need to make sure that they don't have multiple copies of a library in the wrong place. This behaviour doesn't seem “too much” of a deviation from that. In other words I suspect that the chance of having a library named foo in the current directory and a library named libfoo.a in a library search path is remote, no?

Would your concerns be alleviated by changing the precedence, like so:

static void processDepLib(StringRef Specifier, const InputFile *F) {
  if (!Config->DepLibs)
    return;
  else if (Optional<std::string> S = searchLibrary(Specifier))
    Driver->addFile(*S, /*WithLOption=*/true);
  else if (Optional<std::string> S = findFromSearchPaths(Specifier))
      Driver->addFile(*S, /*WithLOption=*/true);
  if (fs::exists(Specifier))
      Driver->addFile(Specifier, /*WithLOption=*/false);
  else
    error(toString(F) +
          ": unable to find library from dependent library specifier: " +
          Specifier);
}

We could also change this function to error if a given specifier would find a library in more than one location (assuming that there was no early out)?

bd1976llvm updated this revision to Diff 194320.Apr 9 2019, 7:24 AM

Addressed review comments.

bd1976llvm marked 25 inline comments as done.Apr 9 2019, 7:53 AM

I have updated the diff to address review comments. I think we can continue to refine this patch in parallel with discussing the design further (I am not dismissing the concerns raised by @compnerd and @jyknight).

I am unsure if I should be doing validation on the structure of the llvm.dependent-libraries named metadata e.g. what if there are zero operands or more than one operand for a metadata node. Does anyone know what LLVM's policy is?

lld/ELF/InputFiles.cpp
406 ↗(On Diff #193740)

I was undecided here. Refactoring the code to remove the colon prefix behavior and adding other checks like only allowing libraries (not objects) to be added to the link adds complexity that might not be justified. Maybe we should simply document that some behaviors might work but are unsupported and liable to change without warning?

657 ↗(On Diff #193740)

This would simplify the implementation but I don't think that the size increases in the input files is worth it. Also, it is easier to dump the dependent libraries if they are all in one section.

llvm/include/llvm/BinaryFormat/ELF.h
841 ↗(On Diff #193740)

I think that it is better if the hex value is in this one place and we use the SHT_ token everywhere else. This reduces the potential for mistakes.

llvm/include/llvm/Object/IRSymtab.h
157 ↗(On Diff #193740)

Thanks! I went for the struct with a single element representation because that is used for comdats. Should we make this same change for comdats?

jyknight added inline comments.Apr 9 2019, 7:59 AM
lld/ELF/InputFiles.cpp
402 ↗(On Diff #193740)

gnu-ld at least seems to add "." implicitly as the first library search path

I don't see that behavior. Neither GNU ld.bfd, GNU ld.gold, nor llvm's lld use "." as a default library search path as far as my testing can determine. Neither when invoked directly, nor when invoked through the gcc and clang drivers.

In other words I suspect that the chance of having a library named foo in the current directory and a library named libfoo.a in a library search path is remote, no?

No, I don't think the chance is remote at all. I can easily imagine linking against a library named "clang", ("libclang.so"), while having a file or directory named "clang" in the current directory, for example.

Would your concerns be alleviated by changing the precedence

No, not really.

bd1976llvm marked 5 inline comments as done.Apr 9 2019, 11:06 AM
bd1976llvm added inline comments.
lld/ELF/InputFiles.cpp
402 ↗(On Diff #193740)

I don't see that behavior. Neither GNU ld.bfd, GNU ld.gold, nor llvm's lld use "." as a default library search path as far as my testing can determine. Neither when invoked directly, nor when invoked through the gcc and clang drivers.

I made the mistake of testing on an older copy of the binutils (from a 14.04 ubuntu). I get the same results as you with a more recent versions of the gnu tools. Apologies. Interestingly, binutils from 14.04 also allow absolute paths via -l:<absolute path> which is another behavior that has been removed.

No, I don't think the chance is remote at all. I can easily imagine linking against a library named "clang", ("libclang.so"), while having a file or directory named "clang" in the current directory, for example.

In this case are you envisioning that both of these files are competing copies of the same library, or is the "clang" in the current directory another type of file? I ask because if the "clang" in the current directory is not a library then the link will fail fast and the developer can then move this file somewhere else. If you had two different copies of the same library hanging around, then picking up the wrong one might cause bad behavior at runtime - that is the scenario that I consider to be remote.

For the example that you gave the change in precedence would be a fix, as it would cause the linker to pick up "libclang.so" and ignore "clang" in the current directory. FWIW we have been shipping with this behavior on PlayStation and it has not caused an issue for game links.

If you have a hard objection to implicitly searching in the current directory *at all* then we will have to go with another option. I would rather not remove absolute path support as it would reduce MSVC compatibility. Potential other options are:

  • We could remove support for cwd-relative paths and instead check the path to see if it is absolute and load the library directly if it is.
  • Another variation would be if the string contains at least one path separator, treat it as a path. That's simpler than checking for an absolute path. Users could also do 'pragma comment(lib, "./my.lib")' for relative paths too if they really wanted.
compnerd added inline comments.Apr 12 2019, 9:17 AM
lld/ELF/InputFiles.cpp
662 ↗(On Diff #194320)

Can you make the flag here reflect the name as well? (SHT_LLVM_DEPENDENT_LIBRARIES)

lld/ELF/Options.td
74 ↗(On Diff #194320)

I think that I prefer the --dependent-libraries flag which is similar to the recent additions to the linker options.

phosek added a subscriber: phosek.Apr 13 2019, 11:10 PM
grimar added a subscriber: grimar.Apr 15 2019, 6:49 AM

No longer shortening "dependent libraries" to "deplibs" except for the .deplibs section (as this takes up bytes on disk).

bd1976llvm marked 2 inline comments as done.Apr 16 2019, 5:52 PM

I'm okay with the PS4-specific bits.

I am keen to keep this moving.

I think there are a few things outstanding:

  1. Need to resolve concerns w.r.t the design.
  2. I need to find out whether I should be doing validation when reading the metadata in LLVM.
  3. The LLD side needs a more detailed review.

@jyknight @compnerd can we progress the design conversation?

ruiu added a comment.Apr 25 2019, 9:08 PM

I don't have any particular comment, and I'll give an LGTM soon as people seem to generally happy about this. If you are not happy about this patch or the feature itself, please shout. This feature is about to land.

ruiu accepted this revision.May 9 2019, 5:48 AM

LGTM

This revision is now accepted and ready to land.May 9 2019, 5:48 AM

@bd1976llvm do you plan on landing this? We'd really like to start using this feature.

This revision was automatically updated to reflect the committed changes.