This is an archive of the discontinued LLVM Phabricator instance.

[LLD][COFF] Add LLVM toolchain library paths by default.
ClosedPublic

Authored by thieta on May 23 2023, 1:06 AM.

Details

Summary

We want lld-link to automatically find compiler-rt's and
libc++ when it's in the same directory as the rest of the
toolchain. This is because on Windows linking isn't done
via the clang driver - but instead invoked directly.

This prepends: <llvm>/lib <llvm>/lib/clang/XX/lib and
<llvm>/lib/clang/XX/lib/windows automatically to the library
search paths.

Related to #63827

Diff Detail

Event Timeline

thieta created this revision.May 23 2023, 1:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2023, 1:06 AM
thieta requested review of this revision.May 23 2023, 1:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2023, 1:06 AM

Hi! This is the first draft of what we discussed at EuroLLVM - to let LLD find the runtime and lib dir to integrate sanitizers easier and libc++ on Windows.

This patch is not done, but I wanted to open it early because there are some discussion points - specifically:

  • LLD doesn't know the full triple, which we need to find the directory where the runtimes or libraries are located when building the runtimes with PER_TARGET_DIRECTORY. I added code to synthesize the triple from some options in LLD, but it feels a little icky. I don't know if you guys have any thoughts on that.
  • The way lld finds libraries seems to happen very early, so unless the user specifies /machine:x64, the libraries won't be seen because we haven't parsed any object files and extracted the target machine yet. One thought I had here was to try to find the libraries early, and if we don't see a library, we can wait until we have parsed an object file, extracted the machine, and then tried to refind those libraries we failed to find. But I wanted to make sure that this is the path we want to take.
  • I don't think we can extract the straight logic for finding the paths from the clang driver (it's pretty well nestled in there), so I ended up more or less rewriting the code in WindowsDriver - not sure how to get that logic back into the clang driver yet, maybe we could call it from the MSVC/MingW drivers instead of keeping it in the base Driver/toolchain.

With the current patch above I was able to do:

bin\lld-link.exe test.obj -wholearchive:clang_rt.asan.lib -wholearchive:clang_rt.asan_cxx.lib /machine:x64 and it found the runtimes without any additional arguments.

I didn't have a look at the contents of the file yet, but wanted to add a reference here:

  • The way lld finds libraries seems to happen very early, so unless the user specifies /machine:x64, the libraries won't be seen because we haven't parsed any object files and extracted the target machine yet. One thought I had here was to try to find the libraries early, and if we don't see a library, we can wait until we have parsed an object file, extracted the machine, and then tried to refind those libraries we failed to find. But I wanted to make sure that this is the path we want to take.

I believe this is closely related to what D150981 does too - please have a look at that if it helps something here.

llvm/include/llvm/WindowsDriver/LLVMPaths.h
26 ↗(On Diff #524598)

Missing newline at the end of the file

llvm/lib/WindowsDriver/LLVMPaths.cpp
109 ↗(On Diff #524598)

Missing newline at the end of the file

I believe this is closely related to what D150981 does too - please have a look at that if it helps something here.

Oh, this is exactly what I was thinking! Thanks for pointing that out. I will try to apply it to my local tree and see if it works in this context.

hans added a comment.May 23 2023, 6:20 AM

I also only looked briefly, but it seems like this assumes LLVM_ENABLE_PER_TARGET_RUNTIME_DIR builds? I think that's still off by default on Windows, so maybe it would be enough to start with supporting the non-per-target-runtime-dir case, or we should aim to support both cases.

I also only looked briefly, but it seems like this assumes LLVM_ENABLE_PER_TARGET_RUNTIME_DIR builds? I think that's still off by default on Windows, so maybe it would be enough to start with supporting the non-per-target-runtime-dir case, or we should aim to support both cases.

No both should be supported if you check the function getToolchainRuntimeDirs() and getToolchainLibDirs() they are checking for both style but then we only add the directories that exists.

MaskRay added a comment.EditedMay 23 2023, 12:39 PM

Adding default library paths to the linker seems odd. The convention is to add the code to Clang Driver.
ld.lld (and likely gold) doesn't add any default library path and it has been working fine.
GNU ld may add some paths like /usr/lib that's already been added by gcc/clang drivers.

Adding default library paths to the linker seems odd. The convention is to add the code to Clang Driver.

The issue with lld-link is that in MSVC environments, the convention is to invoke link or lld-link directly without going through cl or clang-cl. That's what makes things problematic here, and this is one potential path forward.

ld.lld (and likely gold) doesn't add any default library path and it has been working fine.
GNU ld may add some paths like /usr/lib that's already been added by gcc/clang drivers.

FWIW, IIRC there was a discussion a couple years ago, where iirc NetBSD wanted ld.lld to use some default paths to make it a better drop-in replacement for the GNU linker.

MaskRay added a comment.EditedMay 23 2023, 1:21 PM

Adding default library paths to the linker seems odd. The convention is to add the code to Clang Driver.

The issue with lld-link is that in MSVC environments, the convention is to invoke link or lld-link directly without going through cl or clang-cl. That's what makes things problematic here, and this is one potential path forward.

ld.lld (and likely gold) doesn't add any default library path and it has been working fine.
GNU ld may add some paths like /usr/lib that's already been added by gcc/clang drivers.

FWIW, IIRC there was a discussion a couple years ago, where iirc NetBSD wanted ld.lld to use some default paths to make it a better drop-in replacement for the GNU linker.

Yes, and NetBSD's proposal was NAKed. Many people can feel that that kind of custom code doesn't belong to the upstream C++ code.
It doesn't scale if every ELF OS having different ideas about default library paths wants to add custom code to lld...
Actually, it's fairly easy to use a shell script wrapper instead of C++ code to do the same thing.

I know that Apple and Windows platforms are quite different and their linkers have some (unfortunate) designs that hard code quite a bit about the operating-system or environment specific information into linker options... I usually only comment when I find something too odd (usually about things with many approaches, but the proposer picked one which is the most convenient but probably not the cleanest in the long term..)

The alternative proposal we discussed at EuroLLVM would be to create a custom Clang tool akin to -cc1 or -cc1as, for example -link, which would accept the same options as lld-link (in addition to flags like --target) and invoke lld-link with the appropriate search paths. The users would then use clang -link in place of of lld-link. The advantage is that lld-link can remain agnostic to default library paths. The disadvantage is the extra complexity in the Clang driver.

rnk added a comment.May 23 2023, 2:00 PM

So, the way that MSVC handles this is with the LIB and LIBPATH environment variables, I forget which. The linker doesn't have to know the search path, target OS, or ISA. People don't like environment variables for various reasons ("how come the tool doesn't work? Oh right, you have to launch it from the VS developer prompt..." 💩 ), so some build systems clear out the environment and pass the library path explicitly on the command line. The library path could be computed by interrogating the compiler directly with one of the -print* family of options, but in Chrome, it is recomputed with gn logic, which is no good.

I wasn't involved in these EuroLLVM conversations and I strongly support the goal here: we need lld-link to find compiler-rt out of the box! We need to be able to find the PGO, builtins, and sanitizer libraries with minimal configuration from the user. But, it seems like we are going down the path of replicating some extremely complicated clang library search path logic. I think documenting a solution like clang-cl /clang:--print-search-paths might be a better way to go, but maybe other folks feel that is too much like Makefile goop and this should work out of the box.

llvm/lib/WindowsDriver/LLVMPaths.cpp
14 ↗(On Diff #524598)

Why angle includes?

Hi everyone, and thanks for your input. My goal with this patch is to let lld-link find the rest of the files from the toolchain without teaching every build system in the world extra logic. I have thought about the different ways to handle this, and this one seemed the least complicated, to be honest. If we wanted to directly re-use the clang driver logic with clang -link or similar - we would have to teach the build systems to call clang instead of lld-link, we would have to teach it to pass --target to that, or we would have to parse the machine information out of the object files in clang as well (non-starter I think).

While it would be better if we didn't need to add this logic to lld at all, I don't think we can change the paradigm that Microsoft has with its toolchain already. There is already precedent for this - if you look in the WindowsDriver library, we already probed for the visual studio installation and added those default search paths in lld. This expands that functionality to add (less complicated) logic to find the toolchain library paths. So I don't think this is way out of line.

There are other ways to do this: We could have lld execute clang in the same directory as itself to query the driver about the paths. I'm not sure I love that, but it would solve the problems of not duplicating logic or linking to things left and right.

The logic in lld could be much simpler as well. Instead of constructing a Triple (the worst part of this patch IMHO), we could probe for static directories x86_64/i386/aarch64-pc-windows-msvc/gnu. This won't be bulletproof but would solve 95% of the users' use-case.

I hope we can avoid duplicating the logic between Clang and LLD by extracting it into a library which is what we discussed and this patch is trying to do. We also had a discussion about extracting more of Clang's driver logic into a new top-level project so it can be shared between Clang and Flang, but that's a much bigger project that should be discussed separately (although if we decide to pursue it, we would move the WindowsDriver library into this new project).

Regarding the target handling, what's your opinion on introducing a new option (for example /target:) with the default value being the host target (that is the same behavior that's used by Clang)?

thieta added inline comments.May 24 2023, 2:07 AM
llvm/lib/WindowsDriver/LLVMPaths.cpp
14 ↗(On Diff #524598)

probably just clang automatic header insertion. I will fix that.

hans added a comment.May 24 2023, 2:29 AM

The target handling, which seems to be the tricky part, is only needed for LLVM_ENABLE_PER_TARGET_RUNTIME_DIR builds, right? And as far as I know that's not used on Windows, so maybe it's not needed?

I think I asked this at the meeting, but I forgot the answer already: is there a reason the compiler can't pass in the (relative) library paths as linker directives along with the library names in the .obj files?

The target handling, which seems to be the tricky part, is only needed for LLVM_ENABLE_PER_TARGET_RUNTIME_DIR builds, right? And as far as I know that's not used on Windows, so maybe it's not needed?

We use it in our toolchain - we had problems with supporting all different cross-compile scenarios unless we used this setting. And, afaik, this is what we want to default to in the future, so not supporting it would not be good enough.

I think I asked this at the meeting, but I forgot the answer already: is there a reason the compiler can't pass in the (relative) library paths as linker directives along with the library names in the .obj files?

I am not sure, I thought we only talked about not being possible to pass absolute paths because of reproducibility.

Regarding the target handling, what's your opinion on introducing a new option (for example /target:) with the default value being the host target (that is the same behavior that's used by Clang)?

Not a bad idea!

hans added a comment.May 24 2023, 6:53 AM

Regarding the target handling, what's your opinion on introducing a new option (for example /target:) with the default value being the host target (that is the same behavior that's used by Clang)?

Not a bad idea!

But then we're back to requiring the user to pass flags to the linker (even if it's a nicer flag).

I think we should aim for this to work out of the box:

C:\src\temp>type a.cc
int main(int argc, char **argv) {
        __int128 a = 123;
        __int128 b = 1;
        return a / b;
}

C:\src\temp>clang-cl /c a.cc

C:\src\temp>lld-link a.obj
lld-link: error: undefined symbol: __divti3
>>> referenced by a.obj:(main)

For the non-per-target-runtime case it seems relatively easy: if the compiler would add clang_rt.builtins-x86_64.lib as a dependent lib, and the linker would implicitly search the compiler's library path, it would work:

C:\src\temp>clang-cl /c a.cc -Xclang --dependent-lib=clang_rt.builtins-x86_64.lib && lld-link a.obj "/libpath:C:\Program Files\llvm\lib\clang\16\lib\windows"
(success)

for the per-target-runtime case, how about if the compiler included the target-dependent part of the dir in the dependent lib name, something like --dependent-lib=x86_64-pc-win32/clang_rt.builtins.lib? That wouldn't currently work, because lld-link expects the dependent lib to be a filename only.

But MSVC does allow relative paths in the dependent lib names, so lld-link should too:

C:\src\temp>clang-cl /c a.cc -Xclang --dependent-lib=windows\clang_rt.builtins-x86_64.lib && link a.obj "/libpath:C:\Program Files\llvm\lib\clang\16\lib"
Microsoft (R) Incremental Linker Version 14.29.30145.0
Copyright (C) Microsoft Corporation.  All rights reserved.

(success)

So how about we:

  • Make lld-link implicitly add the compiler's lib dir to its library search path (this patch)
  • Make clang pass the path to the builtins library, relative to that lib dir, as a --dependent-lib (and other libraries as needed)
  • Make lld-link allow dependent libs to have relative paths

(The ideal would be to only add these dependent libraries when we actually need them, otherwise it breaks linking with link.exe. Or maybe we could do this only with -fuse-ld=lld?)

hans added a comment.May 24 2023, 7:14 AM

I think I asked this at the meeting, but I forgot the answer already: is there a reason the compiler can't pass in the (relative) library paths as linker directives along with the library names in the .obj files?

I am not sure, I thought we only talked about not being possible to pass absolute paths because of reproducibility.

(Answering my own question: I guess a relative path doesn't work since it would have to be relative to the CWD when invoking the linker, and we don't know that. We could however invent something like rpath's $origin so the compiler could pass a directive like /libpath:$origin\..\lib\clang\16\lib\windows. But I think having the linker find that dir by itself is better.)

rnk added a comment.May 24 2023, 8:59 AM

While it would be better if we didn't need to add this logic to lld at all, I don't think we can change the paradigm that Microsoft has with its toolchain already. There is already precedent for this - if you look in the WindowsDriver library, we already probed for the visual studio installation and added those default search paths in lld. This expands that functionality to add (less complicated) logic to find the toolchain library paths. So I don't think this is way out of line.

So, I agree that if we have already gone down the path of discovering the location of the MSVC installation and recomputing the architecture and version specific paths to libraries, I don't have a principled reason for not doing the same thing for Clang.

It will, however, be brittle. Users customize their triple spellings. It is not at all straightforward. Perhaps we can live with that.

I think we should aim for this to work out of the box:

Yep! Agreed.

So how about we:

  • Make lld-link implicitly add the compiler's lib dir to its library search path (this patch)
  • Make clang pass the path to the builtins library, relative to that lib dir, as a --dependent-lib (and other libraries as needed)
  • Make lld-link allow dependent libs to have relative paths

Sounds like a good plan to me. I think the code changes will be straightforward at this point, at least in LLD.

(The ideal would be to only add these dependent libraries when we actually need them, otherwise it breaks linking with link.exe. Or maybe we could do this only with -fuse-ld=lld?)

Yeah, this is going to be a bit tricky, I guess. In that case, we need to expose that flag in clang-cl as well - it's not perfect, but I don't have a better idea at this point.

I'll start by changing this patch to include the base dir to the search paths.

thieta updated this revision to Diff 526040.May 26 2023, 6:16 AM
thieta retitled this revision from [lld] Find resource and lib dir to [LLD][COFF] Add LLVM toolchain library paths by default..
thieta edited the summary of this revision. (Show Details)

Implemented much simpler method

I implemented a much nicer and easier method to just find the following three paths:

<root>/lib
<root>/lib/clang/XX/lib
<root>/lib/clang/XX/lib/windows

This made the following work:

bin\clang-cl.exe /c test.cpp -fsanitize=address -Xclang --dependent-lib=x86_64-pc-windows-msvc\clang_rt.asan.lib
bin\lld-link.exe test.obj

It also worked with the non-per-target directories with another library name.

I think this feels much cleaner and without so much other code being dragged into lld.

lld/COFF/Driver.cpp
487

In order to get relative paths to work, I had to change this line. I am NOT sure what the original intent of this block was. It will make any relative path only return the filename. And none of the tests seems to cover this case, everything works after my change.

But if someone have a bit more context on this - I'll be happy to understand what's going on in this block.

rnk added inline comments.Jun 7 2023, 12:40 PM
lld/COFF/Driver.cpp
487

I think the old logic was based on observed MSVC link.exe behavior.

Can you check that the new strategy is actually compatible with MSVC link.exe? As in, if we change clang to embed relative paths, does that actually work with MSVC link.exe with -libpath:${resource}/lib?

If link.exe does support relative paths in the embedded directives, lots of this plan should be quite good to go.

But for the cases with e.g. libc++ header pragmas, if we want to keep them working as before, we're in a trickier situation. As far as I can see that, we'd need to split it up into three parts:

  1. -libpath:${toolchain}/lib - either explicitly on the command line or implied by the lld/WindowsDriver logic
  2. -relative-libpath:${triple} - essentially a new directive or similar, as an embedded directive in the object files, generated by Clang. I presume link.exe doesn't have anything like that...
  3. -defaultlib:libc++.lib - produced by the libc++ header pragmas

Or should Clang, given pragmas, try to resolve them and see if they need a relative path, so with #pragma comment(lib, "c++.lib"), it checks for c++.lib in ${toolchain}/lib/${triple}, and if found there, embedding -defaultlib:{triple}/c++.lib?

thieta updated this revision to Diff 539898.Jul 13 2023, 1:57 AM
thieta edited the summary of this revision. (Show Details)

Added tests and rebased

thieta marked an inline comment as done.Jul 13 2023, 1:59 AM

This seems to work fine now and have tests. Next up would be to add some dependent-lib statements in various places. Please review this and let me know if there is something else that should be fixed here.

lld/COFF/Driver.cpp
487

Verified that it worked with link.exe as well.

thieta marked an inline comment as done.Jul 13 2023, 2:01 AM
thieta added inline comments.
lld/test/COFF/print-search-paths.s
18

I would like to check the full path here similar to the [[SYSROOT]] stuff, but I couldn't find a substitution that gives me the binary/build directory path. Does anyone have a hint or another idea how to test that?

thieta updated this revision to Diff 539925.Jul 13 2023, 3:30 AM
thieta edited the summary of this revision. (Show Details)

Switch to prepend paths

hans added a comment.Jul 13 2023, 5:10 AM

Nice!

lld/COFF/Driver.cpp
651

Instead of constructing a std::string, this could be libDir.str(). Same for the others below.

1574

nit: I'd suggest maybe skipping the blank line above, to keep this within the "Construct search path list" block.

lld/test/COFF/print-search-paths.s
3

I suppose the sort makes the output nicer, but doesn't it the test can't tell the difference between e.g. append or prepending the new paths?

(Also ending up with Library search paths: on the last line instead of the first looks funny.)

18

Is the lib/ dir (without the clang/) missing?

18

I don't have any good ideas for that, but you could at least use a regex to capture the "base path" for the first dir and then re-use that named regex for the next lines, to ensure that they're the same.

thieta updated this revision to Diff 540044.Jul 13 2023, 7:58 AM

C/R fixes

thieta marked 5 inline comments as done.Jul 13 2023, 7:59 AM
hans accepted this revision.Jul 13 2023, 9:45 AM

lgtm

This revision is now accepted and ready to land.Jul 13 2023, 9:45 AM
rnk accepted this revision.Jul 13 2023, 10:03 AM

lgtm

MaskRay added inline comments.Jul 13 2023, 10:07 AM
lld/COFF/Driver.h
87

Mostly ok with me, a couple minor comments and a small discussion item.

The change about tolerating relative paths where we previously only accepted filenames, is separate from the rest IMO, and it would be good to have a testcase for that too.

lld/COFF/Driver.cpp
487

Isn't this part of the patch in principle a separate functional change with a different subject?

The rest of the patch is about adding extra directories to the libpath, while this particular change is about allowing/handling relative paths in the way we previously handled only bare file names? And I don't see this functional change covered by the tests?

647

To clarify the context of this comment, maybe move it down to exactly before the searchPaths.insert()?

654

Curious - is the clang version always exactly identical to LLVM_VERSION_MAJOR, or can it change? E.g. previously, we used to have the full version number here, 16.0.0 etc, while it now is only the major number. If nothing else, it would be good to have a comment at the other code site where that is decided, to remind people to update these two code sites in sync.

2079

Nit: stray change

lld/test/COFF/print-search-paths.s
17

Nit: Stray change

thieta updated this revision to Diff 540301.Jul 14 2023, 12:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2023, 12:48 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
thieta updated this revision to Diff 540302.Jul 14 2023, 12:49 AM
thieta updated this revision to Diff 540304.Jul 14 2023, 12:51 AM
thieta marked 6 inline comments as done.Jul 14 2023, 12:53 AM
  • Split the relative patch to this diff https://reviews.llvm.org/D155268
  • Added some comments in both Clang and LLD
  • Fixed some code style.
  • Removed stray changes.
mstorsjo accepted this revision.Jul 14 2023, 2:57 AM

LGTM, thanks!

Maybe add a release note for this too? It's rather significant.

thieta updated this revision to Diff 540344.Jul 14 2023, 3:15 AM
This revision was landed with ongoing or failed builds.Jul 14 2023, 5:38 AM
This revision was automatically updated to reflect the committed changes.

Is it possible to override this behaviour ? We have an issue where we have our own FortranRuntimelib built with MT linkage and in the past I could just specify a path and -lFortranRuntime and all was well. Having just moved our machines to llvm17 this no longer works as it always overrides our libraries with the installed (MD linkage) versions.