This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Add --vfs[overlay|replace] flags to change library search behavior
Needs Review · Public

Authored by abrachet on Apr 25 2022, 12:29 PM.

Details

Reviewers
MaskRay
phosek
Summary

This patch adds --vfsoverlay, analogous to clang's -ivfsoverlay, but it only changes the behavior of library (and linker script) search resolution.

This can be useful when libraries have different names on disk than the names they are searched for under. The main use case I see is --vfsreplace, which searches for libraries only in the VFS; this is particularly useful for ensuring that no other libraries are linked implicitly by a compiler driver.
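
The difference between the two modes can be illustrated with a small model. This is only a sketch under stated assumptions, not the patch's implementation: the dictionary `vfs` stands in for the overlay file, the set `real_files` stands in for the real filesystem, and the function name `find_library` and the exact lookup order are hypothetical.

```python
from pathlib import PurePosixPath


def find_library(name, search_dirs, vfs, real_files, replace=False):
    """Return the on-disk path chosen for -l<name>, or None if not found.

    vfs maps virtual paths to real paths (a stand-in for the overlay file).
    real_files is a set of paths standing in for the real filesystem.
    """
    for d in search_dirs:
        candidate = str(PurePosixPath(d) / f"lib{name}.a")
        # Assumption: the overlay is consulted first in both modes.
        if candidate in vfs:
            return vfs[candidate]
        # Only overlay mode falls through to the real filesystem.
        if not replace and candidate in real_files:
            return candidate
    return None


vfs = {"/virtual/lib/libfoo.a": "/build/out/foo-v2.a"}
real_files = {"/usr/lib/libbar.a"}
dirs = ["/virtual/lib", "/usr/lib"]

print(find_library("foo", dirs, vfs, real_files))                # /build/out/foo-v2.a
print(find_library("bar", dirs, vfs, real_files))                # /usr/lib/libbar.a
print(find_library("bar", dirs, vfs, real_files, replace=True))  # None
```

In this model, replace mode never finds libbar.a because it exists only on the real filesystem, which is the "no implicit libraries" property described above.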

Event Timeline

abrachet created this revision. · Apr 25 2022, 12:29 PM
abrachet requested review of this revision. · Apr 25 2022, 12:29 PM

My gut feeling is that the options are poorly specified and I am unsure why we suddenly need new mechanisms like this in the linker.
Moreover, YAML usage is unnatural for an ELF linker.

I know you have stated "it is useful", but there is no information on how this is used, why other alternatives were excluded, and why you ended up deciding to add new options.
For example, an overlayfs can be used, and that will transparently work with many other tools, including binary utilities you may use.

I am also concerned with additional complexity in findFile.

My gut feeling is that the options are poorly specified and I am unsure why we suddenly need new mechanisms like this in the linker.

Poorly specified in the commit message/help text, or is it the behavior of llvm::vfs::RedirectingFileSystem and its YAML format that are poorly specified? Or perhaps that it just affects library searching?

Moreover, YAML usage is unnatural for an ELF linker.

Noted. I chose to go in this direction because clang and some of its tools already support this. You could imagine using one overlay file for both clang and lld.

I know you have stated "it is useful", but there is no information on how this is used, why other alternatives were excluded, and why you ended up deciding to add new options.

It would be used by a build system to ensure that the linker always resolves library references to the intended files. Particularly with --vfsreplace, you can ensure that the linker only resolves to library locations that you explicitly intended to be linked.

FWIW, we had also considered a flag like --library-path=lib=/path/to/liblib.a, with an accompanying flag to ensure all libraries are explicitly specified by --library-path. Does that kind of a solution sound better?
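
To make that alternative concrete, here is a rough sketch of how such name-to-path arguments could be interpreted. The --library-path=NAME=PATH spelling comes from the comment above; everything else (the function names, the `strict` parameter standing in for the unnamed accompanying flag, and the error behavior) is hypothetical and not part of any existing linker.

```python
def parse_library_paths(args):
    """Collect name -> path pairs from --library-path=NAME=PATH arguments."""
    prefix = "--library-path="
    mapping = {}
    for arg in args:
        if arg.startswith(prefix):
            name, sep, path = arg[len(prefix):].partition("=")
            if not sep or not name or not path:
                raise ValueError(f"malformed argument: {arg}")
            mapping[name] = path
    return mapping


def resolve(name, mapping, strict):
    """Resolve -l<name>; in strict mode an unmapped library is an error."""
    if name in mapping:
        return mapping[name]
    if strict:
        raise LookupError(f"library '{name}' was not given via --library-path")
    return None  # a real linker would fall back to the normal -L search here


mapping = parse_library_paths(["--library-path=lib=/path/to/liblib.a", "-o", "a.out"])
print(resolve("lib", mapping, strict=True))  # /path/to/liblib.a
```

With `strict=True`, any -l name that was not listed explicitly fails instead of being picked up from a default search directory.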

For example, an overlayfs can be used, and that will transparently work with many other tools, including binary utilities you may use.

I am also concerned with additional complexity in findFile.

The semantic complexity or code complexity?

Another use case besides the one already described by @abrachet is better support for case preserving and case insensitive file systems.

We're aware of cases where projects that were developed on Windows or macOS assume case preserving behavior and they would use #include <foo.h> and #include <Foo.h> as well as -lfoo and -lFoo to refer to the same file. This works on Windows and macOS, but breaks on Linux.

The solution that's commonly used in these situations is ciopfs. The problem with ciopfs is that mounting a FUSE filesystem typically requires root permissions and is not a great fit for automated infrastructure. ciopfs also doesn't seem to be actively maintained anymore.

Using a VFS overlay has been suggested as another alternative, and it is the solution we would like to use, but it's currently only supported by Clang and not by LLD, which is what this change addresses. Ideally, we would also introduce support for --vfsoverlay into other LLVM tools in the future if needed.

I don't have a strong opinion with regard to the file format. I can imagine other file formats that might be a better fit, but I think that consistency is more important in this case, as we would ideally use the same overlay specification with both Clang and LLD. If we can come up with a different format that we believe would be a better fit, and we could convince the existing Clang users to switch to it, then I'd be fine with that.

haowei added a subscriber: haowei. · May 4 2022, 11:31 AM

I am working on cross-compilation support for the Fuchsia Clang toolchain targeting Windows and recently encountered an issue that would be a perfect use case for this patch.

Error message:

➜  testproj ~/SRC/testbed/newTQ/prebuilt/third_party/clang/linux-x64/bin/clang-cl helloworld.cpp -o helloworld --target=x86_64-pc-windows-msvc -I../sdk/VC/Tools/MSVC/14.30.30705/include -I"../sdk/Windows Kit/10/Include/10.0.19041.0/ucrt" -D_HAS_EXCEPTIONS=0 -v /link /LIBPATH:../sdk/VC/Tools/MSVC/14.30.30705/lib/x64 /LIBPATH:"../sdk/Windows Kit/10/Lib/10.0.19041.0/um/x64"
Fuchsia clang version 15.0.0 (https://llvm.googlesource.com/a/llvm-project ec2de7490813a7593dad59f210c7ec41f1a29002)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: /usr/local/google/home/haowei/SRC/testbed/newTQ/prebuilt/third_party/clang/linux-x64/bin
 "/usr/local/google/home/haowei/SRC/testbed/newTQ/prebuilt/third_party/clang/linux-x64/bin/clang-15" -cc1 -triple x86_64-pc-windows-msvc19.20.0 -emit-obj -mrelax-all -mincremental-linker-compatible --mrelax-relocations -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name helloworld.cpp -mrelocation-model pic -pic-level 2 -mframe-pointer=none -relaxed-aliasing -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu x86-64 -mllvm -x86-asm-syntax=intel -tune-cpu generic -mllvm -treat-scalable-fixed-error-as-warning -D_MT -flto-visibility-public-std --dependent-lib=libcmt --dependent-lib=oldnames -stack-protector 2 -fms-volatile -fdiagnostics-format msvc -v -fcoverage-compilation-dir=/mnt/nvme_sec/SRC/WinSDK/testproj -resource-dir /usr/local/google/home/haowei/SRC/testbed/newTQ/prebuilt/third_party/clang/linux-x64/lib/clang/15.0.0 -I ../sdk/VC/Tools/MSVC/14.30.30705/include -I "../sdk/Windows Kit/10/Include/10.0.19041.0/ucrt" -D _HAS_EXCEPTIONS=0 -internal-isystem /usr/local/google/home/haowei/SRC/testbed/newTQ/prebuilt/third_party/clang/linux-x64/lib/clang/15.0.0/include -fdeprecated-macro -fdebug-compilation-dir=/mnt/nvme_sec/SRC/WinSDK/testproj -ferror-limit 19 -fno-use-cxa-atexit -fms-extensions -fms-compatibility -fms-compatibility-version=19.20 -std=c++14 -fdelayed-template-parsing -fcolor-diagnostics -faddrsig -o /tmp/helloworld-0be692.obj -x c++ helloworld.cpp
clang -cc1 version 15.0.0 based upon LLVM 15.0.0git default target x86_64-unknown-linux-gnu
#include "..." search starts here:
#include <...> search starts here:
 ../sdk/VC/Tools/MSVC/14.30.30705/include
 ../sdk/Windows Kit/10/Include/10.0.19041.0/ucrt
 /usr/local/google/home/haowei/SRC/testbed/newTQ/prebuilt/third_party/clang/linux-x64/lib/clang/15.0.0/include
End of search list.
 "/usr/local/google/home/haowei/SRC/testbed/newTQ/prebuilt/third_party/clang/linux-x64/bin/lld-link" -out:helloworld.exe -libpath:lib/amd64 -libpath:atlmfc/lib/amd64 -libpath:/usr/local/google/home/haowei/SRC/testbed/newTQ/prebuilt/third_party/clang/linux-x64/lib/clang/15.0.0/lib/x86_64-pc-windows-msvc -nologo /LIBPATH:../sdk/VC/Tools/MSVC/14.30.30705/lib/x64 "/LIBPATH:../sdk/Windows Kit/10/Lib/10.0.19041.0/um/x64" /usr/local/google/home/haowei/SRC/testbed/newTQ/prebuilt/third_party/clang/linux-x64/lib/clang/15.0.0/lib/x86_64-pc-windows-msvc/clang_rt.builtins.lib /tmp/helloworld-0be692.obj
lld-link: error: could not open 'kernel32.lib': No such file or directory
lld-link: error: could not open 'libucrt.lib': No such file or directory
lld-link: error: could not open 'uuid.lib': No such file or directory
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)

lld-link complains that kernel32.lib cannot be found, but in reality these libraries are all in the search path:

➜  x64 pwd
/mnt/nvme_sec/SRC/WinSDK/sdk/Windows Kit/10/Lib/10.0.19041.0/um/x64
➜  x64 ls |grep kernel
kernel32legacylib.lib
kernel32.Lib
➜  x64

The reason that lld-link works fine on Windows but failed when I tried to cross compile for Windows under Linux is that even though NTFS is a case sensitive filesystem, Windows ignores case in file names, so kernel32.lib is treated the same as kernel32.Lib. On Linux, it is not. With vfsoverlay, we can ship a map file for known case mismatches to mitigate this issue for lld, and it would be the least intrusive solution. Alternatively, we could use a filesystem that ignores case under Linux; however, that requires root access, which is not feasible for LuCI builders and most developers, and it also carries I/O performance penalties.
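
Such a map file could be generated rather than written by hand. The following is a minimal sketch, assuming the overlay uses the JSON-compatible layout read by llvm::vfs::RedirectingFileSystem (the "version", "roots", "type", "name", "contents", and "external-contents" keys); whether the proposed lld flag accepts exactly this layout is an assumption, and the helper name `case_fix_overlay` is hypothetical.

```python
import json
import os


def case_fix_overlay(lib_dir, wanted_names):
    """Map each wanted name to a file in lib_dir that differs only by case.

    Names that already match exactly, or that have no case-insensitive
    match in lib_dir, are left out of the overlay.
    """
    # If two files differ only by case, the one that sorts last wins.
    by_lower = {entry.lower(): entry for entry in sorted(os.listdir(lib_dir))}
    contents = []
    for name in wanted_names:
        actual = by_lower.get(name.lower())
        if actual is not None and actual != name:
            contents.append({
                "type": "file",
                "name": name,
                "external-contents": os.path.join(lib_dir, actual),
            })
    return {
        "version": 0,
        "roots": [{"type": "directory", "name": lib_dir, "contents": contents}],
    }


def write_overlay(lib_dir, wanted_names, out_path):
    """Write the overlay as JSON text to out_path."""
    with open(out_path, "w") as f:
        json.dump(case_fix_overlay(lib_dir, wanted_names), f, indent=2)
```

For the directory listed above, calling `case_fix_overlay` with the names from the error messages (kernel32.lib, libucrt.lib, uuid.lib) would emit an entry redirecting kernel32.lib to kernel32.Lib, plus entries for any other requested names found there under a different case.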

Please consider accepting this patch. I believe we won't be the only ones that encounter case issues when doing cross compilation for Windows and this feature will for sure solve this issue.

We're aware of cases where projects that were developed on Windows or macOS assume case preserving behavior and they would use #include <foo.h> and #include <Foo.h> as well as -lfoo and -lFoo to refer to the same file. This works on Windows and macOS, but breaks on Linux.

The solution is to fix the buggy build scripts that have case-mismatch issues, instead of offloading the burden to the involved tools (ld.lld).

The reason that lld-link works fine on Windows but failed when I tried to cross compile for Windows under Linux is that even though NTFS is a case sensitive filesystem,

This seems to me to be another case where you need to fix the sysroot environment instead of adding file mapping logic to tools.
Currently you need clang and lld (lld-link or ld.lld); if you later need more tools, are you going to add the logic to all the involved tools?


I'll take some days off, so if the discussion continues, my responses will probably take longer.
I'd also note that I have asked some other folks, and there seems to be much more objection than support.

haowei added a comment. · May 4 2022, 7:56 PM

This seems to me to be another case where you need to fix the sysroot environment instead of adding file mapping logic to tools.
Currently you need clang and lld (lld-link or ld.lld); if you later need more tools, are you going to add the logic to all the involved tools?

I dug into this issue a bit, and it looks like "kernel32.lib" is specified using compiler pragmas (the dependent library feature); see the grep results below.

➜  sdk grep -i "kernel32.lib" -rI ./
./VC/Tools/MSVC/14.30.30705/crt/src/vcruntime/mstartup.cpp:#pragma comment(linker, "/defaultlib:kernel32.lib")
./VC/Tools/MSVC/14.30.30705/crt/src/vcruntime/initializers.cpp:    // Link with the legacy kernel32.lib for the "normal" libraries
./VC/Tools/MSVC/14.30.30705/crt/src/vcruntime/initializers.cpp:    #pragma comment(linker, "/defaultlib:kernel32.lib")
./VC/Tools/MSVC/14.30.30705/atlmfc/include/atlbase.h:#pragma comment(lib, "kernel32.lib")
./Windows Kit/10/Include/10.0.19041.0/um/oobenotification.h://    and link against the published kernel32.lib.
./Windows Kit/10/Debuggers/x86/sdk/samples/exdi/ExdiGdbSrvSample/ExdiGdbSrvSample/sources:      $(CLIENTCORE_EXTERNAL_SDK_LIB_PATH)\kernel32.lib \
./Windows Kit/10/Debuggers/x86/sdk/samples/exdi/ExdiGdbSrvSample/ExdiGdbSrvSample/ExdiGdbSrvSample.vcxproj:      <AdditionalDependencies>kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;xmllite.lib;%(AdditionalDependencies)</AdditionalDependencies>
./Windows Kit/10/Debuggers/x64/sdk/samples/exdi/ExdiGdbSrvSample/ExdiGdbSrvSample/sources:      $(CLIENTCORE_EXTERNAL_SDK_LIB_PATH)\kernel32.lib \
./Windows Kit/10/Debuggers/x64/sdk/samples/exdi/ExdiGdbSrvSample/ExdiGdbSrvSample/ExdiGdbSrvSample.vcxproj:      <AdditionalDependencies>kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;xmllite.lib;%(AdditionalDependencies)</AdditionalDependencies>

I agree with you that this is a sysroot issue, and if it were possible, I would fix these cases instead of trying to mitigate them in the toolchain. However, these headers and libraries come from the vanilla Windows SDK and MSVCRT. They are not open source, so there is no way we can "fix" them; only Microsoft can. As I explained, vfsoverlay is the least intrusive solution to this problem that we have found so far. The VirtualFileSystem class is part of LLVM, and vfsoverlay is already supported by clang and clang-tidy. Given the close relationship between LLVM tools and the fact that they all rely on LLVM libraries, I don't see why lld adopting it would be a problem.

if you later need more tools, are you going to add the logic to all the involved tools?

I doubt this is going to be needed, since other tools don't have the same use cases, but if it is an LLVM tool, I don't see why it cannot be done. Very few changes would be needed, as is evident from this change, which is also fairly small.

For example, an overlayfs can be used, and that will transparently work with many other tools, including binary utilities you may use.

If overlayfs could be done entirely in user space, I would agree that it is a good solution. Unfortunately, it cannot. The mount command requires root permissions, and in a lot of cases this access cannot be granted.