This is an archive of the discontinued LLVM Phabricator instance.

[clang] [MinGW] Avoid adding <base>/include and <base>/lib when cross compiling
ClosedPublic

Authored by mstorsjo on Jan 7 2023, 2:38 PM.

Details

Summary

The MinGW compiler driver first tries to deduce the root of
the toolchain installation (either clang itself or a separate
cross mingw gcc installation). On top of this root, a number
of include and lib paths are added (some added unconditionally,
some only if they exist):

  • <base>/x86_64-w64-mingw32/include
  • <base>/include
  • <base>/include/x86_64-w64-windows-gnu

(Some more are also added for libstdc++ and/or libc++.)

The first one is the one commonly used for MinGW targets so
far. For LLVM runtimes installed with the
LLVM_ENABLE_PER_TARGET_RUNTIME_DIR option, the latter two are
used though (this is currently not the default, not yet at least).

For cross compiling, if base is a separate dedicated directory,
this is fine, but when using the sysroots of a distro-installed
cross mingw toolchain, base is /usr - and having /usr/include
in the include path for cross compilation is a potential
source for problems; see
https://github.com/llvm/llvm-project/issues/59871.

If not cross compiling though, <base>/include needs to be included
too. E.g. in the case of msys2, most headers are in e.g.
/mingw64/include while the compiler is /mingw64/bin/clang.

When cross compiling, if the sysroot has been explicitly set
by the user, keep <base>/include too. (In the case of a distro
provided cross gcc toolchain in /usr, the sysroot needs to be set
to /usr and not /usr/x86_64-w64-mingw32 though, to be able to find
libgcc files under /usr/lib/gcc/x86_64-w64-mingw32. So with such a
toolchain, setting the sysroot explicitly does retain the problem.)

All in all - this avoids adding /usr/include and /usr/lib to the
include/lib paths when doing mingw cross compilation with a
distro-provided sysroot in /usr/x86_64-w64-mingw32.

Diff Detail

Event Timeline

mstorsjo created this revision.Jan 7 2023, 2:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 7 2023, 2:38 PM
Herald added a subscriber: pengfei. · View Herald Transcript
mstorsjo requested review of this revision.Jan 7 2023, 2:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 7 2023, 2:38 PM
Herald added a subscriber: MaskRay. · View Herald Transcript

Also, this patch is lacking tests for now - mostly bringing it up for discussion first - I can add tests if others agree that it seems like a reasonable path forward.

I had thought we do that already so this change looks reasonable for me.
Just one thought, do we support multilib toolchains? I think in that case Clang would have to add <base>/include despite different triple.

I had thought we do that already so this change looks reasonable for me.
Just one thought, do we support multilib toolchains? I think in that case Clang would have to add <base>/include despite different triple.

This patch actually does that already, see the parameter RequireArchMatch which is set to true for the <base>/lib case and false for <base>/include.

I had thought we do that already so this change looks reasonable for me.
Just one thought, do we support multilib toolchains? I think in that case Clang would have to add <base>/include despite different triple.

This patch actually does that already, see the parameter RequireArchMatch which is set to true for the <base>/lib case and false for <base>/include.

... but thanks for considering and bringing up that case, I hadn't really thought much about that - I don't see multilib setups that often actually. Not sure if and/or how well they work with clang.

I had thought we do that already so this change looks reasonable for me.
Just one thought, do we support multilib toolchains? I think in that case Clang would have to add <base>/include despite different triple.

This patch actually does that already, see the parameter RequireArchMatch which is set to true for the <base>/lib case and false for <base>/include.

Ah, I have messed up those 2.
I haven't seen multilib mingw-w64 distribution either but just wanted to be cautious here to not accidentally regress if there is somebody doing that.
Then I cannot think of any issues this patch could cause.

BTW Looking at Arch Linux 32-bit libs are placed in <some_lib_dir>/32 but this is off-topic.

The idea sounds reasonable. I don't know mingw-w64 toolchains well enough, but I'll try:

How does it interact with the following conditions (lines 469-475)? They look like they may be looking for a mingw-w64 sysroot, which may be ignored by the new check.

else if (llvm::ErrorOr<std::string> TargetSubdir = findClangRelativeSysroot(
             getDriver(), LiteralTriple, getTriple(), SubdirName))
  Base = std::string(llvm::sys::path::parent_path(TargetSubdir.get()));
else if (llvm::ErrorOr<std::string> GPPName =
             findGcc(LiteralTriple, getTriple()))
  Base = std::string(llvm::sys::path::parent_path(
      llvm::sys::path::parent_path(GPPName.get())));

The idea sounds reasonable. I don't know mingw-w64 toolchains well enough, but I'll try:

How does it interact with the following conditions (lines 469-475)? They look like they may be looking for a mingw-w64 sysroot, which may be ignored by the new check.

else if (llvm::ErrorOr<std::string> TargetSubdir = findClangRelativeSysroot(
             getDriver(), LiteralTriple, getTriple(), SubdirName))
  Base = std::string(llvm::sys::path::parent_path(TargetSubdir.get()));
else if (llvm::ErrorOr<std::string> GPPName =
             findGcc(LiteralTriple, getTriple()))
  Base = std::string(llvm::sys::path::parent_path(
      llvm::sys::path::parent_path(GPPName.get())));

Thanks for having a look!

Those cases are taken into account here; for the case of findClangRelativeSysroot, if clang is executing in e.g. <somewhere>/llvm-mingw/bin/clang and it finds <somewhere>/llvm-mingw/x86_64-w64-mingw32, then it doesn't pick that as <base>, but it picks <somewhere>/llvm-mingw, and gets the x86_64-w64-mingw32 subdir prefix included via the SubdirName variable.

Likewise, if findClangRelativeSysroot failed, but findGcc found /usr/bin/x86_64-w64-mingw32-gcc in $PATH, then it also picks /usr as <base> (which is needed for finding libgcc things under <base>/lib/gcc).

So for these reasons, the search can't stay entirely within <base>/<target triple> - and we may want to skip <base>/include and <base>/lib when we believe that's meant for a different target.

mstorsjo updated this revision to Diff 488182.Jan 11 2023, 5:55 AM

Updated with some testcases. This does test that the include directory is omitted when cross compiling, but those kinds of tests, which set up a simulated toolchain environment with symlinks, don't run on actual Windows - so it's not quite as easy to test the case that we actually do keep the unprefixed include directory on Windows. As this matches the current testing coverage status quo, I hope this is ok...

alvinhochun added inline comments.Jan 12 2023, 5:09 AM
clang/test/Driver/mingw-sysroot.cpp
25–46

I think this only verifies that the include path is not present after the last CHECK_TESTROOT_GCC match. The --implicit-check-not option should be used to check that the pattern is not present anywhere in the output.

mstorsjo added inline comments.Jan 12 2023, 6:07 AM
clang/test/Driver/mingw-sysroot.cpp
25–46

Right - that's true (although this is the spot where the entry did appear before - I checked that the test did fail before applying the functional change).

I tried adding --implicit-check-not here, but the quoting/escaping of the regexes seems tricky - even getting it to recognize the literal quotes surrounding the string.

I tried with --implicit-check-not="\"{{.*}}/testroot-gcc{{/|\\\\}}x86_64-w64-mingw32{{/|\\\\}}include\"", but that's not working (i.e. it's not detecting anything even when I remove the functional change of the patch). With a trivial reduction of it, into --implicit-check-not="testroot-gcc/x86_64-w64-mingw32/include", it's detecting things as expected (but that quote gets handled by the shell, so this would match any substring, such as .../include/subdir too. But even --implicit-check-not="testroot-gcc/x86_64-w64-mingw32/include\"" does not match correctly.

The documentation says that the --implicit-check-not option does take a pattern, but it doesn't really say what's allowed in that pattern - apparently not the full syntax used in normal CHECK lines at least...

Updated with some testcases. This does test that the include directory is omitted when cross compiling, but those kinds of tests, which set up a simulated toolchain environment with symlinks, don't run on actual Windows - so it's not quite as easy to test the case that we actually do keep the unprefixed include directory on Windows. As this matches the current testing coverage status quo, I hope this is ok...

Personally I don't mind that, it's unlikely to ever get 100% coverage because of known limitations.

alvinhochun added inline comments.Jan 12 2023, 10:14 AM
clang/test/Driver/mingw-sysroot.cpp
25–46

Maybe it's a problem with escaping those backslashes? I think the command line is interpreted by lit which has its own escaping rules.

mstorsjo added inline comments.Jan 12 2023, 1:22 PM
clang/test/Driver/mingw-sysroot.cpp
25–46

Ah, sorry for the noise, I just messed up with a typo - I got it working now.

mstorsjo updated this revision to Diff 488752.Jan 12 2023, 1:24 PM

Switch the negative test to --implicit-check-not.

I cannot speak much about LLVM tests but the code looks good for me.

mati865 accepted this revision.Jan 13 2023, 11:25 AM
This revision is now accepted and ready to land.Jan 13 2023, 11:25 AM
alvinhochun accepted this revision.Jan 14 2023, 4:40 AM

Also lgtm