Page MenuHomePhabricator

[LLDB] [test] Allow specifying the full arch for msvc/clang-cl in build.py
AbandonedPublic

Authored by mstorsjo on Oct 15 2019, 3:40 AM.

Details

Summary

This fixes running check-lldb on arm linux.

Previously the helper python scipt only passed -m32/-m64, but some tests assume it to be compiling for an x86 architecture, and the -m32/-m64 flags only pick the right bitness of the compiler's default target architecture.

Diff Detail

Event Timeline

mstorsjo created this revision.Oct 15 2019, 3:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 15 2019, 3:40 AM

+stella, for the aspects of running msvc/clang in various environments..

Yeah, I raised this issue back when build.py was first introduced, but it did not really get resolved properly. (Back then, windows arm wasn't considered, so I guess the issue wasn't seen as pressing.) Right now, build.py has a sort of a split personality, where it generally builds stuff for the host, *except* if it uses the clang-cl driver, where it builds stuff for x86 windows, regardless of what happens to be the host platform.

I think the issue should be tackled more wholistically, but I am not really sure how, and I'm not sure I know about all the complexities that need to be considered here. Since you've looked at these tests lately, let me ask you some questions...

For the tests that "assume" that they are building for x86, how exactly do these assumptions of theirs manifest? I assume these are tests that currently run on non win-x86 platforms too, or else you wouldn't have picked them up. Is that right? Would it be possible to rewrite these tests to call clang-cl directly?

In my ideal world, I'd have tests which do not need to produce a "functional" executable (but rather always create the same kind of target, regardless of what the host is), invoke the right compiler directly -- since the target/compiler is fixed, they can pass any funky option they like without needing to worry about portability. That would leave build.py with the task of figuring out how to build a binary that will run correctly on the host (possibly with a different bitness).

However, I don't know if that is really possible. I know there are some complexities with configuring msvc/clang-cl, where one needs to fetch registry keys and whatnot in order to get the right build environment. *However*, given that these tests don't always run on the host, they cannot assume that the registry keys will be there anyway, and so avoiding their lookup might actually be good, as it will ensure the tests will run the same way on regardless of the host environment.

This revision is now accepted and ready to land.Oct 15 2019, 4:44 AM

Sorry, I didn't see Pavel's message. In this case I have no objections to the patch if other people also won't have.

labath requested changes to this revision.Oct 15 2019, 4:50 AM

Yea, let's discuss this a bit further. It's possible this may turn out to be the best solution to the problem, but I'd like to explore the various aspects of the problem first...

@aleksandr.urakov, since you probably know more about windows and these tests than I do, I'd be interested in hearing your thoughts too...

This revision now requires changes to proceed.Oct 15 2019, 4:50 AM

For the tests that "assume" that they are building for x86, how exactly do these assumptions of theirs manifest?

Two tests fail, SymbolFile/NativePDB/disassembly.cpp and SymbolFile/NativePDB/function-types-calling-conv.cpp.

disassembly.cpp fails at disassemble --flavor=intel since --flavor=intel doesn't make sense when disassembling an arm binary. But even if that would have succeeded, the test checks for specific disassembly in the output, so obviously this test assumes that the compiler defaults to x86_64 (so this test would fail on any 32 bit host platform as well).

function-types-calling-conv.cpp fails with the error "error in backend: target does not implement codeview register mapping". When running tests on an arm64 target, but building with -m32, it tries to build for arm-windows-msvc. This target is less feature complete than arm64, and e.g. codeview debug info isn't supported at all there. Furthermore, this test expects to inspect certain calling convention attributes, which only are valid on i386 (hence the -m32 in the test, but it should really be --arch i386 or similar).

There's other tests in this directory, which don't specify -m32, which on arm64 linux ends up building for arm64 windows (which is more complete than the arm32 windows target) and the tests do actually succeed, even though they probably do something else than what was intended. But if run on e.g. arm32 linux, many more of the tests (that don't specify any desired architecture) would end up with the less functional arm32 windows target.

I assume these are tests that currently run on non win-x86 platforms too, or else you wouldn't have picked them up. Is that right? Would it be possible to rewrite these tests to call clang-cl directly?

I guess it should be possible to rewrite them that way, yes.

In my ideal world, I'd have tests which do not need to produce a "functional" executable (but rather always create the same kind of target, regardless of what the host is), invoke the right compiler directly -- since the target/compiler is fixed, they can pass any funky option they like without needing to worry about portability. That would leave build.py with the task of figuring out how to build a binary that will run correctly on the host (possibly with a different bitness).

However, I don't know if that is really possible. I know there are some complexities with configuring msvc/clang-cl, where one needs to fetch registry keys and whatnot in order to get the right build environment. *However*, given that these tests don't always run on the host, they cannot assume that the registry keys will be there anyway, and so avoiding their lookup might actually be good, as it will ensure the tests will run the same way on regardless of the host environment.

At least these tests run with --nodefaultlib so they shouldn't actually need a full winsdk installation, and as they do run and succeed with this change, it should be possible to rewrite them to use clang-cl directly.

Yeah, I raised this issue back when build.py was first introduced, but it did not really get resolved properly. (Back then, windows arm wasn't considered, so I guess the issue wasn't seen as pressing.)

Well, windows arm doesn't really affect this aspect at all. But if windows arm didn't exist, then clang-cl would probably error out more directly when run on arm linux, unless given an explicit architecture option.

Right now, build.py has a sort of a split personality, where it generally builds stuff for the host, *except* if it uses the clang-cl driver, where it builds stuff for x86 windows, regardless of what happens to be the host platform.

Yes, that does seem like the root of the problem.

Yeah, I raised this issue back when build.py was first introduced, but it did not really get resolved properly. (Back then, windows arm wasn't considered, so I guess the issue wasn't seen as pressing.)

Well, windows arm doesn't really affect this aspect at all. But if windows arm didn't exist, then clang-cl would probably error out more directly when run on arm linux, unless given an explicit architecture option.

Ah, so clang-cl takes the host triple, and just sets the "os" field to "windows"? That's fun. :)

At least these tests run with --nodefaultlib so they shouldn't actually need a full winsdk installation, and as they do run and succeed with this change, it should be possible to rewrite them to use clang-cl directly.

Cool. Can we try that instead? (Maybe don't charge for it straight away, but wait to see if anyone comes up with a reason to not do that.). That would make it consistent with what we do for non-windows targets, where we invoke "clang --target=XXX" directly when we want a specific target hardcoded. If there's a substantial amount of repeated arguments involved, you can make a lit substitution to house them...

@aleksandr.urakov, since you probably know more about windows and these tests than I do, I'd be interested in hearing your thoughts too...

Sorry, I made some changes to the testing infrastructure only occasionally, so I can't say I'm deep into it. It looks like this patch fixes the problem (may be in a tricky way...) and doesn't make the things worse than they are, that's why I think that proceeding with it for now would be not a bad idea. But I have nothing against fixing the root cause :)

This change looks fine assuming all the tests continue to pass.

I think it makes sense to update the script to allow more arguments and/or invoke clang-cl directly instead of using the script. I suspect that using clang-cl directly will not work though - the script does a lot of the setup needed to run clang-cl correctly today (previously the environment for clang-cl was not setup correctly and the tests either didn't pass or passed for the wrong reasons, so using build.py has been a huge improvement).

I suspect that using clang-cl directly will not work though - the script does a lot of the setup needed to run clang-cl correctly today (previously the environment for clang-cl was not setup correctly and the tests either didn't pass or passed for the wrong reasons, so using build.py has been a huge improvement).

Can you elaborate on what kind of setup do you have in mind here? Bear in mind that here we are only talking about tests that do not use any system libraries or headers, and which already run fine on a linux system which does not have any windows-specific stuff installed. With those restrictions, I don't see what can be possibly gained from using build.py. The only thing I can see it possibly doing is to clear some environment variables which might otherwise be present and confuse clang-cl. However, that's something that can be easily done in lit.

I suspect that using clang-cl directly will not work though - the script does a lot of the setup needed to run clang-cl correctly today (previously the environment for clang-cl was not setup correctly and the tests either didn't pass or passed for the wrong reasons, so using build.py has been a huge improvement).

Can you elaborate on what kind of setup do you have in mind here? Bear in mind that here we are only talking about tests that do not use any system libraries or headers, and which already run fine on a linux system which does not have any windows-specific stuff installed. With those restrictions, I don't see what can be possibly gained from using build.py. The only thing I can see it possibly doing is to clear some environment variables which might otherwise be present and confuse clang-cl. However, that's something that can be easily done in lit.

The two things that come to mind are the path to clang-cl (which is sometimes a clang build and sometimes installed on the system as part of a VS installation or an LLVM installation) as well as the path to the linker when it is needed. This is most often an issue in the case of a VS install - I don't remember all the details any more, but I believe that before Zach added the script, we were often picking up the wrong clang-cl and ending up not being able to compile the tests at all.

I know there are some complexities with configuring msvc/clang-cl, where one needs to fetch registry keys and whatnot in order to get the right build environment.

My understanding is that the clang-cl driver does some poking around in the file system and registry to come up with defaults for the compatibility level, locations of the library and platform headers and such, but I believe that happens only if it needs to. If all those answers are in the command line, it shouldn't bother inspecting the system.

I agree with Pavel in principle, but I'm not sure that finding a better general solution in the long run should hold this workaround up in the short term.

The two things that come to mind are the path to clang-cl (which is sometimes a clang build and sometimes installed on the system as part of a VS installation or an LLVM installation) as well as the path to the linker when it is needed. This is most often an issue in the case of a VS install - I don't remember all the details any more, but I believe that before Zach added the script, we were often picking up the wrong clang-cl and ending up not being able to compile the tests at all.

Thanks.

Was this during a standalone lldb build? In a non-standalone build, lit should definitely prefer the just-built clang/lld (and if it doesn't, it should be fixed to do that). The situation is more complicated for a standalone build because the clang binary is sort of out of our control. But, in this case, I don't see how having build.py around can help, because the information about which clang to use has to come externally anyway...

My understanding is that the clang-cl driver does some poking around in the file system and registry to come up with defaults for the compatibility level, locations of the library and platform headers and such, but I believe that happens only if it needs to. If all those answers are in the command line, it shouldn't bother inspecting the system.

Right, but the tests we are talking about here (e.g. SymbolFile/NativePDB/disassembly.cpp) should not need any of that registry stuff, should they?

The problem I have with this "workaround" is that it's unlikely to ever get fixed if it goes in, and I'd like to clarify the role of build.py in all this. E.g., if we are convinced that build.py needs to be used for all clang-cl invocations, then a more generic --target argument might be in order, so that one can later also use it to build arm-windows binaries (as they will likely have a similar problem).

The two things that come to mind are the path to clang-cl (which is sometimes a clang build and sometimes installed on the system as part of a VS installation or an LLVM installation) as well as the path to the linker when it is needed. This is most often an issue in the case of a VS install - I don't remember all the details any more, but I believe that before Zach added the script, we were often picking up the wrong clang-cl and ending up not being able to compile the tests at all.

Thanks.

Was this during a standalone lldb build? In a non-standalone build, lit should definitely prefer the just-built clang/lld (and if it doesn't, it should be fixed to do that). The situation is more complicated for a standalone build because the clang binary is sort of out of our control. But, in this case, I don't see how having build.py around can help, because the information about which clang to use has to come externally anyway...

Both. Again, it's been a long time, but when we use VS for building, the environment already contains a path to clang-cl so regardless of whether the build is standalone or not, the build gets confused about which clang-cl to use. @mstorsjo might want to update the tests to use clang-cl but then make sure that the update works on the Windows Buildbot (or similar environment) before committing.

The two things that come to mind are the path to clang-cl (which is sometimes a clang build and sometimes installed on the system as part of a VS installation or an LLVM installation) as well as the path to the linker when it is needed. This is most often an issue in the case of a VS install - I don't remember all the details any more, but I believe that before Zach added the script, we were often picking up the wrong clang-cl and ending up not being able to compile the tests at all.

Thanks.

Was this during a standalone lldb build? In a non-standalone build, lit should definitely prefer the just-built clang/lld (and if it doesn't, it should be fixed to do that). The situation is more complicated for a standalone build because the clang binary is sort of out of our control. But, in this case, I don't see how having build.py around can help, because the information about which clang to use has to come externally anyway...

How do tests like test/Shell/Register/x86*.test work in such standalone builds then? They use lines like these:

# XFAIL: system-windows
# REQUIRES: native && target-x86_64
# RUN: %clangxx -fomit-frame-pointer %p/Inputs/x86-64-gp-read.cpp -o %t
# RUN: %lldb -b -s %s %t | FileCheck %s

(The XFAIL for system-windows, at least in this test, when I tried it out, seemed to relate to the fact that register read --all on windows didn't include all the 32 bit subregisters.)

If tests already can use such constrcuts, I don't see why we couldn't use # RUN: %clang_cl .., as lit sets up %clang_cl in the same way as %clangxx.

Both. Again, it's been a long time, but when we use VS for building, the environment already contains a path to clang-cl so regardless of whether the build is standalone or not, the build gets confused about which clang-cl to use. @mstorsjo might want to update the tests to use clang-cl but then make sure that the update works on the Windows Buildbot (or similar environment) before committing.

Is there any way of checking whether it works on this buildbot, other than committing (after doing best effort testing of it in whatever comparable windows environments I have available)?

Both. Again, it's been a long time, but when we use VS for building, the environment already contains a path to clang-cl so regardless of whether the build is standalone or not, the build gets confused about which clang-cl to use. @mstorsjo might want to update the tests to use clang-cl but then make sure that the update works on the Windows Buildbot (or similar environment) before committing.

Is there any way of checking whether it works on this buildbot, other than committing (after doing best effort testing of it in whatever comparable windows environments I have available)?

Not really. If you have a patch, I can run it for you on my windows environment (which should be almost identical to the Buildbot).

The two things that come to mind are the path to clang-cl (which is sometimes a clang build and sometimes installed on the system as part of a VS installation or an LLVM installation) as well as the path to the linker when it is needed. This is most often an issue in the case of a VS install - I don't remember all the details any more, but I believe that before Zach added the script, we were often picking up the wrong clang-cl and ending up not being able to compile the tests at all.

Thanks.

Was this during a standalone lldb build? In a non-standalone build, lit should definitely prefer the just-built clang/lld (and if it doesn't, it should be fixed to do that). The situation is more complicated for a standalone build because the clang binary is sort of out of our control. But, in this case, I don't see how having build.py around can help, because the information about which clang to use has to come externally anyway...

How do tests like test/Shell/Register/x86*.test work in such standalone builds then? They use lines like these:

# XFAIL: system-windows
# REQUIRES: native && target-x86_64
# RUN: %clangxx -fomit-frame-pointer %p/Inputs/x86-64-gp-read.cpp -o %t
# RUN: %lldb -b -s %s %t | FileCheck %s

(The XFAIL for system-windows, at least in this test, when I tried it out, seemed to relate to the fact that register read --all on windows didn't include all the 32 bit subregisters.)

If tests already can use such constrcuts, I don't see why we couldn't use # RUN: %clang_cl .., as lit sets up %clang_cl in the same way as %clangxx.

%clang_cl is where we started before we had build.py. It was over a year ago, so it's possible things have improved.

The two things that come to mind are the path to clang-cl (which is sometimes a clang build and sometimes installed on the system as part of a VS installation or an LLVM installation) as well as the path to the linker when it is needed. This is most often an issue in the case of a VS install - I don't remember all the details any more, but I believe that before Zach added the script, we were often picking up the wrong clang-cl and ending up not being able to compile the tests at all.

Thanks.

Was this during a standalone lldb build? In a non-standalone build, lit should definitely prefer the just-built clang/lld (and if it doesn't, it should be fixed to do that). The situation is more complicated for a standalone build because the clang binary is sort of out of our control. But, in this case, I don't see how having build.py around can help, because the information about which clang to use has to come externally anyway...

How do tests like test/Shell/Register/x86*.test work in such standalone builds then? They use lines like these:

# XFAIL: system-windows
# REQUIRES: native && target-x86_64
# RUN: %clangxx -fomit-frame-pointer %p/Inputs/x86-64-gp-read.cpp -o %t
# RUN: %lldb -b -s %s %t | FileCheck %s

(The XFAIL for system-windows, at least in this test, when I tried it out, seemed to relate to the fact that register read --all on windows didn't include all the 32 bit subregisters.)

If tests already can use such constrcuts, I don't see why we couldn't use # RUN: %clang_cl .., as lit sets up %clang_cl in the same way as %clangxx.

%clang_cl is where we started before we had build.py. It was over a year ago, so it's possible things have improved.

Possible, but I'm getting the feeling that there's cases that at least I'm overlooking, so I think it might be safer to just add a --target option to build.py. Or maybe make it accept an actual architecture name to --arch, in addition to 32/64/host? For non-clang-cl cases, it should only allow 32/64/host, but for clang-cl it could accept e.g. i386. How does that sound to you?

mstorsjo updated this revision to Diff 225171.Oct 16 2019, 1:40 AM
mstorsjo retitled this revision from [LLDB] [test] Pass --target=<arch>-windows-msvc to clang-cl to [LLDB] [test] Allow specifying the full arch for msvc/clang-cl in build.py.
mstorsjo edited the summary of this revision. (Show Details)

Extended the --arch option to build.py, to allow it to take a proper architecture name instead of just "32" and "64", when used with msvc or clang-cl.

Having proper "target" support is one way to go, though I am disappointed to see it go this way, particularly as nobody really knows why invoking clang-cl directly should not work, and so the main motivation seems to be a fear of potentially breaking something, without understanding what that "something" is, or whether it even exists. The reason I am disappointed is because this script is a fairly big departure from the way most lit tests in llvm work, and it wasn't ever really fully embraced by the lldb community (as can be seen by the amount of %clangxx tests that could be using it, but aren't). And while I do think it has some benefits, I don't think it should be used as a replacement for all clang-cl invocations. If it turns out that there having a script wrapping clang-cl is a necessity (which I doubt), then I'd rather see a different python script (possibly reusing parts of this one) which just sets up the necessary environment, but otherwise forwards arguments to clang-cl verbatim. That would enable one to invoke clang-cl with any fancy argument he wants (and this could be hidden under %clang-cl), while leaving this script with the job of figuring out how to build a working host executable with a random compiler.

Anyway, if we're going down this path, then I think it should be done properly:

  • having "host" mean "x64" is just wrong. It should be the "host"
  • "32" and "64" sound like they could be useful, so they can stay if they are needed, but they should mean "the 32-bit flavour of 'host'", not "i386".
  • anything else specifies an exact target, and it should probably be a full triple, not just the architecture
  • using an unsupported builder/target (gcc) should be an error
mstorsjo abandoned this revision.Sat, Nov 2, 3:00 PM

This is no longer necessary since D69031.