This is an archive of the discontinued LLVM Phabricator instance.

[openmp] Support building for armv7 Windows with mingw tools
ClosedPublic

Authored by mstorsjo on Nov 24 2022, 1:57 PM.

Details

Summary

This does things in the same way as
D137168 / a356782426f5bf54a00570e1f925345e5fda7b2e and
D101173 / 4fb0aaf03381473ec8af727edb4b5d59b64b0d60 did for aarch64.

This adds a C implementation of __kmp_invoke_microtask in the same
way as the fallback C implementation in z_Linux_util.cpp - however
this one handles up to 17 arguments, which makes
test/misc_bugs/many-microtask-args.c pass. (The version in
z_Linux_util.cpp only handles up to 15 arguments. And contrary to
aarch64, z_Linux_asm.S doesn't have any assembly for 32 bit arm.)

This passes all tests (with D137748 and D137772 applied).

Diff Detail

Event Timeline

mstorsjo created this revision.Nov 24 2022, 1:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 24 2022, 1:57 PM
mstorsjo requested review of this revision.Nov 24 2022, 1:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 24 2022, 1:57 PM
Herald added a subscriber: sstefan1. · View Herald Transcript

(The version in z_Linux_util.cpp only handles up to 15 arguments. And contrary to aarch64, z_Linux_asm.S doesn't have any assembly for 32 bit arm.)

So this means that simply extending it to handle 17 arguments wouldn't help because it would want to use the assembly version of the other functions?

openmp/runtime/src/kmp_atomic.cpp
866

Is the ! correct here?

openmp/runtime/src/z_Windows_NT_util.cpp
81

Does X86 mean 32 bit here?

(The version in z_Linux_util.cpp only handles up to 15 arguments. And contrary to aarch64, z_Linux_asm.S doesn't have any assembly for 32 bit arm.)

So this means that simply extending it to handle 17 arguments wouldn't help because it would want to use the assembly version of the other functions?

I'm not quite sure I understand the question...

The assembly versions of __kmp_invoke_microtask can handle arbitrary numbers of arguments, since they can do the passing of variable numbers of arguments on the stack correctly. The C version of the function, in the MSVC/arm64 case, also handles it for any arbitrary number of arguments by doing a brittle trick with alloca, here: https://github.com/llvm/llvm-project/blob/main/openmp/runtime/src/z_Windows_NT-586_util.cpp#L170-L182

default: {
  // p_argv[6] and onwards must be passed on the stack since 8 registers are
  // already used.
  size_t len = (argc - 6) * sizeof(void *);
  void *argbuf = alloca(len);
  memcpy(argbuf, &p_argv[6], len);
}
  [[fallthrough]];
case 6:
  (*pkfn)(&gtid, &tid, p_argv[0], p_argv[1], p_argv[2], p_argv[3], p_argv[4],
          p_argv[5]);
  break;
}

This brittle use of alloca and memcpy only works on MSVC - Clang optimizes it out altogether, hence D137827 where we used the assembly version of __kmp_invoke_microtask on Windows/aarch64 with Clang too.

For ARM (on existing handled platforms like Linux), it uses a fallback C version from z_Linux_util.cpp: https://github.com/llvm/llvm-project/blob/main/openmp/runtime/src/z_Linux_util.cpp#L2443-L2532

This is hardcoded to handle only up to 15 arguments (and errors out on any higher number). Extending it to handle up to 17, or any number, like in this patch would be trivial. That would fix the https://github.com/llvm/llvm-project/blob/main/openmp/runtime/test/misc_bugs/many-microtask-args.c testcase for arm linux too. But that doesn't fix the root issue that it only handles up to a specific number of arguments. (The comment in https://github.com/llvm/llvm-project/blob/main/openmp/runtime/src/z_Linux_util.cpp#L2447-L2448 which claims that only one argument is needed is wrong - Clang currently doesn't bundle up arguments in a struct.)

I thought this patch would be the simplest way of getting this new build configuration to work (plus achieving the "all tests passing" status).

The proper solution would be to write an assembly version of __kmp_invoke_microtask for ARM too. I actually already did that and have that available locally, but I thought that might be harder to get reviewed/approved (I'm not sure if the runtime is meant to support older ARM architectures than the armv7 I'm used to writing assembly for, etc), so I went with this patch first. I could also reverse them, first post a patch to switch ARM Linux OpenMP over to an assembly version of the function, then layer this patch on top, to use that assembly function for Windows too, just like for aarch64.

openmp/runtime/src/kmp_atomic.cpp
866

Oh, oops, thanks for catching. Actually, this isn't an !, it's an UTF8 non-breaking space. My keyboard layout produces that for alt+space, which is fairly easy to type accidentally after using alt for the |.

openmp/runtime/src/z_Windows_NT_util.cpp
81

Yes, exactly. Ideally this should probably be specifically "#if BITNESS == 32` or something like that, maybe #ifndef _WIN64 or so. But I'm sticking with the existing code style for now.

mstorsjo updated this revision to Diff 477907.Nov 25 2022, 3:47 AM

Remove an accidental non-ascii character.

The proper solution would be to write an assembly version of __kmp_invoke_microtask for ARM too.

This looks fine on its own now that I understand the purpose.

(I'm not sure if the runtime is meant to support older ARM architectures than the armv7 I'm used to writing assembly for, etc)

I remember from removing armv2/3 support this file openmp/runtime/src/kmp_platform.h. Though that checks for defines all the way back to armv2 so it's almost no use, I highly doubt anyone builds this for v2. The original Raspberry Pi is Armv6, there is still a Debian build for v4 I think. I suspect we don't know what the minimum should be, but keeping the C version isn't a bad option in that case.

Maybe it makes sense to xfail the number of arguments test on arm Windows as well? So when it comes time to fix both are accounted for (if that time is soon we'd remember, but if it's in a year's time, we won't).

Maybe it makes sense to xfail the number of arguments test on arm Windows as well? So when it comes time to fix both are accounted for (if that time is soon we'd remember, but if it's in a year's time, we won't).

I guess that sounds reasonable too - I'll try to see if I can do that.

mstorsjo updated this revision to Diff 477949.Nov 25 2022, 6:47 AM

Reduced the C implementation of __kmp_invoke_microtask to only handle 15 arguments, exactly like the same implementation used for ARM on Linux in z_Linux_util.cpp.

This makes the mingw/armv7 build succeed as much as builds on ARM on Linux.

DavidSpickett accepted this revision.Nov 25 2022, 7:31 AM

LGTM, remember to update your commit message to reflect the 15 arguments thing.

This revision is now accepted and ready to land.Nov 25 2022, 7:31 AM