When building for Windows aarch64 with a compiler other than MSVC
itself, we can assemble GNU assembly files just fine, and the existing,
correct implementation of __kmp_invoke_microtask is fully usable.
(This patch only uses it when building in mingw mode. When building
with clang-cl, we could just as well build and rely on the assembly
file, but that would require a few more preprocessor and CMake checks
to distinguish that case.)
The C implementation of __kmp_invoke_microtask in
z_Windows_NT-586_util.cpp relies on assumptions about compiler
behaviour that are not guaranteed - it might work with MSVC, but
doesn't necessarily work with other compilers. That function uses
alloca to pass parameters on the stack to the called function.
However, there's no guarantee that the buffer allocated by alloca is
exactly at the bottom of the stack (at the stack pointer) when doing
the call; the compiler might have reserved extra space there for other
things it needs to save on the stack.
Additionally, when compiling with optimizations enabled, Clang
removes the alloca and memcpy entirely. At the C language level,
they have no visible effect outside of the function, so the compiler
is free to omit them.
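To make the dead-store argument concrete, here is a rough sketch of the
shape of the problem. It is not the code from z_Windows_NT-586_util.cpp:
all names (sketch_microtask_t, sketch_task, sketch_invoke) are made up
for illustration, and a fixed-size local array stands in for alloca so
that the example stays portable and well-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the runtime's microtask type.
typedef void (*sketch_microtask_t)(int *gtid, int *tid, ...);

static const int kSketchMaxArgs = 16;
static int sketch_calls = 0;

// Example callee; it only counts invocations and ignores any varargs.
void sketch_task(int *, int *, ...) { ++sketch_calls; }

// Mirrors the shape of the fragile pattern. A fixed array replaces
// alloca here (an assumption made for portability); the buffer is
// equally unused from the language's point of view in both cases.
int sketch_invoke(sketch_microtask_t fn, int gtid, int tid, int argc,
                  void **argv) {
  assert(argc >= 0 && argc <= kSketchMaxArgs);
  void *stack_args[kSketchMaxArgs];
  if (argc > 0)
    std::memcpy(stack_args, argv,
                static_cast<std::size_t>(argc) * sizeof(void *));
  // Nothing after this point reads stack_args, so an optimizer may drop
  // both the buffer and the memcpy. The call below passes only two
  // arguments as far as the language is concerned; any hope that the
  // callee finds the copied values on the stack is outside the C rules.
  fn(&gtid, &tid);
  return 1;
}
```

Calling sketch_invoke(sketch_task, 0, 1, 2, args) behaves the same
whether or not the copy is emitted, which is exactly why the compiler
is allowed to remove it.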
I guess we could make this AND (NOT MSVC OR CMAKE_C_COMPILER_ID STREQUAL "Clang"), so that we'd use the assembly form with Clang in MSVC mode too. Clang's optimizations break the current C implementation, but Clang should be able to handle the gas assembly even in MSVC mode.
Then we'd make the condition in the source file KMP_ARCH_AARCH64 && defined(_MSC_VER) && !defined(__clang__).
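As a rough sketch of how that source-side condition could look, reading it as the guard for the C implementation (my interpretation of the suggestion, not something the patch spells out): SKETCH_USE_C_INVOKE_MICROTASK and sketch_invoke_microtask_source are hypothetical names, and the fallback definition of KMP_ARCH_AARCH64 is only there so the snippet compiles outside the runtime, where the real headers would normally provide it.

```cpp
// Fallback for building this sketch standalone; in the runtime,
// KMP_ARCH_AARCH64 comes from its own headers.
#ifndef KMP_ARCH_AARCH64
#if defined(__aarch64__) || defined(_M_ARM64)
#define KMP_ARCH_AARCH64 1
#else
#define KMP_ARCH_AARCH64 0
#endif
#endif

// Real MSVC defines _MSC_VER but not __clang__; clang-cl defines both;
// mingw toolchains do not define _MSC_VER at all.
#if KMP_ARCH_AARCH64 && defined(_MSC_VER) && !defined(__clang__)
#define SKETCH_USE_C_INVOKE_MICROTASK 1
#else
#define SKETCH_USE_C_INVOKE_MICROTASK 0
#endif

// Hypothetical helper that reports which variant the condition picks.
const char *sketch_invoke_microtask_source() {
  return SKETCH_USE_C_INVOKE_MICROTASK ? "C implementation"
                                       : "assembly implementation";
}
```

With this shape, only real MSVC on aarch64 keeps the C implementation, which matches the CMake side building the assembly file for everything else.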