This is an archive of the discontinued LLVM Phabricator instance.

[openmp] Use z_Linux_asm.S to provide __kmp_invoke_microtask with Clang for Windows/aarch64
ClosedPublic

Authored by mstorsjo on Nov 11 2022, 4:09 AM.

Details

Summary

When building for Windows aarch64, and not using the actual MSVC,
we can assemble gnu assembly files just fine, and the existing
correct implementation of __kmp_invoke_microtask is fully usable.

(This current patch only uses it when building in mingw mode, but
when building with clang-cl, we could also just as well build and
rely on the assembly file - but that would require a bit more
preprocessor and cmake checks to distinguish that case.)

The C implementation of __kmp_invoke_microtask in
z_Windows_NT-586_util.cpp relies on unguaranteed assumptions about
the compiler behaviour - it might work on MSVC, but doesn't necessarily
on other compilers. That function uses an alloca to pass parameters
on the stack to the called functions.

However, there's no guarantee that the buffer allocated by alloca is
exactly at the bottom of the stack when doing the call; the compiler
might have left space for extra things to save on the stack there.

Additionally, when compiled with Clang with optimization, Clang
optimizes out the alloca and memcpy entirely. On the C language
level, they don't have any visible effect outside of the function
and thus can be omitted entirely.

Diff Detail

Event Timeline

mstorsjo created this revision.Nov 11 2022, 4:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 11 2022, 4:09 AM
mstorsjo requested review of this revision.Nov 11 2022, 4:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 11 2022, 4:09 AM
Herald added a subscriber: sstefan1. · View Herald Transcript
mstorsjo added inline comments.Nov 14 2022, 7:08 AM
openmp/runtime/src/CMakeLists.txt
98

I guess we could make this AND (NOT MSVC OR CMAKE_C_COMPILER_ID STREQUAL "Clang") - so we'd use the assembly form with clang in MSVC mode too. Clang's optimizations break the current C implementation, but Clang should be able to handle the gas assembly even in msvc mode.

Then we'd make the condition in the source file KMP_ARCH_AARCH64 && defined(_MSC_VER) && !defined(__clang__).

mstorsjo updated this revision to Diff 475752.Nov 16 2022, 3:14 AM

Use the assembly implementation for Clang-cl builds for arm64 too (as the C implementation gets misoptimized by Clang but not by MSVC).

(Building openmp with clang-cl for arm64 requires one other unrelated patch, which I can send when the others I have outstanding have settled.)

mstorsjo retitled this revision from [openmp] Use z_Linux_asm.S to provide __kmp_invoke_microtask on mingw aarch64 to [openmp] Use z_Linux_asm.S to provide __kmp_invoke_microtask with Clang for Windows/aarch64.Nov 16 2022, 3:15 AM

I have zero idea about OpenMP, just one comment on the cmake.

openmp/runtime/src/CMakeLists.txt
101

The implicit else here is else use the .cpp file ? This would happen when you are on AArch64 and using MSVC.

Worth adding that as a comment?

I have zero idea about OpenMP, just one comment on the cmake.

Yeah, it seems hard to get someone to approve this... I presume the asm changes look straightforward/ok to you too? I'm wrapping the .comm directive since the alignment in that directive is interpreted differently between ELF and COFF.

openmp/runtime/src/CMakeLists.txt
101

Sure, it never hurts to be more explicit with things like this.

mstorsjo updated this revision to Diff 477748.Nov 24 2022, 4:43 AM

Clarified the aarch64+msvc case in the cmake file.

The C implementation of __kmp_invoke_microtask in z_Windows_NT-586_util.cpp relies on unguaranteed assumptions about the compiler behaviour

Meaning no reason to use the C version, that part is clear to me.

I'm wrapping the .comm directive since the alignment in that directive is interpreted differently between ELF and COFF.

I didn't know this was a thing. Can you cite some documentation for the .comm behaviour?

I assume this passes whatever OpenMP testing we have. I'd be ok approving it on that basis.

openmp/runtime/src/z_Linux_asm.S
153

I guess this is 2 byte function start alignment? I'd have guessed 4 so maybe I'm wrong.

I'm wrapping the .comm directive since the alignment in that directive is interpreted differently between ELF and COFF.

I didn't know this was a thing. Can you cite some documentation for the .comm behaviour?

https://sourceware.org/binutils/docs/as/Comm.html - which says:

When using ELF or (as a GNU extension) PE, the .comm directive takes an optional third argument. This is the desired alignment of the symbol, specified for ELF as a byte boundary (for example, an alignment of 16 means that the least significant 4 bits of the address should be zero), and for PE as a power of two (for example, an alignment of 5 means aligned to a 32-byte boundary).

I assume this passes whatever OpenMP testing we have. I'd be ok approving it on that basis.

Yes - this, together with D137748 and D137772, makes all tests pass in the mingw/aarch64 build configuration.

openmp/runtime/src/z_Linux_asm.S
153

No, the ALIGN macro takes a power of two - so this sets 4 byte alignment.

(The ALIGN macro is probably used because the .align directive is inconsistent across platforms - that's why there are the .p2align and .balign directives too. But changing the existing code to use them instead would be a different change.)

DavidSpickett accepted this revision.Nov 24 2022, 6:27 AM

LGTM

openmp/runtime/src/z_Linux_asm.S
153

Of course yes. Every time I have to read these things I get number of bytes mixed up with power of 2.

This revision is now accepted and ready to land.Nov 24 2022, 6:27 AM