This is an archive of the discontinued LLVM Phabricator instance.

[Clang][OpenMP] Avoid emitting a __kmpc_alloc_shared for implicit casts which do not have their address taken
Needs ReviewPublic

Authored by doru1004 on Apr 20 2023, 7:37 AM.

Details

Summary

This patch avoids emitting __kmpc_alloc_shared allocation calls for implicitly cast variables which are CK_ArrayToPointerDecay that are not having their address taken explicitly.

Note: if the condition should be refined instead of removed then I am looking for suggestions as to how to keep the check for CK_ArrayToPointerDecay but restrict its applicability with further conditions. It is not clear to me what those conditions could be hence the complete removal of the condition. So far none of the existing lit tests needed to be changed as a consuquence of this change and no LLVM/OpenMP tests have failed.

OpenMP-Opt is usually able to transform the __kmpc_alloc_shared calls emitted this way to allocas except in this case the size of the allocated local array (256) is preventing that from happening (limit is 128).

Diff Detail

Event Timeline

doru1004 created this revision.Apr 20 2023, 7:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2023, 7:37 AM
doru1004 requested review of this revision.Apr 20 2023, 7:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2023, 7:37 AM
ABataev added inline comments.Apr 20 2023, 7:45 AM
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
448

I think you need to check that the array is allocated in the parallel context, otherwise there might be a crash, if it is allocated in the target context and many threads would like to access it.

doru1004 added inline comments.Apr 20 2023, 8:06 AM
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
448

I believe that this is how this condition got here: the inability to check that particular aspect (since we are in a function and the target/parallel is not visible) so basically just emit it as kmpc_alloc_shared conservatively. The more I think about it the more I believe we should leave this as is and not change it. The solution might be to improve the optimization of these cases rather than the emission itself.

I don't understand the concerns. A array to ptr decay, by itself, does not capture the pointer such that multiple threads can access it. You need to check the uses of the decay and decide based on them. Thus, I think this is fine.