This is an archive of the discontinued LLVM Phabricator instance.

[OPENMP]Dynamic globalization for parallel target regions.
ClosedPublic

Authored by ABataev on Jun 22 2020, 12:34 PM.

Details

Summary

Added support for dynamic memory allocation for globalized variables
when target regions must be able to execute in parallel.

Event Timeline

ABataev created this revision. Jun 22 2020, 12:34 PM

I do not understand why we need the flag. As far as I can tell, it has to be on to support the standard described behavior, right? Why should we (allow to) turn it off?

In general, it may use "slow" allocation functions, since it may call malloc on the device side. When disabled, it uses statically preallocated memory, which might be faster if parallel target regions are not required.

Let me rephrase. Does the user need to request the fast path, or does the user need to request the slow but correct path? Only the former is acceptable IMHO.

ABataev marked an inline comment as done. Jun 24 2020, 12:44 PM

By default, the universal but slower option is enabled. If the user is sure that there are no parallel target regions in their code, they can compile with -fno-openmp-cuda-parallel-target-regions to get better performance. I.e., -fopenmp-cuda-parallel-target-regions is enabled by default (slow, but reliable).

clang/lib/Driver/ToolChains/Clang.cpp:5264

The slow but reliable option is enabled by default here.

jdoerfert accepted this revision. Jun 24 2020, 1:07 PM

LGTM. Thanks for the explanation.

This revision is now accepted and ready to land. Jun 24 2020, 1:07 PM
This revision was automatically updated to reflect the committed changes.