Added support for dynamic memory allocation of globalized variables for cases where target regions may need to execute in parallel.
Details
Diff Detail
Repository: rG LLVM Github Monorepo
Event Timeline
I do not understand why we need the flag. As far as I can tell, it has to be on to support the behavior described by the standard, right? Why should we (allow users to) turn it off?
When enabled, it may use "slow" allocation functions, because it can call malloc on the device side. When disabled, it uses statically preallocated memory, which may be faster if parallel target regions are not required.
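To make the trade-off concrete, here is a minimal host-side C++ model of the two strategies. It is a sketch under stated assumptions, not the device runtime's implementation: `StaticPool`, `allocGlobalized`, `freeGlobalized` and the `parallelTargetRegions` parameter are hypothetical names, and the model assumes the preallocated buffer is shared by every target region, which is why that path is only safe when regions never run concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical model (not the real device runtime): one statically
// preallocated buffer shared by all target regions, handed out with a
// bump pointer. Fast, but two regions running at the same time would
// share, and clobber, the same storage.
class StaticPool {
public:
  void *allocate(std::size_t bytes) {
    assert(bytes > 0 && "zero-sized globalized variable");
    // Round the request up so every block stays maximally aligned.
    std::size_t aligned = (bytes + kAlign - 1) / kAlign * kAlign;
    if (aligned > kCapacity - offset_)
      return nullptr; // preallocated space exhausted
    void *p = buffer_ + offset_;
    offset_ += aligned;
    return p;
  }
  // Releases everything at once, e.g. when the target region exits.
  void reset() { offset_ = 0; }
  std::size_t used() const { return offset_; }

private:
  static constexpr std::size_t kAlign = alignof(std::max_align_t);
  static constexpr std::size_t kCapacity = 1024;
  alignas(std::max_align_t) unsigned char buffer_[kCapacity];
  std::size_t offset_ = 0;
};

// `parallelTargetRegions` stands in for the driver flag under discussion.
void *allocGlobalized(StaticPool &pool, std::size_t bytes,
                      bool parallelTargetRegions) {
  if (parallelTargetRegions)
    return std::malloc(bytes); // private heap block: safe, but slower
  return pool.allocate(bytes); // shared static buffer: fast, not concurrency-safe
}

void freeGlobalized(void *p, bool parallelTargetRegions) {
  if (parallelTargetRegions)
    std::free(p);
  // Otherwise nothing to do: pool memory is reclaimed by StaticPool::reset().
}
```

The bump pointer needs no locking and no heap call, which is where the speed comes from in this model; the price is that correctness depends on only one region using the buffer at a time.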
Let me rephrase. Does the user need to request the fast path, or does the user need to request the slow but correct path? Only the former is acceptable, IMHO.
By default, the universal but slower option is enabled. If users are sure that there are no parallel target regions in their code, they can compile with -fno-openmp-cuda-parallel-target-regions to get better performance. That is, -fopenmp-cuda-parallel-target-regions is enabled by default (slow, but reliable).
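The default-on behavior can be modeled in a few lines. This is a standalone sketch, not the code in Clang.cpp: `parallelTargetRegionsEnabled` is a hypothetical function that scans a plain argument list instead of using clang's option tables. It mimics the usual convention for a positive/negative flag pair (as in `llvm::opt::ArgList::hasFlag`), where the last spelling on the command line wins and the default applies when neither appears.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a default-on boolean driver flag: the positive and
// negative spellings may both appear, and the last one on the command line wins.
bool parallelTargetRegionsEnabled(const std::vector<std::string> &args) {
  bool enabled = true; // default: slow but reliable
  for (const std::string &arg : args) {
    if (arg == "-fopenmp-cuda-parallel-target-regions")
      enabled = true;
    else if (arg == "-fno-openmp-cuda-parallel-target-regions")
      enabled = false;
  }
  return enabled;
}
```

With this model, a command line whose last relevant option is -fno-openmp-cuda-parallel-target-regions opts into the fast path, while omitting both spellings keeps the reliable default, which matches the "request the fast path" requirement raised above.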
clang/lib/Driver/ToolChains/Clang.cpp, line 5264:
The slow but reliable option is enabled by default here.