[OpenMP][NVPTX] Avoid data sharing if in parallel region
Abandoned · Public

Authored by Hahnfeld on Oct 1 2018, 10:10 AM.

Details

Summary

Previously the generated code only checked whether the kernel is executing
in SPMD mode. However, a worker thread participating in a parallel
region always serializes nested parallel directives and
doesn't need data sharing with other threads either.
Refactor the code from emitNonSPMDParallelCall into a helper function
to make clear that these two decisions use the same conditions.
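
The shared condition described above could be sketched roughly as follows. This is only an illustrative model, not the code from the patch: `ThreadState`, `isSpmdExecMode`, `queryParallelLevel`, `isInParallelContext`, `mustSerializeNestedParallel` and `needsDataSharing` are hypothetical stand-ins, and the real runtime entry points (`__kmpc_is_spmd_exec_mode`, `__kmpc_parallel_level`) are only mimicked by plain functions. The sketch assumes a parallel level greater than zero means the thread is inside a parallel region.

```cpp
#include <cstdint>

// Hypothetical model of the per-thread runtime state (assumption, not the
// actual libomptarget-nvptx data layout).
struct ThreadState {
  bool spmdMode;               // stands in for __kmpc_is_spmd_exec_mode
  std::uint16_t parallelLevel; // stands in for __kmpc_parallel_level
};

// Counts how often the parallel level is queried, to show the short-circuit.
static int parallelLevelQueries = 0;

bool isSpmdExecMode(const ThreadState &s) { return s.spmdMode; }

std::uint16_t queryParallelLevel(const ThreadState &s) {
  ++parallelLevelQueries;
  return s.parallelLevel;
}

// Hypothetical helper: one condition shared by both decisions.
// The SPMD check comes first, so the parallel level is only queried
// when the kernel is not running in SPMD mode.
bool isInParallelContext(const ThreadState &s) {
  return isSpmdExecMode(s) || queryParallelLevel(s) > 0;
}

// A nested parallel directive is serialized in a parallel context.
bool mustSerializeNestedParallel(const ThreadState &s) {
  return isInParallelContext(s);
}

// Data sharing with other threads is only needed outside of one.
bool needsDataSharing(const ThreadState &s) {
  return !isInParallelContext(s);
}
```

In this model only a thread with `spmdMode == false` and `parallelLevel == 0` (the master thread of a non-SPMD kernel outside any parallel region) shares data, and the parallel level is never queried in SPMD mode.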

Event Timeline

Hahnfeld created this revision. Oct 1 2018, 10:10 AM

Won't this lead to increased register pressure? Currently, I'm trying to emit code that can be optimized out and thus may decrease the register pressure. That's why I tried to reduce the number of runtime checks.

You are right, it's increasing register usage, but I think it shouldn't: the generated code always checks __kmpc_is_spmd_exec_mode first. So if LLVM were able to optimize this out in SPMD mode, __kmpc_parallel_level should never be called.

I guess this doesn't work because it's illegal to hoist the load of execution_param across a barrier?

Even if we are able to reduce register usage for SPMD, it is still going to be high for non-SPMD constructs. The optimizer cannot determine at compile time whether the code is in a parallel region or not.

Instead we avoid the runtime overhead of data sharing. Plus we'll be able to drop around two thirds of the static data allocation in libomptarget-nvptx, which leaves more room for the user's application... We'll see; for now I agree that the added registers for SPMD are unacceptable.

Hahnfeld abandoned this revision. Nov 16 2019, 9:44 AM
Herald added a project: Restricted Project.