There are two places in the current deviceRTLs where the parallel level is computed explicitly,
which is basically the functionality of __kmpc_parallel_level. Starting from
D105787, we plan to introduce a series of function call foldings based on information
that can be deduced at compile time. Computation of the parallel level is
the next target. This patch prepares for that optimization.
Repository: rG LLVM Github Monorepo
Event Timeline
openmp/libomptarget/deviceRTLs/common/src/libcall.cu, line 132:
Because of the legacy from libomp, __kmpc_parallel_level takes two arguments, location and gtid, which are completely ignored in the function definition. As a result, we simply pass nullptr and 0.
__kmpc_parallel_level is defined in parallel.cu as

    EXTERN uint16_t __kmpc_parallel_level(kmp_Ident *loc, uint32_t global_tid) {
      PRINT0(LD_IO, "call to __kmpc_parallel_level\n");
      return parallelLevel[GetWarpId()] & (OMP_ACTIVE_PARALLEL_LEVEL - 1);
    }
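To make the return expression concrete, here is a minimal host-side sketch of the masking step. It assumes that OMP_ACTIVE_PARALLEL_LEVEL is a single high bit (128 is used here purely for illustration) that flags an active level on top of the nesting count; the names kActiveParallelLevel and maskedParallelLevel are hypothetical and do not appear in the runtime.

```cpp
#include <cstdint>

// Assumption for illustration: one high bit marks an "active" parallel level.
// The real value of OMP_ACTIVE_PARALLEL_LEVEL is not stated in this review.
constexpr uint8_t kActiveParallelLevel = 128;

// Hypothetical helper that mirrors the mask used by __kmpc_parallel_level:
// it clears the active flag and keeps only the nesting count.
constexpr uint16_t maskedParallelLevel(uint8_t raw) {
  return raw & (kActiveParallelLevel - 1);
}
```

Under that assumption, a stored value with the flag set and a count of 1 yields 1, and a value without the flag is returned unchanged.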
Similar to the last patch, we should drop the PRINT0 in that call before adding extra calls to it elsewhere.
The function interface is internal to clang, so let's change it to take void instead of a couple of ignored arguments. That will slightly simplify clang codegen, and it means the pass that looks for and replaces calls to it doesn't have to consider arguments (though the latter may work out of the box).
That probably won't work because, as far as I can tell, we share some code generation with the host code, and for the host code the two arguments are not ignored.
I don't know what you mean by "not ignored". The function is device only.
Drop the print and the arguments. Also consider adding a new API for other parallel level lookups; we might even just reuse this one.
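The suggested change can be sketched on the host as follows. This is only an illustration under stated assumptions, not the runtime's code: parallelLevel and currentWarp are stand-ins for the per-warp device state and GetWarpId(), kActiveParallelLevel is an assumed value for OMP_ACTIVE_PARALLEL_LEVEL, and both function names are hypothetical.

```cpp
#include <cstdint>

// Assumed flag value, for illustration only.
constexpr uint8_t kActiveParallelLevel = 128;

// Stand-ins for the device-side per-warp state and GetWarpId().
static uint8_t parallelLevel[4] = {0, 1, 128 + 2, 3};
static unsigned currentWarp = 0;

// Current shape: two parameters that the body never reads,
// so callers pass nullptr and 0.
uint16_t parallel_level_legacy(void *loc, uint32_t global_tid) {
  (void)loc;
  (void)global_tid;
  return parallelLevel[currentWarp] & (kActiveParallelLevel - 1);
}

// Suggested shape: no parameters and no debug print, so a pass that
// folds calls to this function has no arguments to inspect.
uint16_t parallel_level_noargs() {
  return parallelLevel[currentWarp] & (kActiveParallelLevel - 1);
}
```

Both variants return the same value for a given warp, which is why dropping the arguments does not change behavior on the device.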
Oh, I misunderstood the function here. I thought it was simply inherited from libomp, since it keeps the same naming style starting with __kmpc.