If the user did not provide any static clause to override the grid size,
we assume the default grid size as the upper bound and use it to improve
code generation through vendor-specific attributes.
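As a rough illustration of the summary above, the following sketch shows one way a thread range could be chosen: use the user-provided limit when there is one, otherwise fall back to an assumed default block size as the upper bound. All names here (`ThreadRange`, `inferThreadRange`, `AssumedDefaultBlockSize`) are hypothetical and not part of OMPIRBuilder; the value 256 and the lower bound of 1 are assumptions made for the example only.

```cpp
#include <optional>

// Hypothetical type: the range of threads a kernel may be launched with.
struct ThreadRange {
  unsigned Min;
  unsigned Max;
};

// Illustrative value only; real defaults are target-specific.
constexpr unsigned AssumedDefaultBlockSize = 256;

// Pick the upper bound: the user's static limit if present, else the default.
// The lower bound of 1 is an assumption made for this sketch.
ThreadRange inferThreadRange(std::optional<unsigned> UserLimit) {
  unsigned Upper = UserLimit.value_or(AssumedDefaultBlockSize);
  return {1, Upper};
}
```

In this model only `Max` would feed a vendor-specific attribute; the point is that the default is treated as a bound, not as the exact launch size.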
Inline comment on llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp, line 4133:
Can't we just get the module from the Function and check the triple?
It's a range. What really matters is the upper bound, which is the value you see in the test (max flat ...).
This patch produces the following difference in IR out of CodeGen.
Without this patch:
%nvptx_num_threads.i = tail call i32 @__kmpc_get_hardware_num_threads_in_block() #2
call void @__kmpc_distribute_static_init_4(ptr addrspacecast (ptr addrspace(1) @2 to ptr), i32 %1, i32 91, ptr nonnull %.omp.is_last.ascast.i, ptr nonnull %.omp.comb.lb.ascast.i, ptr nonnull %.omp.comb.ub.ascast.i, ptr nonnull %.omp.stride.ascast.i, i32 1, i32 %nvptx_num_threads.i) #2
With this patch:
call void @__kmpc_distribute_static_init_4(ptr addrspacecast (ptr addrspace(1) @2 to ptr), i32 %1, i32 91, ptr nonnull %.omp.is_last.ascast.i, ptr nonnull %.omp.comb.lb.ascast.i, ptr nonnull %.omp.comb.ub.ascast.i, ptr nonnull %.omp.stride.ascast.i, i32 1, i32 256) #2
Setting the blocksize to a constant too early would be a problem if the runtime changes the blocksize, e.g. because of an environment variable or because of a low trip count (D152014). Comments? @jdoerfert
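To make the concern concrete, here is a deliberately simplified model, not the real device runtime: each launched thread handles iterations `tid`, `tid + stride`, and so on, and `stride` stands in for the block size seen by the generated code. The function name `coveredIterations` and this scheduling scheme are assumptions made for the sketch. It counts how many iterations get executed when the compiled-in value and the launched thread count agree or disagree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (hypothetical): `launched` threads each walk the iteration space
// starting at their thread id and stepping by `stride`.
// Returns the number of distinct iterations that were executed.
std::size_t coveredIterations(std::size_t tripCount, unsigned launched,
                              unsigned stride) {
  assert(stride > 0 && "stride must be positive");
  std::vector<bool> seen(tripCount, false);
  for (unsigned tid = 0; tid < launched; ++tid)
    for (std::size_t i = tid; i < tripCount; i += stride)
      seen[i] = true;
  std::size_t count = 0;
  for (bool executed : seen)
    count += executed ? 1 : 0;
  return count;
}
```

In this model, a stride folded to 256 covers all 1024 iterations when 256 threads are launched, but only half of them if the runtime launches 128 threads instead. A stride read at run time (128) covers everything again.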
From OpenMP-Opt:
case OMPRTL___kmpc_get_hardware_num_threads_in_block:
  Changed = Changed | foldKernelFnAttribute(A, "omp_target_thread_limit");
  break;
This is wrong. We should fold the thread limit, not num_threads_in_block.
The latter can only be folded if we will not lower it, which we currently cannot guarantee.