D113602 broke the custom state machine when a reduction is present, as
revealed by the test case in this patch. Somehow, in that case,
openmp-opts changes the return value to undef in
`__kmpc_get_warp_size` (which the custom state machine calls as of
D113602). Later optimizations then optimize away the custom state
machine code as if all threads are outside the thread block, so the
target region does not execute.
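To illustrate the failure mode, here is a minimal sketch of a worker filter of the kind described above. It is a simplified model, not the code OpenMPOpt emits: the function names and the exact comparison (thread ID against hardware block size minus one warp) are assumptions made for illustration. The point is only that once the warp size is undef, the optimizer may assume any value for it, including one for which no thread passes the check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the worker filter: a thread takes part in the state
// machine only if its ID lies below the hardware block size minus one warp.
// The name and the comparison are assumptions, not taken from OpenMPOpt.
bool threadIsActiveInBlock(int32_t ThreadId, int32_t BlockHwSize,
                           int32_t WarpSize) {
  return ThreadId < BlockHwSize - WarpSize;
}

// Counts how many threads of one block pass the filter for a given warp size.
int32_t countActiveThreads(int32_t BlockHwSize, int32_t WarpSize) {
  assert(BlockHwSize >= 0 && WarpSize >= 0 && "sizes must be non-negative");
  int32_t Active = 0;
  for (int32_t Tid = 0; Tid < BlockHwSize; ++Tid)
    if (threadIsActiveInBlock(Tid, BlockHwSize, WarpSize))
      ++Active;
  return Active;
}
```

In this model, a block of 128 threads with a warp size of 32 leaves 96 threads active, while any assumed warp size of 128 or more leaves none, which matches the observed symptom of the target region not executing.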
Other runtime functions do not seem to have this problem, so I looked
for differences. I found that adding `registerFoldRuntimeCall` and
`foldKernelFnAttribute` calls for `__kmpc_get_warp_size` to OpenMPOpt
fixed the problem, so that's what this patch does. I do not yet
understand much of OpenMPOpt, and I am not confident in this solution,
so please advise.
This patch also adds a `__OMP_RTL_ATTRS` entry for
`__kmpc_get_warp_size` to OMPKinds.def, which D113602 missed. This
change does not seem to have any impact on the reduction problem.