D113602 broke the custom state machine when a reduction is present, as
revealed by the reproducer this patch adds to the test suite. In that
case, openmp-opts changes the return value to undef in
__kmpc_get_warp_size (which the custom state machine calls as of
D113602). Later optimizations then remove the custom state machine
code as if all threads were outside the thread block, so the target
region does not execute. D114802 fixed that problem but did not add a
reproducer.

This patch also adds a __OMP_RTL_ATTRS entry for
__kmpc_get_warp_size to OMPKinds.def, which D113602 missed. This
change does not seem to have any impact on the reduction problem.