When the input to the intrinsic is a uniform value, the reduced value
is the same as the input. If the input is divergent, we need to
iterate over all the active lanes to perform the reduction.
The control flow for a loop has been set up that iterates
over only the active lanes to perform the reduction.
Introduced the WAVE_REDUCE_UMIN_PSEUDO_U32 and
WAVE_REDUCE_UMAX_PSEUDO_U32 pseudos, which
are lowered post-ISel (in EmitInstrWithCustomInserter).
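
For reviewers who want the intended semantics spelled out, here is a rough host-side model of the divergent case. It is only a sketch, not the code emitted by the lowering: the wave size of 32, the `exec` bitmask parameter, and the names `kWaveSize`, `findFirstActiveLane`, `waveReduceUMin` and `waveReduceUMax` are all assumptions made for illustration. The loop visits only the lanes whose bit is set in the mask, folding each lane's value into an accumulator.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>

// Assumed wave size for this model (hypothetical constant, wave32).
constexpr unsigned kWaveSize = 32;

// Returns the index of the lowest set bit. The mask must be non-zero.
inline unsigned findFirstActiveLane(uint32_t mask) {
  unsigned lane = 0;
  while (((mask >> lane) & 1u) == 0)
    ++lane;
  return lane;
}

// Unsigned-min reduction over the lanes enabled in `exec`.
uint32_t waveReduceUMin(const std::array<uint32_t, kWaveSize> &values,
                        uint32_t exec) {
  uint32_t acc = std::numeric_limits<uint32_t>::max(); // identity for umin
  uint32_t remaining = exec;
  while (remaining != 0) {
    unsigned lane = findFirstActiveLane(remaining);
    assert(lane < kWaveSize);
    acc = std::min(acc, values[lane]);
    remaining &= remaining - 1; // clear the lane that was just processed
  }
  return acc;
}

// Unsigned-max reduction over the lanes enabled in `exec`.
uint32_t waveReduceUMax(const std::array<uint32_t, kWaveSize> &values,
                        uint32_t exec) {
  uint32_t acc = 0; // identity for umax
  uint32_t remaining = exec;
  while (remaining != 0) {
    unsigned lane = findFirstActiveLane(remaining);
    assert(lane < kWaveSize);
    acc = std::max(acc, values[lane]);
    remaining &= remaining - 1; // clear the lane that was just processed
  }
  return acc;
}
```

In this model the loop runs once per active lane, so inactive lanes never contribute. When every active lane holds the same value the result equals that value, which is why the uniform case can skip the loop entirely.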
Should have a mangled type. I also think it should have an immarg operand for the preferred lowering strategy to use. Also, wave_reduce?
Also, umin would be a better choice for the first one, given that we want it for dynamic alloca handling.