This revision introduces a pattern that lowers the gpu.subgroup_reduce op to the nvvm.redux_sync op. The op must be executed by the entire subgroup; otherwise the behaviour is undefined.
It also adds a flag and a populate function so that the lowering can be enabled only when desired, because the op is not available on every GPU (it requires sm80+).
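The opt-in behaviour described above could look roughly like the sketch below. It is self-contained and does not use the real MLIR API: `LoweringOptions`, `hasRedux`, `supportsRedux`, `collectPatterns`, and the pattern names are all hypothetical stand-ins chosen for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical option struct; the real flag name in the revision may differ.
struct LoweringOptions {
  bool hasRedux = false; // opt-in: the target is known to support redux
};

// The redux instruction needs compute capability 8.0 or newer (sm80+).
bool supportsRedux(int smMajor) { return smMajor >= 8; }

// Hypothetical stand-in for a populate function: the redux-based pattern
// is only registered when the caller asked for it.
std::vector<std::string> collectPatterns(const LoweringOptions &options) {
  std::vector<std::string> patterns = {"generic-gpu-to-nvvm"};
  if (options.hasRedux)
    patterns.push_back("subgroup-reduce-to-redux");
  return patterns;
}
```

With this shape, a caller targeting sm80+ would set `hasRedux` (for example from `supportsRedux(8)`), while everyone else keeps the default pattern set unchanged.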
Depends on D142088
Instead of hard-crashing, this should probably return an optional and check it in the pattern.
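A minimal sketch of that suggestion, assuming the crash comes from a helper that maps reduction modes to redux kinds. The enums and the functions `convertReduxKind` and `matchAndRewriteSketch` are hypothetical stand-ins, not the dialect's actual types:

```cpp
#include <optional>

// Hypothetical stand-ins for the GPU dialect's reduction modes and the
// NVVM redux kinds; the real enums live in the respective dialects.
enum class ReduceMode { Add, And, Max, Min, Or, Xor, Mul };
enum class ReduxKind { Add, And, Max, Min, Or, Xor };

// Returns std::nullopt for modes without a redux counterpart instead of
// aborting, so the caller decides how to handle the unsupported case.
std::optional<ReduxKind> convertReduxKind(ReduceMode mode) {
  switch (mode) {
  case ReduceMode::Add: return ReduxKind::Add;
  case ReduceMode::And: return ReduxKind::And;
  case ReduceMode::Max: return ReduxKind::Max;
  case ReduceMode::Min: return ReduxKind::Min;
  case ReduceMode::Or:  return ReduxKind::Or;
  case ReduceMode::Xor: return ReduxKind::Xor;
  case ReduceMode::Mul: return std::nullopt; // no redux equivalent
  }
  return std::nullopt;
}

// Pattern-side check: signal a match failure rather than crashing.
bool matchAndRewriteSketch(ReduceMode mode) {
  std::optional<ReduxKind> kind = convertReduxKind(mode);
  if (!kind)
    return false; // in a real pattern this would be a match failure
  // ... the op would be rewritten using *kind here ...
  return true;
}
```

Returning a failure from the pattern leaves the op in place, so another lowering (or a later diagnostic) can still handle it.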