The implementation follows the pattern already established for lowering other, similar intrinsics.
In addition to the lowering code, the DoTotalReduction template had to be fixed so that it breaks out of its loop when the accumulator function signals it to stop.
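To make the early-exit behavior concrete, here is a minimal, self-contained sketch of a reduction driver that stops when its accumulator asks it to. It is not the actual DoTotalReduction code: `TotalReduce`, `FirstMatchAccumulator`, the `Accumulate` signature, and the convention that returning `false` means "stop" are all hypothetical names and assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical accumulator: remembers the first index whose value equals
// `target`. Assumption for this sketch: returning false means "stop".
struct FirstMatchAccumulator {
  explicit FirstMatchAccumulator(int t) : target{t} {}

  bool Accumulate(int value, std::size_t index) {
    ++visited;
    if (value == target) {
      found = static_cast<std::ptrdiff_t>(index);
      return false; // signal the driver that no more elements are needed
    }
    return true; // keep going
  }

  int target;
  std::ptrdiff_t found{-1}; // -1 means no match was seen
  std::size_t visited{0};   // number of elements examined
};

// Hypothetical whole-array reduction driver (stand-in for the real template).
template <typename ACCUMULATOR>
void TotalReduce(const std::vector<int> &data, ACCUMULATOR &acc) {
  for (std::size_t i{0}; i < data.size(); ++i) {
    if (!acc.Accumulate(data[i], i)) {
      break; // honor the accumulator's stop signal
    }
  }
}
```

With the input `{3, 7, 5, 7}` and target `7`, this sketch records index 1 and examines only two elements. If the driver ignored the return value, this particular accumulator would keep scanning and end up reporting the last match instead of the first.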
Is this change needed for findloc?