Use a modeling similar to SCF ParallelOp to support arbitrary parallel
reductions. The two main differences are: (1) reductions are named and declared
beforehand similarly to functions using a special op that provides the neutral
element, the reduction code and optionally the atomic reduction code; (2)
reductions go through memory instead because this is closer to the OpenMP
semantics.
See https://llvm.discourse.group/t/rfc-openmp-reduction-support/3367.
We discussed this in the RFC, but will this dependency cause issues for any future non-llvm or out of tree lowerings?