Use the recently introduced OpenMPIRBuilder facility to transate OpenMP
workshare loops with reductions to LLVM IR calling OpenMP runtime. Most of the
heavy lifting is done at the OpenMPIRBuilder. When other OpenMP dialect
constructs grow support for reductions, the translation can be updated to
operate on, e.g., an operation interface for all reduction containers instead
of workshare loops specifically. Designing such a generic translation for the
single operation that currently supports reductions is premature since we don't
know how the reduction modeling itself will be generalized.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
185 | I'd add some text here about why the (to me unexpected) default behavior of forgetMapping that isn't recursive, is desired. | |
325 | Nit, doesn't => don't | |
mlir/test/Target/LLVMIR/openmp-reduction.mlir | ||
401 | Presumably handled elsewhere, but I would like to see a custom reducer, or alternatively ensure an XFAIL for a reduction without an equivalent atomic operation. For example if the reduction body were something like the following which picks the value which has the least value when squared. The actual reduction doesn't matter so long as it's something for which there isn't an atomic op (I know there's an atomic min, but believe that this case can't handled by it). ^bb1(%arg0: f32, %arg1: f32): %arg0Sq = llvm.fmul %arg0, %arg0 %arg1Sq = llvm.fmul %arg1, %arg1 %cmp = llvm.fcmp olt %arg0Sq, %arg1Sq %1 = llvm.select %cmp %arg0, %arg1 : f32 omp.yield (%1 : f32) } Also note that I'm not suggesting this be handled presently, just want to double check that this doesn't accidentally and incorrectly succeed if not yet fully implemented. |
Address review.
mlir/test/Target/LLVMIR/openmp-reduction.mlir | ||
---|---|---|
401 | OpenMP can handle non-atomic reductions fine, atomic is an optimization. The reduction declaration has an optional atomic region that indicates how to combine two values atomically. At no point the translation will try to turn something atomic or non-atomic, it merely passes on whatever is present in the declaration. If the atomic region is missing, OpenMPIRBuilder::createReductions will just emit IR that instructs the runtime to never go on the atomic path. There is an additional safeguard of having unreachable in the branch that would have been taken had the runtime decided to go on the atomic pass. This test checks exactly this, 5 lines above. |
Thanks, @ftynse for this patch. I have started going through this patch. Have a few questions.
mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h | ||
---|---|---|
176 | I don't see any changes in convertBlock to use convertOp. | |
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
90 | Is this the correct successor block? Or will it be corrected later on? | |
111–119 | Should this be part of the verifier or a verification pass? | |
186 | Is that true for nested OpenMP regions? | |
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp | ||
710 | Is making the LLVM IR ModuleTranslation aware of omp::ReductionDeclareOp the only way possible? | |
mlir/test/Target/LLVMIR/openmp-reduction.mlir | ||
67 | Nit spelling: reduciton -> reduction. Here and in a few other places as well. |
Address review.
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
90 | It's not not a successor block but the block before which the created block is inserted in the IR (https://llvm.org/doxygen/classllvm_1_1BasicBlock.html#ace940beeee97c222f836fe0ac70f6cf5), there's no branch being created. | |
111–119 | It is verified in reduction regions where this is actually necessary, but I prefer to assert liberally here in case somebody later decides to use omp.yield differently from current uses | |
186 | Okay, since you both think recursion might be necessary (there are no cases currently), I made forgetMapping traverse nested regions. | |
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp | ||
710 | I'm considering to drop this check entirely, but so far this looks like the simplest change. |
On a second thought, we can recycle the operation conversion flexibility to
support top-level operations that are not functions or globals. Functions and
globals need explicit support because of potential definition/reference cycles
in them that requires function signatures to be translated before globals and
separately from function bodies.
Looks OK. I have one question remaining to clear my understanding.
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
616 | Nit: Available as ompBuilder from 582. | |
618 | Is this related to the nowait of the worksharing loop? | |
669–670 | The OpenMPIRBuilder already takes in OwningReductionGen and OwningAtomicReductionGen which inlines the reduction body based on whether it is non-atomic or atomic. Why is a separate non-atomic reduction required? |
Adress more review.
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
669–670 | Because we also need to generate the body of the reduction in the loop: #pragma omp parallel for reduction(+:x) for (...) { x += 42; //< this needs to be in the IR executed by the loop, //< and we just `omp.reduce 42` in MLIR. } |
I don't see any changes in convertBlock to use convertOp.