Page MenuHomePhabricator

[mlir] support translating OpenMP loops with reductions
ClosedPublic

Authored by ftynse on Aug 3 2021, 5:10 AM.

Details

Summary

Use the recently introduced OpenMPIRBuilder facility to transate OpenMP
workshare loops with reductions to LLVM IR calling OpenMP runtime. Most of the
heavy lifting is done at the OpenMPIRBuilder. When other OpenMP dialect
constructs grow support for reductions, the translation can be updated to
operate on, e.g., an operation interface for all reduction containers instead
of workshare loops specifically. Designing such a generic translation for the
single operation that currently supports reductions is premature since we don't
know how the reduction modeling itself will be generalized.

Diff Detail

Event Timeline

ftynse created this revision.Aug 3 2021, 5:10 AM
ftynse requested review of this revision.Aug 3 2021, 5:10 AM
wsmoses added inline comments.Aug 3 2021, 1:28 PM
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
185

I'd add some text here about why the (to me unexpected) default behavior of forgetMapping that isn't recursive, is desired.

324

Nit, doesn't => don't

mlir/test/Target/LLVMIR/openmp-reduction.mlir
401

Presumably handled elsewhere, but I would like to see a custom reducer, or alternatively ensure an XFAIL for a reduction without an equivalent atomic operation. For example if the reduction body were something like the following which picks the value which has the least value when squared. The actual reduction doesn't matter so long as it's something for which there isn't an atomic op (I know there's an atomic min, but believe that this case can't handled by it).

^bb1(%arg0: f32, %arg1: f32):
  %arg0Sq = llvm.fmul %arg0, %arg0
  %arg1Sq = llvm.fmul %arg1, %arg1
  %cmp = llvm.fcmp olt %arg0Sq, %arg1Sq
  %1 = llvm.select %cmp %arg0, %arg1 : f32
  omp.yield (%1 : f32)
}

Also note that I'm not suggesting this be handled presently, just want to double check that this doesn't accidentally and incorrectly succeed if not yet fully implemented.

ftynse updated this revision to Diff 364421.Aug 5 2021, 4:49 AM
ftynse marked 3 inline comments as done.

Address review.

mlir/test/Target/LLVMIR/openmp-reduction.mlir
401

OpenMP can handle non-atomic reductions fine, atomic is an optimization. The reduction declaration has an optional atomic region that indicates how to combine two values atomically. At no point the translation will try to turn something atomic or non-atomic, it merely passes on whatever is present in the declaration. If the atomic region is missing, OpenMPIRBuilder::createReductions will just emit IR that instructs the runtime to never go on the atomic path. There is an additional safeguard of having unreachable in the branch that would have been taken had the runtime decided to go on the atomic pass. This test checks exactly this, 5 lines above.

Thanks, @ftynse for this patch. I have started going through this patch. Have a few questions.

mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h
184

I don't see any changes in convertBlock to use convertOp.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
90

Is this the correct successor block? Or will it be corrected later on?

111–119

Should this be part of the verifier or a verification pass?

186

Is that true for nested OpenMP regions?

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
714

Is making the LLVM IR ModuleTranslation aware of omp::ReductionDeclareOp the only way possible?

mlir/test/Target/LLVMIR/openmp-reduction.mlir
68

Nit spelling: reduciton -> reduction. Here and in a few other places as well.

ftynse updated this revision to Diff 368345.Aug 24 2021, 7:48 AM
ftynse marked 6 inline comments as done.

Address review.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
90

It's not not a successor block but the block before which the created block is inserted in the IR (https://llvm.org/doxygen/classllvm_1_1BasicBlock.html#ace940beeee97c222f836fe0ac70f6cf5), there's no branch being created.

111–119

It is verified in reduction regions where this is actually necessary, but I prefer to assert liberally here in case somebody later decides to use omp.yield differently from current uses

186

Okay, since you both think recursion might be necessary (there are no cases currently), I made forgetMapping traverse nested regions.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
714

I'm considering to drop this check entirely, but so far this looks like the simplest change.

ftynse updated this revision to Diff 368362.Aug 24 2021, 8:56 AM

On a second thought, we can recycle the operation conversion flexibility to
support top-level operations that are not functions or globals. Functions and
globals need explicit support because of potential definition/reference cycles
in them that requires function signatures to be translated before globals and
separately from function bodies.

ftynse updated this revision to Diff 368368.Aug 24 2021, 9:14 AM

Drop unnecessary library dependencies.

Looks OK. I have one question remaining to clear my understanding.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
681

Nit: Available as ompBuilder from 582.
Also applies for 686.

683

Is this related to the nowait of the worksharing loop?

734–735

The OpenMPIRBuilder already takes in OwningReductionGen and OwningAtomicReductionGen which inlines the reduction body based on whether it is non-atomic or atomic. Why is a separate non-atomic reduction required?

ftynse updated this revision to Diff 370184.Sep 2 2021, 1:16 AM
ftynse marked 3 inline comments as done.

Adress more review.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
734–735

Because we also need to generate the body of the reduction in the loop:

#pragma omp parallel for reduction(+:x)
for (...) {
  x += 42; //< this needs to be in the IR executed by the loop,
           //< and we just `omp.reduce 42` in MLIR.
}
This revision is now accepted and ready to land.Sep 2 2021, 4:04 AM
This revision was automatically updated to reflect the committed changes.