This is an archive of the discontinued LLVM Phabricator instance.

[MLIR,OpenMP] Added support for lowering MasterOp to LLVMIR
ClosedPublic

Authored by SouraVX on Sep 7 2020, 10:35 AM.

Details

Summary

Some Ops in OMP dialect have regions associated with them i.e
ParallelOp MasterOp. Lowering of these regions involves interfacing
with OMPIRBuilder using callbacks, yet there still exist opportunities
for sharing common code in between.

This patch factors out common code into a separate function and adds
support for lowering MasterOp using that. Lowering of ParallelOp is
also modified appropriately.

Diff Detail

Event Timeline

SouraVX created this revision.Sep 7 2020, 10:35 AM
Herald added a project: Restricted Project. · View Herald Transcript
SouraVX requested review of this revision.Sep 7 2020, 10:35 AM

Thanks @SouraVX for working on this. Looks OK. A few comments to address.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
447–448

Should valueMapping and blockMapping be passed? Aren't these available here since convertOmpOpRegions is a member function? Or did you want this as a static function?

492–496

Can this loop also go into convertOmpOpRegions?

505

lint warning?

mlir/test/Target/openmp-llvm.mlir
225–227
  1. Can you add this inside a parallel here or in another test?
  2. Nit: I think other tests here use the pretty print syntax.
kiranktp added inline comments.Sep 7 2020, 6:03 PM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
491

convertOmpMaster and convertOmpParallel function body is 99% same except the casting type (omp::MasterOp/omp::ParallelOp).
Will it be good to have a template function (eg: convertOmpOpWithRegions(..)) and call it with explicit type.
convertOmpOpWithRegions<omp::MasterOp>(opInst, builder);
convertOmpOpWithRegions<omp::ParallelOp>(opInst, builder);
This way we can reduce code duplication.
You can choose to not follow this.

SouraVX marked 2 inline comments as done.Sep 8 2020, 8:33 AM

Thanks @kiranktp ,@kiranchandramohan for reviewing this!
When adding a test case for a MasterOp inside the region of ParallelOp(typical use case.)
Test snippet:

llvm.func @test_omp_master() -> () {
  omp.parallel {
  "omp.master" ()({
    omp.terminator
  }):()->()
    omp.terminator
}
  llvm.return
}

This produces following crash/assertion failure in OMPIRBuilder:

mlir-translate: llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp:717: llvm::IRBuilderBase::InsertPoint llvm::OpenMPIRBuilder::CreateParallel(const llvm::OpenMPIRBuilder::LocationDescription&, llvm::OpenMPIRBuilder::InsertPointTy, llvm::OpenMPIRBuilder::BodyGenCallbackTy, llvm::OpenMPIRBuilder::PrivatizeCallbackTy, llvm::OpenMPIRBuilder::FinalizeCallbackTy, llvm::Value*, llvm::Value*, llvm::omp::ProcBindKind, bool): Assertion `Outputs.empty() && "OpenMP outlining should not produce live-out values!"' failed.

digging in further, I realized this Outputs is getting populated with(below instruction) and finally asserting at Outputs.empty().

%omp_global_thread_num2 = call i32 @__kmpc_global_thread_num(%struct.ident_t* @3), !dbg !9

I'll dig further, BTW if you guys have any inputs WRT this. Please share!

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
447–448

Yes, initially planned for static function. But later realized convertBlock (used here) is private within ModuleTranslation.

And I didn't find enough motivation to expose that as public, if time demands we'll do that.
I'll take care of the NIT's when revising.

491

Thought of this approach too, templatizing this won't help since Op is passed around as generic Operation so it seems to me we can't get away with this.
I even thought pulling out the Callback itself, but that will also incur if/else ladder and so forth.

492–496

It can be, but keeping it separate seems more reasonable.
Since we have to create block with Op Name(this can be done there also but that will incur one more parameter passing and maybe changing the function name itself to convey the intent).

505

Thanks for catching this! Sure, it's only needed for converting ParallelOp.

mlir/test/Target/openmp-llvm.mlir
225–227

Nit: I think other tests here use the pretty print syntax.

MasterOp doesn't has custom assembly registered.

error: custom op 'omp.master' has no custom assembly form

Can you add this inside a parallel here or in another test?

Thanks for this, I discovered one crash.
Details follows:

This produces following crash/assertion failure in OMPIRBuilder:

mlir-translate: llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp:717: llvm::IRBuilderBase::InsertPoint llvm::OpenMPIRBuilder::CreateParallel(const llvm::OpenMPIRBuilder::LocationDescription&, llvm::OpenMPIRBuilder::InsertPointTy, llvm::OpenMPIRBuilder::BodyGenCallbackTy, llvm::OpenMPIRBuilder::PrivatizeCallbackTy, llvm::OpenMPIRBuilder::FinalizeCallbackTy, llvm::Value*, llvm::Value*, llvm::omp::ProcBindKind, bool): Assertion `Outputs.empty() && "OpenMP outlining should not produce live-out values!"' failed.

I don't have any immediate inputs regarding this crash. Maybe @jdoerfert or @fghanim.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
492–496

No strong opinion here. Let us see whether anyone else has an opinion here before making any change.
But what is the change in name of the function you are hinting here? Creating the llvm blocks is part of the conversion right?

mlir/test/Target/openmp-llvm.mlir
225–227

Can you add an assembly format to print/parse this operation in custom syntax as part of this patch? I think it will look a bit odd if this operation only has the generic syntax.
From the latest mlir news discourse post I see that they have added support for region parsing also in assembly format. master looks like a simple op to try this out.
https://github.com/llvm/llvm-project/commit/eaeadce9bd11

I don't know enough about the OMPIR for me to be helpful, however, I noticed a couple of things that I wanted to ask about for better understanding. :)

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
483

I don't see where you set the location where alloca should be codegened. how do you handle variable declarations inside the omp master scope? when do you codegen the relevant alloca?

505

All llvm passes expect Allocas to be in the entry block of the function. In this case, the soon-to-be-outlined region. is builder's current insertion point in the entry block? Also, is it guaranteed to not be empty?

@kiranchandramohan This error happens when there are things within the OMP region still alive/are used outside the region, I cannot comment on what causes this here. Usually, this suggests that "something" that should be contained within the parallel region is not being detected as such, and so when the region is outlined, this User remains as part of the original function.

The first thing that comes to mind, make sure continuationBB post dominate everything within OMP parallel

jdoerfert added inline comments.Sep 10 2020, 8:16 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
489

Why do you set the insertion point to something other than codeGenIP?

505

Where is allocaIP used?

jdoerfert added inline comments.Sep 10 2020, 8:40 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
489

Do not insert code or modify anything else. Basically you get this:

block:
  ... 
  %...      <- codeGenIP
  ...
  terminator

And if you want to insert control flow, split it:

block.pre:
  ... 
  %...      <- codeGenIP
  br %block.post

block.post:
  ...
  terminator

No you can remove the branch that was introduced and create instructions and control flow that converge at block.post.

ftynse added inline comments.Sep 10 2020, 9:08 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
460

It's a good time to do this.

503

Nit: a more comprehensible variable name is welcome.

mlir/test/Target/openmp-llvm.mlir
225–227

Have created a trivial patch for adding the assembly format.
https://reviews.llvm.org/D87549

SouraVX updated this revision to Diff 299326.EditedOct 20 2020, 3:55 AM
SouraVX marked 2 inline comments as done.

At https://github.com/llvm/llvm-project/blob/master/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp#L918 @jdoerfert could you please explain the need for passing empty AllocaIP ? I meant this will be empty? while interfacing MasterOp from flang/mlir. Why is that.

That looks to me like a mistake which needs to be fixed. And, by glancing at it, it seems we now need to do what I feared in D82470, we need to add the AllocaIP to a lot more APIs so it can be passed down properly. Maybe there is no way around it.

It reuses BodyGenCallbackTy which has an AllocaIP argument as part of the type. But since this is an inlined region it doesn't benefit from it. Why? because inlined-regions should be using the AllocaIP of the innermost outlined region they belong to, for any allocations that happen inside them. Hence, you don't need to update the AllocaIP for inlined regions and so it is left empty. I wrote this with a design like clang's in mind, where an AllocaIP is always maintained in the frontend, and BodyCB is a place for the frontend to generate whatever non-OMP code it wants according to its own rules.

SouraVX updated this revision to Diff 306372.Nov 19 2020, 4:46 AM

Rebase + Ping!

I guess the only remaining point to address is to handle return the pass/fail status to MLIR module translation.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
483

AllocaIP issues will be handled in a separate patch.
https://reviews.llvm.org/D88706

489

We do not change the codeGenIP now. This was fixed in the following review.
https://reviews.llvm.org/D88720

505

AllocaIP issues will be handled in a separate patch.
https://reviews.llvm.org/D88706

mlir/test/Target/openmp-llvm.mlir
272–292

Can you nest the omp operations with appropriate formatting?

fghanim added inline comments.Nov 20 2020, 9:29 PM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
483

Sure. If I may suggest things to consider:
1- Privatization clauses: The OMPBuilder will never handle privatization or provide creators for it - with minor exceptions- and so this is something you guys will have to handle on the MLIR side. and it is something you should consider because you will need to allocate local variables inside the entry block of an outlined region based on how Fortran does it. Again, the OMPBuilder will never do any of that.
2- Variables declared inside a scope: I don't know if fortran has the concept of scopes like C/C++, but it is also something to keep in mind - although this is likely to be handled by FIR.

SouraVX updated this revision to Diff 307413.Nov 24 2020, 11:05 AM

Addressed comments related to formatting test case + Rebase.

SouraVX added inline comments.Nov 24 2020, 11:05 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
466

This is getting trickier than what we initially thought. Even if we return LogicalResult back to caller(which is callback bodyGenCB). Propagating same error upward in hierarchy may not be possible(or degenerate the interface) because this bodyGenCB will be called very late inside OMPIRBuilder using createParallel.
Now createParallel is primary interfacing which returns InsertionPoint.

SouraVX added inline comments.Nov 26 2020, 9:19 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
466

@ftynse are you Okay with this ? One other approach we have is something of sort:

bool ErrorStatus = false/*Initial state*/;
bodyGenCB = [&] (..){
...
if (failed(convertBlock(*bb, /*ignoreArguments=*/indexedBB.index() == 0)))
      ErrorStatus = true;
...
};

builder.restoreIP(ompBuilder->createMaster(builder, bodyGenCB, finiCB));

if (ErrorStatus)
    return failure();

return success();

};
SouraVX updated this revision to Diff 308652.Dec 1 2020, 7:32 AM

Added error propagation support. @ftynse I was waiting for your response, however I noticed same approach in D92055. So I've revised this.

SouraVX marked 2 inline comments as done.Dec 1 2020, 7:34 AM
ftynse accepted this revision.Dec 4 2020, 5:29 AM
ftynse added inline comments.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
450

bodyGenStatus should be a reference here

466

LGTM modulo the fix above.

That being said, I expect all this callback and delayed execution to bite us back one day.

500

Not for this diff, but I am not a fan of this stack approach. The parent op of the region can handle conversion of the terminators of all the blocks instead. This will avoid the need to communicate through this explicit stack between functions (it will be instead on the actual call stack), and better corresponds to how MLIR expects one to handle regions -- the parent op knows more about terminator semantics than terminators themselves.

This revision is now accepted and ready to land.Dec 4 2020, 5:29 AM
SouraVX marked 2 inline comments as done.Dec 6 2020, 8:55 PM
SouraVX added inline comments.Dec 7 2020, 7:17 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
500

Thanks for the pointers, me and @kiranchandramohan will consider this and how to use this.