This is an archive of the discontinued LLVM Phabricator instance.

[OpenMPIRBuilder] Implement CreateCanonicalLoop.
ClosedPublic

Authored by Meinersbur on Nov 5 2020, 12:48 AM.

Details

Summary

CreateCanonicalLoop generates a standardized control flow structure for OpenMP canonical for loops. The structure can be consumed by loop-associated directives such as worksharing-loop, distribute, simd etc. as well as loop transformations such as tile and unroll.

This is a first design without considering all complexities yet. The control-flow emits more basic block than strictly necessary, but these will be optimized by CFGSimplify anyway, provide a nice separation of concerns and might later be useful with more complex scenarios. I successfully implemented a basic tile construct using this API, which is not part of this patch.

The fundamental building block is the CreateCanonicalLoop that only takes the loop trip count and operates on the logical iteration spaces only. An overloaded CreateCanonicalLoop for using LB, UB, Increment is provided as well, but at least for C++, Clang will need to implement a loop counter to logical induction variable mapping anyway, since iterator overload resolution cannot be done in LLVMFrontend.

As there currently is no user for CreateCanonicalLoop, it is only called from unittests. Similarly, CanonicalLoopInfo::eraseFromParent() is used in my file implementation and might be generally useful for implementing loop-associated constructs, but is not used in this patch itself.

The following non-exhaustive list describes not yet covered items:

  • collapse clause (including non-rectangular and non-perfectly nested); idea is to provide a OpenMPIRBuilder::collapseLoopNest method consuming multiple nested loops and returning a new CanonicalLoopInfo that can be used for loop-associated directives.
  • simarly: ordered clause for DOACROSS loops
  • branch weights
  • Cancellation point (?)
  • AllocaIP
  • break statement (if needed at all)
  • Exceptions (if not completely handled in the front-end)
    • Using it in Clang; this requires implementing at least one loop-associated construct.
  • ...

Diff Detail

Event Timeline

Meinersbur created this revision.Nov 5 2020, 12:48 AM
Meinersbur requested review of this revision.Nov 5 2020, 12:48 AM
Meinersbur edited the summary of this revision. (Show Details)Nov 5 2020, 8:04 AM

Some minor comments, overall this looks reasonable. I'll let someone else take a look.

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
702

front = getTerminator, right?
Can we add a few more assertions here and elsewhere, just to make sure the structure is as we expect it.

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
887

I usually prefer the assert(CL->isValid() && "...") style but no objection at the end of the day. (the reason is the other one can be used to dump more of the context easily, e.g., if (!isValid()) dump_stuff();.

903

Nit : && "Loop counter type mismatch" or similar.

958

Nit: msg.

978

I thought we have an API to delete blocks without the need for this. Maybe try to use that instead?

Thanks for the Patch. Generally looks Good.
Just a couple of very minor comments/questions

  • My understanding of the usage case for this, is that when we get CreateDistribute, CreateSimd, CreateLoop, etc., they will use this to generate the loop/s required, but this is not meant to be used directly by clang or flang? is that correct? If yes, why do we have them as public methods, wouldn't it be better to make them private?
  • Also, a question; I think when we have something like:
#pragma omp for
for ( ... ; ... ; ...) {
 //body
}

this is going to be encoded in the clang AST as the omp for statement, whose body the ForStatement node. Now the first will end up using the CreateForDirective() that we plan to add in the OMPBuilder, while the latter will basically have clang generate the loop for us ( i.e. using`
emitForStatement() ). is this correct? If yes, wouldn't that mean that there would be some redundancy, or is this for a different case?

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1393

did you mean to say Cond->getTerminator()?

Meinersbur marked 5 inline comments as done.Nov 6 2020, 10:22 PM

Thanks for the Patch. Generally looks Good.
Just a couple of very minor comments/questions

  • My understanding of the usage case for this, is that when we get CreateDistribute, CreateSimd, CreateLoop, etc., they will use this to generate the loop/s required, but this is not meant to be used directly by clang or flang? is that correct? If yes, why do we have them as public methods, wouldn't it be better to make them private?

OpenMP 5.1 includes the tile and unroll directive that allows chaining loop-associated constructs as controlled by the front-end. I.e.

#pragma omp for
#pragma omp unroll partial
for (int i = 0; i < N; i+=1)

the front-end will do:

auto LiteralLoop = CreateCanonicalLoop(ForStmt)
auto UnrolledLoop = CreateUnrollDirective(LiteralLoop );
CreateWorksharingDirective(UnrolledLoop);

Since LLVMFrontend cannot implement all combinations in which tiling/unrolling/other loop-associated can be combined, the clang/flang must have access to CreateCanonicalLoop to represent the composition in the right order.

The concept is more powerful, e.g. for combined directives:

auto LiteralLoop = CreateCanonicalLoop(ForStmt)
auto VectorizedLoop = CreateSimdDirective(LiteralLoop);
CreateWorksharingDirective(VectorizedLoop);

to implement #pragma omp for simd without OMPIRBuilderr necessarily implementing every valid combination of directives.

  • Also, a question; I think when we have something like:
#pragma omp for
for ( ... ; ... ; ...) {
 //body
}

this is going to be encoded in the clang AST as the omp for statement, whose body the ForStatement node. Now the first will end up using the CreateForDirective() that we plan to add in the OMPBuilder, while the latter will basically have clang generate the loop for us ( i.e. using`
emitForStatement() ). is this correct? If yes, wouldn't that mean that there would be some redundancy, or is this for a different case?

Unfortunately CodeGenFunction::EmitForStmt does not generate a predictable control-flow, or a way to get the loop's trip count that we need to determine the logical iterations. I.e. we have to implement that ourselves. However, my experimental clang code started its live as a copy&paste of CodeGenFunction::EmitForStmt and I try to keep them similar.

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
702

front is the first instruction in the block, this is the compare instruction for i < TripCount. I.e. the second operator is the trip count.

This is already asserted in assertOK, but I can add one here as well.

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
887

This follows the pattern AssertOK used by LLVM's instructions. I usually prefer this pattern because the crash message includes what is wrong, not just that it is invalid. Dumping stuff also assumes that the data structure is valid.

1393

Yep.

Meinersbur marked 2 inline comments as done.
Meinersbur edited the summary of this revision. (Show Details)

Address review comments

  • re-add CreateCopyPrivate

Thanks for the response, I think I am beginning to understand where this fits.
Just one Nit, O/W LGTM

I am going to wait for @jdoerfert to finish his review, and I'll accept the patch if he doesn't

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
625–639

Thank you for adding this!

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
1402

Nit: must have two successors

jdoerfert accepted this revision.Nov 9 2020, 7:47 AM

LGTM. We will make progress based on this as we go. Thanks :)

This revision is now accepted and ready to land.Nov 9 2020, 7:47 AM
This revision was landed with ongoing or failed builds.Nov 9 2020, 1:03 PM
This revision was automatically updated to reflect the committed changes.
Meinersbur marked an inline comment as done.