This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Implement '#pragma omp tile'
ClosedPublic

Authored by Meinersbur on Mar 17 2020, 8:33 PM.

Details

Summary

The tile directive is in OpenMP's Technical Report 8 and foreseeably will be part of the upcoming OpenMP 5.1 standard.

This implementation is based on an AST transformation providing a de-sugared loop nest. This makes it simple to forward the de-sugared transformation to loop associated directives taking the tiled loops. In contrast to other loop associated directives, the OMPTileDirective does not use CapturedStmts. Letting loop associated directives consume loops from different capture context would be difficult.

A significant amount of code generation logic is taking place in the Sema class. Eventually, I would prefer if these would move into the CodeGen component such that we could make use of the OpenMPIRBuilder, together with flang. Only expressions converting between the language's iteration variable and the logical iteration space need to take place in the semantic analyzer: Getting the of iterations (e.g. the overload resolution of std::distance) and converting the logical iteration number to the iteration variable (e.g. overload resolution of iteration + .omp.iv). In clang, only CXXForRangeStmt is also represented by its de-sugared components. However, OpenMP loop are not defined as syntatic sugar. Starting with an AST-based approach allows us to gradually move generated AST statements into CodeGen, instead all at once.

I would also like to refactor checkOpenMPLoop into its functionalities in a follow-up. In this patch it is used twice. Once for checking proper nesting and emitting diagnostics, and additionally for deriving the logical iteration space per-loop (instead of for the loop nest).

Diff Detail

Event Timeline

Meinersbur created this revision.Mar 17 2020, 8:33 PM
Meinersbur edited the summary of this revision. (Show Details)
  • Rebase
  • Remove unrelated change
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2020, 11:14 PM
  • Rebase after clang-format of cindex
JonChesterfield added a comment.EditedMar 25 2020, 9:53 AM

edit: actually you've already done the clang-format on trunk as I hoped, phabricator mislead me. Apologies for the noise

edit: actually you've already done the clang-format on trunk as I hoped, phabricator mislead me. Apologies for the noise

The previous diff was against a branch with only the whitespace changes to get rid of the noise in the diff. It seems that the pre-merge check instead compared formatting to the master (probably Arcanist/Phabricator looked for the common ancestor of origin/master and my git branch). This let to weird formatting warnings that weren't even changes highlighted in this diff. They are gone now with clang-formatted origin/master. Fotunately.

jdenny added a subscriber: jdenny.Mar 31 2020, 12:49 PM
ABataev added inline comments.Apr 14 2020, 1:42 PM
clang/include/clang/AST/StmtOpenMP.h
4751–4754

Not sure that this is a good idea to treat this directive as the executable directive. To me, it looks like kind of AttributedStmt. Maybe better to introduce some kind of a new base node for this and similar constructs, which does not own the loop but is its kind of attribute-like entity?
Also, can we have something like:

#pragma omp simd
#pragma omp tile ...
for(...) ;

Thoughts?

Meinersbur marked an inline comment as done.Apr 14 2020, 5:23 PM
Meinersbur added inline comments.
clang/include/clang/AST/StmtOpenMP.h
4751–4754

While not executed at runtime, syntactically it is parsed like a executable (loop-associated) directive. IMHO it does 'own' the loop, but produces another one for to be owned(/associated) by a different directive, as in your tile/simd example, which should already work. Allowing this was the motivation to do the transformation on the AST-level for now.

Herald added a project: Restricted Project. · View Herald Transcript
DavidTruby resigned from this revision.May 4 2020, 10:24 AM
ABataev added inline comments.Jun 22 2020, 1:42 PM
clang/include/clang/AST/StmtOpenMP.h
4751–4754

I'm not saying that we should separate parsing of this directive from others, it is just better to treat this directive as a little bit different node. Currently, it introduces too many changes in the base classes. Better to create a new base class, that does not relies on CapturedStmt as the base, and derive OMPExecutableDirective and this directive and other similar (+ maybe, OMPSimdDirective) from this new base class.

Meinersbur marked an inline comment as done.Jun 23 2020, 7:05 PM
Meinersbur added inline comments.
clang/include/clang/AST/StmtOpenMP.h
4751–4754

Unless you tell me otherwise, OMPLoopDirective represents a loop-associated directive. #pragma omp tile is a loop-associated directive. OMPLoopDirective contains all the functionality to parse associated loops, and unfortunately if derived from OMPExecutableDirective.

You seem to ask me to create a new class "OMPDirectiveAssociatedWithLoopButNotExecutable" that duplicates the parsing part of "OMPLoopDirective"? This will either be a lot of duplicated code or result in even more changes to the base classes due to the refactoring.

By the OpenMP specification, simd and tile are executable directives, so structurally I think the class hierarchy as-is makes sense. From the glossary of the upcoming OpenMP 5.1:

An OpenMP directive that appears in an executable context and results in implementation code and/or prescribes the manner in which associated user code must execute.

Avoiding a CapturedStmt when not needed would a modification of clang::getOpenMPCaptureRegions which currently adds a capture of type OMPD_unknown for such directives. This is unrelated to loop-associated directives.

ABataev added inline comments.Jun 24 2020, 5:27 AM
clang/include/clang/AST/StmtOpenMP.h
4751–4754

No, this is not what I'm asking for. I asked you to think about adding a new level of abstraction somewhere between the OMPLoop... and OMPExecutableDirective classes to minimize the functional changes and make the classes more robust. Anyway, give me a week or so to try to find possible better abstractions and if there are no better ones, we'll proceed with your original solution.

Meinersbur marked an inline comment as done.Jun 24 2020, 7:41 AM
Meinersbur added inline comments.
clang/include/clang/AST/StmtOpenMP.h
4751–4754

derive OMPExecutableDirective and this directive and other similar (+ maybe, OMPSimdDirective) from this new base class

sounded like the new class hierarchy should be

Stmt -> NewClass -> OMPExecutableDirective -> OMPLoopDirective -> OMPForDirective
                \
                 -> OMPTileDirective
                \
                 -> OMPSimdDirective

Since OMPSimdDirective seems to be affected as well, this seems more like a refactoring of the existing structure than something specific to #pragma omp tile.

Looking forward to your idea.

ABataev added inline comments.Jun 24 2020, 7:59 AM
clang/include/clang/AST/StmtOpenMP.h
4751–4754

I have a little bit different scheme in my mind, but you got the main idea. Let me check, if it works, or I can find a better design.

ping

@ABataev Have you checked the alternative scheme?

ping

@ABataev Have you checked the alternative scheme?

Working on it, need a little bit more time.

Meinersbur updated this revision to Diff 276259.Jul 7 2020, 4:31 PM
  • Rebase, especially omp_gen
  • Some cleanup

Rebase after D83261

I will try to adapt it to the new interface (it won't work for simd, still need to capture the variables) and will send the patch back to you.

cchen added a subscriber: cchen.Sep 23 2020, 2:34 PM
ABataev updated this revision to Diff 315503.Jan 8 2021, 1:38 PM

Rework, bug fixes.

martong removed a subscriber: martong.Jan 19 2021, 6:50 AM

Thank you for the update.

From comparing the diffs, here is the list of main changes I found:

  1. Create OMPLoopBasedDirective as a new base class for OMPLoopDirective (and OMPTileDirective) for CollapsedNum and some utility methods for analyzing the CapturedStmt/ForStmt structure (replacing getTopmostAssociatedStructuredBlock, collectAssociatedLoops)
  2. Add PreInits child to OMPTileDirective (instead of wrapping them in a CompoundStmt of the transformed body/collectAssociatedLoops)
  3. Use meaningful SourceLocations instead of my placeholder {}
  4. New OMPTransformDirectiveScopeRAII
    1. to collect nested PreInits (instead by collectAssociatedLoops)
    2. to assign values to CapturedDecls (instead of adding a Capturing argument to various function)
  5. no call to checkOpenMPLoop for the entire lop nest (but still once per loop)

My remarks to these changes:

  1. Saves us to store all the loop-specific subexpressions in the AST node. However, they are still computed in ActOnOpenMPTileDirective. With the OpenMPIRBuilder (D94973) these will also not be necessary for other loop-associated statements.
  2. looks more consistent with other directives
  3. This was just laziness on my size. Thank you.
  4. is what I just did not know. 4.b. still feels like a hack because there are captures variables outside of any CapturedStmt and therefore complicated the AST. Comparable directive such as master don't do this.
  5. The additional call on the entire loop nest felt necessary to check constraints such as invariant loop bounds and disallow nesting. However, both properties are still checked now.
clang/include/clang/AST/RecursiveASTVisitor.h
2790–2791

[no change request] OMPLoopBasedDirective is not represented in the visitor.

Well, nor does anything call TraverseOMPLoopDirective, so the possibility to only override the base class traverse-function is unused.

clang/include/clang/AST/StmtOpenMP.h
443

Please add a documentation what this class represents.

448

Inaccurate for the tile directive.

456

Inaccurate for the tile directive.

651–677

Please add doxygen comments.

IMHO, using callbacks makes the callers significantly more complicated. Why not fill a SmallVectorImpl<Stmt*> with the result?

659

I do not see why making this a forwarding reference.

680–709

Use isOpenMPLoopDirective() instead?

Thank you for the update.

From comparing the diffs, here is the list of main changes I found:

  1. Create OMPLoopBasedDirective as a new base class for OMPLoopDirective (and OMPTileDirective) for CollapsedNum and some utility methods for analyzing the CapturedStmt/ForStmt structure (replacing getTopmostAssociatedStructuredBlock, collectAssociatedLoops)
  2. Add PreInits child to OMPTileDirective (instead of wrapping them in a CompoundStmt of the transformed body/collectAssociatedLoops)
  3. Use meaningful SourceLocations instead of my placeholder {}
  4. New OMPTransformDirectiveScopeRAII
    1. to collect nested PreInits (instead by collectAssociatedLoops)
    2. to assign values to CapturedDecls (instead of adding a Capturing argument to various function)
  5. no call to checkOpenMPLoop for the entire lop nest (but still once per loop)

My remarks to these changes:

  1. Saves us to store all the loop-specific subexpressions in the AST node. However, they are still computed in ActOnOpenMPTileDirective. With the OpenMPIRBuilder (D94973) these will also not be necessary for other loop-associated statements.
  2. looks more consistent with other directives
  3. This was just laziness on my size. Thank you.
  4. is what I just did not know. 4.b. still feels like a hack because there are captures variables outside of any CapturedStmt and therefore complicated the AST. Comparable directive such as master don't do this.
  5. The additional call on the entire loop nest felt necessary to check constraints such as invariant loop bounds and disallow nesting. However, both properties are still checked now.

Just one thing. Preinits does not directly relate to CapturedStmt, you can consider it as a temp way to optimize the codegen. Also, it allows to support the use of data members as loop counters in classes/structures.

clang/include/clang/AST/StmtOpenMP.h
448

We can rename it to something like NumAssociatedLoops

651–677

It just hides all internal representation in the class and the user/caller should not worry about proper implementation of the loops processing. Consider it as a way to encapsulate representation details.

659

There is a type mismatch in callback types, Stmt * and const Stmt *

680–709

I'll check if it can be used instead, though I just want to be consistent with what we have for other statements/expressions/directives.

Meinersbur added inline comments.Feb 2 2021, 1:32 PM
clang/include/clang/AST/StmtOpenMP.h
448

Sounds good.

651–677

That could also be done by returning and iterator/filling an array with the requested data. The latter is one of the primary uses of SmallVectorImpl.

coroutines should make returning an iterator much simpler, eventually when we are allowed to use C++20.

659

I understand why the additional lambda is needed, but wondered about why the declaration is not

auto NewCallback = [Callback](unsigned Cnt, Stmt *CurStmt) {
680–709

It would reduce the maintenance cost of not forgetting to update this list as well.

Just a note:

4.b. still feels like a hack because there are captures variables outside of any CapturedStmt and therefore complicated the AST. Comparable directive such as master don't do this.

It is not a hack, this was an optimization to minimize the number of recalculations of some expressions for the loops. For master or other directives we just did not need to do it at all.

clang/include/clang/AST/StmtOpenMP.h
651–677

I just prefer not to expose internals if they could be processed within the class instance.

659

I believe it just saves us from the extra constructor call.

ABataev updated this revision to Diff 323453.Feb 12 2021, 12:47 PM

Update+Rebase

Meinersbur accepted this revision.Feb 13 2021, 8:54 PM

Who is going to commit?

clang/include/clang/AST/StmtOpenMP.h
651–677

Using SmallVectorImpl for this is still the dominant style in clang. For instance:

  • Sema::getUndefinedButUsed
  • Sema::FindHiddenVirtualMethods
  • Sema::CollectMultipleMethodsInGlobalPool
  • Sema::SelectBestMethod
  • Sema::CollectIvarsToConstructOrDestruct
  • Sema::collectUnexpandedParameterPacks
  • Sema::FindProtocolDeclaration
  • Sema::GatherArgumentsForCall
  • Sema::GatherGlobalCodeCompletions
  • Sema::getUndefinedButUsed
  • ASTContext::getOverriddenMethods
  • ASTContext::DeepCollectObjCIvars
  • ASTContext::getInjectedTemplateArgs

    ...
659
This revision is now accepted and ready to land.Feb 13 2021, 8:54 PM

Who is going to commit?

I can commit it as soon as you accepted it if you don't mind of course.

clang/include/clang/AST/StmtOpenMP.h
651–677

There is also another motivation behind this solution: to reuse the code as much as possible and avoid code duplication (loops to process inner loops/bodies) and focus just on loops/bodies processing.

659

Yes, currently it is not, it was before. But even now there is a difference in AST:

`-CompoundStmt <col:34, line:10:1>
  |-DeclStmt <line:4:5, line:6:6>
  | `-VarDecl <line:4:5, line:6:5> line:4:17 used Lambda '(lambda at line:4:26)':'(lambda at line:4:26)' cinit
  |   `-ExprWithCleanups <col:26, line:6:5> '(lambda at line:4:26)':'(lambda at line:4:26)'
  |     `-CXXConstructExpr <line:4:26, line:6:5> '(lambda at line:4:26)':'(lambda at line:4:26)' 'void ((lambda at line:4:26) &&) noexcept' elidable
  |       `-MaterializeTemporaryExpr <line:4:26, line:6:5> '(lambda at line:4:26)' xvalue

With &&:

`-CompoundStmt <col:34, line:10:1>
  |-DeclStmt <line:4:5, line:6:6>
  | `-VarDecl <line:4:5, line:6:5> line:4:17 used Lambda '(lambda at line:4:26) &&' cinit
  |   `-ExprWithCleanups <col:26, line:6:5> '(lambda at line:4:26)' xvalue
  |     `-MaterializeTemporaryExpr <line:4:26, line:6:5> '(lambda at line:4:26)' xvalue extended by Var 0x562226678ad8 'Lambda' '(lambda at line:4:26) &&'

Just the constructor call is marked as elidable and is not emitted anymore, but it still saves some memory/time because we don't need to emit an extra AST node.

I can commit it as soon as you accepted it if you don't mind of course.

I do not mind

I can commit it as soon as you accepted it if you don't mind of course.

I do not mind

Ok, will commit it in a few minutes, thanks!

This revision was landed with ongoing or failed builds.Feb 16 2021, 9:47 AM
This revision was automatically updated to reflect the committed changes.
clementval added a subscriber: clementval.EditedFeb 16 2021, 12:05 PM

@Meinersbur / @ABataev This patch is making Flang buildbots failing.

Since you have added a clause in OMP.td you have to add lines here as well -> https://github.com/llvm/llvm-project/blob/adfd3c7083f9808d145239153c10f72eece485d8/flang/lib/Semantics/check-omp-structure.cpp#L669

Adding this line

CHECK_SIMPLE_CLAUSE(Sizes, OMPC_sizes)

After

CHECK_SIMPLE_CLAUSE(Simd, OMPC_simd)

Fixes the build problem

clementval added a comment.EditedFeb 16 2021, 12:31 PM

A fix is ready here https://reviews.llvm.org/D96808

Fix has landed so no need to revert.