This is an archive of the discontinued LLVM Phabricator instance.

[clang][OpenMP] Use OpenMPIRBuilder for workshare loops.
ClosedPublic

Authored by Meinersbur on Jan 19 2021, 8:17 AM.

Details

Summary

Initial support for using the OpenMPIRBuilder by clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to:

  • Only the worksharing-loop directive.
  • Recognizes only the nowait clause.
  • No loop nests with more than one loop.
  • Untested with templates, exceptions.
  • Semantic checking left to the existing infrastructure.

This patch introduces a new AST node, OMPCanonicalLoop, which becomes parent of any loop that has to adheres to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics:

  • The distance function: How many loop iterations there will be before entering the loop nest.
  • The loop variable function: Conversion from a logical iteration number to the loop variable.

These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logical should be done by the OpenMPIRBuilder such that it can be reused MLIR OpenMP dialect and thus by flang.

The distance and loop variable function are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more interviewed with the parser). It is up to the OpenMPIRBuilder how they are called which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does.

For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. Although OMPCanonicalLoop's are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting.

Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset.

Diff Detail

Event Timeline

Meinersbur created this revision.Jan 19 2021, 8:17 AM
Meinersbur requested review of this revision.Jan 19 2021, 8:17 AM
Meinersbur updated this revision to Diff 320983.Feb 2 2021, 8:41 PM
Meinersbur edited the summary of this revision. (Show Details)
  • Rebase
  • Fix crash in irbuilder_nested_parallel_for.c
  • Fix OpenMP/cancel_codegen.cpp
  • Fix warnings

This seems sensible. The clang parts look OK but I haven't thought everything through to the last detail. I left some questions below, I guess we can go down this road and slowly transition to the new scheme.

clang/lib/AST/Stmt.cpp
1275

Why does this have to go? Is that avoidable?

clang/lib/Sema/SemaExpr.cpp
17274

This doesn't impact anyone else negatively? I don't understand why we need this now.

llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
369

Nit: Add the argument description (from above).

949

Nit: documentation, also above.

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
987

I'm unsure I understand when it would make sense to not have a body generator.

Meinersbur marked 3 inline comments as done.
clang/lib/Sema/SemaExpr.cpp
17274

This taken from how implicit lambda captures are handled: for nested lambdas, only the outermost lambda captures byval, the remaining use that existing copy. I think there are no other uses of CaptureStmt that have explicit captures, I am introducing the first.

With removing the assertion and therefore effectively allowing byval captures for CapturedStmts, I intended to make its behaviour resemble that of lambdas (captureInLambda).

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
987

As done by EmitOMPCanonicalLoop, the body code can also added to CL->getBodyIP() to the CL returned by this function. Calling the callback is the last action done anyway, using the callback just makes the code harder to understand.

Meinersbur added inline comments.Feb 3 2021, 11:19 AM
clang/lib/AST/Stmt.cpp
1275

This is used to capture the loop counter (e.g. __begin for a CXXForRangeStmt) in front of the loop before it is modified. This can be an iterator whose copy constructor needs to be called and this assertion did restrict it to scalar types. It is used for the LoopVarFunc closure which is of the form [__begin,&](LogicalIterationNumberTy indvar) { return *(__begin + indvar); }
The alternative would be to introduce more implicit children of an OMPCanonicalLoop that declares a variable and assigns the initial loop counter value. I find the byval-capture much more elegant.

The code implementing the byval-capture is shared with the implementation of lambdas, so the only reason seems to be that there was no use case when introduced in D14940.

jdenny added a comment.Feb 5 2021, 6:07 PM

@Meinersbur Sorry, I haven't gotten very far yet in reviewing this, but I'll try to work on this more soon.

clang/include/clang/AST/StmtOpenMP.h
31

I think it would be helpful to cite the OpenMP spec here. Just version and section number.

37
41

I suggest the terms "loop user variable" and "loop iteration variable". In particular, I find "loop counter" to be misleading because "counter" suggests to me it's always an integer, so it's easy to get it confused with the "logical iteration counter". If you're using standard terminology that I'm not aware of, then never mind.

43
46

Please list pointer too as I don't think C has an iterator concept.

48
61

Are those hyphens intentional?

92

Why is __begin an explicit capture here but not for the distance function?

94

Thanks for this careful documentation!

101

Why "loopy" with a "y"?

Meinersbur updated this revision to Diff 321975.Feb 6 2021, 9:15 PM
Meinersbur marked 10 inline comments as done.
  • Rebase
  • Address @jdenny's review
  • Address clang-tidy remarks
clang/include/clang/AST/StmtOpenMP.h
41

Good suggestion

92

Because __begin is captured by-value, everything lese uses the &default capture. This is the loop counter variable is modified inside the loop body.

101

loopy was meant to refer to the property of the statement (ForStmt, CXXForRangeStmt, potentially others such as WhileStmt, DoStmt, functions from #include <algorithm>, etc) instead of a specific loop node (such as OMPLoopDirective or OMPCanonicalLoop itself), although I did not apply this idea consistently. Do you prefer a plain 'LOOP_STMT'?

I have a single last comment/request. @jdenny I'll take it you finish the review and accept as you see fit.

llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
987

I think in the long run the callbacks will provide more usefulness. Since the code generation scope for constructs is then known by the OMPIRBuilder we can track the OpenMP context in the OMPIRBuilder including the construct traits. You think it is worse at the call site to pack the body gen in a lambda?

  • Use BodyGen Callback to fill loop body.
jdenny added a comment.Feb 9 2021, 4:53 PM

I have a single last comment/request. @jdenny I'll take it you finish the review and accept as you see fit.

@jdoerfert, if you've already reviewed everything and want to accept so this can move forward, I'm fine with that. Otherwise, I'll review more as I find time. There's a lot of code.

clang/include/clang/AST/StmtOpenMP.h
45
54
58
59
92

Because __begin is captured by-value, everything lese uses the &default capture. This is the loop counter variable is modified inside the loop body.

92

Ah, I now see there are comments about this in the code. I think a brief note here too would help because it stands out in the example, but your call.

97
101

Yes. In my mind, "loopy" doesn't help with that distinction. But your call.

166

To convert run-time errors into compile-time errors, what if setLoopStmt is overloaded to take either a ForStmt or CXXForRangeStmt instead of any Stmt?

190
196
clang/include/clang/Sema/Sema.h
10486
jdenny added inline comments.Feb 9 2021, 4:53 PM
clang/lib/Sema/SemaOpenMP.cpp
5301
Meinersbur marked 11 inline comments as done.
Meinersbur marked an inline comment as done.
  • Missed a suggestion
clang/include/clang/AST/StmtOpenMP.h
97

Unfortunately, clang-format does this when it wants to word-wrap a paragraph.

166

This would also require callers to call two different versions and just removes the problem up the call stack. For instance, StmtReader just receives a Stmt from ReadStmt. If we had setLoopStmt, we'd need a switch to call one of overloads.

jdenny added inline comments.Feb 12 2021, 1:24 PM
clang/include/clang/AST/StmtOpenMP.h
166

OK, I checked the call stack for your setters this time. It looks like it's just create, and I see that trying to achieve my goal would make a mess of that call stack. Sorry for the noise.

clang/lib/CodeGen/CGStmtOpenMP.cpp
1872
3494
clang/lib/CodeGen/CodeGenFunction.cpp
94–95

Without this patch, finalize is called even if !CurFn?

clang/lib/CodeGen/CodeGenFunction.h
288
clang/lib/Sema/SemaOpenMP.cpp
3771

Isn't VisitStmt called for any Stmt without this?

5139

I think this means:

If already parsed statements/expressions, used to capture variable references into a CapturedStatement.

But it reads like it means:

If already parsed statements/expressions into a CapturedStatement, used to capture variable references.

5215

It looks like the AST nodes NewStart, NewStop, and NewStep are shared by multiple parent nodes here. Does that ever happen anywhere else in a Clang AST?

Meinersbur marked 6 inline comments as done.Feb 13 2021, 4:58 PM
Meinersbur added a subscriber: rsmith.
Meinersbur added inline comments.
clang/lib/CodeGen/CodeGenFunction.cpp
94–95

Without CurFn, there is no function to finalize.

The problem before this patch is that finalize would finalize ALL functions, even those that are still in construction. In particular, finalizing the Distance or LoopVar function would also finalize the parent function.

I added an assert for OpenMPIRBuilder's dtor that checks that everything has eventually been finalized.

clang/lib/Sema/SemaOpenMP.cpp
3771

I think you are correct.

5139

if -> of

I reformulated the sentence which I hope is now more clear.

5215

Yes. For instance, when instantiating a template, nodes form the TemplatedDecl are reused unless they need to change. Therefore multiple template instantiation can also share entire AST subtrees. Why rebuild them if they are identical?

However, I also found the following description of TreeTransform::AlwaysRebuild():

/// We must always rebuild all AST nodes when performing variadic template
/// pack expansion, in order to avoid violating the AST invariant that each
/// statement node appears at most once in its containing declaration.

It was added by @rsmith in rG2aa81a718e64fb4ca651ea12ab7afaeffc11e463 to fix llvm.org/PR17800 . The assertion doesn't seem to exist in the current code base.

Does this apply here?

Meinersbur marked 2 inline comments as done.
Meinersbur added inline comments.Feb 15 2021, 1:16 PM
clang/lib/Sema/SemaOpenMP.cpp
5215

I think the invariant exists because codegen has a map from ast expression nodes to its temporary memory location. This means that reusing the same AST expression node indeed should not be done.

I think the assertion that failed in llvm.org/PR17800 is now here:
https://github.com/llvm/llvm-project/blob/7ba2e1c6011eeb1b91ce3f5d8fa7187b7518e77a/clang/lib/AST/ExprConstant.cpp#L1872

Assertion message was changed in this commit:
https://github.com/llvm/llvm-project/commit/f7f2e4261a98b2da519d58e7f6794b013cda7a4b

jdenny added inline comments.Feb 16 2021, 4:11 PM
clang/lib/CodeGen/CodeGenFunction.h
292

Unless I missed something, the only accesses to OMPLoopNestStack are push_back, clear, back, and size. It's never popped. That is, every push appears to mean that all existing elements will never be accessed again. So why is it a stack? Is that purely for the sake of the assertion that calls size?

clang/lib/Sema/SemaOpenMP.cpp
5139

That's better. Thanks.

5215

Thanks for the links.

I guess reuse is ok across different instantiations of a template because this map is cleared across different functions.

It might not be worth much at this point, but I also saw this comment as more evidence for this invariant: "expression's are never shared".

5356

CounterDecl is the declaration of the "loop iteration variable" based on the comments now on OMPCanonicalLoop, right? If so, can we update the variable names here? One possibility is LIVDecl and LUVDecl.

clang/lib/Sema/TreeTransform.h
8321

I'm used to seeing TransformX call RebuildX call ActOnX. Why not do that for X=OMPCanonicalLoop? Does TransformOMPExecutableDirective really need a special case for OMPCanonicalLoop?

clang/tools/libclang/CIndex.cpp
2839

Doesn't VisitStmt(L) already call EnqueueChildren(L)?

Meinersbur marked 4 inline comments as done.
  • Don't reuse NewStart/NewStop/NewStep inside same CapturedStmt
  • Support more operator overloads
  • Address @jdenny's review
clang/lib/CodeGen/CodeGenFunction.h
292

When eventually supporting loop nests, we will need not only the outermost loop, but also inner ones.

clang/lib/Sema/TreeTransform.h
8321

The intended behaviour is:

  1. Transform the child loop
  2. Return the child loop as representing the OMPCanonicalLoop (i.e. OMPCanonicalLoop wrapper is removed).
  3. Parent loop-associated directive (e.g. workshare) is processed.
  4. ActOnOpenMPCanonicalLoop adds new OMPCanonicalLoop wrapper for loop nest.

This guarantees maximum flexibility on what of the loop can be changed, such as

  • Change lower bound, upper bound, step
  • Convert between CXXForRangeStmt and ForStmt
  • Change the associated depth (e.g. different value for collapse clause)
  • Remove the directive and no OMPCanonicalLoop remain

This also saves adding many lines of code handling transforming each member of OMPCanonicalLoop separately.

clang/tools/libclang/CIndex.cpp
2839

It does indeed.

jdenny added inline comments.Feb 19 2021, 12:37 PM
clang/lib/Sema/SemaOpenMP.cpp
5356

This is marked done, but I don't see a change or reply for it.

clang/lib/Sema/TreeTransform.h
8321

The intended behaviour is:

  1. Transform the child loop

For my suggestion, that call would remain within TransformOMPCanonicalLoop where it is now.

  1. Return the child loop as representing the OMPCanonicalLoop (i.e. OMPCanonicalLoop wrapper is removed).

For my suggestion, this would not happen. I think it's normal for a TransformX to return the transformed X not the transformed child of X. If a caller wants to transform the child, then it should transform the child directly instead.

  1. Parent loop-associated directive (e.g. workshare) is processed.

It looks to me like step 3 is currently within TransformOMPExecutableDirective and starts before the call to TranformOMPCanonicalLoop and thus before step 1. It completes after step 4. Or am I misunderstanding what you're describing as step 3?

  1. ActOnOpenMPCanonicalLoop adds new OMPCanonicalLoop wrapper for loop nest.

For my suggestion, this would still happen. However, instead of step 2: within TransformOMPCanonicalLoop, you would call RebuildOMPCanonicalLoop, which would call ActOnOpenMPCanonicalLoop as step 4. The effect is you moved ActOnOpenMPCanonicalLoop from the caller (TransformOMPExecutableDirective) to the callee's callee, but the behavior should remain the same.

This guarantees maximum flexibility on what of the loop can be changed, such as

  • Change lower bound, upper bound, step
  • Convert between CXXForRangeStmt and ForStmt
  • Change the associated depth (e.g. different value for collapse clause)
  • Remove the directive and no OMPCanonicalLoop remain

Flexibility for whom?

A class extending TreeTransform? With my suggestion, it can override TransformOMPCanonicalLoop or RebuildOMPCanonicalLoop, depending on how much it wants to alter the transformation.

Or a caller of TransformOMPCanonicalLoop within TreeTransform? I only see one right now, TransformOMPExecutableDirective, and I don't see how it needs the flexibility. Are there other callers I missed?

Are you trying to create flexibility without requiring deriving from TreeTransform? But, as far as I can tell, you're doing so at the expense of normal TreeTransform semantics. Doesn't seem worth it.

If you see a precedent for your approach elsewhere in TreeTransform, please point it out.

This also saves adding many lines of code handling transforming each member of OMPCanonicalLoop separately.

Why would you need to? In the other TransformX functions I looked at, the arguments to the RebuildX function are transformed, and those are typically just the arguments to the ActOnX function. In other words, you would just transform the loop within your OMPCanonicalLoop as you're doing now.

This patch has no effect if the OpenMP IR builder is not enabled, and it's disabled by default. Is that right?

Meinersbur marked an inline comment as done.Feb 23 2021, 6:07 PM

This patch has no effect if the OpenMP IR builder is not enabled, and it's disabled by default. Is that right?

Yes, this is how it is intended.

clang/lib/Sema/SemaOpenMP.cpp
5356

Sorry, don't know what how I overlooked it.

clang/lib/Sema/TreeTransform.h
8321

I could not find where you outlined your solution, I tried to infer it from your comments.

I'm used to seeing TransformX call RebuildX call ActOnX.

Introduced new RebuildOMPCanonicalLoop that does nothing else then calling ActOnOMPCanonicalLoop from the Sema object, like e.g. RebuildOMPExecutableDirective does. This allows overriding RebuildOMPCanonicalLoop in a derived transform.

Does TransformOMPExecutableDirective really need a special case for OMPCanonicalLoop?

As it does for atomic, critical, section and master.

Flexibility for whom? A class extending TreeTransform?

Yes

If you see a precedent for your approach elsewhere in TreeTransform, please point it out.

The precedence is TransformOMPExecutableDirective it unwraps the CapturedStmt to get its body code. The following TransformStmt of the unwrapped Stmt never calls TransfomCapturedStmt since it was already unwrapped. The re-wrapping is done by the surrounding ActOnOpenMPRegionStart/ActOnOpenMPRegionEnd.

To reproduce this, I changed TransformOMPExecutableDirective to also unwrap the OMPCanonicalLoop instead of doing so implicitly when TransformStmt calls TransformOMPCanonicalLoop. As a result, TransformOMPCanonicalLoop (like TransfomCapturedStmt for OpenMP directives; maybe TransfomCapturedStmt is called for other purposes) should never be called and I put an assertion there instead.

Note that it is not required for a TransformXYZ method to exactly return the type in its name, for instance TransformUnresolvedLookupExpr may return whatever the template instantiation resolves to.

Meinersbur marked an inline comment as done.
  • Address forgotten remark.
  • Introduce RebuildOMPCanonicalLoop
  • Handle OMPCanonicalLoop as part of TransformOMPExecutableDirective
jdenny added inline comments.Feb 24 2021, 4:11 PM
clang/lib/Sema/TreeTransform.h
8318–8323

Because this receives an OMPCanonicalLoop, I'm assuming the OpenMP IR builder is enabled. Without my suggested edits, the caller checks that. After my suggested edits, maybe it's worthwhile to have an assertion here that it's enabled. I don't know.

8321

I could not find where you outlined your solution, I tried to infer it from your comments.

Sorry for the confusion.

I've added suggested edits to show how to tweak the patch into what I'm suggesting. I applied the result to my checkout, and check-clang-openmp still passes.

I think my suggestion is more consistent with existing TreeTransform design. If it won't work for your use cases, please help me to understand why.

I'm used to seeing TransformX call RebuildX call ActOnX.

Introduced new RebuildOMPCanonicalLoop that does nothing else then calling ActOnOMPCanonicalLoop from the Sema object, like e.g. RebuildOMPExecutableDirective does. This allows overriding RebuildOMPCanonicalLoop in a derived transform.

Thanks. That part looks good.

Does TransformOMPExecutableDirective really need a special case for OMPCanonicalLoop?

As it does for atomic, critical, section and master.

With my suggestion, the special case for OMPCanonicalLoop doesn't seem necessary.

Flexibility for whom? A class extending TreeTransform?

Yes

After applying my suggestion, a class extending TreeTransform can override TransformOMPCanonicalLoop or RebuildOMPCanonicalLoop. If that's not sufficient, please help me to understand the use cases that require more flexibility.

If you see a precedent for your approach elsewhere in TreeTransform, please point it out.

The precedence is TransformOMPExecutableDirective it unwraps the CapturedStmt to get its body code. The following TransformStmt of the unwrapped Stmt never calls TransfomCapturedStmt since it was already unwrapped. The re-wrapping is done by the surrounding ActOnOpenMPRegionStart/ActOnOpenMPRegionEnd.

To reproduce this, I changed TransformOMPExecutableDirective to also unwrap the OMPCanonicalLoop instead of doing so implicitly when TransformStmt calls TransformOMPCanonicalLoop. As a result, TransformOMPCanonicalLoop (like TransfomCapturedStmt for OpenMP directives; maybe TransfomCapturedStmt is called for other purposes) should never be called and I put an assertion there instead.

Yes, I'm assuming TransformCapturedStmt is needed for other purposes and its implementation doesn't satisfy TransformOMPExecutableDirective's needs. In contrast, I think the TransformOMPCanonicalLoop I've suggested seems fine for TransformOMPExecutableDirective's needs, and it follows the usual behavior of any TransformXYZ.

Note that it is not required for a TransformXYZ method to exactly return the type in its name, for instance TransformUnresolvedLookupExpr may return whatever the template instantiation resolves to.

Understood, but I'm not aware of a case where TransformXYZ returns the transformation of the child and leaves it up to the caller to finish the transformation of XYZ itself. The type that results from that transformation is a different issue.

8361–8367
Meinersbur added inline comments.Feb 25 2021, 10:11 AM
clang/lib/Sema/TreeTransform.h
8321

Thanks for you proposed edits.

I still think think we should we should handle this like TransformOMPExecutableDirective handles CapturedStmt, because (1) OMPExecutableDirective is in control of how its children is structured, (2) TransformOMPExecutableDirective already does this this way (3) derived TreeTransform can change more aspects of a loop/loop-associated directive without having to override TransformOMPCanonicalLoop/TransformOMPExecutableDirective (4) fewer risk of derived TreeTransform to leave the AST in an inconsistent state and (5) is will not work when the loop depth is dependent on a template.

To elaborate on (5):

template<int n>
void func() {
   #pragma omp for collapse(n)
   for (...
       for (...
}
func<1>();
func<2>();

When parsing the function body, the depth of the loop nest is unknown, hence we cannot OMPCanonicalLoops. They have to be inserted when Sema's TemplateInstantiator knows the value of n, but with you proposal, there are no OMPCanonicalLoops in the templated AST that TransformOMPCanonicalLoops would be called on.

I will am still willing to go with your proposal if you think this case should not be supported or that TemplateInstantiator` should override TransformOMPExecutableDirective to handle this case.

Understood, but I'm not aware of a case where TransformXYZ returns the transformation of the child and leaves it up to the caller to finish the transformation of XYZ itself.

This refers to an outdated version of this patch.

jdenny accepted this revision.Feb 25 2021, 5:38 PM

Other than the additional comments I've requested, LGTM. Thanks for the explanations and the changes.

clang/lib/Sema/TreeTransform.h
8321

I still think think we should we should handle this like TransformOMPExecutableDirective handles CapturedStmt, because (1) OMPExecutableDirective is in control of how its children is structured, (2) TransformOMPExecutableDirective already does this this way

I agree that CapturedStmt is a precedent for the current patch's approach. My struggle is to understand why that approach is relevant for OMPCanonicalLoop, as discussed below.

(3) derived TreeTransform can change more aspects of a loop/loop-associated directive without having to override TransformOMPCanonicalLoop/TransformOMPExecutableDirective

If not those functions, what would the derived TreeTransform override to produce such changes?

(4) fewer risk of derived TreeTransform to leave the AST in an inconsistent state and

I'm trying to imagine how this would happen. I suppose there could be a transformation that removes an OMPExecutableDirective from its associated statement, on which it then calls TransformStmt, failing to notice that an OMPCanonicalLoop then needs to be removed too. With my suggestion, TransformStmt would call TransformOMPCanonicalLoop and accidentally rebuild the OMPCanonicalLoop. Without my suggestion, it would fail an assertion, indicating the transformation is broken. Either way, it needs a special case for OMPCanonicalLoop.

OK, that seems like a benefit of the current patch over my suggestion. I'm not sure whether it's worth it (TransformCapturedStmt doesn't have that benefit), but I'm fine with it. If you prefer the current patch, could you please add a comment to TransformOMPCanonicalLoop to document why it's special? The llvm_unreachable text alone doesn't make it clear to me. Here's a long-winded suggestion:

"An OMPCanonicalLoop must appear only as a child of an OMPExecutableDirective that represents a loop directive. To help maintain this invariant, TransformOMPCanonicalLoop always fails an assertion in order to force any transformation to handle OMPCanonicalLoop as a special case. For example, the default TransformOMPExecutableDirective checks for an OMPCanonicalLoop and rebuilds it within the OMPExecutableDirective. On the other hand, a transformation that removes an OMPExecutableDirective from its associated statement should also check for any OMPCanonicalLoop and remove it as well."

(5) is will not work when the loop depth is dependent on a template.

To elaborate on (5):

template<int n>
void func() {
   #pragma omp for collapse(n)
   for (...
       for (...
}
func<1>();
func<2>();

When parsing the function body, the depth of the loop nest is unknown, hence we cannot OMPCanonicalLoops. They have to be inserted when Sema's TemplateInstantiator knows the value of n, but with you proposal, there are no OMPCanonicalLoops in the templated AST that TransformOMPCanonicalLoops would be called on.

In the current patch, OMPCanonicalLoop is created and TransformOMPExecutableDirective assumes it's present regardless of template parameters. I guess you're saying that will change once you handle depth>1.

This revision is now accepted and ready to land.Feb 25 2021, 5:38 PM
Meinersbur added inline comments.Feb 26 2021, 9:39 AM
clang/lib/Sema/TreeTransform.h
8321

If not those functions, what would the derived TreeTransform override to produce such changes?

TransformOMPExecutableDirective, any`TransformOMPxyzDirective` where xyz is loop-associated, TransformForStmt or TransformCXXForRangeStmt.

Without my suggestion, it would fail an assertion, indicating the transformation is broken.

The option triggering an assertion seems superior to me.

TransformOMPExecutableDirective is removing OMPCanonicalLoop associated to it such that these do not need to be handled individually. TransformOMPExecutableDirective knows how many loops it is associated to and has to remove that many OMPCanonicalLoop. On the other side, TransformOMPCanonicalLoop has no access to its parents and therefore does not know what it is associated to, whether it is still needed, and cannot insert new ones if necessary (such as in my template example).

I suppose there could be a transformation that removes an OMPExecutableDirective from its associated statement, on which it then calls TransformStmt, failing to notice that an OMPCanonicalLoop then needs to be removed too.
Either way, it needs a special case for OMPCanonicalLoop.

It does with the current patch, in the previous patch TransformOMPCanonicalLoop would just have removed all of them. However, the TreeTransform that removes an OMPExecutableDirective will already have to remove the CapturedStmts/Decls, so also having to remove OMPCanonicalLoop seems consistent.

I will add your suggested source comment in the next differential update.

In the current patch, OMPCanonicalLoop is created and TransformOMPExecutableDirective assumes it's present regardless of template parameters.

Yes, templates are explicitly not handled in this patch yet (see summary).

I guess you're saying that will change once you handle depth>1.

This makes me remind of that removing the OMPCanonicalLoop in TransformOMPExecutableDirective with depth>1 is more difficult in TransformOMPExecutableDirective. Since the OMPCanonicalLoop and the loop statements are interleaved, and the loop statements have to be preserved, it is not sufficient to just select a subtree root. However, TransformOMPCanonicalLoop just returning its child is trivial.

Is the additional complexity worth avoiding the unusual behavior of a TransformXYZ just returning its child?

One property of this patch that has bothered me is that OMPCanonicalLoop is not a loop. Instead, it's an AST node that is sandwiched between a directive and a loop to contain extra information about the loop. The TreeTransform issues we've been discussing highlight how that extra node can be confusing.

Now I remember that there's D95496, which seems not to have that property. I haven't reviewed that patch in any detail, and I haven't tried to locate any prior discussions. Can you please summarize why you prefer the current patch over that one?

clang/lib/Sema/TreeTransform.h
8321

I guess you're saying that will change once you handle depth>1.

This makes me remind of that removing the OMPCanonicalLoop in TransformOMPExecutableDirective with depth>1 is more difficult in TransformOMPExecutableDirective. Since the OMPCanonicalLoop and the loop statements are interleaved, and the loop statements have to be preserved,

So, when depth>1 is handled, OMPCanonicalLoop won't always be the child of OMPExecutableDirective?

it is not sufficient to just select a subtree root. However, TransformOMPCanonicalLoop just returning its child is trivial.

Won't TransformOMPExecutableDirective have to iterate through the subtree and build every new interleaved OMPCanonicalLoop either way? Can't it remove the originals as it goes?

One property of this patch that has bothered me is that OMPCanonicalLoop is not a loop. Instead, it's an AST node that is sandwiched between a directive and a loop to contain extra information about the loop. The TreeTransform issues we've been discussing highlight how that extra node can be confusing.

There are many other AST nodes that are sandwiched carrying semantic information. Examples: AttributedStmt, ImplicitCast, CapturedStmt of any OMPExecutableStmt, ExprWithCleanups, CXXRewrittenBinaryOperator, CXXBindTemporaryExpr. I even think that representing semantic information alongside of syntax is one of the principles of Clang's AST.

Now I remember that there's D95496, which seems not to have that property. I haven't reviewed that patch in any detail, and I haven't tried to locate any prior discussions. Can you please summarize why you prefer the current patch over that one?

D95496 is a lot more invasive to code that does not even use OpenMP. E.g.
it increases the object size of any ForStmt or CXXForRangeStmt even if not used as an OpenMP loop-associated construct.
ForStmt and CXXForRangeStmt have a new base class called MaybeCanonicalLoopStmt fow which I could not find a better name. Because I cannot put the common new fields (DISTANCE_FUNC, LOOPVAR_FUNC, LOOPVAR_REF) at the beginning of the SubStmt list without big compatibility issues, they have to be put at the and and requires getCanonicalChildren() to access them without a switch for which of the derived classes it actually is.
It also introduces a variable number of children for those two statements.

On the other side, sandwiching implicit semantics is already pretty common in the AST.

clang/lib/Sema/TreeTransform.h
8321

I guess you're saying that will change once you handle depth>1.

So, when depth>1 is handled, OMPCanonicalLoop won't always be the child of OMPExecutableDirective?

If you mean direct child, this already the case now because of the CapturedStmt/CapturedDecl put between the OMPExectableDirective and its associated loops.

If you mean decendent, then no. An OMPCanonicalLoop should always be associated to some OpenMP directive,which is to be found as one of its ancestors, otherwise it would be structurally inconsistent.

Note that there already can be implicit AST nodes sandwiched between the loops of an OpenMP loop nest:

#pragma omp for collapse(2)
for (int i = 0; i < 20; ++i)
    [[likely]]
    for (int j = 0; j < 20; ++j)
      Body(i,j);
`-OMPForDirective 0x202c7d35a20 <line:6:1, col:28>
  `-CapturedStmt 0x202c7d07f00 <line:7:1, line:10:15>
    `-CapturedDecl 0x202c7d078e0 <<invalid sloc>> <invalid sloc>
      |-ForStmt 0x202c7d07ec8 <line:7:1, line:10:15>
      | `-AttributedStmt 0x202c7d07eb0 <line:8:5, line:10:15>
      |   |-LikelyAttr 0x202c7d07e88 <line:8:7>
      |   `-ForStmt 0x202c7d07e50 <line:9:5, line:10:15>

it is not sufficient to just select a subtree root. However, TransformOMPCanonicalLoop just returning its child is trivial.

Won't TransformOMPExecutableDirective have to iterate through the subtree and build every new interleaved OMPCanonicalLoop either way? Can't it remove the originals as it goes?

What do you mean by "Can't it remove the originals as it goes?". This is what TransformOMPCanonicalLoop would have done returning its child.

Btw I found precendence which does exactly that:

template<typename Derived>
ExprResult
TreeTransform<Derived>::TransformImplicitCastExpr(ImplicitCastExpr *E) {
  // Implicit casts are eliminated during transformation, since they
  // will be recomputed by semantic analysis after transformation.
  return getDerived().TransformExpr(E->getSubExprAsWritten());
}

One property of this patch that has bothered me is that OMPCanonicalLoop is not a loop. Instead, it's an AST node that is sandwiched between a directive and a loop to contain extra information about the loop. The TreeTransform issues we've been discussing highlight how that extra node can be confusing.

There are many other AST nodes that are sandwiched carrying semantic information. Examples: AttributedStmt, ImplicitCast, CapturedStmt of any OMPExecutableStmt, ExprWithCleanups, CXXRewrittenBinaryOperator, CXXBindTemporaryExpr.

Thanks for the examples. I never thought of those cases in that way probably because they're easier to imagine as distinct constructs or operations. But I'm probably splitting hairs.

I even think that representing semantic information alongside of syntax is one of the principles of Clang's AST.

Interesting. Is that documented somewhere?

Now I remember that there's D95496, which seems not to have that property. I haven't reviewed that patch in any detail, and I haven't tried to locate any prior discussions. Can you please summarize why you prefer the current patch over that one?

D95496 is a lot more invasive to code that does not even use OpenMP.

Fair enough.

clang/lib/Sema/TreeTransform.h
8321

Btw I found precendence which does exactly that:

template<typename Derived>
ExprResult
TreeTransform<Derived>::TransformImplicitCastExpr(ImplicitCastExpr *E) {
  // Implicit casts are eliminated during transformation, since they
  // will be recomputed by semantic analysis after transformation.
  return getDerived().TransformExpr(E->getSubExprAsWritten());
}

Thanks. That looks like a reasonable precedent to me.

In your comment for TransformOMPCanonicalLoop, please point out the invariant that OMPCanonicalLoop must be a descendant of OMPExecutableDirective.

I would ask that your comment mention that TransformOMPExecutableDirective is what rebuilds the OMPCanonicalLoop nodes that are removed here. However, for depth>1, I don't know if that's where that always happens given the interleaving of OMPCanonicalLoop and loop statements. I'll find out when you write that patch.

Meinersbur updated this revision to Diff 327926.Mar 3 2021, 2:35 PM
Meinersbur marked 12 inline comments as done.
  • Rebase to trunk
  • Return to previous TransformOMPCanonicalLoop removing itself from the AST (since its exactly what TransformImplicitCast does)
  • Add doxygen comment for position of OMPCanonialLoop in AST
Meinersbur updated this revision to Diff 327927.Mar 3 2021, 2:36 PM
  • Don't lint

I even think that representing semantic information alongside of syntax is one of the principles of Clang's AST.

Interesting. Is that documented somewhere?

https://clang.llvm.org/docs/InternalsManual.html#faithfulness

Some AST nodes (for example, ParenExpr) represent only syntax, and others (for example, ImplicitCastExpr) represent only semantics,

Meinersbur updated this revision to Diff 328370.Mar 4 2021, 8:26 PM
  • Fix MLIR build
This revision was landed with ongoing or failed builds.Mar 4 2021, 8:53 PM
This revision was automatically updated to reflect the committed changes.

Hello! I am seeing a downstream build failure on Mac as a result of your changes:

clang/lib/CodeGen/CGStmtOpenMP.cpp:1916:3: error: too few template arguments for class template 'SmallVector'
SmallVector<llvm::Value *> EffectiveArgs;
^
clang/include/clang/Basic/LLVM.h:35:42: note: template is declared here
template<typename T, unsigned N> class SmallVector;

Is is that same problem that D98265 addresses?

Is is that same problem that D98265 addresses?

Yes, that appears to be the same. Thank you!