Page MenuHomePhabricator

[mlir][OpenMP] Add recursive side effect information and tests for canonicalize pass
Changes PlannedPublic

Authored by shraiysh on May 25 2022, 11:56 AM.

Details

Summary

This patch adds RecursiveSideEffects trait to the operations in OpenMP
Dialect that have a region associated with them. This allows the
canonicalize pass to remove such operations when they have an empty
region.

Diff Detail

Event Timeline

shraiysh created this revision.May 25 2022, 11:56 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 25 2022, 11:56 AM
shraiysh requested review of this revision.May 25 2022, 11:56 AM
NimishMishra accepted this revision.May 25 2022, 8:15 PM

LGTM. Thank you for this.

This revision is now accepted and ready to land.May 25 2022, 8:15 PM
kiranchandramohan added a comment.EditedMay 26 2022, 3:20 AM

Not sure about this change, particularly when there are no optimisations specified.

Would adding this as a pass in MLIR or OpenMPOpt in LLVM be a better option?

Not sure about this change, particularly when there are no optimisations specified.

Would adding this as a pass in MLIR or OpenMPOpt in LLVM be a better option?

+1

shraiysh updated this revision to Diff 432270.May 26 2022, 6:59 AM

Move the change to MLIR Canonicalizer and add tests for the same.

shraiysh requested review of this revision.May 26 2022, 7:00 AM
shraiysh retitled this revision from [flang][OpenMP] Prevent emitting sections when there are none to [mlir][OpenMP] Add recursive side effect information and tests for canonicalize pass.
shraiysh edited the summary of this revision. (Show Details)
shraiysh added reviewers: ftynse, zero9178.
peixin added a comment.EditedSat, May 28, 8:15 AM

I am not an expert in MLIR attributes, so I will leave the review for other expects. The results by this patch LG. @shraiysh Where can I get the information of RecursiveSideEffects? Can you share one link?

I do not know of any link for this. The only explanation I found is in the comment to its definition -

This trait indicates that the side effects of an operation includes the effects of operations nested within its regions. If the operation has no derived effects interfaces, the operation itself can be assumed to have no side effects.

It is a bit tricky to understand, and it took me atleast 10 re-reads of this comment and a lot of discussion to understand this, but I don't know how we would be able to phrase this better.

An example where we should have this trait: omp.parallel has the RecursiveSideEffects trait because when an someone asks whether an omp.parallel operation in a .mlir file has side effects, we should look into the operations under it. If any of those child operations has side effects, then we say that the omp.parallel has side effects.

On an unrelated note, omp.parallel in itself introduces no side effects. When these two facts are combined, empty omp.parallel or omp.parallel with no-side-effect computations inside it can be eliminated (I will add a testcase for this - it is not already in this patch).

An example where we should not have this trait: when the operations inside the region are not executed while the main operation is executed and child operations with side effects cause no side effects to the main operation. For example, something like lambdas. Even if the operations within a lambda operation have side effects, the lambda operation will not have side effects because those child operations are not executed when a lambda is defined. Hence it has no recursive side effects. So, all lambda operations for which the results aren't used anywhere can be eliminated even if their inner bodies have side effects.

I hope this helps and I don't add to the confusion about this trait. :)

I am not an expert in MLIR attributes, so I will leave the review for other expects.

Sure, no problem.

I do not know of any link for this. The only explanation I found is in the comment to its definition -

This trait indicates that the side effects of an operation includes the effects of operations nested within its regions. If the operation has no derived effects interfaces, the operation itself can be assumed to have no side effects.

It is a bit tricky to understand, and it took me atleast 10 re-reads of this comment and a lot of discussion to understand this, but I don't know how we would be able to phrase this better.

An example where we should have this trait: omp.parallel has the RecursiveSideEffects trait because when an someone asks whether an omp.parallel operation in a .mlir file has side effects, we should look into the operations under it. If any of those child operations has side effects, then we say that the omp.parallel has side effects.

On an unrelated note, omp.parallel in itself introduces no side effects. When these two facts are combined, empty omp.parallel or omp.parallel with no-side-effect computations inside it can be eliminated (I will add a testcase for this - it is not already in this patch).

An example where we should not have this trait: when the operations inside the region are not executed while the main operation is executed and child operations with side effects cause no side effects to the main operation. For example, something like lambdas. Even if the operations within a lambda operation have side effects, the lambda operation will not have side effects because those child operations are not executed when a lambda is defined. Hence it has no recursive side effects. So, all lambda operations for which the results aren't used anywhere can be eliminated even if their inner bodies have side effects.

I hope this helps and I don't add to the confusion about this trait. :)

Got it. Thanks for the explanations.

Would it be always correct to remove the OpenMP operation if there are clauses? Particularly, thinking about cases like allocate, lastprivate (not an issue now since we handle it elsewhere), worksharing loop (without nowait, will cause removal of a synch point).

Would it be always correct to remove the OpenMP operation if there are clauses? Particularly, thinking about cases like allocate, lastprivate (not an issue now since we handle it elsewhere), worksharing loop (without nowait, will cause removal of a synch point).

That's a good concern. I think we cannot put this trait under worksharing loop because of the implicit barrier. I believe the other constructs create their own thread-group and the implicit barrier would not affect anything outside. Another way to handle wsloop will be to disconnect the implicit barrier from the wsloop operation. That means #pragma omp wsloop {} gets translated to the following. With nowait, there would be no omp.barrier operation. This is less closer to the standard, but allows simpler optimization by eliminating codegen for empty wsloops and gives us more explicit IR.

omp.wsloop {...}
omp.barrier

I don't know much about the intricacies of allocate clause however afaiu, it looks like allocate clause only affects how the data is perceived under the construct. Is that correct? Will the following code have a side effect outside the construct?

omp.some_construct allocate(..) {
  omp.terminator
}

Would it be always correct to remove the OpenMP operation if there are clauses? Particularly, thinking about cases like allocate, lastprivate (not an issue now since we handle it elsewhere), worksharing loop (without nowait, will cause removal of a synch point).

That's a good concern. I think we cannot put this trait under worksharing loop because of the implicit barrier. I believe the other constructs create their own thread-group and the implicit barrier would not affect anything outside. Another way to handle wsloop will be to disconnect the implicit barrier from the wsloop operation. That means #pragma omp wsloop {} gets translated to the following. With nowait, there would be no omp.barrier operation. This is less closer to the standard, but allows simpler optimization by eliminating codegen for empty wsloops and gives us more explicit IR.

omp.wsloop {...}
omp.barrier

Let us not separate the barrier. As part of the current implementation, the OpenMPIRBuilder adds the barrier based on the presence of the nowait attribute.
If in future we have lastprivate support, and if the index variable is marked as lastprivate, then removing the region in that case is probably not correct.

I don't know much about the intricacies of allocate clause however afaiu, it looks like allocate clause only affects how the data is perceived under the construct. Is that correct? Will the following code have a side effect outside the construct?

omp.some_construct allocate(..) {
  omp.terminator
}

Allocate will only allocate inside the scope of the construct, in that sense it is local. But it can ask to allocate in different kinds of memory. These will be calls to the OpenMP runtime and I believe they will probably call the OS or drivers for these resources.

Another thought that came to mind is what if there are task dependency clauses specified and we remove an OpenMP operation.

BTW, the openmp-opt passes are able to remove some trivial OpenMP regions. In the past, there were some requests to perform these kinds of transformations in that layer so that both compilers benefit. But I agree that this is easier at the MLIR layer without the expansion of the region and the insertion of runtime calls. At the same time, I am not 100% sure whether clauses can cause side effects and hence would it be better to do it at the openmp-opt level when everything is expanded?
https://openmp.llvm.org/remarks/OMP160.html#omp160

Let us not separate the barrier. As part of the current implementation, the OpenMPIRBuilder adds the barrier based on the presence of the nowait attribute.
If in future we have lastprivate support, and if the index variable is marked as lastprivate, then removing the region in that case is probably not correct.

Okay, I will remove this from omp.wsloop operation. Thank you!

Allocate will only allocate inside the scope of the construct, in that sense it is local. But it can ask to allocate in different kinds of memory. These will be calls to the OpenMP runtime and I believe they will probably call the OS or drivers for these resources.

The system calls will have side-effects (like memory allocation on device etc.) but I thought those effects should be erased before the construct ends (deallocation should happen before it ends either because the programmer asks for it, or because of the OpenMP Runtime). Is that correct?

Another thought that came to mind is what if there are task dependency clauses specified and we remove an OpenMP operation.
BTW, the openmp-opt passes are able to remove some trivial OpenMP regions. In the past, there were some requests to perform these kinds of transformations in that layer so that both compilers benefit. But I agree that this is easier at the MLIR layer without the expansion of the region and the insertion of runtime calls. At the same time, I am not 100% sure whether clauses can cause side effects and hence would it be better to do it at the openmp-opt level when everything is expanded?
https://openmp.llvm.org/remarks/OMP160.html#omp160

Thank you for the reference, I did not know about this. However as you pointed out, it would be more desirable to do it at the MLIR layer itself. Would it be okay if we had conditional elimination of constructs? i.e. we eliminate only when there are no side-effect-producing clauses on the construct - like if clause and num_threads clauses. I can try to break this into separate patches for each construct and we can have the discussion about side effects of clauses on that particular construct. Should we do it that way?

Another thought that came to mind is what if there are task dependency clauses specified and we remove an OpenMP operation.
BTW, the openmp-opt passes are able to remove some trivial OpenMP regions. In the past, there were some requests to perform these kinds of transformations in that layer so that both compilers benefit. But I agree that this is easier at the MLIR layer without the expansion of the region and the insertion of runtime calls. At the same time, I am not 100% sure whether clauses can cause side effects and hence would it be better to do it at the openmp-opt level when everything is expanded?
https://openmp.llvm.org/remarks/OMP160.html#omp160

Thank you for the reference, I did not know about this. However as you pointed out, it would be more desirable to do it at the MLIR layer itself. Would it be okay if we had conditional elimination of constructs? i.e. we eliminate only when there are no side-effect-producing clauses on the construct - like if clause and num_threads clauses. I can try to break this into separate patches for each construct and we can have the discussion about side effects of clauses on that particular construct. Should we do it that way?

Yes, separating it out for each construct is a good way to focus and discuss.

shraiysh planned changes to this revision.Mon, Jun 6, 11:36 AM