This is an archive of the discontinued LLVM Phabricator instance.

[flang][hlfir] Add pass to inline elementals
ClosedPublic

Authored by tblah on Apr 26 2023, 7:30 AM.

Details

Summary

Implement hlfir.elemental inlining as proposed in
flang/docs/HighLevelFIR.md.

This is a separate pass to make the code easier to understand. One
alternative would have been to modify the hlfir.elemental lowering in
the HLFIR bufferization pass.

Inlining is done by splicing the mlir operations form the operations
list of the old block to the new block. We cannot inline by cloning the
operations because this causes pending rewrites and erasures of these
operations not to be performed on the cloned operations.

A hlfir.elemental can only be inlined once; if there are more uses, the
existing bufferization is used instead. It is not possible to splice the
operations to be in multiple places at once - that would require inlining
by cloning the operations.

Diff Detail

Event Timeline

tblah created this revision.Apr 26 2023, 7:30 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
tblah requested review of this revision.Apr 26 2023, 7:30 AM
tschuett added inline comments.
flang/lib/Optimizer/HLFIR/Transforms/InlineElementals.cpp
37

anonymous namespace + static is redundant.

53

std::nullopt

70

I believe you first have to check the optional:

std::optional<...> uses = getTwoUses(elemental);
if (!uses)
  return mlir::failure();

auto [apply, destroy] = *uses;
tblah added inline comments.Apr 26 2023, 9:07 AM
flang/lib/Optimizer/HLFIR/Transforms/InlineElementals.cpp
37

The style guide says that it is hard to look at a function defined inside of an anonymous namespace and notice that it is static without scrolling: https://llvm.org/docs/CodingStandards.html#anonymous-namespaces

The style guide recommends only putting declarations in the anonymous namespace and then defining things separately. I did not follow that advice because it felt like overkill for such short methods.

In this case, it is obvious that the function is inside an anonymous namespace. Would you rather I move it outside of the namespace, or to remove the static specifier?

70

This is checked by MLIR when determining if this hlfir.elemental is legal after the current pass. See line 97

target.addDynamicallyLegalOp<hlfir::ElementalOp>(
     [](hlfir::ElementalOp elemental) { return !getTwoUses(elemental); });

This means that if getTwoUses would have returned a std::nullopt, the elemental operation will be legal (for the purposes of this pass) and so InlineElementalConversion::matchAndRewrite would not be called.

It is unfortunate that getTwoUses is called twice, but the MLIR mechanism to avoid this (adding getTwoUses as an analysis pass) feels like overkill for such a tiny function.

jeanPerier accepted this revision.Apr 26 2023, 10:06 AM

Thanks a lot, looks great!

Just to document a few things here that justify this pass is OK:

This optimization assumes that no operations between the hlfir.elemental/hlfir.apply is affected by or affects the hlfir.elemental evaluation.

Fortran/Lowering justification:
This is true with the code produced by lowering given Fortran "10.1.4 Evaluation of operations" that says that an expression evaluation shall not impact/be impacted by other expression evaluation in the statement. HLFIR hlfir.expr are not exported in the code lowering statements other than assignment/forall/where: if the value are needed (e.g. by an associate construct) they are placed in memory.

What rules should the optimizations running on HLFIR respect so that this assumption remains valid:

This optimization is possible, provided we do not start propagating hlfir.elemental (like by blindly rewriting an hlfir.associate + hlfir.as_expr to a no-op (We are not doing this, we are doing the reverse in bufferization, which is OK)). Any motion of an hlfir.apply outside of this pass should likely place its old operand in memory if no further analysis.

The implicit rule exported from Fortran into HLFIR is: any operation between an hlfir.elemental and a related hlfir.apply, or indirect hlfir.apply (*) shall not affect or be affected by the evaluation of the hlfir.elemental.
(*) an indirect hlfir.apply of hlfir.elemental A is an hlfir.apply of an hlfir.elemental B that contains a direct or indirect hlir.apply to A.

Given "10.1.4" cannot be enforced at compile time, there is also no way this can be enforced in the HLFIR on the code generated by lowering (we cannot have an MLIR verifier here). But any compiler motion of hlfir.elemental and hlfir.apply should prove it is not going passed an operation with that would affect/be affected by the hlfir.elemental.

How can this rule be exported to MLIR and the passes we do not control:

I think we may want to remove the "NoMemoryEffect" and replace it by an interface where hlfir.apply memory effects are the same as the ones inside the related hlfir.expr producer just to be safe. That way, any motion of the hlfir.apply by some MLIR compliant pass that we do not control will have to verify it that the hlfir.elemental content could be moved too. After all, an hlfir.elemental is a bit like a call to a lambda here, so it makes sense to consider the lambda side effects when looking at the effect of a call to it.

If we hit unexpected issues with this approach, I think we will need to add a concept to strongly highlight the parts where hlfir.elemental can be inlined with no issues (the program parts describes in 10.1.4). This may have other benefits outside of hlfir.apply/hlfir.elemental case, like providing the ability to share pointer fir.box loads if they appear in several places in a same expression, even when they escape in a call inside that expressions. But I am a bit unclear with how to best do this. So best to focus on getting HLFIR complete feature wise before doing this.

If you agree with this description, could you:

  • add the rule in italic above to hlfir.apply documentation, just so we do not forget about this?
  • remove the "NoMemoryEffect" of hlfir::ApplyOp (we can deal with adding an DeclareOpInterfaceMethods<MemoryEffectsOpInterface> that says it has the side effect of its producer later. I am not sure it will pessimize HLFIR performance at that point to just drop it for now).
This revision is now accepted and ready to land.Apr 26 2023, 10:06 AM
jeanPerier added inline comments.Apr 26 2023, 10:14 AM
flang/lib/Optimizer/HLFIR/Transforms/InlineElementals.cpp
70

I would still apply the suggestion here, although I agree there is no reason the result should change, the hlfir.apply are being rewritten (the hlfir.apply using an hlfir.elemental may have been inlined in another hlfir.elemental).
The logic to bail here in case of unforeseen bugs looks cheap.

tschuett added inline comments.Apr 26 2023, 11:17 AM
flang/lib/Optimizer/HLFIR/Transforms/InlineElementals.cpp
37

You could move getTwoUses above the anonymous namespace. With the Conversion and Pass classes you also diverge from the style guide.

tschuett added inline comments.Apr 26 2023, 1:54 PM
flang/lib/Optimizer/HLFIR/Transforms/InlineElementals.cpp
70

I am always surprised when you have a function returning an optional and you don't need to check for nullopt due to some side-channel knowledge. Why does it need to return an optional at all?

tblah retitled this revision from [flang][hlfir] Add pass to inline elementals to WIP: [flang][hlfir] Add pass to inline elementals.Apr 27 2023, 9:21 AM
tblah edited the summary of this revision. (Show Details)
tblah added a comment.Apr 27 2023, 9:31 AM

Thanks for the review so far. I have marked this WIP because I have realized it needs more work.

When compiling a long chain of elemental operations e.g.

subroutine reproducer(ain, aout)
  real, dimension(:) :: ain, aout
  aout = sqrt(ain * (ain - 1))
end subroutine

MLIR fails to erase one of the hlfir.elemental operations, thinking that there it is still used by a hlfir.apply. I believe original hlfir.apply will have been marked as erased so I suspect the erased status of operations in the body of one elemental is not carried over when those operations are cloned into another elemental. I suspect I am using MLIR wrong. I will update once I have a solution.

Removing any one of the elemental operations in the example above results in correct code. It does not matter which elemental operations are used.

flang/lib/Optimizer/HLFIR/Transforms/InlineElementals.cpp
70

Because on the first call to getTwoUses (to determine if the elemental operation is legal) it does need to be checked. getTwoUses was factored out to allow both of these calls to share the code.

I will check here with an assertion and write a test inlining elemental operations inside of elemental operations

Thanks for the review so far. I have marked this WIP because I have realized it needs more work.
...
MLIR fails to erase one of the hlfir.elemental operations, thinking that there it is still used by a hlfir.apply. I believe original hlfir.apply will have been marked as erased so I suspect the erased status of operations in the body of one elemental is not carried over when those operations are cloned into another elemental. I suspect I am using MLIR wrong. I will update once I have a solution.

One reason this could failed is that using the fir::FirOpBuilder in rewrite pass prevents the pass rewriter from being notified on new operation being created. This is "OK in many cases because we replace patterns by IR that do not contain these patterns. But here what could happen is that MLIR is not working top-down, and that one of the call to hlfir::inlineElementalOp clone an hlfir.apply that has not been rewritten yet. This new hlfir.apply, because it is created without notifying the pass rewritter, will not be considered for rewrites.

The solution here is to notify the pass rewriter through a fir::FirOpBuilder listener. I had to do it in bufferization: see https://github.com/llvm/llvm-project/blob/683a6e1c9e5396f64086c07bec334a38acd0ec7a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp#L501

If that is indeed the issue, you may want to move the HLFIRListener in a header to share it.

tblah updated this revision to Diff 519108.May 3 2023, 9:15 AM
tblah marked 6 inline comments as done.

Sorry for the delay in updating this.

Changes:

  • Inline by splicing instead of cloning to get around a limitation (or my missunderstanding) in MLIR. Add a test for this.
  • Better document assumptions made by the pass (TL;DR: the contents of the hlfir.elemental should be pure, which is already asserted by lowering code because hlfir.elemental's can be evaluated in arbitrary order).
  • Fix nits from review

The MLIR problem is that MLIR does not actually erase or rewrite operations when
instructed to do so during a pass. Instead it stores a log of all of these
changes then batches them at the end of the pass. This log is indexed by the
address of the mlir::Operation which is mutated. If I erase an operation (e.g.
a hlfir.yeild operation from an inlined hlfir.elemental), then clone the
instructions in that block into another hlfir.elemental, a new hlfir.yeild
operation with a different address is created and so MLIR looses track of the
fact that the source operation was erased. I work around this by splicing
operations from the linked list of one block to the destination block: avoiding
allocating any new operations.

I suspect the problem here is that I am using MLIR incorrectly. I asked about
this on the MLIR discord channel but got no responses. Feedback is welcome.

tblah retitled this revision from WIP: [flang][hlfir] Add pass to inline elementals to [flang][hlfir] Add pass to inline elementals.May 3 2023, 9:18 AM
tblah edited the summary of this revision. (Show Details)
tblah added a comment.May 9 2023, 5:58 AM

Ping for review on the changes, especially the use of splicing for inlining.

Thanks for finding the issue here!

After reading MLIR documentation a bit, I think we are using the wrong driver here. We are doing a canonicalization like pattern rewrite that we want fully applied before doing the next pattern. So the greedy-pattern-rewrite-driver is probably what is needed here ([1]). I tried to use it successfully with your previous code (see [2]), a few changes are needed in the expected output because the greedy driver does DCE and moves constant up in the regions.

[1] https://mlir.llvm.org/docs/PatternRewriter/#greedy-pattern-rewrite-driver
[2] https://github.com/jeanPerier/llvm-project/commit/d58fdcbe4cc2ab4d3065c3071a68412dc0442012

flang/test/HLFIR/inline-elemental.fir
85

Why did you use fir.array_load/fir.array_update/fir.array_merge_store operations here? Were they generated from some Fortran code with HLFIR enabled?

I do not think it is incorrect to have them used inside HLFIR, but it is not planned, so if they are currently generated somewhere in lowering to HLFIR, this could be a bug.

tblah updated this revision to Diff 522576.May 16 2023, 6:13 AM

Thanks for the suggestion - that's hugeley helpful. I think a lot of Flang's passes ought to be using that driver.

Changes: drop the list splicing hack and use the greedy pattern rewrite driver.

tblah added inline comments.May 16 2023, 6:30 AM
flang/test/HLFIR/inline-elemental.fir
85

Don't worry, I was just lazy when I wrote the test. Copying some FIR code to get a do_loop over an array.

jeanPerier accepted this revision.May 17 2023, 12:37 AM

Thanks!

flang/lib/Optimizer/HLFIR/Transforms/InlineElementals.cpp
70

What does rewriter.notifyMatchFailure do, I am not familiar with it?

tblah marked 2 inline comments as done.May 17 2023, 2:14 AM
tblah added inline comments.
flang/lib/Optimizer/HLFIR/Transforms/InlineElementals.cpp
70

It calls the notifyMatchFailure hook on the listener (which by default returns mlir::failure()), but api users can customize the listener to take other actions. For flang this will always inline to a mlir::failure(), but looking at upstream mlir and at the documentation [1] I think this is the idomatic thing to do.

[1] https://mlir.llvm.org/docs/PatternRewriter/

tblah marked an inline comment as done.May 17 2023, 2:16 AM

The CI failure looks like it is because of a CMake version mismatch.

The CI failure looks like it is because of a CMake version mismatch.

Yes, it is not related to your patch, all patches are failing with this since yesterday or so.

flang/lib/Optimizer/HLFIR/Transforms/InlineElementals.cpp
70

Thanks for the explanation.

This revision was landed with ongoing or failed builds.May 18 2023, 4:04 AM
This revision was automatically updated to reflect the committed changes.

Hi @tblah, could you please take a look at this regression?

subroutine test
  interface
     function func(i,j,k)
       character(5),allocatable :: func(:,:,:)
     end function func
  end interface
  character(13),allocatable :: a(:,:,:)
  print *, (func(2,5,3)//reshape([(char(ichar('a')+n),n=1,2*5*3)], &
       & [2,5,3]))
end subroutine test

It fails verification after InlineElementals:

error: loc("test.f90":8:12): 'hlfir.no_reassoc' op requires the same type for all operands and results

The problem operation is:

%58 = "hlfir.concat"(%56, %57, %1) : (!fir.ref<!fir.char<1,5>>, !fir.ref<!fir.char<1>>, index) -> !hlfir.expr<!fir.char<1,6>>
%59 = "hlfir.no_reassoc"(%58) : (!hlfir.expr<!fir.char<1,6>>) -> !hlfir.expr<!fir.char<1,?>>

Please let me know if I need to create a github issue.

tblah added a comment.May 23 2023, 3:03 AM

Hi @tblah, could you please take a look at this regression?

subroutine test
  interface
     function func(i,j,k)
       character(5),allocatable :: func(:,:,:)
     end function func
  end interface
  character(13),allocatable :: a(:,:,:)
  print *, (func(2,5,3)//reshape([(char(ichar('a')+n),n=1,2*5*3)], &
       & [2,5,3]))
end subroutine test

It fails verification after InlineElementals:

error: loc("test.f90":8:12): 'hlfir.no_reassoc' op requires the same type for all operands and results

The problem operation is:

%58 = "hlfir.concat"(%56, %57, %1) : (!fir.ref<!fir.char<1,5>>, !fir.ref<!fir.char<1>>, index) -> !hlfir.expr<!fir.char<1,6>>
%59 = "hlfir.no_reassoc"(%58) : (!hlfir.expr<!fir.char<1,6>>) -> !hlfir.expr<!fir.char<1,?>>

Please let me know if I need to create a github issue.

Thanks for reporting this. I have reproduced the issue and I will start work on this today.