This is an archive of the discontinued LLVM Phabricator instance.

[flang][hlfir] add an optimized bufferization pass
ClosedPublic

Authored by tblah on Aug 4 2023, 8:11 AM.

Details

Summary

This pass is intended to spot cases where we can do better than the
default bufferization and to rewrite those specific cases. Then the
default bufferization (bufferize-hlfir pass) can handle everything else.

The transformation added in this patch rewrites simple element-wise
updates to an array to a do-loop modifying the array in place instead of
creating and assigning an array temporary.

See the RFC at
https://discourse.llvm.org/t/rfc-hlfir-optimized-bufferization-for-elemental-array-updates

This patch gets the improvement to exchange2 but not the improvement to cam4
described in the RFC. I think the cam4 improvement will require better alias
analysis. I aim to follow up to fix this in a later patch.

Since the RFC the design has diverged to better support cases where the array which is written to is not read from inside the elemental. This often causes the elemental to dominate the written-to array. See the description in the patch comments.

Depends on: D156805 D157106 D157718 D157626

Diff Detail

Event Timeline

tblah created this revision.Aug 4 2023, 8:11 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
Herald added a subscriber: mehdi_amini. · View Herald Transcript
tblah requested review of this revision.Aug 4 2023, 8:11 AM

May I suggest reorganizing the pass so that it can be extended to support more general case?

As I see it the main objective is to prove that we can pull the assignment up to the elemental operation that produces the RHS of the assignment. It may apply to elemental updates of the array both read and written in the same assignment statement, but it may also apply to cases like z = x + y, where completely different arrays are read/written (akin @tschuett's example in the RFC thread). I think we can use the alias analysis to collect the set of operations inside the elemental that conflict with the "write" implied by the assignment. And we can check if there are operations between the elemental and the assignment that conflict with the assignment's write (I think that the check cannot just look for "the array" being read, it has to use generic alias analysis) - if there is conflict, we stop. Otherwise, we proceed to analyzing the set of the operations from the elemental op: if the set is empty, we can just do the transformation; otherwise, we have to analyze the dependence distance vectors.

I think this restructuring is not that huge, but it may help resolving issues with redundant temporaries in polyhedron cases.

In future, I think we may try to reuse the affine loop fusion pass. If we unconditionally or heuristically inline the assignment loop[-nest], the problem becomes just a loop fusion task (given that appropriate aliasing replies are provided).

tschuett added a comment.EditedAug 7 2023, 11:06 PM

LLVM offers LoopAccessAnalysis for natural loops. If DO loops are thing, maybe introduce a DoLoopAccessAnalysis to drive the bufferization of such loops.

A[I-1] = B[I] + B[I+1];
B[I-1] = A[I-1] + B[I-2];

There are a lot of weird/complex assignments.

I would suggest focusing on the elemental assignments for this patch, since we are trying to address a critical performance issue with HLFIR lowering first. So generic do-loop optimizations seem to be out of the scope.

tblah updated this revision to Diff 549466.Aug 11 2023, 11:21 AM

Thanks for review

Changes:

  • reworked the side effect analysis so that cases where the written-to array value does not dominate the elemental
  • allow a case where the elemental and written-to array have different shape values but are known to be conformable because the assignment doesn't have the "realloc" attribute
  • create do-loop at the hlfir.assign instead of at the location of the elemental

With these changes, the pass can speed up polyhedron/channel2 by about 50%.
Even after this patch, channel2 is much slower with HLFIR than with the old FIR
lowering.

tblah edited the summary of this revision. (Show Details)Aug 11 2023, 11:41 AM
vzakhari added inline comments.Aug 11 2023, 2:50 PM
flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
195

nit: braces not needed

296

We also have to check that any memory that is read inside the elemental is not written by an operation between the elemental and the assign.

tblah updated this revision to Diff 549874.Aug 14 2023, 4:30 AM

Thanks for pointing that out Slava.

Changes:

  • fix braces
  • catch reads/writes which would be changed by moving the elemental to the assignment
vzakhari accepted this revision.Aug 14 2023, 12:19 PM

Just a couple of remarks. Looks great otherwise. Thank you, Tom!

flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
97
98
353

It should not matter for the trivial types, but temporary_lhs value must probably be taken from the original hlfir.assign.

This revision is now accepted and ready to land.Aug 14 2023, 12:19 PM
tblah marked 5 inline comments as done.Aug 15 2023, 3:33 AM
tblah added inline comments.
flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
143–155

Question to reviewers: (if this hack is acceptable) how much of a priority should it be to implement proper alias analysis of fir.load? Is this something nice to have some time in the future when another use case comes along, or would this be considered important for the correctness of this pass?

My opinion is that this is okay so long as we don't dramatically change the situations in which lowering produces elemental operations.

tblah updated this revision to Diff 550241.Aug 15 2023, 3:35 AM

Thanks for review

Changes: fix nits

Matt added a subscriber: Matt.Aug 15 2023, 8:41 PM
vzakhari added inline comments.Aug 16 2023, 11:40 AM
flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
143–155

I think this is acceptable for the time being.

I am not sure whether the alias analysis has to be specialized for fir.load. I think it is a question of aliasing of the result of hlfir.designate with its memref operand. Or maybe I am missing something.

tblah added inline comments.Aug 17 2023, 8:35 AM
flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
143–155

I was worried about something like

%ref = hlfir.designate %struct{"component"}
fir.store %something to %ref
fir.load %ref

But I guess, so long as %something has a trivial type (required by the pass), we can't introduce new aliases using the fir.store so at worst this would result in a false positive (if the store would have overwritten an aliasing access). The false positive would be slow but safe.

This revision was automatically updated to reflect the committed changes.