This is an archive of the discontinued LLVM Phabricator instance.

[flang][hlfir] Fixed associate-op codegen for optimized HLFIR.
ClosedPublic

Authored by vzakhari on Aug 21 2023, 5:24 PM.

Details

Summary

This effectively reverts D154715.

The issue appears as the dialect conversion error because we try to
erase an op that has already been erased. See the added LIT test case
with HLFIR that may appear as a result of CSE.
The adaptor.getSource() is an operation producing a tuple,
which does not have users, so allOtherUsesAreSafeForAssociate
just looks at the empty list of users. So we get completely wrong
answers from it. This causes problems with the following
eraseAllUsesInDestroys that tries to remove the DestroyOp twice
during both hflir.associate processing.

But we also cannot use associate.getSource() *efficiently*, because
the original users may still hang around: one example is the original body
of hlfir.elemental (see D154715), another example is other already converted
AssociateOp's that are pending removal in the rewriter
(that is why we have a temporary created for each hlfir.associate
in the newly added LIT case).

This patch just fixes the correctness issue. I think we have to separate
the buffer reuse analysis from the conversion itself.

I also tried to address the issues with the cloned bodies of hlfir.elemental,
but this should not matter since D155778: if hlfir.associate is inside
hlfir.elemental, it will end up inside a do-loop body region, so the early
exit added in D155778 will prevent the buffer reuse.

Diff Detail

Event Timeline

vzakhari created this revision.Aug 21 2023, 5:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 21 2023, 5:24 PM
vzakhari requested review of this revision.Aug 21 2023, 5:24 PM
tblah accepted this revision.Aug 22 2023, 2:38 AM

Thank you for picking up these bugs from adding CSE to the pipeline. This patch looks good to me. The comments are just questions and are not intended to block the patch.

flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
538

I wonder if using the conversion pattern rewriter driver is creating more trouble than it is worth in this patch. Personally, I find these sorts of things makes this pass very difficult to debug.

How would you feel about using the greedy pattern rewriter API instead? This way updates would be immediately visible.

flang/test/HLFIR/associate-codegen.fir
242

Have you observed performance regressions as a result of this?

If you have, I don't think that should block the patch; but it would be helpful to know.

This revision is now accepted and ready to land.Aug 22 2023, 2:38 AM
vzakhari updated this revision to Diff 552452.Aug 22 2023, 11:21 AM

I added the dominance check to avoid TODO failures on cases like this:

subroutine test(i)
  integer, intent(in) :: i(:, :)
  associate (j => transpose(i))
    if (j(1,1) /= 1) then
       stop
    end if
  end associate
end subroutine test
flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
538

The greedy pattern match driver usage is not straightforward either. One of the problems is that the operands are not converted before the users, so for the LIT case from flang/test/HLFIR/associate-codegen.fir the very first pattern rewrite (DestroyOpConversion) produces this:

"func.func"() <{arg_attrs = [{fir.bindc_name = "a"}], function_type = (!fir.boxchar<1>) -> (), sym_name = "_QPtest"}> ({
^bb0(%arg0: !fir.boxchar<1>):
...
  %3 = "hlfir.elemental"(%2, %0) ({
  ^bb0(%arg1: index):
    %6 = "fir.undefined"() : () -> !fir.ref<!fir.char<1>>
    "hlfir.yield_element"(%6) : (!fir.ref<!fir.char<1>>) -> ()
  }) {operandSegmentSizes = array<i32: 1, 0, 1>, unordered} : (!fir.shape<1>, index) -> !hlfir.expr<10x!fir.char<1>>
...
  "fir.if"(%3) ({
    %6 = "fir.convert"(%3) : (!hlfir.expr<10x!fir.char<1>>) -> !fir.heap<!fir.array<10x!fir.char<1>>>
    "fir.freemem"(%6) : (!fir.heap<!fir.array<10x!fir.char<1>>>) -> ()
    "fir.result"() : () -> ()
  }, {
  }) : (!hlfir.expr<10x!fir.char<1>>) -> ()
  "func.return"() : () -> ()
}) : () -> ()

I guess we can "legalize" the operand of DestroyOp before generating the if block with freemem, e.g. do something like this:

"func.func"() <{arg_attrs = [{fir.bindc_name = "a"}], function_type = (!fir.boxchar<1>) -> (), sym_name = "_QPtest"}> ({
^bb0(%arg0: !fir.boxchar<1>):
...
  %3 = "hlfir.elemental"(%2, %0) ({
  ^bb0(%arg1: index):
    %6 = "fir.undefined"() : () -> !fir.ref<!fir.char<1>>
    "hlfir.yield_element"(%6) : (!fir.ref<!fir.char<1>>) -> ()
  }) {operandSegmentSizes = array<i32: 1, 0, 1>, unordered} : (!fir.shape<1>, index) -> !hlfir.expr<10x!fir.char<1>>
...
  %tuple:2 = hlfir.unpack_expr %3 : (!hlfir.expr<10x!fir.char<1>>) -> !fir.ref<!fir.array<10x!fir.char<1>>>, i1
  "fir.if"(%tuple#1) ({
    %6 = "fir.convert"(%tuple#0) : (!fir.ref<!fir.array<10x!fir.char<1>>>) -> !fir.heap<!fir.array<10x!fir.char<1>>>
    "fir.freemem"(%6) : (!fir.heap<!fir.array<10x!fir.char<1>>>) -> ()
    "fir.result"() : () -> ()
  }, {
  }) : (i1) -> ()
  "func.return"() : () -> ()
}) : () -> ()

Then after hlfir.elemental has been replaced with a tuple producing fir.insert_value we would have to pattern-match and eliminate fir.insert_value->hlfir.unpack_expr chains.

But this does not seem to help to recognize the "allOtherUsesAreSafeForAssociate(): last use of hlfir.expr" at all: different users of the hlfir.elemental may introduce their own hlfir.unpack_expr, so there will be multiple uses of hlfir.elemental that we will have to chase starting from a hlfir.upack_expr operand of a hlfir.associate; and the users are "unrecognizable" after some conversions have happened (like above, the DestroyOp has become and if operation with type casts and freemem).

So I think pre-conversion analysis for deciding about the buffer reuse is the way to go here. I would like to discuss this more with @jeanPerier. In the meantime I will try to fix this for correctness.

flang/test/HLFIR/associate-codegen.fir
242

There are no regressions on exchange2, but I will run more benchmarks to be on the safe side.

This patch did not cause any performance regressions on x86 for SPEC CPU2000/2006/2017 or Polyhedron benchmarks.

tblah added inline comments.Aug 23 2023, 2:55 AM
flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
538

+1, thanks for trying this