This is an archive of the discontinued LLVM Phabricator instance.

[flang][hlfir] Apply NoReassocOp to ElementalOp result.
AbandonedPublic

Authored by vzakhari on Jun 29 2023, 6:20 PM.

Details

Reviewers
jeanPerier
tblah
Summary

By the current lowering, the parentheses operation results
in generation of NoReassocOp inside the ElementalOp (i.e.
the result of NoReassocOp is the one yielded by the ElementalOp).
This does not make much sense, because there is no reordering
opportunity for YieldElementOp operand-result. HLFIR seems to be more
clear when NoReassocOp is applied to the result of ElementalOp.
Moreover, with this representation the elemental inlining
is not trying to do illegal inlining, e.g.:

module mod
contains
  elemental subroutine sub(a,b)
    integer ,intent(out) :: a
    integer,intent(in) :: b
    a = b
  end subroutine
end
  use mod
  integer array1(3,2),expected(3,2)
  data array1(:,1)/1,2,3/,array1(:,2)/-1,-2,-3/
  expected = array1(3:1:-1,2:1:-1)
  call sub(array1,(array1(3:1:-1,2:1:-1)))
  if (any(array1/=expected)) then
    print *, 'Fail'
  else
    print *, 'OK'
  end if
end

Diff Detail

Event Timeline

vzakhari created this revision.Jun 29 2023, 6:20 PM
vzakhari requested review of this revision.Jun 29 2023, 6:20 PM

Thank you for this solution Slava, after thinking more about this issue, I think we may want to address this problem in another way. Rational:

  1. This does not fix all the issues with passing an elemental expressions to an impure function/subroutine: in call impure(a, b+1), I think we also need to fully evaluate b+1 before calling impure (impure could modify b).
  2. This may pessimize perf with kernels that uses parentheses for numerical stability: e.g.: A = (B+C) + D with A,B,C,D numerical arrays. This patch would cause an array temporary to be created for B+C. But the parentheses are not here to guarantee that B+C is fully evaluated before the other addition here, they are here to guarantee the addition order at the element level (hence the reason why the no.reassoc is inside the elemental, after elemental inlining, if any, it will preserve the order of additions at the element level).

I think the fix is actually to force the "evaluation/bufferization" of array expression actual arguments of impure elemental procedures (and pure elemental subroutine with intent(out) arguments) when placing the call. Any hlfir.expr argument of such functions could be placed in memory at the array level (before the loops) with an hlfir.associate.

Regarding 1. the behaviors of Fortran compilers is not unanimous. Nvfortran and ifort are actually failing the parentheses use case you are fixing (xlf, nagfor and gfortran gets it right). Also, the same compiler may behave differently if the elemental procedure is a subroutine or a function.... So before you try what I am suggesting, I think we need to clarify some expectations with regard to elemental procedures that are impure, or pure elemental subroutine that may modify there arguments.

@klausler, I would welcome your input on two questions here:

  • where does the standard rules if elemental arguments must be fully evaluated or not before the call?
  • Below is an illustration of a program that behaves in at least 4 different ways according to the compiler that compiles it.... Who is right?
module test
contains
  impure elemental integer function func(a,b)
    integer ,intent(out) :: a
    integer,intent(in) :: b
    print *, "func"
    a = b
    func = b
  end function
  impure elemental subroutine sub(a,b)
    integer ,intent(out) :: a
    integer,intent(in) :: b
    print *, "sub"
    a = b
  end subroutine
  impure elemental integer function identity(a)
    integer,intent(in) :: a
    print *, "identity"
    identity = a
  end function
  subroutine nop(i)
    integer :: i(4)
  end subroutine
end module

  use test
  integer :: i(4) = [1,2,3,4]
  print *, "test elemental subroutine:"
  call sub(i, identity(i(4:1:-1)))
  print *, i

  i = [1,2,3,4]
  print *, "test elemental function:"
  call nop(func(i, identity(i(4:1:-1))))
  print *, i
end

XLF: Fully evaluates impure elemental function/subroutine arguments before the call.

test elemental subroutine:
identity
identity
identity
identity
sub
sub
sub
sub
4 3 2 1
test elemental function:
identity
identity
identity
identity
func
func
func
func
4 3 2 1

Gfortran: weird, computes the "4 3 2 1" result in the subroutine case while still interleaving the elemental calls (I guess it is saving the inner elemental call argument).

test elemental subroutine:
identity
sub
identity
sub
identity
sub
identity
sub
          4           3           2           1
test elemental function:
identity
func
identity
func
identity
func
identity
func
          4           3           3           4

Nvfortran, ifort, and flang currently: simply ignore impure/intent(out) aspects.

test elemental subroutine:
identity
sub
identity
sub
identity
sub
identity
sub
4 3 3 4
test elemental function:
identity
func
identity
func
identity
func
identity
func
4 3 3 4

nagfor: Fully evaluate the argument in the subroutine case, but not the function case.

test elemental subroutine:
identity
identity
identity
identity
sub
sub
sub
sub
4 3 2 1
test elemental function:
identity
func
identity
func
identity
func
identity
func
4 3 3 4
tblah accepted this revision.EditedJun 30 2023, 2:36 AM

Good find! Thank you for fixing this. The changes LGTM. There's a test failure but it looks unrelated to this patch.

Edit: please address Jean's comment. This is much more complex than I thought!

This revision is now accepted and ready to land.Jun 30 2023, 2:36 AM

"In a reference to an elemental subroutine, if the actual arguments corresponding to INTENT (OUT) and INTENT (INOUT) dummy arguments are arrays, the values of the elements, if any, of the results are the same as would be obtained if the subroutine had been applied separately, in array element order, to corresponding elements of each array actual argument." (F'202X 15.9.3)

"When it is invoked, all actual argument expressions are evaluated, then the arguments are associated, and then the function is executed. When execution of the function is complete, the value of the function result is available for use in the expression that caused the function to be invoked." (15.5.3)

"When a subroutine is invoked, all actual argument expressions are evaluated, then the arguments are associated, and then the subroutine is executed." (15.5.4)

So: when an *impure* elemental is involved, I think that we're required to fully evaluate actual arguments before making any per-element call. And in the case of an impure elemental subroutine, the order of calls must follow the array element ordering of the INTENT(OUT or IN OUT) dummy array argument(s). For pure elementals, I think that we have more freedom to avoid temporaries and follow any convenient ordering, since a conforming program shouldn't be able to tell from side effects.

Thanks a lot for the answer @klausler!

@vzakhari, that confirms that the fix should be to force the evaluation of all impure elemental procedure arguments that are expressions (not only the parenthesized ones).
You can maybe loop over the "preparedActual" before creating the loops/hlfir.elemental after "mustBeOrdered" was set if the procedure is impure/their are intent(out) arguments, and place array hlfir.expr arguments in memory with hlfir.associate, and clean-that up with hlfir.end_associate after the call loop/hlfir.elemental.

vzakhari abandoned this revision.Jun 30 2023, 1:01 PM

Thanks a lot for the thorough explanation, Jean! I am testing the changes that you proposed currently. I verified that your test with my local Flang shows the same behavior as with XLF compiler.

I will abandon this patch and create a new one, once my testing is clean.