This is an archive of the discontinued LLVM Phabricator instance.

[flang][hlfir] Allow expanding realloc assignments with scalar RHS.
Closed · Public

Authored by vzakhari on Sep 1 2023, 4:43 PM.

Details

Summary

F18 10.2.1.3 p. 3 states:
If the variable is an unallocated allocatable array, expr shall have the same rank.

So if LHS is an array and RHS is a scalar, then LHS must be allocated and
the assignment is performed according to F18 10.2.1.3 p. 5:
If expr is a scalar and the variable is an array,
the expr is treated as if it were an array of the same shape as the
variable with every element of the array equal to the scalar value of expr.
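
The two rules quoted above can be illustrated with a small standalone program. This is only a sketch: the array name is borrowed from the example in this summary, while the size `imax = 8`, the helper array `ref`, and the 0-based lower bound are assumptions made purely for illustration (free-form source is used here, unlike the fixed-form benchmark code). It checks that assigning a scalar to an already allocated array keeps the array's bounds and gives the same values as an explicit element-wise loop.

```fortran
program scalar_to_allocatable
  implicit none
  double precision, allocatable :: dudz(:), ref(:)
  integer :: i, imax

  imax = 8                     ! hypothetical size, for illustration only
  allocate(dudz(0:imax-1))     ! non-default lower bound, to show bounds are kept
  allocate(ref(0:imax-1))

  ! Scalar RHS, allocated array LHS: the scalar is broadcast to every
  ! element and no reallocation takes place (F2018 10.2.1.3 p. 5).
  dudz = 0d0

  ! The same effect written as an explicit element-wise loop.
  do i = lbound(ref, 1), ubound(ref, 1)
    ref(i) = 0d0
  end do

  ! Bounds and size are unchanged, and both arrays hold the same values.
  if (lbound(dudz, 1) /= 0 .or. size(dudz) /= imax) error stop "shape changed"
  if (any(dudz /= ref)) error stop "values differ"
  print *, "ok"
end program scalar_to_allocatable
```

Because the LHS must already be allocated in a conforming program, the whole-array assignment can be treated like the loop, with no reallocation logic.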

This resolves a performance regression in CPU2006/437.leslie3d caused
by extra Assign runtime calls for ALLOCATABLE local arrays.
Note that the extra calls themselves do not add overhead.
The problem is that the descriptor of the ALLOCATABLE is passed
to the Assign runtime function, and this breaks the points-to
analysis.

Example:

      ALLOCATABLE DUDX(:),DUDY(:),DUDZ(:)
...
      ALLOCATE( QS(IMAX-1),FSK(IMAX-1,0:KMAX,ND),
     >      QDIFFZ(IMAX-1), RMU(IMAX-1), EKCOEF(IMAX-1),
     >      DUDX(IMAX-1),DUDY(IMAX-1),DUDZ(IMAX-1),
...
      DUDZ=0D0
...
               DO I = I1, I2
                  DUDZ(I) =
     >                  DZI * ABD * ((U(I,J,KBD) - U(I,J,KCD)) +
     >                       8.0D0 * (U(I,J, KK) - U(I,J,KBD))) * R6I

When we do not lower DUDZ=0D0 to an Assign call, the base_addr of
DUDZ's descriptor is the result of malloc, and LLVM is able to figure out
that accesses through this base_addr cannot overlap with accesses to,
for example, the module (global) variable DZI. This enables CSE and LICM
for the loop, eventually resulting in clean vectorization.

When DUDZ's descriptor "escapes" to the Assign runtime function,
there are no guarantees about where base_addr can point.
I do not think this can be resolved by using any existing LLVM function/argument
attributes. Maybe we will be able to communicate the no-aliasing information
to LLVM using the Full Restrict Support representation.

For the purpose of enabling HLFIR by default, I am just aligning the IR
with what we have with FIR lowering.

Event Timeline

vzakhari created this revision. Sep 1 2023, 4:43 PM
vzakhari requested review of this revision. Sep 1 2023, 4:43 PM
tblah accepted this revision. Sep 4 2023, 2:35 AM

Thanks for this, LGTM!

flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp:463

For any other reviewers: see F2018 10.2.1.3 p. 3: "If the variable is an unallocated allocatable array, expr shall have the same rank". The checks below require that the rhs ("expr") has a trivial type (not an array) and that the lhs ("the variable") is an array. They therefore cannot have the same rank, so in a conforming program the lhs cannot be unallocated.

This revision is now accepted and ready to land. Sep 4 2023, 2:35 AM