This is an archive of the discontinued LLVM Phabricator instance.

[flang] Initial support of allocate statement with source
ClosedPublic

Authored by peixin on Nov 10 2022, 7:33 PM.

Details

Summary

Support allocate statement with source in runtime version. The source
expression is evaluated only once for each allocate statement. When the
source expression has shape-spec, uses it for bounds. Otherwise, get
the bounds from the source expression. Get the length if the source
expression has deferred length parameter.

Diff Detail

Event Timeline

peixin created this revision.Nov 10 2022, 7:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 10 2022, 7:33 PM
peixin requested review of this revision.Nov 10 2022, 7:33 PM
  1. When I tried to support this feature using runtime, I found I need one temporary storage for non-contiguous variable, and that operation is actually one allocate-from-source-operation.
  1. This contains too many scenarios, and it is not easy to review from IR check. So should I add all the IR checks for the test cases in this patch. For now, I only add the IR check for scalar. Please notice this is RFC.
  1. I put the execution tests in https://github.com/llvm/llvm-project/issues/58927, and hope it can help the review process.
peixin retitled this revision from [flang][RFC] Initial support allocate statement with source to [flang][RFC] Initial support of allocate statement with source.Nov 13 2022, 5:09 PM

When I tried to support this feature using runtime, I found I need one temporary storage for non-contiguous variable, and that operation is actually one allocate-from-source-operation.

Could you be more specific here, which runtime did you try using, and why you need this copy exactly ? AllocatableAllocateSource signature is defined but not implemented, and that was the one I had in mind you would implement and use.

I am rather against having an inlined implementation for this using the current lowering/assignment lowering.

flang/lib/Lower/Allocatable.cpp
411

The lower bounds should be the same as sourceExv: "When an ALLOCATE statement is executed for an array with no allocate-shape-spec-list, the bounds of source-expr determine the bounds of the array." (9.7.1.2 6).

489

Won't this lead to re-evaluating the source-expr for each allocation object ? I think this is illegal if the source-expr may have side effects.

Could you be more specific here, which runtime did you try using, and why you need this copy exactly ?

When the subscript-triplet has stride, I need to copy the array section to one contiguous storage since I did not find the helper function to add the stride info. Is it allowed to use sm (memory stride in bytes) of CFI_dim_t to add it to handle this case?

AllocatableAllocateSource signature is defined but not implemented, and that was the one I had in mind you would implement and use.

I tried to use AllocatableAllocateSource, but I haven't figured out how to handle the stride info. And another problem is what if the source expression is non-pointer non-allocatable variables and elem_len (element size in bytes) in CFI_cdesc_t would be unknown.

flang/lib/Lower/Allocatable.cpp
489

Right, I shouldn't share the code of expression evaluating, and use ExvValue as argument.

Could you be more specific here, which runtime did you try using, and why you need this copy exactly ?

When the subscript-triplet has stride, I need to copy the array section to one contiguous storage since I did not find the helper function to add the stride info. Is it allowed to use sm (memory stride in bytes) of CFI_dim_t to add it to handle this case?

Why does it matters if the source expression has triplets, can you give an example ? I would expect you would use genExprBox() on the source expression and pass the descriptor value (fir.box) to the runtime regardless of what the expression is (they are likely some optimizations that one could think of depending on the what storage the source expression is evaluated in, e.g. in certain cases it could be moved into the allocate object, but let's keep this optimizations for when there is simple code able to yield correct code).

AllocatableAllocateSource signature is defined but not implemented, and that was the one I had in mind you would implement and use.

I tried to use AllocatableAllocateSource, but I haven't figured out how to handle the stride info. And another problem is what if the source expression is non-pointer non-allocatable variables and elem_len (element size in bytes) in CFI_cdesc_t would be unknown.

You do not really need to handle the stride info manually inside the runtime. I expect the code in AllocatableAllocateSource would be somehow similar to FortranAssign for the reallocation case (except in that for source allocation it should be an error if the allocatable/pointer is already allocated). See https://github.com/llvm/llvm-project/blob/main/flang/runtime/assign.cpp#L62.

Could you be more specific here, which runtime did you try using, and why you need this copy exactly ?

When the subscript-triplet has stride, I need to copy the array section to one contiguous storage since I did not find the helper function to add the stride info. Is it allowed to use sm (memory stride in bytes) of CFI_dim_t to add it to handle this case?

Why does it matters if the source expression has triplets, can you give an example ? I would expect you would use genExprBox() on the source expression and pass the descriptor value (fir.box) to the runtime regardless of what the expression is (they are likely some optimizations that one could think of depending on the what storage the source expression is evaluated in, e.g. in certain cases it could be moved into the allocate object, but let's keep this optimizations for when there is simple code able to yield correct code).

AllocatableAllocateSource signature is defined but not implemented, and that was the one I had in mind you would implement and use.

I tried to use AllocatableAllocateSource, but I haven't figured out how to handle the stride info. And another problem is what if the source expression is non-pointer non-allocatable variables and elem_len (element size in bytes) in CFI_cdesc_t would be unknown.

You do not really need to handle the stride info manually inside the runtime. I expect the code in AllocatableAllocateSource would be somehow similar to FortranAssign for the reallocation case (except in that for source allocation it should be an error if the allocatable/pointer is already allocated). See https://github.com/llvm/llvm-project/blob/main/flang/runtime/assign.cpp#L62.

Thanks a lot. I tried as you suggest. I don't need to handle the stride info manually. Using genExprBox should work for each case. Will change this PR with the runtime version of implementation later.

flang/lib/Lower/Allocatable.cpp
411

I am confused about this description when I read this. Specifically, the "determine" is not clear in the standard. It does not clearly state that the allocate object have the same bounds of the source-expr. For the following case:

program main
  integer, parameter :: n = 5
  integer :: a(2:n+1) = [1, 2, 3, 4, 5]
  call test1(n, a)
  call test2(n, a)
contains
  subroutine test1(n, a)
    integer, allocatable :: x1(:), x2(:), x3(:), x4(:)
    integer :: n, sss, a(n)
  
    allocate(x1, x2, source = a)
    allocate(x3, source = a(2:5:2))
    allocate(x4, source = a(2:5), stat=sss)
    print *, "allocatable: ", lbound(x1), ubound(x1)
    print *, "allocatable: ", lbound(x2), ubound(x2)
    print *, "allocatable: ", lbound(x3), ubound(x3)
    print *, "allocatable: ", lbound(x4), ubound(x4)
  end
  subroutine test2(n, a)
    integer, pointer :: x1(:), x2(:), x3(:), x4(:)
    integer :: n, sss, a(n)
  
    allocate(x1, x2, source = a)
    allocate(x3, source = a(2:5:2))
    allocate(x4, source = a(2:5), stat=sss)
    print *, "pointer: ", lbound(x1), ubound(x1)
    print *, "pointer: ", lbound(x2), ubound(x2)
    print *, "pointer: ", lbound(x3), ubound(x3)
    print *, "pointer: ", lbound(x4), ubound(x4)
  end
end

gfortran, ifort and classic-flang have the same output:

allocatable:             1            5
allocatable:             1            5
allocatable:             1            2
allocatable:             1            4
pointer:             1            5
pointer:             1            5
pointer:             1            2
pointer:             1            4
jeanPerier added inline comments.Nov 15 2022, 2:54 AM
flang/lib/Lower/Allocatable.cpp
411

I read the "determine" in standard as telling "the new bounds are the one of the source expression".
In your test, the lower bounds of the source expressions in the allocate are all ones (*), so the ifort, gfortran, and classic flang results do not invalidate this and looks good to me.

Here is an example where the lower bounds of the source expression are not ones:

  real :: src(2:11)
  real, allocatable :: x(:)
  allocate(x, source=src)
  print *, lbound(x), ubound(x)
end

gfortran, ifort and nvfortran all prints 2 11 as expected.

(*): The program you gave above as an example contains two things that are counterintuitive in Fortran (IMHO):

  • Your program seems to expect that the lower bounds of a in test1 and test2 are the one from their effective argument a in the main program (2). This is incorrect, assumed shapes only carry the extents and strides, but the lower bounds of the dummy, if not explicitly specified, are ones (F2018 8.5.8.3 point 3: "If lower-bound appears it specifies the lower bound; otherwise the lower bound is 1."). Only pointers and allocatable dummies convey the lower bounds from their effective arguments
  • Your program also seems to expect that the lower bounds of an expression that is an array section are the lower bounds of the triplet. But lower bounds of expressions that are not whole named variables or components are always ones (see LBOUND definition F2018 16.9.109 point 5: the "If DIM is present, ARRAY is a whole array [...], the result has a value equal to the lower bound for subscript DIM of ARRAY. Otherwise, if DIM is present, the result value is 1.").
peixin added inline comments.Nov 15 2022, 3:03 AM
flang/lib/Lower/Allocatable.cpp
411

You are right. I realized I was wrong just now. Thanks for the clarification.

For your test case, I used gcc 9.3.0, and it prints 1 10. It prints 2 11 when I use gcc 12.

jeanPerier added inline comments.Nov 16 2022, 12:47 AM
flang/lib/Lower/Allocatable.cpp
411

For your test case, I used gcc 9.3.0, and it prints 1 10. It prints 2 11 when I use gcc 12.

Funny, I only tried one of the latest version. They may have fixed it as a bug.

peixin updated this revision to Diff 475733.Nov 16 2022, 2:08 AM
peixin retitled this revision from [flang][RFC] Initial support of allocate statement with source to [flang] Initial support of allocate statement with source.
peixin edited the summary of this revision. (Show Details)
peixin added a reviewer: klausler.

Use the runtime version as suggested.

Please check #58927 for execution tests.

klausler added inline comments.Nov 16 2022, 9:44 AM
flang/include/flang/Runtime/assign.h
31

The comment doesn't adequately explain what this API does. What are the two Boolean arguments? What Fortran constructs would use this API instead of Assign()? Does this routine need to be in the API of the runtime support library at all, or is it only used internally?

flang/runtime/allocatable.cpp
102

Why is wasJustAllocated false? You just allocated it.

peixin updated this revision to Diff 476037.Nov 17 2022, 12:59 AM

Update runtime part to address the comments.

The comment doesn't adequately explain what this API does. What are the two Boolean arguments? What Fortran constructs would use this API instead of Assign()? Does this routine need to be in the API of the runtime support library at all, or is it only used internally?

The main difference of using Assign for allocate statement with source is that there is no realloc. I rethink about it. Add one more argument skipRealloc to Assign is more clear.

@peixin Are you planning more work on this patch or is this the final version to be reviewed?

@peixin Are you planning more work on this patch or is this the final version to be reviewed?

This is the final version to be reviewed.

clementval accepted this revision.Dec 2 2022, 12:51 AM

LGTM. Maybe wait for @klausler or @jeanPerier confirmation.

This revision is now accepted and ready to land.Dec 2 2022, 12:51 AM
peixin added a comment.Dec 2 2022, 7:37 PM

LGTM. Maybe wait for @klausler or @jeanPerier confirmation.

Sure. Thanks for the review.

jeanPerier added inline comments.Dec 6 2022, 12:18 AM
flang/lib/Lower/Allocatable.cpp
525

The lower bounds in the fir.box of a fir::BoxValue are unreliable, the code should use fir::factory::readLowerBound(builder, loc, sourceExv, i, one) to get it.
That's because local lower bounds are tracked in the extended value for the entity, not its fir.box. This will change with the lowering update because it is a bit tricky, but for now you should use readLowerBound.

flang/runtime/assign.cpp
167

Is there code relying on doing assignments with mismatching element numbers ? That seems a bit wild, and although you added a std::min to make it well defined, it seems to me it most likely is a user error and I am not sure we should close the eyes in the runtime call. In an inline implementation, skipping the complexity of emitting the checking code makes sense to me, but here, adding complexity to optionally avoid the check just to match ifort and gfortran default blindness seems overkill to me, unless of course, there is a use case motivating this.

peixin added inline comments.Dec 6 2022, 12:33 AM
flang/lib/Lower/Allocatable.cpp
525

Thanks a lot. Will fix this in next update.

flang/runtime/assign.cpp
167

I hit this kind of errors in the test cases of classic-flang. I didn't capture one case in real workloads since this check is not enabled by gfortran/ifort/classic-flang.

We can support the runtime check by default for now, and turn it back to the current settings when we hit the errors in real workloads. Do you prefer this?

klausler requested changes to this revision.Dec 6 2022, 10:58 AM
flang/include/flang/Runtime/assign.h
29–30

If this is not an external API for the runtime, then this header file is not the right place to put it.

flang/runtime/allocatable.cpp
103

I don't think that the runtime check should ever be disabled, and certainly not disabled by default.

flang/runtime/assign.cpp
196

Don't recompute min() on every iteration.

205

Don't recompute min() on every iteration.

215

Don't recompute min() on every iteration.

224

Don't recompute min() on every iteration.

258

Don't recompute min() on every iteration.

flang/runtime/pointer.cpp
146

-fcheck-bounds is for indexing. This runtime check should always be enabled.

This revision now requires changes to proceed.Dec 6 2022, 10:58 AM
peixin updated this revision to Diff 480882.Dec 7 2022, 6:05 AM
peixin edited the summary of this revision. (Show Details)

Address the comments:

  1. Use fir::factory::readLowerBound to get the lower bound.
  2. Remove the runtime check option.
  3. Move the internal assignment to one new file.
clementval added inline comments.Dec 7 2022, 6:39 AM
flang/runtime/assign-object.h
1 ↗(On Diff #480882)

Why do you need a new header file and cpp file for this?

peixin added inline comments.Dec 7 2022, 7:16 AM
flang/include/flang/Runtime/assign.h
29–30

@clementval I add the new header file and cpp file because of this comment. I noticed that flang/include/flang/Runtime/ is where the external APIs belongs to. The internal API can be placed to flang/runtime/. Do I misunderstand this comment?

peixin updated this revision to Diff 480902.Dec 7 2022, 7:17 AM

Rebase to fix patch application fail.

clementval added inline comments.Dec 8 2022, 1:20 AM
flang/include/flang/Runtime/assign.h
29–30

Why not keeping flang/runtime/assign.cpp and just adding flang/runtime/assign.h?

peixin updated this revision to Diff 481231.Dec 8 2022, 3:57 AM
peixin added inline comments.
flang/include/flang/Runtime/assign.h
29–30

Thanks. Fixed.

peixin updated this revision to Diff 487149.Jan 8 2023, 12:23 AM

Fix clang-format issue.

jeanPerier accepted this revision.Jan 12 2023, 9:38 AM

ping for review @jeanPerier @klausler

Hi @peixin, this looks good to me, thanks of the updates!

This revision was not accepted when it landed; it landed in state Needs Review.Jan 13 2023, 4:43 AM
This revision was automatically updated to reflect the committed changes.

How is the scalar source handled? Here is what I see:

% cat sa.f90
program a
  integer n
  integer, allocatable :: i(:)
  n = 42
  allocate(i(4), source=n)
end program a
% flang sa.f90
% a.out

fatal Fortran runtime error(./sa.f90:5): Assign: mismatched ranks (1 != 0) in assignment to unallocated allocatable
Aborted

How is the scalar source handled? Here is what I see:

% cat sa.f90
program a
  integer n
  integer, allocatable :: i(:)
  n = 42
  allocate(i(4), source=n)
end program a
% flang sa.f90
% a.out

fatal Fortran runtime error(./sa.f90:5): Assign: mismatched ranks (1 != 0) in assignment to unallocated allocatable
Aborted

Thanks @sscalpone for providing this case. The case of allocate array with scalar source is missed in this patch. Can I file a ticket for this case and support this in next patch?

Thanks @sscalpone for providing this case. The case of allocate array with scalar source is missed in this patch. Can I file a ticket for this case and support this in next patch?

Yes. Thank you!