This is an archive of the discontinued LLVM Phabricator instance.

[flang] stack arrays pass
ClosedPublic

Authored by tblah on Dec 20 2022, 9:16 AM.

Details

Summary

This pass implements the stack arrays RFC at https://reviews.llvm.org/D139617 - see the RFC document for more information. In short, this is a pass to move heap allocated array temporaries to the stack to implement the -fstack-arrays flag.

There are two cases of array temporary allocation which are not transformed by the current analysis. See the RFC for more information. I intend to merge stack arrays with the current design then visit these more complex cases later once I have a better understanding of the HLFIR changes.

In brief, this pass uses data flow analysis to detect heap allocations which are always freed within the same function as the allocation. If these allocations were added by flang (not by allocate statements in the source code) these allocations can be moved to the stack. A single pass was chosen because of concerns that heap temporaries are generated in many places within flang and so it would be difficult to maintain changes at all of those locations going forward. Data flow analysis is needed to detect cases where allocations may not always be freed, depending upon runtime control flow. The RFC provides a full documentation of the design.

Diff Detail

Event Timeline

tblah created this revision.Dec 20 2022, 9:16 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: mehdi_amini. · View Herald Transcript
tblah requested review of this revision.Dec 20 2022, 9:16 AM
clementval added inline comments.Dec 21 2022, 1:15 AM
flang/lib/Optimizer/Builder/MutableBox.cpp
734

Can you define the attr in the central place so we don't hardcode the name here and in the passes.

https://github.com/llvm/llvm-project/blob/main/flang/include/flang/Optimizer/Dialect/FIRAttr.h
or
https://github.com/llvm/llvm-project/blob/main/flang/include/flang/Optimizer/Dialect/FIRAttr.td

flang/lib/Optimizer/Transforms/StackArrays.cpp
99

virtual is not necessary.

tblah updated this revision to Diff 484595.Dec 21 2022, 8:38 AM
tblah edited the summary of this revision. (Show Details)

Thanks for review.

The new version of the patch follows the advice in review comments and moves the analysis stage of the pass behind mlir::Pass::getAnalysis.

tblah marked 2 inline comments as done.Dec 21 2022, 8:38 AM
tblah updated this revision to Diff 485646.EditedDec 29 2022, 2:30 PM
tblah edited the summary of this revision. (Show Details)

Changes in the new patch version

  • Calculate allocmem insertion point in the analysis stage so that we can catch all failures during analysis and then succeed when making real code changes
  • Pass the lattice through fir.call operations. This is important so that we move allocations for array temporaries created for function arguments. It is safe to assume that functions do not change the allocation status of the allocations we are tracking because we only track allocations for temporaries created by flang.
  • Add some tests. The last three tests in stack-arrays.f90 don't check anything useful yet: they are placeholders for when the stack arrays can handle these cases correctly.
tblah updated this revision to Diff 486214.Jan 4 2023, 2:39 AM
tblah retitled this revision from [flang] WIP: stack arrays pass to [flang] stack arrays pass.
tblah edited the summary of this revision. (Show Details)

Un-WIP'ed, removed tests for the two unsupported cases (see the RFC for more information).

This patch is ready for review.

tblah updated this revision to Diff 486218.Jan 4 2023, 2:45 AM

Update patch again because I missed a test removal in the last update

The pass implementation is rather nice, thanks for this !

I am wondering about its cost compared to a solution that would apply the option when creating the allocations in lowering in a centralized FirOpBuilder helper to create array temps. Do you have any idea of the complexity of this pass? e.g. is it linear with the number of blocks/FIR instruction?

Otherwise, I have a few small comments inlined.

flang/lib/Optimizer/Transforms/StackArrays.cpp
206

I do not really get why the allocation state is "Allocated" in this case. Do you have an example ?

From what I understand, Unknown was obtained by joining a Freed and Allocated state. So if we merge another Allocated state after this, there isn't there still a path where the status would be Freed ?

260

If the Fortran loop is unstructured (it has branches leaving the loop), lowering has to create a CFG blocks and creating stack allocation could still lead to stack explosion:

  integer, parameter :: k = 100, m=1000000, n = k*m
  integer :: x(n)
  logical :: has_error
  do i=0,m-1
    x(k*i+1:k*(i+1)) = x(k*(i+1):k*i+1:-1)
    if (has_error(x, k)) stop
  end do
end

Is there a way to detect that the block where the fir.alloca would be inserted may be its own successor and also be careful in this case ?

Also, instead of "not fulfilling" -fstack-arrays in those cases, another solution could be to generate stack save / stack restore LLVM intrinsic calls (like at https://github.com/llvm/llvm-project/blob/a8234196c58396c0505ac93983dafee743a67b11/flang/lib/Lower/ConvertCall.cpp#L170). I am not sure though if it would be desirable inside OpenMP loops.

442

OpenMP also needs allocation to be pinned inside openmp region so that they can be outlined. The best would be use FirOpBuilder::getAllocaBlock somewhere for this (see https://github.com/llvm/llvm-project/blob/a1fae71f85994858e402a1fc0ed4d68c46b0a57c/flang/lib/Optimizer/Builder/FIRBuilder.cpp#L198.

tblah added a comment.Jan 5 2023, 5:57 AM

I am wondering about its cost compared to a solution that would apply the option when creating the allocations in lowering in a centralized FirOpBuilder helper to create array temps. Do you have any idea of the complexity of this pass? e.g. is it linear with the number of blocks/FIR instruction?

It should be linear in the number of FIR instructions. When no allocations are found, each FIR operation will be visited once. If an allocation is found, some operations might be visited multiple times. The maximum number of times an operation is revisited is bounded by the maximum depth of possible state transitions.

flang/lib/Optimizer/Transforms/StackArrays.cpp
206

For example (I can't think of a way of writing this without double frees)

integer, allocatable :: arr(:)
logical :: b
...
! state for arr is {}
if (b) then
  allocate(a(n)) ! state for arr is alllocated
endif

! state for arr is allocated

if (b) then
  deallocate(arr) ! state for arr is freed
endif

! state for arr is join(allocated, freed) = unknown

if (!b) then
  deallocate(arr) ! state for arr is freed
endif

! state for arr is join(unknown, freed) = unknown

if (!b) then
  allocate(arr(n)) ! state for arr is allocated
endif

! state for arr is join(unknown, allocated) = allocated

deallocate(arr(n)) ! state for arr is freed

The difference in handling between allocation and frees is because it is safe to allocate memory which may not otherwise have been allocated (e.g. moving a heap allocation inside an if statement to a stack allocation at function scope), but it is not safe to free memory which may not otherwise have been freed (as it might be used later in execution) - for example moving some memory which is conditionally freed to a stack allocation which is invalid after the current function returns.

I think this is largely academic because in practice, once things are in different blocks, the allocation and free are likely to end up using a different SSA value to refer to the pointer and so the current analysis will not be clever enough to realize the same memory is refereed to. But it is important we get this right in case SSA value aliasing (e.g. via fir.result) is added later.

260

Thanks for the information. I can confirm the issue you suggest with CFG blocks.

tblah added inline comments.Jan 5 2023, 5:57 AM
flang/lib/Optimizer/Transforms/StackArrays.cpp
206

Thinking about this, I think it is incorrect to allow both the unknown -> allocated and allocated -> unknown, as these could in-theory loop forever without converging. I will update the patch to fix this.

tblah added inline comments.Jan 5 2023, 8:28 AM
flang/lib/Optimizer/Transforms/StackArrays.cpp
206

I now believe the analysis would still terminate because the original {} -> allocated step can only happen at a fir.allocmem operation. After looping, that same operation will now transition state unknown -> allocated. The lattice immediately after the fir.allocmem statement will be the same in both cases so at that point the analysis will have converged.

peixin added a comment.Jan 5 2023, 5:26 PM

I am wondering about its cost compared to a solution that would apply the option when creating the allocations in lowering in a centralized FirOpBuilder helper to create array temps. Do you have any idea of the complexity of this pass? e.g. is it linear with the number of blocks/FIR instruction?

It should be linear in the number of FIR instructions. When no allocations are found, each FIR operation will be visited once. If an allocation is found, some operations might be visited multiple times. The maximum number of times an operation is revisited is bounded by the maximum depth of possible state transitions.

Maybe it's worth testing SPEC 2017 (and SNAP?) about the compilation time increase and the performance improvement with -fstack-arrays?

tblah added a comment.Jan 6 2023, 3:33 AM

Maybe it's worth testing SPEC 2017 (and SNAP?) about the compilation time increase and the performance improvement with -fstack-arrays?

SPEC 2017 compilation times are vary by around 1% with and without stack arrays. I haven't repeated the measurement so that is probably just measurement error.

I see no change in runtime performance improvement with SPEC 2017. The pass does make modifications to generated code, especially in cam4. In our previous analysis of cam4 performance we did not see memory allocation and deallocation taking a significant proportion of runtime so this is plausible. The importance of -fstack-arrays is mostly because people felt it was important that -Ofast on Flang makes similar changes to -Ofast on other Fortran compilers.

tblah updated this revision to Diff 488685.Jan 12 2023, 9:02 AM
  • Do not move allocations outside of openmp regions
  • Detect loops in the control flow graph
  • Attempt to use llvm.stacksave/llvm.stackrestore to allow stack allocations inside of loops
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
tblah updated this revision to Diff 488914.Jan 13 2023, 2:10 AM

Updating patch context

jeanPerier added inline comments.Jan 17 2023, 2:06 AM
flang/lib/Optimizer/Transforms/StackArrays.cpp
105

It's better to negate the == operator here so that the implementation logic cannot diverge.

121

It is definitely weird to me to have this in the lattice points. It seems expensive to save this at every program point and to compute it every time a state changes on a maybe not final candiate.

Why not computing in StackArraysAnalysisWrapper::analyseFunction on the final candidates (the value in stateMap at that are freed on all return paths) ?

273

This is still odd to me because this breaks the monocity requirement of join:

join(join(freed, unknown), allocated) ) = join(unknown, allocated) = allocated

while

join(freed, join(unknown, allocated)) = join(freed, allocated) = unknown

I still do not think you need anything special here given the fact that an allocation done on a path is considered in the end already seems accounted for in LatticePoint::join since the state is added even if not present in the other latice.

334

As mentioned in my other comment above, I do not get why the insertion point is computed at that point while it seems the analysis (after computing the states, and using the lattice state at the func.return) is not over for the function (I would expect insertion to be computed only for the successfully identified allocmem at the end, not the one that may be candidate on one code path).

461

I think this is not correct: It seems this will consider every FreememOp that could be paired with an allocmem as candidate:

func.func @test(%cdt: i1) {
  %a = fir.allocmem !fir.array<1xi8>
  scf.if %cdt {
    fir.freemem %a : !fir.heap<!fir.array<1xi8>>
  } else {
  }
  return
}

Why not considering the func.return lattice states instead ?

Note that it seems fir.if is not considered a branch operation, so the state of its blocks are reset on entry. That is why scf.if is needed to create a test exposing the issue. Maybe fir.if op should be given the right interface, but that is another topic.

510

Where is blockIsInLoop defined ?

tblah updated this revision to Diff 489753.Jan 17 2023, 3:49 AM
tblah marked 7 inline comments as done.
  • Implement operator!= as !(operator==)
  • Move insertion point computation to StackArraysAnalysisWrapper::analyseFunction
  • Remove special-casing for join(allocated, unknown)
  • Add processing of fir.result to AllocationAnalysis::visitOperation so that lattices are propagated out of nested blocks
  • Walk function returns not freememops
  • Add a test checking that the data flow analysis gives correct results for scf.if
flang/lib/Optimizer/Transforms/StackArrays.cpp
121

Good idea. Thanks!

461

Good spot! To get analysis working with this change I've had to add processing of fir.result operations. These will join the parent operation's lattice with the fir.result.

510
tblah updated this revision to Diff 490810.Jan 20 2023, 6:30 AM

Fix newly added tests

tblah added a comment.Jan 24 2023, 3:46 AM

Ping for review

Thanks for all the updates. This looks functionally correct to me. Since I am not very familiar with this kind of analysis and transformation, it would be great if another reviewer could give his/her opinion. But otherwise, given this solution is well isolated from a code point of view and can be turned and on/off easily, I'll be glad to approve it.

flang/lib/Optimizer/Transforms/StackArrays.cpp
353

I think the early return may be missing here.

447

nit: MLIR/LLVM coding style do not use {} for single line if.

643

If this case must succeed when the other failed, it may be better to place it in an else { and assert that a block was obtained, so that it is certain that the insertion point was correctly set when looking at this code.

tblah updated this revision to Diff 492044.Jan 25 2023, 2:41 AM
tblah marked 5 inline comments as done.
  • Add missing early return for allocations not for arrays
  • Remove braces from if statement with a single statement in its body
  • Assert that a correct insertion point is found for the alloca
flang/lib/Optimizer/Transforms/StackArrays.cpp
353

Thanks, good spot!

Quick questions, and they might not apply here since you seem to only look at explicit Flang generated values, right?
Are you handling unwinding/exceptions, especially in-between the allocation and deallocation?
Are you handling non-accessible stacks (e.g., on GPUs) for escaping pointers?
Do you check the size to (reasonably) ensure we don't overflow the stack?
Are you trying to move the alloca into the entry, if possible?

Did you check LLVM's heap2stack and the corresponding tests?
https://github.com/llvm/llvm-project/blob/c68af565ff0c2fdc5537e9ac0c2d7c75df44b035/llvm/lib/Transforms/IPO/AttributorAttributes.cpp#L6480
https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/Attributor/heap_to_stack.ll
https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/Attributor/heap_to_stack_gpu.ll

tblah added a comment.Jan 25 2023, 8:29 AM

Thanks for taking a look, see my responses inline. For more information, the RFC is at https://reviews.llvm.org/D139617

Quick questions, and they might not apply here since you seem to only look at explicit Flang generated values, right?

Yes only heap allocations added by flang are considered. allocate statements in source code are not changed.

Are you handling unwinding/exceptions, especially in-between the allocation and deallocation?

There is no special handling for exceptions.

Are you handling non-accessible stacks (e.g., on GPUs) for escaping pointers?

I am not. I am unfamiliar with this area, do you have any suggestions?

Do you check the size to (reasonably) ensure we don't overflow the stack?

This pass avoids placing stack allocations inside loops, but does not check the size of the stack allocations themselves. In general, Flang will place local arrays of any size on the stack. These allocations can be moved to the heap using the MemoryAllocationOpt pass. In https://reviews.llvm.org/D140972 I made that pass mutually exclusive with this one, but so far as I know, it should be possible to run MemoryAllocaitonOpt after this one to move some of the temporary allocations back to the heap again. Note: you have to set non-default options for the MemoryAllocationOpt pass to move any allocations.

Are you trying to move the alloca into the entry, if possible?

Yes

Did you check LLVM's heap2stack and the corresponding tests?
https://github.com/llvm/llvm-project/blob/c68af565ff0c2fdc5537e9ac0c2d7c75df44b035/llvm/lib/Transforms/IPO/AttributorAttributes.cpp#L6480
https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/Attributor/heap_to_stack.ll
https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/Attributor/heap_to_stack_gpu.ll

No I have not seen that. I will take a look, thank you.

tblah added a comment.Jan 26 2023, 3:41 AM

The LLVM pass seems to make quite different design decisions to this pass. The LLVM pass does limit the maximum size of allocations moved to the stack, but does not attempt to avoid placing allocations inside of loops (and does not seem to attempt to move allocations to the entry block). The LLVM pass also supports exceptions, which this pass does not (as there are no exceptions in Fortran).

There is also a similar MLIR pass (promote buffers to stack). The MLIR pass operates on the memref dialect, which is a slightly different problem space because there are no explicit free() instructions to detect. Furthermore, the MLIR pass does not attempt to hoist allocations outside of loops and only detects structured loop operations (LoopLikeOpInterface) not loops formed from branch operations in the control flow graph.

Thanks for this patch @tblah. I had a first look. See comments inline. Have not gone through the core part in full yet.

flang/lib/Optimizer/Transforms/StackArrays.cpp
434

Do we have a test with multiple returns?

536

The op might require a check before further use.

See the following test from arrexp.fir. (run with ./bin/tco f4.fir)

func.func @f4(%a : !fir.ref<!fir.array<?x?xf32>>, %b : !fir.ref<!fir.array<?x?xf32>>, %n : index, %m : index, %o : index, %p : index, %f : f32) {
  %c1 = arith.constant 1 : index
  %s = fir.shape_shift %o, %n, %p, %m : (index, index, index, index) -> !fir.shapeshift<2>
  %vIn = fir.array_load %a(%s) : (!fir.ref<!fir.array<?x?xf32>>, !fir.shapeshift<2>) -> !fir.array<?x?xf32>
  %wIn = fir.array_load %b(%s) : (!fir.ref<!fir.array<?x?xf32>>, !fir.shapeshift<2>) -> !fir.array<?x?xf32>
  %r = fir.do_loop %j = %p to %m step %c1 iter_args(%v1 = %vIn) -> !fir.array<?x?xf32> {
    %r = fir.do_loop %i = %o to %n step %c1 iter_args(%v = %v1) -> !fir.array<?x?xf32> {
      %x2 = fir.array_fetch %vIn, %i, %j : (!fir.array<?x?xf32>, index, index) -> f32
      %x = fir.array_fetch %wIn, %i, %j : (!fir.array<?x?xf32>, index, index) -> f32
      %y = arith.addf %x, %f : f32
      %y2 = arith.addf %y, %x2 : f32
      %i2 = arith.addi %i, %c1 : index
      %r = fir.array_update %v, %y2, %i2, %j : (!fir.array<?x?xf32>, f32, index, index) -> !fir.array<?x?xf32>
      fir.result %r : !fir.array<?x?xf32>
    } 
    fir.result %r : !fir.array<?x?xf32>
  }
  fir.array_merge_store %vIn, %r to %a : !fir.array<?x?xf32>, !fir.array<?x?xf32>, !fir.ref<!fir.array<?x?xf32>>
  return
}
621–623

Nit: Braces not required.

631–639

Nit: Move all these close to the creation of the fir:AllocaOp.

641–642

Nit: Use braces for the if block to keep it uniform with the else block.

641–643

Nit: Use braces here to match else.

697–698

From the following code, it seems the functions are processed independently. Can this be a Function pass?

flang/test/Transforms/stack-arrays.f90
136 ↗(On Diff #492044)

Nit: Remove usage of %0.

flang/test/Transforms/stack-arrays.fir
39–49 ↗(On Diff #492044)

Would it be better to capture the variables and check? At least the allocmem and freemem.

203 ↗(On Diff #492044)

Is this a case for future improvement?

tblah updated this revision to Diff 493692.Jan 31 2023, 11:26 AM
tblah marked 10 inline comments as done.

Thanks for review.

Changes:

  • Join the lattices at each return operation to ensure that values are freed at *all* returns, not only *some* return
  • Add tests with multiple return operations
  • Fix nits
flang/lib/Optimizer/Transforms/StackArrays.cpp
434

Thanks for this. It turned out I needed to join across all of the lattices at the return statements to ensure that values were returned at *all* return statements, not at *any* return statement.

697–698

It can't. fir::factory::getLlvm::getStackSave and fir::factory::getLlvmSatckRestore add function declarations to the module-level. If functions are processed in different threads, there is a race condition when the fir::builder first checks to see if the function already exists in the module and if not, adds it.

flang/test/Transforms/stack-arrays.fir
203 ↗(On Diff #492044)

Yes. This is an open TODO. I'll add a comment.

It should be possible to still do stack save/restore if the block containing the free is *always* executed after the memalloc. This might already be guaranteed by the data-flow analysis - I haven't thought enough about it. I haven't seen this happen in the allocations automatically generated by flang, so I don't think it is important to solve now.

Looks OK. I have a few questions and some minor comments inline. It might be good to pull in a bit of info from the RFC into the Summary, particularly saying why a dataflow analysis is necessary, what operations are involved in the analysis etc.

Could we have used the Dominance and PostDominance information to find out the Allocs and Frees that could have been replaced? I saw the following functions for individual Ops but not for the case where a set of ops dominates or post-dominates. So may be not with the existing infra.

bool DominanceInfo::properlyDominatesImpl(Operation *a, Operation *b
bool PostDominanceInfo::properlyPostDominates(Operation *a, Operation *b) {

I guess, we are not capturing the following because of different values.

module {
  func.func @dfa2(%arg0: i1) {
    cf.cond_br %arg0, ^bb1, ^bb2
  ^bb1:  // pred: ^bb0
    %a = fir.allocmem !fir.array<1xi8>
    cf.br ^bb3(%a : !fir.heap<!fir.array<1xi8>>)
  ^bb2:  // pred: ^bb0
    %b = fir.allocmem !fir.array<1xi8>
    cf.br ^bb3(%b : !fir.heap<!fir.array<1xi8>>)
  ^bb3(%0: !fir.heap<!fir.array<1xi8>>):  // 2 preds: ^bb1, ^bb2
    fir.freemem %0 : !fir.heap<!fir.array<1xi8>>
    return
  }
}
flang/lib/Optimizer/Transforms/StackArrays.cpp
437–439

Nit: No brace here

443

A comment here would be useful on why we need to look at the freed values only.

483–487

Nit: Braces might not be require here.

533

Might be worth checking whether we have a function for this in MLIR core.

546–548

Theoretically speaking, we can use the dominance info to determine whether one block dominates the other as well to handle cases like the following where we are finding the operands of func. But I guess that is probably not required.

b1:
x = opA
br b2
b2:
y = opB
br b3
b3:
z = func(x,y)
561

Do we have a test for this, and in general for the OpenMP handling?

573–575

Nit: No need for braces here.

594–598

Nit: braces are not required here.

697–698

Not for this patch: May be these can all be preinserted at the beginning of the pass pipeline and removed if not used at the end of the pass pipeline?

733

Nit: Is this error usually given in passes?

tblah updated this revision to Diff 494283.Feb 2 2023, 6:20 AM
tblah marked 8 inline comments as done.

Changes: fix nits from review

tblah added a comment.Feb 2 2023, 6:21 AM

Looks OK. I have a few questions and some minor comments inline. It might be good to pull in a bit of info from the RFC into the Summary, particularly saying why a dataflow analysis is necessary, what operations are involved in the analysis etc.

Could we have used the Dominance and PostDominance information to find out the Allocs and Frees that could have been replaced? I saw the following functions for individual Ops but not for the case where a set of ops dominates or post-dominates. So may be not with the existing infra.

bool DominanceInfo::properlyDominatesImpl(Operation *a, Operation *b
bool PostDominanceInfo::properlyPostDominates(Operation *a, Operation *b) {

I guess, we are not capturing the following because of different values.

module {
  func.func @dfa2(%arg0: i1) {
    cf.cond_br %arg0, ^bb1, ^bb2
  ^bb1:  // pred: ^bb0
    %a = fir.allocmem !fir.array<1xi8>
    cf.br ^bb3(%a : !fir.heap<!fir.array<1xi8>>)
  ^bb2:  // pred: ^bb0
    %b = fir.allocmem !fir.array<1xi8>
    cf.br ^bb3(%b : !fir.heap<!fir.array<1xi8>>)
  ^bb3(%0: !fir.heap<!fir.array<1xi8>>):  // 2 preds: ^bb1, ^bb2
    fir.freemem %0 : !fir.heap<!fir.array<1xi8>>
    return
  }
}

Yes we could have used Dominance and PostDominance information to find out if an allocation is always freed. I wasn't aware of mlir::DominanceInfo at the time I wrote this patch. As It is already written, I think the data flow analysis continues to be the correct approach because it will skip dead code (after constant propagation) and I suspect the worst case algorithmic complexity is better than computing dominance between each heap allocation and free.

Yes in that case we cannot detect that the allocation is freed because the free operates on a different SSA value to the allocations. This would have been a problem whether mlir::DominanceInfo or mlir::DataFlowAnalysis were used. I chose not to support allocations and frees using different SSA values as this would have added considerable complexity and is not necessary for the more common cases of Flang-generated allocations. See the RFC for details.

flang/lib/Optimizer/Transforms/StackArrays.cpp
533

Not that I can find. The MLIR verifier checks that all operation arguments properly dominate the operation, but this is done by comparing each in turn against the operation: no last operand is found.

I could use mlir::DominanceInfo to find when the last operand becomes available, which I guess would better handle the case where operands are defined in different blocks. But dominance only provides a partial ordering so there might be cases where domInfo.properlyDominates(arg1, arg2) == domInfo.properlyDominates(arg2, arg1) == false. Looking at the direct operation ordering only within the same block (as I do here) guarantees a total ordering relationship.

546–548

Thank you for pointing out mlir::DominanceInfo - I was not aware of that analysis. I propose we keep this pass as it is for now, to avoid adding more complexity where we don't have a concrete example of flang-generated allocations which need to support alloca arguments defined in different blocks.

561

When writing the tests I discovered that the data flow analysis does not propagate lattices into or out of an omp.section, so currently no allocations inside of an openmp secton will be moved to the stack.

I intend to handle this in a subsequent patch. In the meantime I have added a test to make sure that allocations in an openmp region are not moved.

733

Sorry I don't understand. What change are you requesting here?

tblah edited the summary of this revision. (Show Details)Feb 2 2023, 6:25 AM
tblah added inline comments.Feb 2 2023, 7:18 AM
flang/lib/Optimizer/Transforms/StackArrays.cpp
733

I've checked some other FIR passes and they all follow the same pattern.

kiranchandramohan accepted this revision.Feb 3 2023, 6:14 AM

LGTM. Please wait till end of day Monday before you submit to provide other reviewers with a chance to provide further comments or request changes.

You can consider inlining D141401 in this pass till it is merged.

This revision is now accepted and ready to land.Feb 3 2023, 6:14 AM
tblah updated this revision to Diff 495073.Feb 6 2023, 4:02 AM

Changes: inline mlir::blockIsInLoop

jeanPerier accepted this revision.Feb 7 2023, 12:14 AM

I do not have any further comments, thanks.