This is an archive of the discontinued LLVM Phabricator instance.

[flang] stack arrays pass
ClosedPublic

Authored by tblah on Dec 20 2022, 9:16 AM.

Details

Summary

This pass implements the stack arrays RFC at https://reviews.llvm.org/D139617 - see the RFC document for more information. In short, this is a pass to move heap allocated array temporaries to the stack to implement the -fstack-arrays flag.

There are two cases of array temporary allocation which are not transformed by the current analysis. See the RFC for more information. I intend to merge stack arrays with the current design and then revisit these more complex cases once I have a better understanding of the HLFIR changes.

In brief, this pass uses data flow analysis to detect heap allocations which are always freed within the same function as the allocation. If these allocations were added by flang (not by allocate statements in the source code), they can be moved to the stack. A single pass was chosen because of concerns that heap temporaries are generated in many places within flang, and so it would be difficult to maintain changes at all of those locations going forward. Data flow analysis is needed to detect cases where allocations may not always be freed, depending upon runtime control flow. The RFC provides full documentation of the design.

Diff Detail

Event Timeline

tblah created this revision.Dec 20 2022, 9:16 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: mehdi_amini. · View Herald Transcript
tblah requested review of this revision.Dec 20 2022, 9:16 AM
clementval added inline comments.Dec 21 2022, 1:15 AM
flang/lib/Optimizer/Builder/MutableBox.cpp
734

Can you define the attr in a central place so we don't hardcode the name here and in the passes?

https://github.com/llvm/llvm-project/blob/main/flang/include/flang/Optimizer/Dialect/FIRAttr.h
or
https://github.com/llvm/llvm-project/blob/main/flang/include/flang/Optimizer/Dialect/FIRAttr.td

flang/lib/Optimizer/Transforms/StackArrays.cpp
100

virtual is not necessary.

tblah updated this revision to Diff 484595.Dec 21 2022, 8:38 AM
tblah edited the summary of this revision. (Show Details)

Thanks for review.

The new version of the patch follows the advice in the review comments and moves the analysis stage of the pass behind mlir::Pass::getAnalysis.
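
As a hedged illustration of that caching pattern (the class and pass names below are invented; only getAnalysis<> and the pass scaffolding are the real MLIR API):

#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/Pass.h"

// Hypothetical analysis: MLIR only requires a constructor taking the operation
// the pass runs on; the result is cached for the duration of the pass run.
struct CandidateAnalysis {
  explicit CandidateAnalysis(mlir::Operation *op) {
    // walk 'op' here and record heap allocations that are always freed
  }
};

struct ExamplePass
    : public mlir::PassWrapper<ExamplePass, mlir::OperationPass<mlir::ModuleOp>> {
  void runOnOperation() override {
    // Constructed on first request, then cached by the pass manager.
    CandidateAnalysis &candidates = getAnalysis<CandidateAnalysis>();
    (void)candidates; // the rewrite stage would consume the cached results here
  }
};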

tblah marked 2 inline comments as done.Dec 21 2022, 8:38 AM
tblah updated this revision to Diff 485646.EditedDec 29 2022, 2:30 PM
tblah edited the summary of this revision. (Show Details)

Changes in the new patch version

  • Calculate the allocmem insertion point in the analysis stage so that we can catch all failures during analysis and then succeed when making real code changes
  • Pass the lattice through fir.call operations. This is important so that we move allocations for array temporaries created for function arguments. It is safe to assume that functions do not change the allocation status of the allocations we are tracking, because we only track allocations for temporaries created by flang.
  • Add some tests. The last three tests in stack-arrays.f90 don't check anything useful yet: they are placeholders for when the stack arrays pass can handle these cases correctly.
tblah updated this revision to Diff 486214.Jan 4 2023, 2:39 AM
tblah retitled this revision from [flang] WIP: stack arrays pass to [flang] stack arrays pass.
tblah edited the summary of this revision. (Show Details)

Un-WIP'ed, removed tests for the two unsupported cases (see the RFC for more information).

This patch is ready for review.

tblah updated this revision to Diff 486218.Jan 4 2023, 2:45 AM

Update patch again because I missed a test removal in the last update

The pass implementation is rather nice, thanks for this !

I am wondering about its cost compared to a solution that would apply the option when creating the allocations in lowering in a centralized FirOpBuilder helper to create array temps. Do you have any idea of the complexity of this pass? e.g. is it linear with the number of blocks/FIR instruction?

Otherwise, I have a few small comments inlined.

flang/lib/Optimizer/Transforms/StackArrays.cpp
206

I do not really get why the allocation state is "Allocated" in this case. Do you have an example?

From what I understand, Unknown was obtained by joining a Freed and an Allocated state. So if we merge another Allocated state after this, isn't there still a path where the status would be Freed?

260

If the Fortran loop is unstructured (it has branches leaving the loop), lowering has to create CFG blocks, and creating stack allocations could still lead to stack explosion:

  integer, parameter :: k = 100, m=1000000, n = k*m
  integer :: x(n)
  logical :: has_error
  do i=0,m-1
    x(k*i+1:k*(i+1)) = x(k*(i+1):k*i+1:-1)
    if (has_error(x, k)) stop
  end do
end

Is there a way to detect that the block where the fir.alloca would be inserted may be its own successor, and to be careful in this case?

Also, instead of "not fulfilling" -fstack-arrays in those cases, another solution could be to generate stack save / stack restore LLVM intrinsic calls (like at https://github.com/llvm/llvm-project/blob/a8234196c58396c0505ac93983dafee743a67b11/flang/lib/Lower/ConvertCall.cpp#L170). I am not sure though if it would be desirable inside OpenMP loops.
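
As an illustration of the loop-detection idea raised here, a hedged C++ sketch (using only the standard MLIR Block successor API; the helper name is invented and this is not the check the patch ends up using):

#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "mlir/IR/Block.h"

// A block is part of a CFG cycle if it is reachable again from one of its own
// successors; a simple worklist traversal is enough to detect that.
static bool blockCanReachItself(mlir::Block *start) {
  llvm::SmallPtrSet<mlir::Block *, 16> visited;
  llvm::SmallVector<mlir::Block *, 16> worklist(start->succ_begin(),
                                                start->succ_end());
  while (!worklist.empty()) {
    mlir::Block *block = worklist.pop_back_val();
    if (block == start)
      return true;
    if (!visited.insert(block).second)
      continue;
    worklist.append(block->succ_begin(), block->succ_end());
  }
  return false;
}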

442

OpenMP also needs allocations to be pinned inside the OpenMP region so that they can be outlined. The best would be to use FirOpBuilder::getAllocaBlock somewhere for this (see https://github.com/llvm/llvm-project/blob/a1fae71f85994858e402a1fc0ed4d68c46b0a57c/flang/lib/Optimizer/Builder/FIRBuilder.cpp#L198).

tblah added a comment.Jan 5 2023, 5:57 AM

I am wondering about its cost compared to a solution that would apply the option when creating the allocations in lowering in a centralized FirOpBuilder helper to create array temps. Do you have any idea of the complexity of this pass? e.g. is it linear with the number of blocks/FIR instruction?

It should be linear in the number of FIR instructions. When no allocations are found, each FIR operation will be visited once. If an allocation is found, some operations might be visited multiple times. The maximum number of times an operation is revisited is bounded by the maximum depth of possible state transitions.

flang/lib/Optimizer/Transforms/StackArrays.cpp
206

For example (I can't think of a way of writing this without double frees)

integer, allocatable :: arr(:)
logical :: b
...
! state for arr is {}
if (b) then
  allocate(arr(n)) ! state for arr is allocated
endif

! state for arr is allocated

if (b) then
  deallocate(arr) ! state for arr is freed
endif

! state for arr is join(allocated, freed) = unknown

if (!b) then
  deallocate(arr) ! state for arr is freed
endif

! state for arr is join(unknown, freed) = unknown

if (!b) then
  allocate(arr(n)) ! state for arr is allocated
endif

! state for arr is join(unknown, allocated) = allocated

deallocate(arr) ! state for arr is freed

The difference in handling between allocations and frees is because it is safe to allocate memory which may not otherwise have been allocated (e.g. moving a heap allocation inside an if statement to a stack allocation at function scope), but it is not safe to free memory which may not otherwise have been freed (as it might be used later in execution) - for example, moving memory which is only conditionally freed to a stack allocation, which becomes invalid once the current function returns.

I think this is largely academic because in practice, once things are in different blocks, the allocation and free are likely to end up using different SSA values to refer to the pointer, and so the current analysis will not be clever enough to realize that the same memory is referred to. But it is important we get this right in case SSA value aliasing (e.g. via fir.result) is added later.

260

Thanks for the information. I can confirm the issue you suggest with CFG blocks.

tblah added inline comments.Jan 5 2023, 5:57 AM
flang/lib/Optimizer/Transforms/StackArrays.cpp
206

Thinking about this, I think it is incorrect to allow both the unknown -> allocated and allocated -> unknown transitions, as these could in theory loop forever without converging. I will update the patch to fix this.

tblah added inline comments.Jan 5 2023, 8:28 AM
flang/lib/Optimizer/Transforms/StackArrays.cpp
206

I now believe the analysis would still terminate, because the original {} -> allocated step can only happen at a fir.allocmem operation. After looping, that same operation will now transition the state unknown -> allocated. The lattice immediately after the fir.allocmem operation will be the same in both cases, so at that point the analysis will have converged.

peixin added a comment.Jan 5 2023, 5:26 PM

I am wondering about its cost compared to a solution that would apply the option when creating the allocations in lowering in a centralized FirOpBuilder helper to create array temps. Do you have any idea of the complexity of this pass? e.g. is it linear with the number of blocks/FIR instruction?

It should be linear in the number of FIR instructions. When no allocations are found, each FIR operation will be visited once. If an allocation is found, some operations might be visited multiple times. The maximum number of times an operation is revisited is bounded by the maximum depth of possible state transitions.

Maybe it's worth testing SPEC 2017 (and SNAP?) to measure the compilation-time increase and the runtime performance improvement with -fstack-arrays?

tblah added a comment.Jan 6 2023, 3:33 AM

Maybe it's worth testing SPEC 2017 (and SNAP?) to measure the compilation-time increase and the runtime performance improvement with -fstack-arrays?

SPEC 2017 compilation times vary by around 1% with and without stack arrays. I haven't repeated the measurement, so that is probably just measurement error.

I see no change in runtime performance with SPEC 2017. The pass does make modifications to generated code, especially in cam4. In our previous analysis of cam4 performance we did not see memory allocation and deallocation taking a significant proportion of the runtime, so this is plausible. The importance of -fstack-arrays is mostly that people felt it was important for -Ofast in Flang to make similar changes to -Ofast in other Fortran compilers.

tblah updated this revision to Diff 488685.Jan 12 2023, 9:02 AM
  • Do not move allocations outside of openmp regions
  • Detect loops in the control flow graph
  • Attempt to use llvm.stacksave/llvm.stackrestore to allow stack allocations inside of loops
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
tblah updated this revision to Diff 488914.Jan 13 2023, 2:10 AM

Updating patch context

jeanPerier added inline comments.Jan 17 2023, 2:06 AM
flang/lib/Optimizer/Transforms/StackArrays.cpp
105

It's better to negate the == operator here so that the implementation logic cannot diverge.
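
In other words, a minimal sketch (assuming a LatticePoint class that already defines operator==):

// Deriving != from == keeps the two comparisons from ever diverging.
bool LatticePoint::operator!=(const LatticePoint &other) const {
  return !(*this == other);
}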

121

It is definitely weird to me to have this in the lattice points. It seems expensive to save this at every program point and to compute it every time a state changes on a possibly non-final candidate.

Why not compute it in StackArraysAnalysisWrapper::analyseFunction on the final candidates (the values in stateMap that are freed on all return paths)?

273

This is still odd to me because this breaks the monotonicity requirement of join:

join(join(freed, unknown), allocated) = join(unknown, allocated) = allocated

while

join(freed, join(unknown, allocated)) = join(freed, allocated) = unknown

I still do not think you need anything special here: the fact that an allocation done on only one path is still considered in the end already seems accounted for in LatticePoint::join, since the state is added even if it is not present in the other lattice.
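
For illustration, a join without the special case could look like the following minimal sketch (a simplified three-state model, not the patch's actual types):

enum class AllocationState { Allocated, Freed, Unknown };

// Conflicting facts from two incoming paths collapse to Unknown, so the result
// does not depend on the order in which paths are merged.
static AllocationState join(AllocationState a, AllocationState b) {
  if (a == b)
    return a;
  return AllocationState::Unknown;
}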

334

As mentioned in my other comment above, I do not get why the insertion point is computed at this point, while it seems the analysis (after computing the states and using the lattice state at the func.return) is not yet over for the function (I would expect the insertion point to be computed only for the successfully identified allocmems at the end, not for one that may be a candidate on only one code path).

461

I think this is not correct: it seems this will consider every FreememOp that could be paired with an allocmem as a candidate:

func.func @test(%cdt: i1) {
  %a = fir.allocmem !fir.array<1xi8>
  scf.if %cdt {
    fir.freemem %a : !fir.heap<!fir.array<1xi8>>
  } else {
  }
  return
}

Why not consider the func.return lattice states instead?

Note that it seems fir.if is not considered a branch operation, so the states of its blocks are reset on entry. That is why scf.if is needed to create a test exposing the issue. Maybe the fir.if op should be given the right interface, but that is another topic.

510

Where is blockIsInLoop defined ?

tblah updated this revision to Diff 489753.Jan 17 2023, 3:49 AM
tblah marked 7 inline comments as done.
  • Implement operator!= as !(operator==)
  • Move insertion point computation to StackArraysAnalysisWrapper::analyseFunction
  • Remove special-casing for join(allocated, unknown)
  • Add processing of fir.result to AllocationAnalysis::visitOperation so that lattices are propagated out of nested blocks
  • Walk function returns not freememops
  • Add a test checking that the data flow analysis gives correct results for scf.if
flang/lib/Optimizer/Transforms/StackArrays.cpp
121

Good idea. Thanks!

461

Good spot! To get analysis working with this change I've had to add processing of fir.result operations. These will join the parent operation's lattice with the fir.result.

510
tblah updated this revision to Diff 490810.Jan 20 2023, 6:30 AM

Fix newly added tests

tblah added a comment.Jan 24 2023, 3:46 AM

Ping for review

Thanks for all the updates. This looks functionally correct to me. Since I am not very familiar with this kind of analysis and transformation, it would be great if another reviewer could give his/her opinion. But otherwise, given that this solution is well isolated from a code point of view and can be turned on/off easily, I'll be glad to approve it.

flang/lib/Optimizer/Transforms/StackArrays.cpp
353

I think the early return may be missing here.

447

nit: MLIR/LLVM coding style does not use {} for single-line ifs.

643

If this case must succeed when the other failed, it may be better to place it in an else block and assert that a block was obtained, so that it is certain that the insertion point was correctly set when looking at this code.

tblah updated this revision to Diff 492044.Jan 25 2023, 2:41 AM
tblah marked 5 inline comments as done.
  • Add missing early return for allocations not for arrays
  • Remove braces from if statement with a single statement in its body
  • Assert that a correct insertion point is found for the alloca
flang/lib/Optimizer/Transforms/StackArrays.cpp
353

Thanks, good spot!

Quick questions, and they might not apply here since you seem to only look at explicit Flang generated values, right?
Are you handling unwinding/exceptions, especially in-between the allocation and deallocation?
Are you handling non-accessible stacks (e.g., on GPUs) for escaping pointers?
Do you check the size to (reasonably) ensure we don't overflow the stack?
Are you trying to move the alloca into the entry, if possible?

Did you check LLVM's heap2stack and the corresponding tests?
https://github.com/llvm/llvm-project/blob/c68af565ff0c2fdc5537e9ac0c2d7c75df44b035/llvm/lib/Transforms/IPO/AttributorAttributes.cpp#L6480
https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/Attributor/heap_to_stack.ll
https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/Attributor/heap_to_stack_gpu.ll

tblah added a comment.Jan 25 2023, 8:29 AM

Thanks for taking a look, see my responses inline. For more information, the RFC is at https://reviews.llvm.org/D139617

Quick questions, and they might not apply here since you seem to only look at explicit Flang generated values, right?

Yes only heap allocations added by flang are considered. allocate statements in source code are not changed.

Are you handling unwinding/exceptions, especially in-between the allocation and deallocation?

There is no special handling for exceptions.

Are you handling non-accessible stacks (e.g., on GPUs) for escaping pointers?

I am not. I am unfamiliar with this area, do you have any suggestions?

Do you check the size to (reasonably) ensure we don't overflow the stack?

This pass avoids placing stack allocations inside loops, but it does not check the size of the stack allocations themselves. In general, Flang will place local arrays of any size on the stack. These allocations can be moved to the heap using the MemoryAllocationOpt pass. In https://reviews.llvm.org/D140972 I made that pass mutually exclusive with this one, but as far as I know it should be possible to run MemoryAllocationOpt after this one to move some of the temporary allocations back to the heap again. Note: you have to set non-default options for the MemoryAllocationOpt pass to move any allocations.

Are you trying to move the alloca into the entry, if possible?

Yes

Did you check LLVM's heap2stack and the corresponding tests?
https://github.com/llvm/llvm-project/blob/c68af565ff0c2fdc5537e9ac0c2d7c75df44b035/llvm/lib/Transforms/IPO/AttributorAttributes.cpp#L6480
https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/Attributor/heap_to_stack.ll
https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/Attributor/heap_to_stack_gpu.ll

No I have not seen that. I will take a look, thank you.

tblah added a comment.Jan 26 2023, 3:41 AM

The LLVM pass seems to make quite different design decisions from this pass. The LLVM pass does limit the maximum size of allocations moved to the stack, but it does not attempt to avoid placing allocations inside of loops (and does not seem to attempt to move allocations to the entry block). The LLVM pass also supports exceptions, which this pass does not (as there are no exceptions in Fortran).

There is also a similar MLIR pass (promote buffers to stack). The MLIR pass operates on the memref dialect, which is a slightly different problem space because there are no explicit free() instructions to detect. Furthermore, the MLIR pass does not attempt to hoist allocations outside of loops, and it only detects structured loop operations (LoopLikeOpInterface), not loops formed from branch operations in the control flow graph.

Thanks for this patch @tblah. I had a first look. See comments inline. Have not gone through the core part in full yet.

flang/lib/Optimizer/Transforms/StackArrays.cpp
433

Do we have a test with multiple returns?

535

The op might require a check before further use.

See the following test from arrexp.fir. (run with ./bin/tco f4.fir)

func.func @f4(%a : !fir.ref<!fir.array<?x?xf32>>, %b : !fir.ref<!fir.array<?x?xf32>>, %n : index, %m : index, %o : index, %p : index, %f : f32) {
  %c1 = arith.constant 1 : index
  %s = fir.shape_shift %o, %n, %p, %m : (index, index, index, index) -> !fir.shapeshift<2>
  %vIn = fir.array_load %a(%s) : (!fir.ref<!fir.array<?x?xf32>>, !fir.shapeshift<2>) -> !fir.array<?x?xf32>
  %wIn = fir.array_load %b(%s) : (!fir.ref<!fir.array<?x?xf32>>, !fir.shapeshift<2>) -> !fir.array<?x?xf32>
  %r = fir.do_loop %j = %p to %m step %c1 iter_args(%v1 = %vIn) -> !fir.array<?x?xf32> {
    %r = fir.do_loop %i = %o to %n step %c1 iter_args(%v = %v1) -> !fir.array<?x?xf32> {
      %x2 = fir.array_fetch %vIn, %i, %j : (!fir.array<?x?xf32>, index, index) -> f32
      %x = fir.array_fetch %wIn, %i, %j : (!fir.array<?x?xf32>, index, index) -> f32
      %y = arith.addf %x, %f : f32
      %y2 = arith.addf %y, %x2 : f32
      %i2 = arith.addi %i, %c1 : index
      %r = fir.array_update %v, %y2, %i2, %j : (!fir.array<?x?xf32>, f32, index, index) -> !fir.array<?x?xf32>
      fir.result %r : !fir.array<?x?xf32>
    } 
    fir.result %r : !fir.array<?x?xf32>
  }
  fir.array_merge_store %vIn, %r to %a : !fir.array<?x?xf32>, !fir.array<?x?xf32>, !fir.ref<!fir.array<?x?xf32>>
  return
}
620–622

Nit: Braces not required.

630–638

Nit: Move all these close to the creation of the fir:AllocaOp.

640–641

Nit: Use braces for the if block to keep it uniform with the else block.

640–642

Nit: Use braces here to match else.

696–697

From the following code, it seems the functions are processed independently. Can this be a Function pass?

flang/test/Transforms/stack-arrays.f90
136

Nit: Remove usage of %0.

flang/test/Transforms/stack-arrays.fir
39–49

Would it be better to capture the variables and check? At least the allocmem and freemem.

203

Is this a case for future improvement?

tblah updated this revision to Diff 493692.Jan 31 2023, 11:26 AM
tblah marked 10 inline comments as done.

Thanks for review.

Changes:

  • Join the lattices at each return operation to ensure that values are freed at *all* returns, not only *some* return
  • Add tests with multiple return operations
  • Fix nits
flang/lib/Optimizer/Transforms/StackArrays.cpp
433

Thanks for this. It turned out I needed to join across all of the lattices at the return statements to ensure that values were freed at *all* return statements, not just at *any* return statement.

696–697

It can't. fir::factory::getLlvmStackSave and fir::factory::getLlvmStackRestore add function declarations at the module level. If functions are processed in different threads, there is a race condition when the FirOpBuilder first checks whether the function already exists in the module and, if not, adds it.

flang/test/Transforms/stack-arrays.fir
203

Yes. This is an open TODO. I'll add a comment.

It should be possible to still do stack save/restore if the block containing the free is *always* executed after the allocmem. This might already be guaranteed by the data-flow analysis - I haven't thought enough about it. I haven't seen this happen in the allocations automatically generated by flang, so I don't think it is important to solve now.

Looks OK. I have a few questions and some minor comments inline. It might be good to pull in a bit of info from the RFC into the Summary, particularly saying why a dataflow analysis is necessary, what operations are involved in the analysis etc.

Could we have used the Dominance and PostDominance information to find the allocs and frees that could have been replaced? I saw the following functions for individual ops, but not for the case where a set of ops dominates or post-dominates, so maybe not with the existing infra.

bool DominanceInfo::properlyDominatesImpl(Operation *a, Operation *b)
bool PostDominanceInfo::properlyPostDominates(Operation *a, Operation *b)

I guess we are not capturing the following because of the different values.

module {
  func.func @dfa2(%arg0: i1) {
    cf.cond_br %arg0, ^bb1, ^bb2
  ^bb1:  // pred: ^bb0
    %a = fir.allocmem !fir.array<1xi8>
    cf.br ^bb3(%a : !fir.heap<!fir.array<1xi8>>)
  ^bb2:  // pred: ^bb0
    %b = fir.allocmem !fir.array<1xi8>
    cf.br ^bb3(%b : !fir.heap<!fir.array<1xi8>>)
  ^bb3(%0: !fir.heap<!fir.array<1xi8>>):  // 2 preds: ^bb1, ^bb2
    fir.freemem %0 : !fir.heap<!fir.array<1xi8>>
    return
  }
}
flang/lib/Optimizer/Transforms/StackArrays.cpp
437–439

Nit: No brace here

443

A comment here would be useful on why we need to look at the freed values only.

483–487

Nit: Braces might not be required here.

533

Might be worth checking whether we have a function for this in MLIR core.

546–548

Theoretically speaking, we could use the dominance info to determine whether one block dominates the other, to handle cases like the following where we are finding the operands of func. But I guess that is probably not required.

b1:
x = opA
br b2
b2:
y = opB
br b3
b3:
z = func(x,y)
561

Do we have a test for this, and in general for the OpenMP handling?

573–575

Nit: No need for braces here.

594–598

Nit: braces are not required here.

696–697

Not for this patch: maybe these can all be pre-inserted at the beginning of the pass pipeline and removed at the end of the pass pipeline if not used?

733

Nit: Is this error usually given in passes?

tblah updated this revision to Diff 494283.Feb 2 2023, 6:20 AM
tblah marked 8 inline comments as done.

Changes: fix nits from review

tblah added a comment.Feb 2 2023, 6:21 AM

Looks OK. I have a few questions and some minor comments inline. It might be good to pull in a bit of info from the RFC into the Summary, particularly saying why a dataflow analysis is necessary, what operations are involved in the analysis etc.

Could we have used the Dominance and PostDominance information to find the allocs and frees that could have been replaced? I saw the following functions for individual ops, but not for the case where a set of ops dominates or post-dominates, so maybe not with the existing infra.

bool DominanceInfo::properlyDominatesImpl(Operation *a, Operation *b)
bool PostDominanceInfo::properlyPostDominates(Operation *a, Operation *b)

I guess we are not capturing the following because of the different values.

module {
  func.func @dfa2(%arg0: i1) {
    cf.cond_br %arg0, ^bb1, ^bb2
  ^bb1:  // pred: ^bb0
    %a = fir.allocmem !fir.array<1xi8>
    cf.br ^bb3(%a : !fir.heap<!fir.array<1xi8>>)
  ^bb2:  // pred: ^bb0
    %b = fir.allocmem !fir.array<1xi8>
    cf.br ^bb3(%b : !fir.heap<!fir.array<1xi8>>)
  ^bb3(%0: !fir.heap<!fir.array<1xi8>>):  // 2 preds: ^bb1, ^bb2
    fir.freemem %0 : !fir.heap<!fir.array<1xi8>>
    return
  }
}

Yes, we could have used dominance and post-dominance information to find out if an allocation is always freed. I wasn't aware of mlir::DominanceInfo at the time I wrote this patch. As it is already written, I think the data flow analysis continues to be the correct approach because it will skip dead code (after constant propagation), and I suspect the worst-case algorithmic complexity is better than computing dominance between each heap allocation and free.

Yes, in that case we cannot detect that the allocation is freed because the free operates on a different SSA value from the allocations. This would have been a problem whether mlir::DominanceInfo or mlir::DataFlowAnalysis were used. I chose not to support allocations and frees using different SSA values, as this would have added considerable complexity and is not necessary for the more common cases of Flang-generated allocations. See the RFC for details.

flang/lib/Optimizer/Transforms/StackArrays.cpp
533

Not that I can find. The MLIR verifier checks that all operation arguments properly dominate the operation, but this is done by comparing each in turn against the operation: no last operand is found.

I could use mlir::DominanceInfo to find when the last operand becomes available, which I guess would better handle the case where operands are defined in different blocks. But dominance only provides a partial ordering so there might be cases where domInfo.properlyDominates(arg1, arg2) == domInfo.properlyDominates(arg2, arg1) == false. Looking at the direct operation ordering only within the same block (as I do here) guarantees a total ordering relationship.
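
As a hedged sketch of the partial-ordering caveat described above (mlir::DominanceInfo::properlyDominates is the real MLIR API; the helper and variable names are invented):

#include "mlir/IR/Dominance.h"

// Returns whichever defining operation is dominated by the other (i.e. the
// later of the two on every path), or nullptr when dominance gives no order.
static mlir::Operation *laterDefinition(mlir::DominanceInfo &domInfo,
                                        mlir::Operation *defA,
                                        mlir::Operation *defB) {
  if (domInfo.properlyDominates(defA, defB))
    return defB;
  if (domInfo.properlyDominates(defB, defA))
    return defA;
  return nullptr; // only a partial order: neither operation may dominate
}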

546–548

Thank you for pointing out mlir::DominanceInfo - I was not aware of that analysis. I propose we keep this pass as it is for now, to avoid adding more complexity where we don't have a concrete example of flang-generated allocations which need to support alloca arguments defined in different blocks.

561

When writing the tests I discovered that the data flow analysis does not propagate lattices into or out of an omp.section, so currently no allocations inside of an OpenMP section will be moved to the stack.

I intend to handle this in a subsequent patch. In the meantime I have added a test to make sure that allocations in an OpenMP region are not moved.

733

Sorry I don't understand. What change are you requesting here?

tblah edited the summary of this revision. (Show Details)Feb 2 2023, 6:25 AM
tblah added inline comments.Feb 2 2023, 7:18 AM
flang/lib/Optimizer/Transforms/StackArrays.cpp
733

I've checked some other FIR passes and they all follow the same pattern.

kiranchandramohan accepted this revision.Feb 3 2023, 6:14 AM

LGTM. Please wait till end of day Monday before you submit to provide other reviewers with a chance to provide further comments or request changes.

You can consider inlining D141401 in this pass till it is merged.

This revision is now accepted and ready to land.Feb 3 2023, 6:14 AM
tblah updated this revision to Diff 495073.Feb 6 2023, 4:02 AM

Changes: inline mlir::blockIsInLoop

jeanPerier accepted this revision.Feb 7 2023, 12:14 AM

I do not have any further comments, thanks.