This is an archive of the discontinued LLVM Phabricator instance.

[flang] Restore stack after allocas created by TargetRewrite.
ClosedPublic

Authored by vzakhari on Apr 25 2023, 6:12 PM.

Details

Summary

This resolves issues with running out of stack on examples like
https://fortran-lang.discourse.group/t/modern-fortran-sample-code/2019/18
reported by @clementval.

When target rewrite creates alloca(s) around a call, we need to insert
stacksave/stackrestore to free the allocated stack. Better performant
code may be achieved by placing the alloca(s) outside of loops,
but the placement has to behave correctly with regards to OpenMP/OpenACC/etc.
dialect operations that have special representation for "private"
objects. This is a concervative fix for correctness issue.

Diff Detail

Event Timeline

vzakhari created this revision.Apr 25 2023, 6:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2023, 6:12 PM
vzakhari requested review of this revision.Apr 25 2023, 6:12 PM
clementval accepted this revision.Apr 25 2023, 9:09 PM

LGTM

flang/include/flang/Optimizer/CodeGen/CGPasses.td
56

I think you only to add the dialect as a dependency if you generate op from this dialect. I might be wrong.

This revision is now accepted and ready to land.Apr 25 2023, 9:09 PM

Thank you for the review, Valentin!

flang/include/flang/Optimizer/CodeGen/CGPasses.td
56

You are totally right. The fir::factory::getLlvmStackSave/Restore calls below may generate FuncOps.

jeanPerier accepted this revision.Apr 26 2023, 12:30 AM

This looks like a good solution to me.

but the placement has to behave correctly with regards to OpenMP/OpenACC/etc.
dialect operations that have special representation for "private"
objects.

FYI, fir::FirOpBuilder::createTemporary can be used to create fir.alloca that are hoisted as far as allowed by OpenMP using the "mlir::omp::OutlineableOpenMPOpInterface" (@clementval BTW, I think something similar may be needed for OpenACC regions, because createTemporary is massively used in lowering. See the getAllocaBlock() logic: https://github.com/llvm/llvm-project/blob/faa1043144b3f6c3362fe1d9f43e55e741b40386/flang/lib/Optimizer/Builder/FIRBuilder.cpp#L209

createTemporary is however not a generic solution either, because alloca can only be hoisted if their size does not depend on SSA variables local to the loop (which is the case here, but is not true in general).

So, I feel we may hit this issue here and there, and that we will probably want a generic solution at the do loop construct level (and implicit loop from array expressions) to cleanup loop iteration alloca if this becomes a burden.

But in the meantime, the fix here seems just fine to me, thanks!

FYI, fir::FirOpBuilder::createTemporary can be used to create fir.alloca that are hoisted as far as allowed by OpenMP using the "mlir::omp::OutlineableOpenMPOpInterface" (@clementval BTW, I think something similar may be needed for OpenACC regions, because createTemporary is massively used in lowering. See the getAllocaBlock() logic: https://github.com/llvm/llvm-project/blob/faa1043144b3f6c3362fe1d9f43e55e741b40386/flang/lib/Optimizer/Builder/FIRBuilder.cpp#L209

Yeah we will probably need it but since we did not design how we will handle privatization in acc dialect we did not need it until now.

Thank you for pointing out fir::FirOpBuilder::createTemporary, Jean! I did not know that it was aware of the OpenMP operations.
To use it generically I suppose we will have to accompany it with some sort of "finalizeTemporary", which we will deal with inserting the stackrestore call in cases where the hoisting was not able to place the alloca outside of the scope where the alloca may be repeatedly executed (e.g. due to non-constant size).
I guess another option is to have a "fixup" pass that will detect alloca that can be repeatedly executed without automatic deallocation (such as in OpenMP/OpenACC operation scope) and will insert stacksave/stackrestore around it based on its lifetime. This may work assuming that conformant user programs cannot produce repeatable alloca, e.g. to deliberately overflow the stack.

This revision was landed with ongoing or failed builds.Apr 26 2023, 10:33 AM
This revision was automatically updated to reflect the committed changes.