This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
flang/docs/
-
docs/
57/88
HighLevelFIR.md

Differential D134285

[flang][RFC] Adding higher level FIR ops to ease expression lowering
ClosedPublic

Authored by jeanPerier on Sep 20 2022, 8:38 AM.

Download Raw Diff

Details

Summary

This document describes a new FIR value type (fir.expr) and some new
higher level FIR operations to make lowering to FIR more straightforward
and make pattern matching of high level Fortran concepts (array and
character assignments, character operations, and transformational
intrinsics) easier in FIR.

This should allow implementing the remaining gaps in Fortran 95 features
without increasing the lowering code complexity too much, and get a
clean start for the remaining F2003 and F2018 features.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jeanPerier created this revision.Sep 20 2022, 8:38 AM

Herald added a project: Restricted Project. · View Herald TranscriptSep 20 2022, 8:38 AM

jeanPerier requested review of this revision.Sep 20 2022, 8:38 AM

tschuett added a subscriber: tschuett.Sep 20 2022, 8:48 AM

Harbormaster completed remote builds in B187755: Diff 461577.Sep 20 2022, 8:53 AM

jpenix-quic added a subscriber: jpenix-quic.Sep 20 2022, 10:33 AM

tschuett added inline comments.Sep 20 2022, 11:05 AM

flang/docs/HighLevelFIR.md
961	Isn't this point about the expressiveness of MLIR? What are the engineering costs of adding a FIRX dialect for the higher ops? What would be the benefit of having a separate dialect?

AlexisPerry added a subscriber: AlexisPerry.Sep 20 2022, 11:42 AM

jeanPerier added inline comments.Sep 21 2022, 1:13 AM

flang/docs/HighLevelFIR.md
961	The benefit of having a separate dialect is to strongly split the high level ops that require information about Fortran variables to be retrievable in the IR (via the fir.def/fir.ref) and the current operations that are lower level and do not require such information. After the translation pass of variable related operation, this dialect would be illegal. There is a precedent in FIR with the fir::cg dialect ops (https://github.com/llvm/llvm-project/blob/main/flang/include/flang/Optimizer/CodeGen/CGOps.td) that helps simplifying the addressing and emboxing operation before codegen. I think the engineering cost are mostly about having different td files, headers and .cpp for these new ops and types, but also to register the dialect in the passes that will work with it. That last point may be a bit more annoying (it does not matter with the fir::cg dialect because codegen is the only pass meant to be run with this dialect). I do not have a strong opinion here.

Very good design. This seems to be one promising method to handle some bugs in forall, pattern match for inline intrinsics and OpenMP reduction (not totally sure if it will work), and expected to have more information of variables and expression for performance optimization in FIR.

flang/docs/HighLevelFIR.md
75	Nit
87	Will the debug information be `mlir::Location`?
89	nit
89	Will this include the aliasing analysis in Fortran 2018 15.5.2.13? Is this `Generic Fortran aliasing analysis` documented somewhere with examples currently?
94	Nit
137	Can `COMPARE` be added in the set of character operations? We found inlining character comparison has great performance improvement in some workloads (the possible reason is the character is in select case statement). Adding it will make inline work easier?
170	Currently, OpenMP reduction clause uses the fragile pattern match in lowering. When refactoring the lowering code, is it possible to make some expressions lowering shared with OpenMP using `fir.expr`? Specifically, the following operators, intrinsics, and user-defined operators will be redefined in OpenMP reduction clause: +, *, .and., .or., .eqv., .neqv., max, min, iand, ior, ieor
196	Why keep two complex types instead of only using fir.complex or mlir::complex?
212	editor problem?
227	This should be added in the description when this opertion is added in FIROps.td. I am thinking if `fix.expr<T>` is a good solution for the argument with VALUE attribute in lowering (both caller and callee side), especially for the derived type and BIND(C)?
230
267	Will `fir.declare` be used for common blocks as well?
310
371	`to help` -> `help`?
437	editor problem?
461
530	editor problem?
531	editor problem?
542	nit
545	It seems there is some editor problem? from line 542 to line 547.
699	?
751
781	Another tough: The mask expression lhs variable might be changed in the assignment. There seems to be one bug in current FIR lowering. fir.forall may make it easy to support that case.
793	?
809
970	Same for line 970, 978-981, 992-1007, 1032-1041, 1053-1056, ...

peixin added inline comments.Sep 21 2022, 2:19 AM

flang/docs/HighLevelFIR.md
882
1140
1331

mehdi_amini added inline comments.Sep 21 2022, 6:09 AM

flang/docs/HighLevelFIR.md
961	to register the dialect in the passes that will work with it. To be clear: registration is only ever needed in a pass that produces entities from a dialect (ops, attributes, types) when this dialect isn't in the input already. So a pass lowering from a high-level dialect to a lower-level dialect only defines the lower-level dialects to be registered, no need to do anything for the high-level dialect. Similarly passes that transform within the same dialect don't need to declare anything at all.

Remove non breaking spaces
Fix double spaces issues
Fix some typos
Add note about Forall masks

Thanks for the review and feedback @peixin.

flang/docs/HighLevelFIR.md
87	No, I believe source line location is already making it into the executable. What we are currently missing is the ability to interact with Fortran source variables from a debugger (e.g print, set, or watch a variable). I am not sure exactly how and when the related DWARF will have to be generated. This is a big task I should add on our TODO list. The point here is that fir.declare should allow generating this information late, because it documents everything that needs to be known about a source variable (address or descriptor, bounds, type parameters).
89	Yes, that is the idea. And no, this is not documented yet. This would deserve its own document. The idea here is that when seeing an address in FIR, we will try to find a matching fir.declare or fir.associate node. If it is the parent operation, we will directly have all the Fortran information that allows applying 15.5.2.13. If it is a block argument, the analysis will try to follow the block argument predecessors if it can, and create a list of possible Fortran variables the address belongs to. If such list cannot be built, the analysis will have conservatively to assume the address may overlap with anything.
137	Yes, I had it in mind in the "..." parts. I made it explicit.
170	Interesting, what are you currently pattern matching ? I am not sure the current design would help you a lot in the sense that it intends to not avoid any specific elemental operations, but rather to use a generic fir.elemental concepts. As much as possible, I would tend to extend the mlir arith dialect to support the scalar operations applicable on MLIR integer and floating point types and pattern match that. But then, for some elemental operators and intrinsics for which easy pattern matching is desirable, and where adding an arith op is not the best way, it could makes sense to add custom ops. But this may be orthogonal to the proposed change and something that could be though/designed/done in parallel, operators by operators. Regarding user defined operators, I would need to understand better what you mean by "redefined", semantics is resolving user defined operation to function references already, so we do not really see them in lowering I believe.
196	Why keep two complex types instead of only using fir.complex or mlir::complex? The need arise from the fact that we want to make a difference between C/Fortran complex type, and C++ std::complex type. These types are layout compatible, but not ABI compatible (they are not passed the same way on all architectures). fir::Complex implements C/Fortran complex, and mlir::Complex is translated to an std::complex like struct. In the runtime API, we sometime interface with std::complex<>. Using distinct types to enforce the ABI may not be the only solution. But it was the simplest and more robust so far (especially given we already had both types). If a solution is found regarding this ABI problem, moving to using mlir::complex would sound OK to me. I see it as something orthogonal to these high level changes (it could be done in parallel). The solution is probably to translate std::complex to a struct type and consider mlir::Complex to be the C complex from an ABI point of view (but I think this may conflicts with MLIR current codegen towards LLVM that would have to be overridden). fir.convert between mlir complex and the related struct type would have to be allowed and probably implemented by going through memory and doing the cast there to rely on the layout compatibility property.
212	Yes, thanks. I replaced all the non breaking spaces by normal spaces and remove the redundant ones like here.
227	I am thinking if fix.expr<T> is a good solution for the argument with VALUE attribute in lowering (both caller and callee side), especially for the derived type and BIND(C)? It may help, but since this type is not intended to survive until LLVM codegen, the VALUE aspect would still have to be translated with the current FIR types as it is currently.
267	It will be used for the common block members once the address computation is done (maybe with the common block name as a fir.common attribute so that it can easily be identified which common block a variable belongs to). Same idea with equivalences. I am not sure if there is a need to declare the whole common block in a declare op, but if it turns out useful for OpenMP or other usages, why not.
781	Thanks, indeed. I added note about masks
793	Yes, I agree the mask evaluation cannot affect anything (just like the forall indices evaluation), but the mask evaluation may be affected by the assignments.
970	Thanks, all the none breaking space issues should have been fixed.

Harbormaster completed remote builds in B187967: Diff 461890.Sep 21 2022, 7:31 AM

jeanPerier added inline comments.Sep 21 2022, 7:41 AM

flang/docs/HighLevelFIR.md
961	Thanks for clarifying this point, then I do not see big engineering costs to splitting high level ops depending on the Fortran variable and expression concepts and low level operations operating on memory and simple (integer/floating point/complex) SSA values. Maybe the new fir.declare op should still belong to FIR directly though. I think it would be valuable to keep it until as late as possible (it is not doing anything other than bookkeeping some Fortran level information about memory storages, which can be useful until the end).

tarunprabhu added inline comments.Sep 21 2022, 8:35 AM

flang/docs/HighLevelFIR.md
961	I, too, think it makes sense to keep this in FIR. Any Fortran-specific tooling that operates on MLIR may find this useful, so having it in FIR would be helpful.

kiranchandramohan added inline comments.Sep 21 2022, 8:58 AM

flang/docs/HighLevelFIR.md
22	And the `ArrayValueCopy` pass will not be needed?
84	From the OpenMP side, in the long-term we would like to move privatisation into the OpenMP dialect. So it will be very helpful if variable declarations are available in FIR. On the other hand, having more kinds of variables in Flang (FIR, HighFIR) would mean that we will have to get OpenMP Dialect to work with these different kinds of variables. Can you consider adding the `finalizer` also to the declare operation?
170	Thanks @peixin for raising this point. Currently we do a pattern match fir loads -> (optional convert) -> reduction operation in FIR/arith -> (optional convert) -> store and replace it with an `omp.reduction` operation. So for a reduction happening on an addition (say `b = b + a`). We pattern match on the following IR, %b = fir.load %bref %a = fir.load %aref %r = arith.add %a, %b fir.store %r and get the following result IR. %a = fir.load %aref omp.reduction %a, %bref An alternative approach that we are exploring is to see whether this can be done directly during lowering, i.e. to do custom lowering for the Assignment operation if it happens to perform a reduction.

Thanks for addressing the comments and explanations. LGTM.

flang/docs/HighLevelFIR.md
170	Thanks @kiranchandramohan for answer this question. For "redefined", what I mean is that the addition statement `b = b + a` is not lowered as `%b_new_val = arith.add %b_val, %a_val` any more. Instead, it will be lowered as `omp.reduction %a_val, %b_red`. The current expression lowering for this statement is encapsulated in ScalarExprLowering, and it is hard to be changed if with OpenMP. If with these high-level FIR Ops, and given the FIR lowering is going to be refactored, is it possible to expose these statements analysis and lowering to be open to be changed. Beyond this, there are two scenarios having the similar problem: Complex multiplication and complex conjugate multiplication The current lowering and codegen does not generate the "best" LLVM IR. See https://reviews.llvm.org/D134364#3807605 for some performance considerations. With these high-level FIR Ops, can the FIR lowering be factored so that some complex operation lowered directly as one IR instruction. target/chip dependent transformation One example is as follows: subroutine sub(a, b, c, m, n, k) real :: a(m, k), b(k, n), c(m, n) !DIR ARM-SME (scalar matrix extension) c = a * b end With the compiler directive, the statement `c = a * b` can be lowered using some special transformation. All in all, given the current high-level FIR Ops, and FIR lowering will be refactored, can we make the Fortran statement analysis and lowering more generic and extensible for possible optimization with OpenMP and compiler directives, or else. Some Fortran statement lowering may can be moved to codegen so that custom optimization can be performed.

jeanPerier marked an inline comment as done.Sep 22 2022, 2:25 AM

jeanPerier added inline comments.

flang/docs/HighLevelFIR.md
22	Right, it will be refactored into the (array) fir.assign and fir.forall translation passes.
84	On the other hand, having more kinds of variables in Flang (FIR, HighFIR) would mean that we will have to get OpenMP Dialect to work with these different kinds of variables. With the proposed strategy, the variables in FIR and high level FIR would be the same from an SSA type point of view (memory references and boxes like currently). The difference would be that all the shape/type parameters information related to that address would be guaranteed to be retrievable in one case, and not the other. Given you will probably need this information to privatize the variables, you could either decide to make the privatization use the high level concept, but you could also directly use a "lower" level operation where all the side information (bounds, type params...) required for privatization are operands of the privatization operation. Basically, you can freely use a "variable" SSA value produced in a fir.declare in "lower" level operations that do not need to access the info in the fir.declare. It is perfectly fine to have the current FIR operations mixed with the new higher level operations. Can you consider adding the finalizer also to the declare operation? Why not, you would need it to know how to finalize OpenMP private copies after privatizations ? Note that the runtime knows how to retrieve finalizer information from the descriptors, so calling Finalize runtime routine on privatized variables should also be sufficient (in which case you only need to know if a types needs to be finalized, not exactly how).
170	for some performance considerations. With these high-level FIR Ops, can the FIR lowering be factored so that some complex operation lowered directly as one IR instruction. I think moving to using the mlir::complex type and operation is independent from these high level operation changes. The change proposed here do not rely on complexes being lowered to fir.complex and mlir::complex nor on how they are manipulated. I am not opposed to moving to this as long as the ABIs are solved. For "redefined", what I mean is that the addition statement b = b + a is not lowered as %b_new_val = arith.add %b_val, %a_val any more. Instead, it will be lowered as omp.reduction %a_val, %b_red. Are you doing this only when b is a scalar or also when b is an array ? Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ? With the compiler directive, the statement c = a * b can be lowered using some special transformation. That case is interesting. I am not sure if we could use the high level ops to lower such directive or if it would have to be applied directly in lowering. I guess the fir.assign op could be given a special attribute, and that the application of the directive could be delayed to fir.assign translation.

kiranchandramohan added inline comments.Sep 22 2022, 4:18 AM

flang/docs/HighLevelFIR.md
170	Are you doing this only when b is a scalar or also when b is an array ? Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ? Currently, we have implemented it only for scalars, but the pattern matching will become complicated for pointers, allocatables and arrays. Rather than pattern matching on some FIR, once we detect that the Assignment is actually a reduction we can try to write lowering code that generates an appropriate `omp.reduction` operation. Alternatively, if the pattern matching is easier at the new High Level FIR we can do it there.

peixin added inline comments.Sep 22 2022, 4:23 AM

flang/docs/HighLevelFIR.md
170	I think moving to using the mlir::complex type and operation is independent from these high level operation changes. The change proposed here do not rely on complexes being lowered to fir.complex and mlir::complex nor on how they are manipulated. I am not opposed to moving to this as long as the ABIs are solved. For the following case: subroutine sub complex :: a, b, c c = conj(a) * b end With these high level FIR Ops, I hope to generate the following FIR: %a = fir.declare ... %b = fir.declare ... %c = complex.conjmul %a, %b : fir.complex<4> Anyway, I will keep watching if we can do this when refactoring FIR lowering using these high level FIR ops in the future. We had one initial try in classic-flang. If possible, we hope to contribute the middle-end and back-end code together with F18 frontend support. Are you doing this only when b is a scalar or also when b is an array ? It can be either scalar, array, or array section. If it is array or array section, it will be treated as if a reduction clause would be applied to each separate element of the array section. Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ? It generates FIR, and match the pattern, then replace the generated FIR with `omp.reduction`. The current pattern patching is fine, at least no bug found. However, if we can lower it from the evaluate::Expr, it may be better. Here is one example (from the OpenMP Spec attached examples): SUBROUTINE REDUCTION1(A, B, C, D, X, Y, N) REAL :: X(), A, D INTEGER :: Y(), N, B, C INTEGER :: I A = 0 B = 0 C = Y(1) D = X(1) !$OMP PARALLEL DO PRIVATE(I) SHARED(X, Y, N) REDUCTION(+:A) & !$OMP& REDUCTION(IEOR:B) REDUCTION(MIN:C) REDUCTION(MAX:D) DO I=1,N A = A + X(I) B = IEOR(B, Y(I)) C = MIN(C, Y(I)) IF (D < X(I)) D = X(I) END DO END SUBROUTINE REDUCTION1 I think that the pattern match is hard to capture the `max reduction of D`. From the `evaluate::Expr`, we only need to check if the `GetSymbol(lhs)` is the reduction list item. We can emit the assertion if the `rhs` is not reduction identifier such as `+, IEOR, MIN` here. For the `if` statement, it is hard to give an reasonable assertion. That case is interesting. I am not sure if we could use the high level ops to lower such directive or if it would have to be applied directly in lowering. I guess the fir.assign op could be given a special attribute, and that the application of the directive could be delayed to fir.assign translation. Thanks for the confirmation.

peixin added inline comments.Sep 22 2022, 4:26 AM

flang/docs/HighLevelFIR.md
170	Agree.

jeanPerier marked an inline comment as done.Sep 23 2022, 2:09 AM

jeanPerier added inline comments.

flang/docs/HighLevelFIR.md
170	For scalar reduction, the FIR pattern will not change with the proposal, for arrays and array section, it will change (you would have a fir.elemental containing the scalar reduction operation plus a fir.assign a bit as in the second example in the example section below). With arrays, you would also need to be careful with the potential overlaps in the assignments. I think we should aim at being able to detect reduction in FIR. OpenMP might not be the only case that would want to do special optimization with a reduction. Now, if replacing the generated FIR by a omp.reduction is required for a correctness point of view, that may be problematic, because I am not sure we can guarantee that MLIR pattern matching will work at a 100% if some in-between passes slightly modify the FIR in correct by unexpected ways.
170	With these high level FIR Ops, I hope to generate the following FIR: I do not think the high level ops will allow to produce exactly what you wrote unless we were to add FIR specific complex operation working on "variables", and that was not my goal since MLIR complex dialect could already. However, I think that what it could be lowered to would still suits your optimization goal. This could be be lowered to: %a = fir.declare ... : fir.ref<mlir::complex<f32>> %b = fir.declare ... : fir.ref<mlir::complex<f32>> %c = fir.declare ... : fir.ref<mlir::complex<f32>> %aval = fir.load %a %aconj = complex.conj %aval %bval = fir.load %b %res = complex.mul %aconj, %b fir.assign %res to %c The MLIR complex dialect canonicalization/optimization passes would then have to fold the MUL with a conj operand into a single conjmul op (does it exists yet in the MLIR complex dialect, I could not find it ?). Here is one example (from the OpenMP Spec attached examples): Thanks for the example. One related question, is semantics checking that what is written in the REDUCTION clause actually happens in the loop (e.g., is it checking that B appears in an assignments with IEOR) ? If so, that means that this kind of pattern matching is already available on the parse tree / evaluate expr. I am not a fan of low level intrinsic operations (like +) being lowered differently based on the context, so I would avoid it if possible. But if this turns out to be really required, we could try to find clean ways to overrides certain expression expr node without redefining a completely different parse-tree/expression visitor.

peixin added inline comments.Sep 23 2022, 2:32 AM

flang/docs/HighLevelFIR.md
170	The MLIR complex dialect canonicalization/optimization passes would then have to fold the MUL with a conj operand into a single conjmul op Good point. Thanks. (does it exists yet in the MLIR complex dialect, I could not find it ?). No for now. If there is the opportunity for performance improvement, I think there is no reason to oppose to add such a op. One related question, is semantics checking that what is written in the REDUCTION clause actually happens in the loop (e.g., is it checking that B appears in an assignments with IEOR) ? No. I don't find any related semantic restrictions in the OpenMP Spec. I think rsers should guarantee it. @kiranchandramohan Right? Here is another example in the standard attached example: !$omp parallel do num_threads(M) reduction(task,+:x) do i = 1,N x=x+1 if( mod(i,2) == 0) then !$omp task in_reduction(+:x) x=x-1 !$omp end task endif end do For this case, it is hard to know the pattern from the parse-tree/evaluate expr before lowering.

kiranchandramohan added inline comments.Sep 23 2022, 3:07 AM

flang/docs/HighLevelFIR.md
170	One related question, is semantics checking that what is written in the REDUCTION clause actually happens in the loop (e.g., is it checking that B appears in an assignments with IEOR) ? No. I don't find any related semantic restrictions in the OpenMP Spec. I think rsers should guarantee it. @kiranchandramohan Right? Yes, there is no such restriction in the standard. Since the parse-tree models the source more or less directly, the representation for reduction is a clause and not a separate parse-tree node that encapsulates an assignment statement or an expression. We add reduction symbols to the variables involved in the reduction operation but I don't know whether these are carried over to the expression tree.

LGTM

The whole proposal seems very reasonable to me and definitely an improvement to the current status quo. Thanks a lot for putting this together @jeanPerier.

flang/docs/HighLevelFIR.md
577	I'm curious about the choice of the name, why not something closer to Fortran like `array_element`? `apply` seems overly generic to me, but maybe there is precedent elsewhere?

jeanPerier added inline comments.Oct 6 2022, 6:28 AM

flang/docs/HighLevelFIR.md
577	The rational of that name was to insist on the fact that this should not to be seen as a memory addressing operation at that level, but rather like an operation that, in certain circonstances (), can see the fir.expr defining operation as a lambda and "apply" it given a set of indices (inlining the fir.elemental body at the fir.apply). Affine has an operation named "affine.apply" that applies an affine map on a set of indices. Now I am open to other names. Has anyone else opinions regarding `array_element` vs `apply` naming ? () Basically that there should not be any operations between the fir.expr evaluation and use that may affect (or be affected by) the expression evaluation.

klausler added inline comments.Oct 6 2022, 8:13 AM

flang/docs/HighLevelFIR.md
577	I find the name to be entirely appropriate in the function composition view of array programming.

Thanks @jeanPerier for this RFC. Quite detailed, informative and looks good. I had a quick read and have a few questions or comments inline.

I also have the following general questions,
-> Do you see any issues in FIR/MLIR optimisations due to the presence of fir.declare?
-> Would there be scope for the community to get involved in this work to speed it up?
-> Would it be better to delay any work on emitting debug info till HighLevelFIR work is complete?

flang/docs/HighLevelFIR.md
96	Till when will these `fir.declare`'s persist? Will this have an effect on optimisations particularly the mem2reg kind of transformations.
316	Can this be modeled by a region instead? We discussed this in the Flang technical call. The reasons mentioned include difficulty handling exits, branches out of this etc.
442	Will there be `fir.allocate` or `fir.deallocate` operations generated for these?
485	Will `fir.allocate` lower to `fir.allocmem` or to runtime calls? Why?
537	Nit: Could you add some explanation for `one based`?
583
624	I am assuming the simplification of intrinsics pass (with minor modifications) will still be useful for generating an inlined version. Or would lowering and high-level FIR make this redundant?
688	Would this have a lowering to `fir.allocmem`?
935
1043
1056–1061
1077	Is `fir.get_lbound` a new operation?

Clarify fir.alloca/fir.deallocate lowering
Clarify why fir.apply is one based
Add fir.get_lbound description
Fix a few typos

In D134285#3845772, @kiranchandramohan wrote:

Thanks @jeanPerier for this RFC. Quite detailed, informative and looks good. I had a quick read and have a few questions or comments inline.

I also have the following general questions,

Thanks for the review @kiranchandramohan !

-> Do you see any issues in FIR/MLIR optimisations due to the presence of fir.declare?

I think it will help building alias analysis for FIR and that will help optimizations in general. It can be labeled as having no side effects, so in it should not disturb passes that do not care much about address origins. But you are raising a good point that mem2reg will have to do something about it.

-> Would there be scope for the community to get involved in this work to speed it up?

Yes, I think help will be welcomed. I am trying to set-up some skeleton for this new flow. And then, I think some help will be welcomed, especially around the new FIR character ops and intrinsics.

-> Would it be better to delay any work on emitting debug info till HighLevelFIR work is complete?

No, if anyone wants to jump on debug info, please do, bearing in mind that fir.declare should be a good fit to extract the required info. I will try to add this op early if anyone wants to work on this. I anyway think the first piece of debug info will be to work a small document to summarize what LLVM needs, and give a plan.

flang/docs/HighLevelFIR.md
96	My plan was for fir.declare to persist until LLVM codgen since it is a no-op that only carries information. But you are raising a good point about mem2reg. Do you know if there is a generic MLIR mem2reg pass ? In any case, I expect mem2Reg pass might need to be aware of the fir.declare "transparent" aspect and be able to delete it for variables promoted to SSA value (and move its attributes somewhere if debug info is still needed for the variable ? I am not familiar enough with debug info to be assertive here). Regarding optimization in general, I think keeping it is in our interest since it will allow to document Fortran aspects about attributes (like target vs not target) even after inlining, which should allow better alias analysis.
316	One of the main issue I think is that it is intended to deal with argument association on the caller sides. So if we used regions, you would have something like: fir.associate %actual1 to %dummy1 { fir.associate %actual2 to %dummy2 { .... %res = fir.call %foo(%dumm1, %dummy2) ... } } The first issue here is that %res is not accessible after the call where it needs to be used. It could be propagated back, but in general, this means that any SSA values created during the association lifetime is unusable, even if it is not linked to toe variable lifetime. And I find this very restrictive hard to work with. Then, nested regions are harder to reorder than operations in a same block. Fortran tells arguments can be evaluated in any order. But generating such associate nest in lowering would in my opinion make it harder to re-order things here to to better CSE for instance.
442	Not until alias analysis is done. The rational is that the reallocation can be optimized when there is a potential overlap (the temp can be moved to the new allocatable value). So lowering will only add the realloc attribute, and a FIR to FIR pass will lower this to a set of fir.assign (without the realloc attribute) + fir.allocate + fir.deallocate + necessary fir.if nests taking advantage of alias analysis.
485	Will fir.allocate lower to fir.allocmem or to runtime calls Currently, the runtime is only used when allocations are none trivial (include initializations, polymorphic types, or error handling). However, we may want to offer the user the ability to debug their allocations/deallocation by validating that ALLOCATABLE being deallocated by have indeed been allocated, and are not pointing to bad addresses. In those case, the runtime would always be used. In general, users may like the ability too easily hook into Fortran allocatable/pointer allocation via the runtime without recompiling programs. Some may prefer to have all allocations inlined when possible. So I think having the option in fir.allocate/fir.deallocate between fully inlined allocation with fir.allocmem or using the runtime makes sense. I edited the sentence to make this clearer.
537	Nit: Could you add some explanation for `one based`? Sure, I added a note. The purpose if to match Fortran default for array variables so that there is no need to generate bound adjustments when working with one based array variables in an expression.
624	I am assuming the simplification of intrinsics pass (with minor modifications) will still be useful for generating an inlined version. Absolutely, the plan is to use this pass (modifying it so that it can tape into the fir.intrinsic ops ).
688	It depends of the array constructor. I think we should not generate a fir.allocmem for small and easy things like `[%i, %j, %k]` (only scalar, no ac-implied, less than N elements, where N is some option). This should lower to fir.alloca. Otherwise, yes, fir.allocmem will be used, although reallocation will still be needed to deal with the edge cases where the final size cannot be pre-computed (for instance `[foo(), bar(), buzz()]`, where all three functions returns rank-one allocatable arrays).
1077	Thanks for catching this inconsistency. When writing FIR manually, I found it useful to avoid dealing with the details around pointers and allocatable (fir.load %box + fir.bo_dims). But it is not a game changer, and I am still debating whether this is useful or not. I added a description for it above assuming it will be added for now.

Harbormaster completed remote builds in B191240: Diff 466458.Oct 10 2022, 3:06 AM

rogfer01 added inline comments.Oct 10 2022, 7:42 AM

flang/docs/HighLevelFIR.md
577	Hi @jeanPerier, thanks for the context. No objections on the name.

I don't agree that this new dialect should extend FIR. They should be separate dialects.

Update the document to reflect that the operations will be added
in a new dialect HLFIR.

In D134285#3847139, @schweitz wrote:

I don't agree that this new dialect should extend FIR. They should be separate dialects.

Thanks for the feedback. I think that was also the consensus in https://reviews.llvm.org/D134285#inline-1294898 discussion. The exception will be fir.declare (where I see advantages to keep it until LLVM codegen to handle things like debug info and allow using Fortran level information (even after inlining) for alias analysis), I agree that it makes sense to put it in a separate dialect. I updated the document to reflect this.

Update the document to reflect that the operations will be added in a new dialect HLFIR.

Harbormaster completed remote builds in B191453: Diff 466736.Oct 11 2022, 2:39 AM

Thanks @jeanPerier for the quick reply. A couple of comments or questions inline.

So I guess hlfir.expr<T> type is the only new type and is expected to work only with hlfir. Otherwise hlfir is expected to work with existing fir types?

flang/docs/HighLevelFIR.md
74	What will the frontend symbol be bound to after this change? Will it be the value generated by `fir.declare` or the `fir.alloca` ? I guess we might need some changes for the `createHostAssociateVarClone` function.
96	I don't think there is currently a mem2reg pass in MLIR. But there are a few downstream projects which have this and there were a couple of discussions talking about upstreaming this. But I don't think this has happened yet. https://discourse.llvm.org/t/rfc-store-to-load-forwarding/59672 https://discourse.llvm.org/t/upstreaming-from-our-mlir-python-compiler-project/64931/4 I am assuming there will be changes required for our `MemRefDataFlowOpt` pass to account for the `fir.declare` operation. I am also reminded of llvm's `dbg intrinsics` (https://llvm.org/docs/SourceLevelDebugging.html#id13). These intrinsics track variable information and includes a `llvm.dbg.declare` intrinsic which is deprecated. Not sure whether `fir.declare` will also face similar issues.
316	The first issue here is that %res is not accessible after the call where it needs to be used. It could be propagated back, but in general, this means that any SSA values created during the association lifetime is unusable, even if it is not linked to toe variable lifetime. And I find this very restrictive hard to work with. Yes, that is right particularly after `mem2reg` kind of passes. But for load-store kind of code that lowering generates, it should not be a problem.

In D134285#3849053, @kiranchandramohan wrote:

Thanks @jeanPerier for the quick reply. A couple of comments or questions inline.

So I guess hlfir.expr<T> type is the only new type and is expected to work only with hlfir. Otherwise hlfir is expected to work with existing fir types?

Yes, HLFIR will work with the existing FIR types (it has to, since it is not adding any types for variables). And appart from the hlfir.forall, hlfir doe not contain any construct like operation, it is all about expressions and assignments, so it lowering will generate FIR + HLFIR.

flang/docs/HighLevelFIR.md
74	In the symbol table in lowering, it will be bound to the fir.declare result. The goal is that it is possible to emit high level operation (e.g. hlfir.assign) with the SSA value bound to front-end symbol. Regarding `createHostAssociateVarClone`, yes, it might need to emit a new fir.declare so that the new entity can be properly binded.
96	I am assuming there will be changes required for our MemRefDataFlowOpt pass to account for the fir.declare operation. Yes, that seems right to me. I do not think we are running this pass currently. Regardless, fir.declare should allow getting a higher level view in possible aliasing/side-effects affecting an address, and should help getting the store to load forwarding working again safely. I am also reminded of llvm's dbg intrinsics (https://llvm.org/docs/SourceLevelDebugging.html#id13). These intrinsics track variable information and includes a llvm.dbg.declare intrinsic which is deprecated. Not sure whether fir.declare will also face similar issues. Thanks for this pointer. I see fir.declare as being a bit different than llvm.dbg.declare in the sense that fir.declare will give context to its output address, but it will have no impact on the semantics of its input. You could very well imagine implementing storage sharing with fir.declare: subroutine foo() integer :: i, j ! some code using i but not j ! some code using j but not i end subroutine After some optimization pass, you could have: func.func @_QPfoo() { %storage = fir.alloca i32 %i = fir.declare %stoarge {fir.def = _QPfooEi} %j = fir.declare %stoarge {fir.def = _QPfooEj} // ... use %i // ... use %j } llvm.dbg.declare also seems to require that there can only be one per variable. Which makes some optimization where a variable may live in different storage/register during its lifetime harder. I do not think the same will be true of fir.declare. It requires its fir.def tag to be uniq so that the defining op is always accessible in hlfir ops using it, but a same tag could point to the same Fortran variable. You could have: %storage1 = fir.alloca i32 %i_1 = fir.declare %storage {fir.def = _QPfooEi} // use %i_1.... debugger should be told "i" is in %i_1 // for some reason decide that "i" is now tracked as an element in some buffer: %buffer = fir.alloca i32, 10 %storage2 = fir.coordinate_of %buffer, c3 %i_2 = fir.declare %storage {fir.def = _QPfooEi.opt1} // use %i_2. debugger should be told "i" is in %i_2 I think this would translate in llvm.dg.addr being emitted when changing the storage, but this deserves more design work about debug info generation. At least from an HLFIR/FIR point of view, I see no issue with the two patterns above (two fir.declare sharing the same input, or two fir.declare referring to the same Fortran variable). I think fir.declare, despite the name, is more akin to llvm.dbg.addr, and that one big advantage it has over both is that it returns an SSA value, while the other are tagging an SSA existing value. That makes variable info identifiable via the defining operations chain of an address, rather than via an analysis of its usages.

Thanks @jeanPerier. I don't have any further questions or comments. LGTM.

This revision is now accepted and ready to land.Oct 12 2022, 4:27 AM

Closed by commit rG0623ce152a02: [flang][RFC] Adding higher level FIR ops to ease expression lowering (authored by jeanPerier). · Explain WhyOct 13 2022, 5:31 AM

This revision was automatically updated to reflect the committed changes.

jeanPerier added a commit: rG0623ce152a02: [flang][RFC] Adding higher level FIR ops to ease expression lowering.

jeanPerier mentioned this in D135959: [flang] Introduce option to lower expression to HLFIR.Oct 14 2022, 7:05 AM

jeanPerier mentioned this in rG4546397e3958: [flang] Introduce option to lower expression to HLFIR.Oct 17 2022, 1:11 AM

jeanPerier mentioned this in D136181: [flang] Add fir.declare operation.Oct 18 2022, 9:12 AM

jeanPerier mentioned this in rGa398981fb0f0: [flang] Add fir.declare operation.Oct 19 2022, 2:08 AM

jeanPerier mentioned this in D136328: [flang] Add High level Fortran IR dialect.Oct 20 2022, 2:57 AM

jeanPerier mentioned this in rG451b1b1ffbb6: [flang] Add High level Fortran IR dialect.Oct 21 2022, 4:11 AM

Revision Contents

Path

Size

flang/

docs/

HighLevelFIR.md

1410 lines

Diff 467450

flang/docs/HighLevelFIR.md

This file was added.

The approach of FIR and lowering design so far was to start with the minimal set

of IR operations that could allow implementing the core aspects of Fortran (like

memory allocations, array addressing, runtime descriptors, and structured

control flow operations). One notable aspect of the current FIR is that array

and character operations are buffered (some storage is allocated for the result,

and the storage is addressed to implement the operation). While this proved

functional so far, the code lowering expressions and assignments from the

front-end representations (the evaluate::Expr and parser nodes) to FIR has

significantly grown in complexity while it still lacks some F95 features around

character array expressions or FORALL. This is mainly explained by the fact that

the representation level gap is big, and a lot is happening in lowering. It

appears more and more that some intermediate steps would help to split concerns

between translating the front-end representation to MLIR, implementing some

Fortran concepts at a lower-level (like character or derived type assignments),

and how bufferizations of character and array expressions should be done.

This document proposes the addition of two concepts and a set of related

operations in a new dialect HLFIR to allow a simpler lowering to a higher-level

FIR representation that would later be lowered to the current FIR representation

via MLIR translation passes. As a result of these additions, it is likely that

the fir.array_load/fir.array_merge_store and related array operations could be

removed from FIR since array assignment analysis could directly happen on the

kiranchandramohanUnsubmitted

Done

And the ArrayValueCopy pass will not be needed?

kiranchandramohan: And the `ArrayValueCopy` pass will not be needed?

jeanPerierAuthorUnsubmitted

Done

Right, it will be refactored into the (array) fir.assign and fir.forall translation passes.

jeanPerier: Right, it will be refactored into the (array) fir.assign and fir.forall translation passes.

higher-level FIR representation.

The main principles of the new lowering design are:

- Make expression lowering context independent and rather naive

- Do not materialize temporaries while lowering to FIR

- Preserve Fortran semantics/information for high-level optimizations

The core impact on lowering will be:

- Lowering expressions and assignments in the exact same way, regardless of

whether it is an array assignment context and/or an expression inside a

forall.

- Lowering transformational intrinsics in a verbatim way (no runtime calls and

memory aspects yet).

- Lowering character expressions in a verbatim way (no memcpy/runtime calls

and memory aspects yet).

- Argument association side effects will be delayed (copy-in/copy-out) to help

inlining/function specialization to get rid of them when they are not

relevant.

## Variable and Expression value concepts in HLFIR

## Strengthening the variable concept

Fortran variables are currently represented in FIR as mlir::Value with reference

or box type coming from special operations or block arguments. They are either

the result of a fir.alloca, fir.allocmem, or fir.address_of operations with the

mangled name of the variable as attribute, or they are function block arguments

with the mangled name of the variable as attribute.

Fortran variables are defined with a Fortran type (both dynamic and static) that

may have type parameters, a rank and shape (including lower bounds), and some

attributes (like TARGET, OPTIONAL, VOLATILE...). All this information is

currently not represented in FIR. Instead, lowering keeps track of all this

information in the fir::ExtendedValue lowering data structure and uses it when

needed. If unused in lowering, some information about variables is lost (like

non-constant array bound expressions). In the IR, only the static type, the

compile time constant extents, and compile time character lengths can be

retrieved from the mlir::Value of a variable in the general case (more can be

retrieved if the variable is tracked via a fir.box, but not if it is a bare

memory reference).

This makes reasoning about Fortran variables in FIR harder, and in general

forces lowering to apply all decisions related to the information that is lost

in FIR. A more problematic point is that it does not allow generating debug

information for the variables from FIR, since the bounds and type parameters

information is not tightly linked to the base mlir::Value.

The proposal is to add a fir.declare operation that would anchor the

fir::ExtendedValue information in the IR regardless of the mlir::Value used for

the variable (bare memory reference, or fir.box). This operation will have a

kiranchandramohanUnsubmitted

Not Done

What will the frontend symbol be bound to after this change? Will it be the value generated by fir.declare or the fir.alloca ? I guess we might need some changes for the createHostAssociateVarClone function.

kiranchandramohan: What will the frontend symbol be bound to after this change? Will it be the value generated by…

jeanPerierAuthorUnsubmitted

Done

In the symbol table in lowering, it will be bound to the fir.declare result.
The goal is that it is possible to emit high level operation (e.g. hlfir.assign) with the SSA value bound to front-end symbol.

Regarding createHostAssociateVarClone, yes, it might need to emit a new fir.declare so that the new entity can be properly binded.

jeanPerier: In the symbol table in lowering, it will be bound to the fir.declare result. The goal is that…

"fir.def = uniq_mangled_variable_name" that will allow linking it to the Fortran

peixinUnsubmitted

Done

fir::ExtendedValue information in the IR regardless of the mlir::Value used for

- the variable (bare memory reference, or fir.box). This operation will have a

+ the variable (bare memory reference, or fir.box). This operation will have a

"fir.def = uniq_mangled_variable_name" that will allow linking it to the Fortran

Nit

peixin: Nit

source variable, and will take all the bounds and type parameters as operands.

All the high-level operations referring to variables will have a "fir.ref =

uniq_mangled_variable_name" that will allow retrieving back the related

dominating fir.declare and all the variable information. In most of the cases,

the fir.declare should simply be the defining operation of the operand mlir

value.

The fir.declare operation will allow:

- Pushing higher-level Fortran concepts into FIR operations (like array

kiranchandramohanUnsubmitted

Not Done

From the OpenMP side, in the long-term we would like to move privatisation into the OpenMP dialect. So it will be very helpful if variable declarations are available in FIR. On the other hand, having more kinds of variables in Flang (FIR, HighFIR) would mean that we will have to get OpenMP Dialect to work with these different kinds of variables.

Can you consider adding the finalizer also to the declare operation?

kiranchandramohan: From the OpenMP side, in the long-term we would like to move privatisation into the OpenMP…

jeanPerierAuthorUnsubmitted

Done

On the other hand, having more kinds of variables in Flang (FIR, HighFIR) would mean that we will have to get OpenMP Dialect to work with these different kinds of variables.

With the proposed strategy, the variables in FIR and high level FIR would be the same from an SSA type point of view (memory references and boxes like currently). The difference would be that all the shape/type parameters information related to that address would be guaranteed to be retrievable in one case, and not the other.

Given you will probably need this information to privatize the variables, you could either decide to make the privatization use the high level concept, but you could also directly use a "lower" level operation where all the side information (bounds, type params...) required for privatization are operands of the privatization operation.

Basically, you can freely use a "variable" SSA value produced in a fir.declare in "lower" level operations that do not need to access the info in the fir.declare. It is perfectly fine to have the current FIR operations mixed with the new higher level operations.

Can you consider adding the finalizer also to the declare operation?

Why not, you would need it to know how to finalize OpenMP private copies after privatizations ?

Note that the runtime knows how to retrieve finalizer information from the descriptors, so calling Finalize runtime routine on privatized variables should also be sufficient (in which case you only need to know if a types needs to be finalized, not exactly how).

jeanPerier: > On the other hand, having more kinds of variables in Flang (FIR, HighFIR) would mean that we…

assignments or transformational intrinsics).

- Generating debug information for the variables based on the fir.declare

operation.

peixinUnsubmitted

Not Done

Will the debug information be mlir::Location?

peixin: Will the debug information be `mlir::Location`?

jeanPerierAuthorUnsubmitted

Done

No, I believe source line location is already making it into the executable. What we are currently missing is the ability to interact with Fortran source variables from a debugger (e.g print, set, or watch a variable). I am not sure exactly how and when the related DWARF will have to be generated. This is a big task I should add on our TODO list. The point here is that fir.declare should allow generating this information late, because it documents everything that needs to be known about a source variable (address or descriptor, bounds, type parameters).

jeanPerier: No, I believe source line location is already making it into the executable. What we are…

- Generic Fortran aliasing analysis (currently implemented only around array

assignments with the fir.array_load concept).

peixinUnsubmitted

Done

operation.

- - Generic Fortran aliasing analysis (currently implemented only around array

+ - Generic Fortran aliasing analysis (currently implemented only around array

assignments with the fir.array_load concept).

nit

peixin: nit

peixinUnsubmitted

Not Done

Will this include the aliasing analysis in Fortran 2018 15.5.2.13? Is this Generic Fortran aliasing analysis documented somewhere with examples currently?

peixin: Will this include the aliasing analysis in Fortran 2018 15.5.2.13? Is this `Generic Fortran…

jeanPerierAuthorUnsubmitted

Done

Yes, that is the idea. And no, this is not documented yet. This would deserve its own document. The idea here is that when seeing an address in FIR, we will try to find a matching fir.declare or fir.associate node. If it is the parent operation, we will directly have all the Fortran information that allows applying 15.5.2.13. If it is a block argument, the analysis will try to follow the block argument predecessors if it can, and create a list of possible Fortran variables the address belongs to. If such list cannot be built, the analysis will have conservatively to assume the address may overlap with anything.

jeanPerier: Yes, that is the idea. And no, this is not documented yet. This would deserve its own document.

The fir.declare op is the only operation described by this change that will be

added to FIR and not HLFIR. The rational for this is that it is intended to

survive until LLVM dialect codegeneration so that debug info generation can use

them and alias information can take advantage of them even on FIR.

peixinUnsubmitted

Done

the result of function references returning POINTERs. fir.declare will also

- accept such variables to be described in the IR (a uniq name will be built from

+ accept such variables to be described in the IR (a unique name will be built from

the caller scope name and the function name.). In general, fir.declare will

Nit

peixin: Nit

Note that Fortran variables are not necessarily named objects, they can also be

kiranchandramohanUnsubmitted

Not Done

Till when will these fir.declare's persist? Will this have an effect on optimisations particularly the mem2reg kind of transformations.

kiranchandramohan: Till when will these `fir.declare`'s persist? Will this have an effect on optimisations…

jeanPerierAuthorUnsubmitted

Done

My plan was for fir.declare to persist until LLVM codgen since it is a no-op that only carries information. But you are raising a good point about mem2reg. Do you know if there is a generic MLIR mem2reg pass ? In any case, I expect mem2Reg pass might need to be aware of the fir.declare "transparent" aspect and be able to delete it for variables promoted to SSA value (and move its attributes somewhere if debug info is still needed for the variable ? I am not familiar enough with debug info to be assertive here).

Regarding optimization in general, I think keeping it is in our interest since it will allow to document Fortran aspects about attributes (like target vs not target) even after inlining, which should allow better alias analysis.

jeanPerier: My plan was for fir.declare to persist until LLVM codgen since it is a no-op that only carries…

kiranchandramohanUnsubmitted

Not Done

I don't think there is currently a mem2reg pass in MLIR. But there are a few downstream projects which have this and there were a couple of discussions talking about upstreaming this. But I don't think this has happened yet.
https://discourse.llvm.org/t/rfc-store-to-load-forwarding/59672
https://discourse.llvm.org/t/upstreaming-from-our-mlir-python-compiler-project/64931/4

I am assuming there will be changes required for our MemRefDataFlowOpt pass to account for the fir.declare operation.

I am also reminded of llvm's dbg intrinsics (https://llvm.org/docs/SourceLevelDebugging.html#id13). These intrinsics track variable information and includes a llvm.dbg.declare intrinsic which is deprecated. Not sure whether fir.declare will also face similar issues.

kiranchandramohan: I don't think there is currently a mem2reg pass in MLIR. But there are a few downstream…

jeanPerierAuthorUnsubmitted

Done

I am assuming there will be changes required for our MemRefDataFlowOpt pass to account for the fir.declare operation.

Yes, that seems right to me. I do not think we are running this pass currently. Regardless, fir.declare should allow getting a higher level view in possible aliasing/side-effects affecting an address, and should help getting the store to load forwarding working again safely.

I am also reminded of llvm's dbg intrinsics (https://llvm.org/docs/SourceLevelDebugging.html#id13). These intrinsics track variable information and includes a llvm.dbg.declare intrinsic which is deprecated. Not sure whether fir.declare will also face similar issues.

Thanks for this pointer. I see fir.declare as being a bit different than llvm.dbg.declare in the sense that fir.declare will give context to its output address, but it will have no impact on the semantics of its input.

You could very well imagine implementing storage sharing with fir.declare:

subroutine foo()
  integer :: i, j
  ! some code using i but not j

  ! some code using j but not i
end subroutine

After some optimization pass, you could have:

func.func @_QPfoo() {
  %storage = fir.alloca i32
  %i = fir.declare %stoarge {fir.def = _QPfooEi}
  %j = fir.declare %stoarge {fir.def = _QPfooEj}
   // ... use %i
   // ... use %j
}

llvm.dbg.declare also seems to require that there can only be one per variable. Which makes some optimization where a variable may live in different storage/register during its lifetime harder.

I do not think the same will be true of fir.declare. It requires its fir.def tag to be uniq so that the defining op is always accessible in hlfir ops using it, but a same tag could point to the same Fortran variable.
You could have:

%storage1 = fir.alloca i32
%i_1 = fir.declare %storage  {fir.def = _QPfooEi}
// use %i_1.... debugger should be told "i" is in %i_1
// for some reason decide that "i" is now tracked as an element in some buffer:
%buffer = fir.alloca i32, 10
%storage2 = fir.coordinate_of %buffer, c3
 %i_2 = fir.declare %storage  {fir.def = _QPfooEi.opt1}
// use %i_2. debugger should be told "i" is in %i_2

I think this would translate in llvm.dg.addr being emitted when changing the storage, but this deserves more design work about debug info generation.

At least from an HLFIR/FIR point of view, I see no issue with the two patterns above (two fir.declare sharing the same input, or two fir.declare referring to the same Fortran variable). I think fir.declare, despite the name, is more akin to llvm.dbg.addr, and that one big advantage it has over both is that it returns an SSA value, while the other are tagging an SSA existing value. That makes variable info identifiable via the defining operations chain of an address, rather than via an analysis of its usages.

jeanPerier: > I am assuming there will be changes required for our MemRefDataFlowOpt pass to account for…

the result of function references returning POINTERs. fir.declare will also

accept such variables to be described in the IR (a unique name will be built

from the caller scope name and the function name.). In general, fir.declare

will allow to view every memory storage as a variable, and this will be used to

describe and use compiler created array temporaries.

## Adding an expression value concept in HLFIR

Currently, Fortran expressions can be represented as SSA values for scalar

logical, integer, real, and complex expressions. Scalar character or

derived-type expressions and all array expressions are buffered in lowering:

their results are directly given a memory storage in lowering and are

manipulated as variables.

While this keeps FIR simple, this makes the amount of IR generated for these

expressions higher, and in general makes later optimization passes job harder

since they present non-trivial patterns (with memory operations) and cannot be

eliminated by naive dead code elimination when the result is unused. This also

forces lowering to combine elemental array expressions into single loop nests to

avoid bufferizing all array sub-expressions (which would yield terrible

performance). These combinations, which are implemented using C++ lambdas in

lowering makes lowering code harder to understand. It also makes the expression

lowering code context dependent (especially designators lowering). The lowering

code paths may be different when lowering a syntactically similar expression in

an elemental expression context, in a forall context, or in a normal context.

Some of the combinations described in [Array Composition](ArrayComposition.md)

are currently not implemented in lowering because they are less trivial

optimizations, and do not really belong in lowering. However, deploying such

combinations on the generated FIR with bufferizations requires the usage of

non-trivial pattern matching and rewrites (recognizing temporary allocation,

usage, and related runtime calls). Note that the goal of such combination is not

only about inlining transformational runtime calls, it is mainly about never

generating a temporary for an array expression sub-operand that is a

transformational intrinsic call matching certain criteria. So the optimization

pass will not only need to recognize the intrinsic call, it must understand the

context it is being called in.

The usage of memory manipulations also makes some of the alias analysis more

complex, especially when dealing with foralls (the alias analysis cannot simply

follow an operand tree, it must understand indirect dependencies from operations

peixinUnsubmitted

Done

Can COMPARE be added in the set of character operations? We found inlining character comparison has great performance improvement in some workloads (the possible reason is the character is in select case statement). Adding it will make inline work easier?

peixin: Can `COMPARE` be added in the set of character operations? We found inlining character…

jeanPerierAuthorUnsubmitted

Done

Yes, I had it in mind in the "..." parts. I made it explicit.

jeanPerier: Yes, I had it in mind in the "..." parts. I made it explicit.

stored in memory).

The proposal is to add a !hlfir.expr<T> SSA value type concept, and set of

character operations (concatenation, TRIM, MAX, MIN, comparisons...), a set of

array transformational operations (SUM, MATMUL, TRANSPOSE, ...), and a generic

hlfir.elemental operation. The hlfir.expr<T> type is not intended to be used

with scalar types that already have SSA value types (e.g., integer or real

scalars). Instead, these existing SSA types will implicitly be considered as

being expressions when used in high-level FIR operations, which will simplify

interfacing with other dialects that define operations with these types (e.g.,

the arith dialect).

These hlfir.expr values could then be placed in memory when needed (assigned to

a variable, passed as a procedure argument, or an IO output item...) via

hlfir.assign or hlfir.associate operations that will later be described.

When no special optimization pass is run, a translation pass would lower the

operations producing hlfir.expr to buffer allocations and memory operations just

as in the currently generated FIR.

However, these high-level operations should allow the writing of optimization

passes combining chains of operations producing hlfir.expr into optimized forms

via pattern matching on the operand tree.

The hlfir.elemental operation will be discussed in more detail below. It allows

simplifying lowering while keeping the ability to combine elemental

sub-expressions into a single loop nest. It should also allow rewriting some of

the transformational intrinsic operations to functions of the indices as

described in [Array Composition](ArrayComposition.md).

## Proposed design for HLFIR (High-Level Fortran IR)

### HLFIR Operations and Types

peixinUnsubmitted

Not Done

Currently, OpenMP reduction clause uses the fragile pattern match in lowering. When refactoring the lowering code, is it possible to make some expressions lowering shared with OpenMP using fir.expr? Specifically, the following operators, intrinsics, and user-defined operators will be redefined in OpenMP reduction clause:

+, *, .and., .or., .eqv., .neqv., max, min, iand, ior, ieor

peixin: Currently, OpenMP reduction clause uses the fragile pattern match in lowering. When refactoring…

jeanPerierAuthorUnsubmitted

Done

Interesting, what are you currently pattern matching ?
I am not sure the current design would help you a lot in the sense that it intends to not avoid any specific elemental operations, but rather to use a generic fir.elemental concepts.

As much as possible, I would tend to extend the mlir arith dialect to support the scalar operations applicable on MLIR integer and floating point types and pattern match that. But then, for some elemental operators and intrinsics for which easy pattern matching is desirable, and where adding an arith op is not the best way, it could makes sense to add custom ops. But this may be orthogonal to the proposed change and something that could be though/designed/done in parallel, operators by operators.

Regarding user defined operators, I would need to understand better what you mean by "redefined", semantics is resolving user defined operation to function references already, so we do not really see them in lowering I believe.

jeanPerier: Interesting, what are you currently pattern matching ? I am not sure the current design would…

kiranchandramohanUnsubmitted

Not Done

Thanks @peixin for raising this point.

Currently we do a pattern match fir loads -> (optional convert) -> reduction operation in FIR/arith -> (optional convert) -> store and replace it with an omp.reduction operation.

So for a reduction happening on an addition (say b = b + a).
We pattern match on the following IR,

%b = fir.load %bref
%a = fir.load %aref
%r = arith.add %a, %b
fir.store %r

and get the following result IR.

%a = fir.load %aref
omp.reduction %a, %bref

An alternative approach that we are exploring is to see whether this can be done directly during lowering, i.e. to do custom lowering for the Assignment operation if it happens to perform a reduction.

kiranchandramohan: Thanks @peixin for raising this point. Currently we do a pattern match fir loads -> (optional…

peixinUnsubmitted

Not Done

Thanks @kiranchandramohan for answer this question.

For "redefined", what I mean is that the addition statement b = b + a is not lowered as %b_new_val = arith.add %b_val, %a_val any more. Instead, it will be lowered as omp.reduction %a_val, %b_red.

The current expression lowering for this statement is encapsulated in ScalarExprLowering, and it is hard to be changed if with OpenMP. If with these high-level FIR Ops, and given the FIR lowering is going to be refactored, is it possible to expose these statements analysis and lowering to be open to be changed.

Beyond this, there are two scenarios having the similar problem:

Complex multiplication and complex conjugate multiplication

The current lowering and codegen does not generate the "best" LLVM IR. See https://reviews.llvm.org/D134364#3807605 for some performance considerations. With these high-level FIR Ops, can the FIR lowering be factored so that some complex operation lowered directly as one IR instruction.

target/chip dependent transformation

One example is as follows:
subroutine sub(a, b, c, m, n, k)

real :: a(m, k), b(k, n), c(m, n)
!DIR ARM-SME (scalar matrix extension)
c = a * b

end

With the compiler directive, the statement c = a * b can be lowered using some special transformation.

All in all, given the current high-level FIR Ops, and FIR lowering will be refactored, can we make the Fortran statement analysis and lowering more generic and extensible for possible optimization with OpenMP and compiler directives, or else. Some Fortran statement lowering may can be moved to codegen so that custom optimization can be performed.

peixin: Thanks @kiranchandramohan for answer this question. For "redefined", what I mean is that the…

jeanPerierAuthorUnsubmitted

Done

for some performance considerations. With these high-level FIR Ops, can the FIR lowering be factored so that some complex operation lowered directly as one IR instruction.

I think moving to using the mlir::complex type and operation is independent from these high level operation changes. The change proposed here do not rely on complexes being lowered to fir.complex and mlir::complex nor on how they are manipulated. I am not opposed to moving to this as long as the ABIs are solved.

For "redefined", what I mean is that the addition statement b = b + a is not lowered as %b_new_val = arith.add %b_val, %a_val any more. Instead, it will be lowered as omp.reduction %a_val, %b_red.

Are you doing this only when b is a scalar or also when b is an array ?

Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ?

With the compiler directive, the statement c = a * b can be lowered using some special transformation.

That case is interesting. I am not sure if we could use the high level ops to lower such directive or if it would have to be applied directly in lowering.

I guess the fir.assign op could be given a special attribute, and that the application of the directive could be delayed to fir.assign translation.

jeanPerier: > for some performance considerations. With these high-level FIR Ops, can the FIR lowering be…

kiranchandramohanUnsubmitted

Not Done

Are you doing this only when b is a scalar or also when b is an array ?

Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ?

Currently, we have implemented it only for scalars, but the pattern matching will become complicated for pointers, allocatables and arrays. Rather than pattern matching on some FIR, once we detect that the Assignment is actually a reduction we can try to write lowering code that generates an appropriate omp.reduction operation. Alternatively, if the pattern matching is easier at the new High Level FIR we can do it there.

kiranchandramohan: ```Are you doing this only when b is a scalar or also when b is an array ? Why are you…

peixinUnsubmitted

Not Done

Agree.

peixin: Agree.

jeanPerierAuthorUnsubmitted

Done

For scalar reduction, the FIR pattern will not change with the proposal, for arrays and array section, it will change (you would have a fir.elemental containing the scalar reduction operation plus a fir.assign a bit as in the second example in the example section below). With arrays, you would also need to be careful with the potential overlaps in the assignments.

I think we should aim at being able to detect reduction in FIR. OpenMP might not be the only case that would want to do special optimization with a reduction. Now, if replacing the generated FIR by a omp.reduction is required for a correctness point of view, that may be problematic, because I am not sure we can guarantee that MLIR pattern matching will work at a 100% if some in-between passes slightly modify the FIR in correct by unexpected ways.

jeanPerier: For scalar reduction, the FIR pattern will not change with the proposal, for arrays and array…

peixinUnsubmitted

Not Done

I think moving to using the mlir::complex type and operation is independent from these high level operation changes. The change proposed here do not rely on complexes being lowered to fir.complex and mlir::complex nor on how they are manipulated. I am not opposed to moving to this as long as the ABIs are solved.

For the following case:

subroutine sub
  complex :: a, b, c
  c = conj(a) * b
end

With these high level FIR Ops, I hope to generate the following FIR:

%a = fir.declare ...
%b = fir.declare ...
%c = complex.conjmul %a, %b : fir.complex<4>

Anyway, I will keep watching if we can do this when refactoring FIR lowering using these high level FIR ops in the future. We had one initial try in classic-flang. If possible, we hope to contribute the middle-end and back-end code together with F18 frontend support.

Are you doing this only when b is a scalar or also when b is an array ?

It can be either scalar, array, or array section. If it is array or array section, it will be treated as if a reduction clause would be applied to each separate element of the array section.

Why are you dissatisfied with the pattern matching at the MLIR level ? Wouldn't pattern matching the RHS and LHS evaluate::Expr in lowering be equally complex (although maybe a bit more stable) ?

It generates FIR, and match the pattern, then replace the generated FIR with omp.reduction. The current pattern patching is fine, at least no bug found. However, if we can lower it from the evaluate::Expr, it may be better.

Here is one example (from the OpenMP Spec attached examples):

SUBROUTINE REDUCTION1(A, B, C, D, X, Y, N)
 REAL :: X(*), A, D
 INTEGER :: Y(*), N, B, C
 INTEGER :: I
 A = 0
 B = 0
 C = Y(1)
 D = X(1)
 !$OMP PARALLEL DO PRIVATE(I) SHARED(X, Y, N) REDUCTION(+:A) &
 !$OMP& REDUCTION(IEOR:B) REDUCTION(MIN:C) REDUCTION(MAX:D)
 DO I=1,N
 A = A + X(I)
 B = IEOR(B, Y(I))
 C = MIN(C, Y(I))
 IF (D < X(I)) D = X(I)
 END DO

END SUBROUTINE REDUCTION1

I think that the pattern match is hard to capture the max reduction of D. From the evaluate::Expr, we only need to check if the GetSymbol(lhs) is the reduction list item. We can emit the assertion if the rhs is not reduction identifier such as +, IEOR, MIN here. For the if statement, it is hard to give an reasonable assertion.

That case is interesting. I am not sure if we could use the high level ops to lower such directive or if it would have to be applied directly in lowering. I guess the fir.assign op could be given a special attribute, and that the application of the directive could be delayed to fir.assign translation.

Thanks for the confirmation.

peixin: > I think moving to using the mlir::complex type and operation is independent from these high…

jeanPerierAuthorUnsubmitted

Done

With these high level FIR Ops, I hope to generate the following FIR:

I do not think the high level ops will allow to produce exactly what you wrote unless we were to add FIR specific complex operation working on "variables", and that was not my goal since MLIR complex dialect could already. However, I think that what it could be lowered to would still suits your optimization goal.

This could be be lowered to:

%a = fir.declare ... : fir.ref<mlir::complex<f32>>
%b = fir.declare ... : fir.ref<mlir::complex<f32>>
%c = fir.declare ... : fir.ref<mlir::complex<f32>>
%aval = fir.load %a
%aconj = complex.conj %aval
%bval = fir.load %b
%res = complex.mul %aconj, %b 
fir.assign %res to %c

The MLIR complex dialect canonicalization/optimization passes would then have to fold the MUL with a conj operand into a single conjmul op (does it exists yet in the MLIR complex dialect, I could not find it ?).

Here is one example (from the OpenMP Spec attached examples):

Thanks for the example. One related question, is semantics checking that what is written in the REDUCTION clause actually happens in the loop (e.g., is it checking that B appears in an assignments with IEOR) ?

If so, that means that this kind of pattern matching is already available on the parse tree / evaluate expr.
I am not a fan of low level intrinsic operations (like +) being lowered differently based on the context, so I would avoid it if possible. But if this turns out to be really required, we could try to find clean ways to overrides certain expression expr node without redefining a completely different parse-tree/expression visitor.

jeanPerier: > With these high level FIR Ops, I hope to generate the following FIR: I do not think the high…

peixinUnsubmitted

Not Done

The MLIR complex dialect canonicalization/optimization passes would then have to fold the MUL with a conj operand into a single conjmul op

Good point. Thanks.

(does it exists yet in the MLIR complex dialect, I could not find it ?).

No for now. If there is the opportunity for performance improvement, I think there is no reason to oppose to add such a op.

One related question, is semantics checking that what is written in the REDUCTION clause actually happens in the loop (e.g., is it checking that B appears in an assignments with IEOR) ?

No. I don't find any related semantic restrictions in the OpenMP Spec. I think rsers should guarantee it. @kiranchandramohan Right?

Here is another example in the standard attached example:

!$omp parallel do num_threads(M) reduction(task,+:x)
do i = 1,N
  x=x+1

  if( mod(i,2) == 0) then
    !$omp task in_reduction(+:x)
      x=x-1
    !$omp end task
  endif
end do

For this case, it is hard to know the pattern from the parse-tree/evaluate expr before lowering.

peixin: > The MLIR complex dialect canonicalization/optimization passes would then have to fold the MUL…

kiranchandramohanUnsubmitted

Not Done

One related question, is semantics checking that what is written in the REDUCTION clause actually happens in the loop (e.g., is it checking that B appears in an assignments with IEOR) ?

No. I don't find any related semantic restrictions in the OpenMP Spec. I think rsers should guarantee it. @kiranchandramohan Right?

Yes, there is no such restriction in the standard.

Since the parse-tree models the source more or less directly, the representation for reduction is a clause and not a separate parse-tree node that encapsulates an assignment statement or an expression. We add reduction symbols to the variables involved in the reduction operation but I don't know whether these are carried over to the expression tree.

kiranchandramohan: ```One related question, is semantics checking that what is written in the REDUCTION clause…

#### Introduce a hlfir.expr<T> type

Motivation: avoid the need to materialize expressions in temporaries while

lowering.

Syntax: ``` !hlfir.expr<[extent x]* T [, class]> ```

- `[extent x]*` represents the shape for arrays similarly to !fir.array<> type,

except that the shape cannot be assumed rank (!hlfir.expr<..xT> is invalid).

This restriction can be added because it is impossible to create an assumed

rank expression in Fortran that is not a variable.

- `T` is the element type of the static type

- `class` flag can be set to denote that this a polymorphic expression (that the

dynamic type should not be assumed to be the static type).

examples: !hlfir.expr<fir.char<?>>, !hlfir.expr<10xi32>,

!hlfir.expr<?x10x?xfir.complex<4>>

T in scalar hlfir.expr<T> can be:

- A character type (fir.char<10, kind>, fir.char<?, kind>)

- A derived type: (fir.type<t{...}>)

T in an array hlfir.expr< e1 x ex2 .. : T> can be:

- A character or derived type

peixinUnsubmitted

Not Done

Why keep two complex types instead of only using fir.complex or mlir::complex?

peixin: Why keep two complex types instead of only using fir.complex or mlir::complex?

jeanPerierAuthorUnsubmitted

Done

Why keep two complex types instead of only using fir.complex or mlir::complex?

The need arise from the fact that we want to make a difference between C/Fortran complex type, and C++ std::complex type. These types are layout compatible, but not ABI compatible (they are not passed the same way on all architectures). fir::Complex implements C/Fortran complex, and mlir::Complex is translated to an std::complex like struct.

In the runtime API, we sometime interface with std::complex<>.

Using distinct types to enforce the ABI may not be the only solution. But it was the simplest and more robust so far (especially given we already had both types). If a solution is found regarding this ABI problem, moving to using mlir::complex would sound OK to me. I see it as something orthogonal to these high level changes (it could be done in parallel). The solution is probably to translate std::complex to a struct type and consider mlir::Complex to be the C complex from an ABI point of view (but I think this may conflicts with MLIR current codegen towards LLVM that would have to be overridden). fir.convert between mlir complex and the related struct type would have to be allowed and probably implemented by going through memory and doing the cast there to rely on the layout compatibility property.

jeanPerier: > Why keep two complex types instead of only using fir.complex or mlir::complex? The need…

- A logical type (fir.logical<kind>)

- An integer type (i1, i32, ….)

- A floating point type (f32, f16…)

- A complex type (fir.complex<4> or mlir::complex<f32>...)

Some expressions may be polymorphic (for instance, MERGE can be used on

polymorphic entities). The hlfir.expr type has an optional "class" flag to

denote this: hlfir.expr<T, class>.

Note that the ALLOCATABLE, POINTER, TARGET, VOLATILE, ASYNCHRONOUS, OPTIONAL

aspects do not apply to expressions, they apply to variables.

It is possible to query the following about an expression:

- What is the extent : via hlfir.get_extent %expr, dim

- What are the length parameters: via hlfir.get_typeparam %expr [, param_name]

- What is the dynamic type: via hlfir.get_dynamic_type %expr

peixinUnsubmitted

Done

- %element = fir.apply %expr, %i, %j : (!fir.expr<T>, index index) ->

- fir.expr<ScalarType> |! AnyConstantSizeScalarType

+ fir.expr<ScalarType> | AnyConstantSizeScalarType

It is not directly possible to take an address for the expression, but an

editor problem?

peixin: editor problem?

jeanPerierAuthorUnsubmitted

Done

Yes, thanks. I replaced all the non breaking spaces by normal spaces and remove the redundant ones like here.

jeanPerier: Yes, thanks. I replaced all the non breaking spaces by normal spaces and remove the redundant…

It is possible to get the value of an array expression element:

- %element = hlfir.apply %expr, %i, %j : (!hlfir.expr<T>, index index) ->

hlfir.expr<ScalarType> | AnyConstantSizeScalarType

It is not directly possible to take an address for the expression, but an

expression value can be associated to a new variable whose address can be used

(required when passing the expression in a user call, or to concepts that are

kept low level in FIR, like IO runtime calls). The variable created may be a

compiler created temporary, or may relate to a Fortran source variable if this

mechanism is used to implement ASSOCIATE.

- %var = hlfir.associate %expr [attributes about the association]->

AnyMemoryOrBoxType

- hlfir.end_association %var

peixinUnsubmitted

Not Done

This should be added in the description when this opertion is added in FIROps.td.

I am thinking if fix.expr<T> is a good solution for the argument with VALUE attribute in lowering (both caller and callee side), especially for the derived type and BIND(C)?

peixin: This should be added in the description when this opertion is added in FIROps.td. I am…

jeanPerierAuthorUnsubmitted

Done

I am thinking if fix.expr<T> is a good solution for the argument with VALUE attribute in lowering (both caller and callee side), especially for the derived type and BIND(C)?

It may help, but since this type is not intended to survive until LLVM codegen, the VALUE aspect would still have to be translated with the current FIR types as it is currently.

jeanPerier: > I am thinking if fix.expr<T> is a good solution for the argument with VALUE attribute in…

The intention is that the hlfir.expr<T> is the result of an operation, and

should most often not be a block argument. This is because the hlfir.expr is

peixinUnsubmitted

Done

(fir.box<T>), implying that if a fir.expr<T> is passed as a block argument, the

- expression bufferization pass will evaluate the operation producing the

+ expression bufferization pass will evaluate the operation producing the

expression in a temporary, and transform the block operand into a fir.box

peixin:

mostly intended to allow combining chains of operations into more optimal

forms. But it is possible to represent any expression result via a Fortran

runtime descriptor (fir.box<T>), implying that if a hlfir.expr<T> is passed as

a block argument, the expression bufferization pass will evaluate the operation

producing the expression in a temporary, and transform the block operand into a

fir.box describing the temporary. Clean-up for the temporary will be inserted

after the last use of the hlfir.expr. Note that, at least at first, lowering

may help FIR to find the last use of a hlfir.expr by explicitly inserting a

hlfir.finalize %expr operation that may turn into a no-op if the expression is

not later materialized in memory.

It is nonetheless not intended that such abstract types be used as block

arguments to avoid introducing allocations and descriptor manipulations.

#### fir.declare operation

Motivation: represent variables, linking together a memory storage, shape,

length parameters, attributes and the variable name.

Syntax:

```

%var = fir.declare %base [shape %extent1, %extent2, ...] [lbs %lb1, %lb2, ...] [typeparams %l1, ...] {fir.def = mangled_variable_name, attributes} : [(....) ->] T

```

%var will have the same type as %base. When no debug info is generated, the

operation can be replaced by %base when lowering to LLVM.

- Extents should only be provided if %base is not a fir.box and the entity is an

array.

- lower bounds should only be provided if the entity is an array and the lower

bounds are not default (all ones). It should also not be provided for POINTERs

and ALLOCATABLES since the lower bounds may change.

- type parameters should be provided for entities with length parameters, unless

the entity is a CHARACTER where the length is constant in %base type.

- The attributes will include the Fortran attributes: TARGET (fir.target),

POINTER (fir.ptr), ALLOCATABLE (fir.alloc), CONTIGUOUS (fir.contiguous),

OPTIONAL (fir.optional), VOLATILE (fir.volatile), ASYNCHRONOUS (fir.async).

peixinUnsubmitted

Not Done

Will fir.declare be used for common blocks as well?

peixin: Will `fir.declare` be used for common blocks as well?

jeanPerierAuthorUnsubmitted

Done

It will be used for the common block members once the address computation is done (maybe with the common block name as a fir.common attribute so that it can easily be identified which common block a variable belongs to). Same idea with equivalences. I am not sure if there is a need to declare the whole common block in a declare op, but if it turns out useful for OpenMP or other usages, why not.

jeanPerier: It will be used for the common block members once the address computation is done (maybe with…

They will also indicate when an entity is part of an equivalence by giving the

equivalence name (fir.equiv = mangled_equivalence_name).

fir.declare will be used for all Fortran variables, except the ones created via

the ASSOCIATE construct that will use hlfir.associate described below.

fir.declare will also be used when creating compiler created temporaries, in

which case the fir.tmp attribute will be given.

Examples:

| FORTRAN | FIR |

| ----------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |

| REAL :: X | %mem = fir.alloca f32 %x = fir.declare %mem {fir.def = "\_QPfooEx"} : fir.ref<f32> |

| REAL, TARGET :: X(10) | %mem = fir.alloca f32 %nval = fir.load %n %x = fir.declare %mem {fir.def = "\_QPfooEx", fir.target} : fir.ref<fir.array<10xf32>> |

| REAL :: X(N) | %mem = // … alloc or dummy argument %nval = fir.load %n : i64 %x = fir.declare %mem shape %nval {fir.def = "\_QPfooEx"} : (i64) -> fir.ref<fir.array<?xf32>> |

| REAL :: X(0:) | %mem = // … dummy argument %c0 = arith.constant 0 : index %x = fir.declare %mem lbs %c0 {fir.def = "\_QPfooEx"} : (index) -> fir.box<fir.array<?xf32>> |

| REAL, POINTER :: X(:) | %mem = // … dummy argument, or local, or global %x = fir.declare %mem {fir.def = "\_QPfooEx", fir.ptr} : fir.ref<fir.box<fir.ptr<fir.array<?xf32>>>> |

| REAL, ALLOCATABLE :: X(:) | %mem = // … dummy argument, or local, or global %x = fir.declare %mem {fir.def = "\_QPfooEx", fir.alloc} : fir.ref<fir.box<fir.heap<fir.array<?xf32>>>> |

| CHARACTER(10) :: C | %mem = // … dummy argument, or local, or global %c = fir.declare %mem lbs %c0 {fir.def = "\_QPfooEc"} : fir.ref<fir.char<10>> |

| CHARACTER(\*) :: C | %unbox = fir.unbox %bochar (fir.boxchar<1>) -> (fir.ref<fir.char<?>>, index) %c = fir.declare %unbox#0 typeparams %unbox#1 {fir.def = "\_QPfooEc"} : (index) -> fir.ref<fir.char<?>> |

| CHARACTER(\*), OPTIONAL, ALLOCATABLE :: C | %mem = // … dummy argument %c = fir.declare %mem {fir.def = "\_QPfooEc", fir.alloc, fir.optional, fir.assumed\_len\_alloc} : fir.ref<fir.box<fir.heap<fir.char<?>>>> |

| TYPE(T) :: X | %mem = // … dummy argument, or local, or global %x = fir.declare %mem {fir.def = "\_QPfooEx"} : fir.ref<fir.type<t{...}>> |

| TYPE(T(L)) :: X | %mem = // … dummy argument, or local, or global %lval = fir.load %l %x = fir.declare %mem typeparams %lval {fir.def = "\_QPfooEx"} : fir.box<fir.type<t{...}>> |

| CLASS(\*), POINTER :: X | %mem = // … dummy argument, or local, or global %x = fir.declare %mem {fir.def = "\_QPfooEx", fir.ptr} : fir.class<fir.ptr<None>> |

| REAL :: X(..) | %mem = // … dummy argument %x = fir.declare %mem {fir.def = "\_QPfooEx"} : fir.box<fir.array<..xf32>> |

#### hlfir.associate operation

Motivation: represent Fortran associations (both from variables and expressions)

and allow keeping actual/dummy argument association information after inlining.

Syntax:

```

%var = hlfir.associate %expr_or_var {fir.def = mangled_uniq_name, attributes} (AnyExprOrVarType) -> AnyVarType

```

hlfir.associate is used to represent the following associations:

- Dummy/Actual association on the caller side (the callee side uses

fir.declare).

- Host association in block constructs when VOLATILE/ASYNC attributes are added

locally

- ASSOCIATE construct (both from variable and expressions).

peixinUnsubmitted

Done

the variable locally, and to encode certain side-effects (like copy-in/copy-out

- when going from a non-contiguous variable to a contiguous variable, with the

+ when going from a non-contiguous variable to a contiguous variable, with the

help of the related fir.end_association operation).

peixin:

When the operand is a variable, hlfir.associate allows changing the attributes

of the variable locally, and to encode certain side-effects (like

copy-in/copy-out when going from a non-contiguous variable to a contiguous

variable, with the help of the related hlfir.end_association operation).

kiranchandramohanUnsubmitted

Not Done

Can this be modeled by a region instead?

We discussed this in the Flang technical call. The reasons mentioned include difficulty handling exits, branches out of this etc.

kiranchandramohan: Can this be modeled by a region instead? We discussed this in the Flang technical call. The…

jeanPerierAuthorUnsubmitted

Done

One of the main issue I think is that it is intended to deal with argument association on the caller sides. So if we used regions, you would have something like:

fir.associate %actual1 to %dummy1 {

fir.associate %actual2 to %dummy2 {
  ....
  %res = fir.call %foo(%dumm1, %dummy2)
  ...
}

}

The first issue here is that %res is not accessible after the call where it needs to be used. It could be propagated back, but in general, this means that any SSA values created during the association lifetime is unusable, even if it is not linked to toe variable lifetime. And I find this very restrictive hard to work with.

Then, nested regions are harder to reorder than operations in a same block. Fortran tells arguments can be evaluated in any order. But generating such associate nest in lowering would in my opinion make it harder to re-order things here to to better CSE for instance.

jeanPerier: One of the main issue I think is that it is intended to deal with argument association on the…

kiranchandramohanUnsubmitted

Not Done

The first issue here is that %res is not accessible after the call where it needs to be used. It could be propagated back, but in general, this means that any SSA values created during the association lifetime is unusable, even if it is not linked to toe variable lifetime. And I find this very restrictive hard to work with.

Yes, that is right particularly after mem2reg kind of passes. But for load-store kind of code that lowering generates, it should not be a problem.

kiranchandramohan: > The first issue here is that %res is not accessible after the call where it needs to be used.

When the operand is an expression, hlfir.associate allows associating a storage

location to an expression value.

A hlfir.associate must be followed by a related hlfir.end_association that will

allow inserting any necessary finalization or copy-out later.

#### hlfir.end_association operation

Motivation: mark the place where some association should end and some side

effects might need to occur.

The hlfir.end_associate is a placeholder to later insert

deallocation/finalization if the variable was associated with an expression,

and to insert copy-out/deallocation if the variable was associated with another

variable with a copy-in.

Syntax:

```

hlfir.end_association %var [%original_variable] {fir.ref = var_mangled_name, attributes}

```

The attributes can be:

- copy_out (copy out the associated variable back into the original variable

if a copy-in occurred)

- finalize_copy_in (deallocate the temporary storage for the associated

variable if a copy-in occurred but the associated variable was not modified

(e.g., it is intent(in))).

- finalize: indicate that a finalizer should be run on the entity associated

with the variable (There is currently no way to deduce this only from the

variable type in FIR). It will give the finalizer mangled name so that it

can be later called.

If the copy_out or finalize_copy_in attribute is set, “original_variable” (the

argument of the hlfir.associate that produced %var) must be provided. The

rationale is that the original variable address is needed to verify if a

temporary was created, and if needed, to copy the data back to it.

#### hlfir.finalize

Motivation: mark end of life of local variables

Mark the place where a local variable will go out of scope. The main goal is to

retain this information even after local variables are inlined.

Syntax:

```

hlfir.finalize %var {fir.ref = var_mangled_name, attributes}

```

The attributes can be:

- finalize: indicate that a finalizer should be run on the entity associated

with the variable (There is currently no way to deduce this only from the

variable type in FIR).

peixinUnsubmitted

Done

to help -> help?

peixin: `to help` -> `help`?

Note that finalization will not free the local variable storage if it was

allocated on the heap. If lowering created the storage passed to fir.declare via

a fir.allocmem, lowering should insert a fir.freemem after the hlfir.finalize.

This could help making fir.allocmem to fir.alloca promotion simpler, and also

because finalization may be run without the intent to deallocate the variable

storage (like on INTENT(OUT) dummies).

#### hlfir.designate

Motivation: Represent designators at a high-level and allow representing some

information about derived type components that would otherwise be lost, like

component lower bounds.

Represent Fortran designators in a verbatim way: both triplet, and component

parts.

Syntax:

```

%var = hlfir.designate %base [“component”,] [(%i, %k:l%:%m)] [substr ub, lb] [imag|real] [shape extent1, extent2, ....] [lbs lb1, lb2, .....] [typeparams %l1, ...] {fir.ref = base_mangled_name, fir.def = mangled_name, attributes}

```

hlfir.designate is intended to encode a single part-ref (as defined by the

fortran standard). That means that a(:)%x(i, j, k) must be split into two

hlfir.designate: one for a(:), and one for x(i, j, k). If the base is ranked,

and the component is an array, the subscripts are mandatory and must not

contain triplets. This ensures that the result of a fir.designator cannot be a

"super-array".

The subscripts passed to hlfir.designate must be based on the base lower bounds

(one by default).

A substring is built by providing the lower and upper character indices after

`substr`. Implicit substring bounds must be made explicit by lowering. It is

not possible to provide substr if a component is already provided. Instead the

related Fortran designator must be split into two fir.designator. This is

because the component character length will be needed to compute the right

stride, and it might be lost if not placed on the first designator typeparams.

Real and Imaginary complex parts are represented by an optional imag or real

tag. It can be added even if there is already a component.

The shape, lower bound, and type parameter operands represent the output entity

properties. The point of having those made explicit is to allow early folding

and hoisting of array section shape and length parameters (which especially in

FORALL contexts, can simplify later assignment temporary insertion a lot). Also,

if lower bounds of a derived type component array could not be added here, they

would be lost since they are not represented by other means in FIR (the fir.type

does not include this information).

hlfir.designate is not intended to describe vector subscripted variables.

Instead, lowering will have to introduce loops to do element by element

addressing. See the Examples section. This helps keeping hlfir.designate simple,

and since the contexts where a vector subscripted entity is considered to be a

variable (in the sense that it can be modified) are very limited, it seems

reasonable to have lowering deal with this aspect. For instance, a vector

subscripted entity cannot be passed as a variable, it cannot be a pointer

assignment target, and when it appears as an associated entity in an ASSOCIATE,

the related variable cannot be modified.

#### hlfir.assign

Motivation: represent assignment at a high-level (mainly a change for array and

character assignment) so that optimization pass can clearly reason about it

(value propagation, inserting temporary for right-hand side evaluation only when

needed), and that lowering does not have to implement it all.

peixinUnsubmitted

Done

Syntax:

```

- fir.assign %expr_or_var! to %var [attributes]

+ fir.assign %expr_or_var to %var [attributes]

```

The attributes can be:

editor problem?

peixin: editor problem?

Syntax:

```

hlfir.assign %expr_or_var to %var [attributes]

```

kiranchandramohanUnsubmitted

Not Done

Will there be fir.allocate or fir.deallocate operations generated for these?

kiranchandramohan: Will there be `fir.allocate` or `fir.deallocate` operations generated for these?

jeanPerierAuthorUnsubmitted

Done

Not until alias analysis is done. The rational is that the reallocation can be optimized when there is a potential overlap (the temp can be moved to the new allocatable value).

So lowering will only add the realloc attribute, and a FIR to FIR pass will lower this to a set of fir.assign (without the realloc attribute) + fir.allocate + fir.deallocate + necessary fir.if nests taking advantage of alias analysis.

jeanPerier: Not until alias analysis is done. The rational is that the reallocation can be optimized when…

The attributes can be:

- realloc: mark that assignment has F2003 semantics and that the left-hand

side may have to be deallocated/reallocated…

- use_assign=@function: mark a user defined assignment

- no_overlap: mark that an assignment does not need a temporary (added by an

analysis pass).

- unordered : mark that an assignment can happen in any element order (not

true if there is an impure elemental function being called).

This will replace the current array_load/array_access/array_merge semantics.

Instead, a more generic alias analysis will be performed on the LHS and RHS to

detect aliasing, and a temporary inserted if needed. The alias analysis will

look at all the memory references in the RHS operand tree and base overlap

decisions on the related variable declaration operations. This same analysis

should later allow moving/merging some expression evaluation between different

statements.

peixinUnsubmitted

Done

the right hand side was always evaluated in a temporary. The motivation to use

- fir.assign is to help the temporary removal, and also to deal with two edge

+ fir.assign is to help the temporary removal, and also to deal with two edge

cases: user assignment in a FORALL (the forall pass will need to understand that

peixin:

Note about user defined assignments: semantics is resolving them and building

the related subroutine call. So a fir.call could directly be made in lowering if

the right hand side was always evaluated in a temporary. The motivation to use

hlfir.assign is to help the temporary removal, and also to deal with two edge

cases: user assignment in a FORALL (the forall pass will need to understand that

this an assignment), and allocatable assignment mixed with user assignment

(implementing this as a call in lowering would require lowering the whole

reallocation logic in lowering already, duplicating the fact that hlfir.assign

should deal with it).

#### hlfir.ptr_assign

Motivation: represent pointer assignment without lowering the exact pointer

implementation (descriptor address, fir.ref<fir.box> or simple pointer scalar

fir.llvm_ptr<fir.ptr>).

Syntax:

```

hlfir.ptr_assign %var [[reshape %reshape] | [lbounds %lb1, …., %lbn]] to %ptr

```

It is important to keep pointer assignment at a high-level so that they can

later correctly be processed in hlfir.forall.

kiranchandramohanUnsubmitted

Not Done

Will fir.allocate lower to fir.allocmem or to runtime calls? Why?

kiranchandramohan: Will `fir.allocate` lower to `fir.allocmem` or to runtime calls? Why?

jeanPerierAuthorUnsubmitted

Done

Will fir.allocate lower to fir.allocmem or to runtime calls

Currently, the runtime is only used when allocations are none trivial (include initializations, polymorphic types, or error handling). However, we may want to offer the user the ability to debug their allocations/deallocation by validating that ALLOCATABLE being deallocated by have indeed been allocated, and are not pointing to bad addresses. In those case, the runtime would always be used.

In general, users may like the ability too easily hook into Fortran allocatable/pointer allocation via the runtime without recompiling programs. Some may prefer to have all allocations inlined when possible. So I think having the option in fir.allocate/fir.deallocate between fully inlined allocation with fir.allocmem or using the runtime makes sense.

I edited the sentence to make this clearer.

jeanPerier: > Will fir.allocate lower to fir.allocmem or to runtime calls Currently, the runtime is only…

#### hlfir.allocate

Motivation: keep POINTER and ALLOCATABLE allocation explicit in HLFIR, while

allowing later lowering to either inlined fir.allocmem or Fortran runtime

calls. Generating runtime calls allow the runtime to do Fortran specific

bookkeeping or flagging and to provide better runtime error reports.

The main difference with the ALLOCATE statement is that one distinct

hlfir.allocate has to be created for each element of the allocation-list.

Otherwise, it is a naive lowering of the ALLOCATE statement.

Syntax:

```

%stat = hlfir.allocate %var [%shape] [%type_params] [[src=%source] | [mold=%mold]] [errmsg =%errmsg]

```

#### hlfir.deallocate

Motivation: keep deallocation explicit in HLFIR, while allowing later lowering

to Fortran runtime calls to allow the runtime to do Fortran specific

bookkeeping or flagging of allocations.

Similarly to hlfir.allocate, one operation must be created for each

allocate-object-list object.

Syntax:

```

%stat = hlfir.deallocate %var [errmsg=err].

```

#### hlfir.elemental

Motivation: represent elemental operations without defining array level

operations for each of them, and allow the representation of array expressions

as function of the indices.

The hlfir.elemental operation can be seen as a closure: it is defining a

function of the indices that returns the value of the element of the

represented array expression at the given indices. This an operation with an

MLIR region. It allows detailing how an elemental expression is implemented at

the element level, without yet requiring materializing the operands and result

in memory. The hlfir.expr<T> elements value can be obtained using hlfir.apply.

The element result is built with a fir.result op, whose result type can be a

scalar hlfir.expr<T> or any scalar constant size types (e.g. i32, or f32).

peixinUnsubmitted

Done

%op = fir.elemental (%indices) %shape [%type_params] [%dynamic_type] {

- !!!….

+ ….

!!fir.result %result_element

editor problem?

peixin: editor problem?

peixinUnsubmitted

Done

!!!….

- !!fir.result %result_element

+ fir.result %result_element

}

```

editor problem?

peixin: editor problem?

Syntax:

```

%op = hlfir.elemental (%indices) %shape [%type_params] [%dynamic_type] {

….

fir.result %result_element

}

kiranchandramohanUnsubmitted

Not Done

Nit: Could you add some explanation for one based?

kiranchandramohan: Nit: Could you add some explanation for `one based`?

jeanPerierAuthorUnsubmitted

Done

Nit: Could you add some explanation for one based?

Sure, I added a note. The purpose if to match Fortran default for array variables so that there is no need to generate bound adjustments when working with one based array variables in an expression.

jeanPerier: > Nit: Could you add some explanation for `one based`? Sure, I added a note. The purpose if to…

```

Note that %indices are not operands, they are the elemental region block

arguments, representing the array iteration space in a one based fashion.

peixinUnsubmitted

Done

Illustration: “A + B” represented with a fir.elemental.

```

- %add = fir.elemental (%i:index, %j:index) shape %shape (!fir.shape<2>) -> !fir.expr<?x?xf32>> {

+ %add = fir.elemental (%i:index, %j:index) shape %shape (!fir.shape<2>) -> !fir.expr<?x?xf32> {

!!%belt = fir.designate %b, %i, %j {fir.ref = _QPfooEb, fir.def = _QPfooEb.des001}: (!fir.ref<!fir.array<?x?xf32>>, index, index) -> !fir.ref<f32>

nit

peixin: nit

The choice of using one based indicies is to match Fortran default for

array variables, so that there is no need to generate bound adjustments

when working with one based array variables in an expression.

peixinUnsubmitted

Done

!!%celt = fir.designate %c, %i, %j {fir.ref = _QPfooEa, fir.def = _QPfooEa.des002} : (!fir.ref<!fir.array<?x?xf32>>, index, index) -> !fir.ref<f32>

- !!%bval = fir.load! %belt : (!fir.ref<f32>) -> f32

+ %bval = fir.load %belt : (!fir.ref<f32>) -> f32

!!%cval = fir.load! %celt : (!fir.ref<f32>) -> f32

It seems there is some editor problem? from line 542 to line 547.

peixin: It seems there is some editor problem? from line 542 to line 547.

Illustration: “A + B” represented with a hlfir.elemental.

```

%add = hlfir.elemental (%i:index, %j:index) shape %shape (!fir.shape<2>) -> !hlfir.expr<?x?xf32> {

%belt = hlfir.designate %b, %i, %j {fir.ref = _QPfooEb, fir.def = _QPfooEb.des001}: (!fir.ref<!fir.array<?x?xf32>>, index, index) -> !fir.ref<f32>

%celt = hlfir.designate %c, %i, %j {fir.ref = _QPfooEa, fir.def = _QPfooEa.des002} : (!fir.ref<!fir.array<?x?xf32>>, index, index) -> !fir.ref<f32>

%bval = fir.load %belt : (!fir.ref<f32>) -> f32

%cval = fir.load %celt : (!fir.ref<f32>) -> f32

%add = arith.addf %bval, %cval : f32

fir.result %res : f32

}

```

In contexts where it can be proved that the array operands were not modified

between the hlfir.elemental and the hlfir.apply, the region of the

hlfir.elemental can be inlined at the hlfir.apply. Otherwise, if there is no

such guarantee, or if the hlfir.elemental is not “visible” (because its result

is passed as a block argument), the hlfir.elemental will be lowered to an array

temporary. This will be done as a HLFIR to HLFIR optimization pass. Note that

MLIR inlining could be used if hlfir.elemental implemented the

CallableInterface and hlfir.apply the CallInterface. But MLIR generic inlining

is probably too generic for this case: no recursion is possible here, the call

graphs are trivial, and using MLIR inlining here could introduce later

conflicts or make normal function inlining more complex because FIR inlining

hooks would already be used.

hlfir.elemental allows delaying elemental array expression buffering and

combination. Its generic aspect has two advantages:

- It avoids defining one operation per elemental operation or intrinsic,

instead, the related arith dialect operations can be used directly in the

elemental regions. This avoids growing HLFIR and having to maintain about a

rogfer01Unsubmitted

Not Done

I'm curious about the choice of the name, why not something closer to Fortran like array_element? apply seems overly generic to me, but maybe there is precedent elsewhere?

rogfer01: I'm curious about the choice of the name, why not something closer to Fortran like…

jeanPerierAuthorUnsubmitted

Done

The rational of that name was to insist on the fact that this should not to be seen as a memory addressing operation at that level, but rather like an operation that, in certain circonstances (*), can see the fir.expr defining operation as a lambda and "apply" it given a set of indices (inlining the fir.elemental body at the fir.apply).

Affine has an operation named "affine.apply" that applies an affine map on a set of indices.

Now I am open to other names. Has anyone else opinions regarding array_element vs apply naming ?

(*) Basically that there should not be any operations between the fir.expr evaluation and use that may affect (or be affected by) the expression evaluation.

jeanPerier: The rational of that name was to insist on the fact that this should not to be seen as a memory…

klauslerUnsubmitted

Not Done

I find the name to be entirely appropriate in the function composition view of array programming.

klausler: I find the name to be entirely appropriate in the function composition view of array…

rogfer01Unsubmitted

Done

Hi @jeanPerier, thanks for the context.

No objections on the name.

rogfer01: Hi @jeanPerier, thanks for the context. No objections on the name.

hundred operations.

- It allows representing transformational intrinsics as functions of the indices

while doing optimization as described in

[Array Composition](ArrayComposition.md). This because the indices can be

transformed inside the region before being applied to array variables

according to any kind of transformation (semi-affine or not).

kiranchandramohanUnsubmitted

Done

This is the addressing equivalent for expressions. A notable difference is that

- it cannot only take simple scalar indices (no triplets) because it is not clear

+ it can only take simple scalar indices (no triplets) because it is not clear

why supporting triplets would be needed, and keeping the indexing simple makes

kiranchandramohan:

#### Introducing the hlfir.apply operation

Motivation: provide a way to get the element of an array expression

(hlfir.expr<?x…xT>)

This is the addressing equivalent for expressions. A notable difference is that

it can only take simple scalar indices (no triplets) because it is not clear

why supporting triplets would be needed, and keeping the indexing simple makes

inlining of hlfir.elemental much easier.

If hlfir.elemental inlining is not performed, or if the hlfir.expr<T> array

expression is produced by another operation (like fir.intrinsic) that is not

rewritten, hlfir.apply will be lowered to an actual addressing operation that

will address the temporary that was created for the hlfir.expr<T> value that

was materialized in memory.

hlfir.apply indices will be one based to make further lowering simpler.

Syntax:

```

%element = hlfir.apply %array_expr %i, %j: (hlfir.expr<?x?xi32>) -> i32

```

#### Introducing operations for transformational intrinsic functions

Motivation: Represent transformational intrinsics functions at a high-level so

that they can be manipulated easily by the optimizer, and do not require

materializing the result as a temporary in lowering.

An operation will be added for each Fortran transformational functions (SUM,

MATMUL, TRANSPOSE....). It translates the Fortran expression verbatim: it takes

the same number of arguments as the Fortran intrinsics and returns a

hlfir.expr<T>. The arguments may be hlfir.expr<T>, simple scalar types (e.g.,

i32, f32), or variables.

The exception being that the arguments that are statically absent would be

passed to it (passing results of fir.absent operation), so that the arguments

can be identified via their positions.

kiranchandramohanUnsubmitted

Not Done

I am assuming the simplification of intrinsics pass (with minor modifications) will still be useful for generating an inlined version. Or would lowering and high-level FIR make this redundant?

kiranchandramohan: I am assuming the simplification of intrinsics pass (with minor modifications) will still be…

jeanPerierAuthorUnsubmitted

Done

I am assuming the simplification of intrinsics pass (with minor modifications) will still be useful for generating an inlined version.

Absolutely, the plan is to use this pass (modifying it so that it can tape into the fir.intrinsic ops ).

jeanPerier: > I am assuming the simplification of intrinsics pass (with minor modifications) will still be…

This operation is meant for the transformational intrinsics, not the elemental

intrinsics, that will be implemented using hlfir.elemental + mlir math dialect

operations, nor the intrinsic subroutines (like random_seed or system_clock),

that will be directly lowered in lowering.

Syntax:

```

%res = hlfir."intrinsic_name" %expr_or_var, ...

```

These operations will all inherit a same operation base in tablegen to make

their definition and identification easy.

Without any optimization, codegen would then translate the operations to

exactly the same FIR as currently generated by IntrinsicCall.cpp (runtime calls

or inlined code with temporary allocation for array results). The fact that

they are the verbatim Fortran translations should allow to move the lowering

code to a translation pass without massive changes.

An operation will at least be created for each of the following transformational

intrinsics: all, any, count, cshift, dot_product, eoshift, findloc, iall, iany,

iparity, matmul, maxloc, maxval, minloc, minval, norm2, pack, parity, product,

reduce, repeat, reshape, spread, sum, transfer, transpose, trim, unpack.

For the following transformational intrinsics, the current lowering to runtime

call will probably be used since there is little point to keep them high level:

- command_argument_count, get_team, null, num_images, team_number, this_image

that are more program related (and cannot appear for instance in constant

expressions)

- selected_char_kind, selected_int_kind, selected_real_kind that returns scalar

integers

#### Introducing operations for character operations and elemental intrinsic functions

Motivation: represent character operations without requiring the operand and

results to be materialized in memory.

fir.char_op is intended to represent:

- Character concatenation (//)

- Character MIN/MAX

- Character MERGE

- “SET_LENGTH”

- Character conversions

- REPEAT

- INDEX

- CHAR

- Character comparisons

- LEN_TRIM

The arguments must be scalars, the elemental aspect should be handled by a

hlfir.elemental operation.

Syntax:

```

%res = hlfir.“char_op” %expr_or_var

```

Just like for the transformational intrinsics, if no optimization occurs, these

operations will be lowered to memory operations with temporary results (if the

result is a character), using the same generation code as the one currently used

in lowering.

kiranchandramohanUnsubmitted

Not Done

Would this have a lowering to fir.allocmem?

kiranchandramohan: Would this have a lowering to `fir.allocmem`?

jeanPerierAuthorUnsubmitted

Done

It depends of the array constructor.
I think we should not generate a fir.allocmem for small and easy things like [%i, %j, %k] (only scalar, no ac-implied, less than N elements, where N is some option). This should lower to fir.alloca.
Otherwise, yes, fir.allocmem will be used, although reallocation will still be needed to deal with the edge cases where the final size cannot be pre-computed (for instance [foo(), bar(), buzz()], where all three functions returns rank-one allocatable arrays).

jeanPerier: It depends of the array constructor. I think we should not generate a fir.allocmem for small…

#### hlfir.array_ctor

Motivation: represent array constructor without creating temporary

Many array constructors have a limited number of elements (less than 10), the

current lowering of array constructor is rather complex because it must deal

with the generic cases.

Having a representation to represent array constructor will allow an easier

lowering of array constructor, and make array ctor a lot easier to manipulate.

For instance, for small array constructors, loops could could be unrolled with

peixinUnsubmitted

Done

%array_ctor = fir.array_ctor %expr1, %expr2 ….

```

- Note that fir.elemental could be used to implement some!ac-implied-do, although

+ Note that fir.elemental could be used to implement some ac-implied-do, although

this is not yet clarified since ac-implied-do may contain more than one scalar

peixin: ?

the array ctor elements without ever creating a dynamically allocated array

temporary and loop nest using it.

Syntax:

```

%array_ctor = hlfir.array_ctor %expr1, %expr2 ….

```

Note that hlfir.elemental could be used to implement some ac-implied-do,

although this is not yet clarified since ac-implied-do may contain more than

one scalar element (they may contain a list of scalar and array values, which

would render the representation in a hlfir.elemental tricky, but maybe not

impossible using if/then/else and hlfir.elemental nests using the index value).

One big issue though is that hlfir.elemental requires the result shape to be

pre-computed (it is an operand), and with an ac-implied-do containing user

transformational calls returning allocatable or pointer arrays, it is

impossible to pre-evaluate the shape without evaluating all the function calls

entirely (and therefore all the array constructor elements).

#### hlfir.get_extent

Motivation: inquire about the extent of a hlfir.expr, variable, or fir.shape

Syntax:

```

%extent = hlfir.get_extent %shape_expr_or_var, dim

```

dim is a constant integer attribute.

This allows inquiring about the extents of expressions whose shape may not be

yet computable without generating detailed, low level operations (e.g, for some

transformational intrinsics), or to avoid going into low level details for

pointer and allocatable variables (where the descriptor needs to be read and

loaded).

#### hlfir.get_typeparam

Motivation: inquire about the type parameters of a hlfir.expr, or variable.

Syntax:

```

%param = hlfir.get_typeparam %expr_or_var [, param_name]

```

- param_name is an optional string attribute that must contain the length

parameter name if %expr_or_var is a derived type.

#### hlfir.get_dynamic_type

Motivation: inquire about the dynamic type of a polymorphic hlfir.expr or

variable.

peixinUnsubmitted

Done

Motivation: represent conformity requirement/information between two array

- operands so that later optimization can chose the best shape information source,

+ operands so that later optimization can choose the best shape information source,

or insert conformity runtime checks.

peixin:

Syntax:

```

%dynamic_type = hlfir.get_dynamic_type %expr_or_var

```

#### hlfir.get_lbound

Motivation: inquire about the lower bounds of variables without digging into

the implementation details of pointers and allocatables.

Syntax:

```

%lb = hlfir.get_lbound %var, n

```

Note: n is an integer constant attribute for the (zero based) dimension.

#### hlfir.shape_meet

Motivation: represent conformity requirement/information between two array

operands so that later optimization can choose the best shape information

source, or insert conformity runtime checks.

Syntax:

```

%shape = hlfir.shape_meet %shape1, %shape2

```

Suppose A(n), B(m) are two explicit shape arrays. Currently, when A+B is

lowered, lowering chose which operand shape gives the result shape information,

peixinUnsubmitted

Done

Another tough: The mask expression lhs variable might be changed in the assignment. There seems to be one bug in current FIR lowering. fir.forall may make it easy to support that case.

peixin: Another tough: The mask expression lhs variable might be changed in the assignment. There seems…

jeanPerierAuthorUnsubmitted

Done

Thanks, indeed. I added note about masks

jeanPerier: Thanks, indeed. I added note about masks

and it is later not retrievable that both n and m can be used. If lowering

chose n, but m later gets folded thanks to inlining or constant propagation, the

optimization passes have no way to use this constant information to optimize the

result storage allocation or vectorization of A+B. hlfir.shape_meet intends to

delay this choice until constant propagation or inlining can provide better

information about n and m.

#### hlfir.forall

Motivation: segregate the Forall lowering complexity in its own unit.

Forall is tough to lower because:

peixinUnsubmitted

Done

assignments. Any assignments syntactically before the where mask occurrence

- must be performed before the mask evaluation.

+ must be performed before the mask evaluation.

Note that forall forbids impure function calls, hence, no calls should modify

peixin: ?

jeanPerierAuthorUnsubmitted

Done

Yes, I agree the mask evaluation cannot affect anything (just like the forall indices evaluation), but the mask evaluation may be affected by the assignments.

jeanPerier: Yes, I agree the mask evaluation cannot affect anything (just like the forall indices…

- Lowering it in an optimal way requires analyzing several assignments/mask

expressions.

- The shape of the temporary needed to store intermediate evaluation values is

not a Fortran array in the general case, and cannot in the general case be

maximized/pre-computed without executing the forall to compute the bounds of

inner forall, and the shape of the assignment operands that may depend on

the bound values.

- Mask expressions evaluation should be affected by previous assignment

statements, but not by the following ones. Array temporaries may be

required for the masks to cover this.

- On top of the above points, Forall can contain user assignments, pointer

assignments, and assignment to whole allocatable.

The hlfir.forall syntax would be exactly the one of a fir.do_loop. The

difference would be that hlfir.assign and hlfir.ptr_assign inside hlfir.forall

peixinUnsubmitted

Done

evaluating previous assignments)

- -! 3. For each assignments,!check if the RHS/LHS operands value may depend

+ - 3. For each assignments, check if the RHS/LHS operands value may depend

on the LHS base:

peixin:

have specific semantics (the same as in Fortran):

- Given one hlfir.assign, all the iteration values of the LHS/RHS must be

evaluated before the assignment of any value is done.

- Given two hlfir.assign, the first hlfir.assign must be fully performed

before any evaluation of the operands of the second assignment is done.

- Masks (fir.if arguments), if any, should be evaluated before any nested

assignments. Any assignments syntactically before the where mask occurrence

must be performed before the mask evaluation.

Note that forall forbids impure function calls, hence, no calls should modify

any other expression evaluation and can be removed if unused.

The translation of hlfir.forall will happen by:

- 1. Determining if the where masks value may be modified by any assignments

- Yes, pre-compute all masks in a pre-run of the forall loop, creating

a “forall temps” (we may need a FIR concept to help here).

- No, Do nothing (or indicate it is safe to evaluate masks while evaluating

the rest).

- 2. Determining if a hlfir.assign operand expression depends on the

previous hlfir.assign left-hand side base value.

- Yes, split the hlfir.assign into their own nest of hlfir.forall loops.

- No, do nothing (or indicate it is safe to evaluate the assignment while

evaluating previous assignments)

- 3. For each assignments, check if the RHS/LHS operands value may depend

on the LHS base:

- Yes, split the forall loops. Insert a “forall temps” before the loops for

the “smallest” part that may overlap (which may be the whole RHS, or some

RHS sub-part, or some LHS indices). In the first nest, evaluate this

overlapping part into the temp. In the next forall loop nest, modify the

assignment to use the temporary, and add the [no_overlap] flag to indicate

no further temporary is needed. Insert code to finalize the temp after its

usage.

### Tagging variable uses in high-level operations (fir.ref attribute)

All operations defined above that accept "variables" (i.e: memory addresses or

box values that were produced by fir.declare, hlfir.associate, or

hlfir.designate) must have a fir.ref = mangled_name_attribute that matches the

fir.def on the operation that created them (it will be added automatically by

the operation builder). That is to ensure optimization passes do not merge

seemingly identical operations using variables with different properties, and

also to ensure that the matching defining operation can always be retrieved to

get all the variable properties (shape, bounds, type parameters and

attributes).

Two other alternatives have been considered and rejected:

- Using MLIR symbols. This has been rejected because MLIR symbols are mainly

intended to deal with globals and functions that may refer to each other

before being defined. Their processing is not as light as normal values, and

would require to turn every FIR operation with a region into an MLIR symbol

table. This would especially be annoying given fir.designator also produce

variables with their own properties, which would imply creating a lot of MLIR

symbols. All the operations that both accept variable and expression operands

would also either need to be more complex in order to both accept SSA values

or MLIR symbol operands (or some fir.as_expr %var operation should be added to

turn a variable into an expression). Given all variable definitions will

dominates their uses, it seems more adequate to use an SSA model with named

attributes. Using SSA values also makes the transition and mix with

lower-level FIR operations smoother: a variable SSA usage can simply be

replaced by lower-level FIR operations using the same SSA value.

- Another alternative could be making all operations defining variables return

fir.box, and repeating the variable attributes (fir.target...) on all

operations using the variable. This would allow the link between the variable

definition and usage to become broken (variable could travel as block

arguments). But this would risk littering the codegen with fir.box

manipulations (creating, writing and reading to descriptors) that may lead to

poor performance. Maintaining all the attributes on the operations would also

be more cumbersome than only maintaining the variable name in the fir.ref

attribute.

Lower-level operations (the current FIR operations), do not require this strong

link between a memory address and the variable definition, and it will not be

necessary to add fir.ref attributes to those. During alias analysis on FIR using

peixinUnsubmitted

Done

fir.expr, and all the operations that may produce it like transformational

- intrinsics and fir.elemental, fir.apply).

+ intrinsics and fir.elemental, fir.apply).

- Call interface argument association lowering (getting rid of fir.associate

peixin:

lower-level operations (like loads and stores), any memory reference that cannot

be resolved to a Fortran variable or some unrelated temporary allocation is

considered as potentially overlapping.

The variable definition will be guaranteed to have a unique name after lowering,

and some care might have to be taken when later duplicating regions that define

variables in a way that could lead a variable usage to have two dominating

definitions with the same name (this could for instance happen after inlining

two calls to the same procedure inside the same region). Inlining will need to

take care of those conflicts. This could be done by randomizing the inlined

variable name attributes (like by adding a counter index that is incremented

after each call inlining).

## New HLFIR Transformation Passes

### Mandatory Passes (translation towards lower-level representation)

Note that these passes could be implemented as a single MLIR pass, or successive

passes.

- Forall rewrites (getting rid of hlfir.forall)

- Array assignment rewrites (getting rid of array hlfir.assign)

- Bufferization: expression temporary materialization (getting rid of

hlfir.expr, and all the operations that may produce it like transformational

intrinsics and hlfir.elemental, hlfir.apply).

- Call interface argument association lowering (getting rid of hlfir.associate

and hlfir.end_associate)

- Lowering high level operations using variables into FIR operations

operating on memory (translating hlfir.designate, scalar hlfir.assign,

hlfir.finalize into fir.array_coor, fir.do_loop, fir.store, fir.load.

fir.embox/fir.rebox operations).

Note that these passes do not have to be the first one run after lowering. It is

intended that CSE, DCE, algebraic simplification, inlining and some other new

high-level optimization passes discused below be run before doing any of these

translations.

After that, the current FIR pipeline could be used to continue lowering towards

LLVM.

### Optimization Passes

- Elemental expression inlining (inlining of hlfir.elemental in hlfir.apply)

- User function Inlining

- Transformational intrinsic rewrites as hlfir.elemental expressions

- Assignments propagation

- Shape/Rank/dynamic type propagation

These high level optimization passes can be run any number of times in any

order.

## Transition Plan

kiranchandramohanUnsubmitted

Done

expression concepts (requires 1.).

- - 3. Intoduces the new translation passes, using the fir::ExtendedValue helpers

+ - 3. Introduce the new translation passes, using the fir::ExtendedValue helpers

(requires 1.).

kiranchandramohan:

The new higher-level steps proposed in this document will require significant

refactoring of lowering. Codegen should not be impacted since the current FIR

will remain untouched.

A lot of the code in lowering generating Fortran features (like an intrinsic or

how to do assignments) is based on the fir::ExtendedValue concept. This

currently is a collection of mlir::Value that allows describing a Fortran object

(either a variable or an evaluated expression result). The variable and

expression concepts described above should allow to keep an interface very

similar to the fir::ExtendedValue, but having the fir::ExtendedValue wrap a

single value or mlir::Operation* from which all of the object entity

information can be inferred.

That way, all the helpers currently generating FIR from fir::ExtendedValue could

be kept and used with the new variable and expression concepts with as little

modification as possible.

The proposed plan is to:

- 1. Introduce the new HLFIR operations.

- 2. Refactor fir::ExtendedValue so that it can work with the new variable and

expression concepts (requires part of 1.).

- 3. Introduce the new translation passes, using the fir::ExtendedValue helpers

(requires 1.).

- 3.b Introduce the new optimization passes (requires 1.).

- 4. Introduce the fir.declare and hlfir.finalize usage in lowering (requires 1.

and 2. and part of 3.).

tschuettUnsubmitted

Not Done

Isn't this point about the expressiveness of MLIR? What are the engineering costs of adding a FIRX dialect for the higher ops?

What would be the benefit of having a separate dialect?

tschuett: Isn't this point about the expressiveness of MLIR? What are the engineering costs of adding a…

jeanPerierAuthorUnsubmitted

Done

The benefit of having a separate dialect is to strongly split the high level ops that require information about Fortran variables to be retrievable in the IR (via the fir.def/fir.ref) and the current operations that are lower level and do not require such information.
After the translation pass of variable related operation, this dialect would be illegal.

There is a precedent in FIR with the fir::cg dialect ops (https://github.com/llvm/llvm-project/blob/main/flang/include/flang/Optimizer/CodeGen/CGOps.td) that helps simplifying the addressing and emboxing operation before codegen.

I think the engineering cost are mostly about having different td files, headers and .cpp for these new ops and types, but also to register the dialect in the passes that will work with it. That last point may be a bit more annoying (it does not matter with the fir::cg dialect because codegen is the only pass meant to be run with this dialect).

I do not have a strong opinion here.

jeanPerier: The benefit of having a separate dialect is to strongly split the high level ops that require…

mehdi_aminiUnsubmitted

Not Done

to register the dialect in the passes that will work with it.

To be clear: registration is only ever needed in a pass that produces entities from a dialect (ops, attributes, types) when this dialect isn't in the input already. So a pass lowering from a high-level dialect to a lower-level dialect only defines the lower-level dialects to be registered, no need to do anything for the high-level dialect.
Similarly passes that transform within the same dialect don't need to declare anything at all.

mehdi_amini: > to register the dialect in the passes that will work with it. To be clear: registration is…

jeanPerierAuthorUnsubmitted

Done

Thanks for clarifying this point, then I do not see big engineering costs to splitting high level ops depending on the Fortran variable and expression concepts and low level operations operating on memory and simple (integer/floating point/complex) SSA values.

Maybe the new fir.declare op should still belong to FIR directly though. I think it would be valuable to keep it until as late as possible (it is not doing anything other than bookkeeping some Fortran level information about memory storages, which can be useful until the end).

jeanPerier: Thanks for clarifying this point, then I do not see big engineering costs to splitting high…

tarunprabhuUnsubmitted

Not Done

I, too, think it makes sense to keep this in FIR. Any Fortran-specific tooling that operates on MLIR may find this useful, so having it in FIR would be helpful.

tarunprabhu: I, too, think it makes sense to keep this in FIR. Any Fortran-specific tooling that operates on…

The following steps might have to be done in parallel of the current lowering,

to avoid disturbing the work on performance until the new lowering is complete

and on par.

- 5. Introduce hlfir.designate and hlfir.associate usage in lowering.

- 6. Introduce lowering to hlfir.assign (with RHS that is not a hlfir.expr),

hlfir.ptr_assign.

- 7. Introduce lowering to hlfir.expr and related operations.

peixinUnsubmitted

Done

subroutine foo(a, b)

- !!!real :: a(:), b(:)

+ ! real :: a(:), b(:)

!!!a = b

Same for line 970, 978-981, 992-1007, 1032-1041, 1053-1056, ...

peixin: Same for line 970, 978-981, 992-1007, 1032-1041, 1053-1056, ...

jeanPerierAuthorUnsubmitted

Done

Thanks, all the none breaking space issues should have been fixed.

jeanPerier: Thanks, all the none breaking space issues should have been fixed.

- 8. Introduce lowering to hlfir.forall.

At that point, lowering using the high-level FIR should be in place, allowing

extensive testing.

- 9. Debugging correctness.

- 10. Debugging execution performance.

The plan is to do these steps incrementally upstream, but for lowering this will

most likely be safer to do have the new expression lowering implemented in

parallel upstream, and to add an option to use the new lowering rather than to

directly modify the current expression lowering and have it step by step

equivalent functionally and performance wise.

## Examples

### Example 1: simple array assignment

```Fortran

subroutine foo(a, b)

real :: a(:), b(:)

a = b

end subroutine

```

Lowering output:

```HLFIR

func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>) {

%a = fir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box<!fir.array<?xf32>>

%b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>

hlfir.assign %b to %a {fir.ref = "_QPfooEb,_QPfooEa"}: !fir.box<!fir.array<?xf32>>

return

}

```

HLFIR array assignment lowering pass:

- Query: can %b value depend on %a? No, they are two different argument

associated variables that are neither target nor pointers.

- Lower to assignment to loop:

```HFLIR

func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>) {

%a = fir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box<!fir.array<?xf32>>

%b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>

%ashape = hlfir.shape_of %a {fir.ref = "_QPfooEa"}

%bshape = hlfir.shape_of %b {fir.ref = "_QPfooEb"}

%shape = hlfir.shape_meet %ashape, %bshape

%extent = hlfir.get_extent %shape, 0

%c1 = arith.constant 1 : index

fir.do_loop %i = %c1 to %extent step %c1 unordered {

%belt = hlfir.designate %b, %i {fir.ref = "_QPfooEb", "fir.def=_QPfooEb.des001"}

%aelt = hlfir.designate %a, %i {fir.ref = "_QPfooEa", "fir.def=_QPfooEa.des002"}

hlfir.assign %belt to %aelt {fir.ref = "_QPfooEb.des001,_QPfooEa.des002"}: fir.ref<f32>, fir.ref<f32>

}

return

}

```

HLFIR variable operations to memory translation pass:

- hlfir.designate is rewritten into fir.array_coor operation on the variable

associated memory buffer, and returns the element address

- For numerical scalar, hlfir.assign is rewritten to fir.store (and fir.load

of the operand if needed), for derived type and characters, memory copy

(and padding for characters) is done.

- hlfir.shape_of are lowered to fir.box_dims, here, no constant information

was obtained from any of the source shape, so hlfir.shape_meet is a no-op,

selecting the first shape (a conformity runtime check could be inserted

under debug options).

- fir.declare are kept (they are no-ops) so that it will be possible to

generate debug information for LLVM.

kiranchandramohanUnsubmitted

Not Done

%bval = fir.load %belt : f32

- fir.store %bval to %alet : fir.ref<f32>

+ fir.store %bval to %aelt : fir.ref<f32>

}

return

kiranchandramohan:

This pass would wrap operations defining variables (fir.declare/hlfir.designate)

as fir::ExtendedValue, and use all the current helpers operating on it

(e.g.: fir::factory::genScalarAssignment).

```

func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1:

!fir.box<!fir.array<?xf32>>) {

%a = fir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box<!fir.array<?xf32>>

%b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>

%c1 = arith.constant 1 : index

%dims = fir.box_dims %a, 1

fir.do_loop %i = %c1 to %dims#1 step %c1 unordered {

%belt = fir.array_coor %b, %i : (!fir.box<!fir.array<?xf32>>, index) -> fir.ref<f32>

%aelt = fir.array_coor %a, %i : (!fir.box<!fir.array<?xf32>>, index) -> fir.ref<f32>

%bval = fir.load %belt : f32

fir.store %bval to %aelt : fir.ref<f32>

}

kiranchandramohanUnsubmitted

Done

```Fortran

subroutine foo(a, b, p, c)

- real. target :: a(:)

- real : b(:), d(100)

+ real, target :: a(:)

+ real :: b(:), c(100)

real, pointer :: p(:)

a = b*p + c

end subroutine

```

Lowering output:

kiranchandramohan:

return

}

```

This reaches the current FIR level (except fir.declare_op that can be kept until

LLVM codegen and dropped on the floor if there is no debug information

generated).

### Example 2: array assignment with elemental expression

```Fortran

subroutine foo(a, b, p, c)

real, target :: a(:)

real :: b(:), c(100)

real, pointer :: p(:)

a = b*p + c

kiranchandramohanUnsubmitted

Not Done

Is fir.get_lbound a new operation?

kiranchandramohan: Is `fir.get_lbound` a new operation?

jeanPerierAuthorUnsubmitted

Done

Thanks for catching this inconsistency. When writing FIR manually, I found it useful to avoid dealing with the details around pointers and allocatable (fir.load %box + fir.bo_dims). But it is not a game changer, and I am still debating whether this is useful or not.

I added a description for it above assuming it will be added for now.

jeanPerier: Thanks for catching this inconsistency. When writing FIR manually, I found it useful to avoid…

end subroutine

```

Lowering output:

```HLFIR

func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) {

%a = fir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box<!fir.array<?xf32>

%b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>

%p = fir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box<!fir.ptr<!fir.array<?xf32>>>

%c = fir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref<!fir.array<100xf32>>

%bshape = hlfir.shape_of %b {fir.ref = "_QPfooEb"}

%pshape = hlfir.shape_of %p {fir.ref = "_QPfooEp"}

%shape1 = hlfir.shape_meet %bshape, %pshape

%mul = hlfir.elemental(%i:index) %shape1 {

%belt = hlfir.designate %b, %i {fir.ref = "_QPfooEb", fir.def= "_QPfooEb.des001"}

%p_lb = hlfir.get_lbound %p, 1 {fir.ref = "_QPfooEp"}

%i_zero = arith.subi %i, %c1

%i_p = arith.addi %i_zero, %p_lb

%pelt = hlfir.designate %p, %i_p {fir.ref = "_QPfooEp", fir.def= "_QPfooEp.des002"}

%bval = fir.load %belt : f32

%pval = fir.load %pelt : f32

%mulres = arith.mulf %bval, %pval : f32

fir.result %mulres : f32

}

%cshape = hlfir.shape_of %c

%shape2 = hlfir.shape_meet %cshape, %shape1

%add = hlfir.elemental(%i:index) %shape2 {

%mulval = hlfir.apply %mul, %i : f32

%celt = hlfir.designate %c, %i {fir.ref = "_QPfooEc", fir.def= "_QPfooEc.des003"}

%cval = fir.load %celt

%add_res = arith.addf %mulval, %cval

fir.result %add_res

}

hlfir.assign %add to %a {fir.ref = "_QPfooEa"} : hlfir.expr<?xf32>, !fir.box<!fir.array<?xf32>

return

}

```

Step 1: hlfir.elemental inlining: inline the first hlfir.elemental into the

second one at the hlfir.apply.

```HLFIR

func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) {

%a = fir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box<!fir.array<?xf32>

%b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>

%p = fir.declare %arg2 {fir.def = "_QPfooEa", fir.ptr} : !fir.box<!fir.ptr<!fir.array<?xf32>>>

%c = fir.declare %arg3 {fir.def = "_QPfooEp"} : !fir.ref<!fir.array<100xf32>>

%bshape = hlfir.shape_of %b {fir.ref = "_QPfooEb"}

%pshape = hlfir.shape_of %p {fir.ref = "_QPfooEp"}

%shape1 = hlfir.shape_meet %bshape, %pshape

%cshape = hlfir.shape_of %c

%shape2 = hlfir.shape_meet %cshape, %shape1

%add = hlfir.elemental(%i:index) %shape2 {

%belt = hlfir.designate %b, %i {fir.ref = "_QPfooEb", fir.def= "_QPfooEb.des001"}

%p_lb = hlfir.get_lbound %p, 1 {fir.ref = "_QPfooEp"}

%i_zero = arith.subi %i, %c1

%i_p = arith.addi %i_zero, %p_lb

%pelt = hlfir.designate %p, %i_p {fir.ref = "_QPfooEp", fir.def= "_QPfooEp.des002"}

%bval = fir.load %belt : f32

%pval = fir.load %pelt : f32

%mulval = arith.mulf %bval, %pval : f32

peixinUnsubmitted

Done

- Insert temporary, and duplicate array assignments, that can be lowered to

- loops at that!point

+ loops at that point

Note that the alias analysis could have already occurred without inlining the

peixin:

%celt = hlfir.designate %c, %i {fir.ref = "_QPfooEc", fir.def= "_QPfooEc.des003"}

%cval = fir.load %celt

%add_res = arith.addf %mulval, %cval

fir.result %add_res

}

hlfir.assign %add to %a {fir.ref = "_QPfooEa"} : hlfir.expr<?xf32>, !fir.box<!fir.array<?xf32>

return

}

```

Step2: alias analysis around the array assignment:

- May %add value depend on %a variable?

- Gather variable and function calls in %add operand tree (visiting

hlfir.elemental regions)

- Gather references to %b, %p, and %c. %p is a pointer variable according to

its defining operations. It may alias with %a that is a target. -> answer

yes.

- Insert temporary, and duplicate array assignments, that can be lowered to

loops at that point

Note that the alias analysis could have already occurred without inlining the

%add hlfir.elemental.

```HLFIR

func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) {

%a = fir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box<!fir.array<?xf32>

%b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>

%p = fir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box<!fir.ptr<!fir.array<?xf32>>>

%c = fir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref<!fir.array<100xf32>>

%bshape = hlfir.shape_of %b {fir.ref = "_QPfooEb"}

%pshape = hlfir.shape_of %p {fir.ref = "_QPfooEp"}

%shape1 = hlfir.shape_meet %bshape, %pshape

%cshape = hlfir.shape_of %c

%shape2 = hlfir.shape_meet %cshape, %shape1

%add = hlfir.elemental(%i:index) %shape2 {

%belt = hlfir.designate %b, %i {fir.ref = "_QPfooEb", fir.def= "_QPfooEb.des001"}

%p_lb = hlfir.get_lbound %p, 1 {fir.ref = "_QPfooEp"}

%i_zero = arith.subi %i, %c1

%i_p = arith.addi %i_zero, %p_lb

%pelt = hlfir.designate %p, %i_p {fir.ref = "_QPfooEp", fir.def= "_QPfooEp.des002"}

%bval = fir.load %belt : f32

%pval = fir.load %pelt : f32

%mulval = arith.mulf %bval, %pval : f32

%celt = hlfir.designate %c, %i {fir.ref = "_QPfooEc", fir.def= "_QPfooEc.des003"}

%cval = fir.load %celt

%add_res = arith.addf %mulval, %cval

fir.result %add_res

}

%extent = hlfir.get_extent %shape2, 0: (fir.shape<1>) -> index

%tempstorage = fir.allocmem %extent : fir.heap<fir.array<?xf32>>

%temp = fir.declare %tempstorage, shape %extent {fir.def = QPfoo.temp001} : (index) -> fir.heap<fir.array<?xf32>>

hlfir.assign %add to %temp : no_overlap {fir.ref = "QPfoo.temp001"} : hlfir.expr<?xf32>, !fir.box<!fir.array<?xf32>

hlfir.assign %temp to %a : no_overlap {fir.ref = " QPfoo.temp001,_QPfooEa"} : hlfir.expr<?xf32>, !fir.box<!fir.array<?xf32>

hlfir.finalize %temp {fir.ref = "QPfoo.temp001"}

fir.freemem %tempstorage

return

}

```

Step 4: Lower assignments to regular loops since they have the no_overlap

attribute, and inline the hlfir.elemental into the first loop nest.

```HLFIR

func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) {

%a = fir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box<!fir.array<?xf32>

%b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>

%p = fir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box<!fir.ptr<!fir.array<?xf32>>>

%c = fir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref<!fir.array<100xf32>>

%bshape = hlfir.shape_of %b {fir.ref = "_QPfooEb"}

%pshape = hlfir.shape_of %p {fir.ref = "_QPfooEp"}

%shape1 = hlfir.shape_meet %bshape, %pshape

%cshape = hlfir.shape_of %c

%shape2 = hlfir.shape_meet %cshape, %shape1

%extent = hlfir.get_extent %shape2, 0: (fir.shape<1>) -> index

%tempstorage = fir.allocmem %extent : fir.heap<fir.array<?xf32>>

%temp = fir.declare %tempstorage, shape %extent (index) fir.def = QPfoo.temp001} : fir.heap<fir.array<?xf32>>

fir.do_loop %i = %c1 to %shape2 step %c1 unordered {

%belt = hlfir.designate %b, %i {fir.ref = "_QPfooEb", fir.def= "_QPfooEb.des001"}

%p_lb = hlfir.get_lbound %p, 1 {fir.ref = "_QPfooEp"}

%i_zero = arith.subi %i, %c1

%i_p = arith.addi %i_zero, %p_lb

%pelt = hlfir.designate %p, %i_p {fir.ref = "_QPfooEp", fir.def= "_QPfooEp.des002"}

%bval = fir.load %belt : f32

%pval = fir.load %pelt : f32

%mulval = arith.mulf %bval, %pval : f32

%celt = hlfir.designate %c, %i {fir.ref = "_QPfooEc", fir.def= "_QPfooEc.des003"}

%cval = fir.load %celt

%add_res = arith.addf %mulval, %cval

%tempelt = hlfir.designate %temp, %i {fir.ref = "_QPfoo.temp001", fir.def="_QPfoo.temp001.des004"}

hlfir.assign %add_res to %tempelt {fir.ref = "_QPfoo.temp001.des004"}: f32, fir.ref<f32>

}

fir.do_loop %i = %c1 to %shape2 step %c1 unordered {

%aelt = hlfir.designate %a, %i {fir.ref = "_QPfooEa", fir.def= "_QPfooEa.des005"}

%tempelt = hlfir.designate %temp, %i {fir.ref = "_QPfoo.temp001", fir.def="_QPfoo.temp001.des006"}

hlfir.assign %add_res to %tempelt {fir.ref = "_QPfoo.temp001.des005,_QPfooEa.des005"}: f32, fir.ref<f32>

}

hlfir.finalize %temp {fir.ref = "QPfoo.temp001"}

fir.freemem %tempstorage

return

}

```

Step 5 (may also occur earlier or several times): shape propagation.

- %shape2 can be inferred from %cshape that has constant shape: the

hlfir.shape_meet results can be replaced by it, and if the option is set,

conformance checks can be added for %a, %b and %p.

- %temp is small, and its fir.allocmem can be promoted to a stack allocation

```HLFIR

func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) {

// .....

%cshape = fir.shape %c100

%extent = %c100

// updated fir.alloca

%tempstorage = fir.alloca %extent : fir.ref<fir.array<100xf32>>

%temp = fir.declare %tempstorage {fir.def = "_QPfoo.temp001"} : fir.ref<fir.array<100xf32>>

fir.do_loop %i = %c1 to %c100 step %c1 unordered {

// ...

}

fir.do_loop %i = %c1 to %c100 step %c1 unordered {

// ...

}

hlfir.finalize %temp {fir.ref = "QPfoo.temp001"}

// deleted fir.freemem %tempstorage

return

}

```

Step 6: lower hlfir.designate/hlfir.assign in a translation pass:

At this point, the representation is similar to the current representation after

the array value copy pass, and the existing FIR flow is used (lowering

fir.do_loop to cfg and doing codegen to LLVM).

### Example 3: assignments with vector subscript

```Fortran

subroutine foo(a, b, v)

real :: a(*), b(*)

integer :: v(:)

a(v) = b(v)

end subroutine

```

Lowering of vector subscripted entities would happen as follow:

- vector subscripted entities would be lowered as a hlfir.elemental implementing

the vector subscript addressing.

- If the vector appears in a context where it can be modified (which can only

be an assignment LHS, or in input IO), lowering could transform the

hlfir.elemental into hlfir.forall (for assignments), or a fir.iter_while (for

input IO) by inlining the elemental body into the created loops, and

identifying the hlfir.designate producing the result.

```HFLFIR

func.func @_QPfoo(%arg0: !fir.ref<!fir.array<?xf32>>, %arg1: !fir.ref<!fir.array<?xf32>>, %arg2: !fir.box<<!fir.array<?xi32>>) {

%a = fir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.ref<!fir.array<?xf32>>

%b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.ref<!fir.array<?xf32>>

%v = fir.declare %arg2 {fir.def = "_QPfooEv"} : !fir.box<!fir.array<?xi32>>

%vshape = hlfir.shape_of %v : fir.shape<1>

%bsection = hlfir.elemental(%i:index) %vshape : (fir.shape<1>) -> hlfir.expr<?xf32> {

%v_elt = hlfir.designate %v, %i {fir.ref = "_QPfooEv", fir.def="_QPfooEv.des001"} : (!fir.box<!fir.array<?xi32>>, index) -> fir.ref<i32>

%v_val = fir.load %v_elt : fir.ref<i32>

%cast = fir.convert %v_val : (i32) -> index

%b_elt = hlfir.designate %b, %v_val {fir.ref = "_QPfooEb", fir.def="_QPfooEb.des002"} : (!fir.ref<!fir.array<?xf32>>, index) -> fir.ref<f32>

%b_val = fir.load %b_elt : fir.ref<f32>

fir.result %b_elt

}

%extent = hlfir.get_extent %vshape, 0 : (fir.shape<1>) -> index

%c1 = arith.constant 1 : index

hlfir.forall (%i from %c1 to %extent step %c1) {

%b_section_val = hlfir.apply %bsection, %i : (hlfir.expr<?xf32>, index) -> f32

%v_elt = hlfir.designate %v, %i {fir.ref = "_QPfooEv", fir.def="_QPfooEv.des003"} : (!fir.box<!fir.array<?xi32>>, index) -> fir.ref<i32>

%v_val = fir.load %v_elt : fir.ref<i32>

%cast = fir.convert %v_val : (i32) -> index

%a_elt = hlfir.designate %a, %v_val {fir.ref = "_QPfooEa", fir.def="_QPfooEa.des004"} : (!fir.ref<!fir.array<?xf32>>, index) -> fir.ref<f32>

hlfir.assign %b_section_val to %a_elt {fir.ref="_QPfooEa.des004"} : f32, fir.ref<f32>

}

return

}

```

This would then be lowered as described in the examples above (hlfir.elemental

will be inlined, hlfir.forall will be rewritten into normal loops taking into

account the alias analysis, and hlfir.assign/hlfir.designate operations will be

lowered to fir.array_coor and fir.store operations).

# Alternatives that were not retained

## Using a non-MLIR based mutable CFG representation

peixinUnsubmitted

Done

dot, and sum.

- Issues:!

+ Issues:

- The linalg dialect is tightly linked to the tensor/memref concepts that

peixin:

An option would have been to extend the PFT to describe expressions in a way

that can be annotated and modified with the ability to introduce temporaries.

This has been rejected because this would imply a whole new set of

infrastructure and data structures while FIR is already using MLIR

infrastructure, so enriching FIR seems a smoother approach and will benefit from

the MLIR infrastructure experience that was gained.

## Using some existing MLIR dialects for the high-level Fortran.

### Why not using Linalg dialect?

The linalg dialects offers a powerful way to represent array operations: the

linalg.generic operation takes a set of input and output arrays, a related set

of affine maps to represent how these inputs/outputs are to be addressed, and a

region detailing what operation should happen at each iteration point, given the

input and output array elements. It seems mainly intended to optimize matmul,

dot, and sum.

Issues:

- The linalg dialect is tightly linked to the tensor/memref concepts that

cannot represent byte stride based discontinuity and would most likely

require FIR to use MLIR memref descriptor format to take advantage of it.

- It is not clear whether all Fortran array expression addressing can be

represented as semi affine maps. For instance, vector subscripted entities

can probably not, which may force creating temporaries for the related

designator expressions to fit in this framework. Fortran has a lot more

transformational intrinsics than matmul, dot, and sum that can and should

still be optimized.

So while there may be benefits to use linalg at the optimization level (like

rewriting fir.sum/fir.matmul to a linalg sum, with dialect types plumbing

around the operand and results, to get tiling done by linalg), using it as a

lowering target would not cover all Fortran needs (especially for the non

semi-affine cases).

So using linalg is for now left as an optimization pass opportunity in some

cases that could be experimented.

### Why not using Shape dialect?

MLIR shape dialect gives a set of operations to manipulate shapes. The

shape.meet operation is exactly similar with hlfir.shape_meet, except that it

returns a tensor or a shape.shape.

The main issue with using the shape dialect is that it is dependent on tensors.

Bringing the tensor toolchain in flang for the sole purpose of manipulating

shape is not seen as beneficial given that the only thing Fortran needs is

shape.meet The shape dialect is a lot more complex because it is intended to

deal with computations involving dynamically ranked entity, which is not the

case in Fortran (assumed rank usage in Fortran is greatly limited).

## Using embox/rebox and box as an alternative to fir.declare/hlfir.designate and hlfir.expr/ variable concept

All Fortran entities (*) can be described at runtime by a fir.box, except for

some attributes that are not part of the runtime descriptors (like TARGET,

OPTIONAL or VOLATILE). In that sense, it would be possible to have

fir.declare, hlfir.designate, and hlfir.associate be replaced by embox/rebox,

and also to have all operation creating hlfir.expr to create fir.box.

This was rejected because this would lack clarity, and make embox/rebox

semantics way too complex (their codegen is already non-trivial), and also

because it would then not really be possible to know if a fir.box is an

expression or a variable when it is an operand, which would make reasoning

harder: this would already imply that expressions have been buffered, and it is

not clear when looking at a fir.box if the value it describe may change or not,

while a hlfir.expr value cannot change, which allows moving its usages more

easily.

This would also risk generating too many runtime descriptors read and writes

that could make later optimizations harder.

Hence, while this would be functionally possible, this makes the reasoning about

the IR harder and would not benefit high-level optimizations.

(*) This not true for vector subscripted variables, but the proposed plan will

also not allow creating vector subscripted variables as the result of a

hlfir.designate. Lowering will deal with the assignment and input IO special

case using hlfir.elemental.

This is an archive of the discontinued LLVM Phabricator instance.

[flang][RFC] Adding higher level FIR ops to ease expression loweringClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 467450

flang/docs/HighLevelFIR.md

[flang][RFC] Adding higher level FIR ops to ease expression lowering
ClosedPublic